Re: reading files with accents in the filename from PHP [message #183122 is a reply to message #183110] Thu, 10 October 2013 01:07
From: Thomas 'PointedEars'

Erwin Moller wrote:

> On 10/9/2013 1:18 PM, The Natural Philosopher wrote:
>> On 09/10/13 12:16, Erwin Moller wrote:
>>> On 10/9/2013 12:58 PM, Thomas Mlynarczyk wrote:
>>>> Erwin Moller schrieb:
>>>> > How can PHP open files on the local filesystem that contain certain
>>>> > characters, like umlauts, accents, etc?
>>>>
>>>> $path = __DIR__ . '\Eugène.txt';
>>>> var_dump( PHP_VERSION, file_exists( $path ) );
>>>
>>> That didn't help since my files are not stored in working dir.
>>>
>>>> Works on my Windows XP, PHP 5.4.8, *if* the PHP file is stored in ANSI
>>>> (="Windows") encoding. Doesn't work if stored in UTF8.
>>>
>>> Strange situation.
>>> I changed my PHP-files encoding to UTF-8, but the problem still
>>> occurred.
>>> […]
>>> I added a replace:
>>> $path = str_replace("è","\xE8",$path);
>>>
>>> and now it IS readable from PHP.
> […]
> I don't have a good feeling about my "fix".

And you should not.

> It worked, but I don't know exactly what is going on.

Exactly.

> I actually hoped PHP would handle such things 'the right way', whatever
> that might be. ;-)

PHP has no built-in support for character encodings (but it has extensions
for that). Your strings are read octet-wise from lowest to highest address
as they are, that is, as the *editor* encoded the characters between the
string delimiters. If you write “"è"” in a UTF-8-encoded source file, the
character between the delimiters will be encoded C3 A8. If you write the
*same* character in a Windows-1252-encoded source file, it will be encoded
E8.
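
The two encodings mentioned above can be made visible with a small sketch. The bytes are written as escape sequences so that the result does not depend on how the example file itself is saved; the helper name `hexBytes` is made up for this illustration.

```php
<?php
// "è" (U+00E8) spelled out as raw octets, independent of the
// encoding of this source file.
$utf8   = "\xC3\xA8"; // what a UTF-8 editor stores between the quotes
$cp1252 = "\xE8";     // what a Windows-1252 editor stores

// Hypothetical helper: show a string as space-separated hex octets.
function hexBytes($s)
{
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

echo hexBytes($utf8), "\n";   // C3 A8
echo hexBytes($cp1252), "\n"; // E8

// strlen() counts octets, not characters.
echo strlen($utf8), ' ', strlen($cp1252), "\n"; // 2 1
```

PHP compares and passes on these strings octet by octet, so the two spellings of the "same" character are simply different strings.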

If your filesystem is FAT32, it will probably expect 8A if its locale is
English (IBM437) or Western European (IBM850), for example. If your
filesystem is NTFS, it will expect E8 00 (UTF-16_LE_; my mistake); if you
omit the zero octet it *might* work, but it does not work reliably.
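
As a rough sketch of how the octets differ per target encoding, the following converts a UTF-8 "è" with `iconv()`. It assumes the iconv extension is loaded and that the underlying iconv library knows the charset names used here (glibc and GNU libiconv do; other implementations may not). `encodeFileName` is a hypothetical helper, not something from the thread, and the sketch only shows the byte values, not what any particular filesystem API accepts.

```php
<?php
// Hypothetical helper: convert a UTF-8 file name to a target charset.
// Throws instead of silently returning a damaged name.
function encodeFileName($utf8Name, $targetCharset)
{
    $converted = @iconv('UTF-8', $targetCharset, $utf8Name);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert name to $targetCharset");
    }
    return $converted;
}

$name = "\xC3\xA8"; // "è" in UTF-8

// Charset availability depends on the iconv library, so failures are
// reported per charset rather than aborting the script.
foreach (array('UTF-16LE', 'CP1252', 'CP850', 'CP437') as $charset) {
    try {
        echo $charset, ': ', bin2hex(encodeFileName($name, $charset)), "\n";
    } catch (RuntimeException $e) {
        echo $charset, ': ', $e->getMessage(), "\n";
    }
}
```

Where the charsets are available, this should print `e800` for UTF-16LE, `e8` for CP1252, and `8a` for the two IBM code pages, matching the values given above.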

> Now I wonder what happens if my code happens to run on some *nix OS.

The operating system is not the issue; the filesystem is. However, usually
Linux will run on ext2 to ext4, where AFAIK file names are stored as plain
octet sequences, so any character encoding can be used. So there is a good
chance that your code will break there.

> Ideally my PHP code is OS agnostic.

In that case you will probably have to detect the filesystem, and its
encoding, and use the encoding that is expected by the filesystem. Or
prevent such filenames from occurring in the first place.
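
The second option, preventing such filenames, could look roughly like the sketch below. The allowed character set and both function names are assumptions made for illustration. `rawurlencode()` is used because it yields an ASCII-only name that can be turned back into the original with `rawurldecode()`.

```php
<?php
// Hypothetical policy: a name is "portable" if it consists only of
// ASCII letters, digits and a few punctuation characters.
function isPortableFileName($name)
{
    return preg_match('/\A[A-Za-z0-9._~%-]+\z/', $name) === 1;
}

// Hypothetical fallback: percent-encode every other octet. Applies to
// a single name, not a path, since separators are encoded as well.
function toPortableFileName($name)
{
    return rawurlencode($name);
}

$name = "Eug\xC3\xA8ne.txt"; // "Eugène.txt" in UTF-8

var_dump(isPortableFileName($name));  // bool(false)
echo toPortableFileName($name), "\n"; // Eug%C3%A8ne.txt
```

The encoded name no longer depends on what the filesystem does with octets above 7F, at the cost of being less readable in a directory listing.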

I suggest encoding PHP source files as UTF-8 _without BOM_. If you write
non-ASCII characters, you know what the encoding is, and you have a greater
character set so that fewer characters need to be escaped.
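
Whether a file was saved with a BOM can be checked with a few lines. This is a minimal sketch, and `startsWithUtf8Bom` is a made-up name. The BOM matters because its three octets sit in front of the opening `<?php` tag and are therefore sent as output.

```php
<?php
// Returns true if the given octets start with the UTF-8 byte order
// mark (EF BB BF).
function startsWithUtf8Bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

var_dump(startsWithUtf8Bom("\xEF\xBB\xBF<?php echo 1;")); // bool(true)
var_dump(startsWithUtf8Bom("<?php echo 1;"));             // bool(false)
```

For a file on disk, the first three octets could be read with `file_get_contents($path, false, null, 0, 3)` and passed to the function.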


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(at)news(dot)demon(dot)co(dot)uk>