Re: reading files with accents in the filename from PHP [message #183562 is a reply to message #183121]
Thu, 31 October 2013 20:34
Christoph Michael Bec
Thomas 'PointedEars' Lahn wrote:
> It would be interesting to see how this works with NTFS with characters
> outside the specified range whose Unicode code point is above U+007F. For
> example, U+0100 (“Ā”; LATIN CAPITAL LETTER A WITH MACRON) would be encoded
> in one UTF-16 code unit, 0100, which would be encoded in UTF-16LE as 00 10.
> Just stripping the zero-octets would result in <LF> (whose code point is
> 0x10 which is 020). Just reading the octet with the lower address would
> result in 0x00 which terminates a C string. If the result is _not_
> something equivalent to 't\020st' or 't', something else is happening.
U+0010 denotes <DLE>; <LF> is U+000A[1]. Anyway, I created a file
named "tĀst" and ran:
>>> glob('*')
Array
(
)
Apparently, something else is happening.
FWIW, I tried the following, too:
>>> touch("test")
true
>>> touch("t\x00\x10st")
Warning: touch() expects parameter 1 to be a valid path, string given
in ...
>>> touch("t\x10\x10st")
Warning: touch(): Unable to create file t►►st because Invalid
argument in ...
>>> file_exists("tAAst")
false
>>> touch("t\x41\x41st")
true
>>> file_exists("tAAst")
true
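As a side note, the byte values discussed in the quoted text can be checked with a few lines of core PHP. This is only a sketch of the arithmetic, not an explanation of what PHP or NTFS actually does with the filename; the variable names are mine. As far as I can tell, U+0100 serialized as UTF-16LE is 00 01 (low-order octet first) rather than 00 10, so stripping the zero octets would leave 0x01 (<SOH>), not 0x10.

```php
<?php
// U+0100 fits in one UTF-16 code unit; 'v' packs an unsigned
// 16-bit integer in little-endian byte order (i.e. UTF-16LE here).
$utf16le = pack('v', 0x0100);
echo bin2hex($utf16le), "\n";   // prints "0001"

// Hypothetical "strip the zero octets" step from the quoted text.
$stripped = str_replace("\x00", '', $utf16le);
echo bin2hex($stripped), "\n";  // prints "01"

// The two control characters mentioned above, in hex and octal.
printf("DLE = 0x%02X (octal %03o)\n", 0x10, 0x10);            // DLE = 0x10 (octal 020)
printf("LF  = 0x%02X (octal %03o)\n", ord("\n"), ord("\n"));  // LF  = 0x0A (octal 012)
```

None of this explains the empty glob() result above; it only pins down which octets are under discussion.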
[1] <http://www.unicode.org/charts/PDF/U0000.pdf>
--
Christoph M. Becker