Re: Processing accented characters submitted from forms [message #184503 is a reply to message #184500]
Fri, 03 January 2014 16:08
Thomas 'PointedEars'
Thomas 'PointedEars' Lahn wrote:
> JohnT wrote:
>> On Fri, 03 Jan 2014 13:53:04 +0100, Thomas 'PointedEars' Lahn wrote:
>>> JohnT wrote:
>>>> [UTF-8] is the PHP 5 default,
>>>
>>> How did you get this idea?
>>
>> http://uk1.php.net/manual/en/function.htmlentities.php
>>
>> says:
>> Like htmlspecialchars(), htmlentities() takes an optional third
>> argument encoding which defines encoding used in conversion. If omitted,
>> the default value for this argument is ISO-8859-1 in versions of PHP
>> prior to 5.4.0, and UTF-8 from PHP 5.4.0 onwards.
>
> […]
> That said, htmlspecialchars() is insufficient to represent arbitrary Unicode
> characters, encoded with UTF-8 server-side, in an HTML document if the
> document encoding is not UTF-8; you would have to use htmlentities()
> which has the same default parameter value since PHP 5.4.0.
>
> <http://php.net/htmlspecialchars>
Actually, it is worse. In such a document, to refer to even those Unicode
characters for which there is *not* a character entity reference in HTML,
you have to use mb_encode_numericentity():
$ php -r 'echo mb_encode_numericentity("∎", array(0x0, 0x10000, 0, 0xfffff),
"UTF-8") . PHP_EOL;'
&#8718;
$ locale
LANG=de_CH.UTF-8
LANGUAGE=
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=
----------
<http://php.net/mb_encode_numericentity>
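For use in a script rather than on the command line, here is a minimal sketch of how the two steps might be combined for a document that is not served as UTF-8. It assumes the mbstring extension is loaded and that the input string is valid UTF-8; the function name is made up for illustration and does not come from the PHP manual or this thread.

```php
<?php
// Hypothetical helper (name is an assumption): make a UTF-8 string safe
// for an HTML document whose encoding is not UTF-8.
function escape_for_non_utf8_document($utf8Text)
{
    // Step 1: escape the markup-significant characters (&, <, >, ", ').
    $escaped = htmlspecialchars($utf8Text, ENT_QUOTES, 'UTF-8');

    // Step 2: replace every code point from U+0080 to U+10FFFF with a
    // decimal numeric character reference. The convmap entries are
    // (start, end, offset, mask); ASCII is left alone so that the
    // references produced in step 1 are not encoded a second time.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo escape_for_non_utf8_document('<é> ∎'), PHP_EOL;
// prints: &lt;&#233;&gt; &#8718;
```

Starting the range at 0x80 instead of 0x0 is a design choice of this sketch: the result consists of ASCII characters only, so it survives any ASCII-compatible document encoding.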
None of this is necessary if you use UTF-8 throughout.
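To illustrate, a minimal sketch of the UTF-8-throughout case, assuming the script itself is saved as UTF-8 and the text to be output is valid UTF-8. The function name and the sample string are made up for illustration.

```php
<?php
// Declare the document encoding in the HTTP header, before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Hypothetical helper (name is an assumption): in a UTF-8 document only
// the markup-significant characters need escaping; everything else can
// be sent as it is.
function escape_for_utf8_document($utf8Text)
{
    return htmlspecialchars($utf8Text, ENT_QUOTES, 'UTF-8');
}

echo escape_for_utf8_document('Zürich ∎ <b>'), PHP_EOL;
// prints: Zürich ∎ &lt;b&gt;
```

The encoding argument is passed explicitly here so that the result does not depend on the default, which differs between PHP versions as quoted above.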
PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)