Re: Rejecting Certain Non-ASCII Characters [message #181166 is a reply to message #181165] Fri, 19 April 2013 21:08
Christoph Becker
Thomas 'PointedEars' Lahn wrote:
> Christoph Becker wrote:
>
>> Jim Higgins wrote:
>>> I have a problem with people entering a slashed zero vs a standard
>>> ASCII zero into HTML forms intended to store data in a MySQL database.
>>
>> Is it really a slashed zero (U+0030 U+0338) they're entering,
>
> Could also be U+0030 U+0337 or any other (allowed composition) of the at
> least 65536 Unicode characters that look(s) similar. For example, it could
> be U+2205 EMPTY SET.

Which I was referring to in an abbreviated form in the rest of the
sentence. :)

>> or do they enter some similar looking character such as the Danish Ø?
>
> That character, U+00D8 LATIN CAPITAL LETTER O WITH STROKE *and* its
> lowercase counterpart, U+00F8 LATIN SMALL LETTER O WITH STROKE, are present
> at least in Danish, Norwegian, and Faroese; they are also used in the
> International Phonetic Alphabet (IPA).
>
>> In the former case you can simply replace the slashed zero with a standard
>> zero. Assuming UTF-8 encoding:
>>
>> $input = str_replace("\xCC\xB8", '', $input);
>
> You do not consider that, ambiguity aside, even in UTF-8 there are *several*
> ways to produce Unicode characters. See: Unicode Normalization Forms.

If I'm not mistaken here, this str_replace() will work for NFD and NFC,
but indeed I had not considered NFKD and NFKC. Thanks for pointing this
out.
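
A minimal sketch of that replacement, assuming UTF-8 input. Note that PHP only interprets \x escapes in double-quoted strings, so the byte sequences have to be double-quoted. The helper name strip_slash_overlays() is hypothetical, and handling U+0337 in addition to U+0338 is an assumption based on the remark quoted above. Look-alikes such as U+00D8 or U+2205 are deliberately left untouched.

```php
<?php
// Hypothetical helper: remove the combining solidus overlays so that
// "0" followed by an overlay becomes a plain ASCII "0". Assumes UTF-8 input.
function strip_slash_overlays(string $input): string
{
    $overlays = [
        "\xCC\xB7", // U+0337 COMBINING SHORT SOLIDUS OVERLAY
        "\xCC\xB8", // U+0338 COMBINING LONG SOLIDUS OVERLAY
    ];
    // str_replace() accepts an array of search strings.
    return str_replace($overlays, '', $input);
}

$slashedZero = "0\xCC\xB8";                            // "0" + U+0338
var_dump(strip_slash_overlays($slashedZero) === '0');  // bool(true)
var_dump(strlen('\xCC\xB8'));                          // int(8): single quotes keep the escapes literal
```

This strips the overlay from any base character, not only from a zero, which may or may not be what is wanted.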

>>> Is there a simple way in PHP to restrict input to the ASCII Character
>>> set, specifically hex 0x20 - 0x7E ? Or a simple way to detect
>>> characters outside this range before committing them to the database?
>>
>> If you're dealing with a numeric column, you may consider checking for
>> is_numeric().
>
> There are also regular expressions. Testing against '/[^\x00-\x7F]/', and
> rejecting anything that matches, appears to be the best approach here.

For a numeric column? Why not at least trim down the possibilities of
setting illegal values for the SQL statement?
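
A rough sketch of both checks, for comparison. The function names are hypothetical; the range 0x20 - 0x7E is the one from the original question, and the digits-only check is an assumption about what a "numeric column" should accept here (is_numeric() would also allow strings such as "1e5" or "-1.5").

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// True if $s consists solely of printable ASCII bytes (0x20 - 0x7E).
function is_printable_ascii(string $s): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^[\x20-\x7E]*$/D', $s) === 1;
}

// True if $s is a non-empty string of ASCII digits only.
function is_ascii_digits(string $s): bool
{
    return preg_match('/^[0-9]+$/D', $s) === 1;
}

var_dump(is_printable_ascii("0\xCC\xB8")); // bool(false): "0" + U+0338
var_dump(is_ascii_digits('2013'));         // bool(true)
```

Either check rejects the slashed zero before it reaches the database, without having to enumerate the look-alike characters.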

> If
> characters actually should be converted, iconv() should be used instead of a
> hardcoded conversion.
>
> However, it is better in the long term to convert the MySQL database (or the
> relevant table and column) to utf8_general_ci, and upgrade MySQL if
> necessary (character sets and collations were not supported before MySQL 5;
> the current stable version is 5.6).
>
> <http://php.net/pcre>
> <http://dev.mysql.com/>

--
Christoph M. Becker