Re: Rejecting Certain Non-ASCII Characters [message #181165 is a reply to message #181155] Fri, 19 April 2013 20:33
Thomas 'PointedEars'
Christoph Becker wrote:

> Jim Higgins wrote:
>> I have a problem with people entering a slashed zero vs a standard
>> ASCII zero into HTML forms intended to store data in a MySQL database.
>
> Is it really a slashed zero (U+0030 U+0338) they're entering,

Could also be U+0030 U+0337 or any other allowed composition of the at
least 65536 Unicode characters that looks similar. For example, it could
be U+2205 EMPTY SET.

> or do they enter some similar looking character such as the Danish Ø?

That character, U+00D8 LATIN CAPITAL LETTER O WITH STROKE, *and* its
lowercase counterpart, U+00F8 LATIN SMALL LETTER O WITH STROKE, are present
at least in Danish, Norwegian, and Faroese; they are also used in the
International Phonetic Alphabet (IPA).

> In the former case you can simply replace the slashed zero with a standard
> zero. Assuming UTF-8 encoding:
>
> $input = str_replace("\xCC\xB8", '', $input);

You do not consider that, ambiguity aside, even in UTF-8 there are *several*
ways to produce Unicode characters. See: Unicode Normalization Forms.
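
To illustrate the point about normalization forms, here is a minimal sketch. It uses "é" rather than the slashed zero, because "é" has both a precomposed and a decomposed form. The helper name to_nfc() is made up for this example, and the Normalizer class is only available if ext/intl is loaded, so the helper falls back to returning its input unchanged.

```php
<?php
// Two UTF-8 encodings of the same visible character "é":
$precomposed = "\xC3\xA9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
$decomposed  = "e\xCC\x81";  // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

// Byte-wise they differ, so a hardcoded str_replace() written for one
// form will not catch the other.
$same_bytes = ($precomposed === $decomposed);

// Hypothetical helper: bring input into Normalization Form C first.
// Assumption: ext/intl is installed; without it the input is returned as-is.
function to_nfc(string $s): string
{
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
        return $normalized === false ? $s : $normalized;
    }
    return $s;
}
```

Normalizing before any comparison or replacement at least reduces the number of byte sequences that have to be considered; it does not resolve the ambiguity between different characters that merely look alike.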

>> Is there a simple way in PHP to restrict input to the ASCII Character
>> set, specifically hex 0x20 - 0x7E ? Or a simple way to detect
>> characters outside this range before committing them to the database?
>
> If you're dealing with a numeric column, you may consider checking for
> is_numeric().

There are also regular expressions. Testing against '/[^\x00-\x7F]/', and
rejecting anything that matches, appears to be the best approach here. If
characters actually should be converted, iconv() should be used instead of a
hardcoded conversion.
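
A rough sketch of the rejection approach, narrowed to the range the OP asked for (0x20 - 0x7E). The function names are invented for this example. The iconv() variant is only a suggestion: what //TRANSLIT produces depends on the iconv implementation and the locale, so its output should be checked on the target system before relying on it.

```php
<?php
// Hypothetical helper: true if $input contains any byte outside
// printable ASCII (0x20 - 0x7E). Such input would be rejected.
function has_non_printable_ascii(string $input): bool
{
    return preg_match('/[^\x20-\x7E]/', $input) === 1;
}

// Hypothetical helper: attempt a conversion instead of a rejection.
// Returns false if iconv() cannot convert the string.
// The transliteration result is implementation-dependent.
function to_ascii(string $input)
{
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
}
```

Usage would be along the lines of: if has_non_printable_ascii($_POST['field']) is true, redisplay the form with an error message instead of writing to the database.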

However, it is better in the long term to convert the MySQL database (or the
relevant table and column) to the utf8 character set (for example with the
utf8_general_ci collation), and upgrade MySQL if necessary (per-column
character sets and collations were not supported before MySQL 4.1; the
current stable version is 5.6).
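
For the conversion, MySQL has an ALTER TABLE ... CONVERT TO CHARACTER SET statement. The sketch below only builds that statement as a string; the function name and the table name used with it are placeholders, and running the statement (for example through PDO or mysqli) is left out. Back up the table before trying this on real data.

```php
<?php
// Hypothetical helper: build the statement that converts all character
// columns of one table to utf8 with the utf8_general_ci collation.
function build_utf8_conversion_sql(string $table): string
{
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';

    return "ALTER TABLE $quoted CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
}
```

The connection character set has to match as well, otherwise the data can still be mangled on the way into the database.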

<http://php.net/pcre>
<http://dev.mysql.com/>


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(at)news(dot)demon(dot)co(dot)uk>