Re: Cannot write utf8 data into a utf8 column - SOLVED [message #170756 is a reply to message #170752]
Fri, 19 November 2010 22:48
Thomas 'PointedEars'
Tony Marston wrote:
> "Peter H. Coffin" <hellsop(at)ninehells(dot)com> wrote in message
> news:slrniebqgp(dot)1g0(dot)hellsop(at)abyss(dot)ninehells(dot)com...
It is called attribution _line_, not attribution novel.
>> ["Followup-To:" header set to comp.databases.mysql.]
>> On Thu, 18 Nov 2010 16:54:47 -0000, Tony Marston wrote:
>>> [garbled posting due to missing character encoding declaration]
>>>
>>> it fails with the following error:
>>>
>>> "Incorrect string value: '\xA0\xA0 \xE6\x88\x91...' for column
>>> 'message_text' at row 1"
>>>
>>> When I try the SAME update through SQL-Front or phpMyAdmin it works! Why
>>> is this?
Those frontends are apparently set up to use UTF-8 as the input character
encoding. Whatever character you type or paste there is therefore submitted
as the UTF-8 encoding of the corresponding Unicode character when the form
is submitted.
PHP, on the other hand, takes string literals byte for byte from the source
file, so they carry whatever character encoding the file was saved in. If
you create the source file with, e.g., notepad.exe on Windows, it will most
likely be saved in a Windows-125x encoding by default.
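Here is a minimal sketch, written in Python rather than PHP only to make the bytes easy to check. It shows how the same NO-BREAK SPACE becomes different bytes depending on the encoding the editor saves the file in. The helper name `source_bytes` is hypothetical, and the example assumes Windows-1252 as the "Windows-125x" default.

```python
# Hypothetical helper: the bytes a string literal containing `text`
# would occupy in a source file saved in the given encoding.
def source_bytes(text: str, encoding: str) -> bytes:
    return text.encode(encoding)

nbsp = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Saved by an editor defaulting to Windows-1252: one byte.
print(source_bytes(nbsp, "cp1252").hex(" "))  # a0
# Saved as UTF-8 (what the UTF-8 frontends submit): two bytes.
print(source_bytes(nbsp, "utf-8").hex(" "))   # c2 a0
```

PHP does not convert these bytes. It passes them to MySQL exactly as they appear in the file.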
>> If it were the SAME, it would work the same. MySQL gives this error when
>> there's invalid UTF-8 byte sequences, like a continuation byte without
>> starting byte, or a starting byte that is not followed by a continuation
>> byte... Find out what the hex for what you're trying to stick in
>> message_text and I bet it won't be what it should be.
>
> The error message was reporting a problem with the hex value \xA0 (decimal
> 160) which represents ' ' or the non-breaking space.
Only in ISO-8859-1/Windows-1252. In UTF-8, A0 (bit pattern 10xxxxxx) is one
of the aforementioned continuation bytes, so an A0 that is not preceded by a
start byte is an invalid sequence. See below.
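The following is a rough sketch, again in Python only for checking, that classifies each byte of the value quoted in the MySQL error message by its role in UTF-8. The function name `utf8_byte_role` is made up for illustration. The byte ranges follow the UTF-8 definition in RFC 3629.

```python
def utf8_byte_role(b: int) -> str:
    """Classify a single byte by its role in UTF-8 (RFC 3629)."""
    if b < 0x80:
        return "ASCII"
    if b < 0xC0:                      # 10xxxxxx
        return "continuation"
    if b in (0xC0, 0xC1) or b > 0xF4:
        return "invalid"              # never valid anywhere in UTF-8
    return "lead"                     # 110xxxxx, 1110xxxx or 11110xxx

data = b"\xa0\xa0 \xe6\x88\x91"       # start of the value from the error message
print([utf8_byte_role(b) for b in data])
# ['continuation', 'continuation', 'ASCII', 'lead', 'continuation', 'continuation']

# A strict decoder rejects the leading A0; with errors="replace" it becomes U+FFFD.
print(ascii(data.decode("utf-8", errors="replace")))
```

The two leading A0 bytes have no start byte in front of them, which is why the value is rejected. The trailing E6 88 91, by contrast, is a well-formed 3-byte sequence.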
> I discovered that instead of replacing ' ' with chr(160) that I
> needed to replace it with chr(194).chr(160). I don't now why the chr(194)
> is necessary, but it solves my problem.
(Ignorance must be bliss.) If you want to grow beyond the script-kiddie
trial-and-error approach (which is seldom successful in programming), you
should want to learn *how* and *why* things work.
chr(194) . chr(160)
"works" *because* chr(194) returns the single byte 0xC2¹, which
ISO-8859-1/Windows-1252 displays as `Â', and chr(160) returns the single byte
0xA0, which ISO-8859-1/Windows-1252 displays as <NBSP>. Concatenated, they
form `Â<NBSP>', which is the ISO-8859-1/Windows-1252 representation of the
byte sequence
C2 A0
In UTF-8, this is a sequence consisting of two UTF-8 code units (hence
UTF-*8*: 8 bits, or 1 byte, per code unit), encoding the character at
Unicode code point U+00A0 (NO-BREAK SPACE). (C2 is one of the start bytes
of a 2-byte sequence.)
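The decoding of C2 A0 can be worked through bit by bit. In the sketch below, which is Python purely for illustration with a hypothetical function name `decode_two_byte`, the 5 payload bits of the start byte (110xxxxx) and the 6 payload bits of the continuation byte (10yyyyyy) are combined into U+00A0. The same two bytes are then read as ISO-8859-1 for comparison.

```python
import unicodedata

def decode_two_byte(lead: int, cont: int) -> int:
    """Decode a 2-byte UTF-8 sequence 110xxxxx 10yyyyyy into its code point."""
    if (lead & 0xE0) != 0xC0:
        raise ValueError("not a 2-byte start byte")
    if (cont & 0xC0) != 0x80:
        raise ValueError("not a continuation byte")
    # 5 payload bits from the start byte, 6 from the continuation byte
    return ((lead & 0x1F) << 6) | (cont & 0x3F)

print(f"U+{decode_two_byte(0xC2, 0xA0):04X}")   # U+00A0

# Read as ISO-8859-1 instead, the same two bytes are two characters.
names = [unicodedata.name(c) for c in bytes([0xC2, 0xA0]).decode("latin-1")]
print(names)   # ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'NO-BREAK SPACE']
```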
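So you should have RTFM'd and called the equivalent of
mysql_query(utf8_encode($query));
instead (bearing in mind that utf8_encode() converts from ISO-8859-1, so this
is only correct if all of $query is ISO-8859-1-encoded). See also: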
<http://unicode.org/faq/>
<http://en.wikipedia.org/wiki/UTF-8>
<http://people.w3.org/rishida/tools/conversion/>
<http://php.net/chr>
<http://php.net/utf8_encode>
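The ISO-8859-1 to UTF-8 conversion that PHP's utf8_encode() performs can be approximated in Python as decode-then-encode. The sketch uses a hypothetical name, `latin1_to_utf8`. It also shows one caveat that matters if the source file was actually saved as Windows-1252: the two encodings differ in the range 0x80 to 0x9F.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (approximating PHP's utf8_encode())."""
    return data.decode("latin-1").encode("utf-8")

print(latin1_to_utf8(b"a\xa0b").hex(" "))                  # 61 c2 a0 62

# Caveat for Windows-1252 input: 0x80 is the euro sign in Windows-1252,
# but the C1 control character U+0080 in ISO-8859-1.
print(latin1_to_utf8(b"\x80").hex(" "))                    # c2 80
print(b"\x80".decode("cp1252").encode("utf-8").hex(" "))   # e2 82 ac
```

For plain NBSPs both interpretations agree, but curly quotes, dashes or the euro sign saved by a Windows-1252 editor would come out wrong.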
HTH
PointedEars
___________
¹ Neither character is part of ASCII (which ends at 0x7F), despite what the
PHP manual states.