encoding [message #186326]
Thu, 26 June 2014 05:50
Denis McMahon
Messages: 634 Registered: September 2010
Tear my hair out time.
I have a csv file that contains text strings that I wish to display in a
web page.
The csv file is utf-8, and the text strings include the British pound
symbol encoded as the two bytes 0xc2 0xa3.
I'm using
setlocale(LC_CTYPE|LC_COLLATE,"en_GB.UTF8");
before reading the csv file, which I hope means that the csv file is read
as utf-8.
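One thing I'm unsure about: as far as I can tell, setlocale() takes a single category constant rather than a bitmask, so OR-ing LC_CTYPE and LC_COLLATE may not select both. A minimal sketch that sets each category with its own call and checks the result, assuming the en_GB.UTF8 locale is installed on the host (setlocale() returns false when it is not); the $results array is only there for illustration:

```php
<?php
// Set each category with its own call; the first argument is one
// category constant, not a set of flags.
$results = [];
foreach (['LC_CTYPE' => LC_CTYPE, 'LC_COLLATE' => LC_COLLATE] as $name => $category) {
    // Returns the new locale name on success, or false if unavailable.
    $results[$name] = setlocale($category, 'en_GB.UTF8');
}

foreach ($results as $name => $locale) {
    echo $name, ': ', ($locale === false ? 'locale not available' : $locale), "\n";
}
```

If either line reports that the locale is not available, the call in my script is probably not doing what I hoped.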
Then I feed the string through htmlentities() before adding it to the web
page.
However, the web page that arrives at the client has &Acirc;&pound;
instead of just &pound;.
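An extra &Acirc; in front of &pound; is what htmlentities() would produce if it treated the input as ISO-8859-1, where 0xc2 is Â and 0xa3 is £; as far as I know that was the default charset before PHP 5.4. A minimal sketch that passes the charset explicitly, so the result does not depend on the default ($pound, $asLatin1 and $asUtf8 are illustrative names, not from my script):

```php
<?php
$pound = "\xc2\xa3"; // UTF-8 bytes for the pound sign

// Each byte is read as a separate ISO-8859-1 character.
$asLatin1 = htmlentities($pound, ENT_QUOTES, 'ISO-8859-1');

// The two bytes are read as one UTF-8 character.
$asUtf8 = htmlentities($pound, ENT_QUOTES, 'UTF-8');

echo $asLatin1, "\n"; // &Acirc;&pound;
echo $asUtf8, "\n";   // &pound;
```

Since the page declares charset utf-8 anyway, htmlspecialchars($pound, ENT_QUOTES, 'UTF-8') may be enough: it escapes only the HTML-special characters and leaves the pound sign as its raw UTF-8 bytes.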
I'm not sure where it's going wrong, partly because right now I may be
too tired to work out where and how I can inspect the string without
character encodings getting in the way.
If I print_r the data that has been read in to the web page, that shows
ok, but at that point it's still utf-8, not an html entity.
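One way to look at a string without the browser or terminal reinterpreting it is to dump its raw bytes as hex. A minimal sketch using bin2hex(); the helper name hex_bytes() is made up for this example:

```php
<?php
// Return the bytes of $s as space-separated hex pairs, e.g. "c2 a3".
function hex_bytes($s)
{
    return trim(chunk_split(bin2hex($s), 2, ' '));
}

$raw = "\xc2\xa3";

// Byte count and raw bytes of the original string.
echo strlen($raw), ' bytes: ', hex_bytes($raw), "\n"; // 2 bytes: c2 a3

// Bytes of the converted string, here with the charset given explicitly.
echo hex_bytes(htmlentities($raw, ENT_QUOTES, 'UTF-8')), "\n"; // 26 70 6f 75 6e 64 3b
```

Running this from the command line keeps the web server and the browser out of the picture entirely.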
The following is at http://www.sined.co.uk/tmp/pound.php and seems to
demonstrate the issue:
<?php
setlocale(LC_CTYPE|LC_COLLATE, "en_GB.UTF8");
$str1 = "\xc2\xa3";
$str2 = htmlentities( "$str1" );
echo <<< EOT
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Broken Pound</title>
</head>
<body>
<pre>
Original: {$str1}
Entities: {$str2}
</pre>
</body>
</html>
EOT;
I'm not sure how to fix this. Ideas anyone?
--
Denis McMahon, denismfmcmahon(at)gmail(dot)com