Useful function when there are still some ISO-8859 chars in the UTF8 database stored messages [message #163432]
Sun, 31 October 2010 17:49
herve
Messages: 22 Registered: October 2010
Hello,
I have seen that when importing messages from a mailing list or from newsgroups, some messages in the ISO-8859-15 charset are not converted to UTF8. I suspect this is because no charset was declared in the headers of those messages.
So I found on the net a function that replaces accented letters with HTML entities (for example, é is replaced by &eacute;).
Here it is:
// In case there are still some ISO-8859-15 chars in your UTF8 stored messages
function ConvertEntities($string, $quotes = ENT_COMPAT, $slashesstriping = TRUE, $nl2brshape = TRUE)
{
    // Optionally remove backslash escaping first
    if ( $slashesstriping )
        $string = stripslashes($string);
    // Check the real charset (UTF-8 -> ISO-8859-15) to choose the detection order
    $first = ( !preg_match('`é`i', mb_convert_encoding($string.'é', 'UTF-8', 'ISO-8859-15') ) ? 'UTF-8, ISO-8859-15' : 'ISO-8859-15, UTF-8');
    // Check if a conversion is needed (pure ASCII needs no charset detection)
    if ( !mb_check_encoding($string,'ASCII') )
        $string = htmlentities($string, $quotes, mb_detect_encoding($string,$first) );
    else
        $string = htmlentities($string,$quotes);
    // Just in case, also convert the ' into &#8217;
    $string = strtr($string, array('\'' => '&#8217;'));
    // Convert the line breaks: to <br /> tags, or strip the \r\n pairs
    return ( $nl2brshape ) ? nl2br($string) : strtr($string,array(chr(13).chr(10) => ''));
}
I would like to apply it only when a message is displayed on screen, but I don't know where to insert it or what to change in the PHP code.
The advantage of this function is that it changes nothing in the database; it only modifies the message slightly when the HTML is produced. That makes it perfect for a test.
Can you help me test this function, please? I would really appreciate your help.
Thousands of old imported messages are stored as UTF8 in my FUDforum database without any conversion, although they were actually in ISO-8859-15, simply because no charset was defined in their headers.
The best thing would be a checker that automatically tests whether a stored message is real UTF8 or still contains ISO-8859-15 chars. In the latter case, the message would be converted entirely to UTF8. This would work for me because every message stored in the database is in either UTF8 or ISO-8859-15, and nothing else.
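To make the idea of such a checker concrete, here is a minimal sketch. It assumes PHP 7 or later with the mbstring extension, and it relies on the assumption stated above that every message is either valid UTF8 or ISO-8859-15. The name ensure_utf8 is hypothetical and is not part of FUDforum.

```php
<?php
// Hypothetical helper (not a FUDforum function): return $text as valid UTF-8.
// Assumption: the input is either well-formed UTF-8 or ISO-8859-15, nothing else.
function ensure_utf8(string $text): string
{
    // Pure ASCII and well-formed UTF-8 pass through unchanged.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise treat the bytes as ISO-8859-15 and transcode them to UTF-8.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-15');
}
```

It could be called on the message body just before display, or in a one-off script that rewrites the stored messages. One caveat: a few ISO-8859-15 byte sequences also happen to be valid UTF8, so such messages would be left untouched; in ordinary text this should be rare.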
PS: I am using FUDforum 3.0.2RC1
[Updated on: Sun, 31 October 2010 18:00]