FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

Useful function when there are still some ISO-8859 chars in the UTF8 database stored messages [message #163432] Sun, 31 October 2010 17:49
herve, France
Hello,

I have noticed that when I import messages from a mailing list or from newsgroups, some ISO-8859-15 messages are not converted to UTF-8. I suspect this happens because the headers of those messages did not declare a charset.

So I found a function on the net that replaces accented letters with HTML entities (for example, é is replaced by &eacute;).

Here it is:
// In case there are still some ISO-8859-15 chars in your UTF-8 stored messages
function ConvertEntities($string, $quotes = ENT_COMPAT, $slashesstriping = TRUE, $nl2brshape = TRUE)
{
	if ( $slashesstriping )
		$string = stripslashes($string);

	// Guess the detection order for the real charset (UTF-8 or ISO-8859-15)
	$first = ( !preg_match('`é`i', mb_convert_encoding($string.'é', 'UTF-8', 'ISO-8859-15') ) ? 'UTF-8, ISO-8859-15' : 'ISO-8859-15, UTF-8');

	// Non-ASCII text is converted using the detected charset; plain ASCII needs no charset
	if ( !mb_check_encoding($string, 'ASCII') )
		$string = htmlentities($string, $quotes, mb_detect_encoding($string, $first) );
	else
		$string = htmlentities($string, $quotes);

	// Just in case, also convert the ' into &rsquo;
	$string = strtr($string, array('\'' => '&rsquo;'));

	// Either turn newlines into <br />, or strip the \r\n pairs
	return ( $nl2brshape ) ? nl2br($string) : strtr($string, array(chr(13).chr(10) => ''));
}


I would like to apply it only when a message is displayed on screen, but I don't know where to insert it or what to change in the PHP code.

The advantage of this function is that it changes nothing in the database; it only alters the message slightly while the HTML is being produced. That makes it ideal for a test.

Can you help me test this function, please? I would really appreciate your help.

Thousands of old imported messages are stored in my FUDforum database as if they were UTF-8, without any conversion, although they were actually in ISO-8859-15, simply because no charset was defined in their headers.

The best solution would be a checker that automatically tests whether a stored message is really UTF-8 or still contains ISO-8859-15 chars, and in the latter case converts the whole message to UTF-8. This would work for me because every message stored in the database is either in UTF-8 or in ISO-8859-15, and nothing else.
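
A minimal sketch of such a checker, assuming the mbstring extension is loaded and that every message really is either UTF-8 or ISO-8859-15 as described above. The name to_utf8_if_needed() is hypothetical and is not part of FUDforum; where to call it in the FUDforum source is left open.

```php
<?php
// Sketch only: to_utf8_if_needed() is a hypothetical helper, not a FUDforum function.
// Assumption: each message is either valid UTF-8 or ISO-8859-15, nothing else.
function to_utf8_if_needed($text)
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Otherwise treat the bytes as ISO-8859-15 and convert them to UTF-8.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-15');
}

$latin = "caf\xE9";      // "café" encoded in ISO-8859-15 (not valid UTF-8)
$utf8  = "caf\xC3\xA9";  // "café" encoded in UTF-8

var_dump(to_utf8_if_needed($latin) === $utf8); // ISO-8859-15 input gets converted
var_dump(to_utf8_if_needed($utf8) === $utf8);  // UTF-8 input is left alone
```

One caveat: an ISO-8859-15 text whose bytes happen to form valid UTF-8 sequences would be left unconverted. This is rare in ordinary text, but a validity check alone cannot rule it out.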

PS: I am using FUDforum 3.0.2RC1.

[Updated on: Sun, 31 October 2010 18:00]


Re: Useful function when there are still some ISO-8859 chars in the UTF8 database stored messages [message #163433 is a reply to message #163432] Sun, 31 October 2010 18:47
herve, France
By the way, could this function be turned into a htmlentities.plugin where the ISO charset is defined in the plugin configuration? That would probably be very useful to many other users.
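
To illustrate the configurable-charset idea only, here is a rough sketch in which the fallback charset comes from a settings array. Both $plugin_config and entities_with_configured_charset() are hypothetical names, and no FUDforum plugin API is used; how a real plugin would register itself and store its configuration is not shown.

```php
<?php
// Sketch only: $plugin_config and entities_with_configured_charset() are
// hypothetical names; no FUDforum plugin API is used here.
$plugin_config = array('fallback_charset' => 'ISO-8859-15');

function entities_with_configured_charset($text, array $config)
{
    // Valid UTF-8 is handled as UTF-8; anything else is assumed to be
    // in the configured fallback charset.
    $charset = mb_check_encoding($text, 'UTF-8') ? 'UTF-8' : $config['fallback_charset'];
    return htmlentities($text, ENT_COMPAT, $charset);
}

echo entities_with_configured_charset("caf\xE9", $plugin_config), "\n";     // ISO-8859-15 input
echo entities_with_configured_charset("caf\xC3\xA9", $plugin_config), "\n"; // UTF-8 input
```

Both calls should produce the same entity-encoded text. Note that htmlentities() also escapes <, > and &, so this suits plain message text rather than a body that already contains HTML markup.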