Codes to judge Simplified Chinese characters(GB2312) [message #7961]
Fri, 03 January 2003 08:01
swinemania
Messages: 2 Registered: January 2003 Location: Beijing
Hi, all!
I wrote some PHP code to check whether a string contains any Simplified Chinese characters. It works fine with the GB2312 character set.
I hope it will be of some use to FUDforum.
/**
 * Checks whether a string contains Simplified Chinese (GB2312) characters.
 *
 * @param string $string the raw GB2312-encoded bytes to scan
 * @return boolean true if the string contains at least one GB2312 character
 */
function hasChnGBWord($string){
    $len=strlen($string);
    // Stop at $len-1 so that $string[$i+1] never reads past the end of the string.
    for($i=0;$i<$len-1;$i++) {
        if (isChnGBWord($string[$i],$string[$i+1])) return true;
    }
    return false;
}
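/**
 * Checks whether a byte pair forms a Simplified Chinese (GB2312) character.
 *
 * Each GB2312 character consists of two bytes, the higher byte and the lower byte,
 * and both have their top bit (0x80) set. Only that bit is tested here, so any two
 * consecutive bytes with the top bit set will match, not only valid GB2312 pairs.
 *
 * @param string $higherByte
 * @param string $lowerByte
 * @return boolean true if the pair is a GB2312 character
 */
function isChnGBWord($higherByte,$lowerByte){
    $higher=char2u_int($higherByte);
    $lower=char2u_int($lowerByte);
    // 1: both bytes have the top bit set. 0: the higher byte is plain ASCII.
    // -1: error, only the higher byte has the top bit set (half a Chinese character).
    $val=($higher & 0x80)?(($lower & 0x80)?1:-1):0;
    return (1==$val);
}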
/**
 * Converts a character (a single byte) into an unsigned int.
 *
 * @param string $char the input character
 * @return int the byte value 0..255, or 0 if the input is empty or not a string
 */
function char2u_int($char){
    // Guard against an empty or missing byte; treat it as 0 (an ASCII value).
    if (!is_string($char) || $char==='') return 0;
    // ord() returns the value of the first byte directly, so no hex round trip is needed.
    return ord($char[0]);
}
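The high-bit test above accepts any pair of 8-bit bytes. As a rough sketch of a stricter check, the function below assumes the input is EUC-CN encoded GB2312, where Chinese characters (rows 16 to 87 of the standard) use a lead byte in 0xB0..0xF7 and a trail byte in 0xA1..0xFE. The name hasGb2312Hanzi is made up for this example and is not part of FUDforum.

```php
<?php
/**
 * Sketch only: returns true if $bytes contains at least one GB2312 Chinese
 * character, assuming $bytes is EUC-CN encoded. hasGb2312Hanzi is a
 * hypothetical name chosen for this example.
 */
function hasGb2312Hanzi(string $bytes): bool
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $lead = ord($bytes[$i]);
        // ASCII bytes stand alone; a lone byte at the very end cannot form a pair.
        if ($lead < 0x80 || $i + 1 >= $len) {
            $i++;
            continue;
        }
        $trail = ord($bytes[$i + 1]);
        // A well-formed GB2312 pair: lead 0xA1..0xF7, trail 0xA1..0xFE.
        $isPair = $lead >= 0xA1 && $lead <= 0xF7 && $trail >= 0xA1 && $trail <= 0xFE;
        if (!$isPair) {
            $i++;            // not a GB2312 pair, resynchronise on the next byte
            continue;
        }
        if ($lead >= 0xB0) { // rows 16..87 hold the Chinese characters
            return true;
        }
        $i += 2;             // a symbol such as full-width punctuation, skip the pair
    }
    return false;
}
```

Unlike the byte-by-byte loop in hasChnGBWord, this walks the string pair by pair, so full-width punctuation (lead bytes 0xA1..0xA9) is not reported as a Chinese character.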
Re: Codes to judge Simplified Chinese characters(GB2312) [message #8237 is a reply to message #7961]
Sat, 18 January 2003 14:13
laser
Messages: 9 Registered: January 2003
Hmm, I think it's better to let the database do this job.
I'm using PostgreSQL, which has excellent encoding conversion tools.
We can convert the encoding between the client (PHP CGI) and the server (DB) with a simple query.
In my installation, I created my forum DB using UNICODE encoding.
Then I added a global variable to GLOBAL.php:
$DBCLIENT_ENCODING = "GBK";
Then I added a few lines to index.php, right after the connection has been made:
if(__dbtype__ == 'pgsql' && __db_connection_ok_ ){
$query = 'SET CLIENT_ENCODING TO '.'\''.$GLOBALS['DBCLIENT_ENCODING'].'\'';
pg_query($GLOBALS['__DB_INC__']['SQL_LINK'], $query);
}
Now we can use PostgreSQL's encoding conversion to support GBK/GB2312 while the DB still uses UNICODE. If we later change
the encoding to another scheme, we can still use the same DB; PostgreSQL does the translation for us automatically.
I don't know much about MySQL, but I think we could use some kind
of switch to take advantage of the DB's feature where it's available.
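Since the SET CLIENT_ENCODING query above is built by string concatenation from a global, one possible hardening is to check the value against a short whitelist first. This is only a sketch: buildClientEncodingQuery and its list of accepted names are assumptions made for this example, not FUDforum code. PHP's pgsql extension also offers pg_set_client_encoding(), which could replace the raw query altogether.

```php
<?php
/**
 * Sketch only: builds the SET CLIENT_ENCODING statement after checking the
 * requested name against a whitelist. buildClientEncodingQuery is a
 * hypothetical helper, and the whitelist is an assumption for illustration.
 */
function buildClientEncodingQuery(string $encoding): string
{
    // Names accepted by this sketch; UNICODE is the name used in the post above.
    $allowed = ['GBK', 'EUC_CN', 'UTF8', 'UNICODE', 'SQL_ASCII'];
    $name = strtoupper(trim($encoding));
    if (!in_array($name, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported client encoding: $encoding");
    }
    return "SET CLIENT_ENCODING TO '" . $name . "'";
}

// Usage, assuming $link is an already opened PostgreSQL connection:
// pg_query($link, buildClientEncodingQuery($GLOBALS['DBCLIENT_ENCODING']));
```

Anything outside the list raises an exception instead of reaching the database, so a stray quote in the configured value cannot change the query.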
Solving multi-language support once and for all [message #23650 is a reply to message #23218]
Fri, 25 March 2005 05:56
linxiaoming
Messages: 6 Registered: March 2005
I have found a better way. Don't do this: $DBCLIENT_ENCODING = "GBK";
1.modify apache->httpd.conf
#AddDefaultCharset ISO-8859-1
AddDefaultCharset utf-8
#AddDefaultCharset gb2312
2.modify postgresql->postgresql.conf
client_encoding = sql_ascii # actually, defaults to database encoding
#client_encoding = GBK
3.back up your theme->chinese, copy it to another directory
4.Theme Management -> add a theme
Name = chinatest
Theme = default
Language = chinese
Locale = C
5.change encoding
open the file "msg", save it as "msg", but change the encoding to utf-8
open the file charset, change zh_CN to C
6.register a new user with the theme chinatest
7.log in as the newly registered user; now you can see Chinese characters in utf-8 encoding.
8.but I still don't understand the charset "C"
I still don't know what this C is about!
In step 4, changing C to utf-8 also works!
And the character set of the page I am typing into right now is shown as gb2312. What is going on?!
So frustrating!
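Step 5 above (re-saving "msg" as utf-8) could also be scripted instead of done in an editor. The sketch below assumes the original file is GB2312 encoded and that PHP's iconv extension is available; gb2312ToUtf8 is a name made up for this example.

```php
<?php
/**
 * Sketch only: converts GB2312 bytes to UTF-8 with iconv().
 * Assumes the input really is GB2312; gb2312ToUtf8 is a hypothetical name.
 */
function gb2312ToUtf8(string $bytes): string
{
    $utf8 = iconv('GB2312', 'UTF-8', $bytes);
    if ($utf8 === false) {
        // iconv() returns false when the input contains an invalid sequence.
        throw new RuntimeException('Input is not valid GB2312');
    }
    return $utf8;
}

// Usage, assuming "msg" is in the current directory and a backup exists (step 3):
// file_put_contents('msg', gb2312ToUtf8(file_get_contents('msg')));
```

Running it twice on the same file would corrupt it, because the second pass would treat UTF-8 bytes as GB2312, which is one more reason to keep the backup from step 3.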