Code to detect Simplified Chinese characters (GB2312) [message #7961]
Fri, 03 January 2003 08:01
swinemania
Hi, all!
I wrote some PHP code to check whether a string contains any Simplified Chinese characters. It works with text encoded in the GB2312 character set.
I hope it will be of some use to FUDforum.
/**
 * Checks whether a string contains Simplified Chinese (GB2312) characters.
 *
 * @param string $string the string to test
 * @return boolean true if it contains at least one GB2312 character
 */
function hasChnGBWord($string){
    $len = strlen($string);
    // Stop one byte early: each test reads the byte at $i and the byte after it,
    // so $i + 1 must stay inside the string.
    for ($i = 0; $i < $len - 1; $i++) {
        if (isChnGBWord($string[$i], $string[$i + 1])) return true;
    }
    return false;
}
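hasChnGBWord() tests every overlapping byte pair and stops at the first match, so it never notices a lead byte that is left without its trail byte. A minimal sketch of a character-aligned scan, assuming GB2312 stored as EUC-CN (a byte below 0x80 is one ASCII character, a byte at or above 0x80 starts a two-byte character). scanDoubleByte() is a hypothetical name, not part of FUDforum:

```php
<?php
// Hypothetical helper (not part of FUDforum). Assumes EUC-style double-byte
// text: a byte < 0x80 is one ASCII character, a byte >= 0x80 is the lead
// byte of a two-byte character.
// Returns 1 if at least one complete double-byte character is found,
// 0 if the string is pure ASCII, and -1 if a lead byte has no valid trail
// byte (a "half Chinese character").
function scanDoubleByte(string $s): int
{
    $len = strlen($s);
    $found = false;
    $i = 0;
    while ($i < $len) {
        if (ord($s[$i]) < 0x80) {   // ASCII: a single byte
            $i++;
            continue;
        }
        // Lead byte: the next byte must exist and also have the high bit set.
        if ($i + 1 >= $len || ord($s[$i + 1]) < 0x80) {
            return -1;              // truncated or malformed character
        }
        $found = true;
        $i += 2;                    // skip the whole two-byte character
    }
    return $found ? 1 : 0;
}
```

Because it advances two bytes past each double-byte character, the scan stays aligned to character boundaries and can report a truncated final character instead of silently ignoring it.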
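/**
 * Checks whether two bytes form a Simplified Chinese (GB2312) character.
 *
 * Each GB2312 character is stored as two bytes, a higher (lead) byte and a
 * lower (trail) byte, and both have the high bit (0x80) set. Only that bit is
 * tested, so other multibyte text (other double-byte encodings, UTF-8) matches too.
 *
 * @param string $higherByte the lead byte
 * @param string $lowerByte the trail byte
 * @return boolean true if the pair looks like a GB2312 character
 */
function isChnGBWord($higherByte, $lowerByte){
    $higher = char2u_int($higherByte);
    $lower = char2u_int($lowerByte);
    // A high-bit lead byte followed by a byte without the high bit is half a
    // Chinese character (an error); it is not counted as a GB2312 character.
    return ($higher & 0x80) && ($lower & 0x80);
}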
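isChnGBWord() accepts any pair of bytes at or above 0x80. A tighter sketch, assuming the text is GB2312 in its usual EUC-CN byte form, checks the real ranges instead: GB2312 rows and cells 1 to 94 are stored as bytes 0xA1 to 0xFE, and the hanzi occupy rows 16 to 87, i.e. lead bytes 0xB0 to 0xF7. isGb2312Hanzi() is a hypothetical name, and the range test does not exclude the few unassigned code points inside that block; no byte-range test can fully tell GB2312 apart from other encodings.

```php
<?php
// Hypothetical helper: true if the two bytes fall in the GB2312 hanzi block,
// assuming EUC-CN encoding (byte = row or cell number + 0xA0).
// Lead byte 0xB0-0xF7 = rows 16-87 (hanzi); trail byte 0xA1-0xFE = cells 1-94.
// Unassigned cells inside the block are NOT excluded.
function isGb2312Hanzi(string $higherByte, string $lowerByte): bool
{
    if ($higherByte === '' || $lowerByte === '') {
        return false;               // missing byte: cannot be a character
    }
    $hi = ord($higherByte);
    $lo = ord($lowerByte);
    return $hi >= 0xB0 && $hi <= 0xF7 && $lo >= 0xA1 && $lo <= 0xFE;
}
```

It takes the same two one-byte arguments as isChnGBWord(), so it can be called on each byte pair in the same way.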
/**
 * Converts a single byte into its unsigned integer value (0-255).
 *
 * @param string $char the input byte
 * @return int the byte value, or 0 for an empty string
 */
function char2u_int($char){
    // ord() returns the value of the first byte directly.
    return ($char === '') ? 0 : ord($char);
}
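char2u_int() needs the unsigned value of a byte. Hex-encoding the byte with bin2hex() and parsing the two hex digits yields the same number as PHP's built-in ord(); a quick sketch that confirms this for all 256 byte values. charViaHex() is a hypothetical helper written with hexdec():

```php
<?php
// Hypothetical helper: byte value via hex encoding, then hex parsing.
function charViaHex(string $char): int
{
    return $char === '' ? 0 : hexdec(bin2hex($char[0]));
}

// Compare the hex round-trip with ord() for every possible byte value.
$mismatches = 0;
for ($b = 0; $b < 256; $b++) {
    $c = chr($b);
    if (charViaHex($c) !== ord($c)) {
        $mismatches++;
    }
}
echo "mismatches: $mismatches\n";
```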