FUDforum: comp.lang.php » how to change old ereg?

Re: PREG \d vs. [0-9] [message #181944 is a reply to message #181943]

Wed, 26 June 2013 16:02

Thomas 'PointedEars'

Christoph Michael Becker wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Tony Mountifield wrote:
>>> That's because you have an unescaped / within your regex, so it sees
>>> /^M?(([0-9]?)[ ]?([0-9])(/ followed by a ? as a regex modifier.
>>
>> Good catch. Also, in POSIX Extended Regular Expressions (ERE) this can
>> be written more simply as
>>
>> ^M?(([0-9]?) ?([0-9])(…
>>
>> and in Perl-Compatible Regular Expressions (PCRE) more simply still as
>>
>> ^M?((\d?) ?(\d)(…
>
> Isn't the exact interpretation of \d locale-dependent? I was not able
> to find this information on php.net, and I cannot verify it, as I have
> no locales available that have decimal digits other than 0-9. However,
> at least when one works with UTF-8 encoded strings and uses the u
> modifier for the regular expression, \d is not the same as [0-9]:
>
>>>> $zero = "\xe0\xa5\xa6" // DEVANAGARI DIGIT ZERO
>>>> preg_match('/[0-9]/u', $zero)
> 0
>>>> preg_match('/\d/u', $zero)
> 1

PHP's preg_* functions use Perl-Compatible Regular Expressions (PCRE):

<http://php.net/preg_match>
<http://php.net/pcre>

So the details can be found in the PCRE documentation:

<http://pcre.org/pcre.txt>

“\d” can match more than just “0” to “9” in PCRE, but (unlike in Perl [1])
the behavior is _not_ locale-dependent by default. There is an option,
PCRE_UCP, that makes \d equivalent to \p{Nd} etc. (UCP stands for “Unicode
Character Properties”), but apparently PHP does not set it by default when
it compiles a pattern:

$ locale
LANG=de_CH.UTF-8
LANGUAGE=
LC_CTYPE="de_CH.UTF-8"
LC_NUMERIC="de_CH.UTF-8"
LC_TIME="de_CH.UTF-8"
LC_COLLATE="de_CH.UTF-8"
LC_MONETARY="de_CH.UTF-8"
LC_MESSAGES=en_US.UTF-8
LC_PAPER="de_CH.UTF-8"
LC_NAME="de_CH.UTF-8"
LC_ADDRESS="de_CH.UTF-8"
LC_TELEPHONE="de_CH.UTF-8"
LC_MEASUREMENT="de_CH.UTF-8"
LC_IDENTIFICATION="de_CH.UTF-8"
LC_ALL=

$ php -r 'echo setlocale(LC_ALL, "de_CH.UTF-8") . "\n";
echo preg_match("/\d/", "१");'
de_CH.UTF-8
0

The “u” pattern modifier in PHP sets the PCRE_UTF8 option (as documented),
and apparently the PCRE_UCP option as well. Hence your observation:

$ php -r 'echo preg_match("/\d/u", "१");'
1

$ php -v
PHP 5.4.15-1 (cli) (built: May 12 2013 12:17:45)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies
with XCache v3.0.1, Copyright (c) 2005-2013, by mOo
with Xdebug v2.2.1, Copyright (c) 2002-2012, by Derick Rethans
with XCache Optimizer v3.0.1, Copyright (c) 2005-2013, by mOo
with XCache Cacher v3.0.1, Copyright (c) 2005-2013, by mOo
with XCache Coverager v3.0.1, Copyright (c) 2005-2013, by mOo
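
For comparison, the cases discussed so far can be put side by side in one
script. This is only a sketch: it assumes a PHP build whose PCRE has Unicode
property support and whose “u” modifier enables PCRE_UCP, as the build
above apparently does. The variable name $zero is taken from your example.

```php
<?php
// DEVANAGARI DIGIT ZERO (U+0966), UTF-8 encoded as three bytes.
$zero = "\xe0\xa5\xa6";

// Without "u" the subject is matched byte by byte; none of the three
// bytes is an ASCII digit, so \d does not match.
var_dump(preg_match('/\d/', $zero));

// With "u" the subject is decoded as UTF-8. Assumption: the modifier
// also enables Unicode properties, so \d matches any decimal digit.
var_dump(preg_match('/\d/u', $zero));

// An explicit character class stays ASCII-only, with or without "u".
var_dump(preg_match('/[0-9]/u', $zero));

// \p{Nd} asks for a Unicode decimal digit explicitly.
var_dump(preg_match('/\p{Nd}/u', $zero));
```

On such a build only the second and the fourth call should report a match.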

In Perl there is the “a” modifier, which restricts \d, \s, \w and the POSIX
classes to ASCII regardless of the locale, but it is not needed with PCRE as
long as PCRE_UCP is not set, that is, in PHP, as long as the “u” modifier is
not used.
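
If a pattern needs “u” for UTF-8 input but must still accept ASCII digits
only, the explicit class [0-9] does what Perl's “a” would do for \d. A
minimal sketch follows; is_ascii_digits() is a made-up helper name, not a
PHP function, and the last line again assumes that “u” enables PCRE_UCP.

```php
<?php
// Hypothetical helper, not part of PHP: true only for non-empty strings
// consisting solely of the ASCII digits 0-9, even though "u" is used.
function is_ascii_digits($s)
{
    return preg_match('/\A[0-9]+\z/u', $s) === 1;
}

$ascii      = "2013";
// DEVANAGARI DIGITS ONE, TWO, THREE (U+0967..U+0969), UTF-8 encoded.
$devanagari = "\xe0\xa5\xa7\xe0\xa5\xa8\xe0\xa5\xa9";

var_dump(is_ascii_digits($ascii));       // ASCII digits are accepted
var_dump(is_ascii_digits($devanagari));  // non-ASCII digits are rejected

// For contrast: \d with "u" accepts the Devanagari digits
// (assumption: the build enables PCRE_UCP for "u").
var_dump(preg_match('/\A\d+\z/u', $devanagari) === 1);
```

Anchoring with \A and \z instead of ^ and $ avoids accepting a trailing
newline after the digits.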

[1] <http://perldoc.perl.org/perlre.html>

PointedEars
--
Sometimes, what you learn is wrong. If those wrong ideas are close to the
root of the knowledge tree you build on a particular subject, pruning the
bad branches can sometimes cause the whole tree to collapse.
-- Mike Duffy in cljs, <news:Xns9FB6521286DB8invalidcom(at)94(dot)75(dot)214(dot)39>
