mysql dynamic binding and pass-by-ref deprecated [message #180816]
Wed, 20 March 2013 03:49
oldyork90
Messages: 9 Registered: December 2011
Junior Member
I developed PHP code on a local uSWin box (PHP 5.3.8, MySQL 5.5).
Everything worked fine.
I moved it to GoDaddy - boom.
-->Call-time pass-by-reference has been deprecated
call_user_func_array(array($sth,'bind_param'),$vr) wants an array of
references.
But I get these deprecation messages when I push the references onto an
array and when I try to run call_user_func_array().
I tried removing the references and pushing the vars directly, but
no go.
Short of a giant switch statement to count parameters, any ideas?
GoDaddy is sitting at PHP 5.3, MySQL 5.0.96.

Re: mysql dynamic binding and pass-by-ref deprecated [message #180820 is a reply to message #180816]
Wed, 20 March 2013 10:42
Jerry Stuckle
Messages: 2598 Registered: September 2010
Senior Member
On 3/19/2013 11:49 PM, oldyork90 wrote:
> I developed php code on a local uSWin box - php 5.3.8, mysql 5.5.
> Every thing worked fine.
>
> I moved it to godaddy - boom.
>
> -->Call-time pass-by-reference has been deprecated
>
> call_user_func_array(array($sth,'bind_param'),$vr) wants an array of
> references.
>
> But I get these deprecated messages when I push the references on an
> array and when I try and run call_user_func_array()
>
> I tried removing the references and directly pushing on the vars, but
> no go.
>
> Short of a giant switch statement to count parameters, any ideas?
> godaddy is sitting at php 5.3, mysql 5.0.96
>
Sorry, my crystal ball is in the shop so I can't see your code.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex(at)attglobal(dot)net
==================

Re: mysql dynamic binding and pass-by-ref deprecated [message #180822 is a reply to message #180820]
Wed, 20 March 2013 14:09
oldyork90
Messages: 9 Registered: December 2011
Junior Member
> Sorry, my crystal ball is in the shop so I can't see your code.
Mine isn't..
OOooooo. AAaaaaa. I see a white picket fence and a woman crying...
$vr = array();
..
..
$bind_name = 'bp' . $fld_name;
@ $$bind_name = $validated_field_input;
@ array_push($vr, &$$bind_name);
..
@ array_push($vr, &$validated_uid);
..
..
array_unshift($vr, $bind_types); // $bind_types is a string like "ssisii"
call_user_func_array(array($sth,'bind_param'),$vr);
Thanks for any help.

Re: mysql dynamic binding and pass-by-ref deprecated [message #180823 is a reply to message #180822]
Wed, 20 March 2013 14:29
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Senior Member
oldyork90 wrote:
> $vr = array();
> .
> .
> $bind_name = 'bp' . $fld_name;
> @ $$bind_name = $validated_field_input;
> @ array_push($vr, &$$bind_name);
> .
> @ array_push($vr, &$validated_uid);
Simply omit the “&”s in the calls. It is not pass-by-reference as such that
has been deprecated since PHP 5.3, but “*call-time* pass-by-reference”, as the
warning says. The rationale is that it should be up to the function, not
the caller, to decide whether a parameter is treated as a reference and
therefore requires a variable that can be modified.
RTFM: <http://php.net/manual/en/language.references.pass.php>
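A minimal sketch of the difference, assuming nothing beyond core PHP (append_suffix is a hypothetical helper made up for illustration): the “&” goes into the function declaration, and the call site passes a plain variable. Where an array of references is needed, as for call_user_func_array() with bind_param, a reference assignment like $vr[] = &$var; should do, since it is not a call and so not call-time pass-by-reference.

```php
<?php
// Hypothetical helper for illustration: the "&" sits in the declaration,
// so the function itself decides that $text is taken by reference.
function append_suffix(&$text, $suffix)
{
    $text .= $suffix;
}

$name = 'bind';
append_suffix($name, '_param');   // no "&" at the call site
echo $name, PHP_EOL;              // prints "bind_param"

// Deprecated in PHP 5.3: append_suffix(&$name, '_param');

// Storing a reference in an array is a reference assignment, not a call,
// so the deprecation does not apply to it.
$vr = array();
$vr[] = &$name;
$name = 'changed';
echo $vr[0], PHP_EOL;             // prints "changed"
```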
PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)

Re: UTF-8 charset [message #180830 is a reply to message #180827]
Wed, 20 March 2013 23:20
The Natural Philosoph
Messages: 993 Registered: September 2010
Senior Member
On 20/03/13 22:45, Christoph Becker wrote:
> The Cat in the Hat wrote:
>
>> How about omitting the smart quotes in your posts?
>>
>> Come back after you've figured out how to configure your newsreader properly.
>
> Thomas' message was properly encoded as UTF-8. Isn't that acceptable in
> this NG?
>
works for me with a proper newsreader.
--
Ineptocracy
(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.

Re: UTF-8 charset [message #180831 is a reply to message #180829]
Wed, 20 March 2013 23:31
The Natural Philosoph
Messages: 993 Registered: September 2010
Senior Member
On 20/03/13 23:13, The Cat in the Hat wrote:
> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>
>> The Cat in the Hat wrote:
>>
>>> How about omitting the smart quotes in your posts?
>>>
>>> Come back after you've figured out how to configure your newsreader properly.
>>
>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable in
>> this NG?
>
> Usenet was designed to use ASCII, not UTF-8. At best, it's poor netiquette in *any* NG.
>
Roads were designed to take horses and carts.
How's your nag and surrey?

Re: UTF-8 charset [message #180833 is a reply to message #180830]
Wed, 20 March 2013 23:39
Scott Johnson
Messages: 196 Registered: January 2012
Senior Member
On 3/20/2013 4:20 PM, The Natural Philosopher wrote:
> On 20/03/13 22:45, Christoph Becker wrote:
>> The Cat in the Hat wrote:
>>
>>> How about omitting the smart quotes in your posts?
>>>
>>> Come back after you've figured out how to configure your newsreader
>>> properly.
>>
>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable in
>> this NG?
>>
> works for me with a proper newsreader.
>
>
No problem here either and I use thunderbird.

Re: UTF-8 charset [message #180838 is a reply to message #180829]
Thu, 21 March 2013 09:11
M. Strobel
Messages: 386 Registered: December 2011
Senior Member
Am 21.03.2013 00:13, schrieb The Cat in the Hat:
> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>
>> The Cat in the Hat wrote:
>>
>>> How about omitting the smart quotes in your posts?
>>>
>>> Come back after you've figured out how to configure your newsreader properly.
>>
>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable in
>> this NG?
>
> Usenet was designed to use ASCII, not UTF-8. At best, it's poor netiquette in *any* NG.
>
Utter nonsense.
/Str.

Re: mysql dynamic binding and pass-by-ref deprecated [message #180845 is a reply to message #180823]
Thu, 21 March 2013 13:43
oldyork90
Messages: 9 Registered: December 2011
Junior Member
> Simply omit the "&"s in the calls.
Tried that and other things, but no go. Went with, :-),
case 23:
$status = $sth->bind_param($types,
$values[0], $values[1], $values[2],
$values[3], $values[4], $values[5],
$values[6], $values[7], $values[8],
$values[9], $values[10], $values[11],
$values[12], $values[13], $values[14],
$values[15], $values[16], $values[17],
$values[18], $values[19], $values[20],
$values[21], $values[22]);
(quotes worked for me)
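One way the switch could probably be avoided, sketched under assumptions: FakeStatement below is a hypothetical stand-in for the mysqli statement object so that the example runs without a database, and $types and $values are made-up sample data. The idea is to build the argument array with reference assignments ($params[] = &$values[$i];) instead of array_push($vr, &$var), because only the latter is call-time pass-by-reference.

```php
<?php
// Hypothetical stand-in for mysqli_stmt so the sketch runs without a
// database; like bind_param(), it takes its variables by reference.
// (The variadic syntax needs PHP 5.6+, but only this stub uses it.)
class FakeStatement
{
    public $bound = array();

    public function bind_param($types, &...$vars)
    {
        $this->bound = array($types, $vars);
        return strlen($types) === count($vars);
    }
}

$types  = 'ssi';                          // sample type string
$values = array('alice', 'admin', 42);    // sample validated input

// Build the argument list: the type string first, then one reference
// per value. This is a reference assignment, not a function call with
// "&" in the argument list, so no deprecation message is raised.
$params = array($types);
foreach (array_keys($values) as $i) {
    $params[] = &$values[$i];
}

$sth = new FakeStatement();
$status = call_user_func_array(array($sth, 'bind_param'), $params);
var_dump($status);
```

With a real prepared statement, the same $params array would presumably be handed to call_user_func_array(array($sth, 'bind_param'), $params) unchanged, whatever the number of fields.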

Re: UTF-8 charset [message #180847 is a reply to message #180838]
Thu, 21 March 2013 14:31
adrian
Messages: 27 Registered: December 2012
Junior Member
M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>
>>> The Cat in the Hat wrote:
>>>
>>>> How about omitting the smart quotes in your posts?
>>>>
>>>> Come back after you've figured out how to configure your newsreader
>>>> properly.
>>>
>>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>> in this NG?
>>
>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>> netiquette in *any* NG.
>>
>
> Utter nonsense.
There are some computers that cannot read UTF-8. Always use the lowest
common denominator if you want to communicate effectively without
excluding anyone.
--
~ Adrian Tuddenham ~
(Remove the ".invalid"s and add ".co.uk" to reply)
www.poppyrecords.co.uk

Re: UTF-8 charset [message #180849 is a reply to message #180847]
Thu, 21 March 2013 16:22
The Natural Philosoph
Messages: 993 Registered: September 2010
Senior Member
On 21/03/13 14:31, Adrian Tuddenham wrote:
> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>
>> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>>
>>>> The Cat in the Hat wrote:
>>>>
>>>> > How about omitting the smart quotes in your posts?
>>>> >
>>>> > Come back after you've figured out how to configure your newsreader
>>>> > properly.
>>>>
>>>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>>> in this NG?
>>>
>>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>>> netiquette in *any* NG.
>>>
>>
>> Utter nonsense.
>
> There are some computers that cannot read UTF-8. Always use the lowest
> common denominator if you want to communicate effectively without
> excluding anyone.
>
That's why I always send a semaphore sign video attachment with all my posts,
'cos deaf people can't hear me otherwise. Sheesh.
Remember, amoebae don't speak English, and they deserve to know what you're
saying as well.
That's about as insane as having wheelchair access to a specialist rock
climbing shop,
or braille markings on video DVDs.

Re: UTF-8 charset [message #180850 is a reply to message #180847]
Thu, 21 March 2013 17:05
M. Strobel
Messages: 386 Registered: December 2011
Senior Member
Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>
>> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>>
>>>> The Cat in the Hat wrote:
>>>>
>>>> > How about omitting the smart quotes in your posts?
>>>> >
>>>> > Come back after you've figured out how to configure your newsreader
>>>> > properly.
>>>>
>>>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>>> in this NG?
>>>
>>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>>> netiquette in *any* NG.
>>>
>>
>> Utter nonsense.
>
> There are some computers that cannot read UTF-8. Always use the lowest
> common denominator if you want to communicate effectively without
> excluding anyone.
>
More than 90% of the text is still readable if your Usenet client only displays ASCII.
In the anglophone world people easily come to think ASCII is sufficient, but in a
global world even 16-bit Unicode (Basic Multilingual Plane) is not.
Some languages use UCS-16 internally, and it is not enough for all uses and users.
So UTF-8 is THE solution. There must be some progress sometime. ASCII, 7-bit
encodings and octal are left behind.
/Str.

Re: UTF-8 charset [message #180851 is a reply to message #180847]
Thu, 21 March 2013 17:36
M. Strobel
Messages: 386 Registered: December 2011
Senior Member
Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>
>> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>>
>>>> The Cat in the Hat wrote:
>>>>
>>>> > How about omitting the smart quotes in your posts?
>>>> >
>>>> > Come back after you've figured out how to configure your newsreader
>>>> > properly.
>>>>
>>>> Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>>> in this NG?
>>>
>>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>>> netiquette in *any* NG.
>>>
>>
>> Utter nonsense.
>
> There are some computers that cannot read UTF-8. Always use the lowest
> common denominator if you want to communicate effectively without
> excluding anyone.
>
RFC 3977, Network News Transfer Protocol (NNTP), October 2006, page 3:
   o  the default character set is changed from US-ASCII [ANSI1986] to
      UTF-8 [RFC3629] (note that US-ASCII is a subset of UTF-8);
I am so fed up with limited character sets like ISO-8859 or the Windows
encodings that a programmer/sysadmin has to care about.
/Str.

Re: UTF-8 charset [OT] [message #180852 is a reply to message #180851]
Thu, 21 March 2013 18:22
adrian
Messages: 27 Registered: December 2012
Junior Member
M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
> Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
>> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>>
>>> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>>>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>>>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>>>
>>>> > The Cat in the Hat wrote:
>>>> >
>>>> >> How about omitting the smart quotes in your posts?
>>>> >>
>>>> >> Come back after you've figured out how to configure your newsreader
>>>> >> properly.
>>>> >
>>>> > Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>>> > in this NG?
>>>>
>>>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>>>> netiquette in *any* NG.
>>>>
>>>
>>> Utter nonsense.
>>
>> There are some computers that cannot read UTF-8. Always use the lowest
>> common denominator if you want to communicate effectively without
>> excluding anyone.
>>
>
> RFC 3977 Network News Transfer Protocol (NNTP) October 2006
> [Page 3]
> o the default character set is changed from US-ASCII [ANSI1986] to
> UTF-8 [RFC3629] (note that US-ASCII is a subset of UTF-8);
>
> I am so fed up with limited character sets like ISO-8859 or the Win encoding a
> programmer/sysadmin has to care about.
A lot depends on the readership you are aiming at. If you are writing
for just one person or a target audience who can all read your preferred
character set, there is no reason not to use it. If you are writing for
a larger audience and do not want to exclude some potential readers,
then you may have to limit your choice. If you are writing commercially,
you cannot afford to upset or alienate a single customer.
As a general rule, the larger the readership, the greater the
constraints which fall upon the writer. A lot of time and extra work
(and possibly frustration) on the part of one writer will save a much
greater amount of time wasted in total by hundreds of frustrated
readers.

Re: UTF-8 charset [OT] [message #180853 is a reply to message #180852]
Thu, 21 March 2013 20:00
Peter H. Coffin
Messages: 245 Registered: September 2010
Senior Member
On Thu, 21 Mar 2013 18:22:30 +0000, Adrian Tuddenham wrote:
> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>> [Page 3]
>> o the default character set is changed from US-ASCII [ANSI1986] to
>> UTF-8 [RFC3629] (note that US-ASCII is a subset of UTF-8);
>>
>> I am so fed up with limited character sets like ISO-8859 or the Win encoding a
>> programmer/sysadmin has to care about.
>
> A lot depends on the readership you are aiming at. If you are writing
> for just one person or a target audience who can all read your preferred
> character set, there is no reason not to use it. If you are writing for
> a larger audience and do not want to exclude some potential readers,
> then you may have to limit your choice. If you are writing comercially,
> you cannot afford to upset or alienate a single customer.
>
> As a general rule, the larger the readership, the greater the
> constraints which fall upon the writer. A lot of time and extra work
> (and possibly frustration) on the part of one writer will save a much
> greater amount of time wasted in total by hundreds of frustrated
> readers.
OTOH, curly-quotes (which IIRC were the issue here) add little, if
anything, to clarity for any language, and certainly have no place in
PHP coding.
--
83. If I'm eating dinner with the hero, put poison in his goblet, then
have to leave the table for any reason, I will order new drinks for
both of us instead of trying to decide whether or not to switch
with him. --Peter Anspach's list of things to do as an Evil Overlord

Re: UTF-8 charset [message #180854 is a reply to message #180850]
Fri, 22 March 2013 00:17
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Senior Member
M. Strobel wrote:
> Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
>> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>>> Am 21.03.2013 00:13, schrieb The Cat in the Hat:
>>>> Christoph Becker <cmbecker69(at)gmx(dot)de> wrote in
>>>> news:kide60$df1$1(at)speranza(dot)aioe(dot)org:
>>>> > The Cat in the Hat wrote:
>>>> >> How about omitting the smart quotes in your posts?
>>>> >>
>>>> >> Come back after you've figured out how to configure your newsreader
>>>> >> properly.
>>>> >
>>>> > Thomas' message was properly encoded as UTF-8. Isn't that acceptable
>>>> > in this NG?
>>>>
>>>> Usenet was designed to use ASCII, not UTF-8. At best, it's poor
>>>> netiquette in *any* NG.
>>>
>>> Utter nonsense.
ACK.
>> There are some computers that cannot read UTF-8.
That would be computers and software older than 20 years now. UTF-8, one of
the encodings for the Unicode character set, was devised in 1992 CE, and the
Unicode Standard, version 2.0, was published in 1996. All reasonably modern operating
systems, in effect all commonly used ones, support Unicode and provide
Unicode-capable fonts. Many have made a Unicode encoding their default
encoding; for example, NTFS encodes filenames using UTF-16, and UTF-8 has
been the default locale encoding on GNU/Linux systems for several years now.
Thus, a major criticism of PHP is that as of version 5.4 it still has no
native Unicode support, while other popular programming languages on the
Web, like ECMAScript implementations and Python, have.
>> Always use the lowest common denominator if you want to communicate
>> effectively without excluding anyone.
Insisting on using *ancient* and therefore inherently *insecure* hardware
and software is no excuse for claiming something as mind-bogglingly absurd as
that using UTF-8 encoding would be “poor netiquette in *any* NG”. First of
all, netiquette (network etiquette) is concerned less with the technical
aspects of messages than with the behavior of people towards each other that
those messages exhibit. Which makes this clearly a case of “the pot calling
the kettle black”.
Speaking of netiquette, though, it *is* considered polite (on Usenet) to
introduce oneself with one's full name. People posting under unnecessarily
abbreviated names, pseudonyms, and plain nicknames, like “The Cat in the
Hat”, are usually frowned upon, laughed at, or ignored by
longtime regulars (like me). The same applies to people who violate
technical standards or quasi-standards such as RFC 5536, along with the AUPs
of their providers, by using non-addresses in address header field values:
<http://www.zedat.fu-berlin.de/NetNews-Regeln>
<http://www.kirchwitz.de/~amk/dni/netiquette> (based on a Big 8 original)
> More than 90% of the text is still readable if your usenet client only
> displays ASCII.
>
> In the anglophone world people get easily to think ASCII is sufficient,
> but in a global world even 16-bit Unicode (Basic Multilingual Plane) is
> not.
Those are good points, however:
> Some languages use UCS-16 internally, and it is not enough for all uses
> and users.
This is a common misconception. There is no “UCS-16” and there never was.
At most, there is or was (depending on how you look at it) UCS-2 (Universal
Character Set coded in 2 octets), to whose character set the Unicode character
set from Unicode 2.0 on is byte-by-byte equivalent.
The underlying misconception here is confusing character set with character
encoding, probably also common because of the “charset” parameter of
Internet messages that, despite its name, declares the *character encoding* of a
message.
UCS-2 is/was a standard for a *character set along with a specific, 16-bit
encoding* to encode all characters in the Basic Multilingual Plane (BMP),
and *only* those. This and the overhead for simple (purely US-ASCII-based)
texts led to the later success of the competing standard, Unicode.
See also: <http://en.wikipedia.org/wiki/UCS-16> (properly redirected to the
UCS article, with an explanation why “UCS-16” is just wrong.)
*UTF*-16 (Unicode Transformation Format, 16-bit) is *another* encoding where
each *code unit* of a sequence for a character has 16 bits; a character may
be encoded using more than one code unit (the same is true for the other
UTFs). The character set thus encoded is the Unicode character set, and
both are defined in the Unicode Standard. That character set comprises, and
its transformation formats can encode, a lot more than just the BMP,
although of the subsets of Unicode the BMP remains the best supported one,
also due to pre-installed font support. At the moment, there are 1'114'112
code points in Unicode (U+0000 to U+10FFFF), so in theory a text encoded
with one of the UTFs could contain 1'114'112 different characters. (In
practice, some code point ranges in the BMP are reserved for surrogate
characters to allow more than the 65536 potential characters of the BMP
while keeping the encoding relatively simple and backwards-compatible.)
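The surrogate mechanism can be sketched in a few lines of arithmetic. The function name utf16_code_units is hypothetical (made up for this illustration), and the sketch does not validate its input, for instance it does not reject the surrogate range U+D800 to U+DFFF itself:

```php
<?php
// Hypothetical helper: split a Unicode code point into its UTF-16 code units.
function utf16_code_units($codePoint)
{
    if ($codePoint < 0x10000) {
        return array($codePoint);       // BMP: a single 16-bit code unit
    }
    $offset = $codePoint - 0x10000;     // 20 bits are left to encode
    return array(
        0xD800 + ($offset >> 10),       // high surrogate: upper 10 bits
        0xDC00 + ($offset & 0x3FF),     // low surrogate: lower 10 bits
    );
}

// U+201C is in the BMP; U+1F600 lies beyond it and needs a surrogate pair.
foreach (array(0x201C, 0x1F600) as $cp) {
    $units = array_map('dechex', utf16_code_units($cp));
    echo 'U+', strtoupper(dechex($cp)), ' -> ', implode(' ', $units), PHP_EOL;
}
```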
> So UTF-8 is THE solution. There must be some progress sometime.
ACK.
> ASCII, 7-bit encodings and octal are left behind.
US-ASCII *is* a 7-bit encoding. “Octal”?
The most important thing about UTF-8 is that it is equivalent to US-ASCII
for code points below U+0080 (one 8-bit code unit per character, the MSB is
always 0, encodes the same characters as in US-ASCII). Therefore, UTF-8,
through Unicode, allows texts with the greatest possible range of characters
(all written languages, even some extinct ones, common symbols, punctuation
etc.) while using the least amount of memory when this range is _not_ fully
used.
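A small sketch of that equivalence, assuming PHP 7 or later for the "\u{...}" escape (which always produces UTF-8 bytes); the sample strings are arbitrary:

```php
<?php
$ascii = 'bind_param';
$quote = "\u{201C}";   // U+201C LEFT DOUBLE QUOTATION MARK

// In pure US-ASCII text every byte is below 0x80 (the MSB is 0), so the
// very same byte sequence is also valid UTF-8.
$asciiOnly = true;
foreach (str_split($ascii) as $byte) {
    if (ord($byte) >= 0x80) {
        $asciiOnly = false;
    }
}
var_dump($asciiOnly);                                        // bool(true)

// A character beyond US-ASCII takes several code units, each with the MSB set.
echo strlen($quote), ' bytes: ', bin2hex($quote), PHP_EOL;   // 3 bytes: e2809c
```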
Actually, it should not be necessary to explain all this to people
subscribed to a programming newsgroup, and to Web developers in particular,
but there you are.
<http://www.joelonsoftware.com/articles/Unicode.html>
<http://unicode.org/faq/>
HTH
PointedEars
--
> If you get a bunch of authors […] that state the same "best practices"
> in any programming language, then you can bet who is wrong or right...
Not with javascript. Nonsense propagates like wildfire in this field.
-- Richard Cornford, comp.lang.javascript, 2011-11-14

Re: UTF-8 charset [OT] [message #180855 is a reply to message #180853]
Fri, 22 March 2013 00:47
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Senior Member
Peter H. Coffin wrote:
> On Thu, 21 Mar 2013 18:22:30 +0000, Adrian Tuddenham wrote:
>> M. Strobel <sorry_no_mail_here(at)nowhere(dot)dee> wrote:
>>> [Page 3]
>>> o the default character set is changed from US-ASCII [ANSI1986] to
>>> UTF-8 [RFC3629] (note that US-ASCII is a subset of UTF-8);
That is interesting, but hardly relevant here. The relevant standard here
is Proposed Standard RFC 5536 (Netnews Article Format, 2009 CE) which refers
to Draft Standard RFC 5322 (Internet Message Format, 2008) for the general
format of Internet messages, which in turn refers to the Draft Standards
RFC 2045 to 2049 (MIME, 1996) for message encoding.
>>> I am so fed up with limited character sets like ISO-8859 or the Win
>>> encoding a programmer/sysadmin has to care about.
Do you even know that what is labelled “ISO-8859-1” is in practice usually treated as Windows-1252, for example by Web browsers?
> […]
> OTOH, curly-quotes (which IIRC were the issue here) add little, if
> anything, to clarity for any language, and certainly have no place in
> PHP coding.
Not quite. While typographic/proper quotation marks (“curly-quotes” is a
misnomer) have not been used as part of PHP code here (of course), there is
nothing wrong with using them in syntactically correct ways, like *inside*
string values, as long as the source code uses the corresponding character
encoding and the encoding is properly declared (since PHP is encoding-agnostic,
if you declare and use UTF-8 – without BOM, preferably – you can
type those characters verbatim; do not think that I &ldquo;'d and &rdquo;'d
everything in PHP source code for <http://PointedEars.de/es-matrix>).
In fact, if the value is to be used in output, like non-code in a Web
document, it is strongly recommended to use typographical quotes so as to
improve the readability of the text. However, few Web developers are aware
of the benefits of Web typography; I must admit that I had not become very
much aware of it myself before writing my thesis (in publishable texts like
scientific works you *must* use correct typography). However, some
time before that I had already written PHP code that automatically
transformed “ASCII” typography into proper typography for an e-book; I should
probably refactor and reuse it now.
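That code is not shown here; purely as an illustration of the idea, and not the implementation referred to above, a naive transformation of straight double quotes could look like this. The function name typographic_quotes and the opening/closing heuristic are assumptions, and the "\u{...}" escapes need PHP 7 or later:

```php
<?php
// Hypothetical sketch: a straight quote at the start of the text, or after
// whitespace or an opening bracket, is treated as an opening quote; every
// remaining straight quote is treated as a closing quote.
function typographic_quotes($text)
{
    $text = preg_replace('/(^|[\s(\[{])"/u', '$1' . "\u{201C}", $text);
    return str_replace('"', "\u{201D}", $text);
}

echo typographic_quotes('He said "hi" (and "bye").'), PHP_EOL;
```

A real implementation would also have to deal with apostrophes, nested quotations and language-specific quotation styles, none of which this sketch attempts.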
PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(at)news(dot)demon(dot)co(dot)uk>

Re: UTF-8 charset [message #180856 is a reply to message #180854]
Fri, 22 March 2013 00:55
Christoph Becker
Messages: 91 Registered: June 2012
Member
Thomas 'PointedEars' Lahn wrote:
>> Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
>>> There are some computers that cannot read UTF-8.
> That would be computers and software older than 20 years now. UTF-8, one of
> the encodings for the Unicode character set, was devised in 1992 CE, and the
> Unicode Standard, version 2.0, was published in 1996. All reasonably modern operating
> systems, in effect all commonly used ones, support Unicode and provide
> Unicode-capable fonts. Many have made a Unicode encoding their default
> encoding; for example, NTFS encodes filenames using UTF-16, and UTF-8 has
> been the default locale encoding on GNU/Linux systems for several years now.
I recently read that UTF-8 is not available on many Windows PCs in
Taiwan, where `BIG5' is still prevalent, which would be very unfortunate.
OTOH restricting oneself to ASCII on Usenet reminds me of offering full
support for IE 6 on the Web (e.g. by using GIFs instead of PNGs when
transparency is required), which IMO holds back reasonable innovation.
> Thus, a major criticism of PHP is that as of version 5.4 it still has no
> native Unicode support, while other popular programming languages on the
> Web, like ECMAScript implementations and Python, have.
Indeed, that's a shame and rather painful for developers. And if I'm
not mistaken, the situation won't change with PHP 5.5. :(
--
Christoph M. Becker

Unicode support (was: UTF-8 charset) [message #180857 is a reply to message #180856]
Fri, 22 March 2013 01:16
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Senior Member
Christoph Becker wrote:
> Thomas 'PointedEars' Lahn wrote:
>>> Am 21.03.2013 15:31, schrieb Adrian Tuddenham:
>>>> There are some computers that cannot read UTF-8.
>>
>> That would be computers and software older than 20 years now. UTF-8,
>> one of the encodings for the Unicode character set, was devised in
>> 1992 CE, and the Unicode Standard, version 2.0, was published in 1996. All reasonably
>> modern operating systems, in effect all commonly used ones, support
>> Unicode and provide Unicode-capable fonts. Many have made a Unicode
>> encoding their default encoding; for example, NTFS encodes filenames
>> using UTF-16, and UTF-8 has been the default locale encoding on GNU/Linux
>> systems for several years now.
>
> I recently read, that UTF-8 is not available on many Windows PCs in
> Taiwan, where `BIG5' is still prevalent, what would be very unfortunate.
I would like to see proof of that. While possible, I consider it unlikely.
Taiwan is especially intertwined with the Western world (being the location
of major local hardware manufacturers, and major foreign investments on the
doorstep to mainland China), and Traditional Chinese as written in Taiwan
was one of the first scripts to be covered by Unicode, with CJK Unified
Ideographs, in 1992.
Windows has supported Unicode, and has come pre-installed with Unicode-capable
fonts, since Windows NT (and so, Windows 2000). Did not Windows XP
support end last year?
> OTOH restricting oneself to ASCII on Usenet reminds me of offering full
> support for IE 6 on the Web (e.g. by using GIFs instead PNGs when
> transparency is required), which IMO holds back reasonable innovation.
ACK.
>> Thus, a major criticism of PHP is that as of version 5.4 it still has no
>> native Unicode support, while other popular programming languages on the
>> Web, like ECMAScript implementations and Python, have.
>
> Indeed, that's a shame and rather painful for developers. And if I'm
> not mistaken, the situation won't change with PHP 5.5. :(
That would be a pity. Native Unicode support was announced for PHP 6 before
those plans were abandoned. I wonder what can be so hard about implementing
it. Apparently even Perl managed the transition years ago, although it
requires a few extra lines per script.
PointedEars

Re: UTF-8 charset [message #180862 is a reply to message #180854]
Fri, 22 March 2013 08:46
M. Strobel
Messages: 386 Registered: December 2011
Senior Member
Am 22.03.2013 01:17, schrieb Thomas 'PointedEars' Lahn:
> M. Strobel wrote:
>
>> So UTF-8 is THE solution. There must be some progress sometime.
>
> ACK.
>
>> ASCII, 7-bit encodings and octal are left behind.
>
> US-ASCII *is* a 7-bit encoding. “Octal”?
>
strobel@suse122-intel:~> php -a
Interactive mode enabled
php > echo 071, PHP_EOL;
57
php > echo 070, PHP_EOL;
56
php >
> The most important thing about UTF-8 is that it is equivalent to US-ASCII
> for code points below U+0080 (one 8-bit code unit per character, the MSB is
> always 0, encodes the same characters as in US-ASCII). Therefore, UTF-8,
> through Unicode, allows texts with the greatest possible range of characters
> (all written languages, even some extinct ones, common symbols, punctuation
> etc.) while using the least amount of memory when this range is _not_ fully
> used.
>
> Actually, it should not be necessary to explain all this to people
> subscribed to a programming newsgroup, and to Web developers in particular,
> but there you are.
+1
/Str.

Re: UTF-8 charset [message #180863 is a reply to message #180856]
Fri, 22 March 2013 10:50
The Natural Philosoph
Messages: 993 Registered: September 2010
Senior Member
On 22/03/13 00:55, Christoph Becker wrote:
> I recently read, that UTF-8 is not available on many Windows PCs in
> Taiwan,...
That would be the ones running on pirated windows yes?
:-)

Re: Unicode support [message #180976 is a reply to message #180857]
Sat, 30 March 2013 12:57
Christoph Becker
Messages: 91 Registered: June 2012
Member
Thomas 'PointedEars' Lahn wrote:
>
> Christoph Becker wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>>> On 21.03.2013 15:31, Adrian Tuddenham wrote:
>>>> > There are some computers that cannot read UTF-8.
>>>
>>> That would be computers and software older than 30 years now. The
>>> Unicode Standard, version 2.0, and UTF-8, one of the encodings for the
>>> character set thus specified, was published in 1992 CE. All reasonably
>>> modern operating systems, in effect all commonly used ones, support
>>> Unicode and provide Unicode-capable fonts. Many have made a Unicode
>>> encoding their default encoding; for example, NTFS encodes filenames
>>> using UTF-16, and UTF-8 has been the default locale encoding on GNU/Linux
>>> systems for several years now.
>>
>> I recently read, that UTF-8 is not available on many Windows PCs in
>> Taiwan, where `BIG5' is still prevalent, what would be very unfortunate.
>
> I would like to see proof of that. While possible, I consider it unlikely.
> Taiwan is especially intertwined with the Western world (being the location
> of major local hardware manufacturers, and major foreign investments on the
> doorstep to mainland China), and Traditional Chinese as written in Taiwan
> was one of the first scripts to be covered by Unicode, with CJK Unified
> Ideographs, in 1992.
I'm quite sure now that I was mistaken. It is not UTF-8 support that
is missing on the PCs, but rather that the file systems are using BIG5
by default, which is of course something quite different. Sorry for the
incorrect information.
> Windows has been supporting Unicode, and came pre-installed with Unicode-
> capable fonts since Windows NT (and so, Windows 2000). Did not Windows XP
> support end last year?
According to <http://www.microsoft.com/en-us/windows/endofsupport.aspx>
support of Windows XP SP3 ends on April 8, 2014.
>> OTOH restricting oneself to ASCII on Usenet reminds me of offering full
>> support for IE 6 on the Web (e.g. by using GIFs instead PNGs when
>> transparency is required), which IMO holds back reasonable innovation.
>
> ACK.
>
>>> Thus, a major criticism of PHP is that as of version 5.4 it still has no
>>> native Unicode support, while other popular programming languages on the
>>> Web, like ECMAScript implementations and Python, have.
>>
>> Indeed, that's a shame and rather painful for developers. And if I'm
>> not mistaken, the situation won't change with PHP 5.5. :(
>
> That would be a pity. Native Unicode support was announced for PHP 6 before
> those plans were abandoned. I wonder, what can be that hard in implementing
> it? Apparently even Perl managed the transition years ago although it
> requires a few extra lines per script.
Native Unicode support was planned, and in part already implemented,
by using UTF-16 internally throughout. It seems that several PHP
developers had concerns about the performance and memory overhead of
doing so. Evidently no agreement could be found, and on
March 11, 2010 Rasmus Lerdorf `decided' to delay PHP 6 and the Unicode
support.[1]
Of course there are several extensions handling UTF-8 quite well, but
these may not be available on shared web hosting services, and anyway
it's quite ugly to have e.g. strlen('€')===3 when dealing with UTF-8.
[1] <http://news.php.net/php.internals/47120>
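To make the strlen('€')===3 point concrete, here is a minimal sketch. It writes the euro sign as explicit UTF-8 bytes so the result does not depend on how the file is saved. mb_strlen() needs the mbstring extension, so it is guarded; the PCRE fallback is only one possible workaround, not the canonical one.

```php
<?php
// U+20AC EURO SIGN as explicit UTF-8 bytes (E2 82 AC).
$euro = "\xE2\x82\xAC";

// strlen() counts bytes.
var_dump(strlen($euro));                      // int(3)

// mb_strlen() counts characters, but needs the mbstring extension,
// which may be missing on shared hosts.
if (function_exists('mb_strlen')) {
    var_dump(mb_strlen($euro, 'UTF-8'));      // int(1)
}

// Fallback that only needs PCRE with UTF-8 support: count code points.
var_dump(preg_match_all('/./us', $euro, $m)); // int(1)
```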
--
Christoph M. Becker
|
|
|
Re: Unicode support [message #180977 is a reply to message #180976] |
Sat, 30 March 2013 13:25 |
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Karma: 0
|
Senior Member |
|
|
Christoph Becker wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Christoph Becker wrote:
>>> I recently read, that UTF-8 is not available on many Windows PCs in
>>> Taiwan, where `BIG5' is still prevalent, what would be very unfortunate.
>>
>> I would like to see proof of that. While possible, I consider it
>> unlikely. Taiwan is especially intertwined with the Western world (being
>> the location of major local hardware manufacturers, and major foreign
>> investments on the doorstep to mainland China), and Traditional Chinese
>> as written in Taiwan was one of the first scripts to be covered by
>> Unicode, with CJK Unified Ideographs, in 1992.
>
> I'm quite sure now, that I was mistaken. It is not UTF-8 support that
> is missing on the PCs, but rather that the file systems are using BIG5
> by default,
But not NTFS, I presume?
> what is of course something quite different. Sorry for the incorrect
> information.
Never mind.
>> Windows has been supporting Unicode, and came pre-installed with Unicode-
>> capable fonts since Windows NT (and so, Windows 2000). Did not Windows
>> XP support end last year?
>
> According to <http://www.microsoft.com/en-us/windows/endofsupport.aspx>
> support of Windows XP SP3 ends on April 8, 2014.
That is good to know. However, it should be noted that (at Microsoft) there
are different levels of support, and there are different versions of Windows
XP.  Mainstream Support and Extended Support for some versions of Windows XP
have ended already:
<http://support.microsoft.com/lifecycle/search/default.aspx?sort=PN&alpha=Windows+XP&Filter=FilterMS&msdate=0>
We are now looking forward to the date that *all* support will be
discontinued for *all* versions of Windows XP (as there will be no more
Service Packs).
>>> […] if I'm not mistaken, the situation won't change with PHP 5.5. :(
>>
>> That would be a pity. Native Unicode support was announced for PHP 6
>> before those plans were abandoned. I wonder, what can be that hard in
>> implementing it? Apparently even Perl managed the transition years ago
>> although it requires a few extra lines per script.
>
> Native Unicode support was planned to be resp. already partially
> implemented by using UTF-16 internally throughout. It seems, that
> several PHP developers had concerns about the performance and memory
> overhead of doing so. Obviously no agreement could be found, and on
> March 11, 2010 Rasmus Lerdorf `decided' to delay PHP 6 and the Unicode
> support.[1]
Thanks. ISTM that PHP developers are struggling with the same issues as
Python developers did once. UTF-16 needs at least two bytes per character,
but string operations (like determining string length) are fast within the
BMP. UTF-8 is small for low code points, but harder to deal with. UTF-32 is
fixed-width beyond the BMP as well, and would allow fast operations, but is
wasteful of precious
memory. (As of Python 3.3, Python uses an encoding that attempts to get the
best of both worlds.)
Still, a UTF-8 pragma like in Perl would not be such a bad idea for PHP.
Only those who need it would use it.
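The size trade-off between the three encodings can be measured directly. A rough sketch, assuming the iconv extension is available (it usually is); the sample strings are my own picks for illustration:

```php
<?php
// Byte sizes of the same text in UTF-8, UTF-16 and UTF-32.
$samples = array(
    'ASCII "abc"'       => "abc",
    'U+20AC euro sign'  => "\xE2\x82\xAC",
    'U+1F600 (non-BMP)' => "\xF0\x9F\x98\x80",
);

foreach ($samples as $label => $utf8) {
    printf(
        "%-18s UTF-8: %d  UTF-16: %d  UTF-32: %d bytes\n",
        $label,
        strlen($utf8),
        strlen(iconv('UTF-8', 'UTF-16LE', $utf8)), // LE variant adds no BOM
        strlen(iconv('UTF-8', 'UTF-32LE', $utf8))
    );
}
```

ASCII text triples or quadruples in size under the wider encodings, while the non-BMP character takes four bytes in all three.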
> Of course there are several extensions handling UTF-8 quite well, but
> these may not be available on shared web hosting services, and anyway
> it's quite ugly to have e.g. strlen('€')===3 when dealing with UTF-8.
The same issue exists with characters outside the BMP in ECMAScript
implementations which use 16-bit characters (usually one UTF-16 code unit
per character). But you can work around that rather efficiently.
> [1] <http://news.php.net/php.internals/47120>
I will read that later.
PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann
|
|
|
Re: Unicode support [message #180978 is a reply to message #180976] |
Sat, 30 March 2013 14:29 |
The Natural Philosoph
Messages: 993 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 30/03/13 12:57, Christoph Becker wrote:
> it's quite ugly to have e.g. strlen('€')===3 when dealing with UTF-8.
>
It has been noted that the Euro breaks almost everything it touches:-)
|
|
|
Re: Unicode support [message #180979 is a reply to message #180977] |
Sat, 30 March 2013 14:33 |
The Natural Philosoph
Messages: 993 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
> The same issue exists with characters outside the BMP in ECMAScript
> implementations which uses 16-bit characters (usually one UTF-16 code unit
> per character). But you can work around that rather efficiently.
>
The problem becomes 'what do you mean by strlen()': the space the
characters will occupy in a constant-width font, or the storage
allocated to the string?
Mostly we are concerned with the latter.
The lack of precision in font reproduction, or even in guaranteeing
which font may be selected, renders the former an 'open' question.
strlen('€')===3 is in fact the correct answer.
>> [1] <http://news.php.net/php.internals/47120>
>
> I will read that later.
>
>
> PointedEars
>
|
|
|
Re: Unicode support [message #180980 is a reply to message #180979] |
Sat, 30 March 2013 14:55 |
Christoph Becker
Messages: 91 Registered: June 2012
Karma: 0
|
Member |
|
|
The Natural Philosopher wrote:
> On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>
>> The same issue exists with characters outside the BMP in ECMAScript
>> implementations which uses 16-bit characters (usually one UTF-16 code
>> unit
>> per character). But you can work around that rather efficiently.
>>
>
> The problem become 'what do you mean by strlen()' - the space the
> characters will occupy in an constant width font, or the storage
> allocated to the string?
>
> Mostly we are concerned with the latter.
I am more concerned about the number of characters the string holds.
Say, I want to get the last character:
$str = '€';
echo $str[2];
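Spelled out as a sketch: with a UTF-8 encoded euro sign, $str[2] yields only the third byte. The mb_substr() variant assumes mbstring is installed; the preg_match() variant is an alternative that needs only PCRE.

```php
<?php
// The euro sign as UTF-8 bytes (E2 82 AC), independent of file encoding.
$str = "\xE2\x82\xAC";

// $str[2] is the third *byte*, not the last character.
var_dump(bin2hex($str[2]));              // string(2) "ac"

// With mbstring: take one character counted from the end.
if (function_exists('mb_substr')) {
    $last = mb_substr($str, -1, 1, 'UTF-8');
    var_dump($last === $str);            // bool(true)
}

// Without mbstring: match the final code point in PCRE's UTF-8 mode.
preg_match('/.\z/us', $str, $m);
var_dump($m[0] === $str);                // bool(true)
```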
> Because lack of precision in font reproduction, or even in guaranteeing
> which font may be selected, renders the former an 'open' question.
>
> strlen('€')===3 is in fact the correct answer.
I suppose most *higher level languages* define the length of a string as
the number of characters the string holds. Cf. ECMAScript's length
property and TCL's [string length]. Even PHP's mb_strlen() returns the
number of characters.
C is quite a different thing.
--
Christoph M. Becker
|
|
|
Re: Unicode support [message #180981 is a reply to message #180980] |
Sat, 30 March 2013 18:04 |
The Natural Philosoph
Messages: 993 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 30/03/13 14:55, Christoph Becker wrote:
> The Natural Philosopher wrote:
>
>> On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>>
>>> The same issue exists with characters outside the BMP in ECMAScript
>>> implementations which uses 16-bit characters (usually one UTF-16 code
>>> unit
>>> per character). But you can work around that rather efficiently.
>>>
>>
>> The problem become 'what do you mean by strlen()' - the space the
>> characters will occupy in an constant width font, or the storage
>> allocated to the string?
>>
>> Mostly we are concerned with the latter.
>
> I am more concerned about the number of characters the string holds.
> Say, I want to get the last character:
>
> $str = '€';
> echo $str[2];
>
>> Because lack of precision in font reproduction, or even in guaranteeing
>> which font may be selected, renders the former an 'open' question.
>>
>> strlen('€')===3 is in fact the correct answer.
>
> I suppose most *higher level languages* define the length of a string as
> the number of characters the string holds. Cf. ECMAScript's length
> property and TCL's [string length]. Even PHP's mb_strlen() returns the
> number of characters.
>
So what happens with a typographic ligature like 'ᴁ'?
I think you are making a rod for your own back here.
The storage requirements are exact, specific and useful.
The concept of a 'character in a text string' is really not, and if you
go deep into typography with kerning, leading, ligatures and the like
you will understand why.
Not that I am any expert. I just know enough to know there's a minefield
out there.
> C is quite a different thing.
>
|
|
|
Re: Unicode support [message #180982 is a reply to message #180980] |
Sat, 30 March 2013 19:18 |
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Karma: 0
|
Senior Member |
|
|
Christoph Becker wrote:
> I am more concerned about the number of characters the string holds.
> Say, I want to get the last character:
>
> $str = '€';
> echo $str[2];
>
>> Because lack of precision in font reproduction, or even in guaranteeing
>> which font may be selected, renders the former an 'open' question.
>>
>> strlen('€')===3 is in fact the correct answer.
That depends on the character encoding of the source code.  3 is the correct
answer for a *UTF-8*-encoded U+20AC EURO SIGN character, which is then
encoded as E2 82 AC. [1]
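The dependence on the encoding can be checked with a small sketch. It assumes the iconv extension; ISO-8859-15 is just one example of a single-byte encoding that contains the euro sign.

```php
<?php
$utf8 = "\xE2\x82\xAC";                      // U+20AC as UTF-8
var_dump(bin2hex($utf8), strlen($utf8));     // "e282ac", 3

// The same character in ISO-8859-15 is the single byte A4.
$latin9 = iconv('UTF-8', 'ISO-8859-15', $utf8);
var_dump(bin2hex($latin9), strlen($latin9)); // "a4", 1
```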
> I suppose most *higher level languages* define the length of a string as
> the number of characters the string holds. Cf. ECMAScript's length
> property and TCL's [string length]. Even PHP's mb_strlen() returns the
> number of characters.
AISB, unfortunately the “length” property of ECMAScript String instances is
not a good example in that regard as it does _not_ mean the number of
characters but the number of 16-bit units (UTF-16 code units, usually).
That is only less obvious than with UTF-8 because all Unicode characters in
the BMP can be encoded using only one such unit. (I am working on a
workaround; you could call it “an mb_strlen() for ECMAScript
implementations”.)
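To stay in PHP: the count an ECMAScript "length" reports can be approximated by converting to UTF-16 and counting 16-bit units. utf16_code_units() is a hypothetical helper name of mine, and the sketch assumes the iconv extension.

```php
<?php
// Hypothetical helper: number of UTF-16 code units in a UTF-8 string,
// i.e. what an ECMAScript "length" would report for the same text.
function utf16_code_units($utf8)
{
    // UTF-16LE carries no BOM; every code unit is exactly two bytes.
    return (int) (strlen(iconv('UTF-8', 'UTF-16LE', $utf8)) / 2);
}

var_dump(utf16_code_units("\xE2\x82\xAC"));     // U+20AC, inside the BMP: int(1)
var_dump(utf16_code_units("\xF0\x9F\x98\x80")); // U+1F600, outside the BMP: int(2)
```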
PointedEars
___________
[1] <http://www.rishida.net/tools/conversion/>
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300dec7(at)news(dot)demon(dot)co(dot)uk> (2004)
|
|
|
Re: Unicode support [message #180983 is a reply to message #180982] |
Sat, 30 March 2013 20:05 |
Christoph Becker
Messages: 91 Registered: June 2012
Karma: 0
|
Member |
|
|
Thomas 'PointedEars' Lahn wrote:
> Christoph Becker wrote:
>
>> I am more concerned about the number of characters the string holds.
>> Say, I want to get the last character:
>>
>> $str = '€';
>> echo $str[2];
>>
>>> Because lack of precision in font reproduction, or even in guaranteeing
>>> which font may be selected, renders the former an 'open' question.
>>>
>>> strlen('€')===3 is in fact the correct answer.
>
> That depends on the character encoding of the source code. 3 is the correct
> answer for an *UTF-8*-encoded U+20AC EURO SIGN character, which is then
> encoded E2 82 AC. [1]
It is the correct answer for a UTF-8 encoded string, as strlen()
returns the length of the string in bytes. But I doubt that this is what is
usually expected, and IMO knowing the number of characters is more
useful for common *PHP* as opposed to low-level C programming.
>> I suppose most *higher level languages* define the length of a string as
>> the number of characters the string holds. Cf. ECMAScript's length
>> property and TCL's [string length]. Even PHP's mb_strlen() returns the
>> number of characters.
>
> AISB, unfortunately the “length” property of ECMAScript String instances is
> not a good example in that regard as it does _not_ mean the number of
> characters but the number of 16-bit units (UTF-16 code units, usually).
> That is only less obvious than with UTF-8 because all Unicode characters in
> the BMP can be encoded using only one such unit. (I am working on a
> workaround; you could call it “an mb_strlen() for ECMAScript
> implementations”.)
ECMA-262, Edition 5.1, Section 15.5.5.1, states:
| *length*
|
| The number of characters in the String value represented by this
| String object.
IMO the spec is quite clear here, and so the implementations might be in
error. Anyway, I appreciate your working on a workaround; may come in
handy. :)
--
Christoph M. Becker
|
|
|
Re: Unicode support [message #180984 is a reply to message #180983] |
Sat, 30 March 2013 20:19 |
Thomas 'PointedEars'
Messages: 701 Registered: October 2010
Karma: 0
|
Senior Member |
|
|
Christoph Becker wrote:
> Thomas 'PointedEars' Lahn wrote:
>> Christoph Becker wrote:
>>> I suppose most *higher level languages* define the length of a string as
>>> the number of characters the string holds. Cf. ECMAScript's length
>>> property and TCL's [string length]. Even PHP's mb_strlen() returns the
>>> number of characters.
>>
>> AISB, unfortunately the “length” property of ECMAScript String instances
>> is not a good example in that regard as it does _not_ mean the number of
>> characters but the number of 16-bit units (UTF-16 code units, usually).
>> That is only less obvious than with UTF-8 because all Unicode characters
>> in the BMP can be encoded using only one such unit. (I am working on a
“BMP” means “Basic Multilingual Plane” (U+0000 to U+FFFF).
>> workaround; you could call it “an mb_strlen() for ECMAScript
>> implementations”.)
>
> ECMA-262, Edition 5.1, Section 15.5.5.1, states:
>
> | *length*
> |
> | The number of characters in the String value represented by this
> | String object.
>
> IMO the spec is quite clear here,
It is.
> and so the implementations might be in error.
For a change, they are not.  You should look up the definition of “character”
in the Specification. See also <news:2202673(dot)rtQqbKup0V(at)PointedEars(dot)de> pp.
(yes, in comp.lang.python).
> Anyway, I appreciate your working on a workaround; may come in handy. :)
At the latest after first contact ;-)
You will find it in
<http://PointEdears.de/websvn/listing.php?repname=JSX&path=%2Ftrunk%2Fstring%2F>
when it is ready for testing.
F'up2 comp.lang.javascript
PointedEars
|
|
|
Re: Unicode support [message #180985 is a reply to message #180981] |
Sat, 30 March 2013 20:26 |
Christoph Becker
Messages: 91 Registered: June 2012
Karma: 0
|
Member |
|
|
The Natural Philosopher wrote:
> so what happens in a typographic ligature like 'ᴁ'?
>
> I think you are making a rod for your back here.
>
> The storage requirements are exact specific and useful.
Useful for what purpose?
> The concept of a 'character in a text string' is really not..and if you
> go deep into typography with kerning, leading,ligature and the like and
> the like you will understand why.
I'm not talking about typesetting. I'm talking about Unicode
characters, which are clearly defined. Of course there are issues
regarding combining diacritical marks, so the number of characters of
"ñ" is not necessarily 1, but that's another story.
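A short sketch of that "other story", using only PCRE in UTF-8 mode. It assumes PCRE was built with Unicode property support, which is the usual case for PHP.

```php
<?php
$precomposed = "\xC3\xB1";   // U+00F1 LATIN SMALL LETTER N WITH TILDE
$decomposed  = "n\xCC\x83";  // U+006E followed by U+0303 COMBINING TILDE

// Both render as the same glyph but differ in the number of code points.
var_dump(preg_match_all('/./us', $precomposed, $m)); // int(1)
var_dump(preg_match_all('/./us', $decomposed, $m));  // int(2)

// \X matches a base character together with its combining marks.
var_dump(preg_match_all('/\X/u', $decomposed, $m));  // int(1)
```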
--
Christoph M. Becker
|
|
|
Re: Unicode support [message #180986 is a reply to message #180985] |
Sat, 30 March 2013 21:26 |
The Natural Philosoph
Messages: 993 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 30/03/13 20:26, Christoph Becker wrote:
> The Natural Philosopher wrote:
>
>> so what happens in a typographic ligature like 'ᴁ'?
>>
>> I think you are making a rod for your back here.
>>
>> The storage requirements are exact specific and useful.
>
> Useful for what purpose?
Well, malloc(strlen(str) + 1) in C for a start, and picking single bytes out
of strings too.
>
>> The concept of a 'character in a text string' is really not..and if you
>> go deep into typography with kerning, leading,ligature and the like and
>> the like you will understand why.
>
> I'm not talking about typesetting. I'm talking about Unicode
> characters, which are clearly defined. Of course there are issues
> regarding combining diacritical marks, so the number of characters of
> "ñ" is not necessarily 1, but that's another story.
>
|
|
|
Re: Unicode support [message #180987 is a reply to message #180986] |
Sat, 30 March 2013 22:14 |
Christoph Becker
Messages: 91 Registered: June 2012
Karma: 0
|
Member |
|
|
The Natural Philosopher wrote:
> On 30/03/13 20:26, Christoph Becker wrote:
>> The Natural Philosopher wrote:
>>
>>> so what happens in a typographic ligature like 'ᴁ'?
>>>
>>> I think you are making a rod for your back here.
>>>
>>> The storage requirements are exact specific and useful.
>>
>> Useful for what purpose?
>
> well malloc(strlen(str)) in C for a start, and picking single bytes out
> of strings too.
C's strlen() returning the number of bytes is perfectly fine. But in
PHP I don't need malloc(), and I have not yet found a reason to pick
single bytes out of a *character* string (as opposed to a *byte* string;
the two are different concepts IMO), except for emulating some UTF-8 aware
functionality.
Consider that programming in PHP happens on a higher level than in C
(e.g. one doesn't need to allocate heap memory, nor does one have to
free it). This makes it possible to write programs faster, though they
run slower. But the fact that PHP's string functions work on bytes
(as opposed to characters) partially negates that advantage[1]. That
is particularly sad, as PHP has its strengths in web development and
UTF-8 is the de facto standard for the web.
[1] Say, I need a wordwrap() for sending UTF-8 encoded emails. Sigh...
BTW: <http://www.php.net/manual/en/function.wordwrap.php> says:
| wordwrap — Wraps a string to a given number of characters
which is obviously wrong; it wraps a string to a given number of bytes.
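A rough sketch of the difference. utf8_wordwrap_sketch() is a hypothetical helper of mine, not part of PHP: it wraps greedily, splits on spaces only, never cuts long words, and counts code points via PCRE so it does not depend on mbstring.

```php
<?php
// Hypothetical helper, not part of PHP: greedy word wrap that measures
// line width in UTF-8 code points instead of bytes.
function utf8_wordwrap_sketch($text, $width, $break = "\n")
{
    $lines = array();
    $line  = '';
    foreach (preg_split('/ +/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $candidate = ($line === '') ? $word : $line . ' ' . $word;
        // preg_match_all('/./us') counts code points without mbstring.
        if ($line !== '' && preg_match_all('/./us', $candidate, $m) > $width) {
            $lines[] = $line;   // current line is full, start a new one
            $line    = $word;
        } else {
            $line = $candidate;
        }
    }
    if ($line !== '') {
        $lines[] = $line;
    }
    return implode($break, $lines);
}

// Two words of three 2-byte letters each: 7 characters, 13 bytes.
$text = "\xC3\xA4\xC3\xA4\xC3\xA4 \xC3\xB6\xC3\xB6\xC3\xB6";

var_dump(wordwrap($text, 7, "\n") === $text);             // bool(false): wrapped by byte count
var_dump(utf8_wordwrap_sketch($text, 7, "\n") === $text); // bool(true): 7 characters fit
```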
--
Christoph M. Becker
|
|
|
Re: Unicode support [message #180988 is a reply to message #180987] |
Sat, 30 March 2013 22:29 |
The Natural Philosoph
Messages: 993 Registered: September 2010
Karma: 0
|
Senior Member |
|
|
On 30/03/13 22:14, Christoph Becker wrote:
> The Natural Philosopher wrote:
>
>> On 30/03/13 20:26, Christoph Becker wrote:
>>> The Natural Philosopher wrote:
>>>
>>>> so what happens in a typographic ligature like 'ᴁ'?
>>>>
>>>> I think you are making a rod for your back here.
>>>>
>>>> The storage requirements are exact specific and useful.
>>>
>>> Useful for what purpose?
>>
>> well malloc(strlen(str)) in C for a start, and picking single bytes out
>> of strings too.
>
> C's strlen() returning the number of bytes is perfectly fine. But in
> PHP I don't need malloc(), and I have not yet found a reason to pick out
> single bytes out of a *character* string (as opposed to a *byte* string,
> which are different concepts IMO), except for emulating some UTF-8 aware
> functionality.
>
> Consider that programming in PHP happens on a higher level than in C
> (e.g. one doesn't need to allocate heap memory, nor does one have to
> free it). This results in the ability to write programs faster, that
> run slower. But the fact, that PHP's string functions work on bytes
> (opposed to characters), partially annihilates the advantage[1]. That
> is particularly sad, as PHP has its strengths for web development and
> UTF-8 is the de facto standard for the web.
>
> [1] Say, I need a wordwrap() for sending UTF-8 encoded emails. Sigh...
> BTW: <http://www.php.net/manual/en/function.wordwrap.php> says:
>
> | wordwrap — Wraps a string to a given number of characters
>
> what's obviously wrong; it wraps a string to a given number of bytes.
>
Well, exactly. Which is why I do all serious programming in C - it may be
a pedantic bitch of a language, but at least when it's done, I know that
some quirk of it is not going to deliver an entirely unexpected result.
Like what happened to me in Javashite, where some comparisons of what I
thought were numbers were cast to strings by one implementation and
to numbers by the other.
How I longed for if(int(a)==int(b)) or even atoi(b)....
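The same kind of trap exists in PHP, where input usually arrives as strings. A minimal sketch of forcing a numeric comparison with explicit casts, roughly what the wished-for int(a)==int(b) would do; the sample values are made up for illustration:

```php
<?php
// Form input and similar data arrive as strings.
$a = '10';
$b = '9';

// Compared as strings, '10' sorts before '9' ...
var_dump(strcmp($a, $b) < 0);          // bool(true)

// ... compared as integers after an explicit cast, 10 is greater than 9.
var_dump((int) $a > (int) $b);         // bool(true)

// Strict comparison after casting avoids type-juggling surprises.
var_dump((int) '10' === (int) '10.0'); // bool(true)
var_dump('10' === '10.0');             // bool(false)
```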
There are two approaches to life: make it easy for idiots and wonder why
they still manage to fuck it up, or make the rules extremely clear and
teach people what they are. And don't let them loose until they have
learnt them.
If all the money spent on road signs traffic lights and white lines had
been spent on teaching people to drive...
--
Ineptocracy
(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
|
|
|
Re: Unicode support [message #180991 is a reply to message #180980] |
Sun, 31 March 2013 12:06 |
M. Strobel
Messages: 386 Registered: December 2011
Karma: 0
|
Senior Member |
|
|
On 30.03.2013 15:55, Christoph Becker wrote:
> The Natural Philosopher wrote:
>
>> On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>>
>>> The same issue exists with characters outside the BMP in ECMAScript
>>> implementations which uses 16-bit characters (usually one UTF-16 code
>>> unit
>>> per character). But you can work around that rather efficiently.
>>>
>>
>> The problem become 'what do you mean by strlen()' - the space the
>> characters will occupy in an constant width font, or the storage
>> allocated to the string?
>>
>> Mostly we are concerned with the latter.
>
> I am more concerned about the number of characters the string holds.
> Say, I want to get the last character:
>
> $str = '€';
> echo $str[2];
>
>> Because lack of precision in font reproduction, or even in guaranteeing
>> which font may be selected, renders the former an 'open' question.
>>
>> strlen('€')===3 is in fact the correct answer.
>
> I suppose most *higher level languages* define the length of a string as
> the number of characters the string holds. Cf. ECMAScript's length
> property and TCL's [string length]. Even PHP's mb_strlen() returns the
> number of characters.
>
TCL has the functions [string length] and [string bytelength], with the obvious
differences.
/Str.
|
|
|