FUDforum: comp.lang.php » mysql dynamic binding and pass-by-ref deprecated



Re: Unicode support [message #180992 is a reply to message #180981]

Sun, 31 March 2013 12:10

M. Strobel
Messages: 386
Registered: December 2011

Karma: 0

Senior Member

On 30.03.2013 19:04, The Natural Philosopher wrote:
> On 30/03/13 14:55, Christoph Becker wrote:
>> The Natural Philosopher wrote:
>>
>>> On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>>>
>>>> The same issue exists with characters outside the BMP in ECMAScript
>>>> implementations that use 16-bit characters (usually one UTF-16 code unit
>>>> per character). But you can work around that rather efficiently.
>>>>
>>>
>>> The problem becomes 'what do you mean by strlen()': the space the
>>> characters will occupy in a constant-width font, or the storage
>>> allocated to the string?
>>>
>>> Mostly we are concerned with the latter.
>>
>> I am more concerned about the number of characters the string holds.
>> Say, I want to get the last character:
>>
>> $str = '€';
>> echo $str[2];
>>
>>> Because the lack of precision in font reproduction, or even in guaranteeing
>>> which font may be selected, renders the former an 'open' question.
>>>
>>> strlen('€')===3 is in fact the correct answer.
>>
>> I suppose most *higher level languages* define the length of a string as
>> the number of characters the string holds. Cf. ECMAScript's length
>> property and TCL's [string length]. Even PHP's mb_strlen() returns the
>> number of characters.
>>
>
> So what happens with a typographic ligature like 'ᴁ'?
>
> I think you are making a rod for your own back here.
>
> The storage requirements are exact, specific and useful.
>
> The concept of a 'character in a text string' is really not, and if you go deep into
> typography, with kerning, leading, ligatures and the like, you will
> understand why.
>

Of course you need both: the storage requirements and direct access to characters.
Maybe programming languages should use a full 32 bits per character internally, or
compress the Unicode string and use a good library for access.

C does not even know strings, just byte arrays.
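A minimal sketch of the two things asked for above, storage size and direct character access, assuming PHP 7 or later and a source file saved as UTF-8. Christoph's $str[2] indexes bytes, so for '€' (stored as E2 82 AC) it returns only the final byte. last_char() is a hypothetical helper; it relies on PCRE's u modifier rather than mbstring, so it runs on a build without extensions.

```php
<?php
// Assumes this file is saved as UTF-8; last_char() is a hypothetical helper.
function last_char(string $s): string
{
    // '//u' splits between code points; PREG_SPLIT_NO_EMPTY drops the empty edges.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false || $chars === []) {
        return ''; // empty string or invalid UTF-8
    }
    return $chars[count($chars) - 1];
}

$str = '€';                  // U+20AC, stored as the bytes E2 82 AC
echo strlen($str), "\n";     // 3: storage size in bytes
echo bin2hex($str[2]), "\n"; // ac: byte indexing returns one byte, not the euro sign
echo last_char($str), "\n";  // €: the last code point
```

Where mbstring is installed, mb_substr($str, -1, 1, 'UTF-8') returns the same last code point. Either way this is the last code point, which is not always what a reader would call the last character (a trailing combining accent is its own code point).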

/Str.


Re: Unicode support [message #180993 is a reply to message #180992]

Sun, 31 March 2013 12:41

The Natural Philosopher
Messages: 993
Registered: September 2010

Karma: 0

Senior Member

On 31/03/13 13:10, M. Strobel wrote:
> Of course you need both: the storage requirements and direct access to characters.
> Maybe programming languages should use a full 32 bits per character internally, or
> compress the Unicode string and use a good library for access.
>
> C does not even know strings, just byte arrays.
>
Thereby avoiding the problems completely by not even pretending to
solve them.

And you can always write a unicode_strlen() to any specification you
want. The problem is...

...given that many 'characters' may take LESS than a byte (ligature) or
up to 3-4 bytes (Unicode character sets), what specification?

And the concept of a 'character' is practically valueless anyway.
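To make the "what specification?" question concrete: in UTF-8 each code point takes 1 to 4 bytes, so strlen() and a code-point count diverge as soon as text leaves ASCII. A sketch assuming PHP 7 or later (for the \u{...} escapes); code_point_count() is a hypothetical helper. It prints both numbers for one sample of each UTF-8 length, plus the 'ᴁ' (U+1D01) from the thread.

```php
<?php
// Hypothetical helper: number of code points in a UTF-8 string,
// or -1 if the string is not valid UTF-8.
function code_point_count(string $s): int
{
    $n = preg_match_all('/./su', $s); // 'u': treat pattern and subject as UTF-8
    return $n === false ? -1 : $n;
}

// One sample for each UTF-8 length (1 to 4 bytes), plus the 'ᴁ' from the thread.
$samples = [
    'U+0041'  => "\u{0041}",  // A
    'U+00E9'  => "\u{00E9}",  // é (precomposed)
    'U+20AC'  => "\u{20AC}",  // €
    'U+1D01'  => "\u{1D01}",  // ᴁ
    'U+10348' => "\u{10348}", // 𐍈, outside the BMP
];

foreach ($samples as $label => $char) {
    printf("%-8s bytes=%d code points=%d\n", $label, strlen($char), code_point_count($char));
}
```

Here 'ᴁ' is stored as a single three-byte code point; whether such a character should count as one character or two is exactly the specification choice being argued about.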

> /Str.
>
>

--
Ineptocracy

(in-ep-toc’-ra-cy) – a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.


Re: Unicode support [message #180994 is a reply to message #180993]

Sun, 31 March 2013 17:12

M. Strobel
Messages: 386
Registered: December 2011

Karma: 0

Senior Member

On 31.03.2013 14:41, The Natural Philosopher wrote:
> On 31/03/13 13:10, M. Strobel wrote:
>> Of course you need both: the storage requirements and direct access to characters.
>> Maybe programming languages should use a full 32 bits per character internally, or
>> compress the Unicode string and use a good library for access.
>>
>> C does not even know strings, just byte arrays.
>>
> Thereby avoiding the problems completely by not even pretending to solve them.
>
> And you can always write a unicode_strlen() to any specification you want. The
> problem is...
>
> ...given that many 'characters' may take LESS than a byte (ligature) or up to 3-4
> bytes (Unicode character sets), what specification?
>

So what? How much space does a backspace take? And how much does a DEL (127) take? This
has not been a problem so far.

And a ligature can be decomposed. Of course every case has to be discussed, but AFAIK
this has already been done.
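Decomposition also shows why every case has to be discussed: the same visible letter can be stored precomposed or as a base letter plus a combining mark, and byte, code-point and grapheme counts then disagree. A sketch assuming PHP 7 or later and valid UTF-8 input; lengths() is a hypothetical name, and grapheme counting relies on PCRE's \X, which matches one user-perceived character (a base letter together with its combining marks).

```php
<?php
// Three possible answers to "how long is this string?"; the helper
// name is hypothetical and assumes valid UTF-8 input.
function lengths(string $s): array
{
    return [
        'bytes'       => strlen($s),
        'code points' => (int) preg_match_all('/./su', $s),
        // \X matches one grapheme cluster (base character plus any combining marks)
        'graphemes'   => (int) preg_match_all('/\X/u', $s),
    ];
}

$precomposed = "\u{00E9}";  // é as a single code point
$decomposed  = "e\u{0301}"; // e followed by COMBINING ACUTE ACCENT

print_r(lengths($precomposed)); // bytes 2, code points 1, graphemes 1
print_r(lengths($decomposed));  // bytes 3, code points 2, graphemes 1
```

For ligatures that have a compatibility decomposition, such as U+FB01 'ﬁ', the intl extension's Normalizer::normalize($s, Normalizer::FORM_KD) yields 'fi'. That class is not part of a default PHP build, so it is left out of the sketch.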

/Str.


Re: Unicode support [message #180995 is a reply to message #180979]

Tue, 02 April 2013 17:59

M. Strobel
Messages: 386
Registered: December 2011

Karma: 0

Senior Member

On 30.03.2013 15:33, The Natural Philosopher wrote:
> On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>
>> The same issue exists with characters outside the BMP in ECMAScript
>> implementations that use 16-bit characters (usually one UTF-16 code unit
>> per character). But you can work around that rather efficiently.
>>
>
> The problem becomes 'what do you mean by strlen()': the space the characters will
> occupy in a constant-width font, or the storage allocated to the string?
>
> Mostly we are concerned with the latter.
>
> Because the lack of precision in font reproduction, or even in guaranteeing which font
> may be selected, renders the former an 'open' question.
>
> strlen('€')===3 is in fact the correct answer.
>

strobel@suse123-acer:~> tclshi
% set eur "€"
€
% string length $eur
1
% string bytelength $eur
3
%

Languages have to catch up.
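A possible PHP counterpart to the Tcl session, extended with the UTF-16 view Thomas mentions for ECMAScript. It is a sketch under stated assumptions (PHP 7 or later, valid UTF-8 input, hypothetical helper describe()): for '€' the first two numbers match Tcl's string bytelength and string length as shown above, and the third is what an ECMAScript length property would report, since a code point outside the BMP needs a surrogate pair of two UTF-16 code units.

```php
<?php
// Hypothetical helper: the same string measured in three units.
// Assumes valid UTF-8 input.
function describe(string $s): array
{
    $codePoints = (int) preg_match_all('/./su', $s);
    // Code points outside the BMP need a surrogate pair (two UTF-16 units).
    $astral = (int) preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $s);
    return [
        'bytelength'  => strlen($s),             // matches Tcl's [string bytelength] for '€'
        'length'      => $codePoints,            // matches Tcl's [string length] for '€'
        'utf16 units' => $codePoints + $astral,  // what ECMAScript's .length counts
    ];
}

print_r(describe("\u{20AC}"));  // €: 3, 1, 1
print_r(describe("\u{10348}")); // 𐍈: 4, 1, 2
```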

/Str.

