Re: Unicode support [message #180994 is a reply to message #180993] Sun, 31 March 2013 17:12
M. Strobel
On 31.03.2013 14:41, The Natural Philosopher wrote:
> On 31/03/13 13:10, M. Strobel wrote:
>> On 30.03.2013 19:04, The Natural Philosopher wrote:
>>> On 30/03/13 14:55, Christoph Becker wrote:
>>>> The Natural Philosopher wrote:
>>>>
>>>> > On 30/03/13 13:25, Thomas 'PointedEars' Lahn wrote:
>>>> >
>>>> >> The same issue exists with characters outside the BMP in ECMAScript
>>>> >> implementations which use 16-bit characters (usually one UTF-16 code
>>>> >> unit per character). But you can work around that rather efficiently.
>>>> >>
>>>> >
>>>> > The problem becomes 'what do you mean by strlen()': the space the
>>>> > characters will occupy in a constant-width font, or the storage
>>>> > allocated to the string?
>>>> >
>>>> > Mostly we are concerned with the latter.
>>>>
>>>> I am more concerned about the number of characters the string holds.
>>>> Say, I want to get the last character:
>>>>
>>>> $str = '€';
>>>> echo $str[2];
>>>>
>>>> > Because the lack of precision in font reproduction, or even in
>>>> > guaranteeing which font may be selected, renders the former an
>>>> > 'open' question.
>>>> >
>>>> > strlen('€')===3 is in fact the correct answer.
>>>>
>>>> I suppose most *higher level languages* define the length of a string as
>>>> the number of characters the string holds. Cf. ECMAScript's length
>>>> property and TCL's [string length]. Even PHP's mb_strlen() returns the
>>>> number of characters.
>>>>
>>>
>>> So what happens with a typographic ligature like 'ᴁ'?
>>>
>>> I think you are making a rod for your own back here.
>>>
>>> The storage requirements are exact, specific and useful.
>>>
>>> The concept of a 'character in a text string' is really not... and if you go deep
>>> into typography with kerning, leading, ligatures and the like, you will
>>> understand why.
>>>
>>
>> Of course you need both: the storage requirements, and direct access to characters.
>> Maybe programming languages should internally use a full 32 bits per character, or
>> compress the Unicode string and use a good library for access.
>>
>> C does not even know strings, just byte arrays.
>>
> Thereby avoiding the problems completely by not even pretending to solve them.
>
> And you can always write a unicode_strlen() to any specification you want... the
> problem is...
>
> ...given that many 'characters' may take LESS than a byte (ligature) or up to 3-4
> bytes (Unicode character sets)... what specification?
>

So what? How much space does a backspace take? And how much does a DEL (127) take?
This has not been a problem so far.
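
To put numbers on this, here is a minimal PHP sketch, assuming the strings are valid UTF-8. The helper name utf8_codepoints() is made up for illustration and is not a PHP built-in; strlen() and ord() are. It counts every byte that is not a continuation byte, which gives the number of code points, while strlen() gives the storage in bytes.

```php
<?php
// Hypothetical helper (not part of PHP): count the code points in a UTF-8
// string by counting every byte that is not a continuation byte (10xxxxxx).
// Assumes the input is valid UTF-8.
function utf8_codepoints(string $s): int
{
    $n = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $n++;
        }
    }
    return $n;
}

$euro = "\xE2\x82\xAC"; // the euro sign, U+20AC, written as its UTF-8 bytes
$bs   = "\x08";         // backspace
$del  = "\x7F";         // DEL (127)

foreach (['euro' => $euro, 'backspace' => $bs, 'DEL' => $del] as $name => $s) {
    printf("%s: %d byte(s), %d code point(s)\n", $name, strlen($s), utf8_codepoints($s));
}

// $str[2] addresses a byte, not a character: for the euro sign it is the
// third byte of the encoding, 0xAC.
printf("third byte of euro: 0x%s\n", bin2hex($euro[2]));
```

Backspace and DEL each take one byte and count as one code point, whatever they do on screen. If the mbstring extension is loaded, mb_strlen($s, 'UTF-8') should give the same count for valid input.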

And a ligature can be decomposed. Of course every case has to be discussed, but this
has been done, AFAIK.
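
As a sketch of what decomposing can look like in PHP: Unicode compatibility normalization (NFKC) maps a ligature such as U+FB01 (ﬁ) to the plain letters "fi". The function name expand_ligatures() and the small fallback table are assumptions made for illustration. Normalizer is the class from the intl extension, which may not be installed, hence the fallback.

```php
<?php
// Hypothetical helper: expand compatibility ligatures into plain letters.
function expand_ligatures(string $s): string
{
    // Preferred path: NFKC normalization from the intl extension, if loaded.
    // Note that NFKC also folds other compatibility characters, not only
    // ligatures.
    if (class_exists('Normalizer')) {
        $out = Normalizer::normalize($s, Normalizer::FORM_KC);
        if ($out !== false) {
            return $out;
        }
    }
    // Fallback: a hand-written table covering U+FB00..U+FB04 only.
    return strtr($s, [
        "\u{FB00}" => 'ff',
        "\u{FB01}" => 'fi',
        "\u{FB02}" => 'fl',
        "\u{FB03}" => 'ffi',
        "\u{FB04}" => 'ffl',
    ]);
}

$word  = "\u{FB01}sh";              // "ﬁsh", written with the fi ligature
$plain = expand_ligatures($word);   // "fish"

// The ligature takes 3 bytes in UTF-8, so "ﬁsh" is 5 bytes and "fish" is 4.
printf("%d bytes before, %d bytes after: %s\n", strlen($word), strlen($plain), $plain);
```

Not every ligature-like character decomposes this way. As far as I know, æ (U+00E6) is treated as a letter in its own right and has no decomposition, so such cases still need a decision of their own.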

/Str.