FUDforum: comp.lang.php » Processing accented characters submitted from forms

Re: Processing accented characters submitted from forms [message #184498 is a reply to message #184490]

Fri, 03 January 2014 14:30

Ben Bacarisse

JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes:

> On Fri, 03 Jan 2014 12:37:27 +0000, Ben Bacarisse wrote:
>
>> JohnT <john-sospam(at)jtresponse(dot)co(dot)uk> writes: <snip>
>>> We're already using iso-8859-1 for the whole website. It will be a lot
>>> of work to change all that, so I guess we'll have to put up with the
>>> odd Turkish I causing problems.
>>
>> It's not clear (to me at least) what's happening to the data, but as far
>> as any normal set of HTML pages is concerned (PHP generated or
>> otherwise) you don't have to put up with a dotted I causing problems on
>> an ISO-8859-1 encoded page. You can represent any Unicode character in
>> a page using character entities (browser and font support is always an
>> issue but not nowadays for anything as ordinary as İ).
>
> I think it must be the browser that is encoding the character because İ
> is not supported by iso-8859-1.

Note that the browser behaviour can be altered by form attributes
(specifically accept-charset). You can have a form that accepts UTF-8
on an ISO-8859-1 served page.
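
As a rough sketch of that (the helper name render_comment_form and the field name "comment" are made up for illustration), a PHP page served as ISO-8859-1 can still ask the browser to submit UTF-8:

```php
<?php
// Hypothetical helper: builds a form that asks the browser to submit UTF-8,
// even though the surrounding page is served as ISO-8859-1.
function render_comment_form($action)
{
    // Escape the action URL for use inside an HTML attribute.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'ISO-8859-1');

    return '<form method="post" action="' . $safeAction . '" accept-charset="UTF-8">' . "\n"
         . '  <input type="text" name="comment">' . "\n"
         . '  <input type="submit" value="Send">' . "\n"
         . '</form>' . "\n";
}
```

The page would still send header('Content-Type: text/html; charset=ISO-8859-1') before echoing render_comment_form('/submit.php'); only the submitted form data changes encoding.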

> It arrives in the request data as the html numeric entity code, as that
> is the only way it can be transmitted.
>
> This causes issues:
>
> As I always htmlencode user entered data before display, it means that it
> gets encoded twice. I'll have to add the 'disable double encode' flag
> throughout my code :-)

Sure. One way or another you need to get the right encoding. This
method is not perfect since a user typing the literal text &#304; into a form may not
expect a dotted I to come out.
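
To make the double-encoding problem concrete, here is a small sketch. The sample string is invented; the fourth argument of htmlspecialchars() is the double_encode flag mentioned above.

```php
<?php
// What the script receives when the browser could not encode the dotted I
// in ISO-8859-1 and fell back to a numeric entity (sample value, assumed).
$received = '&#304;stanbul';

// Default behaviour: the ampersand of the existing entity is escaped again.
$double = htmlspecialchars($received, ENT_QUOTES, 'ISO-8859-1');
// $double is now '&amp;#304;stanbul'

// With double_encode set to false, existing entities are left alone.
$single = htmlspecialchars($received, ENT_QUOTES, 'ISO-8859-1', false);
// $single is still '&#304;stanbul'

// The catch: a user who literally typed the six characters "&#304;" sends
// exactly the same bytes, so the script cannot tell the two cases apart.
```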

The best method is (probably) to:
(a) Give UTF-8 as the form's accept-charset.
(b) Escape with htmlentities(), giving UTF-8 as the encoding. This should leave
UTF-8 characters that have no named entity (such as İ) as UTF-8.
(c) Use mb_convert_encoding($etext, "HTML-ENTITIES", "UTF-8") to make
the string displayable in a page regardless of the page's character
encoding.
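
Steps (b) and (c) could be sketched roughly as below. The helper name to_displayable_html is hypothetical. Note one swap: the "HTML-ENTITIES" target of mb_convert_encoding() is deprecated as of PHP 8.2, so this sketch uses mb_encode_numericentity() instead, which always emits numeric entities rather than named ones. It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper: turn UTF-8 form input into ASCII-only HTML that
// displays correctly whatever charset the page is served in.
function to_displayable_html($utf8Text)
{
    // Step (b): escape markup characters, treating the input as UTF-8.
    $escaped = htmlentities($utf8Text, ENT_QUOTES, 'UTF-8');

    // Step (c): replace every remaining non-ASCII character with a
    // numeric entity. Map format: [first, last, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

// "\xC4\xB0" is the dotted capital I (U+0130) as UTF-8 bytes.
$html = to_displayable_html("\xC4\xB0stanbul <b>");
// $html is '&#304;stanbul &lt;b&gt;'
```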

> Secondly, it will be added to the database as the entity code, so this
> will break searching the database etc...

If you take the approach of accepting UTF-8 from the form, you can put
that directly into the database.
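
A minimal sketch of that, using an in-memory SQLite database through PDO purely as a stand-in (the thread does not say which database is in use; the table and column names are invented, and the pdo_sqlite and mbstring extensions are assumed to be available):

```php
<?php
// Stand-in database: in-memory SQLite, which stores text as UTF-8.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE places (name TEXT)');

// Pretend this came from $_POST on a form with accept-charset="UTF-8".
$submitted = "\xC4\xB0stanbul"; // dotted capital I + "stanbul" as UTF-8 bytes

// Reject input that is not valid UTF-8 before storing it.
if (!mb_check_encoding($submitted, 'UTF-8')) {
    throw new InvalidArgumentException('Form data is not valid UTF-8');
}

// Store the raw UTF-8 text, not an entity-encoded version of it.
$insert = $db->prepare('INSERT INTO places (name) VALUES (?)');
$insert->execute([$submitted]);

// Searching matches the real character rather than an entity code.
$search = $db->prepare('SELECT name FROM places WHERE name LIKE ?');
$search->execute(["\xC4\xB0%"]);
$found = $search->fetchAll(PDO::FETCH_COLUMN);
```

With another database the connection itself would also need to be told to use UTF-8; how to do that depends on the driver.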

> I think the proper fix would be to convert to UTF-8.
> But that's a lot of work. For now, I think I'll just manually transliterate the
> codes that cause issues.

You really only need UTF-8 in the database. The page encoding is not
that important.

--
Ben.
