Newsgroup: comp.lang.php
Encoding Problems [message #186334] Wed, 02 July 2014 15:53
stef_204
Hi,

Newbie at php; please bear with me.

I am trying to use a script which basically creates an xml feed for email
messages found in an IMAP account; one can then either subscribe to feed
via an rss reader/aggregator or use browser to read html page.

Here is the script: <http://linuxtrove.com/wp/?p=209>
(imap2rss.php)

My problem: almost all of the messages (emails) show garbled text, I
believe due to encoding problems.

Here is a picture of how it looks both using a browser like Firefox or an
rss news feed reader.

<http://imagebin.org/314843>

The emails are legible using an email client but not via the xml and html
page created by php script.

This is what I first thought:
It looks like the emails are plain text in utf-8, and when the feed (and
the html page) is created there is no recognition of the encoding, so it
all gets garbled.

Server (shared) is running PHP 5.2.17 on Apache/2.2.22.
PHP config --> default_charset: iso-8859-1

I cannot change the config on the server since it is shared hosting but I
can certainly modify the script to hopefully fix this issue.

I have tried to insert in the script:
ini_set( 'default_charset', 'UTF-8' );
htmlentities( $string, ENT_COMPAT, "UTF-8" );
header('Content-type: text/plain; charset=utf-8');
but no joy.

I have also tried adding IndexOptions +Charset=UTF-8 to .htaccess. No joy
there either.

I could be completely wrong about the utf-8 issue; perhaps it has
more to do with decoding base64 in general.

This is what I now think is the problem, more specifically:
I have just tried a base64 online decoder and pasted the garbled text in
it and used the "decode" online feature, and the result is perfectly
legible once decoded, whether I choose utf-8 or ascii as charset (but I
should use utf-8).

So, looks like the feed is "echoed" or "printed" in base64 format....
It doesn't look like charset is the problem but decoding base64.

I can see the base64_decode function here
<http://www.php.net/manual/en/function.base64-decode.php>
but not sure if this is the right way to go about this; or how to apply
it in this script.

I am going about this fairly blind, I must say, and doing trial and error
which is just wrong....

Again, this is the script.
<http://linuxtrove.com/wp/?p=209>

Any pointers?

Tx.
Re: Encoding Problems [message #186335 is a reply to message #186334] Wed, 02 July 2014 19:15
Jerry Stuckle
On 7/2/2014 11:53 AM, stef_204 wrote:
> Hi,
>
> Newbie at php; please bear with me.
>
> I am trying to use a script which basically creates an xml feed for email
> messages found in an IMAP account; one can then either subscribe to feed
> via an rss reader/aggregator or use browser to read html page.
>
> Here is the script: <http://linuxtrove.com/wp/?p=209>
> (imap2rss.php)
>
> My problem: almost all of the messages (emails) show garbled text, I
> believe due to encoding problems.
>
> Here is a picture of how it looks both using a browser like Firefox or an
> rss news feed reader.
>
> <http://imagebin.org/314843>
>
> The emails are legible using an email client but not via the xml and html
> page created by php script.
>
> This is what I first thought:
> It looks like the emails are plain text but using utf-8 and when the feed
> is created (and the html page); there is no recognition of the encoding
> and it all gets garbled, etc.
>
> Server (shared) is running PHP 5.2.17 on Apache/2.2.22.
> PHP config --> default_charset: iso-8859-1
>
> I cannot change the config on the server since it is shared hosting but I
> can certainly modify the script to hopefully fix this issue.
>
> I have tried to insert in the script:
> ini_set( 'default_charset', 'UTF-8' );
> htmlentities( $string, ENT_COMPAT, "UTF-8" );
> header('Content-type: text/plain; charset=utf-8');
> but no joy.
>
> I have also tried to add to .htaccess IndexOptions +Charset=UTF-8 No joy
> there either.
>
> I could be completely wrong about the utf-8 issue; and perhaps it has
> more something to do with decoding base64 in general.
>
> This is what I now think is the problem, more specifically:
> I have just tried a base64 online decoder and pasted the garbled text in
> it and used the "decode" online feature, and the result is perfectly
> legible once decoded, whether I choose utf-8 or ascii as charset (but I
> should use utf-8).
>
> So, looks like the feed is "echoed" or "printed" in base64 format....
> It doesn't look like charset is the problem but decoding base64.
>
> I can see the base64_decode function here
> <http://www.php.net/manual/en/function.base64-decode.php>
> but not sure if this is the right way to go about this; or how to apply
> it in this script.
>
> I am going about this fairly blind, I must say, and doing trial and error
> which is just wrong....
>
> Again, this is the script.
> <http://linuxtrove.com/wp/?p=209>
>
> Any pointers?
>
> Tx.
>

Stef,

Just looking at the output, my immediate thought was "this is base64
encoded". That would match if it works in an email reader as a base64
encoded attachment.

I don't have the time to go through 300+ LOC to try to figure out what
your code is doing, but when sending as either XML or HTML, you need to
first base64_decode() the text. Once you've done that, apply
htmlentities() to the string to encode the HTML entities. Then send it
to the RSS or HTML feed.
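
A minimal sketch of those two steps, assuming the body really is base64 and the decoded text is UTF-8. The function name bodyToMarkup is made up for illustration and is not part of imap2rss.php. It uses htmlspecialchars() rather than htmlentities(), because named HTML entities such as &eacute; are not defined in plain XML, so they would break an RSS feed:

```php
<?php
// Hypothetical helper (not from imap2rss.php): decode a base64 body,
// then escape it for embedding in HTML or XML.
function bodyToMarkup($raw)
{
    $text = base64_decode($raw);
    // Escapes only & < > " and '; assumes the decoded text is UTF-8.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo bodyToMarkup(base64_encode('Tom & Jerry <cats>')), "\n";
// prints: Tom &amp; Jerry &lt;cats&gt;
```

Whether every message needs the decode step depends on its declared transfer encoding, so this should only run for bodies that are actually base64.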

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex(at)attglobal(dot)net
==================
Re: Encoding Problems [message #186336 is a reply to message #186334] Wed, 02 July 2014 19:49
Thomas 'PointedEars' Lahn
stef_204 wrote:

> Newbie at php; please bear with me.

OK, I will not killfile you right away.

Please read and adhere to:

- <http://www.eternal-september.org/index.php?language=en&showpage=terms>
- <http://www.interhack.net/pubs/munging-harmful/>

And get a real name.


PointedEars
--
Prototype.js was written by people who don't know javascript for people
who don't know javascript. People who don't know javascript are not
the best source of advice on designing systems that use javascript.
-- Richard Cornford, cljs, <f806at$ail$1$8300dec7(at)news(dot)demon(dot)co(dot)uk>
Re: Encoding Problems [message #186337 is a reply to message #186334] Wed, 02 July 2014 20:25
Christoph Michael Becker
stef_204 wrote:

> Here is the script: <http://linuxtrove.com/wp/?p=209>
> (imap2rss.php)
>
> My problem: almost all of the messages (emails) show garbled text, I
> believe due to encoding problems.

Indeed, that seems to be an encoding problem. Not so much a character
encoding problem but rather a body transfer encoding problem. You will
most likely have to take into account the "encoding" property of
imap_fetchstructure()'s return value.
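
For reference, the PHP manual lists six integer values for that property. The sketch below maps them to names without needing the imap extension loaded. transferEncodingName is a hypothetical helper, and the $part object only stands in for what imap_fetchstructure() would return:

```php
<?php
// Integer values of the "encoding" property as documented for
// imap_fetchstructure(); the helper name is hypothetical.
function transferEncodingName($encoding)
{
    $names = array(
        0 => '7BIT',
        1 => '8BIT',
        2 => 'BINARY',
        3 => 'BASE64',
        4 => 'QUOTED-PRINTABLE',
        5 => 'OTHER',
    );
    return isset($names[$encoding]) ? $names[$encoding] : 'UNKNOWN';
}

// Stand-in for the object returned by imap_fetchstructure().
$part = new stdClass();
$part->subtype = 'PLAIN';
$part->encoding = 3;

echo transferEncodingName($part->encoding), "\n"; // prints: BASE64
```

Logging this name next to each message is a quick way to see which bodies in the mailbox actually need decoding.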

Please heed Thomas' advice. It's always nice to "know" who you're
talking to and to be able to send a private reply, if appropriate.

--
Christoph M. Becker
Re: Encoding Problems [message #186339 is a reply to message #186334] Thu, 03 July 2014 12:08
bill
On 7/2/2014 11:53 AM, stef_204 wrote:


Are you the stef that translates HTML-kit into Dutch?
If you are, nice seeing you here.

--bill
Re: Encoding Problems [message #186340 is a reply to message #186336] Thu, 03 July 2014 12:09
bill
On 7/2/2014 3:49 PM, Thomas 'PointedEars' Lahn wrote:
> stef_204 wrote:
>
>> Newbie at php; please bear with me.
>
> OK, I will not killfile you right away.
>
> Please read and adhere to:
>
> - <http://www.eternal-september.org/index.php?language=en&showpage=terms>
> - <http://www.interhack.net/pubs/munging-harmful/>
>
> And get a real name.
>
>
> PointedEars
>

Wait a minute. A guy named "PointedEars" is complaining about a
real name. I want to see a copy of the birth certificate with
PointedEars on it.

LOL

I know you are talking about the from: address, but I couldn't
resist.

--bill
Re: Encoding Problems [message #186349 is a reply to message #186334] Sat, 05 July 2014 01:16
Arno Welzel
stef_204, 2014-07-02 17:53:

> Hi,
>
> Newbie at php; please bear with me.
>
> I am trying to use a script which basically creates an xml feed for email
> messages found in an IMAP account; one can then either subscribe to feed
> via an rss reader/aggregator or use browser to read html page.
>
> Here is the script: <http://linuxtrove.com/wp/?p=209>
> (imap2rss.php)
>
> My problem: almost all of the messages (emails) show garbled text, I
> believe due to encoding problems.
>
> Here is a picture of how it looks both using a browser like Firefox or an
> rss news feed reader.
>
> <http://imagebin.org/314843>

This is base64 and has to be decoded.

[...]
> I could be completely wrong about the utf-8 issue; and perhaps it has
> more something to do with decoding base64 in general.

Yep - that's the point.

> This is what I now think is the problem, more specifically:
> I have just tried a base64 online decoder and pasted the garbled text in
> it and used the "decode" online feature, and the result is perfectly
> legible once decoded, whether I choose utf-8 or ascii as charset (but I
> should use utf-8).
>
> So, looks like the feed is "echoed" or "printed" in base64 format....
> It doesn't look like charset is the problem but decoding base64.

Yep.

>
> I can see the base64_decode function here
> <http://www.php.net/manual/en/function.base64-decode.php>
> but not sure if this is the right way to go about this; or how to apply
> it in this script.

Well - unfortunately there is no easy "put it in there and it works".

First of all: Just testing for the subtype "PLAIN" is not enough. You
also have to check for the encoding.

As far as I can see, this should be added here:

if($msgStructure->subtype=="PLAIN")
    $body = renderPlainText($body);

So extend that for the encoding:

if($msgStructure->subtype=="PLAIN")
{
    switch($msgStructure->encoding)
    {
        case 4:
            // Body text is quoted-printable encoded
            $body = quoted_printable_decode($body);
            break;

        case 3:
            // Body text is base64 encoded
            $body = base64_decode($data);
            break;
    }

    $body = renderPlainText($body);
}

Also see <http://php.net/manual/en/function.imap-fetchstructure.php> and
the comments there.



--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186350 is a reply to message #186349] Sat, 05 July 2014 01:19
Arno Welzel
Arno Welzel, 2014-07-05 03:16:

> stef_204, 2014-07-02 17:53:
>
>> Hi,
>>
>> Newbie at php; please bear with me.
>>
>> I am trying to use a script which basically creates an xml feed for email
>> messages found in an IMAP account; one can then either subscribe to feed
>> via an rss reader/aggregator or use browser to read html page.
>>
>> Here is the script: <http://linuxtrove.com/wp/?p=209>
>> (imap2rss.php)
>>
>> My problem: almost all of the messages (emails) show garbled text, I
>> believe due to encoding problems.
>>
>> Here is a picture of how it looks both using a browser like Firefox or an
>> rss news feed reader.
>>
>> <http://imagebin.org/314843>
>
> This is base64 and as to be decoded.
>
> [...]
>> I could be completely wrong about the utf-8 issue; and perhaps it has
>> more something to do with decoding base64 in general.
>
> Yep - that's the point.
>
>> This is what I now think is the problem, more specifically:
>> I have just tried a base64 online decoder and pasted the garbled text in
>> it and used the "decode" online feature, and the result is perfectly
>> legible once decoded, whether I choose utf-8 or ascii as charset (but I
>> should use utf-8).
>>
>> So, looks like the feed is "echoed" or "printed" in base64 format....
>> It doesn't look like charset is the problem but decoding base64.
>
> Yep.
>
>>
>> I can see the base64_decode function here
>> <http://www.php.net/manual/en/function.base64-decode.php>
>> but not sure if this is the right way to go about this; or how to apply
>> it in this script.
>
> Well - unfortunately there is no easy "put it in there and it works".
>
> First of all: Just testing for the subtype "PLAIN" (is not enough. You
> also have to check for the encoding.
>
> As far i see, this should be added here:
>
> if($msgStructure->subtype=="PLAIN")
> $body = renderPlainText($body);
>
> So extend that for the encoding:
>
> if($msgStructure->subtype=="PLAIN")
> {
> switch($msgStructure->encoding)
> {
> case 4:
> // Body text is quoted-printable encoded
> $body = quoted_printable_decode($body);
> break;
>
> case 3:
> // Body text is base64 encoded
> $body = base64_decode($data);

Oops - sorry for the c&p typo. Of course it should be:

$body = base64_decode($body);

> break;
> }
>
> $body = renderPlainText($body);
> }
>
> Also see <http://php.net/manual/en/function.imap-fetchstructure.php> and
> the comments there.



--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186352 is a reply to message #186349] Sat, 05 July 2014 01:28
Christoph Michael Becker
Arno Welzel wrote:

> So extend that for the encoding:
>
> if($msgStructure->subtype=="PLAIN")
> {
> switch($msgStructure->encoding)
> {
> case 4:
> // Body text is quoted-printable encoded
> $body = quoted_printable_decode($body);
> break;
>
> case 3:
> // Body text is base64 encoded
> $body = base64_decode($data);
> break;
> }
>
> $body = renderPlainText($body);
> }

What about a default clause, at least triggering a notice/warning that
the encoding is not understood?

--
Christoph M. Becker
Re: Encoding Problems [message #186353 is a reply to message #186352] Sat, 05 July 2014 23:36
Arno Welzel
Christoph Michael Becker, 2014-07-05 03:28:

> Arno Welzel wrote:
>
>> So extend that for the encoding:
>>
>> if($msgStructure->subtype=="PLAIN")
>> {
>> switch($msgStructure->encoding)
>> {
>> case 4:
>> // Body text is quoted-printable encoded
>> $body = quoted_printable_decode($body);
>> break;
>>
>> case 3:
>> // Body text is base64 encoded
>> $body = base64_decode($data);
>> break;
>> }
>>
>> $body = renderPlainText($body);
>> }
>
> What about a default clause, at least triggering a notice/warning that
> the encoding is not understood?

Good point. But which other encodings, apart from no encoding at all,
base64, and quoted-printable, may be used?


--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186354 is a reply to message #186353] Sun, 06 July 2014 02:16
Denis McMahon
On Sun, 06 Jul 2014 01:36:20 +0200, Arno Welzel wrote:

> Christoph Michael Becker, 2014-07-05 03:28:

>> What about a default clause, at least triggering a notice/warning that
>> the encoding is not understood?

> Good Point. But which other encoding except no encoding at all, base64
> and or quoted printable may be used?

multipart/form-data

It might not be expected in an email, but then the email might be
generated by someone looking to target some notional email application
which had an exploitable vulnerability in that it would try and decode
multipart/form-data in a manner that further allowed a carefully crafted
invalid data sequence to trigger arbitrary code execution.

Now that exploit might not even be targeting the code being discussed
here, but when script kiddie spam house sends out billions of emails in
an attempt to exploit that vulnerability, the chances are that one of
them will make its way into this processing chain.

Detecting and cleanly handling both unrecognised declared encoding types
and malformed encoded data is therefore probably good practice.

--
Denis McMahon, denismfmcmahon(at)gmail(dot)com
Re: Encoding Problems [message #186355 is a reply to message #186354] Sun, 06 July 2014 03:19
Richard Damon
On 7/5/14, 10:16 PM, Denis McMahon wrote:
> On Sun, 06 Jul 2014 01:36:20 +0200, Arno Welzel wrote:
>
>> Christoph Michael Becker, 2014-07-05 03:28:
>
>>> What about a default clause, at least triggering a notice/warning that
>>> the encoding is not understood?
>
>> Good Point. But which other encoding except no encoding at all, base64
>> and or quoted printable may be used?
>
> multipart/form-data
>
> It might not be expected in an email, but then the email might be
> generated by someone looking to target some notional email application
> which had an exploitable vulnerability in that it would try and decode
> multipart/form-data in a manner that further allowed a carefully crafted
> invalid data sequence to trigger arbitrary code execution.
>
> Now that exploit might not even be targeting the code being discussed
> here, but when script kiddie spam house sends out billions of emails in
> an attempt to exploit that vulnerability, the chances are that one of
> them will make its way into this processing chain.
>
> Detecting and cleanly handling both unrecognised declared encoding types
> and malformed encoded data is therefore probably good practice.
>

multipart/form-data would be a value for Content-Type, not a value for
Content-Transfer-Encoding.

The defined values (by RFC 2045) of Content-Transfer-Encoding are:
quoted-printable
base64
binary
8bit
7bit

(Binary and 8bit are only allowed if the receiving server indicates it
is capable of handling them).

Binary, 8bit, and 7bit imply that no transform needs to be performed on
the data to decode it.
Re: Encoding Problems [message #186356 is a reply to message #186355] Sun, 06 July 2014 05:21
gordonb.defz8
> multipart/form-date would be a value for Content-Type, not a value for
> Content-Transfer-Encoding.
>
> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
> quoted-printable
> base64
> binary
> 8bit
> 7bit

Never forget that viruses and malware aren't required to obey the
rules, and you could very well end up with Content-Type: spam/virus
and Content-Transfer-Encoding: rot13 .
Re: Encoding Problems [message #186358 is a reply to message #186356] Sun, 06 July 2014 11:14
Richard Damon
On 7/6/14, 1:21 AM, Gordon Burditt wrote:
>> multipart/form-date would be a value for Content-Type, not a value for
>> Content-Transfer-Encoding.
>>
>> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
>> quoted-printable
>> base64
>> binary
>> 8bit
>> 7bit
>
> Never forget that viruses and malware aren't required to obey the
> rules, and you could very well end up with Content-Type: spam/virus
> and Content-Transfer-Encoding: rot13 .
>

And if you don't properly "decode" that payload, is there a problem?

Yes, you don't want your program to crash on an improper value, and
somehow rejecting malformed messages is preferable, but ignoring the
error isn't bad (assuming your ultimate processing of the message is
well controlled).
Re: Encoding Problems [message #186359 is a reply to message #186353] Sun, 06 July 2014 13:30
Christoph Michael Becker
Arno Welzel wrote:

> Christoph Michael Becker, 2014-07-05 03:28:
>
>> Arno Welzel wrote:
>>
>>> So extend that for the encoding:
>>>
>>> if($msgStructure->subtype=="PLAIN")
>>> {
>>> switch($msgStructure->encoding)
>>> {
>>> case 4:
>>> // Body text is quoted-printable encoded
>>> $body = quoted_printable_decode($body);
>>> break;
>>>
>>> case 3:
>>> // Body text is base64 encoded
>>> $body = base64_decode($data);
>>> break;
>>> }
>>>
>>> $body = renderPlainText($body);
>>> }
>>
>> What about a default clause, at least triggering a notice/warning that
>> the encoding is not understood?
>
> Good Point. But which other encoding except no encoding at all, base64
> and or quoted printable may be used?

The PHP manual documents 6 values for the transfer encodings[1].
Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
handling (if only to ignore the body in these cases, which might be
necessary to avoid potential vulnerabilities).

[1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>

--
Christoph M. Becker
Re: Encoding Problems [message #186360 is a reply to message #186359] Sun, 06 July 2014 19:46
Arno Welzel
Christoph Michael Becker, 2014-07-06 15:30:

> Arno Welzel wrote:
>
>> Christoph Michael Becker, 2014-07-05 03:28:
>>
>>> Arno Welzel wrote:
>>>
>>>> So extend that for the encoding:
>>>>
>>>> if($msgStructure->subtype=="PLAIN")
>>>> {
>>>> switch($msgStructure->encoding)
>>>> {
>>>> case 4:
>>>> // Body text is quoted-printable encoded
>>>> $body = quoted_printable_decode($body);
>>>> break;
>>>>
>>>> case 3:
>>>> // Body text is base64 encoded
>>>> $body = base64_decode($data);
>>>> break;
>>>> }
>>>>
>>>> $body = renderPlainText($body);
>>>> }
>>>
>>> What about a default clause, at least triggering a notice/warning that
>>> the encoding is not understood?
>>
>> Good Point. But which other encoding except no encoding at all, base64
>> and or quoted printable may be used?
>
> The PHP manual documents 6 values for the transfer encodings[1].
> Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
> handling (if only to ignore the body in these cases, what might be
> necessary to avoid potential vulnerabilities).
>
> [1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>

Thanks for the clarification - that's the URL I also referred to
originally ;-)



--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186361 is a reply to message #186360] Sun, 06 July 2014 20:21
Richard Damon
On 7/6/14, 3:46 PM, Arno Welzel wrote:
> Christoph Michael Becker, 2014-07-06 15:30:
>
>> Arno Welzel wrote:
>>
>>> Christoph Michael Becker, 2014-07-05 03:28:
>>>
>>>> Arno Welzel wrote:
>>>>
>>>> > So extend that for the encoding:
>>>> >
>>>> > if($msgStructure->subtype=="PLAIN")
>>>> > {
>>>> > switch($msgStructure->encoding)
>>>> > {
>>>> > case 4:
>>>> > // Body text is quoted-printable encoded
>>>> > $body = quoted_printable_decode($body);
>>>> > break;
>>>> >
>>>> > case 3:
>>>> > // Body text is base64 encoded
>>>> > $body = base64_decode($data);
>>>> > break;
>>>> > }
>>>> >
>>>> > $body = renderPlainText($body);
>>>> > }
>>>>
>>>> What about a default clause, at least triggering a notice/warning that
>>>> the encoding is not understood?
>>>
>>> Good Point. But which other encoding except no encoding at all, base64
>>> and or quoted printable may be used?
>>
>> The PHP manual documents 6 values for the transfer encodings[1].
>> Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
>> handling (if only to ignore the body in these cases, what might be
>> necessary to avoid potential vulnerabilities).
>>
>> [1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>
>
> Thanks for the clarification - that's the URL I also referred to
> originally ;-)

Looking at your original code, your base64 path is converting $data to
$body, while the other paths are $body to $body.

The RFC defines binary as a raw encoding, meaning the message holds the
desired byte stream. It differs from 7bit and 8bit in that, besides
using all byte values as 8bit does, it may also contain NULs (0), and
CR (13) and LF (10) don't delimit lines (which are no longer subject to
the 998-byte length limit). If renderPlainText can't handle that sort of
data, maybe you should discard bodies declared as binary. But a message
that doesn't declare itself binary isn't thereby forced to obey the
7bit/8bit rules (unless your MTA checks and enforces this), so
renderPlainText should do something "valid" for these cases anyway (even
if it is just outputting nothing).

Similarly, "Other" probably means that the encoding wasn't validly
specified, so you might want to reject, but you don't need to (as you
should be able to handle in some manner whatever "garbage" is sent to
you, even if that means rejecting it or outputting nothing).
Re: Encoding Problems [message #186363 is a reply to message #186358] Mon, 07 July 2014 01:24
Denis McMahon
On Sun, 06 Jul 2014 07:14:06 -0400, Richard Damon wrote:

> On 7/6/14, 1:21 AM, Gordon Burditt wrote:
>>> multipart/form-date would be a value for Content-Type, not a value for
>>> Content-Transfer-Encoding.

>>> The defined vauls (By RFC 2045) of Content-Transfer-Encoding are:
>>> quoted-printable base64 binary 8bit 7bit

>> Never forget that viruses and malware aren't required to obey the
>> rules, and you could very well end up with Content-Type: spam/virus and
>> Content-Transfer-Encoding: rot13 .

> And if you don't properly "decode" that payload there is a problem?

No, but doing something sensible and clean in the face of unexpected data
values is better than just bombing out.

My original points are and remain that (a) the values you get for the
content-transfer-encoding might not be in your list, and (b) that the
actual content-transfer-encoding might not match the declared content-
transfer-encoding.

This combination of factors means that having a default that assumes "if
it didn't match anything else it must be x" is a bad idea. It also means
that it's a good idea to expect and try and cleanly detect any content
decoding errors, on the assumption that at some point malformed content
will arrive and you want to handle it in a manner that, at the very
least, doesn't create a vulnerability.
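
One way to detect malformed data rather than silently accept it, sketched for base64 only: base64_decode() has a strict mode that returns false when the input contains characters outside the base64 alphabet. The function name and the choice to return null on failure are assumptions made for this example:

```php
<?php
// Hypothetical helper: returns the decoded bytes, or null when a body
// declared as base64 is not actually valid base64.
function strictBase64($body)
{
    $decoded = base64_decode($body, true);
    return $decoded === false ? null : $decoded;
}

var_dump(strictBase64('SGVsbG8=') === 'Hello');       // bool(true)
var_dump(strictBase64('<script>alert(1)</script>')); // NULL
```

The caller can then skip or flag the message instead of passing undecoded content further down the chain.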

--
Denis McMahon, denismfmcmahon(at)gmail(dot)com
Re: Encoding Problems [message #186365 is a reply to message #186361] Mon, 07 July 2014 01:49
Arno Welzel
Richard Damon, 2014-07-06 22:21:

> On 7/6/14, 3:46 PM, Arno Welzel wrote:
>> Christoph Michael Becker, 2014-07-06 15:30:
>>
>>> Arno Welzel wrote:
>>>
>>>> Christoph Michael Becker, 2014-07-05 03:28:
>>>>
>>>> > Arno Welzel wrote:
>>>> >
>>>> >> So extend that for the encoding:
>>>> >>
>>>> >> if($msgStructure->subtype=="PLAIN")
>>>> >> {
>>>> >> switch($msgStructure->encoding)
>>>> >> {
>>>> >> case 4:
>>>> >> // Body text is quoted-printable encoded
>>>> >> $body = quoted_printable_decode($body);
>>>> >> break;
>>>> >>
>>>> >> case 3:
>>>> >> // Body text is base64 encoded
>>>> >> $body = base64_decode($data);
>>>> >> break;
>>>> >> }
>>>> >>
>>>> >> $body = renderPlainText($body);
>>>> >> }
>>>> >
>>>> > What about a default clause, at least triggering a notice/warning that
>>>> > the encoding is not understood?
>>>>
>>>> Good Point. But which other encoding except no encoding at all, base64
>>>> and or quoted printable may be used?
>>>
>>> The PHP manual documents 6 values for the transfer encodings[1].
>>> Particularly 2 (BINARY) and 5 (OTHER) seem to demand some further
>>> handling (if only to ignore the body in these cases, what might be
>>> necessary to avoid potential vulnerabilities).
>>>
>>> [1] <http://www.php.net/manual/en/function.imap-fetchstructure.php>
>>
>> Thanks for the clarification - that's the URL I also referred to
>> originally ;-)
>
> Looking at your original code, your base64 path is converting $data to
> $body, while the other paths are $body to $body.

Yep - that's why I corrected this fault in my follow-up in
<53B7529D(dot)9050002(at)arnowelzel(dot)de>.

This was just meant as a suggestion how one could handle the different
content-transfer-encodings, not tested code ready to use. Therefore I
also mentioned the PHP manual for further reading.

Of course one should add a default case to handle "unknown"
content-transfer-encodings.


--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186387 is a reply to message #186334] Fri, 11 July 2014 16:47
Arno Welzel
stef_204, 2014-07-10 15:57:

> On Sat, 05 Jul 2014 03:19:25 +0200, Arno Welzel wrote:
>
>>> So extend that for the encoding:
>>>
>>> if($msgStructure->subtype=="PLAIN")
>>> {
>>> switch($msgStructure->encoding)
>>> {
>>> case 4:
>>> // Body text is quoted-printable encoded
>>> $body = quoted_printable_decode($body);
>>> break;
>>>
>>> case 3:
>>> // Body text is base64 encoded
>>> $body = base64_decode($data);
>>
>> Ups - sorry for the c&p typo. Of course it should be:
>>
>> $body = base64_decode($body);
>>
>>> break;
>>> }
>>>
>>> $body = renderPlainText($body);
>>> }
>
> Arno,
>
> The above seems to work. Thanks.
> I still get a little bit of garbled text due to charset utf-8 (I believe)
> but we are now 99% better on the $body.

Just keep in mind, that my example is not complete and just a suggestion
how to start - there should also be a case to handle text with transfer
encoding which does not need decoding at all and a default case to
handle unknown encodings.
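
Putting the points from this thread together, a more complete version might look like the sketch below. It is untested against imap2rss.php. decodeBody is a hypothetical name, the integer values are those documented for imap_fetchstructure(), and returning null for BINARY, OTHER and anything unrecognised is just one possible policy:

```php
<?php
// Hypothetical helper: returns the decoded text, or null if the body
// should be skipped.
function decodeBody($body, $encoding)
{
    switch ($encoding) {
        case 0: // 7BIT
        case 1: // 8BIT
            return $body; // no transform needed

        case 3: // BASE64
            $decoded = base64_decode($body, true);
            return $decoded === false ? null : $decoded;

        case 4: // QUOTED-PRINTABLE
            return quoted_printable_decode($body);

        default: // 2 (BINARY), 5 (OTHER), or an undocumented value
            trigger_error("Unhandled transfer encoding: $encoding", E_USER_NOTICE);
            return null;
    }
}

echo decodeBody('Gr=C3=BC=C3=9Fe', 4), "\n"; // prints: Grüße
echo decodeBody('SGVsbG8=', 3), "\n";         // prints: Hello
```

The calling code would then pass the result to renderPlainText() only when it is not null.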


--
Arno Welzel
http://arnowelzel.de
http://de-rec-fahrrad.de
http://fahrradzukunft.de
Re: Encoding Problems [message #186396 is a reply to message #186334] Fri, 18 July 2014 09:29
Tim Streater
In article <lqao35$8v5$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:

> Sorry to ask but I am struggling with the "subject" part of the email.
>
> I tried to find a fix but not joy, yet.
>
> I can decode the base64 encoded subject of each email individually by
> adding a: mb_decode_mimeheader as follows, but that's really just a
> "hack" and not proper, IMHO.
>
> And that only decodes $subject on the html page produced for individual
> emails, not the top level html page/rss feed which lists all of the
> emails.
>
> The subjects there are still reading:
> "=?UTF-8?B?" InsertGarbledText (base64) here "?="

These are called "encoded words". You can read about them in RFC 2047, or
look up the Wikipedia article on MIME (in caps). You'll have to write
some PHP to decode those. In general the format is:

introducer: =?
charset: UTF-8 (in this case)
separator: ?
coding: B for base64, Q for a quoted-printable-like encoding
separator: ?
encoded text follows
terminator: ?=

That will allow you to pick the item apart and know what to do with it.
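
As a rough illustration of picking an encoded word apart along those lines, here is a minimal sketch. The function names decodeEncodedWords() and decodeOneEncodedWord() are hypothetical, the code assumes the output should be UTF-8, and it relies on iconv being available for any other charset (if it is not, the raw decoded bytes are returned). It does not cover every corner of RFC 2047.

```php
<?php
// Hypothetical helper: decode RFC 2047 encoded words in a header value.
function decodeEncodedWords($header)
{
    $word = '=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=';

    // Whitespace between two adjacent encoded words is not part of the text.
    $header = preg_replace('/(' . $word . ')\s+(?=' . $word . ')/', '$1', $header);

    // introducer, charset, separator, coding, separator, text, terminator
    return preg_replace_callback(
        '/=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/',
        'decodeOneEncodedWord',
        $header
    );
}

// Callback: $m[1] = charset, $m[2] = coding (B or Q), $m[3] = encoded text.
function decodeOneEncodedWord($m)
{
    if (strtoupper($m[2]) === 'B') {
        $bytes = base64_decode($m[3]);
    } else {
        // In the Q coding an underscore stands for a space; the rest
        // uses =XX hex escapes like quoted-printable.
        $bytes = quoted_printable_decode(str_replace('_', ' ', $m[3]));
    }

    // Convert to UTF-8 unless the text already is UTF-8.
    if (strcasecmp($m[1], 'UTF-8') !== 0 && function_exists('iconv')) {
        $converted = @iconv($m[1], 'UTF-8//IGNORE', $bytes);
        if ($converted !== false) {
            $bytes = $converted;
        }
    }
    return $bytes;
}
```

For example, `decodeEncodedWords('=?UTF-8?B?SGVsbG8=?=')` gives `Hello`, and text outside encoded words is left untouched.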

BTW, if you are doing stuff with emails, there's no substitute for
reading the RFCs and understanding how emails are put together. That's
what I did as part of the process for writing my own email client. RFCs
2045, 2046, 2047, 2048, and 2049 are a good place to start.

--
"People don't buy Microsoft for quality, they buy it for compatibility
with what Bob in accounting bought last year. Trace it back - they buy
Microsoft because the IBM Selectric didn't suck much" - P Seebach, afc
Re: Encoding Problems [message #186397 is a reply to message #186334] Fri, 18 July 2014 10:05
Tim Streater
In article <lqapur$mnb$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:

> On Fri, 18 Jul 2014 10:29:32 +0100, Tim Streater wrote:
>
>> BTW, if you are doing stuff with emails, there's no substitute for
>> reading the RFCs and understanding how emails are put together. That's
>> what I did as part of the process for writing my own email client. RFCs
>> 2045, 2046, 2047, 2048, and 2049 are a good place to start.
>
> I agree with you--I'm just really pressed at the moment on this work
> project unfortunately and need to get it up and running ASAP.
> Not ideal at all, obviously, but coding/programming is only incidental to
> my work and not my main work.

Um, I see the problem. Would it help if I emailed you the function I
put together for the purpose?

Re: Encoding Problems [message #186398 is a reply to message #186334] Fri, 18 July 2014 13:32
Tim Streater
In article <lqb17k$9bb$1(at)dont-email(dot)me>, stef_204 <notvalid(at)nomail(dot)nul>
wrote:

> On Fri, 18 Jul 2014 11:05:47 +0100, Tim Streater wrote:
>
>> Um, I see the problem. Would it help if I emailed you the function I put
>> together for the purpose?
>
> Sure, let's give it a shot and see if I am able to integrate it in to the
> script to resolve the issue.

OK - on its way, let me know if you don't receive it.

Re: Encoding Problems [message #186401 is a reply to message #186334] Sat, 19 July 2014 07:36
Arno Welzel
stef_204, 2014-07-18 11:06:

> On Fri, 11 Jul 2014 18:47:55 +0200, Arno Welzel wrote:
>
>> Just keep in mind that my example is incomplete and only a suggestion
>> for how to start. There should also be a case for text whose transfer
>> encoding needs no decoding at all, and a default case to handle
>> unknown encodings.
>
> Arno,
>
> Sorry to ask but I am struggling with the "subject" part of the email.
>
> I tried to find a fix but no joy, yet.
>
> I can decode the base64 encoded subject of each email individually by
> adding a: mb_decode_mimeheader as follows, but that's really just a
> "hack" and not proper, IMHO.
>
> And that only decodes $subject on the html page produced for individual
> emails, not the top level html page/rss feed which lists all of the
> emails.
>
> The subjects there are still reading:
> "=?UTF-8?B?" InsertGarbledText (base64) here "?="

imap_mime_header_decode() may help.

See <http://php.net/manual/en/function.imap-mime-header-decode.php>
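
A minimal sketch of how that function might be wrapped, assuming the imap extension is loaded and the output should be UTF-8. The name decodeSubject() is hypothetical. imap_mime_header_decode() returns an array of objects, each with a `charset` and a `text` property; parts that were not encoded are reported with the charset "default".

```php
<?php
// Hypothetical wrapper around imap_mime_header_decode().
function decodeSubject($subject)
{
    // Without the imap extension, return the subject unchanged.
    if (!function_exists('imap_mime_header_decode')) {
        return $subject;
    }

    $parts = imap_mime_header_decode($subject);
    if (!is_array($parts)) {
        return $subject;
    }

    $out = '';
    foreach ($parts as $part) {
        $text = $part->text;
        $charset = strtolower($part->charset);

        // Convert anything that is neither plain ("default") nor
        // already UTF-8, if iconv is available.
        if ($charset !== 'default' && $charset !== 'utf-8' && function_exists('iconv')) {
            $converted = @iconv($part->charset, 'UTF-8//IGNORE', $text);
            if ($converted !== false) {
                $text = $converted;
            }
        }
        $out .= $text;
    }
    return $out;
}
```

To fix both views, the same call would need to be applied wherever the script reads a subject: in the code that builds the top-level list/RSS feed as well as the per-message page. Where exactly that happens in imap2rss.php is not shown in this thread.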


