Re: Parsing mbox files with Windows Php [message #181682 is a reply to message #181678] |
Fri, 24 May 2013 23:40 |
Chuck Anderson
Messages: 63 Registered: September 2010
Jerry Stuckle wrote:
> On 5/24/2013 5:57 PM, Chuck Anderson wrote:
>>
>> I have been slowly building a Php/MySQL based IMAP email client. The
>> Php IMAP functions are well documented and easy to use. I store
>> messages in a MySQL database (attachments in the file system) for ease
>> of organization, maintenance, backup and searching. I like it enough
>> that I want to add my existing archive of email in Thunderbird. To do
>> so, I need to parse mbox files and extract message headers, parts, and
>> attachments. Thankfully, the Php IMAP functions can be used to open and
>> parse an mbox file (or even a single .eml file) as well as opening a
>> stream to the server.
>>
>> Using this functionality I can write a script to run on my WAMP
>> development machine that reads the Thunderbird folder structure, parses
>> the mbox files and saves individual messages along with their
>> folder/subfolder path in the Thunderbird folder hierarchy. It should be
>> as easy as pointing it to the top of the folder structure and letting it
>> do all the work from there.
>>
>> Unfortunately, it appears that the Windows Php binary is unable to
>> connect to an mbox file, so to make my job easy, I would have to upload
>> the entire folder structure (it is about 200 MB) to my shared host and
>> process it there. I would much rather "toy around" with this process on
>> my Windows development machine and not on the shared host.
>>
>> There is a Php bug filed for this, but it was determined that the
>> "underlying c-client function is unable to open a file."
>> https://bugs.php.net/bug.php?id=39880 - closed as "not a bug."
>>
>> $mbox = imap_open('pathto/mboxfile', '', ''); // works on *nix, but not
>> on Windows.
>> - Warning: imap_open(): Couldn't open stream mboxfiles/Inbox ....
>> - Notice: Unknown: Can't open mailbox mboxfiles/Inbox: no such mailbox
>> (errflg=2) in Unknown on line 0.
>> (This second error is the one coming from the underlying c-client
>> function.)
>>
>> I believe this is a dead end but want to check if anyone has happened to
>> figure out a way to make this work in Windows - or if you know of a
>> separate mbox file parser that would be fairly simple to integrate with
>> my current Php IMAP based scripts - dependent on the output of
>> imap_fetchstructure, imap_headerinfo, imap_fetchbody(parts).
>>
>> If not, I will probably build a form that lets me select individual
>> (smaller) sections of the folder hierarchy to process individually
>> (tedious and prone to error).
>>
>
> Are you sure Thunderbird's files are in imap format?
I don't have the spec, but I have read as much. And the imap functions
do parse them nicely - on *nix.
> I didn't think they were - I thought they were in some
> Thunderbird-specific format.
Thunderbird (I am still in version 2) uses the mbox format. (I believe
that means that the beginning of a new email is denoted by a blank line
followed by "From - ...."). The rest is the usual email format, headers
until a double line feed denoting the beginning of the body. (Body
lines that happen to begin with "From" have to have "From" escaped in
those files.)
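
The format described above can be split without the IMAP extension. What follows is only a rough sketch, not a tested replacement for imap_open(): splitMbox() is a hypothetical helper name, and it assumes every message starts with a line beginning "From " and that escaped body lines were written as ">From ". It returns raw message text only, so anything equivalent to imap_fetchstructure would still need a separate MIME parser.

```php
<?php
// Hypothetical helper: split raw mbox text into individual raw messages.
// Assumption: a line starting with "From " (note the space) opens a message.
function splitMbox(string $raw): array
{
    $messages = [];
    $current = null;

    // Accept Windows, Unix and old Mac line endings.
    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        if (strncmp($line, 'From ', 5) === 0) {
            if ($current !== null) {
                $messages[] = $current;
            }
            $current = [];      // the separator line itself is dropped
            continue;
        }
        if ($current === null) {
            continue;           // ignore anything before the first separator
        }
        // Undo the ">From " escaping. Assumption: this is lossy if a body
        // line really did begin with ">From " before it was stored.
        if (strncmp($line, '>From ', 6) === 0) {
            $line = substr($line, 1);
        }
        $current[] = $line;
    }
    if ($current !== null) {
        $messages[] = $current;
    }

    // Join each message back into one string ending in a single newline.
    return array_map(function (array $lines): string {
        return rtrim(implode("\n", $lines), "\n") . "\n";
    }, $messages);
}
```

For a whole folder tree this could be called on file_get_contents() of each mbox file; for very large files a line-by-line fgets() loop would use less memory.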
When saving a single email to a .eml file, before you can parse it
with imap functions you must insert a dummy "From - " line (separate
from the usual From: header, this is "From - date") at the beginning of
the headers. Thunderbird places those into the mbox (multiple emails) file.
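
Adding that dummy line could look roughly like the sketch below. emlToMbox() is a hypothetical name, and the date layout is an assumption modelled on the "From - date" lines Thunderbird writes; I have not checked which layouts c-client accepts.

```php
<?php
// Hypothetical helper: prepend a dummy "From - date" separator line to the
// raw text of a single .eml message so it looks like a one-message mbox.
function emlToMbox(string $eml, ?int $timestamp = null): string
{
    // Leave the text alone if it already starts with a separator line.
    if (strncmp($eml, 'From ', 5) === 0) {
        return $eml;
    }
    // Assumption: an asctime-like UTC date is acceptable here.
    $date = gmdate('D M d H:i:s Y', $timestamp ?? time());

    return 'From - ' . $date . "\n" . $eml;
}
```

The result can be written to a temporary file and that path handed to imap_open() on a platform where opening local mailbox files works.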
>
> But also, if you're opening 'pathto/mboxfile', why is it complaining
> about 'mboxfiles...'?
Sorry. I was using generic terms and switched between them (sloppy). The
error says it cannot open the folder/file I specified. When I run the
exact same script on my shared Linux server (after uploading a sample
Thunderbird mail file), the imap functions produce output exactly like
they would if reading from a stream on the imap server.
--
*****************************
Chuck Anderson • Boulder, CO
http://cycletourist.com
Turn Off, Tune Out, Drop In
*****************************