Yahoo Groups -> mbox -> maillist.php -> missing posts [message #15672]
Tue, 30 December 2003 20:34
srchild
Messages: 88 Registered: December 2003 Location: UK
I'm looking at converting a Yahoo Groups group to FUD, so I'm trying to transfer the existing archive.
I've collected the archive from Yahoo using this script:
http://www.lpthe.jussieu.fr/~zeitlin/yahoo2mbox.html
It is now in mbox format and appears to be valid: when I open it in Elm, it shows the correct number of messages and they are all readable.
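A quick way to cross-check the message count independently of Elm is to count the From_ separator lines directly, then compare that figure with the number of posts FUD ends up with. This is a minimal sketch that assumes every message starts with a line beginning `From ` and that any body line starting with `From ` has been escaped as `>From `. An unescaped one would inflate the count.

```shell
#!/bin/sh
# Count mbox separator lines: lines that start with exactly "From ".
# Escaped body lines (">From ...") are not counted.
# Assumption: each message begins with one such line and no body line does.
count_messages() {
    grep -c '^From ' "$1"
}
```

For example, `count_messages archive`, where `archive` is the file name used in this post.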
I load it into FUD 2.5.2 using:
cat archive | formail -s /path/to/php /path/to/maillist.php 1
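`formail -s` splits its input into individual messages and runs the given command once per message. To find out exactly which messages are being lost, the splitting can be done as a separate step so each message ends up in its own file and can be imported (and inspected) one at a time. The sketch below uses awk as a simplified stand-in for formail's splitting. It assumes the strict mbox rule that a From_ line starts a new message only when it is the first line of the file or follows an empty line; formail's actual rules may differ in detail. The function name `split_mbox` and the `msg.NNNN` file names are hypothetical.

```shell
#!/bin/sh
# Split an mbox file ($1) into numbered files msg.0001, msg.0002, ... in directory $2.
# Assumption: a message starts at a "From " line that is either the first line
# of the file or directly follows an empty line.
split_mbox() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /^From / && (NR == 1 || prev == "") {
            if (out != "") close(out)
            n++
            out = sprintf("%s/msg.%04d", dir, n)
        }
        out != "" { print > out }
        { prev = $0 }
    ' "$1"
}

# Hypothetical usage with the paths from the post. The "failed" message relies on
# maillist.php exiting non-zero when it rejects a message, which is not verified:
#   split_mbox archive split
#   for f in split/msg.*; do
#       /path/to/php /path/to/maillist.php 1 < "$f" || echo "failed: $f"
#   done
```

Even if the exit status turns out to be uninformative, having one file per message makes it possible to match each missing post to the exact input that produced it.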
Using 'Slow Reply Match' to recreate the threads, subject mangling to remove the [listname] tag, and body mangling to strip out some of the advertising dross, the imported messages all look good.
But only about half of the messages get loaded; the rest go missing for no obvious reason. It isn't simply dying early: messages are missing from early on in the archive too. I've examined the archive file and can see no clue as to why some messages are imported and others are not. Some are email postings and some were posted from the website. A given user may have some messages imported while others by the same user are not.
I've experimented with tidying up the archive file manually (removing adverts and wrapped Received lines from the first few messages, that sort of thing). I also tried reordering the first few messages: one that loaded fine when it was first no longer loaded when moved to second place in the archive.
I've tried feeding the archive through formail:
cat archive | formail > archive2
and it appears to quote many (all?) of the From_ lines except the first, whereas I thought it was supposed to quote only bogus From_ lines. So perhaps there is a problem with my archive and formail is not splitting it up properly? (Note, though, that Elm reads it correctly.)
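One way to look for "bogus" From_ lines directly is to list every line starting with `From ` that either is not preceded by an empty line or does not have the usual `From sender date` shape. As I understand the formail man page, when splitting it by default only treats a From_ line as a new message if an empty line precedes it (its `-e` option relaxes this), so both cases are worth checking. This is a heuristic sketch, not a formal mbox validator; the function name `check_from_lines` is hypothetical.

```shell
#!/bin/sh
# Report "From " lines in an mbox file ($1) that look suspicious as separators.
# Heuristics (assumptions, not a formal mbox definition):
#   - a real separator is the first line of the file or follows an empty line;
#   - it has the shape "From sender Www Mmm dd hh:mm[:ss] yyyy".
# Output format: "<line number>: <problem>: <line>"
check_from_lines() {
    awk '
        /^From / {
            problem = ""
            if (NR > 1 && prev != "") {
                problem = "no blank line before"
            } else if ($0 !~ /^From [^ ]+ +[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9](:[0-9][0-9])? [0-9][0-9][0-9][0-9]/) {
                problem = "unusual date format"
            }
            if (problem != "")
                printf "%d: %s: %s\n", NR, problem, $0
        }
        { prev = $0 }
    ' "$1"
}
```

Running `check_from_lines archive` prints nothing if every From_ line passes both checks. A "no blank line before" hit is usually an unescaped body line, while an "unusual date format" hit after an empty line is a body line that a strict splitter would likely treat as the start of a new message.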
I've found some fragments of text in messages/msg_1 but can't see how to interpret them; maybe there are clues in there?
Anyone got any clues for me?
Thanks
Simon Child