nntp import of large groups creates troubles [message #20202] |
Fri, 01 October 2004 22:29 |
devnull
Messages: 16 Registered: September 2004
Karma: 0
Junior Member
After making a custom theme and installing FudForum on a new server, I configured the newsgroups I wanted to import.
Some groups are pretty big (over 5000 posts), which causes the nntp.php script to fail. I'm not sure if it's a timeout or if it just crashes.
The problem is that after I restart it, it imports old news again; I end up with hundreds of duplicate entries. For now, I've stopped the import until I find a solution. According to a previous post here, crashed newsgroup imports will leave a .lock in errors/.nntp. For almost every bigger group, I have one in there.
Here are my two ideas (it's 00:30, so there might be more):
1) Could the nntp.php script easily be changed to stop after e.g. 1000 posts? When running from cron, the board would fill slowly over several updates.
2) Is the number in the news files in errors/.nntp the topic ID? It would involve more work, but no source changes would be needed. Run nntp.php for one group until it stops, fix the file in .nntp manually, then re-run.
I do hope #1 is possible tho.
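To make idea #1 concrete, here is a minimal sketch of a per-run import limit. It is not FUDforum code: `import_batch`, `import_one` and `limit` are hypothetical names, and it assumes article IDs are plain integers that increase over time.

```python
from typing import Callable, Iterable


def import_batch(article_ids: Iterable[int], last_imported: int,
                 import_one: Callable[[int], None], limit: int = 1000) -> int:
    """Import at most `limit` articles with an ID above `last_imported`.

    Returns the ID of the last article imported in this run, or
    `last_imported` unchanged if there was nothing new.
    """
    done = 0
    for article_id in sorted(article_ids):
        if article_id <= last_imported:
            continue  # already imported in an earlier run
        if done >= limit:
            break  # leave the rest for the next cron run
        import_one(article_id)  # hypothetical: fetch and store one article
        last_imported = article_id
        done += 1
    return last_imported
```

Each cron run would call this with the ID saved by the previous run, so a group of 5000 posts would fill up over five runs with the default limit.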
Re: nntp import of large groups creates troubles [message #20293 is a reply to message #20291] |
Wed, 06 October 2004 20:27 |
Ilia
Messages: 13241 Registered: January 2002
Karma: 0
Senior Member Administrator Core Developer
The "position" file must contain only a single number, the ID of the last imported message. So to update this file after every message, you'd need to clear the file and write to it again each time.
This would be fairly slow to do for every single message.
Perhaps it would be more prudent to discover why PHP is crashing and address that problem. Now that you have a message import limit, you have an easy mechanism for preventing the script from timing out or eating too much CPU.
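As an illustration of the "clear and rewrite" step, the sketch below keeps a single number in a position file and replaces the file in one step, so a crash mid-write cannot leave it half-written. This is an assumption about how one could do it, not how nntp.inc does it; `write_position` and `read_position` are hypothetical helpers.

```python
import os
import tempfile


def write_position(path: str, article_id: int) -> None:
    """Overwrite the position file so it holds only `article_id`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(article_id))
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise


def read_position(path: str) -> int:
    """Return the last imported ID, or 0 if no position file exists yet."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except FileNotFoundError:
        return 0
```

The extra file creation per message is the cost mentioned above; writing the position only every N messages would trade some re-imported messages after a crash for less disk activity.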
FUDforum Core Developer
Re: nntp import of large groups creates troubles [message #20375 is a reply to message #20202] |
Mon, 11 October 2004 12:21 |
devnull
For a quick solution, I just changed the script so that it writes the last article-id after every retrieved article. Time is something I don't have much right now, so I chose a quick and dirty solution.
Most groups were imported fine that way and, after several runs, were up to date.
A few seemed to be stuck; although there were still a lot of articles to import, the script only increased the count by 1. I decided to tcpdump a single run to search for the possible cause.
I've isolated one case where the counter only increases by 1 and checked the article-id file over several runs. The ID does not change, although according to the forum view, one message was imported. The only change I made to the nntp.inc file was to add the following at line 452: $this->set_end($i);
Looking into the dump, I saw that the article with the ID in the file was retrieved. Then the script requested the next ID, and this article was very long (10406 lines). Here the script stopped and the ID in the file did not change. However, the first message was added to the database again; I forgot to check that for some time and now have a lot of dupes (which reminds me, is there a dupe search?).
I assume it's a problem with the size of the messages.
Attached is the conversation for two messages. The second message is requested, but not added. Then the script ends.
Attachment: posts.txt (Size: 407.06KB, Downloaded 1540 times)
Re: nntp import of large groups creates troubles [message #20380 is a reply to message #20375] |
Mon, 11 October 2004 18:43 |
Ilia
There is no "dupe search", but I guess you can use the regular search engine.
The solution to your problem would be to move the $this->set_end() before $this->get_message().
This would mean that if a message was not imported, the forum will start from the next message on the next run rather than trying to import the failing message yet again.
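The effect of that ordering can be sketched as follows. This is a simplified stand-in, not the actual nntp.inc loop: `save_position` plays the role of `$this->set_end()` and `fetch` the role of `$this->get_message()`, and both are hypothetical callables here.

```python
def run_import(article_ids, position, fetch, save_position):
    """Import every article above `position`, checkpointing before each fetch."""
    for article_id in article_ids:
        if article_id <= position:
            continue  # handled (or skipped) in an earlier run
        save_position(article_id)  # checkpoint first (the set_end step)
        position = article_id
        fetch(article_id)  # may crash (the get_message step)
    return position
```

If `fetch` dies on one oversized article, the saved position already points at it, so the next run resumes with the following article. The trade-off is that the failing article is never imported, in exchange for no longer re-importing its predecessor on every run.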
FUDforum Core Developer
Re: nntp import of large groups creates troubles [message #20396 is a reply to message #20380] |
Tue, 12 October 2004 13:38 |
devnull
Thanks!
Now that I don't direct the output to /dev/null anymore, I came across the following message when the update process stops:
Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 40 bytes) in /home/web/board/include/theme/default/wordwrap.inc on line 27
Raising the memory limit in php.ini to e.g. 16MB gives the same result; it just fills the 16777216 bytes instead.
When things are a bit quieter here, I think I'll write a dupe search in Perl. I have to take a closer look at the db for that first.
A suggestion: avoiding dupes could be added to the board quite easily if an MD5 is generated for each message upon posting. The chances of two different posts having the same MD5 are practically zero. If the field in the db is set to unique, no dupe could be added.
[Updated on: Tue, 12 October 2004 13:40]
Re: nntp import of large groups creates troubles [message #20397 is a reply to message #20396] |
Tue, 12 October 2004 13:46 |
Ilia
Actually, MD5 is pretty unreliable. The forum instead natively stores the unique ID that NNTP assigns to each message. You can use this field to find and remove dupes.
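A dupe search along those lines could look roughly like this. It is only a sketch: it assumes you have already pulled pairs of (forum message ID, NNTP message ID) out of the database, and it does not use the real FUDforum table or column names.

```python
from typing import Iterable, List, Tuple


def find_duplicates(rows: Iterable[Tuple[int, str]]) -> List[int]:
    """Return forum message IDs whose NNTP ID was already seen on an earlier row.

    `rows` holds (forum_id, nntp_id) pairs; the lowest forum_id for each
    NNTP ID is treated as the original and is not reported.
    """
    seen = set()
    duplicates = []
    for forum_id, nntp_id in sorted(rows):
        if nntp_id in seen:
            duplicates.append(forum_id)  # a later copy of a known article
        else:
            seen.add(nntp_id)
    return duplicates
```

The returned IDs are the candidates for removal; it is worth checking a few of them by hand before deleting anything.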
FUDforum Core Developer