FUDforum
Fast Uncompromising Discussions. FUDforum will get your users talking.

nntp import of large groups creates troubles [message #20202] Fri, 01 October 2004 22:29
devnull, United States
Messages: 16
Registered: September 2004
Karma: 0
Junior Member
After making a custom theme and installing FUDforum on a new server, I configured the newsgroups I wanted to import.

Some groups are pretty big (over 5000 posts), which causes the nntp.php script to fail. I'm not sure if it's a timeout or if it just crashes.

The problem is that after I restart it, it imports old news again; I end up with hundreds of duplicate entries. For now, I've stopped the import until I find a solution. According to a previous post here, crashed newsgroup imports will leave a .lock in errors/.nntp. For almost every bigger group, I have one in there.

Here are my two ideas (it's 00:30, so there might be more :) ):
1) Could the nntp.php script easily be changed to stop after e.g. 1000 posts? When running from cron, the board would fill slowly over several updates.
2) Is the number in the news files in errors/.nntp the topic ID? It would involve more work, but no source changes would be needed. Run nntp.php for one group until it stops and fix the file in .nntp manually, then re-run.

I do hope #1 is possible, though ;)
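
Idea #1 can be sketched roughly as follows. This is only an illustration in Python, not FUDforum's PHP code; the function name `import_batch` and its parameters are hypothetical, and the article ids are simulated instead of being fetched over NNTP.

```python
def import_batch(article_ids, last_imported, limit):
    """Return the ids to import in this run and the new stored position.

    article_ids:   ids available on the server, ascending (simulated here)
    last_imported: id of the last article imported by an earlier run
    limit:         maximum number of articles to import per run
    """
    pending = [a for a in article_ids if a > last_imported]
    batch = pending[:limit]
    new_position = batch[-1] if batch else last_imported
    return batch, new_position


# Simulate repeated cron runs over a 2500-article group with a limit of 1000.
ids = list(range(1, 2501))
position = 0
runs = []
while True:
    batch, position = import_batch(ids, position, 1000)
    if not batch:
        break
    runs.append(len(batch))

print(runs)      # [1000, 1000, 500]
print(position)  # 2500
```

Each run picks up where the previous one stopped, so the board fills over several updates and no single run has to process the whole group.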
Re: nntp import of large groups creates troubles [message #20240 is a reply to message #20202] Mon, 04 October 2004 13:56
Ilia, Canada
Messages: 13241
Registered: January 2002
Karma: 0
Senior Member
Administrator
Core Developer
Good idea.

I've added the feature to CVS. You can find the patch here:
http://cvs.prohost.org/c/index.cgi/FUDforum/chngview?cn=2938

Don't forget to modify the table structure of the SQL nntp table with the new field.


FUDforum Core Developer
Re: nntp import of large groups creates troubles [message #20251 is a reply to message #20202] Tue, 05 October 2004 09:28
devnull, United States
Thanks for the patch!

But... are you sure it works?

I patched the files and modified the SQL table (set it to 500), but when I run scripts/nntp.php, it quits without doing anything.

I'm using the current release with zlib. If it works for you, it must be my config and I have to reinstall. When I go back to the unpatched version, the script runs fine (but times out as we know).

I just want to avoid that, so I'm asking first :)
Re: nntp import of large groups creates troubles [message #20254 is a reply to message #20251] Tue, 05 October 2004 15:18
Ilia, Canada
There was a little buglet in the code which the following patch fixes:
http://cvs.prohost.org/c/index.cgi/FUDforum/chngview?cn=2945
You may need to clear the state files inside errors/.nntp for the newsgroup you were importing.


Re: nntp import of large groups creates troubles [message #20273 is a reply to message #20202] Tue, 05 October 2004 21:46
devnull, United States
Great! Thanks for going through all this :D

After the second install I made tarballs of the board and dumped the database; this way, I can quite easily go back to a fresh version with custom template and complete config. Helped a lot during testing :)

So far, the double-patched version runs fine. Interestingly, according to the board, the forums usually have 501 posts (I import in steps of 500). I also spotted counts of 502 and 500, but that doesn't bug me now.

One last thing I need some clarity about:
In errors/.nntp/, I still see .lock files for each group. From my understanding, those locks should only exist during importing. Would it work if I simply rm -f *.lock after the update script has finished? I guess those leftover locks only cause problems.
Re: nntp import of large groups creates troubles [message #20276 is a reply to message #20273] Tue, 05 October 2004 23:09
Ilia, Canada
Not quite. Lock files are harmless and while they can be safely removed there is absolutely no need to do so.

Re: nntp import of large groups creates troubles [message #20291 is a reply to message #20276] Wed, 06 October 2004 19:25
devnull, United States
Ahh, ok. I've had "experiences" with leftover locks on several occasions, so I wanted to make sure.

One last idea:
Although I ran nntp.php with a limit of 500 posts, it crashed while importing 3 newsgroups, leaving no file in errors/.nntp/. I think the value in those files is the first postid returned from a GROUP request plus the number of already imported posts, right?

It might create a bit more file activity (although Linux handles this in swap), but apart from that, should I change the script to write the file after every imported post? Or are there other reasons to leave it as it is?

In case the groups are read-only on the board (like in my case), I think I could also fix this with a Perl script.
Re: nntp import of large groups creates troubles [message #20293 is a reply to message #20291] Wed, 06 October 2004 20:27
Ilia, Canada
The "position" file must contain only a single number, the ID of the last imported message. So to write to this file after every message, you'd need to clear the file and write to it again.
This would be fairly slow to do for every single message.

Perhaps it would be more prudent to discover why PHP is crashing and address that problem. Now that you have a message import limit, you have an easy mechanism to prevent the script from timing out or eating too much CPU.
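
For illustration, a position file holding a single number could be rewritten per message along these lines. This is a Python sketch, not the forum's PHP code; `write_position`, `read_position` and the file name are hypothetical. It uses a write-to-temp-file-then-rename approach instead of clearing the file in place, so a crash between the two steps cannot leave an empty position file behind.

```python
import os
import tempfile


def write_position(path, article_id):
    """Replace the position file so that it holds only article_id."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(str(article_id))
    # Rename over the old file; atomic on POSIX within one filesystem.
    os.replace(tmp_path, path)


def read_position(path):
    """Return the stored id, or 0 if no position file exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


# Hypothetical usage in a scratch directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example.group")
write_position(demo_path, 41)
write_position(demo_path, 42)
print(read_position(demo_path))  # 42
```

The extra file write per message is the cost mentioned above; whether it matters depends on how many messages are imported per run.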


Re: nntp import of large groups creates troubles [message #20375 is a reply to message #20202] Mon, 11 October 2004 12:21
devnull, Germany
For a quick solution, I just changed the script so that it writes the last article-id after every retrieved article. Time is something I don't have much right now, so I chose a quick and dirty solution.

Most groups were imported fine that way, and after several runs, up to date.

A few seemed to be stuck; although there were still a lot of articles to import, the script only increased the count by 1. I decided to tcpdump a single run to search for the possible cause.

I've isolated one case where the counter only increases by 1 and checked the article-id file over several runs. The ID does not change, although according to the forum view, one message was imported. The only change I made to the nntp.inc file was to add the following at line 452: $this->set_end($i);

Now, when looking into the dump, I saw that the article with the ID in the file was retrieved. Then the script requested the next ID, and this article was very long (10406 lines). Here the script stopped and the ID in the file did not change. However, the first message is added to the database; I forgot to check that for some time and now have a lot of dupes (reminds me, is there a dupe search? :D )

I assume it's a problem with the size of the messages.

Attached is the conversation for two messages. The second message is requested, but not added. Then the script ends.
  • Attachment: posts.txt
    (Size: 407.06KB, Downloaded 1531 times)
Re: nntp import of large groups creates troubles [message #20380 is a reply to message #20375] Mon, 11 October 2004 18:43
Ilia, Canada
There is no "dupe search", but I guess you can use the regular search engine.
The solution to your problem would be to move the $this->set_end() before $this->get_message().

This would mean that if a message was not imported, the forum will start from the next message on the next run rather than trying to import the failing message yet again.
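
The effect of that ordering can be shown with a small simulation. This is a hedged Python sketch, not the code in nntp.inc; `run_import` and `fetch` are hypothetical stand-ins, and a `MemoryError` on article 3 stands in for the fatal error that ends the script.

```python
def fetch(article_id):
    # Hypothetical fetcher: article 3 is too large to process.
    if article_id == 3:
        raise MemoryError("article too large")
    return "body %d" % article_id


def run_import(ids, start_after, checkpoint_first):
    """Simulate one run; return (imported ids, saved position)."""
    position = start_after
    imported = []
    for article_id in ids:
        if article_id <= start_after:
            continue
        if checkpoint_first:
            position = article_id  # position saved before fetching
        try:
            fetch(article_id)
        except MemoryError:
            break                  # the run ends here
        imported.append(article_id)
        if not checkpoint_first:
            position = article_id  # position saved after a successful import
    return imported, position


ids = [1, 2, 3, 4, 5]

# Saving the position after the import: every later run is stuck on article 3.
late_run1 = run_import(ids, 0, checkpoint_first=False)
late_run2 = run_import(ids, late_run1[1], checkpoint_first=False)
print(late_run1, late_run2)    # ([1, 2], 2) ([], 2)

# Saving the position first: article 3 is skipped and the next run moves on.
early_run1 = run_import(ids, 0, checkpoint_first=True)
early_run2 = run_import(ids, early_run1[1], checkpoint_first=True)
print(early_run1, early_run2)  # ([1, 2], 3) ([4, 5], 5)
```

The trade-off is that the failing article is never imported, which is acceptable only if losing an occasional oversized message is preferable to a stalled group.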


Re: nntp import of large groups creates troubles [message #20396 is a reply to message #20380] Tue, 12 October 2004 13:38
devnull, Germany
Thanks!

Now that I don't direct the output to /dev/null anymore, I came across the following message when the update process stops:

<b>Fatal error</b>: Allowed memory size of 8388608 bytes exhausted (tried to allocate 40 bytes) in <b>/home/web/board/include/theme/default/wordwrap.inc</b> on line <b>27</b><br />

Raising the memory limit in php.ini to e.g. 16MB gives the same result; it just fills the 16777216 bytes instead.

When things are a bit quieter here, I think I'll write a dupe search in Perl. I have to take a closer look at the db for that first.

A suggestion: avoiding dupes could quite easily be added to the board if an MD5 is generated for each message upon posting. The chances of two different posts having the same MD5 are practically zero. If the field in the db is set to unique, no dupe could be added.

[Updated on: Tue, 12 October 2004 13:40]
Re: nntp import of large groups creates troubles [message #20397 is a reply to message #20396] Tue, 12 October 2004 13:46
Ilia, Canada
Actually md5 is pretty unreliable. The forum instead natively stores a unique id assigned to each message by NNTP. You can use this field to find and remove dupes.
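
A dupe search based on that id could look roughly like this. It is a Python sketch over already-fetched rows; the `(post_id, message_id)` pairs and the sample values are made up, and the real table and column names in the forum's database are not shown in this thread.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Given (post_id, message_id) pairs, return post ids that are removal candidates.

    The lowest post id per NNTP Message-ID is kept; the rest are listed.
    """
    by_message_id = defaultdict(list)
    for post_id, message_id in rows:
        by_message_id[message_id].append(post_id)

    duplicates = []
    for post_ids in by_message_id.values():
        post_ids.sort()
        duplicates.extend(post_ids[1:])
    return sorted(duplicates)


# Made-up sample rows: the first Message-ID was imported three times.
rows = [
    (10, "<a@example.org>"),
    (11, "<b@example.org>"),
    (12, "<a@example.org>"),
    (13, "<a@example.org>"),
]
print(find_duplicates(rows))  # [12, 13]
```

It would be wise to review the resulting list before deleting anything, since replies or attachments may be tied to a particular copy.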
