Re: I Need to search over 100 largeish text documents efficiently. What's the best approach? [message #184745 is a reply to message #184743] Mon, 27 January 2014 17:05
Ben Bacarisse
Arno Welzel <usenet(at)arnowelzel(dot)de> writes:

> Am 27.01.2014 02:43, schrieb Denis McMahon:
>
>> On Sun, 26 Jan 2014 05:34:21 -0800, rob.bradford2805 wrote:
>>
>>> What is the best/fastest approach to scan 100+ largish text files for
>>> word strings
>>
>> A quick googling finds:
>>
>> http://sourceforge.net/projects/php-grep/
>> http://net-wrench.com/download-tools/php-grep.php
>>
>> Claims to be able to search 1000 files in under 10 secs
>
> Under ideal conditions - maybe. But if each file is more than 1 MB, it
> is barely possible to even read this amount of data in just 10 seconds
> (assuming around 80 MB/s and 1000 MB of data to be searched).

There are so many variables here: we don't know what "largeish" means,
and we don't know what sort of grep is being done (counting lines,
displaying lines, or just finding matching files).

Anyway, I tried a couple of these out. With the most naive PHP grep
imaginable, I can do the equivalent of grep -l (to find matching files)
in reasonable time. With 987M of data in 201 files of about 5M each, it
takes about 0.5 seconds in PHP compared to about 0.15 seconds for grep.
I reckon that if that is what is being done, then it's well worth it as
a temporary solution.
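
The code used for this timing is not shown in the post. As a rough
illustration only, a naive grep -l equivalent might look like the sketch
below. It assumes fixed-string (not regex) matching and files small
enough to read whole into memory; the function name files_containing is
hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): return the paths
// whose contents include $needle, roughly like `grep -lF`.
function files_containing(string $needle, array $paths): array
{
    if ($needle === '') {
        // strpos() treats an empty needle differently across PHP versions.
        throw new InvalidArgumentException('needle must not be empty');
    }
    $matches = [];
    foreach ($paths as $path) {
        // Read the whole file at once; unreadable files are skipped.
        $text = @file_get_contents($path);
        if ($text !== false && strpos($text, $needle) !== false) {
            $matches[] = $path;
        }
    }
    return $matches;
}
```

Something like files_containing('needle', glob('/some/dir/*.txt')) would
then list the matching files. Unlike grep -l, this sketch reads each
file to the end before searching it, rather than stopping at the first
match.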

Counting matching lines is much slower: about 17s for PHP vs 1s for
grep. Although the server is likely to be faster than my laptop, the
ratios might be useful data.
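
Again the timed code is not shown. A minimal sketch of a line-counting
version (the equivalent of grep -c with a fixed string) is given below,
assuming the file is read line by line with fgets; the name
count_matching_lines is hypothetical. One plausible reason for the
larger gap, which is a guess and was not measured here, is that this
version runs interpreted PHP code once per line rather than once per
file.

```php
<?php
// Hypothetical helper (not from the original post): count the lines of
// $path that contain $needle, roughly like `grep -cF`.
function count_matching_lines(string $needle, string $path): int
{
    if ($needle === '') {
        throw new InvalidArgumentException('needle must not be empty');
    }
    $handle = @fopen($path, 'r');
    if ($handle === false) {
        return 0; // Treat unreadable files as having no matches.
    }
    $count = 0;
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $count++;
        }
    }
    fclose($handle);
    return $count;
}
```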

--
Ben.