Re: I Need to search over 100 largeish text documents efficiently. What's the best approach? [message #184744 is a reply to message #184743]
Mon, 27 January 2014 12:23 |
Denis McMahon
On Mon, 27 Jan 2014 10:58:42 +0100, Arno Welzel wrote:
> On 27.01.2014 02:43, Denis McMahon wrote:
>
>> On Sun, 26 Jan 2014 05:34:21 -0800, rob.bradford2805 wrote:
>>
>>> What is the best/fastest approach to scan 100+ largish text files for
>>> word strings
>>
>> A quick googling finds:
>>
>> http://sourceforge.net/projects/php-grep/
>> http://net-wrench.com/download-tools/php-grep.php
>>
>> Claims to be able to search 1000 files in under 10 secs
>
> Under ideal conditions - maybe. But if each file is more than 1 MB, it
> is barely possible to even read this amount of data in just 10 seconds
> (assuming around 80 MB/s and 1000 MB of data to be searched).
>
> Even using a simple word index (word plus the name of the file(s) and
> the position(s) where the word is located) would be the better solution.
Indeed, the fastest solution would be to index each file when it changes,
and keep the indexes in a db.
Perhaps there are common words you wouldn't index; in English these might
include:

a the in on an this that then ....
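
Roughly what I mean, as an untested PHP sketch. It assumes SQLite via PDO
and a made-up "word_index" table holding word, file and position; adjust
the schema and stop word list to taste.

<?php
// Rough indexing sketch (untested): tokenise each file, drop the common
// words, and store the remaining words with their positions in SQLite.
// "word_index" is a made-up table name, not anything standard.

$stopWords = ['a', 'the', 'in', 'on', 'an', 'this', 'that', 'then'];

$db = new PDO('sqlite:index.db');
$db->exec('CREATE TABLE IF NOT EXISTS word_index
           (word TEXT, file TEXT, position INTEGER)');

function indexFile(PDO $db, array $stopWords, $file)
{
    // Re-indexing a changed file: throw away its old rows first.
    $db->prepare('DELETE FROM word_index WHERE file = ?')->execute([$file]);

    $words = preg_split('/\W+/', strtolower(file_get_contents($file)),
                        -1, PREG_SPLIT_NO_EMPTY);

    $ins = $db->prepare(
        'INSERT INTO word_index (word, file, position) VALUES (?, ?, ?)');

    $db->beginTransaction();
    foreach ($words as $pos => $word) {
        if (!in_array($word, $stopWords, true)) {
            $ins->execute([$word, $file, $pos]);
        }
    }
    $db->commit();
}

foreach (glob('docs/*.txt') as $file) {
    indexFile($db, $stopWords, $file);
}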
Then, if you have a search phrase, remove the common words from it and look
for the remaining uncommon words in close proximity to each other.
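
The lookup side might look something like this (again just an untested
sketch against the same made-up word_index table; the 10-word proximity
window is an arbitrary choice):

<?php
// Rough lookup sketch (untested): drop the common words from the phrase,
// pull positions of the remaining words per file, and keep files where
// they all occur within a short window of each other.

$stopWords = ['a', 'the', 'in', 'on', 'an', 'this', 'that', 'then'];
$db = new PDO('sqlite:index.db');

function search(PDO $db, array $stopWords, $phrase, $window = 10)
{
    // Tokenise the phrase and drop the common words.
    $terms = array_values(array_unique(array_diff(
        preg_split('/\W+/', strtolower($phrase), -1, PREG_SPLIT_NO_EMPTY),
        $stopWords)));
    if (!$terms) {
        return [];
    }

    $marks = implode(',', array_fill(0, count($terms), '?'));
    $stmt = $db->prepare("SELECT file, word, position FROM word_index
                          WHERE word IN ($marks) ORDER BY file, position");
    $stmt->execute($terms);

    // Group hits by file, then look for any $window-word stretch that
    // contains every remaining search term at least once.
    $byFile = [];
    foreach ($stmt as $row) {
        $byFile[$row['file']][(int)$row['position']] = $row['word'];
    }

    $matches = [];
    foreach ($byFile as $file => $positions) {
        ksort($positions);
        foreach (array_keys($positions) as $start) {
            $seen = [];
            foreach ($positions as $pos => $word) {
                if ($pos >= $start && $pos < $start + $window) {
                    $seen[$word] = true;
                }
            }
            if (count($seen) === count($terms)) {
                $matches[] = $file;
                break;
            }
        }
    }
    return $matches;
}

print_r(search($db, $stopWords, 'the quick brown fox'));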
It might also help to know more about the search itself: is it using complex
regexps, or is it a simple string search done externally using grep?
--
Denis McMahon, denismfmcmahon(at)gmail(dot)com