Re: no results when a search includes a numerical character [message #40018 is a reply to message #25783]
Fri, 04 January 2008 11:41
srchild
Messages: 88 Registered: December 2003 Location: UK
Karma: 1
Member
Ilia wrote on Tue, 21 June 2005 14:12 | The indexing of the text is done by code inside isearch.inc.t, specifically the text_to_worda() function.
|
Is it just a matter of changing
default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1));
to
default:
$t1 = array_unique(str_word_count(strip_tags(strtolower($text)), 1,1234567890));
(and then rebuilding the search index)?
TIA
Simon Child

Re: no results when a search includes a numerical character [message #40854 is a reply to message #40031]
Sat, 19 April 2008 23:12
srchild
Ilia wrote on Sun, 06 January 2008 17:14 | You will need PHP 5.1.0 or greater since the feature was introduced (by me, BTW) in that release.
|
OK, I'm ready to make use of your feature now :)
The server is scheduled for an upgrade to PHP 5 next week, and then I want to enable searching on numbers as described above.
The part that concerns me is rebuilding the search index, which is said to take a 'long time'. So how long is a 'long time'? I know you can't answer that, but... :)
My msg_1 file is 120 MB. The current fud_index table has 4,500,000 rows. The server is lightly loaded (load average usually below 1), and I have just upgraded the RAM to 1.5 GB, so it does not need the pagefile.
Are we talking 30 minutes, or 10 hours, or what? It's a managed server and has a PHP timeout which I can't increase (I think it is 100 minutes). Can I run this reindex from the command line so it runs faster?
What happens if the index is only partially rebuilt? Then I would have no usable index at all... Would it then be a case of restoring the old fud_index table from backup?
I assume the forum is unavailable during the rebuild?
Thanks
Simon Child

Re: no results when a search includes a numerical character [message #40897 is a reply to message #40876]
Sat, 26 April 2008 10:30
srchild
Ilia wrote on Tue, 22 April 2008 18:54 | Well, the speed will be pretty minimal, so you may as well consider it disabled...
|
Well...
It only took 40 minutes. I ran it from the command line, niced, and it didn't bring down the server (though the forum wouldn't load during this time).
However, it hasn't indexed numbers.
Before reindexing, having updated the template and rebuilt the theme, I waited a couple of days to check that new posts containing numbers were being indexed, and they were. Some new posts with strings including numbers could be found by searching for those strings, e.g. GP2GP.
After rebuilding the index, the search no longer finds those posts.
The rebuild appears to have been successful. It cleared the original index, and after the reindex fud_index contains the same number of records as before, and searches for standard text-only strings (e.g. 'test') still work. It just didn't index numbers.
So, does the index rebuild script not make use of text_to_worda()? Do I have to make changes somewhere else as well?
Using PHP 5.2.5, MySQL 5.
Thanks
Simon Child

Re: no results when a search includes a numerical character [message #40925 is a reply to message #40919]
Mon, 28 April 2008 23:17
srchild
Ilia wrote on Mon, 28 April 2008 22:49 | It does use the text_to_worda() function, but perhaps the numbers were shorter than the minimum word length?
|
That specific 'word' (GP2GP) was not being indexed before I rebuilt the theme.
After I made the changes to text_to_worda(), some posts containing it were indexed and could be found by searching on GP2GP. (GP2GP may not mean anything to you, but it is of interest to my forum visitors!)
After I rebuilt the index, no posts including numbers were findable in a search, including those containing GP2GP which were indexed before I rebuilt.
Since I rebuilt, two new posts containing GP2GP have been indexed and can be found in a search for GP2GP, but that is all.
Does the index rebuild use a different minimum word length? Where is the word length set? Incidentally, if you mean the MySQL fulltext search word length, I have that set to three characters.
Thanks
Simon Child

Re: no results when a search includes a numerical character [message #40973 is a reply to message #40951]
Sat, 03 May 2008 11:02
srchild
Ilia wrote on Thu, 01 May 2008 00:38 | In your case you need to change the call to function str_word_count()
making it into
array_unique(str_word_count(strtolower($text), 1, '0123456789'));
|
Hmm, I'd missed off the quotes:
array_unique(str_word_count(strtolower($text),1,1234567890));
But after putting them back, rebuilding the theme, and checking that include/theme/default/isearch.inc has been updated (it has), so that it now reads
array_unique(str_word_count(strtolower($text),1,'1234567890'));
and then rebuilding the search index, 'GP2GP' is still not indexed in the rebuilt index, even though new posts containing it are indexed as they arrive, and the index rebuild otherwise works.
I see that indexdb.php does include isearch.inc, but it must be doing something different with it?
Simon Child
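For reference, the effect of the third argument to str_word_count() discussed above can be checked in isolation. This is only a minimal sketch: the sample text is made up for illustration and is not taken from the forum data, and it assumes the default "C" locale.

```php
<?php
// Sample text is invented for this illustration; 'GP2GP' is the term from this thread.
$text = 'Testing GP2GP transfer';

// Without a character list, digits are not word characters,
// so 'gp2gp' is cut in two at the '2'.
$plain = str_word_count(strtolower($text), 1);

// With the digits passed as a quoted string (third argument, PHP 5.1.0+),
// 'gp2gp' is kept as a single word.
$withDigits = str_word_count(strtolower($text), 1, '0123456789');

print_r($plain);
print_r($withDigits);
```

If the second print_r() shows 'gp2gp' as one entry, the character list is being applied as intended by that call, and any remaining problem is likely to lie elsewhere.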

Re: no results when a search includes a numerical character - FIXED [message #41036 is a reply to message #41018]
Sun, 11 May 2008 10:32
srchild
Ilia wrote on Thu, 08 May 2008 00:47 | Set locale to C.
|
I found the locale was already being detected as C.
However, stepping through the code, I have found the cause: a bug in the function text_to_worda():
function text_to_worda($text)
{
    $a = array();
    /* if no good locale, default to splitting by spaces */
    if (!$GLOBALS['good_locale']) {
        $GLOBALS['usr']->lang = 'latvian';
    }
    $text = strip_tags(reverse_fmt($text));
    while (1) {
        switch ($GLOBALS['usr']->lang) {
            case 'chinese_big5':
            case 'chinese':
            case 'japanese':
            case 'korean':
                return mb_word_split($text, $GLOBALS['usr']->lang);
                break;
            case 'latvian':
            case 'russian-1251':
                $t1 = array_unique(preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY));
                break;
            default:
                $t1 = array_unique(str_word_count(strtolower($text),1,'1234567890'));
                if ($text && !$t1) { /* fall through to split by special chars */
                    $GLOBALS['usr']->lang = 'latvian';
                    continue;
                }
                break;
        }
The first time through, it finds the locale as C and the language as English, and so, as desired, goes to 'default':
array_unique(str_word_count(strtolower($text),1,'1234567890'));
However, if any message makes it fall through this:
if ($text && !$t1) { /* fall through to split by special chars */
$GLOBALS['usr']->lang = 'latvian';
continue;
}
then $GLOBALS['usr']->lang is set to Latvian and this persists for the rest of the reindex, affecting parsing of every subsequent message.
When indexing a single message it wouldn't matter that $GLOBALS['usr']->lang gets set to Latvian, since the next message would be a fresh start with it set to English once more. But because the reindex runs through all messages in one script, every subsequent message is processed as though its language were Latvian.
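Why being processed as Latvian loses the numbers can be seen from the split pattern in that branch. The sketch below runs the same preg_split() call on a made-up sample string (the string is an assumption for illustration only).

```php
<?php
// Pattern copied from the 'latvian' / 'russian-1251' branch: split on any run
// of bytes in the range 0x00-0x40. That range covers control characters,
// space, punctuation AND the digits '0'-'9' (0x30-0x39).
$sample = 'posts about GP2GP';   // invented sample text

$words = preg_split('![\x00-\x40]+!', $sample, -1, PREG_SPLIT_NO_EMPTY);

// 'GP2GP' comes back as two separate 'GP' fragments; the digit is discarded.
print_r($words);
```

So once the language has been switched, a term such as GP2GP can never reach the index as a single word, whatever character list is given to str_word_count() in the default branch.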
So I just changed three lines like this:
function text_to_worda($text)
{
    $a = array();
    /* if no good locale, default to splitting by spaces */
    if (!$GLOBALS['good_locale']) {
        $GLOBALS['usr']->lang = 'latvian';
    }
    // use local variable for message language
    $thismessagelang = $GLOBALS['usr']->lang;
    $text = strip_tags(reverse_fmt($text));
    while (1) {
        // switch ($GLOBALS['usr']->lang) {
        // switch on message language
        switch ($thismessagelang) {
            case 'chinese_big5':
            case 'chinese':
            case 'japanese':
            case 'korean':
                return mb_word_split($text, $GLOBALS['usr']->lang);
                break;
            case 'latvian':
            case 'russian-1251':
                $t1 = array_unique(preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY));
                break;
            default:
                $t1 = array_unique(str_word_count(strtolower($text),1,'1234567890'));
                if ($text && !$t1) { /* fall through to split by special chars */
                    // if resetting language, do it locally not globally
                    // $GLOBALS['usr']->lang = 'latvian';
                    $thismessagelang = 'latvian';
                    continue;
                }
                break;
        }
This seems to have fixed it for me, my index now includes numbers as required.
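The difference between the two versions can be reproduced with a much smaller model. This is a simplified sketch, not the FUDforum code: $GLOBALS['lang'], split_words(), index_buggy() and index_fixed() are hypothetical names standing in for $GLOBALS['usr']->lang and text_to_worda(), and '***' is just a made-up message that yields no words in the default branch.

```php
<?php
// Hypothetical stand-in for $GLOBALS['usr']->lang.
$GLOBALS['lang'] = 'english';

// Hypothetical helper: the two splitting strategies from the thread.
function split_words($text, $lang)
{
    if ($lang === 'latvian') {
        // Splits on bytes 0x00-0x40, which includes the digits.
        return array_unique(preg_split('![\x00-\x40]+!', $text, -1, PREG_SPLIT_NO_EMPTY));
    }
    return array_unique(str_word_count(strtolower($text), 1, '0123456789'));
}

// Buggy pattern: the fallback overwrites the shared global.
function index_buggy($text)
{
    $words = split_words($text, $GLOBALS['lang']);
    if ($text && !$words) {
        $GLOBALS['lang'] = 'latvian';   // leaks into every later call
        $words = split_words($text, 'latvian');
    }
    return array_values($words);
}

// Fixed pattern: the fallback only changes a local copy.
function index_fixed($text)
{
    $lang = $GLOBALS['lang'];
    $words = split_words($text, $lang);
    if ($text && !$words) {
        $lang = 'latvian';              // affects this message only
        $words = split_words($text, $lang);
    }
    return array_values($words);
}

// Reindex-style run: several messages processed in one script.
$before = index_buggy('GP2GP');   // indexed as one word
index_buggy('***');               // no words found, triggers the fallback
$after = index_buggy('GP2GP');    // now split on the digit

$GLOBALS['lang'] = 'english';     // reset the model, then repeat with the fix
index_fixed('***');
$fixedAfter = index_fixed('GP2GP');

print_r(array($before, $after, $fixedAfter));
```

The model uses a plain if rather than the while/switch construct. One reason is that in PHP a bare continue inside a switch behaves like break, so restarting the enclosing while loop from inside the switch would need continue 2; whether that matters for text_to_worda() depends on the part of the function not shown in this thread.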
Thanks
Simon Child