phpBB2.de Foren-Übersicht  
[netclusive - internet broadcasting]
 FAQ  •  Suchen  •  Download  •  Lesezeichen  •  Mitgliederliste  •  Benutzergruppen   •  Registrieren  •  Profil  •  Einloggen, um private Nachrichten zu lesen  •  Login
 
        
 
        
 

LinkLift

Nächstes Thema anzeigen
Vorheriges Thema anzeigen

Dieses Forum ist gesperrt, du kannst keine Beiträge editieren, schreiben oder beantworten.Dieses Thema ist gesperrt, du kannst keine Beiträge editieren oder beantworten.
Autor Nachricht
R. U. Serious
phpBB2.de User
phpBB2.de User



Anmeldungsdatum: 06.08.2002
Beiträge: 48

BeitragVerfasst am: So 11 Aug, 2002 09:56 Antworten mit ZitatNach oben 

Kurzes Vorwort: Ich hab das für phpbb.com zusammengeschrieben. Ich finde es von der Sprache her relativ einfach verständlich. Hoffe ihr kommt klar. Aber das ganze nochmal zu übersetzen bin ich im Moment zu faul für. Very Happy (Aber wenn es jemand der alles versteht machen will, nur zu Smile)

Overview
  1. Introduction
  2. Fix functions_search.php (in 2.0.1)
  3. Make a search_stopwords for heavily indexed words.
  4. Delete search_stopwords from your searchtables

  1. Introduction
    The search function of phpBB uses two tables: search_wordlist which has a unique entry for each word that has ever been typed in your forum, and search_wordmatch which keeps track of which posts match which words from the search_wordlist.
    This is great for speed when searching, but has its drawback in the size of the searchtables. If you have more than enough db-space then don't mind reading this HowTo till the end. It will be of no/little relevance to you.. If however you are interested in reducing you db-size this is just for you! However you will need some knowledge in executing queries in your db (e.g. through phpmyadmin).
    With steps I describe in this HowTo I reduced my total db-size between 15-20%. (More specifically about 1 MB for a 3800 posts-forum (6.5MB to 5.4 MB)). Your results may vary greatly. This is not one of those "Loose weight fast"-scams; No, this really(!) works. I know, it sounds really cheep, but you wouldn't guess how many ppl. fall for these lines... *g*

    phpBB already brings functionality to avoid unnecessary growing of the searchtables. For exapmle it will mark 'heavily used' words with a common_word tag in the search_wordlist and stop adding further matches of this word in the search_wordmatch. For me this was only the case for a single word, so it is more like an "emergency exit", seldomly used, but good to have Wink
    Another thing phpBB does (and which I will concentrate on) is: it can use (and already uses, if you have english as default language) a blacklist of words, and all words on this list are not indexed at all, never. This blacklist is stored in a simple .txt file with one word per line (language/lang_esperanto/search_stopwords.txt). A list for the english language is already delivered with the regular download from your preferred phpBB-Dealer (just kidding Mr. Green Only d/l phpBB from this site: www.phpbb.com ! Wink ).

    This HowTo will tell you how to make a little fix so that this list gets actually used. I submitted it as a bug report and guess the fix will be in 2.0.2.
    Further it shows you how to find out the most used (and thereby) indexed words. And finally it provides you with a little script that will remove all search_entries from your list of words in search_stopwords.txt.


  2. Fix functions_search.php (in 2.0.1)
    This fix will probably be in 2.0.2. It was not in CVS 10. Aug. '02. If you use an older Version, you probably will have to make this fix.
    Open includes/functions_search.php and go to line 108 and 109, you will find
    Code:
       $search_raw_words['text'] = split_words(clean_words('post', $post_text, $stopword_array, $synonym_array));
       $search_raw_words['title'] = split_words(clean_words('post', $post_title, $stopword_array, $synonym_array));
    Please replace with
    Code:
       $search_raw_words['text'] = split_words(clean_words('post', $post_text, $stopwords_array, $synonym_array));
       $search_raw_words['title'] = split_words(clean_words('post', $post_title, $stopwords_array, $synonym_array));
    The change is very subtle, a single 's' per line in the variable $stopwords_array. edit: Geez, looks like I had mixed it up, it's corrected now. Thanks Ashe!
    This change is necessary so phpBB does not index words on the blacklist. Because of the typo the blacklist was always assumed to be empty.

  3. Make a search_stopwords for heavily indexed words.
    If you are using the english language as default, you may possibly skip this point, although I suggest you don't, because words used at different sites vary. At the two sites I am involved with there were a lot of different Top-Used-Words.

    To make the list very effective you don't want it to be unnecessarily long and at the same time include words that are part of the commonly used language. So they are used very often, yet do not indicate any kind of special information, for example " the, they, can, you, although, almost" etc. In German my Top-Used-Words were "der, die, das, ich" (meaning the, the, the, I ). To find out your list of Top-Used-Words you can run this query on your DB (e.g. in phpmyadmin):
    Code:

    SELECT ls.word_id, ls.word_text, COUNT(wm.word_id)  as entries FROM `phpbb_search_wordlist` as ls LEFT JOIN `phpbb_search_wordmatch` as wm ON ls.word_id=wm.word_id GROUP BY wm.word_id ORDER BY entries DESC LIMIT 0,50

    This will return a list of your 50 Top-Used-Words in descending order of their occurence, with three columns:
    • the word_id - this is unique for every word and indicates the relevant entries in the two tables.
    • word_text - this shows you the actual word
    • entries - the number of entries in search_wordmatch for this word.

    Now look at each entry in that list and decide wether you want to keep it in your index or if you are going to delete it and make searching for it impossible. I used all but a handful of the words.

    Save these words in normal text file with one word per line and name that file search_stopwords.txt. Now put this list in the directory: language/lang_esperanto/search_stopwords.txt where esperanto is the language that is set as default-language in your general board-configuration in your AdminCP.

    Now these words will no longer be indexed for the searchfunction.

  4. Delete search_stopwords from your searchtables
    Now you want to remove the thousands of entries of the blacklisted searwords from your searchtables. And after that you should optmize the two tables.I have included an example script, that will do this. It is not pretty, but it works (for me). I have mySQL 3.23 and php 4.1, you might need to make modifications if you have a different setup.
    After you have run the script, you should delete it from the server (although it should not be dangerous to run it twice).

    That's it. Hopefully this works for you. Enjoy Smile

Code:

<?php
//*****  reduce_my_searchtables_with_stopwords.php ****//

define('IN_PHPBB', true);
$phpbb_root_path = './';
include($phpbb_root_path . 'extension.inc');
include($phpbb_root_path . 'common.'.$phpEx);
include($phpbb_root_path . 'includes/functions_search.'.$phpEx);

// Start session management
$userdata = session_pagestart($user_ip, PAGE_SEARCH);
init_userprefs($userdata);
// End session management

$stopwords_array = file($phpbb_root_path . 'language/lang_' . $board_config['default_lang'] . "/search_stopwords.txt");

$liste='';
foreach($stopwords_array as $curr_word)
{
      $liste .= ( ( $liste != '' ) ? ', ' : '' ) ."'".trim($curr_word)."'";
}

   $sql = "SELECT word_id
      FROM " . SEARCH_WORD_TABLE . "
      WHERE word_text IN ($liste)";
   if ( !($result = $db->sql_query($sql)) )
   {
      message_die(GENERAL_ERROR, 'Could not obtain common word list', '', __LINE__, __FILE__, $sql);
   }

   $common_word_id = '';
   while ( $row = $db->sql_fetchrow($result) )
   {
      $common_word_id .= ( ( $common_word_id != '' ) ? ', ' : '' ) . $row['word_id'];
   }

if ($common_word_id=='') message_die(GENERAL_ERROR,'None of the words in the list are in your search_tables.<br>Note: This could also mean the list is empty ;)');
   //echo '>'.trim($curr_word)."<<br>";
   //echo $liste .'<br>'. $common_word_id;
   //exit;

   $sql = "DELETE FROM " . SEARCH_WORD_TABLE . " 
      WHERE word_id IN ($common_word_id)";
   if ( !$db->sql_query($sql) )
   {
      message_die(GENERAL_ERROR, 'Could not delete word match entry', '', __LINE__, __FILE__, $sql);
   }
   $sql = "OPTIMIZE TABLE " . SEARCH_WORD_TABLE;
   if ( !$db->sql_query($sql) )
   {
      message_die(GENERAL_ERROR, 'Could not optimize', '', __LINE__, __FILE__, $sql);
   }

   $sql = "DELETE FROM " . SEARCH_MATCH_TABLE . " 
      WHERE word_id IN ($common_word_id)";
   if ( !$db->sql_query($sql) )
   {
      message_die(GENERAL_ERROR, 'Could not delete word match entry', '', __LINE__, __FILE__, $sql);
   }
   $sql = "OPTIMIZE TABLE " . SEARCH_MATCH_TABLE;
   if ( !$db->sql_query($sql) )
   {
      message_die(GENERAL_ERROR, 'Could not pütimize', '', __LINE__, __FILE__, $sql);
   }
   
   message_die(GENERAL_MESSAGE,'<b>Done!</b><br><br>The following list-entries have been removed from your searchtables:'.$liste);
//echo $liste .'<br>'. $common_word_id;

?>

Disclaimer: You yourself are fully responsible for what you do and wether you follow what some weirdo you never met tells you to. So don't be mad or hold someone else reponsible if somehow, something screws up you your forum while you are trying any of the things described here. Wink
OfflineBenutzer-Profile anzeigenPrivate Nachricht senden
Google







Verfasst am: Nach oben

Beiträge der letzten Zeit anzeigen:      
Dieses Forum ist gesperrt, du kannst keine Beiträge editieren, schreiben oder beantworten.Dieses Thema ist gesperrt, du kannst keine Beiträge editieren oder beantworten.

Nächstes Thema anzeigen
Vorheriges Thema anzeigen
Du kannst keine Beiträge in dieses Forum schreiben.
Du kannst auf Beiträge in diesem Forum nicht antworten.
Du kannst deine Beiträge in diesem Forum nicht bearbeiten.
Du kannst deine Beiträge in diesem Forum nicht löschen.
Du kannst an Umfragen in diesem Forum nicht mitmachen.
Du kannst Dateien in diesem Forum posten
Du kannst Dateien in diesem Forum herunterladen

Ähnliche Beiträge
Thema Autor Forum Antworten Verfasst am
Fatal error: Allowed memory size ZMM-Electronic Support allgemein 6 Mi 05 Sep, 2007 21:00 Letzten Beitrag anzeigen
Allowed memory size exhausted in port... bikesonshow Support Forum 1 Do 12 Jul, 2007 01:09 Letzten Beitrag anzeigen
Andere Programmiersprachen im Editor.... Robertico Support allgemein 1 Fr 15 Jun, 2007 09:22 Letzten Beitrag anzeigen
ShoutBox Size and Place... Rahulkagal Support Forum 1 Sa 21 Apr, 2007 14:52 Letzten Beitrag anzeigen
Allowed memory size - Forum zu voll??? Vectra18 Support allgemein 8 Mi 17 Jan, 2007 10:53 Letzten Beitrag anzeigen