com.swfit.core.search
Class ScandinavianAnalyzer

java.lang.Object
  |
  +--org.apache.lucene.analysis.Analyzer
        |
        +--com.swfit.core.search.ScandinavianAnalyzer

public final class ScandinavianAnalyzer
extends org.apache.lucene.analysis.Analyzer

The ScandinavianAnalyzer is a required

A ScandinavianAnalyzer is a filter intended to do as little as possible in the way of excluding stop words (there aren't any yet). The SWFIT package is not intended for gigabytes of data, just a few hundred HTML documents, and the only effect I am looking for here is that people without Scandinavian keyboards (and character sets) can find terms spelled with local characters.
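
To illustrate the effect described above, here is a minimal sketch of folding Scandinavian letters to plain ASCII so that a query typed as "blabaer" can match an indexed "blåbær". The class name ScandinavianFoldingSketch and the mapping table are assumptions made for illustration only; this page does not document which substitutions ScandinavianAnalyzer actually performs, and the sketch uses current Java rather than the Java of 2003.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of SWFIT: folds Scandinavian letters to ASCII. */
final class ScandinavianFoldingSketch {

    // Assumed mapping; the real table used by ScandinavianAnalyzer is not documented here.
    private static final Map<Character, String> FOLD = new HashMap<>();
    static {
        FOLD.put('\u00e5', "a");   // a with ring
        FOLD.put('\u00e4', "a");   // a with diaeresis
        FOLD.put('\u00e6', "ae");  // ae ligature
        FOLD.put('\u00f8', "o");   // o with stroke
        FOLD.put('\u00f6', "o");   // o with diaeresis
    }

    /** Lower-cases the term, then replaces each mapped letter with its ASCII form. */
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            String replacement = FOLD.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Bl\u00e5b\u00e6r"));        // prints "blabaer"
        System.out.println(fold("sm\u00f6rg\u00e5sbord"));   // prints "smorgasbord"
    }
}
```

Applying the same folding to both the indexed text and the query is what lets the two spellings meet; the price is that distinct words which differ only in these letters become indistinguishable.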

A word of advice: You are sincerely advised NOT to implement this code, as I know far too little about linguistics, Unicode, character sets - you name it - to make a valuable contribution to the field. Whatever I do here should have been done by a professional. Go back and read the last sentence. For all I know, this code will corrupt your data.

Since:
SWFIT1.0
Version:
$Revision: 1.2 $ $Date: 2003/02/03 05:57:34 $
Author:
Olaf Havnes

Constructor Summary
ScandinavianAnalyzer()
          Builds an analyzer with a default hashtable.
ScandinavianAnalyzer(java.util.Hashtable stop_table)
          Builds an analyzer with a Hashtable of given stop words
ScandinavianAnalyzer(java.lang.String[] stop_words)
          Builds an analyzer with an array of Strings of given stop words
 
Method Summary
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName, java.io.Reader reader)
          Constructs a StandardTokenizer filtered by
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ScandinavianAnalyzer

public ScandinavianAnalyzer()
Builds an analyzer with a default hashtable.

ScandinavianAnalyzer

public ScandinavianAnalyzer(java.lang.String[] stop_words)
Builds an analyzer with an array of Strings of given stop words

ScandinavianAnalyzer

public ScandinavianAnalyzer(java.util.Hashtable stop_table)
Builds an analyzer with a Hashtable of given stop words
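
The two stop-word constructors presumably differ only in the form of their input. As a rough sketch of how an array of stop words can be turned into a Hashtable for fast lookup, consider the following; the class StopTableSketch, its method names, and the sample words are hypothetical, and nothing here is confirmed as the conversion ScandinavianAnalyzer performs internally.

```java
import java.util.Hashtable;

/** Hypothetical helper, not part of SWFIT or Lucene. */
final class StopTableSketch {

    /** Stores each stop word as both key and value so that lookup is by key. */
    static Hashtable<String, String> makeStopTable(String[] stopWords) {
        Hashtable<String, String> table = new Hashtable<>();
        for (String word : stopWords) {
            table.put(word, word);
        }
        return table;
    }

    /** Returns true if the term is present in the stop table. */
    static boolean isStopWord(Hashtable<String, String> table, String term) {
        return table.containsKey(term);
    }

    public static void main(String[] args) {
        // Sample words chosen for illustration only.
        Hashtable<String, String> table = makeStopTable(new String[] {"og", "i", "det"});
        System.out.println(isStopWord(table, "og"));     // prints "true"
        System.out.println(isStopWord(table, "fjord"));  // prints "false"
    }
}
```

An empty array yields an empty table, which matches the description above that no stop words are excluded yet.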

Method Detail

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldName,
                                                                java.io.Reader reader)
Constructs a StandardTokenizer filtered by
Overrides:
tokenStream in class org.apache.lucene.analysis.Analyzer


Copyright © 2003 Orgdot AS. All Rights Reserved.