com.swfit.core.search
Class XHTMLPublishedObjectIndex

java.lang.Object
  |
  +--com.swfit.core.search.XHTMLPublishedObjectIndex

public class XHTMLPublishedObjectIndex
extends java.lang.Object

An XHTMLPublishedObjectIndex instantiates the resources needed to call the Lucene IndexWriter. An important point to remember is that the Analyzer used to store the data must be the same Analyzer used to search the data. For that reason, every XHTMLPublishedObjectIndex that is created is stored in a Hashtable, so that later requests for the same index directory get the same instance and the same Analyzer.
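The caching idea can be sketched as follows. This is a minimal illustration, not the actual Swfit source: the class name CachedIndex, the String analyzer name, and the choice of the directory path as the Hashtable key are all assumptions made for the example.

```java
import java.util.Hashtable;

// Hypothetical stand-in for XHTMLPublishedObjectIndex; only the caching idea is shown.
final class CachedIndex {
    // One instance per index directory (assumed key: the directory path).
    private static final Hashtable<String, CachedIndex> INDEXES = new Hashtable<>();

    final String directory;
    final String analyzerName; // stand-in for the Lucene Analyzer

    private CachedIndex(String directory, String analyzerName) {
        this.directory = directory;
        this.analyzerName = analyzerName;
    }

    // Return the cached instance for the directory, creating it on first use.
    // The method is synchronized so that the check-then-create step is atomic.
    static synchronized CachedIndex getIndexer(String directory, String analyzerName) {
        CachedIndex index = INDEXES.get(directory);
        if (index == null) {
            index = new CachedIndex(directory, analyzerName);
            INDEXES.put(directory, index);
        }
        return index;
    }
}
```

In this sketch a second request for the same directory returns the first instance, including its original analyzer, which is what keeps indexing and searching on the same Analyzer.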

Jakarta Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Jakarta Lucene is an open source project available for free download from Apache Jakarta:

     http://jakarta.apache.org/lucene/docs/index.html
 

All in all, this is my effort to understand the underlying structure of Lucene. I assume there are all sorts of smart things one can do to optimize the search process, but the FAQs and tutorials recommend staying on the safe side (better safe than clever). And there are a million things I need to figure out first.

Since:
SWFIT1.0
Version:
$Revision: 1.1 $ $Date: 2003/02/02 20:47:25 $
Author:
Olaf Havnes

Field Summary
static java.lang.String ANALYZER_STRING
           
static java.lang.String[] ANALYZER_STRINGS
           
static org.apache.lucene.analysis.Analyzer[] ANALYZERS
          Tools for setting an Analyzer from a parameter.
static org.apache.lucene.index.Term AUTHOR_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term CATEGORY_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term CREATED_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static int DEFAULT_ANALYZER
           
static org.apache.lucene.index.Term DISPLAYTEXT_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term FCREATED_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term FILE_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term FMODIFIED_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term ID_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term LINK_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term LINKTEXT_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static org.apache.lucene.index.Term MODIFIED_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static int SCANDINAVIAN_ANALYZER
           
static org.apache.lucene.index.Term SEARCHTEXT_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
static int STANDARD_ANALYZER
           
static org.apache.lucene.index.Term TITLE_TERM
          Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects
 
Method Summary
 org.apache.lucene.analysis.Analyzer analyzer()
           
static org.apache.lucene.analysis.Analyzer analyzer(int analyzer)
           
static int analyzer(java.lang.String analyzer)
           
 void closeReader()
          Close a reader to free resources and enable a writer to write new segments.
 void closeWriter()
          If the writer has the index locked for write, we need to close it to enable the reader to delete docs.
 void freeSearcher()
          Tells this XHTMLPublishedObjectIndex that it is OK to hand out the searcher again.
static XHTMLPublishedObjectIndex getIndexer(java.io.File index_directory, org.apache.lucene.analysis.Analyzer analyzer)
          Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in and the Analyzer to use.
static XHTMLPublishedObjectIndex getIndexer(java.lang.String index_directory)
          Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in.
static XHTMLPublishedObjectIndex getIndexer(java.lang.String index_directory, int analyzer)
          Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in and a parameter code for the Analyzer to use.
 org.apache.lucene.search.Searcher getSearcher()
          Check if the resources are unused, create a searcher if necessary, and hand it out.
static java.lang.String id(java.lang.String publish_sub_folder, java.lang.String xhtml_object_type, long creation_date)
           
 void indexXHTMLPublishedObject(XHTMLPublishedObjectList xpobj_list, XHTMLPublishedObject xpobj, boolean add)
           
 void indexXHTMLPublishedObjects(XHTMLPublishedObjectList xpobj_list, XHTMLPublishedObject[] xpobjs, boolean add)
           
 void outFileCheck()
          Go through all the outfiles and check that they actually exist.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ANALYZERS

public static final org.apache.lucene.analysis.Analyzer[] ANALYZERS
Tools for setting an Analyzer from a parameter.

An Analyzer should have no state. From the Official Lucene FAQ:

"An Analyzer object is merely a factory for TokenStream objects and in typical implementations it does not have any state so you can create one during program initialization and keep it in a static variable."

The following are tools for configuring the analyzer through the web.xml file.
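How the analyzer constants and the two static analyzer(...) lookups might fit together can be sketched like this. The constant values, the configuration names in ANALYZER_STRINGS, the fallback to DEFAULT_ANALYZER, and the use of plain Objects in place of Lucene Analyzer instances are all assumptions for illustration; this page does not document the real values.

```java
import java.util.Locale;

// Hypothetical sketch; not the actual Swfit implementation.
final class AnalyzerConfig {
    // Assumed codes, used as indexes into the arrays below.
    static final int STANDARD_ANALYZER = 0;
    static final int SCANDINAVIAN_ANALYZER = 1;
    static final int DEFAULT_ANALYZER = STANDARD_ANALYZER;

    // Assumed configuration names, index-aligned with the codes above.
    static final String[] ANALYZER_STRINGS = {"standard", "scandinavian"};

    // Stand-ins for shared, stateless Analyzer instances kept in a static array.
    static final Object[] ANALYZERS = {new Object(), new Object()};

    // Map a configured name (for example a web.xml parameter value) to a code.
    static int analyzer(String name) {
        if (name != null) {
            String wanted = name.trim().toLowerCase(Locale.ROOT);
            for (int i = 0; i < ANALYZER_STRINGS.length; i++) {
                if (ANALYZER_STRINGS[i].equals(wanted)) {
                    return i;
                }
            }
        }
        return DEFAULT_ANALYZER; // unknown or missing names fall back to the default
    }

    // Map a code to the shared instance, falling back to the default code.
    static Object analyzer(int code) {
        if (code < 0 || code >= ANALYZERS.length) {
            code = DEFAULT_ANALYZER;
        }
        return ANALYZERS[code];
    }
}
```

Because the instances live in a static array, every index configured with the same code shares one analyzer object, in line with the FAQ quotation above.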


STANDARD_ANALYZER

public static final int STANDARD_ANALYZER

SCANDINAVIAN_ANALYZER

public static final int SCANDINAVIAN_ANALYZER

DEFAULT_ANALYZER

public static final int DEFAULT_ANALYZER

ANALYZER_STRING

public static final java.lang.String ANALYZER_STRING

ANALYZER_STRINGS

public static final java.lang.String[] ANALYZER_STRINGS

TITLE_TERM

public static final org.apache.lucene.index.Term TITLE_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

AUTHOR_TERM

public static final org.apache.lucene.index.Term AUTHOR_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

CATEGORY_TERM

public static final org.apache.lucene.index.Term CATEGORY_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

LINK_TERM

public static final org.apache.lucene.index.Term LINK_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

LINKTEXT_TERM

public static final org.apache.lucene.index.Term LINKTEXT_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

CREATED_TERM

public static final org.apache.lucene.index.Term CREATED_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

MODIFIED_TERM

public static final org.apache.lucene.index.Term MODIFIED_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

FCREATED_TERM

public static final org.apache.lucene.index.Term FCREATED_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

FMODIFIED_TERM

public static final org.apache.lucene.index.Term FMODIFIED_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

FILE_TERM

public static final org.apache.lucene.index.Term FILE_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

DISPLAYTEXT_TERM

public static final org.apache.lucene.index.Term DISPLAYTEXT_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

SEARCHTEXT_TERM

public static final org.apache.lucene.index.Term SEARCHTEXT_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

ID_TERM

public static final org.apache.lucene.index.Term ID_TERM
Terms for storing, sorting, retrieving and displaying XHTMLPublishedObjects

Method Detail

analyzer

public final org.apache.lucene.analysis.Analyzer analyzer()

getIndexer

public static XHTMLPublishedObjectIndex getIndexer(java.lang.String index_directory)
                                            throws java.io.IOException
Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in. Indexes the default set of file types and uses the StandardAnalyzer. The method is static so that every request for an XHTMLPublishedObjectIndex first checks whether one already exists for that directory before a new instance is created.

Parameters:
index_directory - the path of the directory to store the index in.
Returns:
the XHTMLPublishedObjectIndex corresponding to the given directory.

getIndexer

public static XHTMLPublishedObjectIndex getIndexer(java.lang.String index_directory,
                                                   int analyzer)
                                            throws java.io.IOException
Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in and a parameter code for the Analyzer to use. Indexes the standard file types. The method is static so that every request for an XHTMLPublishedObjectIndex first checks whether one already exists for that directory.

Parameters:
index_directory - the path of the directory to store the index in.
analyzer - the int code of the Analyzer to use, such as STANDARD_ANALYZER or SCANDINAVIAN_ANALYZER.
Returns:
the XHTMLPublishedObjectIndex corresponding to the given directory.

getIndexer

public static final XHTMLPublishedObjectIndex getIndexer(java.io.File index_directory,
                                                         org.apache.lucene.analysis.Analyzer analyzer)
                                                  throws java.io.IOException
Build an XHTMLPublishedObjectIndex with the directory to store the Lucene index in and the Analyzer to use. Indexes the standard set of file types. The method is static so that every request for an XHTMLPublishedObjectIndex first checks whether one already exists for that directory.

Parameters:
index_directory - the File to store the index in.
analyzer - the Analyzer to use while indexing.
Returns:
the XHTMLPublishedObjectIndex corresponding to the given directory.

analyzer

public static final org.apache.lucene.analysis.Analyzer analyzer(int analyzer)

analyzer

public static final int analyzer(java.lang.String analyzer)

getSearcher

public final org.apache.lucene.search.Searcher getSearcher()
                                                    throws java.io.IOException,
                                                           java.lang.InterruptedException
Check if the resources are unused, create a searcher if necessary, and hand it out.

Returns:
the Searcher corresponding to this index.

freeSearcher

public final void freeSearcher()
Tells this XHTMLPublishedObjectIndex that it is OK to hand out the searcher again.
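The getSearcher()/freeSearcher() pair reads like a check-out and hand-back protocol. A minimal sketch of one way such a protocol can work is shown below. It assumes that getSearcher() blocks while the searcher is handed out (one plausible reason for the declared InterruptedException); the class name SearcherLease and the plain Object standing in for the Lucene Searcher are hypothetical.

```java
// Hypothetical sketch of a check-out / hand-back protocol; not the Swfit source.
final class SearcherLease {
    private Object searcher;       // stand-in for org.apache.lucene.search.Searcher
    private boolean inUse = false; // true while a caller holds the searcher

    // Wait until the searcher is free, create it lazily, and hand it out.
    synchronized Object getSearcher() throws InterruptedException {
        while (inUse) {
            wait();
        }
        if (searcher == null) {
            searcher = new Object();
        }
        inUse = true;
        return searcher;
    }

    // Mark the searcher as free again and wake up any waiting callers.
    synchronized void freeSearcher() {
        inUse = false;
        notifyAll();
    }
}
```

Under this reading, callers would pair the two calls in a try/finally block, so that freeSearcher() runs even when a search throws.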

closeReader

public final void closeReader()
                       throws java.io.IOException
Close a reader to free resources and enable a writer to write new segments. Any searcher built on this reader is discarded at the same time. (It is an open question whether there could be more than one searcher per reader.)

closeWriter

public final void closeWriter()
                       throws java.io.IOException
If the writer has the index locked for writing, it must be closed to enable the reader to delete documents. In addition, this method should be called, for instance by a servlet, after indexing a directory.
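The descriptions of closeReader() and closeWriter() imply that the reader and the writer are never open for modification at the same time. The sketch below illustrates that alternation under stated assumptions: the class ExclusiveHandles, the openReader()/openWriter() methods, and the Objects standing in for Lucene's IndexReader and IndexWriter are all hypothetical and not part of the documented API.

```java
// Hypothetical sketch of reader/writer alternation; not the Swfit source.
final class ExclusiveHandles {
    private Object reader; // stand-in for an open IndexReader
    private Object writer; // stand-in for an open IndexWriter

    // Opening the reader first closes the writer, releasing its write lock.
    synchronized Object openReader() {
        closeWriter();
        if (reader == null) {
            reader = new Object();
        }
        return reader;
    }

    // Opening the writer first closes the reader, so new segments can be written.
    synchronized Object openWriter() {
        closeReader();
        if (writer == null) {
            writer = new Object();
        }
        return writer;
    }

    synchronized void closeReader() { reader = null; }

    synchronized void closeWriter() { writer = null; }

    synchronized boolean readerOpen() { return reader != null; }

    synchronized boolean writerOpen() { return writer != null; }
}
```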

id

public static final java.lang.String id(java.lang.String publish_sub_folder,
                                        java.lang.String xhtml_object_type,
                                        long creation_date)

indexXHTMLPublishedObject

public final void indexXHTMLPublishedObject(XHTMLPublishedObjectList xpobj_list,
                                            XHTMLPublishedObject xpobj,
                                            boolean add)
                                     throws java.lang.InterruptedException,
                                            java.io.IOException

indexXHTMLPublishedObjects

public final void indexXHTMLPublishedObjects(XHTMLPublishedObjectList xpobj_list,
                                             XHTMLPublishedObject[] xpobjs,
                                             boolean add)
                                      throws java.lang.InterruptedException,
                                             java.io.IOException

outFileCheck

public final void outFileCheck()
                        throws java.lang.InterruptedException,
                               java.io.IOException
Go through all the outfiles and check that they actually exist. Delete any documents that don't have a textual representation. This method should only be of any use if a folder has been deleted.
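The check described above can be sketched with a map standing in for the indexed documents. The class OutFileCheck, the prune method, and the Map from document id to output file are assumptions for illustration; the real method works against the Lucene index rather than a map.

```java
import java.io.File;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; the Map stands in for the documents stored in the index.
final class OutFileCheck {
    // Remove every entry whose output file no longer exists on disk.
    // Returns the number of entries removed.
    static int prune(Map<String, File> documents) {
        int removed = 0;
        Iterator<Map.Entry<String, File>> it = documents.entrySet().iterator();
        while (it.hasNext()) {
            File outFile = it.next().getValue();
            if (!outFile.isFile()) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }
}
```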


Swfit developer homepage
Copyright © 2003 Orgdot AS. All Rights Reserved.