org.apache.lucene.analysis.cjk
Class CJKAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.apache.lucene.analysis.cjk.CJKAnalyzer
All Implemented Interfaces:
Closeable

public final class CJKAnalyzer
extends StopwordAnalyzerBase

An Analyzer that tokenizes text with StandardTokenizer, normalizes character width with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK text with CJKBigramFilter, and removes stop words with StopFilter.
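
The bigram step is the least obvious stage of this chain, so a small plain-Java illustration may help. It is only a sketch of the general idea of overlapping two-character tokens and is not Lucene's CJKBigramFilter; the class CjkBigramSketch and the method bigrams are hypothetical names, and keeping a lone character as a one-character token is an assumption of the sketch. For the run 日本語 it yields the tokens 日本 and 本語.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {

    // Hypothetical helper, not a Lucene API: splits one run of adjacent CJK
    // characters into overlapping two-character tokens.
    public static List<String> bigrams(String run) {
        int[] codePoints = run.codePoints().toArray();
        List<String> tokens = new ArrayList<String>();
        if (codePoints.length == 1) {
            // Assumption of this sketch: a lone character stays a single token.
            tokens.add(run);
            return tokens;
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            // Each token covers positions i and i + 1, so neighbours overlap.
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A three-character run, written with Unicode escapes so the source
        // stays ASCII-only.
        System.out.println(bigrams("\u65E5\u672C\u8A9E"));
    }
}
```

Because neighbouring tokens overlap, a query for any two adjacent characters can match without a dictionary-based word segmenter.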


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
ReusableAnalyzerBase.TokenStreamComponents
 
Field Summary
static String[] STOP_WORDS
          Deprecated. Use getDefaultStopSet() instead.
 
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
 
Constructor Summary
CJKAnalyzer(Version matchVersion)
          Builds an analyzer which removes words in getDefaultStopSet().
CJKAnalyzer(Version matchVersion, Set<?> stopwords)
          Builds an analyzer with the given stop words.
CJKAnalyzer(Version matchVersion, String... stopWords)
          Deprecated. Use CJKAnalyzer(Version, Set) instead.
 
Method Summary
protected  ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
static Set<?> getDefaultStopSet()
          Returns an unmodifiable instance of the default stop-words set.
 
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
 
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STOP_WORDS

@Deprecated
public static final String[] STOP_WORDS
Deprecated. Use getDefaultStopSet() instead.
An array containing common English words that are rarely useful for searching, plus some double-byte punctuation marks.

Constructor Detail

CJKAnalyzer

public CJKAnalyzer(Version matchVersion)
Builds an analyzer which removes words in getDefaultStopSet().


CJKAnalyzer

public CJKAnalyzer(Version matchVersion,
                   Set<?> stopwords)
Builds an analyzer with the given stop words.

Parameters:
matchVersion - Lucene compatibility version
stopwords - a stopword set

CJKAnalyzer

@Deprecated
public CJKAnalyzer(Version matchVersion,
                              String... stopWords)
Deprecated. Use CJKAnalyzer(Version, Set) instead.

Builds an analyzer which removes words in the provided array.

Parameters:
matchVersion - Lucene compatibility version
stopWords - stop word array
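
Code that still calls this deprecated constructor needs a Set in place of the array. The sketch below shows one possible conversion using only java.util; StopWordMigrationSketch and toStopSet are hypothetical names, not part of Lucene, and making the result read-only is a choice of the sketch rather than a requirement stated on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class StopWordMigrationSketch {

    // Hypothetical helper, not a Lucene API: copies a stop-word array into a
    // read-only Set, dropping duplicates and keeping first-seen order.
    public static Set<String> toStopSet(String... stopWords) {
        return Collections.unmodifiableSet(
                new LinkedHashSet<String>(Arrays.asList(stopWords)));
    }

    public static void main(String[] args) {
        Set<String> stopSet = toStopSet("the", "of", "the");
        System.out.println(stopSet);
        // With Lucene on the classpath, the set would then be passed to the
        // Set-based constructor: new CJKAnalyzer(matchVersion, stopSet)
    }
}
```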
Method Detail

getDefaultStopSet

public static Set<?> getDefaultStopSet()
Returns an unmodifiable instance of the default stop-words set.

Returns:
an unmodifiable instance of the default stop-words set.
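
Since the returned set is unmodifiable, extra stop words have to go into a copy. The following is a minimal sketch of that pattern with a java.util stand-in for the default set; ExtendStopSetSketch and extend are hypothetical names. The Set<?> signature leaves the element type of the real set open, so the copy is typed Set<Object>.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ExtendStopSetSketch {

    // Hypothetical helper, not a Lucene API: returns a new mutable set holding
    // every entry of 'base' plus the extra words; 'base' is never modified.
    public static Set<Object> extend(Set<?> base, String... extraWords) {
        Set<Object> copy = new HashSet<Object>(base);
        Collections.addAll(copy, extraWords);
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for the set returned by getDefaultStopSet().
        Set<?> defaults = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("a", "the")));
        Set<Object> custom = extend(defaults, "inc");
        System.out.println(custom.size()); // prints 3
    }
}
```

The copy could then be supplied to CJKAnalyzer(Version, Set) in place of the defaults.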

createComponents

protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                      Reader reader)
Description copied from class: ReusableAnalyzerBase
Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.

Specified by:
createComponents in class ReusableAnalyzerBase
Parameters:
fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
the ReusableAnalyzerBase.TokenStreamComponents for this analyzer.