UAX29URLEmailAnalyzer (Lucene 3.6.0 API)

org.apache.lucene.analysis.standard
Class UAX29URLEmailAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.ReusableAnalyzerBase
          org.apache.lucene.analysis.StopwordAnalyzerBase
              org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer

All Implemented Interfaces: Closeable

public final class UAX29URLEmailAnalyzer
extends StopwordAnalyzerBase

Filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

You must specify the required Version compatibility when creating a UAX29URLEmailAnalyzer.
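As a rough illustration of the filter chain described above, the sketch below runs a short string through the analyzer and collects the resulting terms. It assumes the Lucene 3.6 core jar is on the classpath. The class name `AnalyzerDemo`, the helper `analyze`, the field name `"body"`, and the sample text are made up for this example and are not part of the API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzerDemo {

    // Hypothetical helper: returns the terms the analyzer emits for the given text.
    static List<String> analyze(String text) throws IOException {
        // Version.LUCENE_36 matches the API version documented on this page.
        UAX29URLEmailAnalyzer analyzer = new UAX29URLEmailAnalyzer(Version.LUCENE_36);
        List<String> terms = new ArrayList<String>();
        // "body" is an arbitrary field name chosen for the example.
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
            analyzer.close();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // The e-mail address should survive as a single term, while "The"
        // is lowercased and then dropped as a default English stop word.
        System.out.println(analyze("Mail bob@example.com about The Release"));
    }
}
```

Collecting the terms into a list keeps the sketch easy to inspect. Indexing code would normally hand the analyzer to an IndexWriter rather than consume the stream by hand.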

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
`ReusableAnalyzerBase.TokenStreamComponents`

Field Summary
`static int`	`DEFAULT_MAX_TOKEN_LENGTH` Default maximum allowed token length
`static Set<?>`	`STOP_WORDS_SET` An unmodifiable set containing some common English words that are usually not useful for searching.

Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
`matchVersion, stopwords`

Constructor Summary
`UAX29URLEmailAnalyzer(Version matchVersion)` Builds an analyzer with the default stop words (`STOP_WORDS_SET`).
`UAX29URLEmailAnalyzer(Version matchVersion, Reader stopwords)` Builds an analyzer with the stop words from the given reader.
`UAX29URLEmailAnalyzer(Version matchVersion, Set<?> stopWords)` Builds an analyzer with the given stop words.

Method Summary
`protected ReusableAnalyzerBase.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)` Creates a new `ReusableAnalyzerBase.TokenStreamComponents` instance for this analyzer.
`int`	`getMaxTokenLength()`
`void`	`setMaxTokenLength(int length)` Set maximum allowed token length.

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
`getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet`

Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
`initReader, reusableTokenStream, tokenStream`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH

Default maximum allowed token length.

See Also: Constant Field Values

STOP_WORDS_SET

public static final Set<?> STOP_WORDS_SET

An unmodifiable set containing some common English words that are usually not useful for searching.

Constructor Detail

UAX29URLEmailAnalyzer

public UAX29URLEmailAnalyzer(Version matchVersion,
                             Set<?> stopWords)

Builds an analyzer with the given stop words.

Parameters: matchVersion - Lucene version to match (see above); stopWords - stop words

UAX29URLEmailAnalyzer

public UAX29URLEmailAnalyzer(Version matchVersion)

Builds an analyzer with the default stop words (STOP_WORDS_SET).

Parameters: matchVersion - Lucene version to match (see above)

UAX29URLEmailAnalyzer

public UAX29URLEmailAnalyzer(Version matchVersion,
                             Reader stopwords)
                      throws IOException

Builds an analyzer with the stop words from the given reader.

Parameters: matchVersion - Lucene version to match (see above); stopwords - Reader to read stop words from
Throws: IOException
See Also: WordlistLoader.getWordSet(java.io.Reader, org.apache.lucene.util.Version)
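The two constructors that take custom stop words might be used as in the following sketch. It assumes the Lucene 3.6 core jar is on the classpath and that the Reader supplies one stop word per line, which is how `WordlistLoader.getWordSet` reads its input. The class name `StopWordSetup`, its two helper methods, and the sample words are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
import org.apache.lucene.util.Version;

public class StopWordSetup {

    // Builds an analyzer from an in-memory set of stop words.
    static UAX29URLEmailAnalyzer fromSet() {
        Set<String> stopWords =
                new HashSet<String>(Arrays.asList("regards", "unsubscribe"));
        return new UAX29URLEmailAnalyzer(Version.LUCENE_36, stopWords);
    }

    // Builds an analyzer from a Reader; assumed format is one word per line.
    // A FileReader or InputStreamReader could be substituted for the StringReader.
    static UAX29URLEmailAnalyzer fromReader() throws IOException {
        Reader stopWords = new StringReader("regards\nunsubscribe\n");
        return new UAX29URLEmailAnalyzer(Version.LUCENE_36, stopWords);
    }

    public static void main(String[] args) throws IOException {
        // getStopwordSet() is inherited from StopwordAnalyzerBase.
        System.out.println(fromSet().getStopwordSet());
        System.out.println(fromReader().getStopwordSet());
    }
}
```

Either constructor replaces the default `STOP_WORDS_SET` rather than extending it, so common English words are no longer filtered unless they are included in the supplied list.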

Method Detail

setMaxTokenLength

public void setMaxTokenLength(int length)

Sets the maximum allowed token length. A token that exceeds this length is discarded. The setting takes effect only the next time tokenStream or reusableTokenStream is called.
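The discard behaviour can be observed with a small sketch like the one below, which sets the limit before requesting a token stream. It assumes the Lucene 3.6 core jar is on the classpath. The class name `MaxTokenLengthDemo`, the helper `analyze`, the field name `"body"`, and the sample text are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class MaxTokenLengthDemo {

    // Hypothetical helper: analyzes text with the given maximum token length.
    static List<String> analyze(int maxTokenLength, String text) throws IOException {
        UAX29URLEmailAnalyzer analyzer = new UAX29URLEmailAnalyzer(Version.LUCENE_36);
        // Set the limit before asking for a stream, since the setting only
        // applies to streams obtained after the call.
        analyzer.setMaxTokenLength(maxTokenLength);
        List<String> terms = new ArrayList<String>();
        TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        try {
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
            analyzer.close();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // With a limit of 5, "extraordinarily" is too long and is dropped,
        // while the shorter words are kept.
        System.out.println(analyze(5, "tiny extraordinarily long"));
    }
}
```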

getMaxTokenLength

public int getMaxTokenLength()

Returns the maximum allowed token length.

See Also: setMaxTokenLength(int)

createComponents

protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                      Reader reader)

Description copied from class: ReusableAnalyzerBase

Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.

Specified by: createComponents in class ReusableAnalyzerBase

Parameters: fieldName - the name of the field whose content is passed to the ReusableAnalyzerBase.TokenStreamComponents sink as a reader; reader - the reader passed to the Tokenizer constructor
Returns: the ReusableAnalyzerBase.TokenStreamComponents for this analyzer
