BulgarianAnalyzer (Lucene 4.0.0 API)

org.apache.lucene.analysis.bg
Class BulgarianAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.lucene.analysis.util.StopwordAnalyzerBase
          org.apache.lucene.analysis.bg.BulgarianAnalyzer

All Implemented Interfaces: Closeable

public final class BulgarianAnalyzer
extends StopwordAnalyzerBase

Analyzer for Bulgarian.

This analyzer implements light stemming as specified in "Searching Strategies for the Bulgarian Language": http://members.unine.ch/jacques.savoy/Papers/BUIR.pdf

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
`Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents`

Field Summary
`static String`	`DEFAULT_STOPWORD_FILE` File containing default Bulgarian stopwords.

Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
`matchVersion, stopwords`

Constructor Summary
`BulgarianAnalyzer(Version matchVersion)` Builds an analyzer with the default stop words: `DEFAULT_STOPWORD_FILE`.
`BulgarianAnalyzer(Version matchVersion, CharArraySet stopwords)` Builds an analyzer with the given stop words.
`BulgarianAnalyzer(Version matchVersion, CharArraySet stopwords, CharArraySet stemExclusionSet)` Builds an analyzer with the given stop words and a stem exclusion set.

Method Summary
`Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)` Creates an `Analyzer.TokenStreamComponents` which tokenizes all the text in the provided `Reader`.
`static CharArraySet`	`getDefaultStopSet()` Returns an unmodifiable instance of the default stop-words set.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
`getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet`

Methods inherited from class org.apache.lucene.analysis.Analyzer
`close, getOffsetGap, getPositionIncrementGap, initReader, tokenStream`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_STOPWORD_FILE

public static final String DEFAULT_STOPWORD_FILE

File containing default Bulgarian stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html and is BSD-licensed.

See Also: Constant Field Values

Constructor Detail

BulgarianAnalyzer

public BulgarianAnalyzer(Version matchVersion)

Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

BulgarianAnalyzer

public BulgarianAnalyzer(Version matchVersion,
                         CharArraySet stopwords)

Builds an analyzer with the given stop words.

BulgarianAnalyzer

public BulgarianAnalyzer(Version matchVersion,
                         CharArraySet stopwords,
                         CharArraySet stemExclusionSet)

Builds an analyzer with the given stop words and a stem exclusion set. If a stem exclusion set is provided, this analyzer adds a KeywordMarkerFilter before BulgarianStemFilter.

Method Detail

getDefaultStopSet

public static CharArraySet getDefaultStopSet()

Returns an unmodifiable instance of the default stop-words set.

Returns: an unmodifiable instance of the default stop-words set.
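
Because the returned set is unmodifiable, a caller who wants extra stop words would normally copy it first and extend the copy. The sketch below illustrates that pattern with plain `java.util` collections only. `StopSetSketch`, `DEFAULT_STOP_SET`, `extended` and `rejectsAdd` are hypothetical names, and the two placeholder words are not taken from the real stopword file. It is a stand-in for the idea, not Lucene code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Plain-Java stand-in for an unmodifiable default stop set; not Lucene code. */
public class StopSetSketch {

    // Placeholder contents (assumption): "i" and "na" written as Unicode escapes.
    static final Set<String> DEFAULT_STOP_SET =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("\u0438", "\u043d\u0430")));

    // Copy the shared set, then add caller-specific words to the copy.
    static Set<String> extended(String... extra) {
        Set<String> copy = new HashSet<>(DEFAULT_STOP_SET);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    // Returns true when a direct modification attempt is rejected.
    static boolean rejectsAdd() {
        try {
            DEFAULT_STOP_SET.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("direct add rejected: " + rejectsAdd());
        System.out.println("extended size: " + extended("foo").size());
    }
}
```

With the real class, the copy would then be passed to one of the constructors that accept a `CharArraySet stopwords` argument.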

createComponents

public Analyzer.TokenStreamComponents createComponents(String fieldName,
                                                       Reader reader)

Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.

Specified by: createComponents in class Analyzer

Returns: An Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with StandardFilter, LowerCaseFilter, StopFilter, KeywordMarkerFilter (if a stem exclusion set is provided), and BulgarianStemFilter.
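
The order of the filters matters: stop words are dropped before stemming, and terms in the stem exclusion set are marked so that the stemmer leaves them untouched. The following is a minimal plain-Java sketch of that ordering, assuming nothing beyond the standard library. `BulgarianChainSketch`, `analyze` and `toyStem` are hypothetical names. The tokenizer is a crude regex split, and the stemmer strips only three definite-article endings, so neither reproduces the behavior of StandardTokenizer or BulgarianStemFilter. Cyrillic strings are written as Unicode escapes with transliterations in comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java stand-in for the filter order described above; not Lucene code. */
public class BulgarianChainSketch {

    // Toy stemmer (assumption): strips one of three definite-article endings,
    // transliterated -ut, -ta, -to. The real filter applies a fuller rule set.
    static String toyStem(String term) {
        for (String suffix : new String[] {"\u044a\u0442", "\u0442\u0430", "\u0442\u043e"}) {
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    static List<String> analyze(String text, Set<String> stopwords, Set<String> stemExclusions) {
        List<String> out = new ArrayList<>();
        // 1. Tokenize: split on anything that is not a letter or digit.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // 2. Lowercase the token.
            String term = raw.toLowerCase(Locale.ROOT);
            // 3. Drop stop words.
            if (stopwords.contains(term)) {
                continue;
            }
            // 4. Keyword marking: excluded terms skip the stemmer.
            boolean keyword = stemExclusions.contains(term);
            // 5. Stem everything that was not marked.
            out.add(keyword ? term : toyStem(term));
        }
        return out;
    }

    public static void main(String[] args) {
        // Input transliterates as "Gradut i knigata" ("The city and the book").
        String text = "\u0413\u0440\u0430\u0434\u044a\u0442 \u0438 \u043a\u043d\u0438\u0433\u0430\u0442\u0430";
        Set<String> stop = new HashSet<>(Arrays.asList("\u0438")); // "i"
        Set<String> keep = new HashSet<>(Arrays.asList("\u043a\u043d\u0438\u0433\u0430\u0442\u0430")); // "knigata"

        System.out.println(analyze(text, stop, new HashSet<String>()));
        System.out.println(analyze(text, stop, keep));
    }
}
```

In this sketch the second call keeps "knigata" unstemmed because it is in the exclusion set, which mirrors the role the documentation assigns to KeywordMarkerFilter.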
