Package org.apache.lucene.analysis.standard

Standards-based analyzers implemented with JFlex.

Interface Summary
StandardTokenizerInterface: Internal interface for supporting versioned grammars.

Class Summary
ClassicAnalyzer: Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
ClassicFilter: Normalizes tokens extracted with ClassicTokenizer.
ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
StandardAnalyzer: Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
StandardFilter: Normalizes tokens extracted with StandardTokenizer.
StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
StandardTokenizerImpl: This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. Tokens produced are of the following types:
    <ALPHANUM>: A sequence of alphabetic and numeric characters
    <NUM>: A number
    <SOUTHEAST_ASIAN>: A sequence of characters from South and Southeast Asian languages, including Thai, Lao, Myanmar, and Khmer
    <IDEOGRAPHIC>: A single CJKV ideographic character
    <HIRAGANA>: A single hiragana character
UAX29URLEmailAnalyzer: Filters UAX29URLEmailTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.
UAX29URLEmailTokenizer: This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
UAX29URLEmailTokenizerImpl: This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
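The UAX #29 word-break behavior referenced above can be previewed without Lucene: the JDK's java.text.BreakIterator also performs Unicode boundary analysis. This is only a rough approximation of the JFlex grammars in this package (it has no <URL> or <EMAIL> token types, and edge cases differ), but it shows the same style of rule-based segmentation:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {
    // Collect word tokens using the JDK's Unicode boundary analysis.
    // Approximates UAX #29 word breaking; it is NOT this package's grammar.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String candidate = text.substring(start, end);
            // Keep only spans containing a letter or digit (skip spaces/punctuation).
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "4.0" stays one token: UAX #29 does not break Numeric around MidNum ".".
        System.out.println(words("Lucene 4.0 tokenizes text."));
    }
}
```

Note that, like <NUM> above, a decimal such as "4.0" survives as a single token because "." between digits is a MidNum character under the word-break rules.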
 

Package org.apache.lucene.analysis.standard Description

Standards-based analyzers implemented with JFlex.
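The analyzers in this package all follow the same shape: a tokenizer feeding a chain of filters (for example, StandardAnalyzer applies StandardFilter, LowerCaseFilter and StopFilter). A minimal plain-Java sketch of that tokenize/lowercase/stop-filter idea, using a toy whitespace tokenizer and a made-up stop list rather than Lucene's real streaming TokenStream API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {
    // Hypothetical stand-in for a stop-word list; Lucene ships its own
    // English defaults, which are longer than this.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the");

    // Sketch of Tokenizer -> LowerCaseFilter -> StopFilter. Real Lucene
    // analyzers stream tokens incrementally, and StandardTokenizer uses a
    // JFlex grammar, not split().
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {            // toy tokenizer
            String lowered = token.toLowerCase(Locale.ROOT); // LowerCaseFilter
            if (!lowered.isEmpty() && !STOP_WORDS.contains(lowered)) { // StopFilter
                out.add(lowered);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick and THE Dead"));
        // -> [quick, dead]
    }
}
```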

The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex: