Uses of Class
org.apache.lucene.analysis.Tokenizer

Packages that use Tokenizer
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens. 
org.apache.lucene.analysis.ar Analyzer for Arabic. 
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). 
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters). 
org.apache.lucene.analysis.cn.smart Analyzer for Simplified Chinese, which indexes words. 
org.apache.lucene.analysis.icu.segmentation Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. 
org.apache.lucene.analysis.in Analysis components for Indian languages. 
org.apache.lucene.analysis.ja Analyzer for Japanese. 
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters. 
org.apache.lucene.analysis.path Analysis components for path-like strings such as filenames. 
org.apache.lucene.analysis.ru Analyzer for Russian. 
org.apache.lucene.analysis.standard Standards-based analyzers implemented with JFlex. 
org.apache.lucene.analysis.wikipedia Tokenizer that is aware of Wikipedia syntax. 
 

Uses of Tokenizer in org.apache.lucene.analysis
 

Subclasses of Tokenizer in org.apache.lucene.analysis
 class CharTokenizer
          An abstract base class for simple, character-oriented tokenizers.
 class EmptyTokenizer
          Emits no tokens.
 class KeywordTokenizer
          Emits the entire input as a single token.
 class LetterTokenizer
          A LetterTokenizer is a tokenizer that divides text at non-letters.
 class LowerCaseTokenizer
          LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.
 class MockTokenizer
          Tokenizer for testing.
 class WhitespaceTokenizer
          A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
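
A minimal sketch of driving one of these tokenizers by hand, assuming Lucene 3.x signatures (CharTokenizer subclasses take a Version plus a Reader, and tokens are read through the attribute API):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_36,
        new StringReader("hello token stream"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());  // hello / token / stream
    }
    tokenizer.end();
    tokenizer.close();

The same consume loop works for every subclass listed on this page; only the construction differs.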
 

Fields in org.apache.lucene.analysis declared as Tokenizer
protected  Tokenizer ReusableAnalyzerBase.TokenStreamComponents.source
           
 

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer
ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source, TokenStream result)
          Creates a new ReusableAnalyzerBase.TokenStreamComponents instance.
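
A sketch of the intended pattern, assuming Lucene 3.x class names (MyAnalyzer is a hypothetical subclass): the Tokenizer passed as source is kept in the protected field listed above so the chain can be reused, while result is the downstream end of the filter chain.

    import java.io.Reader;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.ReusableAnalyzerBase;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    public final class MyAnalyzer extends ReusableAnalyzerBase {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The tokenizer becomes the reusable 'source' field of the components.
        Tokenizer source = new StandardTokenizer(Version.LUCENE_36, reader);
        // Filters stack on top of the tokenizer; the last one is the 'result'.
        TokenStream result = new LowerCaseFilter(Version.LUCENE_36, source);
        return new TokenStreamComponents(source, result);
      }
    }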
 

Uses of Tokenizer in org.apache.lucene.analysis.ar
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ar
 class ArabicLetterTokenizer
          Deprecated. (3.1) Use StandardTokenizer instead.
 

Uses of Tokenizer in org.apache.lucene.analysis.cjk
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cjk
 class CJKTokenizer
          Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.
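
The deprecation note names the replacement chain explicitly; a hedged sketch of wiring it together (assuming the Lucene 3.6 module layout, where CJKWidthFilter and CJKBigramFilter live in this package):

    import java.io.StringReader;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.cjk.CJKBigramFilter;
    import org.apache.lucene.analysis.cjk.CJKWidthFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.util.Version;

    Tokenizer source = new StandardTokenizer(Version.LUCENE_36,
        new StringReader("これは本です"));
    TokenStream chain = new CJKWidthFilter(source);         // normalize full/half-width forms
    chain = new CJKBigramFilter(chain);                     // emit overlapping CJK bigrams
    chain = new LowerCaseFilter(Version.LUCENE_36, chain);  // lowercase any non-CJK terms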
 

Uses of Tokenizer in org.apache.lucene.analysis.cn
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn
 class ChineseTokenizer
          Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
 

Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
 

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart
 class SentenceTokenizer
          Tokenizes input text into sentences.
 

Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
 

Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
 class ICUTokenizer
          Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
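
A minimal usage sketch, assuming the module's ICUTokenizer(Reader) constructor, which applies the default per-script word-break rules (useful for scripts without whitespace between words, such as Thai):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    Tokenizer tokenizer = new ICUTokenizer(new StringReader("การที่ได้ต้อง"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());  // dictionary-segmented Thai words
    }
    tokenizer.close();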
 

Uses of Tokenizer in org.apache.lucene.analysis.in
 

Subclasses of Tokenizer in org.apache.lucene.analysis.in
 class IndicTokenizer
          Deprecated. (3.6) Use StandardTokenizer instead.
 

Uses of Tokenizer in org.apache.lucene.analysis.ja
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ja
 class JapaneseTokenizer
          Tokenizer for Japanese that uses morphological analysis.
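
A sketch of direct construction, assuming the Lucene 3.6 kuromoji signature JapaneseTokenizer(Reader, UserDictionary, boolean, Mode); SEARCH mode additionally decomposes compound nouns:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.ja.JapaneseTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    Tokenizer tokenizer = new JapaneseTokenizer(
        new StringReader("関西国際空港"),  // "Kansai International Airport"
        null,                              // no user dictionary
        true,                              // discard punctuation tokens
        JapaneseTokenizer.Mode.SEARCH);    // split compounds for search
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());
    }
    tokenizer.close();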
 

Uses of Tokenizer in org.apache.lucene.analysis.ngram
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram
 class EdgeNGramTokenizer
          Tokenizes the input from an edge into n-grams of given size(s).
 class NGramTokenizer
          Tokenizes the input into n-grams of the given size(s).
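
For example, with the 3.x NGramTokenizer(Reader, minGram, maxGram) constructor, "abcd" tokenized with sizes 2 through 3 yields all bigrams followed by all trigrams:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.ngram.NGramTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    Tokenizer tokenizer = new NGramTokenizer(new StringReader("abcd"), 2, 3);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());  // ab, bc, cd, abc, bcd
    }
    tokenizer.close();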
 

Uses of Tokenizer in org.apache.lucene.analysis.path
 

Subclasses of Tokenizer in org.apache.lucene.analysis.path
 class PathHierarchyTokenizer
          Tokenizer for path-like hierarchies.
 class ReversePathHierarchyTokenizer
          Tokenizer for domain-like hierarchies.
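
For instance, the single-argument form PathHierarchyTokenizer(Reader), whose delimiter defaults to '/', emits one token per ancestor path:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    Tokenizer tokenizer = new PathHierarchyTokenizer(new StringReader("/usr/share/doc"));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());  // /usr, /usr/share, /usr/share/doc
    }
    tokenizer.close();

ReversePathHierarchyTokenizer emits suffix paths instead, which suits domain-like strings split on '.'.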
 

Uses of Tokenizer in org.apache.lucene.analysis.ru
 

Subclasses of Tokenizer in org.apache.lucene.analysis.ru
 class RussianLetterTokenizer
          Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
 

Uses of Tokenizer in org.apache.lucene.analysis.standard
 

Subclasses of Tokenizer in org.apache.lucene.analysis.standard
 class ClassicTokenizer
          A grammar-based tokenizer constructed with JFlex.
 class StandardTokenizer
          A grammar-based tokenizer constructed with JFlex.
 class UAX29URLEmailTokenizer
          This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
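
StandardTokenizer is the usual starting point; a minimal sketch using the 3.x StandardTokenizer(Version, Reader) constructor (Version.LUCENE_36 selects the UAX #29-based grammar introduced in 3.1):

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_36,
        new StringReader("Lucene 3.6 was released in 2012."));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());  // Lucene, 3.6, was, released, in, 2012
    }
    tokenizer.close();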
 

Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
 

Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia
 class WikipediaTokenizer
          Extension of StandardTokenizer that is aware of Wikipedia syntax.
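
A minimal sketch, assuming the WikipediaTokenizer(Reader) constructor; the token type (readable through TypeAttribute) marks Wikipedia constructs such as internal links:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
    import org.apache.lucene.analysis.wikipedia.WikipediaTokenizer;

    Tokenizer tokenizer = new WikipediaTokenizer(
        new StringReader("See [[Apache Lucene]] for details."));
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    TypeAttribute type = tokenizer.addAttribute(TypeAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      // Tokens inside [[...]] carry the internal-link type ("il").
      System.out.println(term.toString() + " : " + type.type());
    }
    tokenizer.close();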