java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.CharTokenizer
                  extended by org.apache.lucene.analysis.LetterTokenizer
                      extended by org.apache.lucene.analysis.ar.ArabicLetterTokenizer
@Deprecated public class ArabicLetterTokenizer extends LetterTokenizer
Deprecated. Use StandardTokenizer instead.
Tokenizer that breaks text into runs of letters and diacritics.
The problem with the standard LetterTokenizer is that it fails on diacritics: they are not letters, so it splits a word at every diacritic and drops the diacritic itself. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
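To make the difference concrete, here is a minimal plain-JDK sketch (not Lucene code; the class `RunSplitter` and its members are hypothetical) that splits text into maximal runs of accepted code points. It runs once with a letters-only test, as LetterTokenizer uses, and once with "letter or non-spacing mark", the rule documented for isTokenChar(int) on this page. The sample word is كَتَبَ, whose three letters each carry a fatha (U+064E). The sketch ignores offsets and normalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical standalone sketch, not the Lucene API: splits text into
// maximal runs of code points accepted by a predicate.
final class RunSplitter {
    // LetterTokenizer-style rule: only Unicode letters are token characters.
    static final IntPredicate LETTERS = Character::isLetter;

    // ArabicLetterTokenizer-style rule: letters or non-spacing marks
    // (for example Arabic harakat such as fatha, U+064E).
    static final IntPredicate LETTERS_AND_MARKS =
        c -> Character.isLetter(c) || Character.getType(c) == Character.NON_SPACING_MARK;

    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Walk code points (not chars) so surrogate pairs stay together.
        text.codePoints().forEach(cp -> {
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);      // extend the current run
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // a rejected code point ends the run
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());       // flush the final run
        }
        return tokens;
    }
}
```

With `LETTERS` the word comes back as three single-letter tokens and every fatha is lost; with `LETTERS_AND_MARKS` it stays one token, diacritics included.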
You must specify the required Version compatibility when creating ArabicLetterTokenizer: CharTokenizer uses an int-based API to normalize and detect token characters. See isTokenChar(int) and CharTokenizer.normalize(int) for details.
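The int-based API matters for characters outside the Basic Multilingual Plane, which a Java String stores as two char values (a surrogate pair). This short plain-JDK sketch (the class `CodePointDemo` is hypothetical, not part of Lucene) shows a char-by-char letter test rejecting U+1D400 MATHEMATICAL BOLD CAPITAL A while a code-point test accepts it. That is the distinction the int-based isTokenChar(int) and normalize(int) methods address.

```java
// Hypothetical plain-JDK sketch, not Lucene code: compares a char-based
// letter test with a code-point (int) based one on a supplementary character.
final class CodePointDemo {
    // U+1D400 MATHEMATICAL BOLD CAPITAL A: a letter outside the BMP,
    // stored in a String as a surrogate pair (two chars).
    static final String BOLD_A = new String(Character.toChars(0x1D400));

    // Char-based check: tests each UTF-16 unit on its own.
    static boolean allLettersByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false; // a lone surrogate is never a letter
            }
        }
        return true;
    }

    // Code-point check: decodes surrogate pairs before testing.
    static boolean allLettersByCodePoint(String s) {
        return s.codePoints().allMatch(Character::isLetter);
    }
}
```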
Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
---|
AttributeSource.AttributeFactory, AttributeSource.State |

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer |
---|
input |

Constructor Summary

Constructor | Description |
---|---|
ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in) | Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(AttributeSource source, Reader in) | Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(Reader in) | Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource. |
ArabicLetterTokenizer(Version matchVersion, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer. |

Method Summary

Modifier and Type | Method | Description |
---|---|---|
protected boolean | isTokenChar(int c) | Deprecated. Accepts a character if it is in the Unicode Letter category or the NonspacingMark category. |

Methods inherited from class org.apache.lucene.analysis.CharTokenizer |
---|
end, incrementToken, isTokenChar, normalize, normalize, reset |
Methods inherited from class org.apache.lucene.analysis.Tokenizer |
---|
close, correctOffset |
Methods inherited from class org.apache.lucene.analysis.TokenStream |
---|
reset |
Methods inherited from class org.apache.lucene.util.AttributeSource |
---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail

public ArabicLetterTokenizer(Version matchVersion, Reader in)
Construct a new ArabicLetterTokenizer.
Parameters:
matchVersion - Lucene version to match. See above.
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.
Parameters:
matchVersion - Lucene version to match. See above.
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Parameters:
matchVersion - Lucene version to match. See above.
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens

@Deprecated public ArabicLetterTokenizer(Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.

@Deprecated public ArabicLetterTokenizer(AttributeSource source, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.

@Deprecated public ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

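Method Detail

protected boolean isTokenChar(int c)
Deprecated. Accepts a character if it is in the Unicode Letter category or the NonspacingMark category.
Overrides:
isTokenChar in class LetterTokenizer
See Also:
LetterTokenizer.isTokenChar(int)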
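The override can be sketched with two hypothetical plain-JDK classes, `LetterRule` and `LetterOrMarkRule` (stand-ins, not Lucene's API). The subclass keeps the inherited letter test through `super.isTokenChar(c)` and adds the NonspacingMark check, following the rule stated for isTokenChar(int).

```java
// Hypothetical stand-ins, not the real Lucene classes.
class LetterRule {
    // LetterTokenizer-style rule: Unicode letters only.
    protected boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }
}

class LetterOrMarkRule extends LetterRule {
    // Keep the base letter test and additionally accept
    // code points whose general category is NonspacingMark (Mn).
    @Override
    protected boolean isTokenChar(int c) {
        return super.isTokenChar(c) || Character.getType(c) == Character.NON_SPACING_MARK;
    }
}
```

Delegating to `super` keeps the base rule in one place, so the subclass states only what it adds.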