TermToBytesRefAttribute (Lucene 4.0.0 API)

org.apache.lucene.analysis.tokenattributes
Interface TermToBytesRefAttribute

All Superinterfaces: Attribute

All Known Implementing Classes: CharTermAttributeImpl, NumericTokenStream.NumericTermAttributeImpl, Token

public interface TermToBytesRefAttribute
extends Attribute

This attribute is requested by TermsHashPerField to index the contents. This attribute can be used to customize the final byte[] encoding of terms.

Consumers of this attribute call getBytesRef() up-front, and then invoke fillBytesRef() for each term. Example:

   final TermToBytesRefAttribute termAtt = tokenStream.getAttribute(TermToBytesRefAttribute.class);
   final BytesRef bytes = termAtt.getBytesRef();

   // incrementToken() is a TokenStream method, so advance the stream, not the attribute
   while (tokenStream.incrementToken()) {

     // you must call termAtt.fillBytesRef() before doing something with the bytes.
     // this encodes the term value (internally it might be a char[], etc) into the bytes.
     int hashCode = termAtt.fillBytesRef();

     if (isInteresting(bytes)) {
     
       // because the bytes are reused by the attribute (like CharTermAttribute's char[] buffer),
       // you should make a copy if you need persistent access to the bytes, otherwise they will
       // be rewritten across calls to incrementToken()

       doSomethingWith(BytesRef.deepCopyOf(bytes));
     }
   }
   ...

WARNING: This API is experimental and might change in incompatible ways in the next release. This is a very expert API; please use CharTermAttributeImpl and its implementation of this method for UTF-8 terms.

Method Summary
`int`	`fillBytesRef()` Updates the bytes returned by `getBytesRef()` to contain this term's final encoding, and returns its hashcode.
`BytesRef`	`getBytesRef()` Retrieve this attribute's BytesRef.

Method Detail

fillBytesRef

int fillBytesRef()

Updates the bytes returned by getBytesRef() to contain this term's final encoding, and returns its hashcode.

Returns:

the hashcode as defined by BytesRef.hashCode():

  int hash = 0;
  for (int i = termBytes.offset; i < termBytes.offset+termBytes.length; i++) {
    hash = 31*hash + termBytes.bytes[i];
  }

Implement this for performance reasons, if your code can calculate the hash on-the-fly. If this is not the case, just return termBytes.hashCode().
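The "on-the-fly" option can be illustrated with a minimal sketch that does not depend on Lucene. The class TermHashSketch and the method encodeAsciiAndHash are hypothetical names introduced only for this illustration, and the encoder assumes ASCII-only terms; it is not how any Lucene implementation encodes terms. It shows the hash being accumulated in the same pass that writes the bytes, and that the result matches the loop above.

```java
public class TermHashSketch {

    // Same formula as the loop above, over bytes[offset .. offset+length).
    public static int hash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }

    // Hypothetical encoder (assumes ASCII input): writes each byte into dest
    // and accumulates the hash in the same pass, so no second loop is needed.
    public static int encodeAsciiAndHash(CharSequence term, byte[] dest, int offset) {
        int hash = 0;
        for (int i = 0; i < term.length(); i++) {
            byte b = (byte) term.charAt(i);
            dest[offset + i] = b;
            hash = 31 * hash + b;
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[16];
        int onTheFly = encodeAsciiAndHash("ab", buffer, 2);
        int recomputed = hash(buffer, 2, 2);
        System.out.println(onTheFly + " " + recomputed); // prints "3105 3105"
    }
}
```

If an encoder cannot fold the hash into its encoding pass like this, the fallback described above (returning termBytes.hashCode()) gives the same value at the cost of a second pass over the bytes.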

getBytesRef

BytesRef getBytesRef()

Retrieve this attribute's BytesRef. The bytes are updated from the current term when the consumer calls fillBytesRef().

Returns: this Attribute's internal BytesRef.
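Because the same BytesRef instance is handed out once and refilled for every term, a consumer that keeps the reference without copying ends up seeing only the last term. The sketch below models that contract with plain Java arrays; ReusedBufferSketch and ReusingTermSource are hypothetical stand-ins (not Lucene classes), and the model assumes ASCII terms of at most 32 characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferSketch {

    // Hypothetical stand-in for an attribute that refills one shared buffer per term.
    static final class ReusingTermSource {
        private final byte[] shared = new byte[32];
        private int length;

        byte[] buffer() { return shared; }  // plays the role of getBytesRef()
        int length() { return length; }

        // Plays the role of fillBytesRef(): overwrites the shared buffer, returns the hash.
        int fill(String term) {
            int hash = 0;
            length = term.length();
            for (int i = 0; i < length; i++) {
                shared[i] = (byte) term.charAt(i);
                hash = 31 * hash + shared[i];
            }
            return hash;
        }
    }

    // Collects every term, either as a private copy or as the shared reference.
    public static List<String> collect(String[] terms, boolean copy) {
        ReusingTermSource source = new ReusingTermSource();
        byte[] bytes = source.buffer();          // fetched once, up front
        List<byte[]> kept = new ArrayList<>();
        for (String term : terms) {
            source.fill(term);                   // refills the shared buffer
            kept.add(copy ? Arrays.copyOf(bytes, source.length()) : bytes);
        }
        List<String> decoded = new ArrayList<>();
        for (byte[] b : kept) {
            int len = copy ? b.length : source.length();
            decoded.add(new String(b, 0, len, StandardCharsets.US_ASCII));
        }
        return decoded;
    }

    public static void main(String[] args) {
        String[] terms = {"foo", "bar"};
        System.out.println(collect(terms, true));   // prints [foo, bar]
        System.out.println(collect(terms, false));  // prints [bar, bar]
    }
}
```

The second line of output shows the hazard: both retained references point at the same buffer, which by then holds only the most recent term.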