org.apache.lucene.facet.index
Class CategoryDocumentBuilder

java.lang.Object
  extended by org.apache.lucene.facet.index.CategoryDocumentBuilder
Direct Known Subclasses:
EnhancementsDocumentBuilder

public class CategoryDocumentBuilder
extends Object

A utility class which allows attachment of CategoryPaths or CategoryAttributes to a given document using a taxonomy.
A builder can be constructed with either a given FacetIndexingParams or the default implementation, DefaultFacetIndexingParams.
A CategoryDocumentBuilder can be reused by repeatedly setting the categories and building the document. Categories are provided either as CategoryAttribute elements through setCategories(Iterable), or as CategoryPath elements through setCategoryPaths(Iterable).

Note that both setCategories(Iterable) and setCategoryPaths(Iterable) return this CategoryDocumentBuilder, allowing the following pattern: new CategoryDocumentBuilder(taxonomy, params).setCategories(categories).build(doc).
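The reuse-and-chain pattern described above can be sketched without any Lucene dependency. The class below is a hypothetical stand-in, not the Lucene class: SketchDocumentBuilder, its String-based categories, and the "facet:" prefix are all assumptions made for illustration. It also assumes that each "set" call replaces whatever categories were set before, which is what makes the builder safe to reuse across documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CategoryDocumentBuilder; not the Lucene class.
class SketchDocumentBuilder {
    // Fields prepared by the last "set" call, waiting to be added by build().
    private final List<String> pendingFields = new ArrayList<String>();

    // Assumption: a "set" call discards earlier categories, then returns
    // this builder so that build() can be chained onto it.
    SketchDocumentBuilder setCategoryPaths(Iterable<String> categoryPaths) {
        pendingFields.clear();
        for (String path : categoryPaths) {
            pendingFields.add("facet:" + path);
        }
        return this;
    }

    // Adds the prepared fields to the given document and returns that document.
    List<String> build(List<String> doc) {
        doc.addAll(pendingFields);
        return doc;
    }

    public static void main(String[] args) {
        SketchDocumentBuilder builder = new SketchDocumentBuilder();
        // One builder, two documents: set the categories, then build.
        List<String> first = builder
                .setCategoryPaths(Arrays.asList("author/Mark Twain", "year/1884"))
                .build(new ArrayList<String>());
        List<String> second = builder
                .setCategoryPaths(Arrays.asList("author/Jane Austen"))
                .build(new ArrayList<String>());
        System.out.println(first);
        System.out.println(second);
    }
}
```

In this sketch the second document receives only its own category, because the second "set" call cleared the fields prepared for the first.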

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary
protected  Map<String,List<CategoryAttribute>> categoriesMap
           
protected  ArrayList<Field> fieldList
          A list of fields which is filled at ancestors' construction and used during build(Document).
protected  FacetIndexingParams indexingParams
          Parameters to be used when indexing categories.
protected  TaxonomyWriter taxonomyWriter
          A TaxonomyWriter for adding categories and retrieving their ordinals.
 
Constructor Summary
CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter)
          Creates a facets document builder with default facet indexing parameters.
See: CategoryDocumentBuilder(TaxonomyWriter, FacetIndexingParams)
CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter, FacetIndexingParams params)
          Creates a facets document builder with the given facet indexing parameters object.
 
Method Summary
 Document build(Document doc)
          Adds the fields created in one of the "set" methods to the document.
protected  void fillCategoriesMap(Iterable<CategoryAttribute> categories)
          Fills the mapping between each field name and the list of categories that belong to it, according to this builder's FacetIndexingParams object.
protected  CategoryListTokenizer getCategoryListTokenizer(TokenStream categoryStream)
          Get a category list tokenizer (or a series of such tokenizers) to create the category list tokens.
protected  CategoryTokenizer getCategoryTokenizer(TokenStream categoryStream)
          Get a CategoryTokenizer to create the category tokens.
protected  CountingListTokenizer getCountingListTokenizer(TokenStream categoryStream)
          Get a CountingListTokenizer for creating counting list tokens.
protected  TokenStream getParentsStream(CategoryAttributesStream categoryAttributesStream)
          Get a stream of categories which includes the parents, according to policies defined in indexing parameters.
 CategoryDocumentBuilder setCategories(Iterable<CategoryAttribute> categories)
          Set the categories of the document builder from an Iterable of CategoryAttribute objects.
 CategoryDocumentBuilder setCategoryPaths(Iterable<CategoryPath> categoryPaths)
          Set the categories of the document builder from an Iterable of CategoryPath objects.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

taxonomyWriter

protected final TaxonomyWriter taxonomyWriter
A TaxonomyWriter for adding categories and retrieving their ordinals.


indexingParams

protected final FacetIndexingParams indexingParams
Parameters to be used when indexing categories.


fieldList

protected final ArrayList<Field> fieldList
A list of fields which is filled at ancestors' construction and used during build(Document).


categoriesMap

protected Map<String,List<CategoryAttribute>> categoriesMap

Constructor Detail

CategoryDocumentBuilder

public CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter)
                        throws IOException
Creates a facets document builder with default facet indexing parameters.
See: CategoryDocumentBuilder(TaxonomyWriter, FacetIndexingParams)

Parameters:
taxonomyWriter - the taxonomy writer to which new categories are added and which translates known categories to ordinals
Throws:
IOException

CategoryDocumentBuilder

public CategoryDocumentBuilder(TaxonomyWriter taxonomyWriter,
                               FacetIndexingParams params)
                        throws IOException
Creates a facets document builder with the given facet indexing parameters object.

Parameters:
taxonomyWriter - the taxonomy writer to which new categories are added and which translates known categories to ordinals
params - holds all parameters the indexing process should use, such as category-list parameters
Throws:
IOException

Method Detail

setCategoryPaths

public CategoryDocumentBuilder setCategoryPaths(Iterable<CategoryPath> categoryPaths)
                                         throws IOException
Set the categories of the document builder from an Iterable of CategoryPath objects.

Parameters:
categoryPaths - An iterable of CategoryPath objects holding the categories (facets) to be added to the document at build(Document)
Returns:
This CategoryDocumentBuilder, to enable this one-line call: new CategoryDocumentBuilder(TaxonomyWriter).setCategoryPaths(Iterable).build(Document).
Throws:
IOException

setCategories

public CategoryDocumentBuilder setCategories(Iterable<CategoryAttribute> categories)
                                      throws IOException
Set the categories of the document builder from an Iterable of CategoryAttribute objects.

Parameters:
categories - An iterable of CategoryAttribute objects holding the categories (facets) to be added to the document at build(Document)
Returns:
This CategoryDocumentBuilder, to enable this one-line call: new CategoryDocumentBuilder(TaxonomyWriter).setCategories(Iterable).build(Document).
Throws:
IOException

getParentsStream

protected TokenStream getParentsStream(CategoryAttributesStream categoryAttributesStream)
Get a stream of categories which includes the parents, according to policies defined in indexing parameters.

Parameters:
categoryAttributesStream - The input stream
Returns:
The parents stream.
See Also:
OrdinalPolicy (for policy of adding category tokens for parents), PathPolicy (for policy of adding category list tokens for parents)
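
As a rough illustration of what "a stream of categories which includes the parents" means, the sketch below expands one category path into itself plus all of its ancestors. It is a simplified model on plain strings, not the TokenStream-based implementation. ParentsSketch and withParents are hypothetical names, and the sketch assumes a policy that keeps every ancestor, whereas an actual OrdinalPolicy or PathPolicy may leave some of them out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; models parent expansion on plain strings only.
class ParentsSketch {
    // Returns the category itself followed by each ancestor, nearest parent first.
    // Assumption: every ancestor is kept (no policy filters any of them out).
    static List<String> withParents(List<String> components) {
        List<String> result = new ArrayList<String>();
        for (int end = components.size(); end >= 1; end--) {
            result.add(String.join("/", components.subList(0, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(
                withParents(Arrays.asList("author", "Mark Twain", "Huckleberry Finn")));
    }
}
```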

fillCategoriesMap

protected void fillCategoriesMap(Iterable<CategoryAttribute> categories)
                          throws IOException
Fills the mapping between each field name and the list of categories that belong to it, according to this builder's FacetIndexingParams object.

Parameters:
categories - Iterable over the category attributes
Throws:
IOException
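
The grouping that this method performs can be pictured as a simple "group by field name" step. The sketch below is a minimal model under stated assumptions: categories are plain strings, and the hypothetical fieldFor function (with made-up field names "facets" and "price_facets") stands in for the lookup that the real builder performs through its FacetIndexingParams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of grouping categories by the field they are indexed in.
class CategoriesMapSketch {
    // Made-up rule standing in for the indexing parameters:
    // "price" categories get their own field, all others share a default one.
    static String fieldFor(String category) {
        return category.startsWith("price/") ? "price_facets" : "facets";
    }

    // Builds a field name -> categories map, preserving insertion order.
    static Map<String, List<String>> fillCategoriesMap(Iterable<String> categories) {
        Map<String, List<String>> map = new LinkedHashMap<String, List<String>>();
        for (String category : categories) {
            String field = fieldFor(category);
            List<String> list = map.get(field);
            if (list == null) {
                list = new ArrayList<String>();
                map.put(field, list);
            }
            list.add(category);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fillCategoriesMap(
                Arrays.asList("author/Mark Twain", "price/10-20", "year/1884")));
    }
}
```

Each entry of the resulting map corresponds to one field, which fits the description of fieldList as the fields later added by build(Document).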

getCategoryListTokenizer

protected CategoryListTokenizer getCategoryListTokenizer(TokenStream categoryStream)
Get a category list tokenizer (or a series of such tokenizers) to create the category list tokens.

Parameters:
categoryStream - A stream containing CategoryAttributes with the relevant data.
Returns:
The category list tokenizer (or series of tokenizers) to be used in creating category list tokens.

getCountingListTokenizer

protected CountingListTokenizer getCountingListTokenizer(TokenStream categoryStream)
Get a CountingListTokenizer for creating counting list tokens.

Parameters:
categoryStream - A stream containing CategoryAttributes with the relevant data.
Returns:
A counting list tokenizer to be used in creating counting list tokens.

getCategoryTokenizer

protected CategoryTokenizer getCategoryTokenizer(TokenStream categoryStream)
                                          throws IOException
Get a CategoryTokenizer to create the category tokens. This method can be overridden for adding more attributes to the category tokens.

Parameters:
categoryStream - A stream containing CategoryAttributes with the relevant data.
Returns:
The CategoryTokenizer to be used in creating category tokens.
Throws:
IOException

build

public Document build(Document doc)
Adds the fields created in one of the "set" methods to the document.