org.apache.lucene.facet.taxonomy.directory
Class DirectoryTaxonomyReader

java.lang.Object
  extended by org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader
All Implemented Interfaces:
Closeable, TaxonomyReader

public class DirectoryTaxonomyReader
extends Object
implements TaxonomyReader

A TaxonomyReader which retrieves stored taxonomy information from a Directory.

Reading from the on-disk index on every method call is too slow, so this implementation employs caching: some methods cache recent requests and their results, while other methods prefetch all the data into memory and then answer directly from in-memory tables. See the documentation of individual methods for notes on their performance.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader
TaxonomyReader.ChildrenArrays
 
Field Summary
 
Fields inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader
INVALID_ORDINAL, ROOT_ORDINAL
 
Constructor Summary
DirectoryTaxonomyReader(Directory directory)
          Open for reading a taxonomy stored in a given Directory.
 
Method Summary
 void close()
           
 void decRef()
          Expert: decreases the refCount of this TaxonomyReader instance.
protected  void ensureOpen()
           
 TaxonomyReader.ChildrenArrays getChildrenArrays()
          getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose arrays can be used together to efficiently enumerate the children of any category.
 Map<String,String> getCommitUserData()
          Retrieve user committed data.
 int getOrdinal(CategoryPath categoryPath)
          getOrdinal() returns the ordinal of the category given as a path.
 int getParent(int ordinal)
          getParent() returns the ordinal of the parent category of the category with the given ordinal.
 int[] getParentArray()
          getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy.
 CategoryPath getPath(int ordinal)
          getPath() returns the path name of the category with the given ordinal.
 boolean getPath(int ordinal, CategoryPath result)
          getPath() returns the path name of the category with the given ordinal.
 int getRefCount()
          Expert: returns the current refCount for this taxonomy reader.
 int getSize()
          getSize() returns the number of categories in the taxonomy.
 void incRef()
          Expert: increments the refCount of this TaxonomyReader instance.
protected  IndexReader openIndexReader(Directory directory)
           
 boolean refresh()
          refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed.
 void setCacheSize(int size)
          setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath).
 void setDelimiter(char delimiter)
          setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components.
 String toString(int max)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DirectoryTaxonomyReader

public DirectoryTaxonomyReader(Directory directory)
                        throws IOException
Open for reading a taxonomy stored in a given Directory.

Parameters:
directory - The Directory in which the taxonomy lives. Note that the taxonomy is read directly from that directory (not from a subdirectory of it).
Throws:
CorruptIndexException - if the taxonomy is corrupted.
IOException - if another error occurred.
Method Detail

openIndexReader

protected IndexReader openIndexReader(Directory directory)
                               throws CorruptIndexException,
                                      IOException
Throws:
CorruptIndexException
IOException

ensureOpen

protected final void ensureOpen()
                         throws AlreadyClosedException
Throws:
AlreadyClosedException - if this TaxonomyReader is closed

setCacheSize

public void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath).

Currently, if the given size is smaller than the current size of a cache, the cache will not shrink; rather, it will be limited to its current size.

Parameters:
size - the new maximum cache size, in number of entries.
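The clamping rule above can be pictured with a small LRU map. The following Java sketch is an illustration, not Lucene's cache class: GrowOnlyLruCache is a hypothetical name, and least-recently-used eviction is an assumption. It shows a limit that can be raised freely but, when lowered below the number of cached entries, stops at that number instead of evicting entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for one of the reader's lookup caches (e.g. path -> ordinal).
// Not Lucene's class: an LRU cache whose limit never drops below its current entry count.
class GrowOnlyLruCache<K, V> {
    private int maxSize;
    private final LinkedHashMap<K, V> map;

    GrowOnlyLruCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true: iteration order runs from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // evict the least recently used entry once the limit is exceeded
                return size() > GrowOnlyLruCache.this.maxSize;
            }
        };
    }

    void setMaxSize(int size) {
        // A request below the current entry count is clamped: nothing is evicted now,
        // the cache simply stops growing beyond what it already holds.
        this.maxSize = Math.max(size, map.size());
    }

    int maxSize() { return maxSize; }
    int entryCount() { return map.size(); }
    V get(K key) { return map.get(key); }
    void put(K key, V value) { map.put(key, value); }
}
```

After three insertions, setMaxSize(1) leaves the limit at 3; a fourth insertion then evicts the least recently used entry.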

setDelimiter

public void setDelimiter(char delimiter)
setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components. Do not use this method unless you really know what you are doing.

If you do use this method, make sure you call it before any other methods that actually query the taxonomy. Moreover, make sure you always pass the same delimiter for all DirectoryTaxonomyWriter and DirectoryTaxonomyReader objects you create.
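The following sketch shows why writers and readers must agree on the delimiter. It assumes, as the description above suggests, that a category path is stored as a single string whose components are joined by the delimiter. The hypothetical DelimiterDemo class is not Lucene code. It demonstrates that decoding with a different delimiter silently produces different components instead of raising an error.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration, not Lucene's storage code: components joined by one
// delimiter character and split again on read.
class DelimiterDemo {
    static String encode(List<String> components, char delimiter) {
        return String.join(String.valueOf(delimiter), components);
    }

    static List<String> decode(String stored, char delimiter) {
        // Pattern.quote makes delimiters such as '|' or '.' literal; limit -1 keeps empty parts
        return Arrays.asList(stored.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }
}
```

Encoding ["Author", "Mark Twain"] with '/' and decoding with ' ' yields ["Author/Mark", "Twain"], a different category.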


getOrdinal

public int getOrdinal(CategoryPath categoryPath)
               throws IOException
Description copied from interface: TaxonomyReader
getOrdinal() returns the ordinal of the category given as a path. The ordinal is the category's serial number, an integer which starts with 0 and grows as more categories are added (note that once a category is added, it can never be deleted).

If the given category wasn't found in the taxonomy, INVALID_ORDINAL is returned.

Specified by:
getOrdinal in interface TaxonomyReader
Throws:
IOException

getPath

public CategoryPath getPath(int ordinal)
                     throws CorruptIndexException,
                            IOException
Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal. The path is returned as a new CategoryPath object - to reuse an existing object, use TaxonomyReader.getPath(int, CategoryPath).

A null is returned if a category with the given ordinal does not exist.

Specified by:
getPath in interface TaxonomyReader
Throws:
CorruptIndexException
IOException
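getOrdinal() and getPath(int) are inverse lookups over serial numbers. The following is a minimal in-memory model, not the on-disk implementation. OrdinalTable is hypothetical, and paths are '/'-joined strings for brevity. Unlike a real taxonomy, it does not add ancestors automatically. The constants mirror TaxonomyReader.INVALID_ORDINAL (-1) and ROOT_ORDINAL (0).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the ordinal <-> path mapping (not Lucene's implementation).
class OrdinalTable {
    static final int INVALID_ORDINAL = -1; // mirrors TaxonomyReader.INVALID_ORDINAL
    static final int ROOT_ORDINAL = 0;     // the root always receives ordinal 0

    private final List<String> pathByOrdinal = new ArrayList<>();
    private final Map<String, Integer> ordinalByPath = new HashMap<>();

    OrdinalTable() {
        addIfAbsent(""); // the root category has the empty path
    }

    // Ordinals are serial numbers: each new category gets the next integer, never reused.
    int addIfAbsent(String path) {
        Integer existing = ordinalByPath.get(path);
        if (existing != null) return existing;
        int ordinal = pathByOrdinal.size();
        pathByOrdinal.add(path);
        ordinalByPath.put(path, ordinal);
        return ordinal;
    }

    // Unknown paths map to INVALID_ORDINAL, as getOrdinal() documents.
    int getOrdinal(String path) {
        return ordinalByPath.getOrDefault(path, INVALID_ORDINAL);
    }

    // Unknown ordinals map to null, as getPath(int) documents.
    String getPath(int ordinal) {
        return (ordinal >= 0 && ordinal < pathByOrdinal.size()) ? pathByOrdinal.get(ordinal) : null;
    }
}
```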

getPath

public boolean getPath(int ordinal,
                       CategoryPath result)
                throws CorruptIndexException,
                       IOException
Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal. The path is written to the given CategoryPath object (which is cleared first).

If a category with the given ordinal does not exist, the given CategoryPath object is not modified, and the method returns false. Otherwise, the method returns true.

Specified by:
getPath in interface TaxonomyReader
Throws:
CorruptIndexException
IOException
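The two-argument variant fills a caller-supplied object instead of allocating a new one. In this hypothetical sketch, a List<String> stands in for CategoryPath and ReusablePathLookup is an illustrative name. It follows the contract described above. On success, the target is cleared and then filled. If the ordinal does not exist, the target is left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "fill a caller-supplied object" contract of getPath(int, CategoryPath).
class ReusablePathLookup {
    private final List<List<String>> pathByOrdinal; // stands in for the taxonomy

    ReusablePathLookup(List<List<String>> pathByOrdinal) {
        this.pathByOrdinal = pathByOrdinal;
    }

    boolean getPath(int ordinal, List<String> result) {
        if (ordinal < 0 || ordinal >= pathByOrdinal.size()) {
            return false; // unknown ordinal: result is not modified
        }
        result.clear();   // cleared first ...
        result.addAll(pathByOrdinal.get(ordinal)); // ... then filled
        return true;
    }
}
```

A single buffer can be reused across many lookups, which avoids one allocation per call.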

getParent

public int getParent(int ordinal)
Description copied from interface: TaxonomyReader
getParent() returns the ordinal of the parent category of the category with the given ordinal.

When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back. However, implementations are expected to provide a much more efficient implementation:

getParent() should be a very quick method, as it is used during the facet aggregation process in faceted search. Implementations will most likely want to serve replies to this method from a pre-filled cache.

If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.

Specified by:
getParent in interface TaxonomyReader
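The equivalence described above, and the reason for caching, can be shown side by side. In this sketch, ParentLookup is hypothetical, paths are '/'-joined strings, and the root is the empty string. slowParent() follows the getPath/drop/getOrdinal definition. The constructor calls it once per category to pre-fill an array, so getParent() becomes a single array read.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting the definitional parent lookup with a pre-filled cache.
class ParentLookup {
    static final int INVALID_ORDINAL = -1;
    static final int ROOT_ORDINAL = 0;

    final String[] pathByOrdinal;
    final Map<String, Integer> ordinalByPath = new HashMap<>();
    final int[] parents; // pre-filled cache: parents[ordinal] = parent ordinal

    // paths[0] must be "" (the root), and the parent of every listed path must also be listed
    ParentLookup(String... paths) {
        pathByOrdinal = paths;
        for (int i = 0; i < paths.length; i++) ordinalByPath.put(paths[i], i);
        parents = new int[paths.length];
        for (int i = 0; i < paths.length; i++) parents[i] = slowParent(i);
    }

    // Functionally equivalent definition: getPath, drop the last component, getOrdinal.
    int slowParent(int ordinal) {
        if (ordinal == ROOT_ORDINAL) return INVALID_ORDINAL;
        String path = pathByOrdinal[ordinal];
        int cut = path.lastIndexOf('/');
        String parentPath = cut < 0 ? "" : path.substring(0, cut);
        return ordinalByPath.get(parentPath);
    }

    // Fast path: one array read. An out-of-range ordinal throws
    // ArrayIndexOutOfBoundsException, matching the contract above.
    int getParent(int ordinal) {
        return parents[ordinal];
    }
}
```

Serving getParent() from the pre-filled array gives the documented edge cases directly. The root maps to INVALID_ORDINAL, and an out-of-range ordinal raises ArrayIndexOutOfBoundsException.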

getParentArray

public int[] getParentArray()
getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy.

The caller can hold on to the array it got indefinitely; it is guaranteed that no one else will modify it. The other side of the same coin is that the caller must treat the array as read-only and not modify it, because other callers might have received the same array too, and getParent() calls are also answered from the same array.

The getParentArray() call is extremely efficient, merely returning a reference to an array that already exists. For a caller that plans to call getParent() for many categories, using getParentArray() and the array it returns is a somewhat faster approach because it avoids the overhead of method calls and volatile dereferencing.

If you use getParentArray() instead of getParent(), remember that the array you got is (naturally) not modified after a refresh(), so you should always call getParentArray() again after a refresh().

Specified by:
getParentArray in interface TaxonomyReader
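A common reason to call getParent() many times is to walk from a category up to the root, for example to credit every ancestor of a matched category. The hypothetical helper below performs that walk directly on an array in the shape returned by getParentArray(). It reads the array without modifying it. The root's parent is INVALID_ORDINAL (-1), as getParent() documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: follow a parent array (as returned by getParentArray()) up to the root.
class AncestorWalk {
    static final int INVALID_ORDINAL = -1; // parent of the root

    static List<Integer> ancestors(int[] parents, int ordinal) {
        List<Integer> result = new ArrayList<>();
        // read-only traversal: the shared array is never written
        for (int p = parents[ordinal]; p != INVALID_ORDINAL; p = parents[p]) {
            result.add(p);
        }
        return result;
    }
}
```

Fetch the array once before such a loop, not once per step. Fetch it again after every refresh().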

refresh

public boolean refresh()
                throws IOException,
                       InconsistentTaxonomyException
Description copied from interface: TaxonomyReader
refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed. Calling refresh() is more efficient than close()ing the old instance and opening a new one.

If there were no changes since this instance was opened or last refreshed, then this call does nothing. Note, however, that this is still a relatively slow method (as it needs to check on disk whether the taxonomy has changed), so it should not be called more often than needed. In faceted search, the taxonomy reader's refresh() should be called only after a reopen() of the main index.

Refreshing the taxonomy might fail in some cases, for example if the taxonomy was recreated since this instance was opened or last refreshed. In this case an InconsistentTaxonomyException is thrown, indicating that a new TaxonomyReader should be opened in order to obtain up-to-date taxonomy data. Note: this TaxonomyReader instance remains unchanged and usable in this case; the application can continue to use it, and should still call Closeable.close() on it when it is no longer needed.

It should be noted that refresh() is similar in purpose to IndexReader.reopen(), but the two methods behave differently. refresh() refreshes the existing TaxonomyReader object, rather than opening a new one in addition to the old one as reopen() does. The reason is that in a taxonomy, one can only add new categories and cannot modify or delete existing ones; therefore, there is no reason to keep an old snapshot of the taxonomy open. Refreshing the taxonomy to the newest data and using this new snapshot in all threads (whether new or old) is fine, and it avoids keeping multiple copies of the taxonomy in memory.

Specified by:
refresh in interface TaxonomyReader
Returns:
true if anything has changed, false otherwise.
Throws:
IOException
InconsistentTaxonomyException
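The in-place design described above can be modeled as a volatile reference to an append-only snapshot. RefreshableParents is hypothetical and deliberately simplified. It detects changes only by comparing sizes. It treats a shrunken taxonomy as the "recreated" case and throws IllegalStateException, where the real method throws InconsistentTaxonomyException. An array obtained before refresh() keeps its old contents, which is why getParentArray() must be called again afterwards.

```java
// Hypothetical sketch, not Lucene's code: refreshing in place by publishing a new snapshot.
// Categories are only appended, so a newer snapshot is safe for every thread to switch to.
class RefreshableParents {
    private volatile int[] parents; // current snapshot; arrays handed out are never mutated

    RefreshableParents(int[] initial) {
        this.parents = initial.clone();
    }

    int[] getParentArray() {
        return parents; // callers must treat this as read-only
    }

    int getParent(int ordinal) {
        return parents[ordinal]; // one volatile read per call
    }

    // Returns true if anything changed; 'latest' plays the role of the on-disk taxonomy.
    // Simplification: change is detected by size alone.
    boolean refresh(int[] latest) {
        int[] current = parents;
        if (latest.length == current.length) return false;
        if (latest.length < current.length) {
            // Shrinking implies the taxonomy was recreated; the current snapshot stays usable.
            throw new IllegalStateException("taxonomy was recreated");
        }
        parents = latest.clone(); // publish; old arrays remain valid for their holders
        return true;
    }
}
```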

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Throws:
IOException

getSize

public int getSize()
Description copied from interface: TaxonomyReader
getSize() returns the number of categories in the taxonomy.

Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.

Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).

Specified by:
getSize in interface TaxonomyReader
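Ancestors are added automatically, so inserting one two-level category can grow the taxonomy by up to three ordinals. The hypothetical TaxonomySizeDemo below models only that bookkeeping. It is not Lucene's writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of taxonomy growth: adding a category also adds each missing ancestor,
// and the root always holds ordinal 0.
class TaxonomySizeDemo {
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();

    TaxonomySizeDemo() {
        ordinals.put("", 0); // root
    }

    void addCategory(String... components) {
        StringBuilder prefix = new StringBuilder();
        for (String c : components) {
            if (prefix.length() > 0) prefix.append('/');
            prefix.append(c);
            // each unseen prefix (ancestor or the category itself) gets the next ordinal
            ordinals.putIfAbsent(prefix.toString(), ordinals.size());
        }
    }

    int getSize() {
        return ordinals.size(); // valid ordinals are 0 .. getSize()-1
    }
}
```

Adding Author/Mark Twain and Author/Jules Verne gives getSize() == 4 (root, Author, and the two authors), even though only two categories were inserted.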

getCommitUserData

public Map<String,String> getCommitUserData()
                                     throws IOException
Description copied from interface: TaxonomyReader
Retrieve user committed data.

Specified by:
getCommitUserData in interface TaxonomyReader
Throws:
IOException
See Also:
TwoPhaseCommit.commit(Map)

getChildrenArrays

public TaxonomyReader.ChildrenArrays getChildrenArrays()
Description copied from interface: TaxonomyReader
getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose arrays can be used together to efficiently enumerate the children of any category.

The caller can hold on to the object it got indefinitely; it is guaranteed that no one else will modify it. The other side of the same coin is that the caller must treat the object (and the arrays it contains) as read-only and not modify it, because other callers might have received the same object too.

Implementations should take O(getSize()) time for the first call or after a refresh(), but O(1) time for further calls. In neither case should there be a need to read new data from disk. These guarantees are most likely achieved by computing this object (based on getParentArray()) when it is first needed, and later (if the taxonomy was not refreshed) returning the same object, without any allocation or copying, when requested.

The reason we have one method returning one object, rather than two methods returning two arrays, is to avoid race conditions in a multi-threaded application: we want to avoid the possibility of returning one new array and one old array, as those could not be used together.

Specified by:
getChildrenArrays in interface TaxonomyReader
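One way to meet the O(getSize()) bound is a single pass over the parent array that links each category onto its parent's list of children. The sketch below is illustrative. ChildrenArraysSketch is hypothetical, although its youngest-child/older-sibling layout follows the shape of TaxonomyReader.ChildrenArrays. Ordinals grow with insertion time, so visiting them in increasing order leaves each category's most recently added child in youngestChild.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical O(n) construction of youngest-child / older-sibling arrays from a parent array.
class ChildrenArraysSketch {
    static final int INVALID_ORDINAL = -1;

    final int[] youngestChild; // highest-numbered child of each category, or INVALID_ORDINAL
    final int[] olderSibling;  // next lower-numbered sibling of each category, or INVALID_ORDINAL

    ChildrenArraysSketch(int[] parents) {
        int n = parents.length;
        youngestChild = new int[n];
        olderSibling = new int[n];
        Arrays.fill(youngestChild, INVALID_ORDINAL);
        if (n > 0) olderSibling[0] = INVALID_ORDINAL; // the root has no siblings
        for (int i = 1; i < n; i++) {
            int p = parents[i];
            olderSibling[i] = youngestChild[p]; // previous youngest becomes i's older sibling
            youngestChild[p] = i;               // i is now the youngest child of p
        }
    }

    // Enumerate the children of 'ordinal', youngest first.
    List<Integer> children(int ordinal) {
        List<Integer> result = new ArrayList<>();
        for (int c = youngestChild[ordinal]; c != INVALID_ORDINAL; c = olderSibling[c]) {
            result.add(c);
        }
        return result;
    }
}
```

Both arrays are built together from one parent-array snapshot. Handing them out as a single object therefore keeps them consistent, which is the point of the design note above.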

toString

public String toString(int max)

decRef

public void decRef()
            throws IOException
Expert: decreases the refCount of this TaxonomyReader instance. If the refCount drops to 0, then this reader is closed.

Specified by:
decRef in interface TaxonomyReader
Throws:
IOException

getRefCount

public int getRefCount()
Expert: returns the current refCount for this taxonomy reader.

Specified by:
getRefCount in interface TaxonomyReader

incRef

public void incRef()
Expert: increments the refCount of this TaxonomyReader instance. RefCounts are used to determine when a taxonomy reader can be closed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding decRef(), in a finally clause; otherwise the reader may never be closed.

Specified by:
incRef in interface TaxonomyReader
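The incRef()/decRef() contract can be illustrated with an AtomicInteger. RefCountedResource is a hypothetical class, not DirectoryTaxonomyReader's implementation. It assumes that the count starts at 1, held by the code that opened the reader. It throws IllegalStateException where the documented ensureOpen() throws AlreadyClosedException. useSafely() shows the try/finally pairing recommended above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of reference counting: closed exactly once, when the count reaches 0.
class RefCountedResource {
    private final AtomicInteger refCount = new AtomicInteger(1); // assumed: opener holds 1
    private volatile boolean closed = false;

    void incRef() {
        ensureOpen();
        refCount.incrementAndGet();
    }

    void decRef() {
        ensureOpen();
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real reader would release files and caches here
        }
    }

    int getRefCount() { return refCount.get(); }
    boolean isClosed() { return closed; }

    private void ensureOpen() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("already closed");
        }
    }

    // Caller pattern: every incRef() is paired with a decRef() in a finally clause.
    static int useSafely(RefCountedResource r) {
        r.incRef();
        try {
            return r.getRefCount(); // ... use the reader here ...
        } finally {
            r.decRef();
        }
    }
}
```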