java.lang.Object org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader
public class DirectoryTaxonomyReader
A TaxonomyReader which retrieves stored taxonomy information from a Directory.
Reading from the on-disk index on every method call is too slow, so this implementation employs caching: Some methods cache recent requests and their results, while other methods prefetch all the data into memory and then provide answers directly from in-memory tables. See the documentation of individual methods for comments on their performance.
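The "cache recent requests" strategy mentioned above can be pictured as a bounded least-recently-used map. The following is only a minimal sketch built on java.util.LinkedHashMap; the class name RecentRequestCache is hypothetical, and this page does not state which cache class DirectoryTaxonomyReader uses internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: a bounded LRU map, not Lucene's own cache class.
final class RecentRequestCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    RecentRequestCache(int maxSize) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxSize;
    }
}
```

With a limit of two entries, inserting a third entry evicts whichever of the first two was touched least recently.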
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader |
---|
TaxonomyReader.ChildrenArrays |
Field Summary |
---|
Fields inherited from interface org.apache.lucene.facet.taxonomy.TaxonomyReader |
---|
INVALID_ORDINAL, ROOT_ORDINAL |
Constructor Summary | |
---|---|
DirectoryTaxonomyReader(Directory directory) - Open for reading a taxonomy stored in a given Directory. |
Method Summary | |
---|---|
void | close() |
void | decRef() - Expert: decreases the refCount of this TaxonomyReader instance. |
protected void | ensureOpen() |
TaxonomyReader.ChildrenArrays | getChildrenArrays() - getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose arrays can be used together to efficiently enumerate the children of any category. |
Map<String,String> | getCommitUserData() - Retrieve user committed data. |
int | getOrdinal(CategoryPath categoryPath) - getOrdinal() returns the ordinal of the category given as a path. |
int | getParent(int ordinal) - getParent() returns the ordinal of the parent category of the category with the given ordinal. |
int[] | getParentArray() - getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy. |
CategoryPath | getPath(int ordinal) - getPath() returns the path name of the category with the given ordinal. |
boolean | getPath(int ordinal, CategoryPath result) - getPath() returns the path name of the category with the given ordinal. |
int | getRefCount() - Expert: returns the current refCount for this taxonomy reader. |
int | getSize() - getSize() returns the number of categories in the taxonomy. |
void | incRef() - Expert: increments the refCount of this TaxonomyReader instance. |
protected IndexReader | openIndexReader(Directory directory) |
boolean | refresh() - refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed. |
void | setCacheSize(int size) - setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath). |
void | setDelimiter(char delimiter) - setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components. |
String | toString(int max) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DirectoryTaxonomyReader(Directory directory) throws IOException
Open for reading a taxonomy stored in a given Directory.
directory - The Directory in which the taxonomy lives. Note that the taxonomy is read directly from that directory (not from a subdirectory of it).
CorruptIndexException - if the Taxonomy is corrupted.
IOException - if another error occurred.
Method Detail |
---|
protected IndexReader openIndexReader(Directory directory) throws CorruptIndexException, IOException
CorruptIndexException
IOException
protected final void ensureOpen() throws AlreadyClosedException
AlreadyClosedException - if this IndexReader is closed
public void setCacheSize(int size)
setCacheSize controls the maximum allowed size of each of the caches used by getPath(int) and getOrdinal(CategoryPath).
Currently, if the given size is smaller than the current size of a cache, the cache will not shrink; rather, it will be limited to its current size.
size - the new maximum cache size, in number of entries.
public void setDelimiter(char delimiter)
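setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components.
If you do use this method, make sure you call it before any other method that actually queries the taxonomy. Moreover, make sure you always pass the same delimiter to all DirectoryTaxonomyWriter and DirectoryTaxonomyReader objects you create.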
public int getOrdinal(CategoryPath categoryPath) throws IOException
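Description copied from interface: TaxonomyReader
getOrdinal() returns the ordinal of the category given as a path.
If the given category wasn't found in the taxonomy, INVALID_ORDINAL is returned.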
getOrdinal
in interface TaxonomyReader
IOException
public CategoryPath getPath(int ordinal) throws CorruptIndexException, IOException
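Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal. See also TaxonomyReader.getPath(int, CategoryPath).
A null is returned if a category with the given ordinal does not exist.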
getPath
in interface TaxonomyReader
CorruptIndexException
IOException
public boolean getPath(int ordinal, CategoryPath result) throws CorruptIndexException, IOException
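Description copied from interface: TaxonomyReader
getPath() returns the path name of the category with the given ordinal.
If a category with the given ordinal does not exist, the given CategoryPath object is not modified, and the method returns false. Otherwise, the method returns true.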
getPath
in interface TaxonomyReader
CorruptIndexException
IOException
public int getParent(int ordinal)
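Description copied from interface: TaxonomyReader
getParent() returns the ordinal of the parent category of the category with the given ordinal.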
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back. However, implementations are expected to provide a much more efficient implementation:
getParent() should be a very quick method, as it is used during the facet aggregation process in faceted search. Implementations will most likely want to serve replies to this method from a pre-filled cache.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
getParent
in interface TaxonomyReader
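The equivalence described above, between a single parent lookup and the slower "ordinal to path, drop the last component, path to ordinal" route, can be sketched with plain arrays. This is a stand-in rather than Lucene code: the PATHS and PARENTS arrays, the class name, the '/' separator, and the value -1 for INVALID_ORDINAL are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for a taxonomy: ordinal i has path PATHS[i] and parent PARENTS[i].
final class ParentLookupSketch {
    static final int ROOT_ORDINAL = 0;
    static final int INVALID_ORDINAL = -1; // assumed value for illustration

    // The root is the empty path and sits at ordinal 0.
    static final String[] PATHS = {"", "Author", "Author/Mark Twain", "Date"};
    static final int[] PARENTS = {INVALID_ORDINAL, 0, 1, 0};

    // Fast route: a single array read. An ordinal outside the array
    // raises ArrayIndexOutOfBoundsException.
    static int parentByArray(int ordinal) {
        return PARENTS[ordinal];
    }

    // Slow route: ordinal -> path -> drop the last component -> ordinal.
    static int parentByPath(int ordinal) {
        String path = PATHS[ordinal];
        if (path.isEmpty()) {
            return INVALID_ORDINAL; // the root has no parent
        }
        int cut = path.lastIndexOf('/');
        String parentPath = cut < 0 ? "" : path.substring(0, cut);
        return Arrays.asList(PATHS).indexOf(parentPath);
    }
}
```

Both routes agree for every ordinal in the toy data; the array route simply skips the two path conversions.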
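public int[] getParentArray()
getParentArray() returns an int array of size getSize() listing the ordinal of the parent category of each category in the taxonomy.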
The caller can hold on to the array it got indefinitely - it is guaranteed that no-one else will modify it. The other side of the same coin is that the caller must treat the array it got as read-only and not modify it, because other callers might have gotten the same array too, and getParent() calls are also answered from the same array.
The getParentArray() call is extremely efficient, merely returning a reference to an array that already exists. For a caller that plans to call getParent() for many categories, using getParentArray() and the array it returns is a somewhat faster approach because it avoids the overhead of method calls and volatile dereferencing.
If you use getParentArray() instead of getParent(), remember that the array you got is (naturally) not modified after a refresh(), so you should always call getParentArray() again after a refresh().
getParentArray
in interface TaxonomyReader
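For a caller that visits many categories, the pattern described above amounts to fetching the array once and indexing it directly. A minimal sketch follows, in which a plain int[] stands in for the result of getParentArray(); the depth helper and the class name are hypothetical, and -1 is assumed as the value of INVALID_ORDINAL.

```java
final class ParentArrayWalkSketch {
    static final int INVALID_ORDINAL = -1; // assumed value for illustration

    // Counts the edges between the given ordinal and the root by following parent links.
    // The array is only read, never written, in line with the read-only contract.
    static int depth(int[] parents, int ordinal) {
        int depth = 0;
        for (int p = parents[ordinal]; p != INVALID_ORDINAL; p = parents[p]) {
            depth++;
        }
        return depth;
    }
}
```

A caller following this pattern would fetch the array again after each refresh() before passing it to such a helper.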
public boolean refresh() throws IOException, InconsistentTaxonomyException
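Description copied from interface: TaxonomyReader
refresh() re-reads the taxonomy information if there were any changes to the taxonomy since this instance was opened or last refreshed.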
If there were no changes since this instance was opened or last refreshed, then this call does nothing. Note, however, that this is still a relatively slow method (as it needs to verify whether there have been any changes on disk to the taxonomy), so it should not be called too often needlessly. In faceted search, the taxonomy reader's refresh() should be called only after a reopen() of the main index.
Refreshing the taxonomy might fail in some cases, for example if the taxonomy was recreated since this instance was opened or last refreshed. In this case an InconsistentTaxonomyException is thrown, suggesting that in order to obtain up-to-date taxonomy data a new TaxonomyReader should be opened. Note: This TaxonomyReader instance remains unchanged and usable in this case, and the application can continue to use it, and should still call Closeable.close() on it when it is no longer needed.
It should be noted that refresh() is similar in purpose to IndexReader.reopen(), but the two methods behave differently. refresh() refreshes the existing TaxonomyReader object, rather than opening a new one in addition to the old one as reopen() does. The reason is that in a taxonomy, one can only add new categories and cannot modify or delete existing categories; therefore, there is no reason to keep an old snapshot of the taxonomy open. Refreshing the taxonomy to the newest data and using this new snapshot in all threads (whether new or old) is fine. This saves us from needing to keep multiple copies of the taxonomy open in memory.
refresh
in interface TaxonomyReader
IOException
InconsistentTaxonomyException
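The recovery path described above (keep using the same instance when refresh() succeeds, open a new reader when the taxonomy turns out to be inconsistent) might look roughly like the sketch below. So that it compiles without Lucene on the classpath, every type in it is a hypothetical stand-in: InconsistentTaxonomyStandIn, RefreshableTaxonomy, TaxonomyOpener and refreshOrReopen are not Lucene names, and the stand-in methods omit the IOException that the real methods declare.

```java
// Hypothetical stand-ins; none of these types belong to Lucene.
class InconsistentTaxonomyStandIn extends Exception {
}

interface RefreshableTaxonomy {
    boolean refresh() throws InconsistentTaxonomyStandIn;

    void close();
}

interface TaxonomyOpener {
    RefreshableTaxonomy open();
}

final class RefreshOrReopenSketch {
    // Returns the reader to use from now on: the same instance if refresh() worked,
    // otherwise a newly opened one. The old reader is closed only after the
    // replacement is ready, since it stays usable until then.
    static RefreshableTaxonomy refreshOrReopen(RefreshableTaxonomy current, TaxonomyOpener opener) {
        try {
            current.refresh();
            return current;
        } catch (InconsistentTaxonomyStandIn e) {
            RefreshableTaxonomy fresh = opener.open();
            current.close();
            return fresh;
        }
    }
}
```

In a multi-threaded application the old reader may still be in use by other threads at the moment of the swap, so closing it immediately, as this sketch does, is a simplification.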
public void close() throws IOException
close
in interface Closeable
IOException
public int getSize()
Description copied from interface: TaxonomyReader
getSize() returns the number of categories in the taxonomy.
Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
getSize
in interface TaxonomyReader
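Why getSize() can exceed the number of inserted categories is easiest to see with a toy model. The class below is a hypothetical illustration, not Lucene's taxonomy writer; it uses '/' as the path separator and hands out ordinals in insertion order, adding missing ancestors first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy taxonomy: adding a path also adds any ancestors that are missing.
final class AncestorCountSketch {
    private final Map<String, Integer> ordinals = new LinkedHashMap<>();

    AncestorCountSketch() {
        ordinals.put("", 0); // the root always gets ordinal 0
    }

    // Adds the category and any missing ancestors; returns the category's ordinal.
    int add(String path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        int cut = path.lastIndexOf('/');
        if (cut >= 0) {
            add(path.substring(0, cut)); // parent first, so it gets the lower ordinal
        }
        int ordinal = ordinals.size();
        ordinals.put(path, ordinal);
        return ordinal;
    }

    int getSize() {
        return ordinals.size();
    }
}
```

Adding only "Author/Mark Twain" and "Date" to an empty instance leaves four categories in this model: the root, "Author", "Author/Mark Twain" and "Date".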
public Map<String,String> getCommitUserData() throws IOException
Description copied from interface: TaxonomyReader
Retrieve user committed data.
getCommitUserData
in interface TaxonomyReader
IOException
See Also: TwoPhaseCommit.commit(Map)
public TaxonomyReader.ChildrenArrays getChildrenArrays()
Description copied from interface: TaxonomyReader
getChildrenArrays() returns a TaxonomyReader.ChildrenArrays object whose arrays can be used together to efficiently enumerate the children of any category.
The caller can hold on to the object it got indefinitely - it is guaranteed that no-one else will modify it. The other side of the same coin is that the caller must treat the object which it got (and the arrays it contains) as read-only and not modify it, because other callers might have gotten the same object too.
Implementations should have O(getSize()) time for the first call or after a refresh(), but O(1) time for further calls. In neither case should there be a need to read new data from disk. These guarantees are most likely achieved by calculating this object (based on the getParentArray()) when first needed, and later (if the taxonomy was not refreshed) returning the same object (without any allocation or copying) when requested.
The reason we have one method returning one object, rather than two methods returning two arrays, is to avoid race conditions in a multi-threaded application: We want to avoid the possibility of returning one new array and one old array, as those could not be used together.
getChildrenArrays
in interface TaxonomyReader
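One way to meet the O(getSize()) bound described above is a single pass over the parent array that fills two arrays held by one immutable object. The sketch below is an assumption about how such an object could be derived, not a description of TaxonomyReader.ChildrenArrays itself; the class, field and method names are hypothetical, and -1 is assumed as the value of INVALID_ORDINAL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: two arrays derived from a parent array, kept together.
final class ChildrenArraysSketch {
    static final int INVALID_ORDINAL = -1; // assumed value for illustration

    final int[] youngestChild; // highest-ordinal child of each category, if any
    final int[] olderSibling;  // next-lower-ordinal sibling of each category, if any

    // One O(n) pass over the parent array. Ordinal 0 is the root and is skipped.
    ChildrenArraysSketch(int[] parents) {
        int n = parents.length;
        youngestChild = new int[n];
        olderSibling = new int[n];
        Arrays.fill(youngestChild, INVALID_ORDINAL);
        Arrays.fill(olderSibling, INVALID_ORDINAL);
        for (int i = 1; i < n; i++) {
            // The parent's previous youngest child becomes i's older sibling.
            olderSibling[i] = youngestChild[parents[i]];
            youngestChild[parents[i]] = i;
        }
    }

    // Enumerates the children of a category from youngest to oldest.
    List<Integer> childrenOf(int ordinal) {
        List<Integer> children = new ArrayList<>();
        for (int c = youngestChild[ordinal]; c != INVALID_ORDINAL; c = olderSibling[c]) {
            children.add(c);
        }
        return children;
    }
}
```

Because both arrays are built in the same constructor and published through one reference, a reader never sees one array from before a refresh paired with one from after it.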
public String toString(int max)
public void decRef() throws IOException
decRef
in interface TaxonomyReader
IOException
public int getRefCount()
getRefCount
in interface TaxonomyReader
public void incRef()
incRef
in interface TaxonomyReader