java.lang.Object org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter
public class DirectoryTaxonomyWriter
A TaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the permanently-stored information in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk for every category addition to see which ordinal this category already has, if any. A TaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See openIndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.index.IndexWriterConfig).
Nested Class Summary | |
---|---|
static class | DirectoryTaxonomyWriter.DiskOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained on file system |
static class | DirectoryTaxonomyWriter.MemoryOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained in memory |
static interface | DirectoryTaxonomyWriter.OrdinalMap: Mapping from old ordinals to new ordinals, used when merging indexes with separate taxonomies. |
Field Summary | |
---|---|
static String | INDEX_CREATE_TIME: Property name of user commit data that contains the creation time of a taxonomy index. |
Constructor Summary | |
---|---|
DirectoryTaxonomyWriter(Directory d) | |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode) | Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache(). |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache) | Constructs a taxonomy writer. |
Method Summary | |
---|---|
int | addCategory(CategoryPath categoryPath): Adds a category with a given path name to the taxonomy, and returns its ordinal. |
protected int | addCategoryDocument(CategoryPath categoryPath, int length, int parent) |
void | addTaxonomies(Directory[] taxonomies, DirectoryTaxonomyWriter.OrdinalMap[] ordinalMaps): Takes all the categories of one or more given taxonomies, and adds them to the main taxonomy (this), if they are not already there. |
void | close(): Frees used resources and closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory. |
protected void | closeResources(): A hook for extending classes to close additional resources that were used. |
void | commit(): Ensures that all the categories written so far are visible to a reader that is opened (or reopened) after that call. |
void | commit(Map<String,String> commitUserData): Like commit(), but also stores properties with the index. |
protected IndexWriterConfig | createIndexWriterConfig(IndexWriterConfig.OpenMode openMode): Creates the IndexWriterConfig that would be used for opening the internal index writer. |
static TaxonomyWriterCache | defaultTaxonomyWriterCache(): Defines the default TaxonomyWriterCache to use in constructors which do not specify one. |
protected void | ensureOpen(): Verifies that this instance wasn't closed, or throws AlreadyClosedException if it was. |
protected int | findCategory(CategoryPath categoryPath): Looks up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy. |
int | getCacheMemoryUsage(): Returns the number of memory bytes used by the cache. |
int | getParent(int ordinal): Returns the ordinal of the parent category of the category with the given ordinal. |
int | getSize(): Returns the number of categories in the taxonomy. |
protected IndexWriter | openIndexWriter(Directory directory, IndexWriterConfig config): Opens the internal index writer, which contains the taxonomy data. |
protected IndexReader | openReader(): Opens an IndexReader from the internal IndexWriter, by calling IndexReader.open(IndexWriter, boolean). |
void | prepareCommit(): Prepares most of the work needed for a two-phase commit. |
void | prepareCommit(Map<String,String> commitUserData): Like prepareCommit(), and also prepares to store user data with the index. |
void | rollback(): Rolls back changes to the taxonomy writer and closes the instance. |
void | setCacheMissesUntilFill(int i): Sets the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache. |
void | setDelimiter(char delimiter): Changes the character that the taxonomy uses in its internal storage as a delimiter between category components. |
static void | unlock(Directory directory): Forcibly unlocks the taxonomy in the named directory. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String INDEX_CREATE_TIME
Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.
Constructor Detail |
---|
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache) throws IOException
directory - The Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).
openMode - Specifies how to open a taxonomy for writing: APPEND means open an existing index for append (failing if the index does not yet exist). CREATE means create a new index (first deleting the old one if it already existed). APPEND_OR_CREATE appends to an existing index if there is one, otherwise it creates a new index.
cache - A TaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and Cl2oTaxonomyWriterCache. If null or missing, defaultTaxonomyWriterCache() is used.
CorruptIndexException - if the taxonomy is corrupted.
LockObtainFailedException - if the taxonomy is locked by another writer. If it is known that no other concurrent writer is active, the lock might have been left around by an old dead process, and should be removed using unlock(Directory).
IOException - if another error occurred.
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode) throws CorruptIndexException, LockObtainFailedException, IOException
Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache().
CorruptIndexException
LockObtainFailedException
IOException
public DirectoryTaxonomyWriter(Directory d) throws CorruptIndexException, LockObtainFailedException, IOException
CorruptIndexException
LockObtainFailedException
IOException
Method Detail |
---|
public void setDelimiter(char delimiter)
If you do use this method, make sure you call it before any other method that actually queries the taxonomy. Moreover, make sure you always pass the same delimiter to all DirectoryTaxonomyWriter and DirectoryTaxonomyReader objects you create for the same directory.
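To illustrate why the delimiter must match across every writer and reader of the same directory, here is a minimal sketch in plain Java. It does not reproduce Lucene's storage format: `DelimitedPathSketch`, `join`, and `split` are hypothetical names, and the only assumption is that a category path is stored as its components joined by a single delimiter character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: models a category path stored
// as its components joined by one delimiter character.
final class DelimitedPathSketch {

    // Joins the components into the single string that would be stored.
    static String join(char delimiter, String... components) {
        StringBuilder stored = new StringBuilder();
        for (int i = 0; i < components.length; i++) {
            if (i > 0) {
                stored.append(delimiter);
            }
            stored.append(components[i]);
        }
        return stored.toString();
    }

    // Splits a stored string back into components using the given delimiter.
    static String[] split(char delimiter, String stored) {
        List<String> components = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < stored.length(); i++) {
            if (stored.charAt(i) == delimiter) {
                components.add(stored.substring(start, i));
                start = i + 1;
            }
        }
        components.add(stored.substring(start));
        return components.toArray(new String[0]);
    }
}
```

In this sketch, splitting with a delimiter other than the one used for joining returns the whole stored string as a single component, which is the kind of mismatch the warning above guards against.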
public static void unlock(Directory directory) throws IOException
Caution: this should only be used by failure recovery code, when it is known that no other process or thread is currently accessing this taxonomy.
This method is unnecessary if your Directory uses a NativeFSLockFactory instead of the default SimpleFSLockFactory. When the "native" lock is used, a lock does not stay behind forever when the process using it dies.
IOException
protected IndexWriter openIndexWriter(Directory directory, IndexWriterConfig config) throws IOException
Opens the internal index writer, which contains the taxonomy data. Extensions may provide their own IndexWriter implementation or instance.
NOTE: the instance this method returns will be closed upon calling close().
NOTE: the merge policy in effect must not merge non-adjacent segments. See the comment in createIndexWriterConfig(IndexWriterConfig.OpenMode) for the logic behind this.
directory - the Directory on top of which an IndexWriter should be opened.
config - configuration for the internal index writer.
IOException
createIndexWriterConfig(IndexWriterConfig.OpenMode)
protected IndexWriterConfig createIndexWriterConfig(IndexWriterConfig.OpenMode openMode)
Creates the IndexWriterConfig that would be used for opening the internal index writer. Extensions can configure the IndexWriter as they see fit, including setting a merge-scheduler, a deletion-policy, a different RAM size, etc.
openMode
- see IndexWriterConfig.OpenMode
openIndexWriter(Directory, IndexWriterConfig)
protected IndexReader openReader() throws IOException
Opens an IndexReader from the internal IndexWriter, by calling IndexReader.open(IndexWriter, boolean). Extending classes can override this method to return their own IndexReader.
IOException
public static TaxonomyWriterCache defaultTaxonomyWriterCache()
Defines the default TaxonomyWriterCache to use in constructors which do not specify one.
The current default is Cl2oTaxonomyWriterCache constructed with the parameters (1024, 0.15f, 3), i.e., the entire taxonomy is cached in memory while building it.
public void close() throws CorruptIndexException, IOException
Frees used resources and closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory.
close
in interface Closeable
CorruptIndexException
IOException
public int getCacheMemoryUsage()
protected void closeResources() throws IOException
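A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the TaxonomyWriterCache instances that were used. If you override this method, include a super.closeResources() call in your implementation.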
IOException
protected int findCategory(CategoryPath categoryPath) throws IOException
IOException
public int addCategory(CategoryPath categoryPath) throws IOException
TaxonomyWriter
Before adding a category, addCategory() makes sure that all its ancestor categories exist in the taxonomy as well. As a result, the ordinal of a category is guaranteed to be smaller than the ordinal of any of its descendants.
addCategory
in interface TaxonomyWriter
IOException
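The ancestors-first rule can be sketched with a small in-memory model. This is not the Lucene implementation (there is no Directory, cache, or CategoryPath here); `ToyTaxonomy` is a hypothetical class, and joining components with "/" is an assumption made only to build map keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of ordinal assignment, not Lucene code.
final class ToyTaxonomy {
    // Maps a joined path to its ordinal; "/" as separator is an assumption.
    private final Map<String, Integer> ordinals = new HashMap<>();

    ToyTaxonomy() {
        ordinals.put("", 0); // the root always gets ordinal 0
    }

    // Adds the category (and any missing ancestors) and returns its ordinal.
    int addCategory(String... components) {
        return addPrefix(components, components.length);
    }

    private int addPrefix(String[] components, int length) {
        String key = String.join("/", Arrays.copyOf(components, length));
        Integer existing = ordinals.get(key);
        if (existing != null) {
            return existing; // already known, nothing to add
        }
        // Add the parent first, so it receives a smaller ordinal.
        addPrefix(components, length - 1);
        int ordinal = ordinals.size();
        ordinals.put(key, ordinal);
        return ordinal;
    }

    int size() {
        return ordinals.size();
    }
}
```

Because the recursion reaches the parent before assigning the child's ordinal, adding Author/Mark Twain to an empty toy taxonomy gives Author ordinal 1 and Mark Twain ordinal 2.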
protected final void ensureOpen()
Verifies that this instance wasn't closed, or throws AlreadyClosedException if it was.
protected int addCategoryDocument(CategoryPath categoryPath, int length, int parent) throws CorruptIndexException, IOException
CorruptIndexException
IOException
public void commit() throws CorruptIndexException, IOException
TwoPhaseCommit.commit()
commit
in interface TwoPhaseCommit
CorruptIndexException
IOException
public void commit(Map<String,String> commitUserData) throws CorruptIndexException, IOException
Like commit(), but also stores properties with the index. The properties can be read back with DirectoryTaxonomyReader.getCommitUserData().
See TwoPhaseCommit.commit(Map).
commit
in interface TwoPhaseCommit
CorruptIndexException
IOException
TwoPhaseCommit.commit()
,
TwoPhaseCommit.prepareCommit(Map)
public void prepareCommit() throws CorruptIndexException, IOException
Prepares most of the work needed for a two-phase commit. See IndexWriter.prepareCommit().
prepareCommit
in interface TwoPhaseCommit
CorruptIndexException
IOException
public void prepareCommit(Map<String,String> commitUserData) throws CorruptIndexException, IOException
Like prepareCommit(), and also prepares to store user data with the index. See IndexWriter.prepareCommit(Map).
prepareCommit
in interface TwoPhaseCommit
CorruptIndexException
IOException
TwoPhaseCommit.prepareCommit()
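As a rough illustration of how prepareCommit(), commit(), and rollback() fit together, the following plain-Java sketch coordinates several participants. It does not use Lucene: `TwoPhaseParticipant`, `RecordingParticipant`, and `TwoPhaseCoordinatorSketch` are hypothetical names, and rolling back every participant after any failure is an assumption about the protocol, not something this page specifies.

```java
// Hypothetical interface with the three operations named on this page.
interface TwoPhaseParticipant {
    void prepareCommit() throws Exception;
    void commit() throws Exception;
    void rollback();
}

// Test double that records the last operation applied to it.
final class RecordingParticipant implements TwoPhaseParticipant {
    private final boolean failOnPrepare;
    String state = "open";

    RecordingParticipant(boolean failOnPrepare) {
        this.failOnPrepare = failOnPrepare;
    }

    @Override
    public void prepareCommit() throws Exception {
        if (failOnPrepare) {
            throw new Exception("prepare failed");
        }
        state = "prepared";
    }

    @Override
    public void commit() {
        state = "committed";
    }

    @Override
    public void rollback() {
        state = "rolledBack";
    }
}

final class TwoPhaseCoordinatorSketch {
    // Phase 1 prepares everyone; phase 2 commits everyone.
    // Any failure rolls back all participants (assumed policy).
    static boolean commitAll(TwoPhaseParticipant... participants) {
        try {
            for (TwoPhaseParticipant p : participants) {
                p.prepareCommit();
            }
            for (TwoPhaseParticipant p : participants) {
                p.commit();
            }
            return true;
        } catch (Exception e) {
            for (TwoPhaseParticipant p : participants) {
                p.rollback();
            }
            return false;
        }
    }
}
```

The point of the first phase is that the expensive, failure-prone work happens before anyone commits, so a failure in one participant can still be undone in the others.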
public int getSize()
Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
getSize
in interface TaxonomyWriter
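The arithmetic behind that note can be checked with a short stand-alone sketch that counts the root plus every distinct prefix of the inserted paths. `TaxonomySizeSketch` is a hypothetical helper, not a Lucene class, and the "/" used to build set keys is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: predicts the size of a taxonomy after a set of
// paths has been added, counting automatically added ancestors.
final class TaxonomySizeSketch {

    static int sizeAfterAdding(String[]... paths) {
        Set<String> categories = new HashSet<>();
        categories.add(""); // the root is always present
        for (String[] path : paths) {
            StringBuilder prefix = new StringBuilder();
            for (String component : path) {
                // Every prefix of the path is a category of its own.
                prefix.append('/').append(component);
                categories.add(prefix.toString());
            }
        }
        return categories.size();
    }
}
```

For example, inserting the two categories Author/Mark Twain and Author/Jane Austen yields a size of 4 in this model: the root, Author, and the two authors.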
public void setCacheMissesUntilFill(int i)
DirectoryTaxonomyWriter holds an in-memory cache of recently seen categories to speed up operation. On each cache miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, many such slow disk reads are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.
If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.
Note that if the memory cache of choice is limited in size, and cannot hold the entire content of the on-disk taxonomy, then it is never read in its entirety into the cache, regardless of the setting of this method.
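The fill-after-N-misses policy can be sketched as follows. This is a toy model under stated assumptions, not the writer's actual code: `FillOnMissCacheSketch` is hypothetical, a HashMap stands in for the on-disk index, the cache is unbounded (so the size-limited case from the note above is not modelled), and the exact point at which the threshold triggers is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a cache that switches from single lookups to one
// bulk read after a configurable number of misses.
final class FillOnMissCacheSketch {
    private final Map<String, Integer> disk = new HashMap<>();  // stand-in for the on-disk index
    private final Map<String, Integer> cache = new HashMap<>(); // unbounded in this sketch
    private final int cacheMissesUntilFill;
    private int misses = 0;
    private boolean complete = false;
    int singleLookups = 0; // counts simulated one-category disk reads

    FillOnMissCacheSketch(int cacheMissesUntilFill) {
        this.cacheMissesUntilFill = cacheMissesUntilFill;
    }

    // Simulates a category that already exists on disk.
    void store(String path, int ordinal) {
        disk.put(path, ordinal);
        if (complete) {
            cache.put(path, ordinal); // keep a complete cache complete
        }
    }

    // Returns the ordinal, or -1 if the category does not exist.
    int find(String path) {
        Integer cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        if (complete) {
            return -1; // a complete cache is authoritative
        }
        if (misses >= cacheMissesUntilFill) {
            cache.putAll(disk); // one bulk read instead of many small ones
            complete = true;
            Integer ordinal = cache.get(path);
            return ordinal == null ? -1 : ordinal;
        }
        misses++;
        singleLookups++;
        Integer onDisk = disk.get(path);
        if (onDisk == null) {
            return -1;
        }
        cache.put(path, onDisk);
        return onDisk;
    }

    boolean isComplete() {
        return complete;
    }
}
```

With a threshold of 0 the first miss triggers the bulk read, so no individual lookup is ever issued; with a threshold of 2 the first two misses go to disk one at a time and the third triggers the fill.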
public int getParent(int ordinal) throws IOException
TaxonomyWriter
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
TODO (Facet): instead of a getParent(ordinal) method, consider having a getCategory(categorypath, prefixlen) which is similar to addCategory except it doesn't add new categories; this method can be used to get the ordinals of all prefixes of the given category, and it can use exactly the same code and cache used by addCategory(), so it means less code.
getParent
in interface TaxonomyWriter
IOException
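The return values described above can be modelled with a plain parent array indexed by ordinal. `ParentArraySketch` and `addChild` are hypothetical names, and the value -1 for INVALID_ORDINAL is an assumption of the sketch; only ROOT_ORDINAL being 0 follows from this page.

```java
import java.util.Arrays;

// Hypothetical model: parents[ordinal] holds the ordinal of the parent.
final class ParentArraySketch {
    static final int ROOT_ORDINAL = 0;
    static final int INVALID_ORDINAL = -1; // assumed value for this sketch

    private int[] parents = { INVALID_ORDINAL }; // the root has no parent
    private int size = 1;

    // Adds a category under an existing parent and returns the new ordinal.
    int addChild(int parentOrdinal) {
        getParent(parentOrdinal); // validates that the parent exists
        if (size == parents.length) {
            parents = Arrays.copyOf(parents, size * 2);
        }
        parents[size] = parentOrdinal;
        return size++;
    }

    // Throws ArrayIndexOutOfBoundsException for ordinals not in the taxonomy.
    int getParent(int ordinal) {
        if (ordinal < 0 || ordinal >= size) {
            throw new ArrayIndexOutOfBoundsException(ordinal);
        }
        return parents[ordinal];
    }
}
```

Since parents are always added before their children, every entry in the array points to a smaller index, which keeps the lookup a single array read.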
public void addTaxonomies(Directory[] taxonomies, DirectoryTaxonomyWriter.OrdinalMap[] ordinalMaps) throws IOException
Additionally, fill a mapping for each of the added taxonomies, mapping its ordinals to the ordinals in the enlarged main taxonomy. These mappings are saved into an array of OrdinalMap objects given by the user, one for each of the given taxonomies (not including "this", the main taxonomy). Often the first of these will be a MemoryOrdinalMap and the others will be a DiskOrdinalMap; see the discussion in DirectoryTaxonomyWriter.OrdinalMap.
Note that the taxonomies to be added are given as Directory objects, not opened TaxonomyReader/TaxonomyWriter objects, so if any of them are currently managed by an open TaxonomyWriter, make sure to commit() (or close()) it first. The main taxonomy (this) is an open TaxonomyWriter, and does not need to be commit()ed before this call.
IOException
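The old-to-new ordinal mapping produced by a merge can be sketched in a few lines. This is a simplified model, not the addTaxonomies() implementation: `OrdinalMergeSketch` is hypothetical, each taxonomy is represented as an array of full path strings indexed by ordinal, and the added taxonomy is assumed to list ancestors before descendants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of merging one taxonomy into a main taxonomy while
// recording how the added taxonomy's ordinals map to the main ones.
final class OrdinalMergeSketch {
    private final Map<String, Integer> ordinals = new HashMap<>();

    // Initial paths of the main taxonomy, given in ordinal order.
    OrdinalMergeSketch(String... pathsByOrdinal) {
        for (String path : pathsByOrdinal) {
            ordinals.putIfAbsent(path, ordinals.size());
        }
    }

    // Adds the other taxonomy's categories and returns oldToNew, where
    // oldToNew[oldOrdinal] is the ordinal in the enlarged main taxonomy.
    int[] addTaxonomy(String... otherPathsByOrdinal) {
        int[] oldToNew = new int[otherPathsByOrdinal.length];
        for (int oldOrdinal = 0; oldOrdinal < otherPathsByOrdinal.length; oldOrdinal++) {
            String path = otherPathsByOrdinal[oldOrdinal];
            Integer existing = ordinals.get(path);
            if (existing == null) {
                existing = ordinals.size(); // new category: next free ordinal
                ordinals.put(path, existing);
            }
            oldToNew[oldOrdinal] = existing;
        }
        return oldToNew;
    }

    int size() {
        return ordinals.size();
    }
}
```

An index that was built against the added taxonomy would then need its stored ordinals rewritten through the returned array, which is the role the OrdinalMap objects play in the description above.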
public void rollback() throws IOException
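Rolls back changes to the taxonomy writer and closes the instance. After this call the instance is unusable (calling any of its methods yields an AlreadyClosedException).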
rollback
in interface TwoPhaseCommit
IOException