org.apache.lucene.index
Class BalancedSegmentMergePolicy

java.lang.Object
  extended by org.apache.lucene.index.MergePolicy
      extended by org.apache.lucene.index.LogMergePolicy
          extended by org.apache.lucene.index.LogByteSizeMergePolicy
              extended by org.apache.lucene.index.BalancedSegmentMergePolicy
All Implemented Interfaces:
Closeable

public class BalancedSegmentMergePolicy
extends LogByteSizeMergePolicy

Merge policy that tries to balance avoiding large segment merges against accumulating too many segments in the index, to provide better performance in near-real-time settings.

This is based on code from zoie, described in more detail at http://code.google.com/p/zoie/wiki/ZoieMergePolicy.


Nested Class Summary
static class BalancedSegmentMergePolicy.MergePolicyParams
          Specifies configuration parameters for BalancedSegmentMergePolicy.
 
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.MergeAbortedException, MergePolicy.MergeException, MergePolicy.MergeSpecification, MergePolicy.OneMerge
 
Field Summary
static int DEFAULT_NUM_LARGE_SEGMENTS
           
 
Fields inherited from class org.apache.lucene.index.LogByteSizeMergePolicy
DEFAULT_MAX_MERGE_MB, DEFAULT_MAX_MERGE_MB_FOR_FORCED_MERGE, DEFAULT_MIN_MERGE_MB
 
Fields inherited from class org.apache.lucene.index.LogMergePolicy
calibrateSizeByDeletes, DEFAULT_MAX_MERGE_DOCS, DEFAULT_MERGE_FACTOR, DEFAULT_NO_CFS_RATIO, LEVEL_LOG_SPAN, maxMergeDocs, maxMergeSize, maxMergeSizeForForcedMerge, mergeFactor, minMergeSize, noCFSRatio, useCompoundFile
 
Fields inherited from class org.apache.lucene.index.MergePolicy
writer
 
Constructor Summary
BalancedSegmentMergePolicy()
           
 
Method Summary
 MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos infos)
          Finds merges necessary to force-merge all deletes from the index.
 MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentInfo,Boolean> segmentsToMerge)
          Returns the merges necessary to merge the index down to a specified number of segments.
 MergePolicy.MergeSpecification findMerges(SegmentInfos infos)
          Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so.
 int getMaxSmallSegments()
           
 int getNumLargeSegments()
           
 boolean getPartialExpunge()
           
 void setMaxSmallSegments(int maxSmallSegments)
           
 void setMergeFactor(int mergeFactor)
          Determines how often segment indices are merged by addDocument().
 void setMergePolicyParams(BalancedSegmentMergePolicy.MergePolicyParams params)
           
 void setNumLargeSegments(int numLargeSegments)
           
 void setPartialExpunge(boolean doPartialExpunge)
           
protected  long size(SegmentInfo info)
           
 
Methods inherited from class org.apache.lucene.index.LogByteSizeMergePolicy
getMaxMergeMB, getMaxMergeMBForForcedMerge, getMaxMergeMBForOptimize, getMinMergeMB, setMaxMergeMB, setMaxMergeMBForForcedMerge, setMaxMergeMBForOptimize, setMinMergeMB
 
Methods inherited from class org.apache.lucene.index.LogMergePolicy
close, getCalibrateSizeByDeletes, getMaxMergeDocs, getMergeFactor, getNoCFSRatio, getUseCompoundFile, isMerged, isMerged, message, setCalibrateSizeByDeletes, setMaxMergeDocs, setNoCFSRatio, setUseCompoundFile, sizeBytes, sizeDocs, toString, useCompoundFile, verbose
 
Methods inherited from class org.apache.lucene.index.MergePolicy
setIndexWriter
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_NUM_LARGE_SEGMENTS

public static final int DEFAULT_NUM_LARGE_SEGMENTS
See Also:
Constant Field Values
Constructor Detail

BalancedSegmentMergePolicy

public BalancedSegmentMergePolicy()
Method Detail

setMergePolicyParams

public void setMergePolicyParams(BalancedSegmentMergePolicy.MergePolicyParams params)

size

protected long size(SegmentInfo info)
             throws IOException
Overrides:
size in class LogByteSizeMergePolicy
Throws:
IOException

setPartialExpunge

public void setPartialExpunge(boolean doPartialExpunge)

getPartialExpunge

public boolean getPartialExpunge()

setNumLargeSegments

public void setNumLargeSegments(int numLargeSegments)

getNumLargeSegments

public int getNumLargeSegments()

setMaxSmallSegments

public void setMaxSmallSegments(int maxSmallSegments)

getMaxSmallSegments

public int getMaxSmallSegments()

setMergeFactor

public void setMergeFactor(int mergeFactor)
Description copied from class: LogMergePolicy
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches are faster, but indexing is slower. With larger values, more RAM is used during indexing, and while searches are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Overrides:
setMergeFactor in class LogMergePolicy

findForcedMerges

public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos,
                                                       int maxNumSegments,
                                                       Map<SegmentInfo,Boolean> segmentsToMerge)
                                                throws IOException
Description copied from class: LogMergePolicy
Returns the merges necessary to merge the index down to a specified number of segments. This respects the LogMergePolicy.maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has neither pending deletions nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the MergeScheduler in use may make use of concurrency.

Overrides:
findForcedMerges in class LogMergePolicy
Parameters:
infos - the total set of segments in the index
maxNumSegments - requested maximum number of segments in the index (currently this is always 1)
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
Throws:
IOException
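
The "mergeFactor at a time" batching described above can be illustrated with a small standalone model. This is only a sketch and not the Lucene implementation: the class ForcedMergeBatchSketch and the method firstRound are hypothetical names, segments are represented by their positions only, deletions and size limits are ignored, and taking the groups front to back is an assumption. It plans a single round; the merged outputs would need further (cascaded) rounds to reach maxNumSegments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model; it does not use any Lucene classes.
class ForcedMergeBatchSketch {

    // Plans one round of merges over segments 0..segmentCount-1.
    // Each int[] is a half-open range {start, end} of adjacent segments.
    static List<int[]> firstRound(int segmentCount, int mergeFactor) {
        if (segmentCount < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("segmentCount >= 0 and mergeFactor >= 2 required");
        }
        List<int[]> merges = new ArrayList<>();
        for (int start = 0; start < segmentCount; start += mergeFactor) {
            int end = Math.min(start + mergeFactor, segmentCount);
            // In this simplified model a lone trailing segment is left alone.
            if (end - start > 1) {
                merges.add(new int[] {start, end});
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        for (int[] range : firstRound(7, 3)) {
            System.out.println("merge segments [" + range[0] + ", " + range[1] + ")");
        }
    }
}
```

Because the planned ranges do not overlap, a scheduler could in principle run them concurrently, which is the point the description makes about the MergeScheduler.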

findForcedDeletesMerges

public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos infos)
                                                       throws CorruptIndexException,
                                                              IOException
Description copied from class: LogMergePolicy
Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.

Overrides:
findForcedDeletesMerges in class LogMergePolicy
Parameters:
infos - the total set of segments in the index
Throws:
CorruptIndexException
IOException
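
The rule "merge adjacent segments that have deletes, up to mergeFactor at a time" can be sketched as a standalone model. The class ForcedDeletesSketch and the method plan are hypothetical names, and a boolean array stands in for the per-segment "has deletions" check; the real method works on SegmentInfos and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model; it does not use any Lucene classes.
class ForcedDeletesSketch {

    // hasDeletes[i] says whether segment i carries deletions.
    // Each int[] is a half-open range {start, end} of adjacent segments to merge.
    static List<int[]> plan(boolean[] hasDeletes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<int[]> merges = new ArrayList<>();
        int runStart = -1; // start of the current run of segments with deletes
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (runStart == -1) {
                    runStart = i;
                } else if (i - runStart == mergeFactor) {
                    // The run already holds mergeFactor segments: emit it, start a new one.
                    merges.add(new int[] {runStart, i});
                    runStart = i;
                }
            } else if (runStart != -1) {
                // A segment without deletes ends the current run.
                merges.add(new int[] {runStart, i});
                runStart = -1;
            }
        }
        if (runStart != -1) {
            merges.add(new int[] {runStart, hasDeletes.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, true, true, false, true, false, false, true, true};
        for (int[] range : plan(flags, 2)) {
            System.out.println("merge segments [" + range[0] + ", " + range[1] + ")");
        }
    }
}
```

Note that a run may contain a single segment; in this model that still counts as a merge, since rewriting the segment is what removes its deletions.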

findMerges

public MergePolicy.MergeSpecification findMerges(SegmentInfos infos)
                                          throws IOException
Description copied from class: LogMergePolicy
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than LogMergePolicy.setMergeFactor(int) segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.

Overrides:
findMerges in class LogMergePolicy
Parameters:
infos - the total set of segments in the index
Throws:
IOException
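
The level-based trigger described for findMerges can be illustrated with a simplified standalone model. The class LevelMergeSketch and its methods are hypothetical, and the level of a segment is assumed here to be the number of times its size can be divided by mergeFactor; the real policy computes levels differently and only considers adjacent segments. The threshold follows the wording of the description above ("more than mergeFactor segments"), not the Lucene source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone model; it does not use any Lucene classes.
class LevelMergeSketch {

    // Assumed level function: integer logarithm of the size, base mergeFactor.
    static int levelOf(long sizeBytes, int mergeFactor) {
        int level = 0;
        while (sizeBytes >= mergeFactor) {
            sizeBytes /= mergeFactor;
            level++;
        }
        return level;
    }

    // Returns, in ascending order, the levels that hold more than mergeFactor segments.
    static List<Integer> overfullLevels(long[] segmentSizes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        Map<Integer, Integer> countByLevel = new TreeMap<>();
        for (long size : segmentSizes) {
            countByLevel.merge(levelOf(size, mergeFactor), 1, Integer::sum);
        }
        List<Integer> overfull = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : countByLevel.entrySet()) {
            if (entry.getValue() > mergeFactor) {
                overfull.add(entry.getKey());
            }
        }
        return overfull;
    }

    public static void main(String[] args) {
        long[] sizes = {5, 6, 7, 8, 50, 60, 900};
        System.out.println("levels needing a merge: " + overfullLevels(sizes, 3));
    }
}
```

When several levels are overfull, the returned list has several entries, which corresponds to the case where the method returns multiple merges for the MergeScheduler to run concurrently.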