org.apache.lucene.util.fst
Class FST<T>

java.lang.Object
  extended by org.apache.lucene.util.fst.FST<T>

public final class FST<T>
extends Object

Represents an finite state machine (FST), using a compact byte[] format.

The format is similar to what's used by Morfologik (http://sourceforge.net/projects/morfologik).

See the package documentation for some simple examples.

NOTE: the FST cannot be larger than ~2.1 GB because it uses int to address the byte[].

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
static class FST.Arc<T>
          Represents a single arc.
static class FST.BytesReader
          Reads the bytes from this FST.
static class FST.INPUT_TYPE
          Specifies allowed range of each int input label for this FST.
 
Field Summary
 int arcCount
           
 int arcWithOutputCount
           
static int END_LABEL
          If arc has this label then that arc is final/accepted
 FST.INPUT_TYPE inputType
           
 int nodeCount
           
 Outputs<T> outputs
           
 
Constructor Summary
FST(DataInput in, Outputs<T> outputs)
          Load a previously saved FST.
 
Method Summary
 FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
          Finds an arc leaving the incoming arc, replacing the arc in place.
 int getArcCount()
           
 int getArcWithOutputCount()
           
 FST.BytesReader getBytesReader(int pos)
           
 T getEmptyOutput()
           
 FST.Arc<T> getFirstArc(FST.Arc<T> arc)
          Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
 FST.INPUT_TYPE getInputType()
           
 int getNodeCount()
           
 FST<T> pack(int minInCountDeref, int maxDerefNodes, float acceptableOverheadRatio)
          Expert: creates an FST by packing this one.
static
<T> FST<T>
read(File file, Outputs<T> outputs)
          Reads an automaton from a file.
 FST.Arc<T> readFirstRealTargetArc(int node, FST.Arc<T> arc, FST.BytesReader in)
           
 FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
          Follow the follow arc and read the first arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
 FST.Arc<T> readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in)
          Follows the follow arc and reads the last arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
 FST.Arc<T> readNextArc(FST.Arc<T> arc, FST.BytesReader in)
          In-place read; returns the arc.
 int readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in)
          Peeks at next arc's label; does not alter arc.
 FST.Arc<T> readNextRealArc(FST.Arc<T> arc, FST.BytesReader in)
          Never returns null, but you should never call this if arc.isLast() is true.
 void save(DataOutput out)
           
 void save(File file)
          Writes an automaton to a file.
 void setAllowArrayArcs(boolean v)
           
 int sizeInBytes()
          Returns bytes used to represent the FST
static
<T> boolean
targetHasArcs(FST.Arc<T> arc)
          returns true if the node at this address has any outgoing arcs
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inputType

public final FST.INPUT_TYPE inputType

outputs

public final Outputs<T> outputs

nodeCount

public int nodeCount

arcCount

public int arcCount

arcWithOutputCount

public int arcWithOutputCount

END_LABEL

public static final int END_LABEL
If arc has this label then that arc is final/accepted

See Also:
Constant Field Values
Constructor Detail

FST

public FST(DataInput in,
           Outputs<T> outputs)
    throws IOException
Load a previously saved FST.

Throws:
IOException
Method Detail

getInputType

public FST.INPUT_TYPE getInputType()

sizeInBytes

public int sizeInBytes()
Returns bytes used to represent the FST


getEmptyOutput

public T getEmptyOutput()

save

public void save(DataOutput out)
          throws IOException
Throws:
IOException

save

public void save(File file)
          throws IOException
Writes an automaton to a file.

Throws:
IOException

read

public static <T> FST<T> read(File file,
                              Outputs<T> outputs)
                   throws IOException
Reads an automaton from a file.

Throws:
IOException

targetHasArcs

public static <T> boolean targetHasArcs(FST.Arc<T> arc)
returns true if the node at this address has any outgoing arcs


getFirstArc

public FST.Arc<T> getFirstArc(FST.Arc<T> arc)
Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node


readLastTargetArc

public FST.Arc<T> readLastTargetArc(FST.Arc<T> follow,
                                    FST.Arc<T> arc,
                                    FST.BytesReader in)
                             throws IOException
Follows the follow arc and reads the last arc of its target; this changes the provided arc (2nd arg) in-place and returns it.

Returns:
Returns the second argument (arc).
Throws:
IOException

readFirstTargetArc

public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow,
                                     FST.Arc<T> arc,
                                     FST.BytesReader in)
                              throws IOException
Follow the follow arc and read the first arc of its target; this changes the provided arc (2nd arg) in-place and returns it.

Returns:
Returns the second argument (arc).
Throws:
IOException

readFirstRealTargetArc

public FST.Arc<T> readFirstRealTargetArc(int node,
                                         FST.Arc<T> arc,
                                         FST.BytesReader in)
                                  throws IOException
Throws:
IOException

readNextArc

public FST.Arc<T> readNextArc(FST.Arc<T> arc,
                              FST.BytesReader in)
                       throws IOException
In-place read; returns the arc.

Throws:
IOException

readNextArcLabel

public int readNextArcLabel(FST.Arc<T> arc,
                            FST.BytesReader in)
                     throws IOException
Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!

Throws:
IOException

readNextRealArc

public FST.Arc<T> readNextRealArc(FST.Arc<T> arc,
                                  FST.BytesReader in)
                           throws IOException
Never returns null, but you should never call this if arc.isLast() is true.

Throws:
IOException

findTargetArc

public FST.Arc<T> findTargetArc(int labelToMatch,
                                FST.Arc<T> follow,
                                FST.Arc<T> arc,
                                FST.BytesReader in)
                         throws IOException
Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.

Throws:
IOException

getNodeCount

public int getNodeCount()

getArcCount

public int getArcCount()

getArcWithOutputCount

public int getArcWithOutputCount()

setAllowArrayArcs

public void setAllowArrayArcs(boolean v)

getBytesReader

public FST.BytesReader getBytesReader(int pos)

pack

public FST<T> pack(int minInCountDeref,
                   int maxDerefNodes,
                   float acceptableOverheadRatio)
            throws IOException
Expert: creates an FST by packing this one. This process requires substantial additional RAM (currently up to ~8 bytes per node depending on acceptableOverheadRatio), but then should produce a smaller FST.

The implementation of this method uses ideas from Smaller Representation of Finite State Automata, which describes techniques to reduce the size of a FST. However, this is not a strict implementation of the algorithms described in this paper.

Throws:
IOException


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.