Package org.apache.lucene.util.fst
Class Util
java.lang.Object
org.apache.lucene.util.fst.Util
Static helper methods.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
Represents a path in TopNSearcher.static final class
Holds a single input (IntsRef) + output, returned byshortestPaths()
.static class
Utility class to find top N shortest paths from start point(s). -
Method Summary
Modifier and TypeMethodDescriptionstatic <T> T
Looks up the output for this input, or null if the input is not acceptedstatic <T> T
Looks up the output for this input, or null if the input is not accepted.static IntsRef
getByOutput
(FST<Long> fst, long targetOutput) Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending.static IntsRef
getByOutput
(FST<Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<Long> arc, FST.Arc<Long> scratchArc, IntsRef result) Expert: likegetByOutput(FST, long)
except reusing BytesReader, initial and scratch Arc, and result.static <T> FST.Arc<T>
readCeilArc
(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise returnnull
.static <T> Util.MinResult<T>[]
shortestPaths
(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString) Starting from node, find the top N min cost completions to a final node.static BytesRef
toBytesRef
(IntsRef input, BytesRef scratch) Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte.static <T> void
Dumps anFST
to a GraphViz'sdot
language description for visualization.static IntsRef
Just takes unsigned byte values from the BytesRef and converts into an IntsRef.static IntsRef
toUTF16
(CharSequence s, IntsRef scratch) Just maps each UTF16 unit (char) to the ints in an IntsRef.static IntsRef
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it.static IntsRef
toUTF32
(CharSequence s, IntsRef scratch) Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it.
-
Method Details
-
get
Looks up the output for this input, or null if the input is not accepted.- Throws:
IOException
-
get
Looks up the output for this input, or null if the input is not accepted- Throws:
IOException
-
getByOutput
Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist.NOTE: this only works with
FST<Long>
, only works when the outputs are ascending in order with the inputs. For example, simple ordinals (0, 1, 2, ...), or file offets (when appending to a file) fit this.- Throws:
IOException
-
getByOutput
public static IntsRef getByOutput(FST<Long> fst, long targetOutput, FST.BytesReader in, FST.Arc<Long> arc, FST.Arc<Long> scratchArc, IntsRef result) throws IOException Expert: likegetByOutput(FST, long)
except reusing BytesReader, initial and scratch Arc, and result.- Throws:
IOException
-
shortestPaths
public static <T> Util.MinResult<T>[] shortestPaths(FST<T> fst, FST.Arc<T> fromNode, T startOutput, Comparator<T> comparator, int topN, boolean allowEmptyString) throws IOException Starting from node, find the top N min cost completions to a final node.- Throws:
IOException
-
toDot
public static <T> void toDot(FST<T> fst, Writer out, boolean sameRank, boolean labelStates) throws IOException Dumps anFST
to a GraphViz'sdot
language description for visualization. Example of use:PrintWriter pw = new PrintWriter("out.dot"); Util.toDot(fst, pw, true, true); pw.close();
and then, from command line:dot -Tpng -o out.png out.dot
Note: larger FSTs (a few thousand nodes) won't even render, don't bother. If the FST is > 2.1 GB in size then this method will throw strange exceptions.
- Parameters:
sameRank
- Iftrue
, the resultingdot
file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer.labelStates
- Iftrue
states will have labels equal to their offsets in their binary format. Expands the graph considerably.- Throws:
IOException
- See Also:
-
- "http://www.graphviz.org/"
-
toUTF16
Just maps each UTF16 unit (char) to the ints in an IntsRef. -
toUTF32
Decodes the Unicode codepoints from the provided CharSequence and places them in the provided scratch IntsRef, which must not be null, returning it. -
toUTF32
Decodes the Unicode codepoints from the provided char[] and places them in the provided scratch IntsRef, which must not be null, returning it. -
toIntsRef
Just takes unsigned byte values from the BytesRef and converts into an IntsRef. -
toBytesRef
Just converts IntsRef to BytesRef; you must ensure the int values fit into a byte. -
readCeilArc
public static <T> FST.Arc<T> readCeilArc(int label, FST<T> fst, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise returnnull
.- Parameters:
label
- the label to ceil onfst
- the fst to operate onfollow
- the arc to follow reading the label fromarc
- the arc to read into in placein
- the fst'sFST.BytesReader
- Throws:
IOException
-