Package org.apache.lucene.demo.knn
Class KnnVectorDict
java.lang.Object
  org.apache.lucene.demo.knn.KnnVectorDict

All Implemented Interfaces:
Closeable, AutoCloseable
public class KnnVectorDict extends Object implements Closeable
Manages a map from token to numeric vector for use with KnnVector indexing and search. The map is stored as an FST: token-to-ordinal plus a dense binary file holding the vectors.
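The two-part layout described above can be pictured with a JDK-only sketch. This is not Lucene's implementation: a sorted TreeMap stands in for the FST, an in-memory ByteBuffer stands in for the binary vector file, and the class name VectorDictLayoutSketch, the zero-based ordinals, and the little-endian byte order are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Lucene. */
final class VectorDictLayoutSketch {
  // Stand-in for the FST: maps each token to its ordinal.
  private final TreeMap<String, Integer> ordinals = new TreeMap<>();
  // Stand-in for the dense binary file: fixed-size records, one per ordinal.
  private final ByteBuffer vectors;
  private final int dimension;

  VectorDictLayoutSketch(Map<String, float[]> entries, int dimension) {
    this.dimension = dimension;
    // The byte order is an assumption of this sketch.
    this.vectors =
        ByteBuffer.allocate(entries.size() * dimension * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
    // Iterating in sorted token order assigns ordinals 0, 1, 2, ...
    for (Map.Entry<String, float[]> e : new TreeMap<>(entries).entrySet()) {
      if (e.getValue().length != dimension) {
        throw new IllegalArgumentException("wrong dimension for token " + e.getKey());
      }
      ordinals.put(e.getKey(), ordinals.size());
      for (float f : e.getValue()) {
        vectors.putFloat(f);
      }
    }
  }

  /** Returns a copy of the token's vector, or all zeros if the token is absent. */
  float[] lookup(String token) {
    float[] out = new float[dimension];
    Integer ord = ordinals.get(token);
    if (ord != null) {
      // Dense layout: the record for ordinal n starts at n * dimension * Float.BYTES.
      int base = ord * dimension * Float.BYTES;
      for (int i = 0; i < dimension; i++) {
        out[i] = vectors.getFloat(base + i * Float.BYTES);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, float[]> entries = new TreeMap<>();
    entries.put("cat", new float[] {0.1f, 0.2f});
    entries.put("dog", new float[] {0.3f, 0.4f});
    VectorDictLayoutSketch dict = new VectorDictLayoutSketch(entries, 2);
    System.out.println(Arrays.toString(dict.lookup("dog")));
    System.out.println(Arrays.toString(dict.lookup("emu"))); // absent token
  }
}
```

Because every record has the same size, the position of a vector follows from its ordinal by multiplication alone, which is why a token-to-ordinal map is enough to locate it.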
-
-
Constructor Summary
Constructors
KnnVectorDict(Directory directory, String dictName)
    Sole constructor.
-
Method Summary
Modifier and Type, Method, Description
static void build(Path gloveInput, Directory directory, String dictName)
    Convert from a GloVe-formatted dictionary file to a KnnVectorDict file pair.
void close()
void get(BytesRef token, byte[] output)
    Get the vector corresponding to the given token.
int getDimension()
    Get the dimension of the vectors returned by this dictionary.
long ramBytesUsed()
    Return the size of the dictionary in bytes.
-
-
-
Constructor Detail
-
KnnVectorDict
public KnnVectorDict(Directory directory, String dictName) throws IOException
Sole constructor.
Parameters:
directory - Lucene directory from which the knn dictionary should be read.
dictName - the base name of the directory files that store the knn vector dictionary. A file with extension '.bin' holds the vectors, and a file with extension '.fst' maps tokens to offsets in the '.bin' file.
Throws:
IOException
-
-
Method Detail
-
get
public void get(BytesRef token, byte[] output) throws IOException
Get the vector corresponding to the given token. NOTE: the returned array is shared and its contents will be overwritten by subsequent calls. The caller is responsible for copying the data as needed.
Parameters:
token - the token to look up
output - the array in which to write the corresponding vector. Its length must be getDimension() * Float.BYTES. It will be filled with zeros if the token is not present in the dictionary.
Throws:
IllegalArgumentException - if the output array is incorrectly sized
IOException - if there is a problem reading the dictionary
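Since get writes raw bytes rather than floats, a caller has to size the buffer and decode it. The following JDK-only sketch shows one way to do both. The class VectorBytesSketch and its methods are hypothetical helpers, not Lucene API, and the byte order is passed in explicitly because this page does not state it; check the Lucene version in use before relying on a particular order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical helper; not part of Lucene. */
final class VectorBytesSketch {
  /** Allocates a buffer of the documented size: dimension * Float.BYTES. */
  static byte[] newOutput(int dimension) {
    return new byte[dimension * Float.BYTES];
  }

  /** Decodes raw vector bytes into floats using the given (assumed) byte order. */
  static float[] decode(byte[] bytes, ByteOrder order) {
    if (bytes.length % Float.BYTES != 0) {
      throw new IllegalArgumentException("length must be a multiple of " + Float.BYTES);
    }
    float[] out = new float[bytes.length / Float.BYTES];
    ByteBuffer.wrap(bytes).order(order).asFloatBuffer().get(out);
    return out;
  }

  /** True if every byte is zero, which is how a missing token is reported. */
  static boolean isAllZero(byte[] bytes) {
    for (byte b : bytes) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Simulate what a lookup might have written into a 2-dimensional output buffer.
    byte[] output = newOutput(2);
    ByteBuffer.wrap(output).order(ByteOrder.LITTLE_ENDIAN).putFloat(1.5f).putFloat(-2f);
    System.out.println(Arrays.toString(decode(output, ByteOrder.LITTLE_ENDIAN)));
    System.out.println(isAllZero(newOutput(2)));
  }
}
```

An all-zero buffer decodes to zeros under either byte order, so the missing-token check does not depend on the byte-order assumption.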
-
getDimension
public int getDimension()
Get the dimension of the vectors returned by this dictionary.
Returns:
the vector dimension
-
close
public void close() throws IOException
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Throws:
IOException
-
build
public static void build(Path gloveInput, Directory directory, String dictName) throws IOException
Convert from a GloVe-formatted dictionary file to a KnnVectorDict file pair.
Parameters:
gloveInput - the path to the input dictionary. The dictionary is delimited by newlines, and each line is space-delimited. The first column has the token, and the remaining columns are the vector components, as text. The dictionary must be sorted by its leading tokens (considered as bytes).
directory - a Lucene directory to write the dictionary to.
dictName - base name for the knn dictionary files.
Throws:
IOException
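The input format described for gloveInput can be checked before calling build. The JDK-only sketch below splits a line into its token and vector components and tests the sort requirement. GloveInputSketch and its methods are hypothetical names, and two details are assumptions: tokens are encoded as UTF-8, and "considered as bytes" is read as unsigned byte-wise comparison. Whether duplicate tokens are accepted is not stated, so the check only rejects out-of-order neighbors.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight checks for a GloVe-style text file; not part of Lucene. */
final class GloveInputSketch {
  /** Returns the first space-delimited column of a line. */
  static String token(String line) {
    int sp = line.indexOf(' ');
    return sp < 0 ? line : line.substring(0, sp);
  }

  /** Parses the remaining columns of a line as vector components. */
  static float[] vector(String line) {
    String[] fields = line.split(" ");
    float[] v = new float[fields.length - 1];
    for (int i = 1; i < fields.length; i++) {
      v[i - 1] = Float.parseFloat(fields[i]);
    }
    return v;
  }

  /** True if tokens are in non-decreasing order under unsigned byte comparison. */
  static boolean isSortedByTokenBytes(List<String> lines) {
    byte[] prev = null;
    for (String line : lines) {
      // UTF-8 is an assumption of this sketch.
      byte[] cur = token(line).getBytes(StandardCharsets.UTF_8);
      if (prev != null && Arrays.compareUnsigned(prev, cur) > 0) {
        return false;
      }
      prev = cur;
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("Zebra 0.1 0.2", "apple 0.3 0.4");
    System.out.println(token(lines.get(0)));
    System.out.println(Arrays.toString(vector(lines.get(1))));
    // Byte order puts uppercase ASCII before lowercase, unlike dictionary order.
    System.out.println(isSortedByTokenBytes(lines));
  }
}
```

Byte-wise order differs from locale-aware or case-insensitive order, so a file sorted by a typical text tool with locale settings may still fail this check.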
-
ramBytesUsed
public long ramBytesUsed()
Return the size of the dictionary in bytes.