Vector indexing
FalkorDB’s vector index uses the HNSW (Hierarchical Navigable Small World) algorithm with cosine similarity or Euclidean distance, supporting 1–4096-dimensional vectors.
The vector data type comes with its own index type.
A vector index is a dedicated index for storing vectors and searching through them by similarity.
To create this type of index, use the following syntax:
CREATE VECTOR INDEX FOR <entity_pattern> ON <entity_attribute> OPTIONS <options>
The options are:
{
dimension: INT, // Required, length of the vector to be indexed
similarityFunction: STRING, // Required, currently only euclidean or cosine are allowed
M: INT, // Optional, maximum number of outgoing edges per node. default 16
efConstruction: INT, // Optional, number of candidates during construction. default 200
efRuntime: INT // Optional, number of candidates during search. default 10
}
For example, to create a vector index over the description attribute of all Product nodes,
use the following syntax:
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}")
await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}
Similarly, to create a vector index over the summary attribute of all Call relationships,
use the following syntax:
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}")
await graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}
Important: When creating a vector index, both the vector dimension and similarity function must be provided. Currently, the only supported similarity functions are ‘euclidean’ or ‘cosine’.
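The requirement above can also be checked in client code before the query is sent. The following is a minimal sketch, not part of any FalkorDB client: `build_create_vector_index` is a hypothetical helper that only assembles the Cypher string shown in this section, and the 1-4096 bound is the dimension range stated on this page.

```python
from typing import Optional

ALLOWED_SIMILARITY = {"euclidean", "cosine"}


def build_create_vector_index(
    pattern: str,
    attribute: str,
    dimension: int,
    similarity_function: str,
    m: Optional[int] = None,
    ef_construction: Optional[int] = None,
    ef_runtime: Optional[int] = None,
) -> str:
    """Hypothetical helper: return a CREATE VECTOR INDEX query string."""
    if not 1 <= dimension <= 4096:
        raise ValueError("dimension must be between 1 and 4096")
    if similarity_function not in ALLOWED_SIMILARITY:
        raise ValueError("similarityFunction must be 'euclidean' or 'cosine'")

    options = [f"dimension:{dimension}", f"similarityFunction:'{similarity_function}'"]
    # Optional HNSW parameters are emitted only when explicitly set,
    # so the documented defaults apply otherwise.
    for key, value in (("M", m), ("efConstruction", ef_construction), ("efRuntime", ef_runtime)):
        if value is not None:
            options.append(f"{key}:{int(value)}")

    return f"CREATE VECTOR INDEX FOR {pattern} ON ({attribute}) OPTIONS {{{', '.join(options)}}}"


# Usage (assumes an existing `graph` object from your client):
# graph.query(build_create_vector_index("(p:Product)", "p.description", 128, "euclidean"))
```

Failing early on a missing or unsupported option keeps the error close to the calling code instead of surfacing it as a query failure.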
Understanding Vector Index Parameters
Required Parameters
- dimension: The length of the vectors to be indexed. Must match the dimensionality of your embeddings (e.g., 128, 384, 768, 1536).
- similarityFunction: The distance metric used for similarity search:
  - euclidean: Euclidean distance (L2 norm). Best for embeddings where magnitude matters.
  - cosine: Cosine similarity. Best for normalized embeddings where direction matters more than magnitude.
Note: The supported dimension range is 1–4096.
Optional Parameters
These parameters control the HNSW (Hierarchical Navigable Small World) index structure:
- M (default: 16): Maximum number of connections per node in the graph
  - Higher values improve recall but increase memory usage and build time
  - Recommended range: 12-48
  - Use 16-32 for most applications
- efConstruction (default: 200): Number of candidates evaluated during index construction
  - Higher values improve index quality but slow down indexing
  - Recommended range: 100-400
  - Use 200-300 for balanced quality/speed
- efRuntime (default: 10): Number of candidates evaluated during search
  - Higher values improve recall but slow down queries
  - Can be adjusted per-query for speed/accuracy tradeoffs
  - Recommended: Start with 10, increase if recall is insufficient
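Whether recall is "insufficient" is easiest to judge against an exact baseline. The sketch below is one possible way to do that on a small sample: `exact_top_k` and `recall_at_k` are hypothetical helpers (not FalkorDB APIs), the baseline assumes Euclidean distance, and the approximate IDs are assumed to come from your own vector index query.

```python
import math
from typing import Dict, List, Sequence


def exact_top_k(vectors: Dict[str, Sequence[float]], query: Sequence[float], k: int) -> List[str]:
    """Brute-force Euclidean nearest neighbours, used as ground truth."""
    def distance(vector: Sequence[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(vector, query)))

    # Sort all IDs by distance to the query and keep the k closest.
    return sorted(vectors, key=lambda key: distance(vectors[key]))[:k]


def recall_at_k(approximate_ids: Sequence[str], exact_ids: Sequence[str]) -> float:
    """Fraction of the exact neighbours that the approximate search returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approximate_ids) & set(exact_ids)) / len(exact_ids)
```

If the measured recall on a representative sample of queries is too low, raising efRuntime is the first knob to try, since it does not require rebuilding the index.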
See Also
- Full-text Index — for keyword and text-based search
- Range Index — for numeric and string range queries
Tip: Use a vector index for semantic similarity search, a full-text index for keyword search, and a range index for exact or range-based property lookups.
Inserting vectors
To create a new vector use the vecf32 function as follows:
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})")
await graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})").execute().await?;
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");
CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})
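The above query creates a new Product node with a description attribute containing a vector. The three-element vector is shortened for illustration; a vector stored in an indexed attribute must have the dimension specified for that index.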
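When vectors come from application code, it can help to build the vecf32 expression in one place and check its length there. A minimal sketch follows; `vecf32_literal` is a hypothetical helper name, not a client API, and it only formats the same expression used in the query above.

```python
import math
from typing import Sequence


def vecf32_literal(values: Sequence[float], expected_dimension: int) -> str:
    """Hypothetical helper: format numbers as a vecf32([...]) Cypher expression."""
    if len(values) != expected_dimension:
        raise ValueError(f"expected {expected_dimension} elements, got {len(values)}")
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector elements must be finite numbers")
    return "vecf32([" + ", ".join(repr(v) for v in floats) + "])"


# Usage (assumes an existing `graph` object; doubled braces emit literal { } in an f-string):
# graph.query(f"CREATE (p:Product {{description: {vecf32_literal([2.1, 0.82, 1.3], 3)}}})")
```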
Query vector index
Vector indices are used to find vectors similar to a given query vector, using the index's similarity function as the measure of “distance”.
To query the index use either db.idx.vector.queryNodes for node retrieval or
db.idx.vector.queryRelationships for relationships.
CALL db.idx.vector.queryNodes(
label: STRING,
attribute: STRING,
k: INTEGER,
query: VECTOR
) YIELD node, score
CALL db.idx.vector.queryRelationships(
relationshipType: STRING,
attribute: STRING,
k: INTEGER,
query: VECTOR
) YIELD relationship, score
To retrieve up to 10 Product nodes whose description vectors are most similar to a given query vector,
issue the following procedure call:
result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node")
const result = await graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");
let result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node").execute().await?;
ResultSet result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");
CALL db.idx.vector.queryNodes(
'Product',
'description',
10,
vecf32(<array_of_vector_elements>)
) YIELD node
The procedure can yield both the indexed entity whose vector matched and a similarity score for that entity.
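To retrieve the score together with the entity, add `score` to the YIELD clause, as in the procedure signatures above. The sketch below builds such a call for either procedure; `build_vector_query` is a hypothetical helper name used for illustration only.

```python
from typing import Sequence


def build_vector_query(entity: str, name: str, attribute: str, k: int, query: Sequence[float]) -> str:
    """Hypothetical helper: build a vector query call that yields the entity and its score."""
    procedures = {
        "node": ("db.idx.vector.queryNodes", "node"),
        "relationship": ("db.idx.vector.queryRelationships", "relationship"),
    }
    if entity not in procedures:
        raise ValueError("entity must be 'node' or 'relationship'")
    if k < 1:
        raise ValueError("k must be a positive integer")

    procedure, yielded = procedures[entity]
    vector = ", ".join(repr(float(v)) for v in query)
    return (
        f"CALL {procedure}('{name}', '{attribute}', {k}, vecf32([{vector}])) "
        f"YIELD {yielded}, score RETURN {yielded}, score"
    )


# Usage (assumes an existing `graph` object from your client):
# result = graph.query(build_vector_query("node", "Product", "description", 10, [2.1, 0.82, 1.3]))
```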
Deleting a vector index
To remove a vector index, simply issue the drop index command as follows:
DROP VECTOR INDEX FOR <entity_pattern> ON (<entity_attribute>)
For example, to drop the vector index over Product description, invoke:
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)")
await graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)").execute().await?;
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");
DROP VECTOR INDEX FOR (p:Product) ON (p.description)
Index Management
Listing Vector Indexes
To view all indexes (including vector) in your graph, use:
CALL db.indexes()
Vector indexes are marked with type VECTOR and show the dimension and similarity function in the options field.
Verifying Vector Index Usage
To verify that a vector index is being used, examine the query execution plan:
# Query using vector index
query_vector = [2.1, 0.82, 1.3]
result = graph.explain(f"CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32({query_vector})) YIELD node RETURN node")
print(result)
# Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
const queryVector = [2.1, 0.82, 1.3];
const result = await graph.explain(`CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([${queryVector}])) YIELD node RETURN node`);
console.log(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
let result = graph.explain("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node").execute().await?;
println!("{}", result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
float[] queryVector = {2.1f, 0.82f, 1.3f};
String result = graph.explain("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node");
System.out.println(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
# Query using vector index
GRAPH.EXPLAIN DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node"
# Output shows: ProcedureCall | db.idx.vector.queryNodes
Performance Tradeoffs and Best Practices
When to Use Vector Indexes
Vector indexes are essential for:
- Semantic search: Finding similar items based on meaning, not just keywords
- Recommendation systems: Discovering similar products, content, or users
- RAG (Retrieval Augmented Generation): Retrieving relevant context for LLMs
- Duplicate detection: Finding near-duplicate items based on embeddings
- Image/audio similarity: When using vision or audio embedding models
Performance Considerations
Benefits:
- Enables efficient approximate nearest neighbor (ANN) search
- Scales to millions of vectors with sub-linear query time
- Supports both node and relationship vectors
Costs:
- Memory usage: Vector indexes are memory-intensive
  - Formula: vectors × dimensions × 4 bytes + HNSW overhead (~20%)
  - Example: 1M vectors with 768 dimensions (float32) take ~3.1 GB for the raw vectors, or roughly 3.7 GB including the ~20% overhead
- Build time: Index construction can be slow for large datasets
- Approximate results: Returns approximate (not exact) nearest neighbors
- No support for filtering: Vector queries don’t combine well with property filters
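The memory formula in the list above can be turned into a quick capacity estimate. This is a rough sketch only: `estimate_vector_index_bytes` is a hypothetical helper, and the 20% overhead is the approximate figure quoted on this page rather than a measured value, so actual usage will vary with M and other settings.

```python
def estimate_vector_index_bytes(num_vectors: int, dimensions: int, overhead_percent: int = 20) -> int:
    """Estimate index size: float32 payload plus an assumed HNSW overhead percentage."""
    if num_vectors < 0 or dimensions < 1:
        raise ValueError("num_vectors must be >= 0 and dimensions must be >= 1")
    raw_bytes = num_vectors * dimensions * 4  # 4 bytes per float32 element
    # Integer arithmetic avoids floating-point rounding in the estimate.
    return raw_bytes + raw_bytes * overhead_percent // 100


# 1M vectors x 768 dimensions: 3,072,000,000 bytes raw, 3,686,400,000 bytes with 20% overhead.
estimate = estimate_vector_index_bytes(1_000_000, 768)
print(f"{estimate / 1e9:.2f} GB")
```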
Recommendations:
- Choose appropriate vector dimensions (balance between quality and cost)
- Use cosine similarity for normalized embeddings (e.g., from OpenAI, Sentence Transformers)
- Use euclidean distance for unnormalized data
- Tune M and efConstruction based on your accuracy requirements
- Consider batch indexing for large datasets
- Monitor memory usage carefully
Similarity Function Tradeoffs
Cosine Similarity:
- Best for: Text embeddings, normalized vectors
- Measures: Cosine of the angle between vectors
- Range: -1 to 1 (1 = identical direction)
- Use when: Vector magnitude is not meaningful
Euclidean Distance:
- Best for: Unnormalized data, physical measurements
- Measures: Straight-line distance between vectors
- Range: 0 to ∞ (0 = identical)
- Use when: Both direction and magnitude matter
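The difference between the two functions is easiest to see on a pair of vectors that point the same way but differ in length. The sketch below uses the textbook definitions in plain Python; it illustrates the math only and makes no claim about how the `score` returned by a vector query is scaled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (ignores magnitude)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between a and b (sensitive to magnitude)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the magnitude
print(cosine_similarity(a, b))   # 1.0: identical direction
print(euclidean_distance(a, b))  # 3.0: the magnitude difference still counts
```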
Example: Realistic Vector Search
# Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}")
# Insert products with embeddings (embeddings would come from your model)
embedding = model.encode("laptop computer") # Your embedding model
graph.query(f"CREATE (p:Product {{name: 'Laptop', embedding: vecf32({embedding.tolist()})}})")  # doubled braces emit literal { } in an f-string
# Search for similar products
query_embedding = model.encode("notebook pc")
result = graph.query(f"CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({query_embedding.tolist()})) YIELD node, score RETURN node.name, score ORDER BY score DESC")
for record in result.result_set:
print(f"Product: {record[0]}, Similarity: {record[1]}")
// Create vector index for product embeddings
await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");
// Insert products with embeddings (embeddings would come from your model)
const embedding = await model.encode("laptop computer"); // Your embedding model
await graph.query(`CREATE (p:Product {name: 'Laptop', embedding: vecf32([${embedding}])})`);
// Search for similar products
const queryEmbedding = await model.encode("notebook pc");
const result = await graph.query(`CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([${queryEmbedding}])) YIELD node, score RETURN node.name, score ORDER BY score DESC`);
for (const record of result.data) {
console.log(`Product: ${record['node.name']}, Similarity: ${record['score']}`);
}
// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}").execute().await?;
// Insert products with embeddings (embeddings would come from your model)
let embedding = model.encode("laptop computer"); // Your embedding model
graph.query(&format!("CREATE (p:Product {{name: 'Laptop', embedding: vecf32({:?})}})", embedding)).execute().await?; // doubled braces emit literal { } in format!
// Search for similar products
let query_embedding = model.encode("notebook pc");
let result = graph.query(&format!("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({:?})) YIELD node, score RETURN node.name, score ORDER BY score DESC", query_embedding)).execute().await?;
for record in result.data() {
println!("Product: {}, Similarity: {}", record["node.name"], record["score"]);
}
// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");
// Insert products with embeddings (embeddings would come from your model)
float[] embedding = model.encode("laptop computer"); // Your embedding model
graph.query(String.format("CREATE (p:Product {name: 'Laptop', embedding: vecf32(%s)})", Arrays.toString(embedding)));
// Search for similar products
float[] queryEmbedding = model.encode("notebook pc");
ResultSet result = graph.query(String.format("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32(%s)) YIELD node, score RETURN node.name, score ORDER BY score DESC", Arrays.toString(queryEmbedding)));
for (Record record : result) {
System.out.printf("Product: %s, Similarity: %s%n", record.get("node.name"), record.get("score"));
}
# Create vector index for product embeddings
GRAPH.QUERY DEMO_GRAPH "CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}"
# Insert products with embeddings (embeddings would come from your model)
GRAPH.QUERY DEMO_GRAPH "CREATE (p:Product {name: 'Laptop', embedding: vecf32([0.1, 0.2, ...])})"
# Search for similar products
GRAPH.QUERY DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([0.15, 0.18, ...])) YIELD node, score RETURN node.name, score ORDER BY score DESC"
Troubleshooting
Common Issues:
- Dimension mismatch: Ensure all vectors have the same dimension as specified in the index
- Wrong similarity function: Use cosine for normalized vectors, euclidean for unnormalized
- Poor recall: Increase efRuntime or efConstruction parameters
- Slow queries: Decrease efRuntime or reduce k (number of results)
- High memory usage: Reduce M parameter or use lower-dimensional embeddings
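For the "wrong similarity function" case, a quick check of whether your embeddings are unit-length can inform the choice. The following is a minimal sketch with hypothetical helper names (`is_unit_normalized`, `l2_normalize`); the tolerance is an arbitrary assumption, not a FalkorDB setting.

```python
import math
from typing import List, Sequence


def is_unit_normalized(vector: Sequence[float], tolerance: float = 1e-6) -> bool:
    """Return True when the vector's L2 norm is within `tolerance` of 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tolerance


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]
```

Running `is_unit_normalized` over a sample of stored embeddings before creating the index is a cheap way to confirm which of the two guidelines above applies to your data.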
Frequently Asked Questions
What similarity functions are supported for vector indexes?
FalkorDB supports euclidean distance and cosine similarity. Use cosine for normalized vectors (e.g., from embedding models) and euclidean for unnormalized vectors.
How do I query a vector index?
Use CALL db.idx.vector.queryNodes('Label', 'attribute', k, vecf32([...])) where k is the number of nearest neighbors to return and the last argument is your query vector.
What vector dimensions are supported?
Vector indexes support dimensions from 1 to 4096. All vectors for a given index must have the dimension specified during index creation; mismatched dimensions will cause errors.
Can I create vector indexes on relationships?
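Yes. Create the index with CREATE VECTOR INDEX FOR ()-[e:RelType]->() ON (e.attribute) OPTIONS <options>, then use CALL db.idx.vector.queryRelationships('RelType', 'attribute', k, query_vector) to perform similarity search on relationship vector properties.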
How do I improve vector search recall?
Increase the efRuntime parameter for higher recall at the cost of query latency. For build-time quality, increase efConstruction and M parameters when creating the index.