Vector indexing

FalkorDB’s vector index uses the HNSW (Hierarchical Navigable Small World) algorithm with cosine similarity or Euclidean distance, supporting 1–4096-dimensional vectors.

The vector data type comes with a dedicated index type: the vector index, which indexes vectors and searches through them by similarity.

To create a vector index, use the following syntax:

CREATE VECTOR INDEX FOR <entity_pattern> ON <entity_attribute> OPTIONS <options>

The options are:

{
   dimension: INT, // Required, length of the vector to be indexed
   similarityFunction: STRING, // Required, currently only euclidean or cosine are allowed
   M: INT, // Optional, maximum number of outgoing edges per node. default 16
   efConstruction: INT, // Optional, number of candidates during construction. default 200
   efRuntime: INT // Optional, number of candidates during search. default 10
}
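Because the required and optional keys are easy to mistype, the statement can be assembled programmatically. The sketch below is a hypothetical helper (it is not part of any FalkorDB client library) that builds the statement and rejects values outside the limits stated on this page: a dimension of 1 to 4096 and a similarity function of 'euclidean' or 'cosine'. Optional HNSW keys are emitted only when set, so the server defaults apply otherwise.

```python
# Hypothetical helper, not a FalkorDB API: builds a CREATE VECTOR INDEX statement.
# Validation mirrors the limits documented on this page.

SIMILARITY_FUNCTIONS = ("euclidean", "cosine")


def create_vector_index_query(pattern, attribute, dimension, similarity_function,
                              m=None, ef_construction=None, ef_runtime=None):
    """Return a CREATE VECTOR INDEX statement for an entity pattern and attribute."""
    if not isinstance(dimension, int) or not 1 <= dimension <= 4096:
        raise ValueError("dimension must be an integer between 1 and 4096")
    if similarity_function not in SIMILARITY_FUNCTIONS:
        raise ValueError("similarityFunction must be 'euclidean' or 'cosine'")

    options = [f"dimension:{dimension}", f"similarityFunction:'{similarity_function}'"]
    # Optional HNSW settings are added only when given, so the documented
    # defaults (M 16, efConstruction 200, efRuntime 10) apply otherwise.
    for key, value in (("M", m), ("efConstruction", ef_construction), ("efRuntime", ef_runtime)):
        if value is not None:
            options.append(f"{key}:{value}")
    # Doubled braces emit literal { } around the options map.
    return f"CREATE VECTOR INDEX FOR {pattern} ON {attribute} OPTIONS {{{', '.join(options)}}}"


print(create_vector_index_query("(p:Product)", "(p.description)", 128, "euclidean"))
print(create_vector_index_query("()-[e:Call]->()", "(e.summary)", 128, "euclidean"))
```

The first call produces exactly the Product statement used in the examples that follow; the resulting string can be passed to whichever client's query method you use.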

For example, to create a vector index over the description attribute of all Product nodes, use the following syntax:

Python:

graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}")

JavaScript:

await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");

Rust:

graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;

Java:

graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");

Shell:

GRAPH.QUERY DEMO_GRAPH "CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}"

Similarly, to create a vector index over the summary attribute of all Call relationships, use the following syntax:

Python:

graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}")

JavaScript:

await graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");

Rust:

graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;

Java:

graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");

Shell:

GRAPH.QUERY DEMO_GRAPH "CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}"

Important: When creating a vector index, both the vector dimension and the similarity function must be provided. Currently, the only supported similarity functions are ‘euclidean’ and ‘cosine’.

Understanding Vector Index Parameters

Required Parameters

dimension: The length of the vectors to be indexed. Must match the dimensionality of your embeddings (e.g., 128, 384, 768, 1536).
similarityFunction: The distance metric used for similarity search:
- euclidean: Euclidean (L2) distance. Best for embeddings where magnitude matters.
- cosine: Cosine similarity. Best for normalized embeddings where direction matters more than magnitude.

Note: The supported dimension range is 1–4096.

Optional Parameters

These parameters control the HNSW (Hierarchical Navigable Small World) index structure:

M (default: 16): Maximum number of connections per node in the graph
- Higher values improve recall but increase memory usage and build time
- Recommended range: 12-48
- Use 16-32 for most applications
efConstruction (default: 200): Number of candidates evaluated during index construction
- Higher values improve index quality but slow down indexing
- Recommended range: 100-400
- Use 200-300 for balanced quality/speed
efRuntime (default: 10): Number of candidates evaluated during search
- Higher values improve recall but slow down queries
- Can be adjusted per-query for speed/accuracy tradeoffs
- Recommended: Start with 10, increase if recall is insufficient
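To decide whether recall is "insufficient", one common approach is to compare the index's answers for a sample of query vectors against exact neighbors computed by brute force. The sketch below assumes euclidean distance and uses hypothetical helper names; in practice the approximate ids would come from your own db.idx.vector.queryNodes calls, and the toy vectors stand in for your data.

```python
import math


def exact_knn(query, vectors, k):
    """Brute-force k nearest neighbours by euclidean distance (ids, nearest first)."""
    ranked = sorted(vectors, key=lambda vid: math.dist(query, vectors[vid]))
    return ranked[:k]


def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true k nearest neighbours that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k


# Toy data; a real check would use your stored embeddings.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 2.0], "d": [3.0, 3.0]}
truth = exact_knn([0.1, 0.1], vectors, 3)   # ['a', 'b', 'c']
approx = ["a", "c", "d"]                    # hypothetical ANN result with one miss
print(recall_at_k(truth, approx, 3))
```

Here two of the three true neighbors were found, a recall of about 0.67; averaging this over many sample queries before and after raising efRuntime shows whether the extra search cost pays off.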

Inserting vectors

To create a new vector, use the vecf32 function as follows:

Python:

graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})")

JavaScript:

await graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");

Rust:

graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})").execute().await?;

Java:

graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");

Shell:

GRAPH.QUERY DEMO_GRAPH "CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})"

The above query creates a new Product node with a description attribute containing a vector.
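When vectors come from application code, it helps to format them and check their length before building the query, since a dimension mismatch is a common failure. The sketch below uses hypothetical helpers (not a client API). It also shows that vecf32 holds 32-bit floats (the memory formula on this page counts 4 bytes per element), so a value such as 0.82 is stored as the nearest float32 rather than exactly.

```python
import math
import struct


def vecf32_literal(values, dimension=None):
    """Format numbers as a Cypher vecf32([...]) expression, checking length and finiteness."""
    floats = [float(v) for v in values]
    if dimension is not None and len(floats) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(floats)}")
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector elements must be finite numbers")
    return "vecf32([" + ", ".join(repr(v) for v in floats) + "])"


def as_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Doubled braces produce the literal { } of the Cypher property map.
query = f"CREATE (p:Product {{description: {vecf32_literal([2.1, 0.82, 1.3], dimension=3)}}})"
print(query)
print(as_float32(0.82))  # close to, but not exactly, 0.82
```

Passing dimension= with the value used in CREATE VECTOR INDEX turns a silent mismatch into an immediate error on the client side.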

Query vector index

Vector indexes are used to find the vectors most similar to a given query vector, using the index's similarity function as the measure of “distance”.

To query the index, use db.idx.vector.queryNodes to retrieve nodes or db.idx.vector.queryRelationships to retrieve relationships.

CALL db.idx.vector.queryNodes(
    label: STRING,
    attribute: STRING,
    k: INTEGER,
    query: VECTOR
) YIELD node, score

CALL db.idx.vector.queryRelationships(
    relationshipType: STRING,
    attribute: STRING,
    k: INTEGER,
    query: VECTOR
) YIELD relationship, score

To retrieve up to 10 Product nodes whose description vectors are most similar to a given query vector, issue the following procedure call:

Python:

result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node")

JavaScript:

const result = await graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");

Rust:

let result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node").execute().await?;

Java:

ResultSet result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");

Shell:

GRAPH.QUERY DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node"

The procedure can yield both the indexed entity that holds the matching vector and a similarity score for that entity (YIELD node, score for nodes; YIELD relationship, score for relationships).

Deleting a vector index

To remove a vector index, issue the DROP VECTOR INDEX command:

DROP VECTOR INDEX FOR <entity_pattern> ON <entity_attribute>

For example, to drop the vector index over Product description, invoke:

Python:

graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)")

JavaScript:

await graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");

Rust:

graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)").execute().await?;

Java:

graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");

Shell:

GRAPH.QUERY DEMO_GRAPH "DROP VECTOR INDEX FOR (p:Product) ON (p.description)"

Index Management

Listing Vector Indexes

To view all indexes (including vector) in your graph, use:

CALL db.indexes()

Vector indexes are marked with type VECTOR and show the dimension and similarity function in the options field.

Verifying Vector Index Usage

To verify that a vector index is being used, examine the query execution plan:

Python:

# Query using vector index
query_vector = [2.1, 0.82, 1.3]
result = graph.explain(f"CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32({query_vector})) YIELD node RETURN node")
print(result)
# Output shows: ProcedureCall | db.idx.vector.queryNodes

JavaScript:

// Query using vector index
const queryVector = [2.1, 0.82, 1.3];
const result = await graph.explain(`CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([${queryVector}])) YIELD node RETURN node`);
console.log(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes

Rust:

// Query using vector index
let result = graph.explain("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node").execute().await?;
println!("{}", result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes

Java:

// Query using vector index
String result = graph.explain("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node");
System.out.println(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes

Shell:

# Query using vector index
GRAPH.EXPLAIN DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node"
# Output shows: ProcedureCall | db.idx.vector.queryNodes

Performance Tradeoffs and Best Practices

When to Use Vector Indexes

Vector indexes are essential for:

Semantic search: Finding similar items based on meaning, not just keywords
Recommendation systems: Discovering similar products, content, or users
RAG (Retrieval Augmented Generation): Retrieving relevant context for LLMs
Duplicate detection: Finding near-duplicate items based on embeddings
Image/audio similarity: When using vision or audio embedding models

Performance Considerations

Benefits:

Enables efficient approximate nearest neighbor (ANN) search
Scales to millions of vectors with sub-linear query time
Supports both node and relationship vectors

Costs:

Memory usage: Vector indexes are memory-intensive
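- A 1M vector index with 768 dimensions (float32) needs about 3.1 GB for the raw vectors alone (1,000,000 × 768 × 4 bytes), or roughly 3.7 GB once the HNSW overhead is included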
- Formula: vectors × dimensions × 4 bytes + HNSW overhead (~20%)
Build time: Index construction can be slow for large datasets
Approximate results: Returns approximate (not exact) nearest neighbors
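Limited filtering: the vector query procedures take no filter argument, so property filters can only be applied to the k results after the index returns them, which may leave fewer than k matches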
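The memory formula above can be turned into a quick sizing check before building an index. This is a rough sketch: the 20% overhead is this page's approximation, and actual HNSW overhead depends on M and on FalkorDB's internal layout. The function name is hypothetical.

```python
def estimate_index_bytes(num_vectors, dimension, bytes_per_element=4, overhead_percent=20):
    """Rough sizing: (raw float32 vector bytes, raw bytes plus HNSW overhead)."""
    raw = num_vectors * dimension * bytes_per_element
    # Integer arithmetic keeps the estimate exact (no float rounding).
    with_overhead = raw * (100 + overhead_percent) // 100
    return raw, with_overhead


raw, total = estimate_index_bytes(1_000_000, 768)
print(f"raw: {raw / 1e9:.2f} GB, with overhead: {total / 1e9:.2f} GB")
# prints: raw: 3.07 GB, with overhead: 3.69 GB
```

Halving the dimension (for example 768 to 384) halves both figures, which is why lower-dimensional embeddings are listed as a memory lever.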

Recommendations:

Choose appropriate vector dimensions (balance between quality and cost)
Use cosine similarity for normalized embeddings (e.g., from OpenAI, Sentence Transformers)
Use euclidean distance for unnormalized data
Tune M and efConstruction based on your accuracy requirements
Consider batch indexing for large datasets
Monitor memory usage carefully

Similarity Function Tradeoffs

Cosine Similarity:

Best for: Text embeddings, normalized vectors
Measures: Angular distance between vectors
Range: -1 to 1 (1 = identical direction)
Use when: Vector magnitude is not meaningful

Euclidean Distance:

Best for: Unnormalized data, physical measurements
Measures: Straight-line distance between vectors
Range: 0 to ∞ (0 = identical)
Use when: Both direction and magnitude matter
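The difference between the two functions is easiest to see with small vectors. The standalone sketch below (plain Python, not FalkorDB's internal scoring) computes both: two vectors pointing the same way but with different lengths are identical under cosine yet far apart under euclidean distance. For unit-length vectors, squared euclidean distance equals 2 − 2 × cosine similarity, so on normalized embeddings both functions rank neighbors in the same order.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    """Straight-line (L2) distance: 0 = identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale v to unit length so only its direction remains."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


a, b = [3.0, 4.0], [6.0, 8.0]       # same direction, b is twice as long
print(cosine_similarity(a, b))      # 1.0
print(euclidean_distance(a, b))     # 5.0

# On unit vectors the two measures carry the same information.
u, v = normalize([1.0, 2.0, 2.0]), normalize([2.0, 1.0, 2.0])
print(math.isclose(euclidean_distance(u, v) ** 2, 2 - 2 * cosine_similarity(u, v)))  # True
```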

Example: Realistic Vector Search

Python:

# Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}")

# Insert products with embeddings (embeddings would come from your model)
embedding = model.encode("laptop computer")  # Your embedding model
# Doubled braces keep the Cypher property map literal inside the f-string
graph.query(f"CREATE (p:Product {{name: 'Laptop', embedding: vecf32({embedding.tolist()})}})")

# Search for similar products
query_embedding = model.encode("notebook pc")
result = graph.query(f"CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({query_embedding.tolist()})) YIELD node, score RETURN node.name, score ORDER BY score DESC")
for record in result.result_set:
    print(f"Product: {record[0]}, Similarity: {record[1]}")

JavaScript:

// Create vector index for product embeddings
await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");

// Insert products with embeddings (embeddings would come from your model)
const embedding = await model.encode("laptop computer");  // Your embedding model
await graph.query(`CREATE (p:Product {name: 'Laptop', embedding: vecf32([${embedding}])})`);

// Search for similar products
const queryEmbedding = await model.encode("notebook pc");
const result = await graph.query(`CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([${queryEmbedding}])) YIELD node, score RETURN node.name, score ORDER BY score DESC`);
for (const record of result.data) {
    console.log(`Product: ${record['node.name']}, Similarity: ${record['score']}`);
}

Rust:

// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}").execute().await?;

// Insert products with embeddings (embeddings would come from your model)
let embedding = model.encode("laptop computer");  // Your embedding model
// format! needs {{ and }} for the literal braces of the Cypher property map
graph.query(&format!("CREATE (p:Product {{name: 'Laptop', embedding: vecf32({:?})}})", embedding)).execute().await?;

// Search for similar products
let query_embedding = model.encode("notebook pc");
let result = graph.query(&format!("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({:?})) YIELD node, score RETURN node.name, score ORDER BY score DESC", query_embedding)).execute().await?;
for record in result.data {
    // Each record holds the returned values in RETURN order: node.name, score
    println!("Product: {:?}, Similarity: {:?}", record[0], record[1]);
}

Java:

// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");

// Insert products with embeddings (embeddings would come from your model)
float[] embedding = model.encode("laptop computer");  // Your embedding model
graph.query(String.format("CREATE (p:Product {name: 'Laptop', embedding: vecf32(%s)})", Arrays.toString(embedding)));

// Search for similar products
float[] queryEmbedding = model.encode("notebook pc");
ResultSet result = graph.query(String.format("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32(%s)) YIELD node, score RETURN node.name, score ORDER BY score DESC", Arrays.toString(queryEmbedding)));
for (Record record : result) {
    System.out.printf("Product: %s, Similarity: %s%n", record.get("node.name"), record.get("score"));
}

Shell:

# Create vector index for product embeddings
GRAPH.QUERY DEMO_GRAPH "CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}"

# Insert products with embeddings (embeddings would come from your model)
GRAPH.QUERY DEMO_GRAPH "CREATE (p:Product {name: 'Laptop', embedding: vecf32([0.1, 0.2, ...])})"

# Search for similar products
GRAPH.QUERY DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([0.15, 0.18, ...])) YIELD node, score RETURN node.name, score ORDER BY score DESC"

Troubleshooting

Common Issues:

Dimension mismatch: Ensure all vectors have the same dimension as specified in the index
Wrong similarity function: Use cosine for normalized vectors, euclidean for unnormalized
Poor recall: Increase efRuntime or efConstruction parameters
Slow queries: Decrease efRuntime or reduce k (number of results)
High memory usage: Reduce M parameter or use lower-dimensional embeddings

Frequently Asked Questions

What similarity functions are supported for vector indexes?

FalkorDB supports euclidean distance and cosine similarity. Use cosine for normalized vectors (e.g., from embedding models) and euclidean for unnormalized vectors.

How do I query a vector index?

Use CALL db.idx.vector.queryNodes('Label', 'attribute', k, vecf32([...])) where k is the number of nearest neighbors to return and the last argument is your query vector.

What vector dimensions are supported?

Vector indexes support dimensions from 1 to 4096, but all vectors in a given index must have the dimension specified when the index was created. Mismatched dimensions will cause errors.

Can I create vector indexes on relationships?

Yes. Use CALL db.idx.vector.queryRelationships('RelType', 'attribute', k, query_vector) to perform similarity search on relationship vector properties.

How do I improve vector search recall?

Increase the efRuntime parameter for higher recall at the cost of query latency. For build-time quality, increase efConstruction and M parameters when creating the index.
