Vector indexing
With the introduction of the vector data type, a new type of index was introduced. A vector index is a dedicated index for indexing and searching through vectors.
To create this type of index use the following syntax:
CREATE VECTOR INDEX FOR <entity_pattern> ON <entity_attribute> OPTIONS <options>
The options are:
{
dimension: INT, // Required, length of the vector to be indexed
similarityFunction: STRING, // Required, currently only euclidean or cosine are allowed
M: INT, // Optional, maximum number of outgoing edges per node. default 16
efConstruction: INT, // Optional, number of candidates during construction. default 200
efRuntime: INT // Optional, number of candidates during search. default 10
}
For example, to create a vector index over the description attribute of all Product nodes, use the following syntax:
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}")
await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
CREATE VECTOR INDEX FOR (p:Product) ON (p.description) OPTIONS {dimension:128, similarityFunction:'euclidean'}
Similarly, to create a vector index over the summary attribute of all Call relationships, use the following syntax:
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}")
await graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}").execute().await?;
graph.query("CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}");
CREATE VECTOR INDEX FOR ()-[e:Call]->() ON (e.summary) OPTIONS {dimension:128, similarityFunction:'euclidean'}
Important: When creating a vector index, both the vector dimension and similarity function must be provided. Currently, the only supported similarity functions are 'euclidean' and 'cosine'.
Understanding Vector Index Parameters
Required Parameters
- dimension: The length of the vectors to be indexed. Must match the dimensionality of your embeddings (e.g., 128, 384, 768, 1536).
- similarityFunction: The distance metric used for similarity search:
  - euclidean: Euclidean distance (L2 norm). Best for embeddings where magnitude matters.
  - cosine: Cosine similarity. Best for normalized embeddings where direction matters more than magnitude.
Optional Parameters
These parameters control the HNSW (Hierarchical Navigable Small World) index structure; a tuned example follows the list:
- M (default: 16): Maximum number of connections per node in the graph
  - Higher values improve recall but increase memory usage and build time
  - Recommended range: 12-48
  - Use 16-32 for most applications
- efConstruction (default: 200): Number of candidates evaluated during index construction
  - Higher values improve index quality but slow down indexing
  - Recommended range: 100-400
  - Use 200-300 for balanced quality/speed
- efRuntime (default: 10): Number of candidates evaluated during search
  - Higher values improve recall but slow down queries
  - Can be adjusted per-query for speed/accuracy tradeoffs
  - Recommended: Start with 10, increase if recall is insufficient
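For example, a recall-oriented index for 768-dimensional embeddings might raise M, efConstruction, and efRuntime above their defaults. This is a minimal sketch using the Python client shown above; the specific values are illustrative, not prescriptive:

```python
# Quality-oriented vector index: higher M / efConstruction / efRuntime trade
# memory, build time, and query latency for better recall.
# The values below are illustrative; tune them for your dataset and latency budget.
graph.query(
    "CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) "
    "OPTIONS {dimension:768, similarityFunction:'cosine', "
    "M:32, efConstruction:300, efRuntime:40}"
)
```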
Inserting vectors
To create a new vector, use the vecf32 function as follows:
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})")
await graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})").execute().await?;
graph.query("CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})");
CREATE (p: Product {description: vecf32([2.1, 0.82, 1.3])})
The above query creates a new Product node with a description attribute containing a vector.
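When loading many vectors, it is often cleaner to pass the embedding as a query parameter rather than splicing it into the query string. Below is a minimal sketch, assuming the Python client's params argument, that vecf32 accepts a parameterized array, and a hypothetical embed() helper that returns a list of floats:

```python
# A minimal sketch: insert several products, passing each embedding as a parameter.
# Assumes the Python client's `params` argument, that vecf32() accepts a
# parameterized array, and a hypothetical embed() helper returning a list of floats.
descriptions = ["red running shoes", "wireless headphones", "stainless steel kettle"]

for text in descriptions:
    graph.query(
        "CREATE (p:Product {description: vecf32($vec)})",
        params={"vec": embed(text)},
    )
```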
Query vector index
Vector indices are used to search for vectors similar to a given query vector, using the similarity function as a measure of distance.
To query the index use either db.idx.vector.queryNodes for node retrieval or db.idx.vector.queryRelationships for relationships.
CALL db.idx.vector.queryNodes(
label: STRING,
attribute: STRING,
k: INTEGER,
query: VECTOR
) YIELD node, score
CALL db.idx.vector.queryRelationships(
relationshipType: STRING,
attribute: STRING,
k: INTEGER,
query: VECTOR
) YIELD relationship, score
To find up to 10 Product descriptions similar to a given query vector, issue the following procedure call:
result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node")
const result = await graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");
let result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node").execute().await?;
ResultSet result = graph.query("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(<array_of_vector_elements>)) YIELD node");
CALL db.idx.vector.queryNodes(
'Product',
'description',
10,
vecf32(<array_of_vector_elements>)
) YIELD node
The procedure yields both the indexed entity associated with each similar vector and, optionally, that entity's similarity score.
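For example, to return the matched nodes together with their scores, both values can be yielded and ordered. A minimal sketch using the Python client; the 3-element vector is only a placeholder for a vector of the index's dimension:

```python
# A minimal sketch: return the 5 most similar products together with their scores.
# The 3-element literal is a placeholder; use a vector matching the index dimension.
result = graph.query(
    "CALL db.idx.vector.queryNodes('Product', 'description', 5, vecf32([2.1, 0.82, 1.3])) "
    "YIELD node, score "
    "RETURN node, score"
)
for node, score in result.result_set:
    print(node, score)
```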
Deleting a vector index
To remove a vector index, simply issue the drop index command as follows:
DROP VECTOR INDEX FOR <entity_pattern> ON (<entity_attribute>)
For example, to drop the vector index over Product description, invoke:
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)")
await graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)").execute().await?;
graph.query("DROP VECTOR INDEX FOR (p:Product) ON (p.description)");
DROP VECTOR INDEX FOR (p:Product) ON (p.description)
Index Management
Listing Vector Indexes
To view all indexes (including vector indexes) in your graph, use:
CALL db.indexes()
Vector indexes are marked with type VECTOR and show the dimension and similarity function in the options field.
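From the Python client, the same call can be issued directly. This is a minimal sketch; the exact columns returned by db.indexes() may vary between versions, so the rows are printed as-is:

```python
# A minimal sketch: list all indexes and print each row.
# The column layout of db.indexes() may vary between versions.
result = graph.query("CALL db.indexes()")
for row in result.result_set:
    print(row)
```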
Verifying Vector Index Usage
To verify that a vector index is being used, examine the query execution plan:
# Query using vector index
query_vector = [2.1, 0.82, 1.3]
result = graph.explain(f"CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32({query_vector})) YIELD node RETURN node")
print(result)
# Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
const queryVector = [2.1, 0.82, 1.3];
const result = await graph.explain(`CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([${queryVector}])) YIELD node RETURN node`);
console.log(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
let result = graph.explain("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node").execute().await?;
println!("{}", result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
// Query using vector index
float[] queryVector = {2.1f, 0.82f, 1.3f};
String result = graph.explain(String.format("CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32(%s)) YIELD node RETURN node", Arrays.toString(queryVector)));
System.out.println(result);
// Output shows: ProcedureCall | db.idx.vector.queryNodes
# Query using vector index
GRAPH.EXPLAIN DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'description', 10, vecf32([2.1, 0.82, 1.3])) YIELD node RETURN node"
# Output shows: ProcedureCall | db.idx.vector.queryNodes
Performance Tradeoffs and Best Practices
When to Use Vector Indexes
Vector indexes are essential for:
- Semantic search: Finding similar items based on meaning, not just keywords
- Recommendation systems: Discovering similar products, content, or users
- RAG (Retrieval Augmented Generation): Retrieving relevant context for LLMs
- Duplicate detection: Finding near-duplicate items based on embeddings
- Image/audio similarity: When using vision or audio embedding models
Performance Considerations
Benefits:
- Enables efficient approximate nearest neighbor (ANN) search
- Scales to millions of vectors with sub-linear query time
- Supports both node and relationship vectors
Costs:
- Memory usage: Vector indexes are memory-intensive
  - A 1M-vector index with 768 dimensions (float32) requires ~3 GB for the raw vectors alone
  - Formula (evaluated in the sketch after this list): vectors × dimensions × 4 bytes + HNSW overhead (~20%)
- Build time: Index construction can be slow for large datasets
- Approximate results: Returns approximate (not exact) nearest neighbors
- No pre-filtering: the vector search itself cannot apply property filters; filters are applied only to the returned candidates
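As a rough sanity check, the formula above can be evaluated directly. This is a sketch; the 20% overhead is an approximation, and actual usage also depends on M and implementation details:

```python
# Rough memory estimate for a vector index, per the formula above.
# The 20% HNSW overhead is an approximation; actual usage also depends on M.
def estimate_index_memory_bytes(num_vectors: int, dimension: int, overhead: float = 0.20) -> float:
    raw = num_vectors * dimension * 4  # float32 = 4 bytes per component
    return raw * (1 + overhead)

# 1M vectors x 768 dimensions: ~3.07 GB raw, ~3.7 GB including overhead.
print(estimate_index_memory_bytes(1_000_000, 768) / 1e9)  # ≈ 3.69 (GB)
```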
Recommendations:
- Choose appropriate vector dimensions (balance between quality and cost)
- Use cosine similarity for normalized embeddings (e.g., from OpenAI, Sentence Transformers)
- Use euclidean distance for unnormalized data
- Tune M and efConstruction based on your accuracy requirements
- Consider batch indexing for large datasets
- Monitor memory usage carefully
Similarity Function Tradeoffs
Cosine Similarity:
- Best for: Text embeddings, normalized vectors
- Measures: Angular distance between vectors
- Range: -1 to 1 (1 = identical direction)
- Use when: Vector magnitude is not meaningful
Euclidean Distance:
- Best for: Unnormalized data, physical measurements
- Measures: Straight-line distance between vectors
- Range: 0 to ∞ (0 = identical)
- Use when: Both direction and magnitude matter (see the numeric comparison below)
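To make the difference concrete, here is a small numeric comparison of the two metrics on two vectors that share a direction but differ in magnitude (a sketch using NumPy; the vectors are arbitrary):

```python
import numpy as np

# Two vectors pointing in the same direction but with different magnitudes.
a = np.array([1.0, 2.0, 3.0])
b = 10 * a

cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean_distance = np.linalg.norm(a - b)

print(cosine_similarity)   # 1.0    -> identical direction; magnitude is ignored
print(euclidean_distance)  # ~33.67 -> large distance; magnitude matters
```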
Example: Realistic Vector Search
# Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}")
# Insert products with embeddings (embeddings would come from your model)
embedding = model.encode("laptop computer") # Your embedding model
graph.query(f"CREATE (p:Product {name: 'Laptop', embedding: vecf32({embedding.tolist()})})")
# Search for similar products
query_embedding = model.encode("notebook pc")
result = graph.query(f"CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({query_embedding.tolist()})) YIELD node, score RETURN node.name, score ORDER BY score DESC")
for record in result.result_set:
    print(f"Product: {record[0]}, Similarity: {record[1]}")
// Create vector index for product embeddings
await graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");
// Insert products with embeddings (embeddings would come from your model)
const embedding = await model.encode("laptop computer"); // Your embedding model
await graph.query(`CREATE (p:Product {name: 'Laptop', embedding: vecf32([${embedding}])})`);
// Search for similar products
const queryEmbedding = await model.encode("notebook pc");
const result = await graph.query(`CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([${queryEmbedding}])) YIELD node, score RETURN node.name, score ORDER BY score DESC`);
for (const record of result.data) {
console.log(`Product: ${record['node.name']}, Similarity: ${record['score']}`);
}
// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}").execute().await?;
// Insert products with embeddings (embeddings would come from your model)
let embedding = model.encode("laptop computer"); // Your embedding model
graph.query(&format!("CREATE (p:Product {name: 'Laptop', embedding: vecf32({:?})})", embedding)).execute().await?;
// Search for similar products
let query_embedding = model.encode("notebook pc");
let result = graph.query(&format!("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32({:?})) YIELD node, score RETURN node.name, score ORDER BY score DESC", query_embedding)).execute().await?;
for record in result.data() {
println!("Product: {}, Similarity: {}", record["node.name"], record["score"]);
}
// Create vector index for product embeddings
graph.query("CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}");
// Insert products with embeddings (embeddings would come from your model)
float[] embedding = model.encode("laptop computer"); // Your embedding model
graph.query(String.format("CREATE (p:Product {name: 'Laptop', embedding: vecf32(%s)})", Arrays.toString(embedding)));
// Search for similar products
float[] queryEmbedding = model.encode("notebook pc");
ResultSet result = graph.query(String.format("CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32(%s)) YIELD node, score RETURN node.name, score ORDER BY score DESC", Arrays.toString(queryEmbedding)));
for (Record record : result) {
System.out.printf("Product: %s, Similarity: %s%n", record.get("node.name"), record.get("score"));
}
# Create vector index for product embeddings
GRAPH.QUERY DEMO_GRAPH "CREATE VECTOR INDEX FOR (p:Product) ON (p.embedding) OPTIONS {dimension:768, similarityFunction:'cosine', M:32, efConstruction:200}"
# Insert products with embeddings (embeddings would come from your model)
GRAPH.QUERY DEMO_GRAPH "CREATE (p:Product {name: 'Laptop', embedding: vecf32([0.1, 0.2, ...])})"
# Search for similar products
GRAPH.QUERY DEMO_GRAPH "CALL db.idx.vector.queryNodes('Product', 'embedding', 5, vecf32([0.15, 0.18, ...])) YIELD node, score RETURN node.name, score ORDER BY score DESC"
Troubleshooting
Common Issues:
- Dimension mismatch: Ensure all vectors have the same dimension as specified in the index (see the guard sketch below)
- Wrong similarity function: Use cosine for normalized vectors, euclidean for unnormalized
- Poor recall: Increase efRuntime or efConstruction parameters
- Slow queries: Decrease efRuntime or reduce k (number of results)
- High memory usage: Reduce M parameter or use lower-dimensional embeddings
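For the dimension-mismatch case in particular, a small guard before inserting catches bad embeddings early. This is a minimal sketch, assuming the Python client's params argument and that vecf32 accepts a parameterized array; INDEX_DIMENSION must equal the dimension option used when the index was created:

```python
# A simple guard against dimension mismatches before inserting.
# INDEX_DIMENSION must equal the `dimension` option used when creating the index.
INDEX_DIMENSION = 768

def insert_product(graph, embedding):
    if len(embedding) != INDEX_DIMENSION:
        raise ValueError(f"expected {INDEX_DIMENSION} dimensions, got {len(embedding)}")
    graph.query(
        "CREATE (p:Product {embedding: vecf32($vec)})",
        params={"vec": list(embedding)},
    )
```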