Bulk Loader

The falkordb-bulk-loader is a Python utility for building FalkorDB graphs from CSV files. It uses the GRAPH.BULK endpoint to import nodes and relationships efficiently in binary batches — much faster than issuing individual CREATE queries.

Requirements

  • Python 3.10 or later
  • A running FalkorDB instance (see Get Started)

Installation

pip install falkordb-bulk-loader

Quick Start

Given two CSV files — Person.csv (nodes) and KNOWS.csv (relationships) — import them into a graph named SocialGraph:

falkordb-bulk-insert SocialGraph \
  -n Person.csv \
  -r KNOWS.csv

The label (for nodes) and relationship type (for relationships) are derived from the CSV filename. Multiple node and relation files can be provided by repeating the flags:

falkordb-bulk-insert SocialGraph \
  -n Person.csv \
  -n Country.csv \
  -r KNOWS.csv \
  -r VISITED.csv

Connecting to FalkorDB

By default the loader connects to redis://127.0.0.1:6379. Use --server-url to point it at a different instance:

falkordb-bulk-insert SocialGraph \
  --server-url redis://myhost:6379 \
  -n Person.csv

Key Options

Flag Extended flag Description
-u --server-url TEXT Server URL (default: redis://127.0.0.1:6379)
-n --nodes TEXT Node CSV file (filename → label)
-N --nodes-with-label TEXT Explicit label followed by node CSV file
-r --relations TEXT Relationship CSV file (filename → type)
-R --relations-with-type TEXT Explicit type followed by relationship CSV file
-o --separator CHAR Field delimiter (default: ,)
-d --enforce-schema Require typed column headers (see below)
-j --id-type TEXT Type of node ID property: STRING or INTEGER
-s --skip-invalid-nodes Skip duplicate node IDs instead of erroring
-e --skip-invalid-edges Skip edges with unknown endpoints instead of erroring
-i --index Label:Property Create a range index after import
-f --full-text-index Label:Property Create a full-text index after import

Enforcing a Schema

By default the loader infers each property’s type. Use --enforce-schema (-d) when you want explicit control. Column headers must follow the name:TYPE format:

User.csv

:ID(User),name:STRING,rank:INT
0,"Alice",5
1,"Bob",8

FOLLOWS.csv

:START_ID(User),:END_ID(User),weight:DOUBLE
0,1,0.9
1,0,0.4
falkordb-bulk-insert SocialGraph \
  --enforce-schema \
  -n User.csv \
  -r FOLLOWS.csv

Accepted type strings: ID, START_ID, END_ID, IGNORE, STRING, INT / INTEGER / LONG, DOUBLE / FLOAT, BOOL / BOOLEAN, ARRAY.

Bulk Updates

The companion command falkordb-bulk-update reads a CSV in batches and issues a parameterized Cypher query for each row — useful for incremental updates or when you want full control over the Cypher:

falkordb-bulk-update SocialGraph \
  --csv User.csv \
  --query "MERGE (:User {id: row[0], name: row[1], rank: row[2]})"

Note: falkordb-bulk-update commits changes incrementally. Sanitize your CSV inputs beforehand to avoid leaving the graph in a partially-updated state.

Diagnostics

Both falkordb-bulk-insert and falkordb-bulk-update install a SIGUSR1 handler at startup. Sending SIGUSR1 to a running loader process writes the tracebacks of all Python threads to stderr, which is useful for diagnosing hangs or unexpectedly slow loads without attaching a debugger:

kill -SIGUSR1 <pid>

This relies on Python’s faulthandler module and is only available on platforms that support SIGUSR1 (i.e., not Windows). On unsupported platforms, registration is silently skipped.

Further Reading

Frequently Asked Questions 5
How much faster is the bulk loader compared to individual CREATE queries?

The bulk loader uses the GRAPH.BULK endpoint to import data in binary batches, which is orders of magnitude faster than issuing individual Cypher CREATE queries. For large datasets (millions of nodes/edges), expect 10-100x speed improvements.

What Python version is required for the bulk loader?

The falkordb-bulk-loader requires Python 3.10 or later. Install it with pip install falkordb-bulk-loader.

How are node labels and relationship types determined from CSV files?

By default, the filename determines the label or relationship type. For example, Person.csv creates nodes with label Person, and KNOWS.csv creates relationships of type KNOWS. Use -N or -R flags to specify explicit labels/types.

Can I update existing data with the bulk loader?

Use falkordb-bulk-update for incremental updates. It reads a CSV in batches and executes a parameterized Cypher query for each row. Note that it commits changes incrementally, so sanitize your CSV inputs beforehand.

What data types are supported in schema-enforced mode?

When using --enforce-schema, supported types include: STRING, INT/INTEGER/LONG, DOUBLE/FLOAT, BOOL/BOOLEAN, ARRAY, plus special types ID, START_ID, END_ID, and IGNORE.