Online Data Migration from SQL Sources to FalkorDB
The DM-SQL-to-FalkorDB repository provides Rust-based CLI loaders for initial load and ongoing sync from SQL sources into FalkorDB.
It also includes an optional control plane (web UI + REST API) for config authoring, schema/template scaffolding, run orchestration, state management, and persisted metrics/log history.
Supported sources
- BigQuery
- ClickHouse
- Databricks (Databricks SQL / warehouses)
- MariaDB
- MySQL
- PostgreSQL
- Snowflake
- Spark SQL (via Livy sessions)
- SQL Server
When to use this approach
Use these tools when you want:
- A one-time migration from SQL tables into a FalkorDB property graph
- Ongoing one-way sync so FalkorDB stays updated as rows change in the source system
Prerequisites
- Rust toolchain (Cargo)
- Network access to your source system
- A reachable FalkorDB endpoint (for example `falkor://127.0.0.1:6379`)
- Node.js + npm (optional; only needed for the control plane UI)
Configurations typically reference environment variables for credentials/secrets (for example `$BIGQUERY_ACCESS_TOKEN`, `$DATABRICKS_TOKEN`, `$MYSQL_URL`, `$POSTGRES_URL`, `$SNOWFLAKE_PASSWORD`, `$SPARK_AUTH_TOKEN`, `$SQLSERVER_CONNECTION_STRING`).
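As an illustration, a connection section of a loader config might reference such a variable instead of embedding the secret directly. The key names below are only a sketch; the exact schema varies per loader, so consult each loader's sample config:

```yaml
# Hypothetical connection fragment; field names are illustrative, not authoritative.
source:
  # The loader expands $POSTGRES_URL from the environment at runtime.
  url: $POSTGRES_URL

falkordb:
  endpoint: falkor://127.0.0.1:6379
  graph: my_graph
```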
Getting the tools
git clone https://github.com/FalkorDB/DM-SQL-to-FalkorDB.git
cd DM-SQL-to-FalkorDB
How the loaders work (high level)
Each loader uses JSON/YAML configuration to define:
- How to read rows from the source (`table` + optional `filter`, or a custom `query`)
- How rows map to graph entities:
  - Nodes (labels, keys, property mappings)
  - Edges (relationship type, direction, endpoint matching rules)
- Per-mapping execution mode (`full` or `incremental`)
- Optional soft-delete behavior
- State backend for incremental watermarks (typically file-backed JSON)
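A skeleton of such a mapping config might look like the following. All field names here are illustrative assumptions, not the authoritative schema; see each loader's sample configs (for example `bigquery_sample.yaml`) for the real structure:

```yaml
# Illustrative sketch only; field names vary per loader.
mappings:
  - name: customers
    mode: incremental              # or: full
    source:
      table: public.customers
      filter: "updated_at > :watermark"
    node:
      label: Customer
      key: customer_id             # stable key used for MERGE
      properties: [name, email]
  - name: placed_orders
    mode: incremental
    edge:
      type: PLACED
      from: { label: Customer, key: customer_id }
      to:   { label: Order,    key: order_id }
state:
  backend: file
  path: ./state/customers.json     # file-backed watermark store
```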
Common concepts
- Declarative mapping: mappings define extraction + graph writes.
- Idempotent writes: loaders use Cypher `UNWIND` + `MERGE`.
- Incremental safety: watermarks move only after successful writes.
- Restart safety: failed runs can resume from persisted state.
- Index handling: loaders apply explicit `falkordb.indexes` plus inferred indexes for node keys and edge endpoint matches.
- Observability: loaders expose Prometheus-style metrics with global and per-mapping counters.
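The `UNWIND` + `MERGE` pattern behind idempotent writes looks roughly like this (labels and property names are illustrative; the loaders generate the actual statements from your mappings):

```cypher
// Batch upsert: replaying the same batch creates no duplicate nodes,
// because MERGE matches on the stable key before creating.
UNWIND $rows AS row
MERGE (c:Customer {customer_id: row.customer_id})
SET c.name = row.name, c.email = row.email
```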
Option A: Run a loader directly (CLI)
BigQuery → FalkorDB
cd BigQuery-to-FalkorDB/bigquery-to-falkordb
cargo build --release
# One-shot run
cargo run --release -- --config ../bigquery_sample.yaml
# Continuous sync
cargo run --release -- --config ../bigquery_sample.yaml --daemon --interval-secs 60
ClickHouse → FalkorDB
cd ClickHouse-to-FalkorDB
cargo build --release
# One-shot run
cargo run --release -- --config clickhouse.incremental.yaml
# Continuous sync
cargo run --release -- --config clickhouse.incremental.yaml --daemon --interval-secs 60
Databricks → FalkorDB
cd Databricks-to-FalkorDB/databricks-to-falkordb
cargo build --release
cargo run --release -- --config path/to/config.yaml
Note: Databricks currently supports one-shot runs only (no daemon/purge flags).
MariaDB → FalkorDB
cd MariaDB-to-FalkorDB
cargo build --release
# One-shot run
cargo run --release -- --config mariadb.incremental.yaml
# Continuous sync
cargo run --release -- --config mariadb.incremental.yaml --daemon --interval-secs 60
MySQL → FalkorDB
cd MySQL-to-FalkorDB
cargo build --release
# One-shot run
cargo run --release -- --config mysql.incremental.yaml
# Continuous sync
cargo run --release -- --config mysql.incremental.yaml --daemon --interval-secs 60
PostgreSQL → FalkorDB
cd PostgreSQL-to-FalkorDB/postgres-to-falkordb
cargo build --release
# One-shot run
cargo run --release -- --config path/to/config.yaml
# Continuous sync
cargo run --release -- --config path/to/config.yaml --daemon --interval-secs 60
Note: PostgreSQL currently supports daemon mode but not purge flags.
Snowflake → FalkorDB
cd Snowflake-to-FalkorDB
cargo build --release
# One-shot run
cargo run --release -- --config path/to/config.yaml
# Continuous sync
cargo run --release -- --config path/to/config.yaml --daemon --interval-secs 300
Spark SQL (Livy) → FalkorDB
cd Spark-to-FalkorDB/spark-to-falkordb
cargo build --release
cargo run --release -- --config path/to/config.yaml
Note: Spark currently supports one-shot runs only (no daemon/purge flags).
SQL Server → FalkorDB
cd SQLServer-to-FalkorDB
cargo build --release
# One-shot run
cargo run --release -- --config sqlserver.incremental.yaml
# Continuous sync
cargo run --release -- --config sqlserver.incremental.yaml --daemon --interval-secs 60
Optional purge modes
Purge flags are supported by: BigQuery, ClickHouse, MariaDB, MySQL, Snowflake, SQL Server.
# Purge full graph before loading
cargo run --release -- --config path/to/config.yaml --purge-graph
# Purge selected mappings
cargo run --release -- --config path/to/config.yaml --purge-mapping customers
Schema introspection + template scaffolding
Supported by: BigQuery, ClickHouse, Databricks, MariaDB, MySQL, PostgreSQL, Snowflake, Spark, SQL Server.
# Print normalized source schema summary
cargo run --release -- --config path/to/config.yaml --introspect-schema
# Generate starter mapping template
cargo run --release -- --config path/to/config.yaml --generate-template --output scaffold.yaml
Notes:
- Generated templates are scaffolds and should be reviewed before production use.
- For tools that support daemon/purge modes, scaffold flags cannot be combined with daemon/purge flags in the same run.
Option B: Use the control plane (web UI + API)
The control plane discovers tools by scanning for `tool.manifest.json` files and provides UI/API features to:
- Create and edit per-tool YAML/JSON configurations
- Preview extracted source schema
- Generate scaffold templates from source metadata
- Visualize graph topology from mappings
- Start runs (one-shot and daemon where supported)
- Stop active runs
- Stream live logs (SSE) and view persisted logs for historical runs
- Inspect and clear per-config incremental state/watermarks
- Persist and view per-tool/per-mapping metrics snapshots in SQLite
Start the server:
cd control-plane/server
# Optional: require API key on /api (except /api/health)
export CONTROL_PLANE_API_KEY="..."
cargo run --release
Default server URL: http://localhost:3003
Configuration (environment variables):
- `CONTROL_PLANE_BIND` (default `0.0.0.0:3003`)
- `CONTROL_PLANE_REPO_ROOT` (optional; repository root for manifest scan)
- `CONTROL_PLANE_DATA_DIR` (default `control-plane/data/`)
- `CONTROL_PLANE_UI_DIST` (default `control-plane/ui/dist/`)
- `CONTROL_PLANE_API_KEY` (optional bearer token requirement)
Selected API endpoints:
- `GET /api/health`
- `GET /api/tools`, `GET /api/tools/:tool_id`
- `POST /api/tools/:tool_id/scaffold-template`
- `POST /api/tools/:tool_id/schema-graph-preview`
- `GET /api/configs`, `POST /api/configs`
- `GET /api/configs/:config_id`, `PUT /api/configs/:config_id`
- `GET /api/configs/:config_id/state`, `POST /api/configs/:config_id/state/clear`
- `GET /api/runs`, `POST /api/runs`
- `GET /api/runs/:run_id`, `POST /api/runs/:run_id/stop`
- `GET /api/runs/:run_id/events` (SSE)
- `GET /api/runs/:run_id/logs`
- `GET /api/metrics` (optional `?config_id=<uuid>`)
- `GET /api/metrics/:tool_id` (optional `?config_id=<uuid>`)
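With the server running locally, the read-only endpoints can be exercised with `curl`; the bearer header is only needed when `CONTROL_PLANE_API_KEY` is set (request/response body shapes are not shown here, as they are tool-specific):

```shell
BASE=http://localhost:3003

# Health check (exempt from API-key auth)
curl -s "$BASE/api/health"

# List discovered tools and stored configs
curl -s -H "Authorization: Bearer $CONTROL_PLANE_API_KEY" "$BASE/api/tools"
curl -s -H "Authorization: Bearer $CONTROL_PLANE_API_KEY" "$BASE/api/configs"
```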
Notes:
- Runs execute locally on the host machine running the control plane server.
- Runtime artifacts are persisted under `CONTROL_PLANE_DATA_DIR`, including a SQLite DB and per-run files.
- SSE auth with an API key may use a query-string token fallback because the browser `EventSource` API does not support custom auth headers.
UI development (optional):
cd control-plane/ui
npm install
npm run dev
Metrics feature
All current SQL loaders expose Prometheus-style metrics with:
- Global counters:
  - total runs
  - failed runs
  - rows fetched
  - rows written
  - rows deleted
- Per-mapping counters:
  - runs
  - failed runs
  - rows fetched
  - rows written
  - rows deleted
Default metrics ports:
- BigQuery: 9995
- ClickHouse: 9991
- Databricks: 9994
- MariaDB: 9997
- MySQL: 9995
- PostgreSQL: 9993
- Snowflake: 9992
- Spark: 9997
- SQL Server: 9996
Use `--metrics-port` (or each loader’s corresponding `*_TO_FALKORDB_METRICS_PORT` environment variable) to override the defaults, especially when running multiple loaders concurrently; note that some defaults collide (BigQuery and MySQL both use 9995, and MariaDB and Spark both use 9997).
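For example, a Prometheus scrape configuration covering two loaders on their default ports might look like this, assuming the loaders serve metrics over plain HTTP on those ports (job names are arbitrary):

```yaml
# prometheus.yml fragment; adjust targets if you override --metrics-port.
scrape_configs:
  - job_name: clickhouse-to-falkordb
    static_configs:
      - targets: ["localhost:9991"]
  - job_name: sqlserver-to-falkordb
    static_configs:
      - targets: ["localhost:9996"]
```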
Operational tips
- Define node mappings before edge mappings (edges depend on nodes).
- Choose stable keys for `MERGE` (primary keys are usually best).
- Treat scaffold-generated mappings as a starting point; always review relationship semantics and incremental/delete logic.
- Use `RUST_LOG=info` (or `debug`) for richer loader diagnostics.
- Keep state files and control-plane data on durable storage for long-running sync setups.
Additional resources
- DM-SQL-to-FalkorDB repository: https://github.com/FalkorDB/DM-SQL-to-FalkorDB
- FalkorDB docs: https://docs.falkordb.com/