RDF to FalkorDB Migration

A comprehensive multi-step process to migrate data from RDF (TTL) based graph data stores into FalkorDB property graph database.

Overview

RDF data files contain triplets of subject-predicate-object which can specify property assignment to an entity or a relationship assignment to another entity. This migration tool bridges the gap between RDF and FalkorDB by:

Extracting the schema from RDF data files
Generating a configurable JSON configuration file
Exporting nodes and edges to CSV files formatted for FalkorDB
Loading the CSV files into FalkorDB

The process ensures complete data migration including all entities, relationships, properties, and metadata.

Features

Schema Extraction: Automatically extracts ontology from RDF/TTL files
Configurable Mapping: JSON-based configuration for customizing data mapping
URI Shortening: Optional conversion of long URIs to shorter representations
CSV Export: Generates properly formatted CSV files for nodes and edges
Flexible Loading: Multiple loading options with batch processing support
Data Preview: Optional CSV output preview during export

Prerequisites

Python 3.6+
Required Python packages (see requirements.txt in the repository)
FalkorDB instance (local, Docker, or Cloud)
RDF/TTL data files

Installation

Clone the migration repository:

git clone https://github.com/FalkorDB/rdf-to-falkordb.git
cd rdf-to-falkordb

Install required dependencies:

pip3 install -r requirements.txt

Step 1: Locating Your RDF (TTL) Data File

Place your TTL file(s) in the import directory. The repository includes several sample files that you can use for testing.

Step 2: Extracting & Reviewing Ontology

In this step, you will extract the ontology (schema) from the TTL file. The rdf_to_csv_extractor.py script handles both ontology extraction and data export.

Basic Ontology Extraction

python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> --extract-ontology config/<your-config.json>

Command Line Options

Option	Description	Required	Default
`ttl_file`	Path to the TTL/RDF file to process	Yes	-
`--config`	Path to ontology configuration JSON file	No	-
`--extract-ontology`	Extract ontology and save to specified file	No	extracted_ontology_config.json
`--output-dir`	Output directory for CSV files	No	csv_output
`--shorten-uris`	Convert long URIs to shorter representations	No	False
`--uri-prefixes`	JSON file with custom URI prefix mappings	No	-
`--csv-output-peek`	Show preview (first 3 lines) of generated CSV files	No	False

Review Configuration

After extracting the ontology, review the generated JSON configuration file. You can modify the configuration to customize how RDF triplets are mapped to FalkorDB nodes and relationships.

Step 3: Extracting Data from TTL File

Once you have reviewed and optionally customized the ontology configuration, export the data to CSV files.

Basic Data Export

python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> --config config/<your-config.json> --output-dir <your-project-subfolder>

Advanced Export with URI Shortening

python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> \
  --config config/<your-config.json> \
  --shorten-uris \
  --output-dir <your-project-subfolder> \
  --csv-output-peek

Output Structure

The export script generates the following files:

Node CSV Files:

Format: nodes_<NodeType>.csv
Contains: Entity IDs, labels, and properties
Example: nodes_InChIkey.csv, nodes_LCMSFeature.csv

Edge CSV Files:

Format: edges_<EdgeType>.csv
Contains: Source ID, source label, target ID, target label, relationship type
Example: edges_HAS_CANOPUS_ANNOTATION.csv

Step 4: Loading Data into FalkorDB

You have two options for loading the exported CSV files into FalkorDB:

Option 1: Using Python Loader (Included in Repository)

The Python loader (falkordb_csv_loader.py) provides a straightforward way to load CSV files directly into FalkorDB.

Basic Usage

python3 falkordb_csv_loader.py <graph-name> --csv-dir <your-project-subfolder>

Advanced Usage

python3 falkordb_csv_loader.py <graph-name> \
  --host localhost \
  --port 6379 \
  --username myuser \
  --password mypass \
  --csv-dir <your-project-subfolder> \
  --batch-size 5000 \
  --merge-mode \
  --stats

Command-Line Options

Option	Description	Default
`graph_name`	Target graph name in FalkorDB (required)	-
`--host`	FalkorDB host	localhost
`--port`	FalkorDB port	6379
`--username`	FalkorDB username (optional)	-
`--password`	FalkorDB password (optional)	-
`--batch-size`	Batch size for loading	5000
`--stats`	Show graph statistics after loading	False
`--csv-dir`	Directory containing CSV files	csv_output
`--merge-mode`	Use MERGE instead of CREATE for upsert behavior	False

Example Output

Loading nodes from vgf/nodes_Material.csv...
  Read 1 rows from vgf/nodes_Material.csv
[2025-08-03 13:50:21] Batch complete: Loaded 1 nodes (Duration: 0:00:00.000857)
[2025-08-03 13:50:21] ✅ Loaded 1 Material nodes (Duration: 0:00:00.002075)

Loading edges from vgf/edges_HAS_CANOPUS_ANNOTATION.csv...
  Read 233 rows from vgf/edges_HAS_CANOPUS_ANNOTATION.csv
[2025-08-03 13:51:55] Batch complete: Loaded 233 edges (Duration: 0:00:00.101047)
[2025-08-03 13:51:55] ✅ Loaded 233 HAS_CANOPUS_ANNOTATION relationships (Duration: 0:00:00.102690)

✅ Successfully loaded data into graph 'VGF141'

Option 2: Using FalkorDB Rust Loader (Recommended for Large Datasets)

For better performance with large datasets, use the FalkorDB Rust Loader.

Installation

git clone https://github.com/FalkorDB/FalkorDB-Loader-RS
cd FalkorDB-Loader-RS
cargo build --release

Basic Usage

./target/release/falkordb-loader my_graph --csv-dir <your-project-subfolder>

Advanced Usage

./target/release/falkordb-loader my_graph \
  --host localhost \
  --port 6379 \
  --username myuser \
  --password mypass \
  --csv-dir <your-project-subfolder> \
  --batch-size 5000 \
  --merge-mode \
  --stats \
  --progress-interval 1000

Performance Features

The Rust loader provides significant advantages:

Async Operations: All database operations use async/await for better concurrency
Batch Processing: Processes multiple records per query (configurable batch size)
Memory Efficient: Streams data from CSV files without loading everything into memory
Progress Tracking: Real-time progress updates during loading
Error Handling: Comprehensive error handling with detailed logging

Example Migration Flow

Here’s a complete example migrating a VGF141 dataset:

1. Extract Ontology

python3 rdf_to_csv_extractor.py import/VGF141.ttl --extract-ontology config/vgf141_config.json

2. Review and Export Data

python3 rdf_to_csv_extractor.py import/VGF141.ttl \
  --config config/vgf141_config.json \
  --shorten-uris \
  --output-dir vgf \
  --csv-output-peek

3. Load into FalkorDB

python3 falkordb_csv_loader.py VGF141 --csv-dir vgf --merge-mode --stats

Data Mapping

URI Shortening

The tool supports converting long URIs to shorter, more manageable representations. This is particularly useful for:

Reducing storage requirements
Improving readability
Simplifying queries
Custom prefix mappings via JSON configuration

Property Handling

The migration process preserves:

Simple scalar properties (strings, numbers, booleans)
Complex nested values
Lists and arrays
Metadata and annotations

Troubleshooting

Common Issues

Missing Dependencies: Ensure all Python packages from requirements.txt are installed
File Not Found: Verify the TTL file path is correct and accessible
Memory Issues: For very large RDF files, consider processing in smaller chunks
URI Format Issues: Review URI prefix mappings if shortened URIs are not formatted correctly

Debug Tips

Use --csv-output-peek to preview generated CSV files during export
Enable verbose logging by modifying the script
Test with smaller sample datasets first
Verify the ontology configuration matches your data structure

Additional Resources

Next Steps

Explore FalkorDB Cypher Language for querying your graph
Learn about FalkorDB Operations for production deployments
Check out FalkorDB Integration options

Frequently Asked Questions 5

How does the RDF to FalkorDB migration handle RDF triples?

RDF subject-predicate-object triples are mapped to FalkorDB nodes and relationships. Property assignments become node properties, and relationship predicates become edges between nodes.

What RDF formats are supported?

The migration tool primarily supports TTL (Turtle) format. Place your .ttl files in the import directory and run the extraction script.

Should I use the Python loader or the Rust loader?

Use the Python loader for small to medium datasets. For large datasets (millions of triples), use the FalkorDB Rust Loader for significantly better performance with async operations and streaming.

What does URI shortening do?

URI shortening converts long RDF URIs to shorter, more readable representations. This reduces storage requirements, improves query readability, and simplifies Cypher queries. Use --shorten-uris to enable it.

Can I customize how RDF types map to FalkorDB labels?

Yes. After extracting the ontology, review and edit the generated JSON configuration file to customize how RDF types, predicates, and properties map to FalkorDB node labels, edge types, and properties.