# RDF to FalkorDB Migration

A comprehensive multi-step process for migrating data from RDF (TTL) graph data stores into the FalkorDB property graph database.

## Overview

RDF data files contain subject-predicate-object triples; each triple either assigns a property value to an entity or links it to another entity through a relationship. This migration tool bridges the gap between RDF and FalkorDB by:
- Extracting the schema from RDF data files
- Generating a configurable JSON configuration file
- Exporting nodes and edges to CSV files formatted for FalkorDB
- Loading the CSV files into FalkorDB
The process ensures complete data migration, including all entities, relationships, properties, and metadata.
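To get a feel for the triples the extractor consumes, you can inspect a TTL document directly. Below is a minimal sketch using rdflib, a common Python RDF library (an assumption here; the extractor's internal parser may differ), with a hypothetical example namespace:

```python
from rdflib import Graph

# A tiny inline Turtle document: two literal properties and one
# relationship to another entity (hypothetical example namespace).
ttl = """
@prefix ex: <http://example.org/> .
ex:feature1 ex:name "Feature 1" ;
            ex:mass 512.3 ;
            ex:hasAnnotation ex:annotation1 .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# Literal objects become node properties; resource objects become edges.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```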
## Features
- Schema Extraction: Automatically extracts ontology from RDF/TTL files
- Configurable Mapping: JSON-based configuration for customizing data mapping
- URI Shortening: Optional conversion of long URIs to shorter representations
- CSV Export: Generates properly formatted CSV files for nodes and edges
- Flexible Loading: Multiple loading options with batch processing support
- Data Preview: Optional CSV output preview during export
## Prerequisites

- Python 3.6+
- Required Python packages (see `requirements.txt` in the repository)
- FalkorDB instance (local, Docker, or Cloud)
- RDF/TTL data files
## Installation

1. Clone the migration repository:

   ```bash
   git clone https://github.com/FalkorDB-POCs/rdf-to-falkordb.git
   cd rdf-to-falkordb
   ```

2. Install the required dependencies:

   ```bash
   pip3 install -r requirements.txt
   ```
## Step 1: Locating Your RDF (TTL) Data File

Place your TTL file(s) in the `import` directory. The repository includes several sample files that you can use for testing.
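Before running the extractor, it can save time to confirm the file parses cleanly. A quick check, assuming rdflib is available and using the `VGF141.ttl` sample from the example flow later in this guide:

```python
from rdflib import Graph

# Parsing raises an exception on Turtle syntax errors.
g = Graph()
g.parse("import/VGF141.ttl", format="turtle")
print(f"Parsed {len(g)} triples")
```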
## Step 2: Extracting & Reviewing the Ontology

In this step, you will extract the ontology (schema) from the TTL file. The `rdf_to_csv_extractor.py` script handles both ontology extraction and data export.
### Basic Ontology Extraction

```bash
python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> --extract-ontology config/<your-config.json>
```
### Command Line Options

| Option | Description | Required | Default |
|---|---|---|---|
| `ttl_file` | Path to the TTL/RDF file to process | Yes | - |
| `--config` | Path to ontology configuration JSON file | No | - |
| `--extract-ontology` | Extract ontology and save to the specified file | No | `extracted_ontology_config.json` |
| `--output-dir` | Output directory for CSV files | No | `csv_output` |
| `--shorten-uris` | Convert long URIs to shorter representations | No | `False` |
| `--uri-prefixes` | JSON file with custom URI prefix mappings | No | - |
| `--csv-output-peek` | Show a preview (first 3 lines) of generated CSV files | No | `False` |
### Review Configuration

After extracting the ontology, review the generated JSON configuration file. You can modify it to customize how RDF triples are mapped to FalkorDB nodes and relationships.
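The exact schema of the configuration file is defined by the extractor, so inspect the generated file rather than assuming particular keys. A small round-trip sketch with the standard `json` module (the config path matches the example flow later in this guide):

```python
import json

# Load the ontology configuration produced by --extract-ontology.
with open("config/vgf141_config.json") as f:
    config = json.load(f)

# Inspect the top-level structure before editing the mappings by hand.
print(list(config.keys()))

# Write it back with stable formatting after making changes.
with open("config/vgf141_config.json", "w") as f:
    json.dump(config, f, indent=2)
```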
## Step 3: Extracting Data from the TTL File

Once you have reviewed and optionally customized the ontology configuration, export the data to CSV files.
### Basic Data Export

```bash
python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> --config config/<your-config.json> --output-dir <your-project-subfolder>
```
### Advanced Export with URI Shortening

```bash
python3 rdf_to_csv_extractor.py import/<rdf-ttl-file.ttl> \
    --config config/<your-config.json> \
    --shorten-uris \
    --output-dir <your-project-subfolder> \
    --csv-output-peek
```
### Output Structure

The export script generates the following files:

Node CSV Files:

- Format: `nodes_<NodeType>.csv`
- Contains: entity IDs, labels, and properties
- Examples: `nodes_InChIkey.csv`, `nodes_LCMSFeature.csv`

Edge CSV Files:

- Format: `edges_<EdgeType>.csv`
- Contains: source ID, source label, target ID, target label, relationship type
- Example: `edges_HAS_CANOPUS_ANNOTATION.csv`
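To check the column layout of the generated files without re-running the export, a small sketch that prints each CSV's header row (the `vgf` directory matches the example flow later in this guide; substitute your own output directory):

```python
import csv
from pathlib import Path

# Print the header row of every node/edge CSV in the output directory.
for path in sorted(Path("vgf").glob("*.csv")):
    with path.open(newline="") as f:
        print(f"{path.name}: {next(csv.reader(f))}")
```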
## Step 4: Loading Data into FalkorDB

You have two options for loading the exported CSV files into FalkorDB:

### Option 1: Using the Python Loader (Included in the Repository)

The Python loader (`falkordb_csv_loader.py`) provides a straightforward way to load CSV files directly into FalkorDB.
#### Basic Usage

```bash
python3 falkordb_csv_loader.py <graph-name> --csv-dir <your-project-subfolder>
```
#### Advanced Usage

```bash
python3 falkordb_csv_loader.py <graph-name> \
    --host localhost \
    --port 6379 \
    --username myuser \
    --password mypass \
    --csv-dir <your-project-subfolder> \
    --batch-size 5000 \
    --merge-mode \
    --stats
```
#### Command-Line Options

| Option | Description | Default |
|---|---|---|
| `graph_name` | Target graph name in FalkorDB (required) | - |
| `--host` | FalkorDB host | `localhost` |
| `--port` | FalkorDB port | `6379` |
| `--username` | FalkorDB username (optional) | - |
| `--password` | FalkorDB password (optional) | - |
| `--batch-size` | Batch size for loading | `5000` |
| `--stats` | Show graph statistics after loading | `False` |
| `--csv-dir` | Directory containing CSV files | `csv_output` |
| `--merge-mode` | Use MERGE instead of CREATE for upsert behavior | `False` |
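Conceptually, `--batch-size` and `--merge-mode` map to how the loader parameterizes its Cypher: one query per row batch, with `MERGE` substituted for `CREATE` when upserting. A hypothetical sketch of that pattern (an illustration of the technique, not the script's actual queries):

```python
def build_node_query(label: str, merge_mode: bool) -> str:
    # One query per batch: UNWIND the row batch, then CREATE or MERGE
    # each node and attach its remaining columns as properties.
    clause = "MERGE" if merge_mode else "CREATE"
    return (
        "UNWIND $rows AS row "
        f"{clause} (n:{label} {{id: row.id}}) "
        "SET n += row.props"
    )

print(build_node_query("Material", merge_mode=True))
```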
#### Example Output

```text
Loading nodes from vgf/nodes_Material.csv...
Read 1 rows from vgf/nodes_Material.csv
[2025-08-03 13:50:21] Batch complete: Loaded 1 nodes (Duration: 0:00:00.000857)
[2025-08-03 13:50:21] ✅ Loaded 1 Material nodes (Duration: 0:00:00.002075)
Loading edges from vgf/edges_HAS_CANOPUS_ANNOTATION.csv...
Read 233 rows from vgf/edges_HAS_CANOPUS_ANNOTATION.csv
[2025-08-03 13:51:55] Batch complete: Loaded 233 edges (Duration: 0:00:00.101047)
[2025-08-03 13:51:55] ✅ Loaded 233 HAS_CANOPUS_ANNOTATION relationships (Duration: 0:00:00.102690)
✅ Successfully loaded data into graph 'VGF141'
```
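After loading, you can sanity-check the graph from Python with the `falkordb` client (connection details assumed to match the defaults above):

```python
from falkordb import FalkorDB

db = FalkorDB(host="localhost", port=6379)
g = db.select_graph("VGF141")

# Count loaded nodes and relationships.
nodes = g.query("MATCH (n) RETURN count(n)").result_set[0][0]
edges = g.query("MATCH ()-[r]->() RETURN count(r)").result_set[0][0]
print(f"{nodes} nodes, {edges} relationships")
```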
### Option 2: Using the FalkorDB Rust Loader (Recommended for Large Datasets)

For better performance with large datasets, use the FalkorDB Rust Loader.
#### Installation

```bash
git clone https://github.com/FalkorDB/FalkorDB-Loader-RS
cd FalkorDB-Loader-RS
cargo build --release
```
#### Basic Usage

```bash
./target/release/falkordb-loader my_graph --csv-dir <your-project-subfolder>
```
#### Advanced Usage

```bash
./target/release/falkordb-loader my_graph \
    --host localhost \
    --port 6379 \
    --username myuser \
    --password mypass \
    --csv-dir <your-project-subfolder> \
    --batch-size 5000 \
    --merge-mode \
    --stats \
    --progress-interval 1000
```
#### Performance Features

The Rust loader provides significant advantages:
- Async Operations: All database operations use async/await for better concurrency
- Batch Processing: Processes multiple records per query (configurable batch size)
- Memory Efficient: Streams data from CSV files without loading everything into memory
- Progress Tracking: Real-time progress updates during loading
- Error Handling: Comprehensive error handling with detailed logging
## Example Migration Flow

Here's a complete example migrating the VGF141 sample dataset:

1. Extract Ontology

   ```bash
   python3 rdf_to_csv_extractor.py import/VGF141.ttl --extract-ontology config/vgf141_config.json
   ```

2. Review and Export Data

   ```bash
   python3 rdf_to_csv_extractor.py import/VGF141.ttl \
       --config config/vgf141_config.json \
       --shorten-uris \
       --output-dir vgf \
       --csv-output-peek
   ```

3. Load into FalkorDB

   ```bash
   python3 falkordb_csv_loader.py VGF141 --csv-dir vgf --merge-mode --stats
   ```
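To run the export and load steps unattended (the manual configuration review is still recommended for real migrations), a minimal wrapper sketch chaining the commands above:

```python
import subprocess

# Assumes the ontology config was already extracted and reviewed (step 1).
commands = [
    ["python3", "rdf_to_csv_extractor.py", "import/VGF141.ttl",
     "--config", "config/vgf141_config.json",
     "--shorten-uris", "--output-dir", "vgf"],
    ["python3", "falkordb_csv_loader.py", "VGF141",
     "--csv-dir", "vgf", "--merge-mode", "--stats"],
]
for cmd in commands:
    subprocess.run(cmd, check=True)  # stop on the first failing step
```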
## Data Mapping

### URI Shortening

The tool can convert long URIs into shorter, more manageable representations. This is particularly useful for:

- Reducing storage requirements
- Improving readability
- Simplifying queries

Custom prefix mappings can be supplied via a JSON file passed to `--uri-prefixes`.
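The exact `--uri-prefixes` JSON schema is defined by the tool; the underlying idea is a longest-namespace-first substitution. A minimal sketch with a hypothetical prefix map:

```python
# Hypothetical prefix map; the actual --uri-prefixes schema is tool-defined.
PREFIXES = {
    "http://example.org/ontology/": "ex:",
    "http://www.w3.org/2000/01/rdf-schema#": "rdfs:",
}

def shorten(uri: str) -> str:
    # Try the longest namespaces first so more specific prefixes win.
    for namespace in sorted(PREFIXES, key=len, reverse=True):
        if uri.startswith(namespace):
            return PREFIXES[namespace] + uri[len(namespace):]
    return uri

print(shorten("http://example.org/ontology/hasAnnotation"))  # ex:hasAnnotation
```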
### Property Handling

The migration process preserves:
- Simple scalar properties (strings, numbers, booleans)
- Complex nested values
- Lists and arrays
- Metadata and annotations
## Troubleshooting

### Common Issues

- Missing Dependencies: Ensure all Python packages from `requirements.txt` are installed
- File Not Found: Verify that the TTL file path is correct and accessible
- Memory Issues: For very large RDF files, consider processing in smaller chunks
- URI Format Issues: Review the URI prefix mappings if shortened URIs are not formatted correctly
### Debug Tips

- Use `--csv-output-peek` to preview generated CSV files during export
- Enable verbose logging by modifying the script
- Test with smaller sample datasets first
- Verify that the ontology configuration matches your data structure
## Additional Resources

### Next Steps

- Explore the FalkorDB Cypher Language for querying your graph
- Learn about FalkorDB Operations for production deployments
- Check out FalkorDB Integration options