LOAD CSV
LOAD CSV FROM 'file://actors.csv' AS row
MERGE (a:Actor {name: row[0]})
LOAD CSV FROM
accepts a string path to a CSV file, the file is parsed line by line, the current line is accessible through the variable specified by AS
. Each parsed value is treated as a string
, use the right conversion functions e.g. toInteger
to cast a value to its appropriate type.
Additional clauses can follow and accesses the row
variable
IMPORTING DATA
Importing local files
FalkorDB defines a data directory see configuration Under which local CSV files should be stored. All file://
URIs are resolved relative to that directory.
In the following example we’ll load the actors.csv
file into FalkorDB.
actors.csv
Lee Pace | 1979 |
Vin Diesel | 1967 |
Chris Pratt | 1979 |
Zoe Saldana | 1978 |
LOAD CSV FROM 'file://actors.csv'
AS row
MERGE (a:Actor {name: row[0], birth_year: toInteger(row[1])})
RETURN a.name, a.birth_year
Note that we’ve used indices e.g. row[0]
to access the value at the corresponding column.
In case the CSV contains a header row e.g.
actors.csv
name | birthyear |
---|---|
Lee Pace | 1979 |
Vin Diesel | 1967 |
Chris Pratt | 1979 |
Zoe Saldana | 1978 |
Then we should use the WITH HEADERS
variation of the LOAD CSV
clause
LOAD CSV WITH HEADERS FROM 'file://actors.csv'
AS row
MERGE (a:Actor {name: row[name], birth_year: toInteger(row[birthyear])})
RETURN a.name, a.birth_year
Note when a header row exists and WITH HEADER
is specified the row
variable is no longer an array
but rather a map
, accessing the individual elements is done via their column name.
Importing data from multiple CSVs
Building on our previous example we’ll introduce a second csv file acted_in.csv
which ties actors to movies they’ve acted in
acted_in.csv
actor | movie |
---|---|
Lee Pace | The Fall |
Vin Diesel | Fast & Furious |
Chris Pratt | Passengers |
Zoe Saldana | Avatar |
We’ll create a new graph connecting actors to the movies they’ve acted in
Load actors:
LOAD CSV WITH HEADER FROM 'file://actors.csv'
AS row
MERGE (a:Actor {name:row['name']})
Load movies and create ACTED_IN
relations:
LOAD CSV WITH HEADER FROM 'file://acted_in.csv'
AS row
MATCH (a:Actor {name: row['actor']})
MERGE (m:Movie {title: row['movie']})
MERGE (a)-[:ACTED_IN]->(m)
Importing remote files
FalkorDB supports importing remote CSVs via HTTPS. Here’s an example loading the bigmac data-set from calmcode.io:
LOAD CSV WITH HEADERS FROM 'https://calmcode.io/static/data/bigmac.csv' AS row
RETURN row LIMIT 4
1) 1) "ROW"
2) 1) 1) "{date: 2002-04-01, currency_code: PHP, name: Philippines, local_price: 65.0, dollar_ex: 51.0, dollar_price: 1.27450980392157}"
2) 1) "{date: 2002-04-01, currency_code: PEN, name: Peru, local_price: 8.5, dollar_ex: 3.43, dollar_price: 2.47813411078717}"
3) 1) "{date: 2002-04-01, currency_code: NZD, name: New Zealand, local_price: 3.6, dollar_ex: 2.24, dollar_price: 1.60714285714286}"
4) 1) "{date: 2002-04-01, currency_code: NOK, name: Norway, local_price: 35.0, dollar_ex: 8.56, dollar_price: 4.088785046728971}"
Dealing with a large number of columns or missing entries
Loading data from CSV files that miss entries may cause complications. We’ve solved this (and made it useful for cases involving loading a large number of columns) with the following approach:
Assuming this is the CSV file we’re loading:
missing_entries.csv
name | birthyear |
---|---|
Lee Pace | 1979 |
Vin Diesel | |
Chris Pratt | |
Zoe Saldana | 1978 |
Note: both Vin Diesel and Chris Pratt are missing their birthyear entry
When creating Actor nodes, there is no need to explicitly define each column as done previously. The following query creates an empty Actor node and assigns the current CSV row to it. This process automatically sets the node’s attribute set to match the values of the current row:
LOAD CSV FROM 'file://missing_entries.csv' AS row
CREATE (a:Actor)
SET a = row
RETURN a
1) 1) "a"
2) 1) 1) 1) 1) "id"
2) (integer) 0
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Zoe Saldana"
2) 1) "birthyear"
2) "1978"
2) 1) 1) 1) "id"
2) (integer) 1
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Chris Pratt"
3) 1) 1) 1) "id"
2) (integer) 2
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Vin Diesel"
4) 1) 1) 1) "id"
2) (integer) 3
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Lee Pace"
2) 1) "birthyear"
2) "1979"