sim.jaccard
Description
Computes the Jaccard similarity coefficient between two sets (lists). The Jaccard index measures the similarity between two sets by dividing the size of their intersection by the size of their union. It returns a value between 0 (no similarity) and 1 (identical sets).
Syntax
flex.sim.jaccard(list1, list2)
Parameters
| Parameter | Type | Required | Description | |———–|——|———-|————-| | list1 | list | Yes | The first list to compare | | list2 | list | Yes | The second list to compare |
Returns
Type: number (float)
A value between 0 and 1 representing the Jaccard similarity coefficient:
1.0indicates identical sets0.0indicates no common elements- Returns
nullfor invalid inputs
Examples
Example 1: Basic Set Similarity
// Compare two tag lists
RETURN flex.sim.jaccard(['tag1', 'tag2', 'tag3'], ['tag2', 'tag3', 'tag4']) AS similarity
Output:
similarity
----------
0.5
(2 common elements / 4 total unique elements = 0.5)
Example 2: Finding Similar Documents by Tags
// Find documents with similar tags to a reference document
MATCH (ref:Document {id: 'doc123'})
MATCH (other:Document)
WHERE other.id <> ref.id
WITH ref, other, flex.sim.jaccard(ref.tags, other.tags) AS similarity
WHERE similarity > 0.3
RETURN other.title, similarity
ORDER BY similarity DESC
LIMIT 10
Example 3: User Interest Matching
// Find users with similar interests
MATCH (u1:User {id: $userId})
MATCH (u2:User)
WHERE u1 <> u2
WITH u1, u2, flex.sim.jaccard(u1.interests, u2.interests) AS match_score
WHERE match_score > 0.5
RETURN u2.name, u2.interests, match_score
ORDER BY match_score DESC
Notes
- Treats input lists as sets (duplicates within each list don’t affect the result)
- Returns
nullif either input is not an array - Order of elements doesn’t matter
- Works with any comparable data types (strings, numbers, etc.)
- Ideal for comparing categorical attributes, tags, or interest lists
See Also
- text.levenshtein - Edit distance for string comparison
- coll.intersection - Get common elements between sets
- coll.union - Combine sets