Community Detection using Label Propagation (CDLP)
Overview
The Community Detection using Label Propagation (CDLP) algorithm identifies communities in networks by propagating labels through the graph structure. Each node starts with a unique label, and through iterative propagation, nodes adopt the most frequent label among their neighbors, naturally forming communities where densely connected nodes share the same label.
CDLP serves as a powerful algorithm in scenarios such as:
- Social network community detection
- Biological network module identification
- Web page clustering and topic detection
- Market segmentation analysis
- Fraud detection networks
Algorithm Details
CDLP initializes by assigning each node a unique label (typically its node ID). The algorithm then iteratively updates each node’s label to the most frequent label among its neighbors. During each iteration, nodes are processed in random order to avoid deterministic bias. The algorithm continues until labels stabilize (no changes occur) or a maximum number of iterations is reached. The final labels represent community assignments, where nodes sharing the same label belong to the same community.
The algorithm’s strength lies in its ability to discover communities without requiring prior knowledge of the number of communities or their sizes. It runs in near-linear time and mimics epidemic contagion by spreading labels through the network.
Performance
CDLP operates with a time complexity of O(m + n) per iteration, where:
- n represents the total number of nodes
- m represents the total number of edges
The algorithm typically converges within a few iterations, making it highly efficient for large-scale networks.
Syntax
CALL algo.labelPropagation([config])
Parameters
The procedure accepts an optional configuration Map with the following parameters:
| Name | Type | Default | Description |
|---|---|---|---|
nodeLabels |
Array | All labels | Array of node labels to filter which nodes are included in the computation |
relationshipTypes |
Array | All relationship types | Array of relationship types to define which edges are traversed |
maxIterations |
Integer | 10 | Maximum number of iterations to run the algorithm |
Return Values
The procedure returns a stream of records with the following fields:
| Name | Type | Description |
|---|---|---|
node |
Node | The node entity included in the community |
communityId |
Integer | Identifier of the community the node belongs to |
Examples
Let’s take this Social Network as an example:
(Alice)---(Bob)---(Charlie) (Kate)
| | |
(Diana) | (Eve)---(Frank)
| | | |
(Grace)--(Henry) (Iris)--(Jack)
There are 3 different communities that should emerge from this network:
- Alice, Bob, Charlie, Diana, Grace, Henry
- Eve, Frank, Iris, Jack
- Any isolated nodes
Create the Graph
CREATE
(alice:Person {name: 'Alice'}),
(bob:Person {name: 'Bob'}),
(charlie:Person {name: 'Charlie'}),
(diana:Person {name: 'Diana'}),
(eve:Person {name: 'Eve'}),
(frank:Person {name: 'Frank'}),
(grace:Person {name: 'Grace'}),
(henry:Person {name: 'Henry'}),
(iris:Person {name: 'Iris'}),
(jack:Person {name: 'Jack'}),
(kate:Person {name: 'Kate'}),
(alice)-[:KNOWS]->(bob),
(bob)-[:KNOWS]->(charlie),
(alice)-[:KNOWS]->(diana),
(bob)-[:KNOWS]->(henry),
(diana)-[:KNOWS]->(grace),
(grace)-[:KNOWS]->(henry),
(charlie)-[:KNOWS]->(eve),
(eve)-[:KNOWS]->(frank),
(eve)-[:KNOWS]->(iris),
(frank)-[:KNOWS]->(jack),
(iris)-[:KNOWS]->(jack)
Example: Detect all communities in the network
CALL algo.labelPropagation() YIELD node, communityId RETURN node.name AS name, communityId ORDER BY communityId, name
Expected Results
| name | communityId |
|————|————-|
| Alice | 0 |
| Bob | 0 |
| Charlie | 0 |
| Diana | 0 |
| Grace | 0 |
| Henry | 0 |
| Eve | 2 |
| Frank | 2 |
| Iris | 2 |
| Jack | 2 |
| Kate | 10 |
Example: Detect communities with limited iterations
CALL algo.labelPropagation({maxIterations: 5}) YIELD node, communityId
Example: Focus on specific node types
CALL algo.labelPropagation({nodeLabels: ['Person']}) YIELD node, communityId
Example: Use only certain relationship types
CALL algo.labelPropagation({relationshipTypes: ['KNOWS', 'FRIENDS_WITH']}) YIELD node, communityId
Example: Combine node and relationship filtering
CALL algo.labelPropagation({
nodeLabels: ['Person'],
relationshipTypes: ['KNOWS']
}) YIELD node, communityId
Example: Group communities together
CALL algo.labelPropagation() YIELD node, communityId
RETURN collect(node.name) AS community_members, communityId, count(*) AS community_size
ORDER BY community_size DESC
Expected Results
| community_members | communityId | community_size |
|———————————————————-|————-|—————-|
| ["Alice", "Bob", "Charlie", "Diana", "Grace", "Henry"] | 0 | 6 |
| ["Eve", "Frank", "Iris", "Jack"] | 2 | 4 |
| ["Kate"] | 10 | 1 |
Example: Find the largest communities
CALL algo.labelPropagation() YIELD node, communityId
RETURN communityId, collect(node) AS nodes, count(*) AS size
ORDER BY size DESC
LIMIT 1
Frequently Asked Questions 5
What is the syntax for running CDLP?
Use CALL algo.labelPropagation() YIELD node, communityId. Optionally pass a configuration map with nodeLabels, relationshipTypes, or maxIterations.
How does CDLP differ from WCC?
WCC finds disconnected components (nodes unreachable from each other). CDLP detects densely connected communities within a connected graph by propagating labels iteratively.
Do I need to specify the number of communities in advance?
No. CDLP automatically discovers communities without requiring prior knowledge of how many communities exist or their sizes.
What does maxIterations control?
It sets the maximum number of label propagation rounds. The default is 10. The algorithm may converge earlier if labels stabilize. Increase it for very large or complex graphs.
Are CDLP results deterministic?
Results may vary slightly between runs because nodes are processed in random order during each iteration. The overall community structure should remain consistent for well-defined clusters.