To compute the scoring functions of a graph and its clusters, we call scoring_functions. Here we try it for the well known Zachary’s karate club graph, using the faction memberships as clusters. type=global gives us the scores of each cluster, while type=global gives us the weighted mean of the local scores, plus some additional global scores
data(karate, package="igraphdata")
scoring_functions(karate, V(karate)$Faction, type="local")
#> This graph was created by an old(er) igraph version.
#> ℹ Call `igraph::upgrade_graph()` on it to use with the current igraph version.
#> For now we convert it on the fly...
#> size internal density edges inside av degree FOMD expansion cut ratio
#> 1 16 0.8250000 99 6.187500 0.5000000 1.375000 0.07638889
#> 2 18 0.7189542 110 6.111111 0.3333333 1.222222 0.07638889
#> conductance norm cut max ODF average ODF flake ODF density ratio
#> 1 0.10000000 0.1909091 0.3636364 0.05651941 0 0.9074074
#> 2 0.09090909 0.1909091 0.4117647 0.09140548 0 0.8937500
#> modularity
#> 1 NA
#> 2 NA
scoring_functions(karate, V(karate)$Faction, type="global")
#> size internal density edges inside av degree FOMD expansion
#> [1,] 17.05882 0.7688581 104.8235 6.147059 0.4117647 1.294118
#> cut ratio conductance norm cut max ODF average ODF flake ODF
#> [1,] 0.07638889 0.09518717 0.1909091 0.3891161 0.07498851 0
#> density ratio modularity graph_order n_clusters mean_cluster_size
#> [1,] 0.900177 0.3714661 34 2 17
#> coverage global density ratio
#> [1,] 0.9047619 0.8004386
Additionally, each of the scores is available individually as its own function, grouped together in the package as the cluster scoring functions family. They are simply called as follows, with the graph and the membership vector as arguments, and return a vector with the scores for each community:
To showcase the randomization process, we apply it to the Zachary’s karate club graph, with the default settings (positive weights with no upper bound, which suits this graph):
data(karate, package="igraphdata")
E(karate)
#> This graph was created by an old(er) igraph version.
#> ℹ Call `igraph::upgrade_graph()` on it to use with the current igraph version.
#> For now we convert it on the fly...
#> + 78/78 edges from 4b458a1 (vertex names):
#> [1] Mr Hi --Actor 2 Mr Hi --Actor 3 Mr Hi --Actor 4 Mr Hi --Actor 5
#> [5] Mr Hi --Actor 6 Mr Hi --Actor 7 Mr Hi --Actor 8 Mr Hi --Actor 9
#> [9] Mr Hi --Actor 11 Mr Hi --Actor 12 Mr Hi --Actor 13 Mr Hi --Actor 14
#> [13] Mr Hi --Actor 18 Mr Hi --Actor 20 Mr Hi --Actor 22 Mr Hi --Actor 32
#> [17] Actor 2--Actor 3 Actor 2--Actor 4 Actor 2--Actor 8 Actor 2--Actor 14
#> [21] Actor 2--Actor 18 Actor 2--Actor 20 Actor 2--Actor 22 Actor 2--Actor 31
#> [25] Actor 3--Actor 4 Actor 3--Actor 8 Actor 3--Actor 9 Actor 3--Actor 10
#> [29] Actor 3--Actor 14 Actor 3--Actor 28 Actor 3--Actor 29 Actor 3--Actor 33
#> [33] Actor 4--Actor 8 Actor 4--Actor 13 Actor 4--Actor 14 Actor 5--Actor 7
#> [37] Actor 5--Actor 11 Actor 6--Actor 7 Actor 6--Actor 11 Actor 6--Actor 17
#> + ... omitted several edges
rewired_karate <- rewireCpp(karate, weight_sel = "max_weight")
E(rewired_karate)
#> + 125/125 edges from 7015654 (vertex names):
#> [1] Actor 11--Actor 33 Actor 14--Actor 26 Actor 5 --Actor 10 Actor 20--Actor 31
#> [5] Actor 4 --Actor 9 Actor 3 --Actor 28 Actor 12--Actor 28 Actor 30--Actor 33
#> [9] Actor 25--John A Actor 11--Actor 21 Actor 14--Actor 27 Actor 2 --Actor 28
#> [13] Actor 2 --Actor 31 Actor 6 --Actor 28 Actor 5 --Actor 21 Actor 6 --Actor 7
#> [17] Actor 7 --Actor 26 Actor 3 --Actor 32 Actor 11--Actor 23 Actor 19--Actor 30
#> [21] Actor 17--Actor 29 Actor 13--Actor 24 Actor 31--Actor 33 Actor 15--Actor 29
#> [25] Actor 3 --John A Actor 20--Actor 31 Actor 3 --Actor 33 Actor 28--Actor 32
#> [29] Actor 22--Actor 23 Actor 30--Actor 32 Actor 16--Actor 30 Actor 26--Actor 28
#> [33] Actor 3 --Actor 24 Actor 8 --Actor 29 Actor 9 --John A Actor 7 --Actor 24
#> [37] Actor 10--Actor 33 Actor 12--Actor 29 Actor 14--Actor 28 Actor 2 --Actor 11
#> + ... omitted several edges
If the graph is directed, the rewireCpp function automatically detects it and internally runs the implementation for directed graphs. The following example is a food network (where edges indicate predator-prey relationships) from the igraphdata package:
data(foodwebs, package="igraphdata")
rewired_ChesLower <- rewireCpp(foodwebs$ChesLower, weight_sel = "max_weight")
#> This graph was created by an old(er) igraph version.
#> ℹ Call `igraph::upgrade_graph()` on it to use with the current igraph version.
#> For now we convert it on the fly...
Now we compute the scoring functions for the karate club graph. By default the clustering algorithms are Louvain, label propagation and Walktrap, but the function can take any list of clustering algorithms for igraph graphs.
# this corresponds to the club each member ended up with after the split,
# which we could consider the ground truth clustering for this graph.
significance_table_karate <- evaluate_significance(karate,
alg_list=list(Louvain=cluster_louvain,
"label prop"= cluster_label_prop,
walktrap=cluster_walktrap),
gt_clustering=V(karate)$Faction)
significance_table_karate
#> Louvain label prop walktrap ground truth
#> size 10.2352941 10.6470588 10.00000000 17.05882353
#> internal density 1.3025641 1.5606746 1.32254902 0.76885813
#> edges inside 52.0588235 59.7058824 51.82352941 104.82352941
#> av degree 4.9411765 5.1470588 5.05882353 6.14705882
#> FOMD 0.2941176 0.2941176 0.26470588 0.41176471
#> expansion 3.7058824 3.2941176 3.47058824 1.29411765
#> cut ratio 0.1508403 0.1383356 0.14540629 0.07638889
#> conductance 0.2699425 0.2543939 0.25484480 0.09518717
#> norm cut 0.3991755 0.3798402 0.38131607 0.19090909
#> max ODF 0.5153612 0.5298443 0.51172969 0.38911607
#> average ODF 0.2241025 0.2090724 0.18493603 0.07498851
#> flake ODF 0.1176471 0.1176471 0.08823529 0.00000000
#> density ratio 0.8724358 0.8844357 0.86846364 0.90017702
#> modularity 0.3904504 0.3825608 0.41116042 0.37146614
#> clustering coef 0.6535989 0.7082762 0.61381521 0.53406848
#> graph_order 34.0000000 34.0000000 34.00000000 34.00000000
#> n_clusters 4.0000000 5.0000000 4.00000000 2.00000000
#> mean_cluster_size 8.5000000 6.8000000 8.50000000 17.00000000
#> coverage 0.7272727 0.7575758 0.74458874 0.90476190
#> global density ratio 0.7085396 0.7356171 0.74273256 0.80043860
#> VIdist_to_GT 0.8537243 1.2105283 0.87293838 0.00000000
With the function evaluate_significance_r
we compute the
scoring functions as above, and we compare the results to those of a
distribution of randomized graphs.
significance_table_karate_r <- evaluate_significance_r(karate,
alg_list=list(Louvain=cluster_louvain,
"label prop"= cluster_label_prop,
walktrap=cluster_walktrap),
gt_clustering=V(karate)$Faction,
weight_sel="max_weight",
n_reps=10)
Now we generate a graph from a stochastic block model in which we set very strong clusters (the elements in the diagonal of the matrix are much larger than the rest, so the probability of intra-cluster edges is much higher than that of inter-cluster edges).
pm <- matrix (c(.3, .001, .001, .003,
.001, .2, .005, .002,
.001, .005, .2, .001,
.003, .002, .001, .3), nrow=4, ncol=4)
g_sbm <- sample_sbm(100, pref.matrix=pm, block.sizes=c(25,25,25,25))
E(g_sbm)$weight <- 1
significance_table_sbm <- evaluate_significance(g_sbm)
significance_table_sbm
#> Louvain label prop walktrap
#> size 2.502000e+01 23.76000000 25.00000000
#> internal density 2.520710e-01 0.27396667 0.25166667
#> edges inside 7.543000e+01 73.33000000 75.50000000
#> av degree 3.020000e+00 3.00000000 3.02000000
#> FOMD 4.600000e-01 0.45000000 0.45000000
#> expansion 2.400000e-01 0.28000000 0.24000000
#> cut ratio 3.196871e-03 0.00359987 0.00320000
#> conductance 4.513497e-02 0.05908052 0.04522613
#> norm cut 5.763449e-02 0.07181492 0.05777475
#> max ODF 3.224286e-01 0.34642857 0.34642857
#> average ODF 4.548088e-02 0.05756421 0.04473088
#> flake ODF 0.000000e+00 0.00000000 0.00000000
#> density ratio 9.845293e-01 0.98457668 0.98450012
#> modularity 6.975283e-01 0.69433851 0.69734573
#> clustering coef 2.348967e-01 0.23550981 0.23633871
#> graph_order 1.000000e+02 100.00000000 100.00000000
#> n_clusters 4.000000e+00 5.00000000 4.00000000
#> mean_cluster_size 2.500000e+01 20.00000000 25.00000000
#> coverage 9.617834e-01 0.95541401 0.96178344
#> global density ratio 9.745416e-01 0.97213711 0.97456954
And as an example of usage for a complete weighted graphs with weights bounded between 0 and 1, we have a graph built from correlations of currency exchange rate returns, in particular from January 2009 with the 13 most traded currencies. In this case we have to set w_max to 1 to keep the upper bound when rewiring the edges.
Here we perform a nonparametric bootstrap to the karate club graph and the same selection of algorithms. For each instance, the set of vertices is resampled, the induced graph is obtained by taking the new set of vertices with the induced edges from the original graph, and the clustering algorithms are applied. Then, these results are compared to the induced original clusterings using several metrics: the variation of information (VI), normalized reduced mutual information (NRMI) and both adjusted and regular Rand index (Rand and adRand):
b_karate <- boot_alg_list(g=karate, return_data=FALSE, R=99)
b_karate
#> Louvain label prop walktrap
#> VI 0.2491793 0.3216292 0.2513669
#> AMI 0.6584134 0.4866224 0.6375407
#> NRMI 0.6358807 0.4236883 0.6694808
#> Rand 0.8630692 0.7627073 0.8604404
#> AdRand 0.6636540 0.5192900 0.6629174
#> n_clusters 5.4141414 4.6666667 5.7575758
And the same for the stochastic block model graph:
b_sbm <- boot_alg_list(g=g_sbm, return_data=FALSE, R=99)
b_sbm
#> Louvain label prop walktrap
#> VI 0.1227883 0.1792560 0.1118923
#> AMI 0.8073225 0.7141275 0.8186548
#> NRMI 0.8369593 0.7856362 0.8583335
#> Rand 0.9453097 0.9190613 0.9501806
#> AdRand 0.8467059 0.7632400 0.8603283
#> n_clusters 6.8585859 8.8686869 7.0909091
We can clearly see that for all metrics, the results are much more stable, which makes sense because we created the sbm graph with very strong clusters.