Statistical Cluster Characterization
Clusters are characterized by computing a p-value for each cluster-attribute pair under the null hypothesis that the nodes carrying the attribute are randomly distributed among the clusters. Because this null hypothesis is tested many times (once for every cluster-attribute pair) rather than just once, the p-values are corrected for multiple comparisons using either the Bonferroni or the false discovery rate (FDR) correction.
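The test for a single cluster-attribute pair and the two corrections can be sketched in Python as follows. This is a minimal sketch, not the program's own code. It assumes that the random-distribution null hypothesis is modeled by the hypergeometric distribution: the N_C nodes of a cluster are treated as a random draw from all N nodes, of which N_Q carry attribute Q, and the p-value is the probability of finding at least N_CQ carriers of Q in the cluster. It uses scipy.stats.hypergeom and takes the Benjamini-Hochberg procedure as the FDR correction. The function names and the default significance level alpha = 0.01 are hypothetical.

```python
from scipy.stats import hypergeom


def overexpression_pvalue(n_total: int, n_cluster: int, n_attr: int, n_both: int) -> float:
    """Probability of seeing at least n_both attribute carriers in a cluster of
    n_cluster nodes drawn at random from n_total nodes, n_attr of which carry
    the attribute (upper tail of the hypergeometric distribution)."""
    # hypergeom.sf(k, M, n, N) returns P(X > k), so k = n_both - 1 gives P(X >= n_both).
    return float(hypergeom.sf(n_both - 1, n_total, n_attr, n_cluster))


def bonferroni(pvalues: list[float], alpha: float = 0.01) -> list[bool]:
    """Flag p-values that stay significant after dividing alpha by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues: list[float], alpha: float = 0.01) -> list[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Every test ranked at or below k_max is declared significant.
    flags = [False] * m
    for rank, i in enumerate(order, start=1):
        flags[i] = rank <= k_max
    return flags


if __name__ == "__main__":
    # Hypothetical example: 100 nodes, a cluster of 10 nodes, and an attribute
    # carried by 10 nodes, all of which fall in the cluster.
    print(overexpression_pvalue(100, 10, 10, 10))
```

In use, one p-value is computed per cluster-attribute pair and the whole list is passed to one of the corrections. At the same alpha, every pair accepted by the Bonferroni correction is also accepted by the FDR correction, so Bonferroni is the more conservative choice.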
The details of the methodology are presented in the following paper:
M. Tumminello, S. Miccichè, F. Lillo, J. Varho, J. Piilo, and R. N. Mantegna, "Community characterization of heterogeneous complex systems," J. Stat. Mech. (2011) P01019.
This program is meant to be used after a community detection algorithm has been applied to a network and a community structure has been obtained. The program takes as input (i) a file in Pajek format that assigns each node to its cluster (for an example, see the file pra-qi-bonf.clu below) and (ii) a file of the attributes associated with each node (for an example, see the file pra-qi-pacs.tag below). The attribute file should be formatted like a Pajek vector file, except that each line may contain any number of attributes, each enclosed in double quotes.
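Reading the two input files can be sketched in Python as below. The format details are assumptions rather than a specification of the program: the first line of each file is taken to be a Pajek-style "*Vertices N" header, the i-th following line is taken to describe node i, a .clu line holds an integer cluster label, and an attribute-file line holds zero or more attributes in double quotes. The helper names parse_clu and parse_tag are hypothetical.

```python
import re

# Matches one double-quoted attribute and captures the text between the quotes.
_QUOTED = re.compile(r'"([^"]*)"')


def _split_header(text: str) -> tuple[int, list[str]]:
    """Read an assumed '*Vertices N' first line and return N and the next N lines."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    parts = lines[0].split()
    if len(parts) < 2 or parts[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices N' header on the first line")
    n = int(parts[1])
    return n, lines[1:1 + n]


def parse_clu(text: str) -> dict[int, int]:
    """Map node number (1-based, in file order) to its cluster label."""
    n, body = _split_header(text)
    if len(body) < n:
        raise ValueError(f"expected {n} cluster labels, found {len(body)}")
    return {node: int(line.split()[0]) for node, line in enumerate(body, start=1)}


def parse_tag(text: str) -> dict[int, list[str]]:
    """Map node number (1-based, in file order) to its list of double-quoted attributes."""
    n, body = _split_header(text)
    # A node with no attributes may appear as an empty line; missing trailing
    # lines are treated the same way.
    body = body + [""] * (n - len(body))
    return {node: _QUOTED.findall(line) for node, line in enumerate(body, start=1)}
```

Matching quoted strings with a regular expression, rather than splitting each line on whitespace, keeps attributes that contain spaces intact.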
Submit a network in Pajek .net format, a cluster file in Pajek
.clu format and an attribute file to find over-expressed
attributes in each cluster.
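Under a hypergeometric null model (an assumption about how the random-distribution hypothesis is tested), each cluster-attribute pair is summarized by four counts: the number N of nodes, the size N_C of the cluster, the number N_Q of nodes carrying attribute Q, and the number N_CQ of nodes in the cluster that carry Q. The Python sketch below tabulates them from a node-to-cluster mapping and a node-to-attributes mapping. It is an illustration, not the program's code: the function name pair_counts is hypothetical, only nodes with a cluster label are counted, and an attribute listed twice for the same node is counted once.

```python
from collections import Counter


def pair_counts(
    cluster_of: dict[int, int], attrs_of: dict[int, list[str]]
) -> dict[tuple[int, str], tuple[int, int, int, int]]:
    """Return {(cluster, attribute): (N, N_C, N_Q, N_CQ)} for every pair that
    co-occurs on at least one node."""
    n_total = len(cluster_of)  # only nodes with a cluster label are counted
    cluster_size = Counter(cluster_of.values())
    attr_count: Counter = Counter()
    both: Counter = Counter()
    for node, cluster in cluster_of.items():
        for attr in set(attrs_of.get(node, [])):  # duplicates on one node count once
            attr_count[attr] += 1
            both[(cluster, attr)] += 1
    return {
        (c, a): (n_total, cluster_size[c], attr_count[a], n_ca)
        for (c, a), n_ca in both.items()
    }
```

Pairs absent from the result have N_CQ = 0; the probability of observing at least zero carriers is 1, so such pairs cannot be over-expressed.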
Submit a Network