Statistical Cluster Characterization

Introduction

Clusters are characterized by computing a p-value for each cluster-attribute pair under the null hypothesis that their linked nodes are randomly distributed. Because this null hypothesis of randomness is tested for all the links of the adjacency network rather than just once, the p-values are corrected for multiple comparisons using the FDR and Bonferroni corrections.
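
As a rough illustration of this procedure, the Python sketch below computes an over-expression p-value for a single cluster-attribute pair and then applies the two corrections to a list of p-values. It assumes that the random null is modelled with a hypergeometric distribution and that FDR refers to the Benjamini-Hochberg procedure; neither detail is stated on this page. The function names and the threshold alpha=0.01 are illustrative only, and this is not the program's source code.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a cluster of n nodes drawn at random, without
    replacement, from N nodes of which K carry the attribute.
    (Assumed null model: hypergeometric.)"""
    total = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 for impossible terms, so no lower bound is needed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(pvalues, alpha=0.01):
    """Flag the tests whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]


def fdr_bh(pvalues, alpha=0.01):
    """Benjamini-Hochberg step-up procedure (assumed FDR variant).
    Returns one boolean per test, in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank whose sorted p-value lies under the BH line rank * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

Here a pair would be reported as over-expressed when its p-value survives the chosen correction. Bonferroni is the more conservative of the two, so it will generally flag fewer pairs than FDR at the same alpha.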

The details of the methodology are presented in the following paper:

M. Tumminello, S. Miccichè, F. Lillo, J. Varho, J. Piilo, R. N. Mantegna, Community characterization of heterogeneous complex systems, J. Stat. Mech., P01019, (2011)


README

This program is intended for use after a community detection algorithm has been applied to a network and a community structure has been obtained. It takes as input (i) a file in Pajek format that assigns each node to its cluster (for an example, see the file pra-qi-bonf.clu below) and (ii) a file of attributes associated with each node (for an example, see the file pra-qi-pacs.tag below). The attribute file should be formatted like a Pajek vector file, except that each line may contain any number of double-quoted attributes.
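
To make the two input formats concrete, here is a minimal Python sketch of how such files could be read and tallied. It assumes the usual Pajek layout of a "*Vertices N" header followed by one line per node, and that a node without attributes is written as an empty line; the function names and the tiny inline samples are hypothetical and are not taken from the sample files or from the program itself.

```python
import re
from collections import Counter


def _node_lines(text):
    """Return the N per-node lines that follow a '*Vertices N' header."""
    lines = text.splitlines()
    header = lines[0].split() if lines else []
    if len(header) != 2 or header[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices N' header")
    n = int(header[1])
    body = lines[1:1 + n]
    if len(body) != n:
        raise ValueError("fewer node lines than announced in the header")
    return body


def read_clu(text):
    """Cluster label of each node, in node order."""
    return [int(line) for line in _node_lines(text)]


def read_tags(text):
    """List of double-quoted attributes for each node, in node order."""
    return [re.findall(r'"([^"]*)"', line) for line in _node_lines(text)]


def count_pairs(clusters, tags):
    """Number of nodes carrying each attribute within each cluster."""
    counts = Counter()
    for cluster, node_tags in zip(clusters, tags):
        for tag in set(node_tags):  # count each node at most once per attribute
            counts[(cluster, tag)] += 1
    return counts


# Hypothetical three-node example; the third node has no attributes.
clu_text = "*Vertices 3\n1\n1\n2\n"
tag_text = '*Vertices 3\n"x" "y"\n"x"\n\n'
pair_counts = count_pairs(read_clu(clu_text), read_tags(tag_text))
```

The resulting per-pair counts, together with the cluster sizes and the total number of nodes carrying each attribute, are the quantities an over-expression test would need.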

Sample files: pra-qi-bonf.net pra-qi-bonf.clu pra-qi-pacs.tag


Program

Submit a network in Pajek .net format, a cluster file in Pajek .clu format, and an attribute file to find the over-expressed attributes in each cluster.

Submit a Network

The source code was written by Jan Varho. These web pages are maintained by Marco Cipolla.