The Use of Sparsest Cuts to Reveal the Hierarchical Community Structure of Social Networks

by
Charles F. Mann,
David W. Matula,
and
Eli V. Olinick

This paper appears Social Networks Vol. 30, No. 3, July 2008, pp. 223-234.



Abstract
We show that good community structures can be obtained by partitioning a social network in a succession of divisive sparsest cuts. A network flow algorithm based on fundamental principles of graph theory is introduced to identify the sparsest cuts and an underlying hierarchical community structure of the network via maximum concurrent flow. Matula (1985) established the Maximum Concurrent Flow Problem (MCFP), and papers on divisive vs. agglomerative average linkage hierarchical clustering (e.g., Matula 1983, Matula 1986, and Thompson 1985) provide the basis for partitioning a social network by way of sparsest cuts and/or maximum concurrent flow. The MCFP extends the Ford-Fulkerson (1956 and 1962) source-sink max-flow problem to all-pairs maximum concurrent flow. The density of an (S, T)-cut is |(S, T)| / ( | S | • | T | ) where |(S, T)| is the number of edges (equivalently, links or bonds) between communities S and T with | S | • | T | being the maximum number of edges possible. The minimum density cut in the network is the sparsest cut. When the edges are weighted, the density is the average weight of the (S, T)-cut edges, with absent edges treated as edges of zero weight. Sparsest cuts (i.e., minimum density) of the remaining components are iteratively determined via linear programming until a multipartite cut is identified that is more constraining to concurrent flow than any sparsest cut. Empirical results on real-world networks with known hierarchical structures and random networks with embedded communities are used to compare this sparsest cut community structure criterion to the weighted Girvan-Newman community structure criterion (e.g., Newman 2004) that is based on edge betweenness centrality. A new measure of accuracy M is defined to evaluate the results of the graph partitionings found by these criteria.


Code
Our code for the algorithm described in the paper is implemented in A Modeling Language for Mathematical Programming (AMPL) and uses the CPLEX linear programming (LP) solver to solve the MCFP.

The AMPL run file script.txt gives the sequence of AMPL commands to run our algorithm on the Florentine families network shown in Figure 1 of the paper. To use the run file on a system with AMPL and CPLEX (or another LP solver compatible with AMPL) download the following files:

  1. Model file for the LP formulation of the Maximum Concurrent Flow problem: MCFModelFile.txt
  2. Commands file for the MCF cut community structure algorithm: MCFCutAlgorithm.txt
  3. Data file for the Florentine families network: FlorentineFamilies.txt
If you don't have AMPL and/or a compatible LP solver installed on your system, you can use our code by downloading the three files listed above and then following these instructions for running the code on the NEOS server.

Questions about running the code should be addressed to Eli Olinick.



Data Files
Follow our data file format to create new data files. Data files for example networks in the paper are given below:


Program Output
The program output is illustrated using the output for the Florentine families network.