February 24, 2015

Introduction

Who cares and why did we make this?

Networks

  • Many relationships easily conceptualized as a graph/network
  • A graph is defined as a collection of nodes (entities) and edges (relationships)
  • A community is defined as a group of nodes in a graph that share properties
  • Examples of such relationships include:
    • social networks (sociology)
    • the world wide web (computer science)
    • protein networks (biology)

The Problem

  • Current methodology for community detection often involves an algorithmic approach; partitions a graph into node clusters iteratively before stopping criterion
  • First define an objective function and then optimize
  • Many different objective functions possible, providing many ways to split a graph
  • The optimization of an objective function is typically an NP-hard problem
  • The number of possible partitions of the network requires \(2^n\) complexity

Current Solutions

  • Heuristics used to optimize the objective function in a reasonable amount of time
  • Heuristic-based clustering is useful because this offers an automated way to perform community detection

The main elements of the problem themselves [graph clustering], i.e. the concepts of community and partition, are not rigorously defined, and require some degree of arbitrariness and/or common sense. (Fortunato, 2010)

  • Heuristics are not the solution

Leverage the Human Visual System

  • Communities are often fuzzily-defined human concepts
  • Address this by adding element of human judgement
  • Introduce a novel visualization-based community detection tool, gravicom

gravicom Functionality

  • Allows users to
    • Visually direct and interact with the steps of community detection
    • Assess the created clusters through visual and quantitative summaries
  • Standalone exploratory tool for graph data
  • Initial state to be passed to a community detection algorithm in order reduce the complexity of optimization

Demo

http://andeek.shinyapps.io/gravicom

  • Shiny: Server-client interaction
  • D3: User interface and graph layout
  • igraph/rjson: Data formatting

Football Example

Football Example (Cont'd)

Conference Teams Identified Proportion Accuracy
SEC Vanderbilt, Florida, Louisiana State, South Carolina, Mississippi, Arkansas, Auburn, Kentucky, Georgia, Mississippi State, Alabama, Tennessee 1.50 100%
MAC Central Florida, Western Michigan, Miami Ohio, Ohio, Bowling Green State, Marshall, Ball State, Akron, Buffalo, Northern Illinois, Eastern Michigan, Toledo, Central Michigan, Kent 1.46 92.9%
Big 12 Kansas State, Iowa State, Kansas, Texas A& M, Texas Tech, Baylor, Missouri, Texas, Oklahoma State, Colorado, Oklahoma, Nebraska 1.44 100%
ACC Duke, Wake Forest, Virginia, Florida State, Clemson, North Carolina, Maryland, Georgia Tech, North Carolina State 1.44 100%
Pac-10 Arizona, Oregon State, Washington, Washington State, Arizona State, UC LA, Stanford, Southern California, Oregon, California 1.33 100%
Big 10 Ohio State, Penn State, Michigan, Michigan State, Purdue, Minnesota, Northwestern, Illinois, Iowa, Wisconsin, Indiana 1.22 100%
WAC Nevada, Fresno State, Texas Christian, Tulsa, Hawaii, Rice, Southern Methodist, San Jose State, Texas El Paso 1.20 88.9%
Mountain West Brigham Young, San Diego State, Boise State, Wyoming, New Mexico, Nevada Las Vegas, Utah, North Texas, Utah State, New Mexico State, Colorado State, Arkansas State, Idaho, Air Force 0.96 57.1%
C-USA Cincinnati, Louisville, Houston, Tulane, Southern Mississippi, Army, Memphis, East Carolina, Alabama Birmingham 0.91 100%
Big East Boston College, Miami Florida, Virginia Tech, Syracuse, Temple, West Virginia, Connecticut, Pittsburgh, Rutgers 0.83 88.9%
Big West Middle Tennessee State, Louisiana Lafayette, Louisiana Monroe, Louisiana Tech 0.26 75%
Independent Notre Dame, Notre Dame, Navy, Navy 0.00 100%
  • Through manual specification of conferences, we were able to correctly classify 91.3% of the football teams into their conferences

Graphical Devices

Theory behind the curtain

Importance of Graph Layout

  • Graph layout is an assignment of a Cartesian coordinate to each node for display in 2D or 3D
  • Layout of a graph significantly affects the number of communities that users detect within a graph
  • Humans used to detect communities \(\Rightarrow\) special attention needs to be paid to the layout being used
  • Location of a node spatially relative to other nodes in a cluster has a significant effect on user: "adjacent nodes must be placed near to each other if possible" (McGrath, Blythe, and Krackhardt, 1996)
  • A force-directed layout satisfies these requirements by placing repulsive forces on nodes to separate all pairs of nodes with fixed-distance geometric constraints as links

Graph Simplification

  • Difficult to glean meaning from a visual representation in large/complex graphs
  • Replace repeated patterns in a graph by a representation, to simplify a network
  • Fewer nodes and edges to display \(\Rightarrow\) visual complexity of the graph visualization is greatly reduced
  • Allows the user to analyze the network structure more accurately

Conclusions/Further Work

Possible extensions to gravicom

Ideas

  • Integrated algorithmic community detection
    • Combine the benefits of human detection of communities with algorithmic detection
    • Visual detection of communities serves as an initialization step, then pass to iterative algorithm
    • User tracks progress and has power to dynamically set stopping criterion
  • Dynamic temporal graph visualization
    • View a dynamic graph across time – how the edges change between nodes
    • Detect time-dependent communities; Add optional node labels to track progress

Questions?

Thank you!

Appendix

Extra stuff and things.

Page Lifecycle

Tools (Cont'd)

Data Formatting

GML file structure:

## graph
## [
##   directed 0
##   node
##   [
##     id 0
##     label "Node 1"
##     value 100
##   ]
##   node
##   [
##     id 1
##     label "Node 2"
##     value 200
##   ]
##   edge
##   [
##     source 1
##     target 0
##   ]
## ]

JSON file structure:

## {
##   "nodes":
##   [{"id":"n0","v_id":"0","v_label":"Node 1","v_value":"100"}, 
##    {"id":"n1","v_id":"1","v_label":"Node 2","v_value":"200"}], 
##  "edges":
##   [{"source":0, "target":1}]
## }

References

Bostock, Michael, Vadim Ogievetsky, and Jeffrey Heer. 2011. “D3: Data-Driven Documents.” IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis). http://vis.stanford.edu/files/2011-D3-InfoVis.pdf.

Couture-Beil, Alex. 2013. Rjson: JSON for R. http://CRAN.R-project.org/package=rjson.

Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.sf.net.

Fortunato, Santo. 2010. “Community Detection in Graphs.” Physics Reports 486 (3). Elsevier: 75–174.

Girvan, M., and M. E. J. Newman. 2002. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences 99 (12): 7821–26. doi:10.1073/pnas.122653799.

McGrath, Cathleen, Jim Blythe, and David Krackhardt. 1996. “Seeing Groups in Graph Layouts1.” Connections 19 (2): 22–29.

RStudio Inc. 2013. Shiny: Web Application Framework for R. http://CRAN.R-project.org/package=shiny.