Motivation

gravicom is a a web application for facilitating the detection of community structures in graphs through direct user interaction. Built on the R package Shiny and the JavaScript library D3. There is no installation necessary to use gravicom, only an internet connection and a modern brower with JavaScript capabilities. Additionally, gravicom is open source software, feel free to send me a pull request.

Graphs

Many relationships are easily conceptualized as a graph/network. A graph is defined as a collection of nodes (entities) and edges (relationships). Examples of such relationships include: social networks (sociology), the world wide web (computer science), and protein networks (biology).

Social network of a karate club. Internet map 1024 - transparent The protein interaction network of Treponema pallidum

Community Detection

A community is defined as a group of nodes in a graph that share properties and can be characterized as having an easility recognizable structure. This community structure is a collection of nodes which are densely linked to nodes within the community and sparsely linked to notes outside the community. Current methodology for community detection often involves an algorithmic approach, which is acheived by partitioning a graph into node clusters iteratively before stopping criterion is met. In this approach, first an objective function must be defined and then optimized.

Typically the search for an optimal objective function value is an NP-hard problem due to the fact that the number of possible partitions of the network requires \(2^n\) complexity. Current solutions to this issue use heuristics to optimize the objective function in a reasonable amount of time. Heuristic-based clustering is useful because this offers an automated way to perform community detection.

The main elements of the problem themselves [graph clustering], i.e. the concepts of community and partition, are not rigorously defined, and require some degree of arbitrariness and/or common sense. (Fortunato, 2010)

Clearly, heuristics are not the solution.

The Human Visual System

Communities are often fuzzily-defined human concepts and the detection of communities is an ill-posed problem. We address this by adding element of human judgement through leveraging the human visual system through used of a novel visualization-based community detection tool, gravicom.

Functionality

gravicom acts as a standalone exploratory tool for graph data or as an initial state to be passed to a community detection algorithm in order reduce the complexity of optimization.

Users are able to

  • Visually direct and interact with the steps of community detection
  • Assess the created clusters through visual and quantitative summaries
  • Save and export graphs
  • Upload their own graphs for community detection and visualization

Interface

The interface of gravicom can be thought of as six main components.

  1. Navbar
  2. Control Panel
  3. Data Management
  4. Connection Table
  5. Graph Display
  6. Tabset

Components making up the user interface of gravicom.

Each part provides a means for the user to interact with gravicom, either through controls that allow user input to gravicom or through direct interaction with diagnostics and visualization of a graph. Their placement on the gravicom interface can be seen above.

Navbar

The top navigation bar includes informational buttons. The first is a link to gravicom. The second is a link to this documentation page. The third is a link to the GitHub repository where the code for gravicom is housed. The final button is a link to my websites, which contain contact information if there are any questions or comments.

Control Panel

The control panel serves as the starting point for a user's session in gravicom. It contains the means for a user to select a data set, save and download a data set, and numerical summaries of the graph (such as a diagnostic connection table explained below).

Data Management

The data management component is made up of two main parts, data selection and data download. The data selection can be accomplished in two ways, the first being a drop down to select pre-loaded data sets to display. Currently there are two well-known network data sets provided in gravicom, a college football data set detailing games played for the 2000 season and a karate club data set detailing social interactions. From the dropdown the user can change the data set to display in the graph. The second approach to data selection gives the user the ability to upload his own data set. Upon clicking the "Upload new data set" checkbox, a file selection control appears which gives the user the ability to upload his own graph data to explore with gravicom.

Uploading new data

The work performed by a user in visualizing graphs and community structure can also be downloaded as a data set from gravicom with current communities stored. This feature can be used as a save point in processing a graph or as a means to export changes made in gravicom to another tool.

Connection Table

The connection table is a quantitative diagnostic tool for the user in assessing the strength of a community structure in a graph. The idea behind the connection table is that a community of nodes will have proportionally more edge connections within the node cluster compared to edge connections to nodes outside the community. The table displays the number of edge connections within a user's selection of nodes in the graph and the number of connections from nodes in a user's selection to nodes not in the selection. The comparison of these two numbers can give the user a rough idea of if the plausibility or extend to which the node selection constitutes a community. To aid in the comparison, there is also a column that displays the ratio of number of connections within a node selection to the number of connections outside the selection.

Graph Display

The graph display shows an interactive graphical representation of the selected (or uploaded) graph data. Upon load, the graph displays all nodes and edges in the data set using a force-directed layout algorithm. The user has several ways to interact with the graph: drag, select, and group.

Interacting with the graph

The user has three ways to interact with the graph: drag, select, and group. A user can drag a node at any time. Upon dragging, the force-directed layout is rerun, giving an altered view of the graph. The image below shows a graph in the process of being dragged.

A graph being dragged in gravicom.

Selecting and Grouping

Selection and grouping of nodes are actions intended to work together. In order to group nodes, a user first determines a node cluster or potential community based on a visual appraisal of the graph. To select nodes the user clicks and drags a selection box around nodes. See the figure below for the results of selection in the interface. The shift key can also be used for multiple selections. Upon selection, the connection table is updated and the user can evaluate the selection as a community and alter the selection if need be (the shift key selection is useful in this step).

Selecting in gravicom.

Selected nodes can be grouped together into one consolidated "super-node" or grouped-node. Once grouped, the size and edges of the super-node represent the number of nodes in the potential community and the number of edges to the grouped community nodes, respectively. The force-directed layout is again run, showing the new graph with previous nodes grouped. This is illustrated in the image below. This process of node grouping can be repeated until all nodes have been grouped or until the user is satisfied that all communities have been selected. Additionally, grouped nodes can be ungrouped by clicking on a grouped node.

A graph being grouped in gravicom

Ungrouping

Additionally, super-nodes can be ungrouped by clicking the grouped supernode. This will remove the contained nodes from a group and allow the user to start searching for communities from the previous point in time.

Your Data

One of the great features of gravicom that allows it to be widely used is the ability of the users to upload their own datasets for visualization and community detection. The process is very simple, with only the requirement that the files be in a specific format.

GML File Structure

GML stands for graph modelling language and is a hierarchical ASCII-based file format for describing graphs. The basic structure is displayed below.

graph
[
  directed 0
  node
  [
    id 0
    label "Node 1"
    value 100
  ]
  node
  [
    id 1
    label "Node 2"
    value 200
  ]
  edge
  [
    source 1
    target 0
  ]
]

In order for a user to use their own graph data with gravicom, she will need to have a file in this format. Specifically, the id values need to be integers (not characters). One tool that is capable of converting files to the GML format is igraph.

Once a user has her data in this format, the upload to gravicom is straight forward. Simply click 'Upload New Dataset', then 'Choose file', and navigate to the location of the GML file. The data will be stored temporarily on the server while in use and then removed when the user's session is complete.

Saving

The work performed by a user in visualizing graphs and community structure can be downloaded as a data set from gravicom with current communities stored. This feature can be used as a save point in processing a graph or as a means to export changes made in gravicom to another tool.

To save a graph, click save dataset and a GML file with the current status of the graph groupings will be saved to your computer's downloads folder. This file can later be reloaded into gravicom using the process detailed in the previous section of this tutorial.

Graph Layout

Graph layout is the assignment of a Cartesian coordinate to each node for display in two or three dimensions. Layout of a graph significantly affects the number of communities that users detect within a graph. Since we are using humans to detect communities within gravicom, special attention was paid to the layout being used in the application.

According to McGrath, Blythe, and Krackhardt, location of a node spatially relative to other nodes in a cluster has a significant effect on user. In fact, they state, "adjacent nodes must be placed near to each other if possible" in their 1996 paper, The effect of graph layout on inference from social network data.

Force-Directed Layout

One layout that achieve this desired spatial orienation of relative nodes the the force-directed layout. The force-directed layout is a physics based algorithm for graph drawing in which repulsive forces placed on the nodes (charged particles) are used to separate all pairs of nodes. The links are maximal fixed-distance geometric constraints with the physical equivalent of springs placed on their ends holding nodes that are connected only so far apart. In this way, groups of nodes sharing multiple edges are pulled in closer proximity. Additionally, a pseudo-gravity force keeps nodes centered in the visible area and avoids expulsion of disconnected subgraphs.

Motifs

It can be difficult to glean meaning from a visual representation in large/complex graphs. For this reason, it is usseful to replace repeated patterns in a graph by a representation, to simplify a network. These repeated patterns are called motifs. One such motif is a clique, or a 'perfect' community. In gravicom we use this idea to replace communities with a 'super node' representation. This practice results in fewer nodes and edges to display. thus reducing visual complexity of the graph visualization and allowing the user to analyze the network structure more accurately.