An Algorithm for Clustering Biological Networks

Modality: Technological

Prof. Roded Sharan, School of Computer Science, Faculty of Exact Sciences TAU

The Need

Clustering is a critical step in many data analysis pipelines. In the era of big data, with the emergence of large-scale networks, high-quality network clustering is needed. In the biomedical domain, applications include the identification of protein modules, the discovery of cell populations and the grouping of clinical cohorts. Recent efforts in this area culminated in popular algorithms such as Louvain and Leiden, but these rely on greedy steps that tend to get stuck in sub-optimal solutions. Thus, there is a need for a high-quality, scalable and robust clustering framework.

Our Solution

TAU (Tel Aviv University) is a state-of-the-art clustering algorithm and Python package that efficiently explores the space of candidate partitions using a genetic algorithm. We benchmarked TAU on synthetic and real data sets and showed that it outperforms previous methods, both in the modularity of the computed solution and in its similarity to a ground-truth partition when one exists.
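The two evaluation criteria above can be made concrete. The source does not give formulas or name the similarity measure it uses, so the following is only an illustrative, dependency-free sketch. It computes the standard Newman modularity of a partition, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside c and d_c the total degree of the nodes in c. As a simple stand-in for ground-truth similarity it uses the plain Rand index. The function names `modularity` and `rand_index` are hypothetical and are not part of the TAU package.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative sketch only; these helpers are hypothetical, not the TAU package API.


def modularity(edges, labels):
    """Newman modularity of an undirected, unweighted graph (needs >= 1 edge).

    labels[node] is the community of `node`; Q = sum_c [L_c/m - (d_c/2m)^2].
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def rand_index(a, b):
    """Fraction of node pairs on which two partitions agree (same vs. different community)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = [0, 0, 0, 1, 1, 1]
print(modularity(edges, split))                # about 0.357 (= 5/14)
print(modularity(edges, [0] * 6))              # 0.0: one big community
print(rand_index(split, [1, 1, 1, 0, 0, 0]))   # 1.0: label names are arbitrary
```

Splitting at the bridge scores higher than lumping every node together, which is exactly what a modularity-maximizing method such as the one described above is searching for.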

Technology Highlights

Genetic Optimization: Exploration of the solution space via a genetic algorithm.
Parallel Execution: Multi-core processing using Python's multiprocessing module.
Modular Core: Object-oriented design for extensibility.
Flexible Input Formats: Adjacency list, edge list CSV, pandas DataFrame, igraph and NetworkX graph objects.
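The highlights above name two main ingredients: a genetic algorithm over partitions, and multi-core evaluation with Python's multiprocessing module. The source does not describe TAU's selection, crossover or mutation operators, so what follows is only a generic toy genetic algorithm for modularity maximization, not TAU's actual method or API. All function names and parameters (`genetic_clustering`, `pop_size`, `elite`, `workers`, and so on) are hypothetical. The crossover is the common "one-way" scheme that copies one community of one parent into the other, and mutation moves nodes into a neighbor's community.

```python
import random
from collections import defaultdict
from contextlib import nullcontext
from multiprocessing import Pool

# Hypothetical toy GA for modularity maximization; NOT TAU's actual operators or API.


def modularity(edges, labels):
    """Newman modularity of an undirected, unweighted partition (labels[node] = community)."""
    m = len(edges)
    intra, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def neighbors_of(n, edges):
    """Adjacency lists for nodes 0..n-1."""
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def mutate(labels, nbrs, rng, rate=0.1):
    """With probability `rate`, move each node into a random neighbor's community."""
    child = list(labels)
    for node in range(len(child)):
        if nbrs[node] and rng.random() < rate:
            child[node] = child[rng.choice(nbrs[node])]
    return child


def crossover(source, dest, rng):
    """One-way crossover: copy the community of a random node from `source` into `dest`."""
    v = rng.randrange(len(source))
    child = list(dest)
    for w, c in enumerate(source):
        if c == source[v]:
            child[w] = dest[v]
    return child


def score_all(population, edges, pool):
    """Fitness (modularity) of every partition, serially or in worker processes."""
    args = [(edges, labels) for labels in population]
    if pool is None:
        return [modularity(e, labels) for e, labels in args]
    return pool.starmap(modularity, args)


def genetic_clustering(n, edges, pop_size=30, generations=50, elite=5, seed=0, workers=1):
    """Return (best partition, its modularity). Requires 2 <= elite < pop_size."""
    rng = random.Random(seed)
    nbrs = neighbors_of(n, edges)
    # Initial population: singleton partitions with random neighbor merges.
    population = [mutate(list(range(n)), nbrs, rng, rate=0.5) for _ in range(pop_size)]
    # One pool is reused for all generations when workers > 1.
    with (Pool(workers) if workers > 1 else nullcontext()) as pool:
        for _ in range(generations):
            scores = score_all(population, edges, pool)
            order = sorted(range(len(population)), key=scores.__getitem__, reverse=True)
            parents = [population[i] for i in order[:elite]]  # elitism: keep the best
            children = []
            while len(children) < pop_size - elite:
                a, b = rng.sample(parents, 2)
                children.append(mutate(crossover(a, b, rng), nbrs, rng))
            population = parents + children
        scores = score_all(population, edges, pool)
    best = max(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best]
```

Because the elite partitions are carried over unchanged, the best modularity in the population never decreases from one generation to the next. On two triangles joined by a single bridge edge, this sketch will usually recover the two triangles (Q = 5/14, about 0.357), but a toy GA like this one offers no such guarantee. Passing `workers=2` or more scores each generation in a `multiprocessing.Pool`; on platforms that start workers with `spawn`, call it from under an `if __name__ == "__main__":` guard.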

Applications