An Algorithm for Clustering Biological Networks
Modality: Technological
The Need
Clustering is a critical step in any data analysis pipeline. With the emergence of large-scale networks in the era of big data, high-quality network clustering is needed. In the biomedical domain, applications include the identification of protein modules, the discovery of cell populations, and the grouping of clinical cohorts. Recent efforts in this area culminated in popular algorithms such as Louvain and Leiden, but these rely on greedy steps that tend to get stuck in suboptimal solutions. Thus, there is a need for a high-quality, scalable, and robust clustering framework.
Our Solution
TAU (Tel Aviv University) is a state-of-the-art clustering algorithm and Python package that efficiently explores the solution space using a genetic algorithm. We benchmarked TAU on synthetic and real data sets, where it outperformed previous methods both in the modularity of the computed solution and in its similarity to a ground-truth partition, when one exists.
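Modularity, the quality measure mentioned above, can be illustrated with a short sketch. It assumes the standard Newman definition for an undirected, unweighted graph, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The function name `modularity`, the toy graph, and the variable names are illustrative and are not taken from the TAU package.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity of an undirected, unweighted graph without self-loops.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to a community label.
    """
    m = 0
    internal = defaultdict(int)      # edges with both endpoints in a community
    total_degree = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        m += 1
        total_degree[membership[u]] += 1
        total_degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    if m == 0:
        return 0.0
    return sum(
        internal[c] / m - (d / (2 * m)) ** 2 for c, d in total_degree.items()
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than placing all nodes in one community, which is the behavior a modularity-maximizing method exploits.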
Technology Highlights
- Genetic Optimization: Search of the solution space via a genetic algorithm.
- Parallel Execution: Multi-core processing using Python's multiprocessing module.
- Modular Core: Object-oriented design for extensibility.
- Flexible Input Formats: Adjacency list, edge-list CSV, pandas DataFrame, and igraph or NetworkX objects.
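To make the idea of genetic optimization over partitions concrete, here is a minimal sketch in plain Python. It is not TAU's implementation: the operators (`crossover`, `mutate`), the parameters (`pop_size`, `generations`, `rate`), and the toy graph are all assumptions chosen for illustration. Each individual is a list of community labels, fitness is modularity, the best third of the population survives each generation, and children are produced by transplanting one community from a second parent.

```python
import random


def modularity(edges, labels):
    """Newman modularity; labels[i] is the community of node i."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


def crossover(a, b, rng):
    """Copy parent a, then transplant one whole community taken from parent b."""
    child = list(a)
    anchor = rng.randrange(len(b))
    fresh = max(child) + 1  # new label, so it cannot collide with a's labels
    for node, label in enumerate(b):
        if label == b[anchor]:
            child[node] = fresh
    return child


def mutate(child, edges, rng, rate=0.3):
    """With probability `rate`, move one endpoint of a random edge into
    the community of the other endpoint."""
    if rng.random() < rate:
        u, v = rng.choice(edges)
        child[u] = child[v]


def evolve(edges, n_nodes, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randrange(n_nodes) for _ in range(n_nodes)] for _ in range(pop_size)
    ]
    population[0] = [0] * n_nodes  # trivial one-community partition (Q = 0)
    for _ in range(generations):
        population.sort(key=lambda p: modularity(edges, p), reverse=True)
        elite = population[: pop_size // 3]  # elitism: the best always survive
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, rng)
            mutate(child, edges, rng)
            children.append(child)
        population = elite + children
    return max(population, key=lambda p: modularity(edges, p))


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best = evolve(toy_edges, n_nodes=6)
best_q = modularity(toy_edges, best)
print(best, best_q)
```

Because the elite is carried over unchanged, the best score never decreases from one generation to the next. The fitness evaluations are independent of one another, which is the kind of work Python's multiprocessing module can spread across cores. This sketch keeps them serial for simplicity.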
Applications
- Biological network analysis
- Exploratory data analysis
Development Status
- Done: Fully refactored OOP implementation
- Done: Command-line runnable and scriptable
- Ongoing: Parallelization optimization
- Ongoing: Test suite under construction
Contributors
Prof. Roded Sharan, Gal Gilad, and Hillel Charbit
Lead Developer: Hillel Charbit
Email: hillelch@tauex.tau.ac.il
GitHub: github.com/hillelcharbit/community_TAU