Seminar of the Department of Microbiology

Big and small genes of the global microbiome

Prof. Luis Pedro Coelho - Associate Professor, Fudan University

24.03.2022, 11:00 - Join online



The Global Microbial Gene Catalogue (GMGCv1) aggregates sequences from over 10 thousand metagenomes from 14 habitats and almost 100 thousand isolate genomes into a catalog of 303 million species-level clusters (95% nucleotide identity, termed unigenes). These were further clustered into 30 million protein families (where any statistically significant sequence similarity was used to identify homology, even at distant scales). Analysis of this resource showed that the majority of unigenes stem from a relatively small fraction of protein families. Overall, the data support a model whereby the majority of unigenes are created by neutral processes within existing families rather than by adaptation to local conditions.

This work was published at

More recently, we have extended this effort to include the hitherto neglected fraction of short genes (operationally defined as ≤100 amino acids). Because short genes are difficult to handle using standard bioinformatics approaches, they have generally been left out of gene catalogs (including the GMGCv1 effort). We are therefore adapting our approaches to extend the same cataloging efforts into the world of small proteins.

