Seminar of the Department of Microbiology
Big and small genes of the global microbiome
24.03.2022, 11:00 - Join online
The Global Microbial Gene Catalogue (GMGCv1) aggregates sequences from over 10 thousand metagenomes from 14 habitats and almost 100 thousand isolate genomes into a catalog of 303 million species-level clusters (95% nucleotide identity, termed unigenes). These were further clustered into 30 million protein families, using any statistically significant sequence similarity to identify homology, even at distant scales. Analysis of this resource showed that the majority of unigenes stem from a relatively small fraction of protein families. Overall, the data support a model whereby most unigenes are created by neutral processes within existing families rather than by adaptation to local conditions.
This work was published in Nature: https://www.nature.com/articles/s41586-021-04233-4.
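To make the idea of identity-threshold clustering concrete, here is a minimal sketch of greedy incremental (CD-HIT-style) clustering at 95% identity. It is not the GMGCv1 pipeline. The names `ungapped_identity` and `greedy_cluster` are hypothetical, and identity is simplified to an ungapped position-by-position comparison, whereas real catalogs compute identity from sequence alignments.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions, divided by the longer sequence's length.

    Simplifying assumption: no gaps are allowed; real pipelines use alignments.
    """
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy incremental clustering: each sequence joins the first centroid it
    matches at >= threshold identity, otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    # Longest sequences first, so each centroid is the longest member of its cluster.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for centroid in clusters:
            if ungapped_identity(seqs[centroid], seqs[name]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            # No existing centroid was similar enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    demo = {"g1": "A" * 20, "g2": "A" * 19 + "C", "g3": "C" * 20}
    print(greedy_cluster(demo))
```

Here `g2` differs from `g1` at one of 20 positions (95% identity) and joins its cluster, while `g3` shares no positions and becomes its own centroid. The greedy scheme is cheap because each sequence is compared only against existing centroids, not against every other sequence.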
More recently, we have extended this effort to include the hitherto neglected fraction of short genes (operationally defined as ≤100 amino acids). Because such genes are difficult to handle with standard bioinformatics approaches, they have generally been excluded, including from the GMGCv1 effort. We are therefore adapting our approaches to extend the same cataloging effort to the world of small proteins.
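The operational cutoff can be illustrated with a small sketch that splits predicted proteins into short (≤100 amino acids, inclusive) and longer sets. The function name `split_by_length` is hypothetical, and stripping a trailing `*` stop symbol before counting is an assumption about the input format, since some gene callers append one to translated sequences.

```python
from typing import Dict, Tuple

# Operational cutoff from the text: short genes encode <= 100 amino acids.
SHORT_GENE_MAX_AA = 100


def split_by_length(
    proteins: Dict[str, str], max_aa: int = SHORT_GENE_MAX_AA
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (short, long) dictionaries of protein sequences."""
    short: Dict[str, str] = {}
    long_: Dict[str, str] = {}
    for name, seq in proteins.items():
        # Assumption: a trailing '*' marks a stop codon and is not a residue.
        residues = seq.rstrip("*")
        target = short if len(residues) <= max_aa else long_
        target[name] = seq
    return short, long_


if __name__ == "__main__":
    short, long_ = split_by_length({"p1": "M" * 100, "p2": "M" * 101})
    print(sorted(short), sorted(long_))
```

Keeping the cutoff as a named parameter makes it easy to test how sensitive downstream results are to the exact boundary.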
Luis (he/they) can be reached at email@example.com