Research database

MSGEP - Generative statistical models for the study of protein evolution

Duration:
24/09/2024 - 31/12/2026
Principal investigator(s):
Project type:
University cooperation
Funding body:
Private entities (Università Italo Francese)
Project identification number:
C2-118
PoliTo role:
Sole Contractor

Abstract

Our model targets the entire protein universe as represented by the UniProt database, which contains 230 million protein sequences from over 19,000 protein families. Unlike MSA (multiple sequence alignment) transformers, our approach operates directly on protein sequence space, using self-attention mechanisms inspired by large language models. Parameters are shared across families while family-specific constraints are still accounted for. This enables the model to represent biophysically valid modes of amino acid interaction and to encode family-specific information about protein structure and active sites. Because the framework is built on energy-based models, it is interpretable: it generates sequence energy landscapes that provide insight into mutational effects and protein evolution. Experimental validation of model predictions will be carried out in collaboration with our network of experimental partners.
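
To illustrate how an energy landscape over sequences can be read as a predictor of mutational effects, the sketch below scores a single substitution as the energy difference between mutant and wild type. It is a minimal illustration and not the project's model: it assumes a simple pairwise (Potts-style) energy with site fields `h` and couplings `J` in place of the self-attention parametrization described above, and the parameters are random placeholders rather than values fitted to UniProt. All function names are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def random_parameters(length, seed=0):
    """Draw toy fields h[i][a] and couplings J[(i, j)][a][b] for i < j.

    Assumption: these are random placeholders, not parameters learned from data.
    """
    rng = random.Random(seed)
    q = len(ALPHABET)
    h = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(length)]
    J = {
        (i, j): [[rng.gauss(0.0, 0.1) for _ in range(q)] for _ in range(q)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return h, J


def energy(seq, h, J):
    """Pairwise energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    idx = [AA_INDEX[aa] for aa in seq]
    e = 0.0
    for i, a in enumerate(idx):
        e -= h[i][a]
        for j in range(i + 1, len(idx)):
            e -= J[(i, j)][a][idx[j]]
    return e


def mutation_effect(seq, pos, new_aa, h, J):
    """Energy change E(mutant) - E(wild type) for one substitution at pos."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return energy(mutant, h, J) - energy(seq, h, J)


if __name__ == "__main__":
    wild_type = "ACDEF"
    h, J = random_parameters(len(wild_type), seed=1)
    # Scan every substitution at position 2 of the toy sequence.
    for aa in ALPHABET:
        print(aa, round(mutation_effect(wild_type, 2, aa, h, J), 3))
```

Under the usual convention that the model probability is proportional to exp(-E), a positive energy difference marks a substitution the model considers less favourable than the wild-type residue, and a negative one marks a more favourable substitution.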

ERC sectors

LS2_11 - Computational biology
PE3_15 - Statistical physics: phase transitions, noise and fluctuations, models of complex systems, etc.
PE3_16 - Physics of biological systems

Budget

Total cost: € 4,175.00
Total contribution: € 4,175.00
PoliTo total cost: € 4,175.00
PoliTo contribution: € 4,175.00