MSGEP - Generative statistical models for the study of protein evolution
Duration:
Principal investigator(s):
Project type:
Funding body:
Project identification number:
PoliTo role:
Abstract
Our model targets the entire protein universe as represented in the UniProt database, which holds 230 million protein sequences from over 19,000 protein families. Unlike MSA transformers, our approach operates directly on protein sequence space, using self-attention mechanisms inspired by large language models. Parameters are shared across families while still accounting for family-specific constraints. This allows the model to represent biophysically valid modes of amino acid interaction and to encode family-specific information about protein structure and active sites. Because it builds on energy-based models, the framework is interpretable: it generates sequence energy landscapes that provide insight into mutational effects and protein evolution. Model predictions will be validated experimentally in collaboration with our network of experimental partners.
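To make the idea of an energy landscape over sequences concrete, the following is a minimal sketch of how a mutational effect can be read off an energy-based model as an energy difference between a mutant and its wild type. It is not the project's model: it assumes a toy pairwise (Potts-style) energy with randomly drawn parameters in place of the shared self-attention parameters described above, and all names (`random_toy_parameters`, `energy`, `mutational_effect`, `single_site_landscape`) are hypothetical.

```python
import random
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_toy_parameters(length, seed=0):
    """Draw hypothetical site fields h and pair couplings J.

    Assumption: a real model would learn these from sequence data;
    here they are random numbers used only for illustration.
    """
    rng = random.Random(seed)
    h = [{a: rng.gauss(0.0, 1.0) for a in AMINO_ACIDS} for _ in range(length)]
    J = {
        (i, j): {(a, b): rng.gauss(0.0, 0.1)
                 for a in AMINO_ACIDS for b in AMINO_ACIDS}
        for i, j in combinations(range(length), 2)
    }
    return h, J


def energy(seq, h, J):
    """Toy pairwise energy; lower energy means a more favourable sequence."""
    site_term = sum(h[i][a] for i, a in enumerate(seq))
    pair_term = sum(J[(i, j)][(seq[i], seq[j])]
                    for i, j in combinations(range(len(seq)), 2))
    return -(site_term + pair_term)


def mutational_effect(seq, position, new_residue, h, J):
    """Energy change E(mutant) - E(wild type) for a single substitution."""
    mutant = seq[:position] + new_residue + seq[position + 1:]
    return energy(mutant, h, J) - energy(seq, h, J)


def single_site_landscape(seq, h, J):
    """Energy change of every single substitution, keyed by (position, residue)."""
    return {
        (i, a): mutational_effect(seq, i, a, h, J)
        for i in range(len(seq))
        for a in AMINO_ACIDS
    }


if __name__ == "__main__":
    wild_type = "MKTAYIAK"  # arbitrary example sequence, not taken from UniProt
    h, J = random_toy_parameters(len(wild_type))
    landscape = single_site_landscape(wild_type, h, J)
    # The substitution with the largest energy decrease under the toy parameters.
    print(min(landscape, key=landscape.get))
```

In this sketch a negative energy change marks a substitution the toy model scores as favourable and a positive one marks a substitution it scores as deleterious. With learned rather than random parameters, the same table of energy differences is the kind of landscape that could be compared against experimental measurements.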
Structures
Keywords
ERC sectors
Budget
| Item | Amount |
|---|---|
| Total cost | € 4,175.00 |
| Total contribution | € 4,175.00 |
| PoliTo total cost | € 4,175.00 |
| PoliTo contribution | € 4,175.00 |