MSGEP - Generative statistical models for the study of protein evolution
Duration:
Principal investigator:
Project type:
Funding body:
Project identification code:
PoliTo role:
Abstract
Our model targets the entire protein universe represented by the UniProt database, which houses 230 million protein sequences from over 19,000 protein families. Unlike MSA transformers, our approach operates directly on the protein sequence space, using self-attention mechanisms inspired by large language models. Parameters are shared across families while family-specific constraints are accounted for. This enables the model to represent biophysically valid amino acid interaction modes and to encode family-specific information about protein structure and active sites. Because it builds on energy-based models, the framework is interpretable: it generates sequence energy landscapes that provide insight into mutational effects and protein evolution. Model predictions will be validated experimentally in collaboration with our network of experimental partners.
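To illustrate how an energy-based model can score mutational effects, the following is a minimal sketch and not the project's model. It assumes a plain pairwise (Potts-like) energy function with randomly drawn toy parameters in place of the shared self-attention parameters described above. All names (`make_toy_parameters`, `energy`, `mutation_effect`, `mutational_landscape`) and the example sequence are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def make_toy_parameters(length, seed=0):
    """Draw random toy fields h and couplings J for pairs i < j.

    Assumption: these values are placeholders, not learned parameters.
    """
    rng = random.Random(seed)
    q = len(AMINO_ACIDS)
    h = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(length)]
    J = {}
    for i in range(length):
        for j in range(i + 1, length):
            J[(i, j)] = [[rng.gauss(0.0, 0.1) for _ in range(q)] for _ in range(q)]
    return h, J


def energy(seq, h, J):
    """Pairwise energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    idx = [AA_INDEX[aa] for aa in seq]
    e = -sum(h[i][a] for i, a in enumerate(idx))
    for (i, j), block in J.items():
        e -= block[idx[i]][idx[j]]
    return e


def mutation_effect(seq, pos, new_aa, h, J):
    """Energy change of a single substitution: E(mutant) - E(wild type)."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return energy(mutant, h, J) - energy(seq, h, J)


def mutational_landscape(seq, h, J):
    """Map every single substitution (pos, new_aa) to its energy change."""
    return {
        (pos, aa): mutation_effect(seq, pos, aa, h, J)
        for pos in range(len(seq))
        for aa in AMINO_ACIDS
        if aa != seq[pos]
    }


wild_type = "MKTAYIAK"  # hypothetical toy sequence
h, J = make_toy_parameters(len(wild_type))
landscape = mutational_landscape(wild_type, h, J)
```

In this convention a lower energy corresponds to a more favorable sequence, so a positive energy change marks a substitution that the toy model scores as deleterious. The dictionary `landscape` holds one entry for each of the 19 possible substitutions at every position.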
Units involved
Keywords
ERC sectors
Budget
| Total project cost: | € 4,175.00 |
|---|---|
| Total project contribution: | € 4,175.00 |
| Total PoliTo cost: | € 4,175.00 |
| PoliTo contribution: | € 4,175.00 |