E-MIMIC - Empowering Multilingual Inclusive Communication

Duration:
24 months (2025)
Principal investigator(s):
Project type:
Nationally funded research - PRIN
Funding body:
MINISTERO (Ministero dell'Università e della Ricerca)
Project identification number:
2022WEFCFP
PoliTo role:
Coordinator

Abstract

Today we observe two interrelated trends: (1) a significant increase in attention to inclusive language, promoted by academia and policy makers, and (2) unprecedented successes in artificial intelligence and deep learning. The latter is considered one of the most important general-purpose methodologies; it has played an important role in automating various language tasks and could also have a significant impact on advancing and promoting inclusive communication. Unfortunately, the proliferation of automated machine translation and conversational agents has further exacerbated the problem of non-inclusive texts. Because these systems generally rely on English-language, non-gender-specific document corpora, their ability to produce inclusive texts is quite limited. Innovative intelligent systems are urgently needed. This project, called E-MIMIC (Empowering Multilingual Inclusive Communication), is a joint effort of the Deep Learning, Natural Language Understanding, and Linguistics research communities with the goal of promoting and ensuring equality and inclusion in communication, thus contributing to a more inclusive, innovative, and reflective society. E-MIMIC relies on an innovative use of deep-learning methods for natural language processing trained on new corpora of formal communication produced by Italian and French linguists in this project. The approach is groundbreaking because, for the first time, high-risk concepts such as linguistic and discursive criteria for inclusive communication, data labeling of new corpora of formal communication, and strong human involvement in the data-driven methods are integrated into the core of deep-learning methods for natural language processing to automatically identify non-inclusive text snippets, suggest alternative forms, and produce inclusive text reformulations.
Linguistic and discursive criteria for modeling diversity in a community (e.g., gender, special needs, age, ethnicity, and religion) and their intersectionality will be defined to fully represent them in formal communication. These criteria are being used by a large group of linguists to characterize new corpora of formal communication provided to us by various institutions (our universities and public administrations), and to propose alternative reformulations that reflect the nuances of our society. Novel strategies for training deep-learning models for natural language processing will be coupled with human-in-the-loop analytics strategies to ensure fair, privacy-friendly, and responsible data processing and to provide fair and unbiased models capable of correctly recognizing natural languages. Finally, an intelligent user interface will be developed so that users can interact effectively with E-MIMIC: it will highlight portions of text that should be rewritten, show a ranking of alternative forms, and produce a comprehensive rewording of the text. Using E-MIMIC, we aim to make inclusive communication accessible to a wide range of users.

Structures

Partners

  • POLITECNICO DI TORINO - AMMINISTRAZIONE CENTRALE - Coordinator
  • ALMA MATER STUDIORUM UNIVERSITA' DI BOLOGNA
  • UNIVERSITA' DI ROMA - TOR VERGATA

Keywords

ERC sectors

PE6_7 - Artificial intelligence, intelligent systems, multi agent systems
SH4_9 - Theoretical linguistics; computational linguistics
SH4_11 - Pragmatics, sociolinguistics, discourse analysis

Sustainable Development Goals

Goal 5. Achieve gender equality and empower all women and girls

Budget

Total cost: € 309,877.00
Total contribution: € 250,000.00
PoliTo total cost: € 135,478.00
PoliTo contribution: € 116,968.00