CAMOUFLAGE - Controllable AnonyMizatiOn throUgh diFfusion-based image coLlection GEneration
Duration:
Principal investigator(s):
Project type:
Funding body:
Project identification number:
PoliTo role:
Abstract
Social media currently generate a tremendous amount of visual material that can be exploited by researchers working in social media research, digital humanities, and marketing. However, privacy regulations impose significant restrictions on both data collection and sharing. The CAMOUFLAGE project aims to exploit recent advances in controlled image synthesis to generate a synthetic version of an image corpus with characteristics similar to those of a target collection, while removing all personally identifiable information to ensure the anonymity of the users who published the images. Achieving this ambitious goal will require tackling three distinct, yet related, research objectives: to design and implement controllable image synthesis that retains the visual and semantic content of a target image; to determine whether the resulting synthetic images can be considered successfully anonymized; and to determine whether the synthetic collection is semantically equivalent to the original collection. The CAMOUFLAGE synthesizer will be based on diffusion models that extract some non-sensitive data from the original image and use it to force the model to preserve the composition of the image, under a predetermined measure of “equivalence”, while removing personal identifiers. The notion of “equivalence” depends on the objectives and needs of the users: ideally, conclusions drawn on the synthetic dataset should be valid on the original collection as well. As a motivating example and case study, CAMOUFLAGE will focus on the semiotic analysis of visual big data, specifically a collection of profile pictures, tagged with socio-demographic data, acquired from Facebook and Instagram. Different analysis scenarios will be considered, from the large-scale automatic extraction of quantitative information with pre-trained neural networks to visual analysis by expert semioticians.
If successful, CAMOUFLAGE will not only deliver a useful tool and anonymized assets to the community, but may also bring novel insights into the existing limitations and biases of generative models.
Structures
Keywords
ERC sectors
Sustainable Development Goals
Budget
| Total cost: | € 49,750.00 |
|---|---|
| Total contribution: | € 49,750.00 |
| PoliTo total cost: | € 49,750.00 |
| PoliTo contribution: | € 49,750.00 |