RAEL: Robustness Analysis of Foundation Models

Date 01/04/2024 - 31/03/2025
Type Privacy Protection & Cryptography, Machine Learning
Partner armasuisse
Partner contact Gerome Bovet
EPFL Laboratory Signal Processing Laboratory 4

Pre-trained foundation models are widely used in deep learning applications due to their advanced capabilities and extensive training on large datasets. However, these models may have safety risks because they are trained on potentially unsafe internet-sourced data. Additionally, fine-tuned specialized models built on these foundation models often lack proper behavior verification, making them vulnerable to adversarial attacks and privacy breaches, even after alignment with human feedback. There are currently no techniques for safely editing these models, especially to remove unwanted behaviors like backdoors. The project aim is to study and explore these attacks in for foundation models.