Pre-trained foundation models are widely used in deep learning applications due to their advanced capabilities and extensive training on large datasets. However, these models may have safety risks because they are trained on potentially unsafe internet-sourced data. Additionally, fine-tuned specialized models built on these foundation models often lack proper behavior verification, making them vulnerable to adversarial attacks and privacy breaches, even after alignment with human feedback. There are currently no techniques for safely editing these models, especially to remove unwanted behaviors like backdoors. The project aim is to study and explore these attacks in for foundation models.
RAEL: Robustness Analysis of Foundation Models
Date | 01/04/2024 - 31/03/2025 |
Type | Privacy Protection & Cryptography, Machine Learning |
Partner | armasuisse |
Partner contact | Gerome Bovet |
EPFL Laboratory | Signal Processing Laboratory 4 |