RAEL: Robustness Analysis of Foundation Models

Date

01/04/2024 - 31/03/2025

Type

Privacy Protection & Cryptography, Machine Learning

Partner

armasuisse

Partner contact

Gerome Bovet

EPFL Laboratory

Signal Processing Laboratory 4

Pre-trained foundation models are widely used in deep learning applications due to their advanced capabilities and extensive training on large datasets. However, these models may have safety risks because they are trained on potentially unsafe internet-sourced data. Additionally, fine-tuned specialized models built on these foundation models often lack proper behavior verification, making them vulnerable to adversarial attacks and privacy breaches, even after alignment with human feedback. There are currently no techniques for safely editing these models, especially to remove unwanted behaviors like backdoors. The project aim is to study and explore these attacks in for foundation models.