Pitfalls in Fine-Tuning LLMs

On the 19th of June 2024 the C4DT Factory organized a hands-on workshop to show what can go wrong when Large Language Models (LLMs) are fine-tuned. It was a pleasure working with our partners from armasuisse, FOITT (BIT), ELCA, ICRC, Kudelski Security, SICPA, Swiss Post, and Swissquote.

LLMs take the world by storm, but for many applications they miss the expert knowledge. This can be solved using fine-tuning: training LLMs with a specialized dataset. This allows LLMs to answer questions about subjects they were not trained before. However, doing this fine-tuning comes with some problems!

The first part of the hands-on workshop provided tools to measure the quality of LLMs. Are they aligned to only answer benign questions? Are the answers of good quality? Do they know private or confidential information?

Then we continued applying different fine-tuning algorithms to LLMs and measured these qualities again. It turned out that most algorithms reduce the alignment, and sometimes even the quality of the answers! In the worst case, the fine-tuned models will spill confidential information found in the training data!

In the end we looked at how to mitigate these problems by using the adapted fine-tuning method.

You can find the presentations here:

For the code of the hands-on exercisers, you can go to our github repository: https://github.com/c4dt/pitfalls_in_fine_tuning_llms