
While public LLM APIs are convenient, every query passes through, and may be stored on, the provider’s servers. Running open LLMs locally offers privacy and offline access, though the setup can be challenging depending on hardware and model requirements. “Anyway” addresses this by distributing queries across multiple GPUs with dynamic scaling.
Professor Guerraoui’s lab is developing “Anyway”, a tool that runs large language models (LLMs) on server hardware. The tool builds on the lab’s research into distributed systems to perform inference on a dynamic group of GPUs. This allows for on-demand scaling and straightforward installation of various open LLMs on local server hardware.

Join us on Monday, 24 November, from 10:00am to 11:45am and from 1:00pm to 4:30pm in room BC329 at EPFL for a hands-on workshop to find out more! You can sign up for the morning and/or the afternoon here: Signup Sheet
Agenda
Morning – Presentation by Prof. Guerraoui
- 10:00am – Welcome coffee in BC410 at EPFL
- 10:30am – Challenges of distributed systems and how to run LLM inference
- 11:45am – Lunch
Afternoon – Hands-on workshop in collaboration with Prof. Guerraoui’s lab
- 1:00pm – Introduction and configuration of laptops
- 1:30pm – Exercises on laptops and EPFL’s RPC cluster
- 4:00pm – Wrap-up
Morning – Presentation
1h15 presentation: Prof. Guerraoui presents his lab’s work on fault tolerance in distributed systems. He starts with the challenges of offering a reliable service in the presence of faulty servers or changing resources, and then discusses in which settings it makes sense to elect a leader and in which it does not.
Prof. Guerraoui will then give more details on how this applies to LLMs: what possibilities exist during training, and especially during inference, when more than one GPU is available. Different trade-offs between speed and security have to be considered when working in a dynamic system where resources are ever-changing.
The target audience is project leaders and decision makers involved in projects related to running LLMs locally. The presentation also serves as an introduction to the afternoon hands-on workshop for software engineers.
Afternoon – Hands-on Workshop
3h30 hands-on workshop: The C4DT Factory team, together with postdocs Geovani Rizk and Gauthier Voron from Prof. Guerraoui’s lab, presents the challenges of running distributed LLM inference systems of various sizes.
The first installation will run locally on the participants’ laptops. This will show the challenges encountered when nodes come and go while requests are being handled. We will dive into the technical details of LLM inference, the data structures that need to be exchanged between the nodes (a hypothetical sketch follows below), and how to allow for parallel requests from different users.
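To give a flavour of what such exchanged data might look like, consider pipeline-parallel inference, where each node computes a slice of the model’s layers and forwards the intermediate activations to the next one. The sketch below is purely illustrative; the structure and field names are assumptions, not Anyway’s actual wire format:

```python
from dataclasses import dataclass

# Purely illustrative: in pipeline-parallel inference, each node holds a
# slice of the model's layers and forwards intermediate activations to
# the next node. These field names are hypothetical.
@dataclass
class ActivationChunk:
    request_id: str       # which user request this belongs to, so nodes
                          # can interleave parallel requests
    token_position: int   # position in the sequence being generated
    layers_done: int      # last layer computed; the receiver continues here
    hidden_states: bytes  # serialized activations for the next node
```

Carrying a request identifier in every message is what lets a node interleave several users’ requests, one of the challenges mentioned above.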
In the second part, we’ll discuss the technical parameters that influence the choice of GPUs and LLMs when installing on servers. You’ll learn which parameters matter, not only the size of the LLM, but also the number of parallel requests and the maximum context (see the back-of-the-envelope sketch below). Then we’ll use the EPFL GPUs to run some LLMs, which will show you how to use Kubernetes to distribute load across servers, how to observe the nodes, and how to dynamically add or remove nodes.
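As a rough illustration of how these parameters interact, here is a minimal back-of-the-envelope estimate of the GPU memory needed to serve a model. The formula (weights plus KV cache) is a common simplification, and the architecture numbers in the example are assumptions, not values from the workshop setup:

```python
def estimate_vram_gb(params_b, layers, kv_heads, head_dim,
                     context_len, parallel_requests, dtype_bytes=2):
    """Rough VRAM estimate in GB: model weights plus KV cache."""
    weights = params_b * 1e9 * dtype_bytes
    # KV cache: two tensors (K and V) per layer, per token, per request.
    kv_cache = (2 * layers * kv_heads * head_dim * dtype_bytes
                * context_len * parallel_requests)
    return (weights + kv_cache) / 1e9

# Example: an 8B-parameter model (hypothetical architecture numbers)
# serving 8 parallel requests with a 16k-token context each.
print(f"{estimate_vram_gb(8, 32, 8, 128, 16_384, 8):.1f} GB")  # ~33.2 GB
```

Doubling the number of parallel requests or the context length roughly doubles the KV-cache term, which is often what pushes a deployment from one GPU to several.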
Finally, we’ll connect the LLMs to the tools running on the laptops, like Visual Studio Code, the Zed editor, or Claude Code. Based on this, you’ll learn the parameters needed when multiple tools access an LLM in parallel.
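Many local inference servers expose an OpenAI-compatible HTTP API, which is the protocol such tools typically speak. Here is a minimal client sketch, assuming a hypothetical server on localhost:8000 and a placeholder model name (the actual endpoint and model depend on the workshop setup):

```python
from openai import OpenAI

# Hypothetical endpoint and model name; the real values depend on the
# inference server configured during the workshop.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

reply = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Explain pipeline parallelism."}],
)
print(reply.choices[0].message.content)
```

Editors and coding agents can be pointed at the same base URL, which is why the server’s limits on parallel requests matter as soon as several tools share one endpoint.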
The target audience for the afternoon is lead software engineers from our partners who already have some basic knowledge of LLMs, would like to learn more, and want to see whether it’s feasible to install LLMs locally.
Sign-up and Contact
You can sign up for the different parts (morning, afternoon, or both) of the hands-on workshop here:
Signup Sheet – in case of problems, sign up via email
If you have questions, please contact linus.gasser@epfl.ch.