“Anyway” – Distributed LLMs Made Easy

While public LLM APIs are convenient, every query is sent to the provider's servers, where it may be stored. Running open LLMs locally offers privacy and offline access, but the setup can be challenging depending on hardware and model requirements. ‘Anyway’ addresses this by distributing queries across multiple GPUs and scaling dynamically as resources change.


Prof. Guerraoui works on fault tolerance in distributed systems: how to offer a reliable service in the presence of faulty servers or changing resources, and in particular in which settings it makes sense to elect a leader. These questions apply to LLMs during training, but even more so during inference, whenever more than one GPU is available. In such a dynamic system, where resources come and go, there are trade-offs to weigh between speed and security.

In a real-time system, Anyway handles the challenges that arise when nodes join and leave while requests are being served. During LLM inference, the nodes must exchange several data structures in order to serve parallel requests from different users. Three technical parameters drive the choice of GPUs and models when installing Anyway on servers: the size of the LLM, the number of parallel requests, and the maximum context length.
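
To get a feel for how these three parameters interact, the sketch below estimates the GPU memory needed to serve a model. It is a generic back-of-the-envelope calculation, not Anyway's own sizing logic, and all concrete figures (an 8B-parameter model, 16-bit weights, grouped-query attention dimensions) are assumptions for illustration.

```python
# Back-of-the-envelope GPU memory estimate for serving an LLM.
# Illustrative only: the model shape below is a hypothetical example,
# not a formula taken from the Anyway project.

def estimate_vram_gb(
    n_params: float,           # model size in parameters
    n_layers: int,             # number of transformer layers
    n_kv_heads: int,           # key/value heads (with grouped-query attention)
    head_dim: int,             # dimension per attention head
    max_context: int,          # maximum context length in tokens
    parallel_requests: int,    # concurrent user requests
    bytes_per_weight: int = 2, # fp16/bf16 weights
    bytes_per_kv: int = 2,     # fp16 KV cache
) -> float:
    weights = n_params * bytes_per_weight
    # KV cache: 2 tensors (K and V) per layer, per token, per request.
    kv_cache = (2 * n_layers * n_kv_heads * head_dim * bytes_per_kv
                * max_context * parallel_requests)
    return (weights + kv_cache) / 1e9

# Example: an 8B model with 32 layers and 8 KV heads of dimension 128,
# serving 16 parallel requests with an 8192-token context window.
print(f"{estimate_vram_gb(8e9, 32, 8, 128, 8192, 16):.1f} GB")
```

In this example the KV cache for the parallel requests ends up roughly as large as the weights themselves, which is why the number of parallel requests and the maximum context matter as much as the model size when choosing GPUs.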

Once Anyway is set up in a system, it offers a standard API to connect editors, agents, and other applications to the LLM.
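
The project page does not spell out the exact interface, but a "standard API" in this space usually means an OpenAI-compatible HTTP endpoint. The sketch below shows what connecting an application to such an endpoint could look like; the base URL and model name are placeholders, not values documented by Anyway.

```python
# Hypothetical client call against an OpenAI-compatible chat endpoint.
# The base URL and model name are assumptions for illustration; consult
# the Anyway documentation for the actual values.
import requests

BASE_URL = "http://localhost:8000/v1"    # assumed local endpoint
payload = {
    "model": "llama-3-8b-instruct",      # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarise this paragraph in one sentence."}
    ],
    "max_tokens": 128,
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```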

To learn more about Anyway, visit the project's website: https://anyway.dev/