Google DeepMind has a new way to look inside an AI’s “mind”
MIT Technology Review
—
November 14, 2024
Researchers at Google DeepMind are making progress on the interpretability of AI models using “sparse autoencoders”, a tool that helps shine light on a model’s inner workings: how it arrives at its outputs and when it errs. By identifying the features a model represents internally, researchers can, in principle, steer it away from undesirable outcomes such as biased outputs, inappropriate content, or incorrect answers. This is a positive step toward developing AI that is explainable and trustworthy.
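To make the general idea concrete, here is a minimal sketch of a sparse autoencoder in Python (PyTorch). This is an illustrative toy, not DeepMind’s actual tool: the class name, the dimensions d_model and d_features, and the L1 penalty weight are all assumptions chosen for the example. The core idea is to re-express a model’s internal activations as a wider, mostly-zero code whose few active entries can act as more interpretable “features”.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder: encodes an activation vector into a
    wider, mostly-zero feature code, then reconstructs the original
    activation from that code."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # d_features is typically much larger than d_model,
        # so each input activates only a handful of features.
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, activations: torch.Tensor):
        # ReLU zeroes out most feature activations, enforcing sparsity.
        codes = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(codes)
        return reconstruction, codes

def loss_fn(x, reconstruction, codes, l1_weight=1e-3):
    # Train to reconstruct the activation faithfully while an L1
    # penalty keeps the feature code sparse (l1_weight is illustrative).
    reconstruction_error = (x - reconstruction).pow(2).mean()
    sparsity_penalty = codes.abs().mean()
    return reconstruction_error + l1_weight * sparsity_penalty
```

Once trained, the sparse code for a given input shows which handful of learned features fired, which is what makes this style of analysis useful for inspecting, and potentially steering, a model’s behavior.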