Llamafile: The Simplest Way to Run an LLM Locally

Llamafile is the new simplest solution to run an LLM locally on your computer and prototype quickly. It works in just 3 steps:

  1. Download a Llamafile from a repository (like Hugging Face)
  2. Make it a binary (chmod +x)
  3. Run the executable

A Llamafile combines model weights and the necessary code into a single multi-GB file, sometimes even including a local server with a web UI. On my M1 Mac I get roughly 35ms latency and 28 tokens per second for multimodal input.

Check it out on GitHub.