Llamafile: The Simplest Way to Run an LLM Locally
Llamafile is the new simplest solution to run an LLM locally on your computer and prototype quickly. It works in just 3 steps:
- Download a Llamafile from a repository (like Hugging Face)
- Make it a binary (
chmod +x) - Run the executable
A Llamafile combines model weights and the necessary code into a single multi-GB file, sometimes even including a local server with a web UI. On my M1 Mac I get roughly 35ms latency and 28 tokens per second for multimodal input.
Check it out on GitHub.