🔥 Meta's latest release, Llama 3, posts strong results across standard Large Language Model (LLM) benchmarks. The 8B and 70B parameter models hold up well against closed-source and open-source models such as Claude, Gemini and Mistral, but the more interesting comparison is Llama 3 against Llama 2.
💸 With far fewer parameters (8B vs. 70B), Llama 3 8B outperforms the Llama 2 70B model on each of the listed benchmarks. If you are coming from Llama 2 7B, you can potentially see severalfold improvements on some benchmarks while only marginally increasing the resources required to serve the new model. If you are currently running Llama 2 70B, you can potentially cut your GPU memory requirement by almost an order of magnitude by adopting the smaller, highly capable 8B model. This saves resources, money and energy.
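A rough back-of-the-envelope check of the memory claim can be sketched as follows. It assumes 16-bit weights (2 bytes per parameter), uses the nominal parameter counts, and counts the weights only; the function name is illustrative, and a real deployment also needs memory for the KV cache and activations.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision
    (2 bytes for fp16/bf16). KV cache and activations are not included.
    """
    return params_billions * bytes_per_param


if __name__ == "__main__":
    llama2_70b = weight_memory_gb(70)  # nominal 70B parameters
    llama3_8b = weight_memory_gb(8)    # nominal 8B parameters
    print(f"Llama 2 70B weights: ~{llama2_70b:.0f} GB")
    print(f"Llama 3 8B weights:  ~{llama3_8b:.0f} GB")
    print(f"Reduction factor:    ~{llama2_70b / llama3_8b:.2f}x")
```

Under these assumptions the weights shrink from roughly 140 GB to roughly 16 GB, a factor of about 8.75, which is where "almost an order of magnitude" comes from. Quantized weights would lower both figures but leave the ratio unchanged.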
🔧 What's behind the performance increase? The training data set has grown significantly and now spans 15 trillion tokens. Additionally, Llama 3 supports a larger context window of 8k tokens, double Llama 2's 4k, so the model can take more of a document or conversation into account when generating a response.
But what does this mean for a typical knowledge worker? How can they easily take advantage of Llama 3, and why should they? One reason to consider Llama 3 is that it can run privately on a modest laptop or desktop computer. I've been using the exceptionally useful tool 🦙 Ollama since the end of 2023. Combined with Open WebUI, it provides an incredibly productive web-based interface for self-hosted LLMs. If you haven't already tried Ollama, I highly recommend doing so.
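If you prefer scripting to a web interface, a locally running Ollama server can also be called over its REST API. The snippet below is a minimal sketch, not a definitive client: it assumes Ollama is listening on its default local port (11434) and that the model has already been pulled under the tag `llama3`. The helper names `build_request` and `generate` are my own.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the llama3 model available.
    print(generate("Summarize the benefits of running an LLM locally in two sentences."))
```

Because everything goes to `localhost`, the prompt and the response never leave your machine, which is the privacy benefit described above.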