
SambaNova breaks Llama 3 speed record with 1,000 tokens per second

There is no single speedometer for measuring the speed of a generative AI model, but one of the leading approaches is to measure how many tokens per second a model handles.
Today, SambaNova Systems announced that it has achieved a new milestone in gen AI performance, hitting a whopping 1,000 tokens per second with the Llama 3 8B-parameter instruct model. Until now, the fastest benchmark for Llama 3 had been claimed by Groq at 800 tokens per second. The 1,000 tokens per second milestone was independently validated by the testing firm Artificial Analysis. The higher speed has numerous enterprise implications that could lead to significant business benefits, such as faster response times, better hardware utilization and lower costs.
