SambaNova breaks Llama 3 speed record with 1,000 tokens per second

Published on May 31, 2024

Image generated by VentureBeat using DALL-E 3

There is no single, simple speedometer for the speed of a generative AI model, but one of the leading approaches is to measure how many tokens per second a model generates.

Today, SambaNova Systems announced that it has reached a new gen AI performance milestone, hitting a whopping 1,000 tokens per second with the Llama 3 8B Instruct model. Until now, the fastest Llama 3 result had been claimed by Groq at 800 tokens per second. The 1,000 tokens per second figure was independently validated by the testing firm Artificial Analysis. The higher speed has several enterprise implications that could bring significant business benefits, such as faster response times, better hardware utilization and lower costs.
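To make the headline numbers concrete, here is a minimal sketch of the tokens-per-second arithmetic. It assumes a steady output rate and ignores time to first token and network latency. Artificial Analysis's actual benchmark methodology may differ. The 500-token response length and the helper names `tokens_per_second` and `generation_time` are hypothetical and chosen only for illustration.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Output throughput: generated tokens divided by wall-clock generation time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def generation_time(num_tokens: int, rate_tps: float) -> float:
    """Seconds needed to emit num_tokens at a steady rate of rate_tps tokens/second.

    Simplifying assumption: constant decode rate, no time-to-first-token overhead.
    """
    if rate_tps <= 0:
        raise ValueError("rate_tps must be positive")
    return num_tokens / rate_tps


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length, for illustration only
    # Rates as reported in the article for Llama 3 8B.
    for label, rate in [("Groq (reported)", 800.0), ("SambaNova (reported)", 1000.0)]:
        seconds = generation_time(response_tokens, rate)
        print(f"{label}: {rate:.0f} tok/s -> {seconds:.3f} s for {response_tokens} tokens")
```

Under these assumptions, a 500-token answer takes 0.625 s at 800 tokens per second and 0.5 s at 1,000. In other words, a 25% higher throughput cuts generation time by 20%.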

