SambaNova Launches Fastest AI Platform with Llama 3.1 405B at 132 Tokens per Second

SambaNova Systems has introduced SambaNova Cloud, which it calls the fastest AI inference platform available today, powered by its SN40L AI chip. The platform gives developers immediate access to Meta's Llama 3.1 models, including the 405B model at full 16-bit precision, running at 132 tokens per second (t/s).

The Llama 3.1 70B model runs at 461 t/s. The service is now open to developers without a waiting list.

Cerebras Inference recently announced that it delivers 1,800 tokens per second on the Llama 3.1 8B model and 450 tokens per second on the Llama 3.1 70B model, which it says is 20 times faster than NVIDIA GPU-based hyperscale clouds. Groq, meanwhile, reports over 500 tokens per second on the Llama 3.1 70B model.
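To get a rough sense of what these throughput figures mean in practice, the sketch below converts each quoted rate into the time to generate a single response. The rates are the ones cited above; the 1,000-token response length is an arbitrary assumption for illustration, not a benchmark.

```python
# Throughput figures (tokens per second) as quoted in the article.
throughput_tps = {
    "SambaNova Cloud, Llama 3.1 405B": 132,
    "SambaNova Cloud, Llama 3.1 70B": 461,
    "Cerebras Inference, Llama 3.1 8B": 1800,
    "Cerebras Inference, Llama 3.1 70B": 450,
    "Groq, Llama 3.1 70B": 500,  # article says "over 500"; lower bound used here
}

N_TOKENS = 1000  # hypothetical response length, chosen only for illustration

for platform, tps in throughput_tps.items():
    seconds = N_TOKENS / tps
    print(f"{platform}: {seconds:.1f} s for {N_TOKENS} tokens")
```

At these rates, even the slowest figure quoted (132 t/s on the 405B model) produces a 1,000-token answer in under eight seconds, which is why per-token throughput has become a headline metric for inference platforms.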

SambaNova Cloud supports both the Llama 3.1 70B model, designed for agentic AI applications, and the 405B model, the largest open-source AI model available. 

According to SambaNova CEO Rodrigo Liang, this versatility …
