September 25, 2024

Scaling Out GenAI with Message Queues on Kubernetes

Brendan Cooper
Head of Marketing

Load balancers are a staple of scalable, high-throughput, high-availability architectures. They work great for scaling web services. When requests take longer, though, things get complicated: requests can pile up on some backends, bursts of traffic can send latency through the roof, and by the time autoscaling kicks in, it might be too late, too expensive, or both.

Asynchronous architectures and message queues, combined with event-driven autoscaling, can help a lot here.
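
As a rough illustration of that pattern, here is a minimal, self-contained Python sketch. It is not the setup shown in the session: an in-memory `queue.Queue` stands in for the message broker, threads stand in for consumer pods, and a short `sleep` stands in for a slow LLM completion. The names `desired_replicas`, `consume`, and `run_demo` are hypothetical, and the proportional scaling rule is a simplification of how queue-length-based autoscaling tends to behave, not the actual logic of any particular autoscaler.

```python
import math
import queue
import threading
import time


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count proportional to the backlog, clamped to [min, max]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


def consume(requests: queue.Queue, responses: list, lock: threading.Lock) -> None:
    """Worker loop: each consumer pulls one request at a time from the shared queue."""
    while True:
        prompt = requests.get()
        if prompt is None:  # sentinel: no more work for this worker
            return
        time.sleep(0.01)  # stand-in for a slow LLM completion
        with lock:
            responses.append((prompt, f"completion for {prompt}"))


def run_demo(num_requests: int = 20, target_per_replica: int = 5) -> list:
    requests: queue.Queue = queue.Queue()
    responses: list = []
    lock = threading.Lock()

    # Producer: enqueue a burst of requests without waiting for results.
    for i in range(num_requests):
        requests.put(f"prompt-{i}")

    # "Autoscaler": size the worker pool from the current backlog.
    replicas = desired_replicas(requests.qsize(), target_per_replica)
    workers = [threading.Thread(target=consume, args=(requests, responses, lock))
               for _ in range(replicas)]
    for worker in workers:
        worker.start()

    # One sentinel per worker, queued after all real requests.
    for _ in workers:
        requests.put(None)
    for worker in workers:
        worker.join()
    return responses


if __name__ == "__main__":
    results = run_demo()
    print(len(results), "requests completed")
```

Because consumers pull work when they are ready, a slow request occupies only the worker handling it, and the backlog itself becomes the scaling signal: an empty queue allows scaling down to zero, while a burst raises the replica count up to the configured maximum.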

Session Overview

We're going to see how to implement that pattern on Kubernetes, leveraging:

- A popular LLM to generate thousands of completions;
- RabbitMQ and PostgreSQL to store requests and responses;
- Bento to implement API servers, producers, and consumers without writing code;
- Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling;
- Helm and Helmfile to automate deployment as much as possible.

Who should watch:

- DevOps, Platform, and SRE professionals looking for ways to improve their autoscaling practices. 
- Data engineers who want a better understanding of running their workloads on Kubernetes.

Meet our experts

Jerome Petazzoni

Tinkerer Extraordinaire

Part of the Docker founding team. Docker Community Advocate between 2013 and 2018. These days he teaches Kubernetes at Enix, a French Cloud Native shop.

When he's not busy with computers, he collects musical instruments, and can arguably play the theme of Zelda on a dozen of them.

Anton Weiss

Chief Storyteller, PerfectScale

Anton has a storied career in creating engaging and informative content that helps practitioners navigate the complexities of ongoing Kubernetes operations. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker. He is passionate about leveraging his expertise to support the needs of the DevOps, Platform Engineering, and Kubernetes communities.
