Load balancers are a staple of scalable, high-throughput, high-availability architectures, and they work great for scaling web services. When requests take longer, though, things get complicated. Requests can pile up on some backends; bursts of traffic can send latency through the roof; and by the time autoscaling kicks in, it may be too late, too expensive, or both.
Asynchronous architectures and message queues can help a lot here, especially when combined with event-driven autoscaling.
Session Overview
We're going to see how to implement that pattern on Kubernetes, leveraging:
- A popular LLM to generate thousands of completions;
- RabbitMQ and PostgreSQL to store requests and responses;
- Bento to implement API servers, producers, and consumers without writing code;
- Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling;
- Helm and Helmfile to automate deployment as much as possible.
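The autoscaling piece of this stack boils down to sizing the consumer fleet from queue depth: a queue-length trigger asks for enough replicas so that each consumer handles at most a target number of pending messages. Here is a minimal Python sketch of that logic; the function name, parameters, and bounds are illustrative, not KEDA's actual API.

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Replica count a queue-length trigger would request:
    enough consumers so each handles at most target_per_replica
    pending messages, clamped to the [min, max] bounds."""
    if queue_length <= 0:
        # Scale to zero (or the configured floor) when the queue is empty.
        return min_replicas
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

# A burst of 4200 queued completion requests, target of 100 per consumer:
print(desired_replicas(4200, 100))  # capped at max_replicas -> 10
print(desired_replicas(250, 100))   # -> 3
print(desired_replicas(0, 100))     # queue drained -> 0
```

Because the signal is queue depth rather than CPU or request rate, consumers scale up as soon as a burst lands in the queue and can scale back to zero once it drains.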
Who should watch:
- DevOps, Platform, and SRE professionals looking for ways to improve their autoscaling practices.
- Data engineers who want a better understanding of running their workloads on Kubernetes.
Meet our experts
Jerome Petazzoni
Tinkerer Extraordinaire
Part of the Docker founding team. Docker Community Advocate between 2013 and 2018. These days he teaches Kubernetes at Enix, a French Cloud Native shop.
When he's not busy with computers, he collects musical instruments, and can arguably play the theme of Zelda on a dozen of them.
Anton Weiss
Chief Storyteller, PerfectScale
Anton has a storied career in creating engaging and informative content that helps practitioners navigate the complexities of ongoing Kubernetes operations. With previous experience as a CD Unit Leader, Head of DevOps, CTO, and CEO, he has worn many hats as a consultant, instructor, and public speaker. He is passionate about leveraging his expertise to support the needs of the DevOps, Platform Engineering, and Kubernetes communities.