March 30, 2023

Solidus Labs Reduces Kubernetes Resilience Issues by 90%

Brendan Cooper
Head of Marketing

Solidus Labs is on a mission to enable safer crypto trading throughout the investment journey across all centralized and DeFi markets. As the founder of industry-leading initiatives, Solidus is deeply committed to ushering in the financial markets of tomorrow.

To support the rapid growth of the crypto markets and meet ever-increasing demand from its clients, Solidus uses Kubernetes on Amazon EKS as the foundation of its application infrastructure. To ensure the environment scales as the company grows, Solidus relies on the expertise and services of Develeap.

Develeap, one of the largest DevOps consultancies in Israel, was responsible for building the initial architecture and provides ongoing support and maintenance of the Solidus environment. This includes the setup of monitoring, observability, and alerting, as well as cost optimization of the environment.

Partnering with Develeap has allowed Solidus to scale its environment to a dozen multi-regional clusters, serving clients across the globe.

Sustaining Resilience During Rapid Growth

The Develeap team implemented many capabilities to keep the Kubernetes environment running smoothly and efficiently, for example using KEDA to drive horizontal pod autoscaling (HPA). However, they lacked the ability to optimally “right-scale” pod resources, which led to performance problems such as CPU throttling and out-of-memory (OOM) issues.

To make matters more complicated, their environment is in a state of constant change. “R&D is releasing changes on an hourly basis, and, due to the nature of our business,” said Ben Hoffman, R&D Director of Solidus Labs, “some of our clients send data in large batches, while others use us as a real-time service, making it hard to predict the load fluctuations on our services.”

To proactively address these challenges, the team spent hours improving the stability of their biggest cluster, then replicated its resource capacity across the others. But in an environment that was constantly changing, the results were short-lived, and this approach caused unnecessary waste across the smaller clusters.

“I jumped into a Grafana and pulled in metrics from Prometheus and logs from Logz.io, and made adjustments to the requests based on the different peaks of our environment,” said Shemtov Fisher, DevOps Engineer at Solidus Labs/Develeap. “Then a few weeks would pass, and we started seeing throttling and memory issues resurface, leading to a second round of adjustments. When I jumped in a third time, I knew we needed a solution in place to help automate this process. PerfectScale is the exact solution we needed to fill this gap.”
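The manual process Fisher describes, reading usage peaks and adjusting requests to match, can be sketched roughly as follows. This is only an illustration of the general idea, not PerfectScale's method or the team's actual procedure: the function name `recommend_request`, the percentile, and the headroom value are all hypothetical, and the samples are assumed to be usage readings (for example CPU in millicores) already exported from a metrics store.

```python
import math


def recommend_request(samples, percentile=95, headroom_pct=15):
    """Suggest a resource request from observed usage samples.

    Hypothetical helper: takes the nearest-rank percentile of the samples
    and adds a fixed percentage of headroom on top, rounding up.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile: ceil(percentile * n / 100), done in integer math.
    rank = max(1, -(-percentile * len(ordered) // 100))
    peak = ordered[rank - 1]
    # Add headroom so short spikes above the chosen percentile still fit.
    return math.ceil(peak * (100 + headroom_pct) / 100)


# Example: ten CPU readings in millicores (made-up values).
cpu_samples = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
suggested = recommend_request(cpu_samples, percentile=90, headroom_pct=10)
```

A one-off calculation like this goes stale as soon as the workload changes, which is why the adjustments had to be repeated every few weeks.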

Improving K8s Stability by Reducing CPU Throttling and OOM Issues by 90%

Shortly after implementing PerfectScale, Solidus was able to proactively “right-scale” its pod resources, leading to a significant reduction in CPU throttling and OOM issues.

“We went from multiple issues a day, to maybe one or two issues in the last month,” said Hoffman. “With PerfectScale, we have seen over a 90% reduction helping us ensure our applications have the capacity to meet our customer demand.”

Additionally, PerfectScale has drastically reduced the mean time to resolution (MTTR) for capacity-related issues.

“Before PerfectScale, the DevOps team would get an alert when an issue occurred, then we would triage the issue to the proper service owner to resolve,” said Barak Arzuan, DevOps Engineer at Solidus Labs/Develeap. “Depending on the criticality, it could take hours or even more for the service owners to evaluate the issue and provide us with the proper resource requirements. With PerfectScale, we can immediately provide the service providers with evidence on why the issue is happening along with precise recommendations on how to resolve it. This has helped a lot with our day-to-day operations.”

No More Continuous Manual Work for System Health and Cost-Efficiency

Adding capacity to improve system resilience and stability tends to come with a price. To offset those costs, the Develeap team used PerfectScale's cost-optimization capabilities to move unused resources to the areas that needed additional capacity.

“In some of our clusters, we found significant cost savings opportunities,” explained Arzuan. “We were able to reinvest these savings into our clusters that were lacking resources. This resulted in a fully stable, resilient, and cost-effective environment with no impacts on our budget.” 
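The reinvestment Arzuan describes amounts to simple bookkeeping: capacity freed from over-provisioned workloads is set against the capacity that under-provisioned workloads still need. The sketch below shows that arithmetic only. The function name `rebalance`, the workload names, and the numbers are hypothetical and do not come from Solidus or PerfectScale.

```python
def rebalance(workloads):
    """Summarize freed versus needed capacity.

    `workloads` maps a workload name to a pair
    (requested_millicores, recommended_millicores).
    Returns (freed, needed, net), where net = freed - needed.
    """
    freed = 0
    needed = 0
    for requested, recommended in workloads.values():
        if requested > recommended:
            # Over-provisioned: the surplus can be released.
            freed += requested - recommended
        else:
            # Under-provisioned (or exact): the shortfall must be added.
            needed += recommended - requested
    return freed, needed, freed - needed


# Made-up example values in millicores.
example = {
    "batch-ingest": (1000, 400),   # over-provisioned by 600
    "realtime-api": (200, 500),    # under-provisioned by 300
    "reporting": (300, 300),       # already right-sized
}
freed, needed, net = rebalance(example)
```

When `net` is zero or positive, the extra capacity for the starved workloads is covered by what was reclaimed elsewhere, which matches the "no impacts on our budget" outcome described in the quote.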

“We have a large number of clients, each using our application slightly differently. Keeping our Kubernetes environment optimized is essential for Solidus Labs to ensure our applications have the resources they need to support our customers today, and as our company continues to grow in the future,” said Hoffman. “PerfectScale is removing time-consuming manual tasks we have faced in the past, making it easy to continuously maintain our system’s health and cost-effectiveness.”
