Trax’s mission is to enable brands and retailers to harness digital technologies to produce the best shopping experiences imaginable. Their industry-leading innovations, built on advanced technologies and autonomous data collection methods, are driving positive shopper experiences and unlocking revenue opportunities at every point of sale.
Trax’s solution portfolio provides mission-critical metrics, analytics, and services that help their customers save time and money while improving the shopping experience. Kubernetes is a key component of their infrastructure, helping them continuously innovate while providing the scalability to consistently meet demand. They have grown to a large-scale, multi-cloud, multi-cluster environment supporting customers in over 90 countries, including some of the world's largest enterprises.
Manual Optimizations Led to Minimal Impact on Kubernetes Cost Reduction Goals
At the beginning of the year, the Chief Financial Officer (CFO) laid out aggressive cost-savings goals across multiple parts of the organization. For Mark Serdze, Director of Cloud Infrastructure, and his team, this meant taking quick action to optimize their cloud costs.
They achieved fast results when reducing costs outside of Kubernetes, but when it came to optimizing the workloads inside their clusters, they started hitting roadblocks.
“We started optimizing manually with available metrics, using Vertical Pod Autoscaler (VPA), cluster logs, and our monitoring solutions,” explained Serdze. “This approach didn't give us proper clarity and would be challenging to scale efficiently without big development needs. This left us taking ad-hoc, reactive actions that were having minimal impact on our goals.”
Gaps in their existing toolset added further friction to the optimization process. After identifying potential areas to reduce costs, the team would go through a time-consuming back-and-forth over which actions to take. This often resulted in changes made on a “hunch”. In many cases, the wrong decisions were made, causing extra work and debate within the team and introducing risks that could jeopardize the resiliency of their services.
It became very clear to Serdze and the team that they did not have the proper tooling to help them effectively optimize their Kubernetes environments.
Quickly Optimizing K8s Cost by up to 75%
Shortly after deploying PerfectScale, Serdze and the team started to get the cost visibility across their Kubernetes environment that they had been missing. Across their environment of more than 200 microservices, they could clearly see what types of resources each service needed and where the biggest opportunities to eliminate waste were.
The platform's AI-guided intelligence allowed them to act quickly to start reducing costs. By weighing potential cost savings against overall resilience, they were able to adjust resources safely and efficiently without compromising performance.
“The cost optimization recommendations were key for us, telling us what actions to take with a clear understanding of the impact each change would have,” Serdze explained. “In one of our clusters, we were able to reduce cost by 75%, saving us six figures in yearly expenses.”
Furthermore, they were impressed by the comprehensive data and intelligence the solution provided across their entire environment. This provided an opportunity to upgrade their cost visibility toolset with virtually no impact on their budget.
“We were able to replace a FinOps tool we were using that didn't provide granular cost details or offer guidance on how to optimize our environment,” explained Serdze. “PerfectScale is a tool built for the engineering teams, not just for finance, which made it easier for us to make the cost impacts we wanted.”
Kubernetes Optimization to Improve Business Metrics
After eliminating the wasted resources, the Trax team turned their attention to identifying additional opportunities for cost optimization. They delved into the insights provided by PerfectScale, seeking ways to make a meaningful impact on their cost-centric business metrics.
“A key metric for us is ‘cost per processing’, which is heavily affected by our Kubernetes efficiency,” said Serdze. “If it gets over a certain amount, we are under a lot of pressure to figure out why and to take actions to reduce it.”
PerfectScale has a unique feature that consolidates every replica of a service into a single view, giving a clear picture of utilization trends across all replicas. This is especially useful for ephemeral workloads such as Spark or Flink jobs. The Trax team used this capability to better understand the heterogeneous utilization across the replicas of several of their most heavily used services. This level of visibility helped them rearchitect some of these services to drive additional cost savings without impacting resilience or availability.
“We were able to build multiple flavors of the service with different levels of resources and route the incoming requests to the proper service based on the size of the data,” explained Serdze. “This made a big impact on our ‘cost per processing’ metric. PerfectScale surfaced this data instantly, and without them, we would have spent countless hours evaluating hundreds of replicas to generate the same results.”
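The routing pattern Serdze describes, several differently sized "flavors" of one service with each request sent to the flavor that matches its data size, can be illustrated with a small size-based dispatcher. This is a minimal sketch under stated assumptions, not Trax's implementation: the flavor names, the byte thresholds, and the `route` function are all hypothetical. In a Kubernetes setup, each flavor would typically be its own Deployment with its own resource requests, exposed through its own Service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str       # hypothetical Service name for this flavor
    max_bytes: int  # largest payload this flavor is sized to handle


# Hypothetical tiers, ordered from smallest (cheapest) to largest.
FLAVORS = [
    Flavor("processor-small", 10 * 1024**2),   # up to 10 MiB
    Flavor("processor-medium", 100 * 1024**2), # up to 100 MiB
    Flavor("processor-large", 1024**3),        # up to 1 GiB
]


def route(payload_bytes: int, flavors: list[Flavor] = FLAVORS) -> str:
    """Return the name of the smallest flavor whose limit fits the payload."""
    if payload_bytes < 0:
        raise ValueError("payload size cannot be negative")
    # Because the list is sorted ascending, the first match is the cheapest fit.
    for flavor in flavors:
        if payload_bytes <= flavor.max_bytes:
            return flavor.name
    # Oversized requests fall back to the largest flavor instead of failing.
    return flavors[-1].name


print(route(5 * 1024**2))    # processor-small
print(route(300 * 1024**2))  # processor-large
```

Sorting the tiers ascending means small requests never occupy capacity sized for the largest inputs, which is where the savings in a "cost per processing" metric would come from. Sending oversized requests to the largest flavor is one possible policy; rejecting them or queuing them for a dedicated batch path are equally valid choices.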
Finding a Partner in Optimization
On top of being able to achieve quick and effective cost-optimization results, the team at Trax felt a portion of their success was driven by the strategic partnership and support they received from PerfectScale.
“The support during the proof of concept (POC) made a big impact in helping us drive quick results,” said Serdze. “The PerfectScale team sat with us, helped us optimize, and ensured our success in using the platform. I have not seen this level of commitment from other vendors, and I am glad we found a partner we can rely on to help us keep our Kubernetes cost in check as we continue to scale.”