proteanTecs is on a mission to enable the electronics industry to scale. Their cloud-based platform combines deep telemetry data with machine learning to monitor the health and performance of advanced chips, transforming the way the reliability of electronics is achieved.
proteanTecs solutions are a critical component used by key customers across various industries, including semiconductor, data center, and automotive. This comes with the necessity of having an application infrastructure that is stable, scalable, and highly available. To ensure their application can meet the demands of their customers, proteanTecs uses Kubernetes to run its entire platform. They have grown their environment to over 30 clusters across AWS, Azure, and Google Cloud services.
Insufficient Insights for Resource Optimization
The proteanTecs platform has grown substantially in both functionality and number of end users. In the company's early stages, application stability and availability were the most critical factors, but as the solution scaled, the team began to notice poorly optimized resources that were straining their budget.
“Our development team builds their microservices and chooses their own resource requests and limits,” explained Amit Daniel, Director of DevOps at proteanTecs. “However, the developers tended to allocate risk-averse resources to ensure their services were working as expected. This approach kept our systems available and stable, but it was inefficient from a cost standpoint.”
To eliminate unnecessary resource use and reduce cloud costs, the DevOps team was tasked with optimizing their K8s environment. However, their existing toolset had several gaps that would have made optimization a complex and time-consuming task.
“With tools like DataDog, we can see whatever we want: requests, limits, etc., but they only give us visibility. We needed better insights into what resources our microservices need to run properly,” Daniel said. “PerfectScale provides us with the exact visibility and insights we need to optimize our Kubernetes costs. It compares the resources we have allocated for each service vs. actual utilization to identify over-provisioning, then tells us what actions we need to take to cut costs safely.”
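The comparison Daniel describes, allocated resources versus actual utilization, can be illustrated with a minimal Python sketch. This is not PerfectScale's method: the `Workload` type, the use of 95th-percentile CPU usage, and the 50% threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical record of one service's CPU request and observed usage."""
    name: str
    cpu_request_millicores: int
    cpu_p95_usage_millicores: int


def over_provisioned(workloads, threshold=0.5):
    """Return (name, utilization) for workloads using less than
    `threshold` of their CPU request (threshold is an assumed cutoff)."""
    flagged = []
    for w in workloads:
        if w.cpu_request_millicores <= 0:
            # No request set, so there is nothing to compare against.
            continue
        utilization = w.cpu_p95_usage_millicores / w.cpu_request_millicores
        if utilization < threshold:
            flagged.append((w.name, round(utilization, 2)))
    return flagged


workloads = [
    Workload("api", 1000, 200),     # uses 20% of its request
    Workload("worker", 500, 450),   # uses 90% of its request
    Workload("no-request", 0, 10),  # skipped: no request defined
]
print(over_provisioned(workloads))
```

In this sketch only the `api` service is flagged, since it uses a fifth of what it requests. A real analysis would also need to consider memory, limits, and usage over a representative time window.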
AI-Driven Recommendations Cut Costs to $5K While Maintaining Stability
When first introduced to PerfectScale, the proteanTecs team implemented the solution in one of their development environments and started identifying areas to optimize costs.
“Our Dev environment was costing roughly $10k per month,” said Daniel. “By implementing PerfectScale’s recommendations, we were able to get the cost down to $5k without compromising the stability of the environment.”
As the team moved the solution into production environments, they wanted to make sure that the cost optimization efforts would not impact system performance. To ensure services remained stable, the team leveraged PerfectScale’s ability to provide services additional headroom by adjusting the “resilience level.”
The team also used PerfectScale to identify services that were lacking the resources needed to perform properly.
“We found throttling, OOM, and container restarts that we did not know were happening. Many issues were not triggering alerts in our monitoring system. Also, if we were not capturing the right metrics, it was challenging and time-consuming to determine the root cause of something like a memory leak,” Daniel explained. “PerfectScale instantly identifies the issues and shows us how to resolve them quickly. As a result, we see that our platform is performing better, which is helping us to provide our customers with a better experience.”
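The idea of trading cost for headroom can be sketched in Python as follows. The level names and percentages below are hypothetical and are not taken from PerfectScale; the sketch only shows how a higher resilience setting would translate into a larger resource request for the same observed peak.

```python
# Hypothetical mapping from resilience level to extra headroom, in percent.
# These names and values are assumptions, not PerfectScale's settings.
HEADROOM_PERCENT = {"low": 10, "medium": 25, "high": 50}


def recommended_request(observed_peak_millicores, resilience_level):
    """Return the observed peak plus headroom, rounded up to a whole millicore."""
    pct = HEADROOM_PERCENT[resilience_level]
    # Integer arithmetic avoids floating-point rounding surprises.
    return (observed_peak_millicores * (100 + pct) + 99) // 100


for level in ("low", "medium", "high"):
    print(level, recommended_request(400, level))
```

For an observed peak of 400 millicores, this yields requests of 440, 500, and 600 millicores respectively, so a production service could be given a higher level than a development one.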
Building a Culture of Cost-Awareness and Stability
Currently, cost optimization is handled by the DevOps team alone, but that will change soon. “I have PerfectScale open in my browser at all times, but after we do the initial phase of cost-optimization, we plan on expanding the scope to our developers,” Daniel said.
The goal is to improve cost awareness and accountability across the team. Developers will be expected to evaluate the cost-effectiveness of their services after every release and deployment, then make the necessary changes to keep their Kubernetes environment continuously optimized.
“As we continue to deploy new innovations to our application, and as the customer base grows, we want to make sure our environment grows as efficiently as possible,” Daniel said. “PerfectScale will be an essential solution in helping us keep our environment cost-effective and stable.”