
What is OpenShift CPU throttling? Turbonomic to the Rescue!

 


Problem and Terminology

If you've used Turbonomic to optimize your cluster resources, you've seen it flag certain containers as being throttled. What exactly does that mean and why is it so important to address?  

In Kubernetes, a pod's CPU requirements are defined in its pod specification by setting CPU requests and limits. The CPU request is the baseline amount of CPU allocated to the pod, and the CPU limit is the maximum its CPU allocation can scale to if needed. You define CPU requests and limits in millicores (m), where 1000m is one core.  Thus,


1000m = 1 core = 1 vCPU
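
For reference, a pod spec's resources stanza might look like the sketch below. The pod name, image, and values are made up purely to illustrate the millicore notation:

apiVersion: v1
kind: Pod
metadata:
  name: demo-app                            # hypothetical pod name
spec:
  containers:
  - name: demo-container                    # hypothetical container name
    image: registry.example.com/demo:latest # hypothetical image
    resources:
      requests:
        cpu: 500m      # baseline: half a core guaranteed to the container
      limits:
        cpu: "1"       # ceiling: may burst up to one full core (1000m)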

 

OpenShift, a Kubernetes-based container platform, uses the Kubernetes mechanism of CPU throttling to enforce the CPU limit. The key to understanding throttling is that, by default, Kubernetes enforces CPU limits as a quota of CPU time within each 100ms enforcement period, not as a share of whatever CPU power happens to be available. So even when your overall CPU utilization is low, response time can still be high. How can this happen?

An Example

Let’s take a look at an example where throttling is causing performance issues:
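
Assume the workload running this task is deployed with a resources stanza along these lines. The container name and request value are made up for illustration; the 200m limit is the value the example below works through:

containers:
- name: worker          # hypothetical container name
  resources:
    requests:
      cpu: 100m         # assumed request, for illustration only
    limits:
      cpu: 200m         # 0.2 cores: a quota of 20ms of CPU time per 100ms period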

 


Assume our task actually takes 200ms of CPU time to complete.  How long does this take with the parameters as coded above?

With a CPU enforcement period of 100ms, our 200m limit entitles the task to only 20ms of CPU every 100ms (0.2 cores × 100ms). The task therefore runs for 20ms and is throttled for the remaining 80ms of every period, finally completing after 920ms (nine fully throttled periods plus 20ms of the tenth)! This is alarming, especially when the task is part of a microservice-based application, where it could be a major cause of latency.

Now, if we raise the CPU limit from 200m to 1 core, we can complete the task in 200ms, because a 1-core limit entitles it to run for the full 100ms of each 100ms enforcement period. Put another way, with a 1-core CPU limit, this task would see no throttling at all.
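
Using the same illustrative stanza as above, the fix is a single line:

  resources:
    requests:
      cpu: 100m         # unchanged, assumed value
    limits:
      cpu: "1"          # 1 core = 1000m: the full 100ms in each 100ms period, so no throttling for this task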

Throttling percentage

Turbonomic calculates the throttling percentage for each container as the number of 100ms periods in which the task is throttled, divided by the total number of 100ms periods it takes for the task to complete. In the example above, that is 9 out of 10, or 90%. Ideally, this value should be under 10%.
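
For context, the raw counters behind this kind of percentage come from the Linux kernel, which records them in each container's cgroup cpu.stat file. The values below are illustrative numbers matching our 200ms-task example, with annotations added after each # (they are not part of the file); this is only a sketch of where the data originates, not a description of Turbonomic's internal implementation:

nr_periods 10              # total 100ms enforcement periods observed
nr_throttled 9             # periods in which the container hit its quota and was throttled
throttled_time 720000000   # total time spent throttled, in nanoseconds (9 x 80ms)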

How can Turbonomic help?

Turbonomic can optimize your OpenShift workload by making various recommendations including moving pods, resizing your container requests and limits, and elastically scaling your cluster. 

To ensure your workload isn’t being throttled, Turbonomic monitors the CPU throttling metrics for all your containers running in Kubernetes, in addition to the traditional CPU and memory usage.  We collect the metrics by running a kubeturbo pod inside your cluster.  In Turbonomic, you can see all your containers and sort them by the CPU throttling percentage, so you can pay extra attention to those being significantly throttled.  Turbonomic flags any throttling greater than 5%.

This view is useful for highlighting containers that deserve attention. But note that not all throttling is catastrophic. For example, suppose you have a nightly batch job that was deliberately given a low CPU limit so that more critical customer flows get the CPU. In that case, if no downstream dependencies are affected, throttling of the batch job may not be an issue. Bottom line: some analysis may be required on your part to understand which containers are being throttled and whether that matters to your application.

Turbonomic also integrates with service meshes such as Istio and APM tools such as Instana, AppDynamics, and Dynatrace to collect user response time data for your applications.  That by itself may be a great topic for another day.  But the point is, if there are containerized applications running in your Kubernetes cluster, Turbonomic will stitch them on top of the containers, giving you a “supply chain” view from top to bottom: applications consume resources from containers, and containers consume resources from the nodes in your cluster.  You will also see the infrastructure layer where your cluster is deployed, such as vSphere, AWS, and IBM Cloud.  Thus, the app layer, the container layer, and the infrastructure layer are all connected in the supply chain.

Implementing Turbonomic

Let’s look at an example where Turbonomic was used to eliminate CPU throttling. In the diagram below, the Robot Shop application, discovered via Instana, spans five containers in your cluster:

From there, we can dive into some of the Robot Shop services, such as the Ratings service shown below, which is experiencing response times beyond its Service Level Objective (SLO).

What is happening?  We click on the recommended action to find out.

Even though the CPU usage of this application is reasonably low (purple line), Turbonomic detects substantial CPU throttling in the underlying container and recommends sizing up the CPU limit.  Throttling of 30% is high; nobody wants to be throttled, and anything beyond a single-digit throttling percentage could impact your application’s performance.  In this example, Turbonomic recommends sizing up the CPU limit from 100m to 243m.
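
In spec terms, the recommended action boils down to a limit change along these lines. Only the 100m and 243m values come from the example; the rest of the stanza is assumed for illustration:

# before
resources:
  limits:
    cpu: 100m    # 10ms of CPU time per 100ms period
# after (the recommended resize)
resources:
  limits:
    cpu: 243m    # roughly 24.3ms of CPU time per 100ms period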

How does Turbonomic arrive at that number? It analyzes your historical CPU throttling level alongside the CPU limit and the CPU usage, and then uses an empirically proven formula to recommend a new CPU limit that minimizes throttling while still complying with the quota limit in your namespace.

By trusting Turbonomic and executing the recommended actions to control throttling, we get the response time back to the healthy state the application deserves!

Summary

In conclusion, CPU throttling can introduce unnecessary latency even in a well-designed application and deployment architecture, while the remedy is often trivial!  With Application Performance Monitoring (APM) tools such as Instana and Application Resource Management (ARM) tooling such as Turbonomic, these issues can be surfaced and addressed, with a positive impact on your customer-facing and business-critical flows.


