EC2 High CPU Wait and the EBS Provisioned IOPS Difference

On our small Graphite monitoring server (a c4.large), we kept seeing very high CPU I/O wait, hovering around 80% almost continuously. Scrubbing back through the graph timeline to where it started showed no major changes on that day (like adding a fleet of new servers or extra data points), so I was fairly confident it was not an application issue.

Turns out the issue was that our server was running on a General Purpose SSD (gp2) volume and was hitting its maximum input/output operations per second (IOPS) limit, so I/O requests were being queued and throttled.
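For context, a gp2 volume's baseline IOPS scales with its size: 3 IOPS per GiB, with a floor of 100 and a burst ceiling of 3,000 (the limits at the time of writing). A minimal shell sketch of that formula:

```shell
# Rough gp2 baseline IOPS calculator.
# Assumption: 3 IOPS per GiB, floor of 100, capped at 3,000.
gp2_baseline_iops() {
  local size_gib=$1
  local baseline=$(( size_gib * 3 ))
  (( baseline < 100 ))  && baseline=100
  (( baseline > 3000 )) && baseline=3000
  echo "$baseline"
}

gp2_baseline_iops 20    # small root volume: floor applies -> 100
gp2_baseline_iops 500   # 500 GiB -> 1500
```

A small volume like ours sits at the 100 IOPS floor, which explains how a modest monitoring workload can saturate it.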

So the fix is to switch over to a Provisioned IOPS SSD, but first we need to determine how many IOPS we actually need.

Within your AWS console, under Elastic Block Store (EBS), you can monitor each volume's performance. We'll use the Read Throughput and Write Throughput metrics to determine our IOPS need.

For example, in the graph above the Read Throughput rounds up to 1 op/sec and the Write Throughput maxes out at around 100 ops/sec. In theory we could set the IOPS to 101 and be "ok", but we'll want some head room, so round up to at least 200, or even 300 to be safe.
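If you'd rather pull the numbers than eyeball the graph, the underlying CloudWatch metrics (`VolumeReadOps` / `VolumeWriteOps`, fetchable with `aws cloudwatch get-metric-statistics`) report a Sum of operations per period, so average IOPS is just that sum divided by the period length. A minimal sketch of the conversion, with made-up sample numbers:

```shell
# Convert a CloudWatch EBS ops Sum into average IOPS.
# VolumeReadOps/VolumeWriteOps give a Sum per period (default 300s),
# so: average IOPS = Sum / period-in-seconds.
ops_to_iops() {
  local ops_sum=$1 period_secs=$2
  echo $(( ops_sum / period_secs ))
}

# e.g. 30,000 write ops over a 300-second (5-minute) period:
ops_to_iops 30000 300   # -> 100 average IOPS
```

Run that against your peak 5-minute window, add head room, and you have your provisioned figure.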

After changing our volume to a Provisioned IOPS SSD, we saw an immediate difference in the CPU wait.

To convert an existing General Purpose SSD (or Magnetic) volume to a Provisioned IOPS volume, you'll need to complete a few steps:

  1. Stop the EC2 instance (optional, but prevents data loss)
  2. Create a snapshot of the volume you're changing
  3. Create a new Provisioned IOPS volume from the snapshot
  4. Detach the old non-provisioned volume from the EC2 instance
  5. Attach the new provisioned volume
  6. Start the EC2 instance
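The steps above can be sketched with the AWS CLI. All of the IDs, the availability zone, and the device name below are hypothetical placeholders; substitute your own:

```shell
# 1. Stop the instance (optional, but avoids writes mid-swap)
aws ec2 stop-instances --instance-ids i-0abc123

# 2. Snapshot the existing gp2 volume
aws ec2 create-snapshot --volume-id vol-0abc123 \
  --description "pre-Provisioned-IOPS migration"

# 3. Create a Provisioned IOPS (io1) volume from that snapshot,
#    in the same AZ as the instance
aws ec2 create-volume --snapshot-id snap-0abc123 \
  --volume-type io1 --iops 200 --availability-zone us-east-1a

# 4/5. Swap the volumes (the device name must match the old attachment)
aws ec2 detach-volume --volume-id vol-0abc123
aws ec2 attach-volume --volume-id vol-0def456 \
  --instance-id i-0abc123 --device /dev/sda1

# 6. Start the instance back up
aws ec2 start-instances --instance-ids i-0abc123
```

Wait for the snapshot and the new volume to reach the `completed`/`available` states before moving on to the next command.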
