Why is your GKE cluster so slow?

Mauricio Junior
1 min read · Jan 24, 2021

Because you are not using the SSD disk type on your node pools.

Seriously. Low IOPS is the main reason you will so often get errors like:

Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

or

dial tcp 10.96.0.1:443: i/o timeout

It turns out IOPS is badly neglected when we learn about Kubernetes. Nearly everything Kubernetes does relies heavily on disk performance. Even networking ends up depending on disk performance. If pods and nodes are constantly autoscaling while the Kubernetes services, or your own services, run on a crappy standard disk (pd-standard), you will very likely see cluster disruption and messages like “Repairing cluster”. Why do they even offer non-SSD disks? As a desktop user, at least on Windows, it is nearly impossible to be happy without an SSD: booting takes forever and so does loading applications. Now imagine running your application, which relies heavily on disk even for networking, on a standard disk.
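To check whether this applies to your cluster, you can look at the disk type of each node pool. Here is a minimal Python sketch, assuming the gcloud CLI is installed and authenticated and that its JSON output exposes the disk type under config.diskType (as the GKE node pool resource does). The function names, cluster name, and zone are made up for illustration.

```python
import json
import subprocess


def pools_without_ssd(node_pools):
    """Return (name, disk_type) for every node pool whose disk is not pd-ssd."""
    flagged = []
    for pool in node_pools:
        # Assumption: the disk type lives under config.diskType.
        disk_type = pool.get("config", {}).get("diskType", "unknown")
        if disk_type != "pd-ssd":
            flagged.append((pool.get("name", "?"), disk_type))
    return flagged


def list_node_pools(cluster, zone):
    """Fetch the node pools of a cluster as parsed JSON via the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "container", "node-pools", "list",
         "--cluster", cluster, "--zone", zone, "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example usage (placeholder cluster name and zone):
# for name, disk_type in pools_without_ssd(list_node_pools("my-cluster", "us-central1-a")):
#     print(f"{name}: {disk_type}")
```

Any pool this prints is a candidate for the slowness described above.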

I kept adding VMs and pod replicas with no success. It turned out I had no SSD.
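One way to fix it is to create a new node pool backed by SSD and move your workloads onto it. The sketch below only builds the gcloud command, so you can inspect it before running anything. The helper name, pool name, cluster name, zone, and the 100 GB disk size are all assumptions for illustration, not recommendations.

```python
import subprocess


def ssd_node_pool_command(pool, cluster, zone, disk_size_gb=100):
    """Build the gcloud command that creates a node pool with pd-ssd disks."""
    return [
        "gcloud", "container", "node-pools", "create", pool,
        "--cluster", cluster,
        "--zone", zone,
        "--disk-type", "pd-ssd",           # instead of pd-standard
        "--disk-size", str(disk_size_gb),  # size in GB
    ]


# Example usage (placeholder names); uncomment the last line to actually run it:
# cmd = ssd_node_pool_command("ssd-pool", "my-cluster", "us-central1-a")
# print(" ".join(cmd))
# subprocess.run(cmd, check=True)
```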

Mauricio Junior

Another random dev with a lousy amount of random articles (literally one) that will eventually receive a clap.