GKE WebSocket timeouts in a chat / voice web server running in production

Lee Boonstra
3 min read · Nov 30, 2021

I typically don’t write blogs about GKE and infrastructure stuff; I’m not an expert in this space, and I’m sure lots of people can do a better job than me. I’m a conversational engineer. So with this disclaimer said out loud, I do want to share with you the pain I struggled with while deploying my chat / Twilio web server to production, and more importantly, if you are facing similar issues, how you can solve them!

My chat server makes use of WebSockets. It’s a Node.js project, running the Express framework with Express-WS as the WebSockets module. I containerized it as a back-end container, and I also have a web container, with an Angular website, that talks to it.
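For context, the WebSocket part of the server looks roughly like this. This is a minimal sketch rather than my production code, and the /stream route name is just for illustration:

// Minimal Express + Express-WS server (illustrative sketch).
const express = require('express');
const app = express();

// Patch the Express app with WebSocket support before defining routes.
require('express-ws')(app);

// A WebSocket route; the path is just an example.
app.ws('/stream', (ws, req) => {
  ws.on('message', (msg) => {
    // Echo messages back; the real server streams chat / audio data.
    ws.send(msg);
  });
});

app.listen(8080);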

It works perfectly! ...on my machine.

I deployed it to Google Cloud with GKE. I could have chosen Google Cloud App Engine Flex or Compute Engine, which also work for WebSocket-heavy applications, but based on my container architecture, my deployment scripts, and the granularity I wanted, I chose Kubernetes.

When I deployed the text chat server and tested my application in production, it seemed to work fine. However, when I looked in my browser’s network tab, or in my Google Cloud logs, I saw that all my WebSockets were getting killed, a bunch of times per second.

Hmm, well, does that matter?

It seems it does, especially once I deployed my virtual agent in a call center using Twilio. I noticed that each phone call would automatically hang up after a certain number of seconds. Digging into the logs, after first re-reading my Twilio code millions of times (assuming I had made a programming mistake, because working with streams is just hard), I figured out that it had to do with the WebSocket timeouts.

More specifically: Google Cloud HTTP(S) load balancers do support WebSockets, but they are not configured for long-lived connections by default. The backend service timeout defaults to 30 seconds, and it applies to WebSocket connections too, so any socket that stays open longer than that gets closed.
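If you want to confirm this on your own cluster, the backend service that GKE creates for the load balancer carries this timeout, and you can inspect it with gcloud. The backend service name is auto-generated, so list first; the placeholder below is not from my setup:

# List the auto-generated backend services behind the load balancer.
gcloud compute backend-services list

# Inspect the timeout of a specific backend service (default: 30 seconds).
gcloud compute backend-services describe <BACKEND_SERVICE_NAME> \
    --global --format="value(timeoutSec)"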

I figured out that you can create a BackendConfig that updates these timeouts. So I created a YAML file that raises the timeout to 30 minutes (1800 seconds) per socket, which should be long enough for my phone calls.

backendconfig.yaml

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: ccai-backendconfig
spec:
  timeoutSec: 1800
  connectionDraining:
    drainingTimeoutSec: 1800

Then I needed to update my services YAML file, which exposes the containers, to make use of my BackendConfig. In each Service I added this annotation (see the annotation lines in the file below): cloud.google.com/backend-config: '{"default": "ccai-backendconfig"}'

Here is my complete services.yaml:

apiVersion: v1
kind: Service
metadata:
  name: chatserver-service
  annotations:
    cloud.google.com/app-protocols: '{"my-https-port":"HTTPS","my-http-port":"HTTP"}'
    cloud.google.com/backend-config: '{"default": "ccai-backendconfig"}'
spec:
  type: NodePort
  selector:
    app: chatserver
  ports:
  - name: my-http-port
    port: 8080
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
  annotations:
    cloud.google.com/backend-config: '{"default": "ccai-backendconfig"}'
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - name: my-http-port
    port: 80
    targetPort: 80
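One note: a BackendConfig takes effect when a GKE Ingress routes traffic to the annotated Service, which is why chatserver-service is a NodePort. The Ingress itself isn't shown above, but a minimal sketch could look like this (names and paths are illustrative, not from my actual setup):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chatserver-ingress
spec:
  defaultBackend:
    service:
      name: web-service
      port:
        number: 80
  rules:
  - http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: chatserver-service
            port:
              number: 8080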

Then I applied both YAML files through the console:

kubectl apply -f backendconfig.yaml
kubectl apply -f services.yaml
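BackendConfig is a custom resource, so you can sanity-check that it exists and that the Services picked up the annotation:

# Confirm the BackendConfig resource exists in the cluster.
kubectl get backendconfig ccai-backendconfig -o yaml

# Confirm the Service carries the backend-config annotation.
kubectl describe service chatserver-service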

After a short wait, the services were deployed. And there we go: after testing the text chat app, I immediately noticed that the flood of disconnect logs was gone! I started calling my virtual agent over the phone, and guess what, the virtual agent didn’t hang up on me! Phew! That saved my day. If you found this blog article through Google Search, then hopefully I solved your problem too!


Lee Boonstra

I’m a Software Engineer Tech Lead at Google. Focusing on Conversational AI & LLMs. Published O’Reilly & Apress author. Twitter: @ladysign