Today I was running a load test of the application we run in AWS Fargate. The application started to fail at higher numbers of virtual users (VUs), returning this response:
time="2020-03-02T09:41:06Z" level=info msg="\"<html>\\r\\n<head><title>502 Bad Gateway</title></head>\\r\\n<body bgcolor=\\\"white\\\">\\r\\n<center><h1>502 Bad Gateway</h1></center>\\r\\n</body>\\r\\n</html>\\r\\n\""
The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer.
I decided to go with a different solution than the original article, though. I don't find nginx necessary for our use case, not to mention that the suggested image uses python:3.7 as its base Docker image. We are currently using python:3.7-slim, and switching to the full (non-slim) image would increase the final app image size.
That would cost us more money in bandwidth (fetching the image for every deploy) and increase the time the deploy takes.
Instead, I just increased the keep-alive timeout of our application server. The idle timeout configured at the load balancer is 60 seconds, so --keep-alive 65 sounded reasonable.
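The reasoning behind picking 65 for a 60-second idle timeout can be sketched as a simple check: the target should hold idle connections open strictly longer than the load balancer does, so that the load balancer is always the side that closes them. The constant and function names below are hypothetical and only illustrate the rule; they are not part of any AWS or server API.

```python
# Values taken from the text; the names are hypothetical.
LB_IDLE_TIMEOUT_SECONDS = 60  # idle timeout configured at the load balancer
APP_KEEP_ALIVE_SECONDS = 65   # value passed to --keep-alive


def keep_alive_is_safe(keep_alive: int, lb_idle_timeout: int) -> bool:
    """Return True when the target keeps idle connections open
    strictly longer than the load balancer's idle timeout.

    If the target closes first (or at the same moment), the load
    balancer may forward a request on a connection that is already
    being closed, which surfaces as a 502.
    """
    return keep_alive > lb_idle_timeout


if __name__ == "__main__":
    print(keep_alive_is_safe(APP_KEEP_ALIVE_SECONDS, LB_IDLE_TIMEOUT_SECONDS))
```

A check like this could run in a deploy script or a configuration test, so that a later change to either timeout does not silently reintroduce the problem.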
After deploying this quick fix, the load tests stopped failing, and the backend no longer returns 502 errors.