One of my Spring Boot projects was struggling with an overloaded CPU and slow, unresponsive behaviour whenever traffic spiked. I had explored and implemented caching, but my real problem was the sheer number of incoming connections making the server itself slow. I could have set up multiple instances with some kind of auto-scaling, but given a limited budget and hardware I wanted to put hard limits on how much traffic my Spring Boot app accepts and have it give up gracefully beyond that point (there is no shame in rejecting traffic with HTTP status 503 when the server infrastructure is overloaded).
I found a blog post from Netflix on how to tune Apache Tomcat, and another article on tuning the REST controller code itself to implement a rudimentary rate limiter. I was glad to find the RateLimiter implementation in the Google Guava library, which I ultimately ended up using (for now). The annotation-driven RateLimiter also looks like a very capable solution, and I will take it for a spin sometime in the near future.
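To give an idea of the approach, here is a minimal sketch of Guava's RateLimiter guarding a Spring REST controller. The controller name, endpoint path and permit rate are made up for illustration; the rate would need to be tuned to the measured latency of your own hot endpoint.

```java
import com.google.common.util.concurrent.RateLimiter;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ReportController {

    // Hypothetical limit: roughly 20 requests per second to this hot endpoint.
    private final RateLimiter rateLimiter = RateLimiter.create(20.0);

    @GetMapping("/api/report")
    public ResponseEntity<String> getReport() {
        // tryAcquire() returns immediately instead of blocking, so excess
        // requests are rejected with 503 rather than queued up.
        if (!rateLimiter.tryAcquire()) {
            return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                    .body("Server is busy, please retry later");
        }
        return ResponseEntity.ok(expensiveReport());
    }

    private String expensiveReport() {
        // Placeholder for the actual slow, expensive operation.
        return "report-data";
    }
}
```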
The basic lessons learnt from this exercise:
– Tweak Tomcat and keep an eye on the acceptCount parameter, which effectively caps how much traffic ever reaches your REST controllers (see the configuration sketch after this list).
– Use a RateLimiter on your hot APIs (the ones with higher latency) so your application cannot be abused beyond its limits.
– Scale horizontally if the limits above result in a lot of traffic getting rejected.
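For the Tomcat side, the relevant knobs can be set from application.properties. The values below are only illustrative, and the exact property names vary slightly between Spring Boot versions (newer releases use server.tomcat.threads.max in place of server.tomcat.max-threads):

```properties
# Worker threads that actually process requests (Tomcat default is 200).
server.tomcat.max-threads=200
# Maximum connections the server will accept and process at any one time.
server.tomcat.max-connections=2000
# Queue length for incoming connections once max-connections is reached;
# connections beyond this backlog are refused outright.
server.tomcat.accept-count=50
```

Keeping accept-count small is what lets the server fail fast under a traffic spike instead of building up a long queue of requests it will never serve in time.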