Recently I had a case with a web server farm where a random node went down every few minutes. I don’t mean they were rebooting; that happened only once or twice. Rather, they were slowing down so much that they practically stopped serving any requests and were being pulled out of the LVS cluster. The traffic was no different than usual, all the other elements of the system (e.g. databases, storage) worked perfectly fine, and no one had started a backup in the middle of the day, as sometimes happens… so what was going on?

First, let me describe the setup a bit. As I already mentioned, it involved web servers, each running Lighttpd to handle the requests coming in from the internet. Lighttpd was configured to serve only static content, such as images; requests for PHP files were passed via the proxy module to Apache listening on another TCP port.
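To make the setup concrete, here is a minimal sketch of such a Lighttpd configuration; the document root, backend host, and port are illustrative assumptions, not the original values:

    # Load mod_proxy so PHP requests can be forwarded to Apache
    server.modules += ( "mod_proxy" )

    # Lighttpd serves static files (images etc.) directly
    server.document-root = "/var/www/htdocs"        # assumed path

    # Anything ending in .php is proxied to Apache on a local port
    $HTTP["url"] =~ "\.php$" {
        proxy.server = ( "" => (
            ( "host" => "127.0.0.1", "port" => 8080 )   # assumed backend
        ) )
    }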

And so I started investigating the problem. As it turned out, the systems were slowing down because the Lighttpd process grew to a few gigabytes, eating all the memory and sending the system into heavy swapping. That usually means death to a busy on-line system. Initially I suspected some Lighttpd bug, as nothing else seemed wrong, but after a short while I remembered one thing that can cause exactly this behavior: when Lighttpd acts as a proxy, it buffers the entire response from the backend server before sending it to the client. And indeed, browsing the Apache access log, I found telltale entries appearing every few minutes.

Apparently a bug in the PHP code produced a loop with far too many iterations, and even though Apache and PHP handled it without much hassle, Lighttpd kept allocating memory to fit the entire response into its buffer, causing all that mess.
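The offending script is long gone, but a hypothetical reproduction of this class of bug looks like the sketch below: the loop condition is re-evaluated against a growing array, so the script keeps emitting output. Apache and PHP stream it out chunk by chunk without much strain, while a store-and-forward proxy in front has to buffer everything:

    <?php
    // Hypothetical reproduction, not the original code.
    $rows = range(1, 1000);
    for ($i = 0; $i < count($rows); $i++) {
        echo "row {$rows[$i]}\n";
        // Bug: each iteration appends an element, so count($rows)
        // stays ahead of $i and the loop never terminates.
        $rows[] = $rows[$i] + 1;
    }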

Although this was an extreme case, it is easy to imagine problems appearing with much smaller data passing through the Lighttpd proxy: for example, a PHP script that handles larger file downloads and receives many concurrent requests.
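To make that concrete, here is a sketch of such a download handler (the file path is an assumption). The script itself is frugal, streaming the file in small chunks, but a buffering proxy in front still accumulates the full response per request, so N concurrent downloads of a 100 MB file pin roughly N times 100 MB in the proxy:

    <?php
    // Sketch of a typical PHP download handler; the path is assumed.
    $file = '/data/downloads/archive.zip';

    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . filesize($file));

    $fh = fopen($file, 'rb');
    while (!feof($fh)) {
        echo fread($fh, 8192);   // 8 KB chunks keep PHP's own memory flat
        flush();
    }
    fclose($fh);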

11 Comments
Wai Keen Woon

The store-and-forward nature of lighttpd is good for getting data away from the backends as quickly as possible regardless of client speed, thus releasing them to handle other requests. However, without a limit on the buffer size, the situation you describe can happen. It has surprised me before, when I was proxying large files off a backend Squid server.

I know two reverse proxies that offer hybrid store-and-forward and cut-through forwarding with a configurable buffer size: nginx and Squid.

davies

I had the problem with nginx too, when downloading a big file (750 MB) while using nginx as a reverse proxy in front of Lighttpd.

citrin

nginx with its default config buffers the upstream response on disk in such a situation.
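For reference, these are the nginx directives that govern that on-disk spill; the sizes below are illustrative values, not a tuning recommendation:

    # Buffer upstream responses; once the in-memory buffers fill,
    # nginx spills the remainder to a temp file instead of growing RAM.
    proxy_buffering          on;
    proxy_buffer_size        8k;      # illustrative
    proxy_buffers            8 8k;    # illustrative
    proxy_max_temp_file_size 1024m;   # cap on the disk spill
    proxy_temp_path          /var/cache/nginx/proxy_temp;   # assumed path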

davies

Oh, I will check it again, thank you!

Dieselstation

Do you think this problem would exist if the setup were reversed? Meaning, let Apache be your primary server and have it proxy requests for static content to Lighttpd. In that case, Lighttpd would not consume as much memory and would do its job efficiently.

Julian

@Dieselstation:
Such a setup would be nonsense, because you would have to tie up a ‘big’ Apache process just to proxy static content that Lighttpd serves anyway. Just run both on different ports and do no proxying.

Michael Monashev

nginx forever! 🙂

Peter Zaitsev

I spoke to Jan about this case a while back, and he was going to add on-disk buffering for large responses, though I’m not sure whether that made it into lighty 1.5 or is waiting for the next version.

Dieselstation

Julian & Maciej,

Considering your input, I will change my server setup and see how it behaves over a period of a week. Dieselstation serves high-res wallpapers, so it probably makes sense to serve the images directly through Lighttpd instead of proxying the requests from Apache to Lighttpd.

Dieter@be

> The store-and-forward nature of lighttpd is good for getting data away from the backends as quickly as possible regardless of client speed, thus releasing them to handle other requests.

Are you sure of this? A while back I also assumed such logic and implemented nginx proxying in front of an Apache web farm (~25 web servers). To my surprise, however, it didn’t help at all: the Apache servers were as loaded as before. Later, a more experienced colleague explained to me that the Linux kernel automatically tunes the size of the outgoing TCP buffers so that they are big enough to hold most outgoing responses entirely, yet small enough not to waste too much RAM.
So the result is that when Apache has processed an HTTP request and generated the response, it can put the response in the TCP buffer and immediately start handling a new request. In other words, if the outgoing response fits entirely in the buffer, the kernel holds the data for as long as the client takes to fetch it.

Dieter
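The kernel autotuning Dieter describes is controlled by net.ipv4.tcp_wmem, which sets the minimum, default, and maximum per-socket send-buffer sizes in bytes; the values below are common example settings, not a recommendation:

    # /etc/sysctl.conf (illustrative values)
    # min / default / max TCP send buffer per socket, in bytes;
    # the kernel autotunes each socket between min and max.
    net.ipv4.tcp_wmem = 4096 16384 4194304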