Posted by Maciej Dobrzanski |

Recently I had a case with a web server farm where a random node went down every few minutes. I don’t mean any of them rebooted except once or twice, but rather they were slowing down so much that practically stopped serving any requests and were being pulled out from the LVS cluster. The traffic was not any different than usual, all other elements of the system worked perfectly fine (e.g. databases, storage), no one started any backup in the middle of the day as it happens sometimes… so what was happening?
[read more...]