How to fix the OOM killer crashes under Linux

Linux! Linux is great. Linux is Open Source. Any nerd wants to run Linux. But is any part of Linux really that great? This was a good question I wasn’t really able to answer until yesterday. Now I have mixed feelings but understanding the following problem better, gives even a bit more safety, also for my personal life.

During the whole last year I had a lot of situations when one of my virtual machines on the server died due to an OOM killer process. Those crashes were not predictable and happened randomly. Sometimes it didn’t happen for weeks but there were also situations when it crashed after 1 day again. Given that a good list of customers are hosting their websites on it, raises a lot of trouble for me. I did a lot of work in trying to fix particular running services on that host, but nothing helped to stop those crashes. Recently I have even doubled the memory for that machine but without success. It always ran into an out of memory crash.

Given all my former research and attempts to fix the problem, I wasn’t sure what else I could do. But thankfully I have found a website which has the explanation and even offered steps to solve the problem.

So what’s happened? The reason can be explained shortly: The Linux kernel likes to always allocate memory if applications asking for it. Per default it doesn’t really check if there is enough memory available. Given that behavior applications can allocate more memory as really is available. At some point it can definitely cause an out of memory situation. As result the OOM killer will be invoked and will kill that process:

Jun 11 11:35:21 vsrv03 kernel: [378878.356858] php-cgi invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Jun 11 11:36:11 vsrv03 kernel: [378878.356880] Pid: 8490, comm: php-cgi Not tainted 2.6.26-2-xen-amd64 #1

The downside of this action is that all other running processes are also affected. As result the complete VM didn’t work and needed a restart.

To fix this problem the behavior of the kernel has to be changed, so it will no longer overcommit the memory for application requests. Finally I have included those mentioned values into the /etc/sysctl.conf file, so they get automatically applied on start-up:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80

The results look good so far and I hope it will stay that way. The lesson I have learned is to not trust any default setting of the Linux kernel. It really can result in a crappy and unstable behavior.