Linux! Linux is great. Linux is open source. Every nerd wants to run Linux. But is every part of Linux really that great? That was a question I wasn’t able to answer until yesterday. Now I have mixed feelings, but understanding the following problem better gives me a bit more peace of mind, in my personal life too.
During the last year I ran into many situations where one of the virtual machines on my server died at the hands of the OOM killer. The crashes were unpredictable and happened randomly: sometimes nothing happened for weeks, sometimes it crashed again after a single day. Since a good number of customers host their websites on that machine, this caused a lot of trouble for me. I put a lot of work into fixing individual services running on that host, but nothing stopped the crashes. Recently I even doubled the machine’s memory, without success. It still ran into out-of-memory crashes.
Given all my earlier research and attempts to fix the problem, I wasn’t sure what else I could do. But thankfully I found a website that explained the problem and even offered steps to solve it.
So what happened? The reason can be explained briefly: the Linux kernel likes to grant memory whenever applications ask for it. By default it doesn’t really check whether that much memory is actually available. Because of this behavior, applications can allocate more memory than really exists. At some point this causes an out-of-memory situation; the OOM killer is then invoked and kills a process:
Jun 11 11:35:21 vsrv03 kernel: [378878.356858] php-cgi invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Jun 11 11:36:11 vsrv03 kernel: [378878.356880] Pid: 8490, comm: php-cgi Not tainted 2.6.26-2-xen-amd64 #1
The downside of this action is that all other running processes are affected as well. As a result the complete VM stopped working and needed a restart.
To fix this problem the kernel’s behavior has to be changed so that it no longer overcommits memory for application requests. I finally put the following values into the /etc/sysctl.conf file, so they are applied automatically on start-up:
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
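To check that the settings are active, something like the following works (a minimal sketch; after editing /etc/sysctl.conf you would reload the file as root):

```shell
# Reload /etc/sysctl.conf (needs root), shown here for reference:
#   sysctl -p

# Verify the values the kernel is actually using:
cat /proc/sys/vm/overcommit_memory   # 2 once the change is applied
cat /proc/sys/vm/overcommit_ratio    # 80 once the change is applied

# CommitLimit is the resulting hard ceiling; Committed_AS is what
# the kernel has currently promised to all processes combined:
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

If Committed_AS regularly approaches CommitLimit, processes will start to see failed allocations instead of triggering the OOM killer later.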
The results look good so far and I hope it will stay that way. The lesson I have learned: don’t blindly trust the default settings of the Linux kernel. They really can result in flaky and unstable behavior.
[…] To fix OOM killer crashes, put the following in /etc/sysctl.conf (original article here): […]
This might help Phusion Passenger users too… oh wait, maybe they can just set up a monitor to kill large Ruby processes, too?
Well, all the stuff above can be used to prevent a server crash. As it turned out, the underlying issue here was a WordPress comment plugin which didn’t play well with php-cgi. There were a dozen dead processes hanging around.
I am having a similar problem. Which plugin was causing the problem?
As I said above, it was a comment plugin; I don’t remember the name anymore. But whenever you have high memory usage while Linux overcommits, check the ps output for suspicious processes. Then again, you could be suffering from a different issue entirely. It’s hard to tell from here.
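For what it’s worth, here is a rough sketch of the kind of ps check I mean (assuming the usual procps ps shipped by most distributions):

```shell
# Ten processes with the largest resident memory, biggest first
# (plus the header line):
ps aux --sort=-rss | head -n 11

# Defunct ("zombie") processes hanging around, by PID and command
# (the STAT column contains Z for zombies):
ps aux | awk '$8 ~ /Z/ { print $2, $11 }'
```

A pile of identical php-cgi entries near the top of the first list, or in the second one, would be the kind of suspicious pattern to look for.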
Nice one Henrik – I was pulling my hair out over this f&*^^%"ing behaviour. Hopefully this will sort out my own issue (I’m a WordPress user too).
You could put something like this in cron:
passenger-memory-stats | perl -ne 'if (/(\d+)\s+\d+\.\d\sMB\s+(\d+\.\d)\sMB\s+Rails:.*(app1|app2)/) { if ($2 > 180) { print "$1\t$2\t$3\n"; system("kill $1"); } }'
[…] I found a solution from another blog post whose author was having the exact same problem I had. The fix is to […]
Here’s some further documentation on those kernel settings:
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
Hi. Thanks for your advice, it helped me a lot. My VPS even runs much faster now. But I have a question: how much RAM does your VPS have? I have only 2 GB, so with vm.overcommit_ratio = 80 many services don’t run. I changed it to 1000 and everything seems to be fine.
I haven’t run this server for nearly 2 years now, so I cannot give an answer to that. But AFAIR it was 4 GB.
Having the ratio set to 1000 doesn’t make sense. The ratio is a percentage, so 1000% is not correct; setting it to 80% allows 80% of the memory to be committed, which should work.
The 1000 does make sense: it just allows ten times the physical memory to be allocated as virtual memory. It actually works wonders when you only have 2 GB of physical RAM.
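The two views above can be checked against the kernel’s formula, CommitLimit = swap + RAM × overcommit_ratio / 100. A small sketch for a 2 GB box, assuming no swap (the numbers are illustrative):

```shell
ram_kib=$((2 * 1024 * 1024))   # 2 GiB of RAM, in KiB
swap_kib=0                     # assume no swap configured

# ratio 80: the commit limit ends up *below* physical RAM
echo $(( swap_kib + ram_kib * 80 / 100 ))    # 1677721 KiB, about 1.6 GiB

# ratio 1000: up to ten times physical RAM may be committed
echo $(( swap_kib + ram_kib * 1000 / 100 ))  # 20971520 KiB, exactly 20 GiB
```

So with only 2 GB and no swap, ratio 80 lets the kernel promise less than physical memory in total, which explains why services failed to start and why raising the ratio helped.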
Why do you set overcommit_ratio to 80? If I understand the documentation right, that means 20% of the physical RAM won’t be used at all, because the limit is calculated as SS + physical mem × (overcommit_ratio / 100), where SS is the swap space.
I wrote this blog post 7 years ago, so I cannot really say why exactly that value was necessary; it at least fixed my problem. A lot has changed in the meantime, and the steps laid out here might no longer apply.
Ok, thank you! 🙂
[…] One of my GCP VMs runs as the f1-micro type, which only has 614 MB of memory. In syslog I found messages showing PHP had invoked the OOM killer. This usually means the server essentially ran out of memory and extra memory should be added to the system. Based on some posts online, especially this post from 2010: […]