How to fix OOM killer crashes under Linux

Linux! Linux is great. Linux is Open Source. Every nerd wants to run Linux. But is every part of Linux really that great? That was a question I wasn't really able to answer until yesterday. Now I have mixed feelings, but understanding the following problem better gives me a bit more confidence, also in my personal life.

Throughout the last year, one of the virtual machines on my server repeatedly died because the OOM killer was invoked. Those crashes were unpredictable and happened at random: sometimes the machine ran fine for weeks, and sometimes it crashed again after a single day. Given that quite a few customers host their websites on it, this caused me a lot of trouble. I put a lot of work into fixing individual services running on that host, but nothing stopped the crashes. Recently I even doubled the memory of that machine, without success. It always ended up in an out-of-memory crash.

After all my earlier research and attempts to fix the problem, I wasn't sure what else I could do. Thankfully, I then found a website which explained the cause and even offered steps to solve the problem.

So what happened? The explanation is short: by default, the Linux kernel grants memory whenever applications ask for it, without strictly checking whether enough memory is actually available (this is called memory overcommit). Because of that behavior, applications can allocate more memory than really exists. Once they start using it, the system can run out of memory, and at that point the OOM killer is invoked and kills a process to free memory:

Jun 11 11:35:21 vsrv03 kernel: [378878.356858] php-cgi invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Jun 11 11:36:11 vsrv03 kernel: [378878.356880] Pid: 8490, comm: php-cgi Not tainted 2.6.26-2-xen-amd64 #1

The downside is that the other running processes are affected as well. As a result, the whole VM stopped working and needed a restart.

To fix this problem, the kernel's behavior has to be changed so that it no longer overcommits memory for application requests. I added the values suggested on that website to the /etc/sysctl.conf file, so they are applied automatically at start-up:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80
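To make these two values less abstract: with vm.overcommit_memory = 2 the kernel switches to strict accounting and fails any allocation that would push the total committed memory above a commit limit, which the kernel documentation gives as (RAM minus huge pages) * vm.overcommit_ratio / 100 plus swap. The Python sketch below estimates that limit from /proc/meminfo-style text. The function names and the sample numbers are hypothetical, and the kernel already reports its own value as CommitLimit in /proc/meminfo. Note that in this mode allocations beyond the limit simply fail instead of triggering the OOM killer, so applications have to cope with failed allocations.

```python
"""Estimate the strict-overcommit commit limit (vm.overcommit_memory = 2).

A sketch only: the kernel reports the real value as CommitLimit in
/proc/meminfo. Huge pages are taken as HugePages_Total * Hugepagesize.
"""
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # kB for sizes, plain count otherwise
    return values


def commit_limit_kb(meminfo: dict[str, int], overcommit_ratio: int) -> int:
    """CommitLimit = (RAM - huge pages) * ratio / 100 + swap, all in kB."""
    ram_kb = meminfo["MemTotal"]
    huge_kb = meminfo.get("HugePages_Total", 0) * meminfo.get("Hugepagesize", 0)
    swap_kb = meminfo.get("SwapTotal", 0)
    return (ram_kb - huge_kb) * overcommit_ratio // 100 + swap_kb


if __name__ == "__main__":
    # Hypothetical VM: 2 GiB RAM, 1 GiB swap, no huge pages.
    sample = """\
MemTotal:        2097152 kB
SwapTotal:       1048576 kB
HugePages_Total:       0
Hugepagesize:       2048 kB
"""
    info = parse_meminfo(sample)
    limit = commit_limit_kb(info, overcommit_ratio=80)
    print(f"CommitLimit ~ {limit} kB")
```

Because the kernel does its arithmetic in pages, its CommitLimit can differ slightly from this kB-based estimate.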

The results look good so far, and I hope it stays that way. The lesson I have learned: don't blindly trust the default settings of the Linux kernel. They can really result in crappy and unstable behavior.

Results of the second Add-ons Manager testday

Last Friday, June 11th, we held our second testday for exploratory testing of the new Add-ons Manager. It was again well attended, and we had a couple of fantastic discussions throughout the day. If you weren't able to attend but are interested in the details, you can read through the chat transcript.

As with our last testday at the end of April, I continued my idea of having the testday cover more than just the PDT timezone. So this time it again started at 8am UTC and ended at 5pm PDT. At the beginning there was less activity, but after lunchtime in Europe more and more people joined and took part in the discussions. The most active people in the channel were aaronmt, aleksej, dark_skeleton, gabe2300, kbrosnan, kinger, mossop, smaug, tchung, tobbi, tonymec, unfocused, wx24, and myself.

In the end we identified 11 new bugs, far fewer than last time. I think that speaks for the work which has gone into this area over the last one and a half months.

We thank everyone who participated in the testday and made it a success again. With the new themes arriving in the near future, another testday will need to be scheduled. Stay tuned and check back regularly for updates.

Second Add-ons Manager redesign testday this Friday

This Friday, June 11th, Mozilla QA is holding its second testday for the new Add-ons Manager in Firefox 4.0. Given the fantastic results of the last testday, we hope for a similar turnout this time.

We will concentrate mostly on exploratory testing of the complete user interface and the back-end. You will find detailed information on our testday event page.

Please join us and help make sure Firefox 4 gets the best Add-ons Manager ever.

Jsbridge 3.5.6 and Mozrunner 2.4.3 available

Within the last two weeks we had to push two maintenance releases, for jsbridge and mozrunner, which fix two major issues:

Bug 570790: Due to a broken PyPI package of jsbridge, the extension didn't work properly on all platforms. At least on Windows, files which had accidentally been added to the tar.gz archive caused an error when loading files from the extension's components and chrome folders. We have removed all of those files to make sure the extension works again.

Bug 568839: Installing binary extensions with Mozmill was broken in mozrunner 2.4.2. All files extracted from the XPI were handled as ASCII (text) files. With this fix, Mozmill is now able to install any kind of extension successfully.
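For anyone curious what "handled as ASCII files" means in practice, here is a minimal Python sketch, not mozrunner's actual code, of extracting an XPI (which is a ZIP archive) so every member is copied byte for byte. The function name extract_xpi is hypothetical. Reading from the archive and writing with mode "wb" avoids any decoding or line-ending translation, which is what corrupts binary components such as shared libraries when files are written in text mode, especially on Windows.

```python
from __future__ import annotations

import os
import shutil
import zipfile


def extract_xpi(xpi_path: str, target_dir: str) -> list[str]:
    """Extract every member of an XPI (a ZIP archive) as raw bytes.

    Returns the paths of the extracted files (directories are not listed).
    """
    extracted = []
    target_root = os.path.realpath(target_dir)
    with zipfile.ZipFile(xpi_path) as xpi:
        for info in xpi.infolist():
            dest = os.path.realpath(os.path.join(target_root, info.filename))
            # Refuse entries such as "../x" that would escape the target directory.
            if not dest.startswith(target_root + os.sep):
                raise ValueError(f"unsafe path in XPI: {info.filename}")
            if info.is_dir():
                os.makedirs(dest, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            # Binary on both ends: no text decoding, no newline translation.
            with xpi.open(info) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(dest)
    return extracted
```

Calling extract_xpi("my-extension.xpi", "profile/extensions/my-extension") would unpack the archive with every file, text or binary, identical to its stored bytes.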

If you still notice any problems, please get in touch with us.