What to do when Firefox crashes under test automation with Selenium

If you have the task to create automated tests for websites you will most likely make use of Selenium when it comes to testing UI interactions. To execute the tests for the various browsers out there each browser vendor offers a so called driver package  which has to be used by Selenium to run each of the commands. In case of Firefox this will be geckodriver.

Within the last months we got a couple of issues reported for geckodriver that Firefox sometimes crashes while the tests are running. This feedback is great, and we always appreciate because it helps us to make Firefox more stable and secure for our users. But to actually being able to fix the crash we would need some more data, which was a bit hard to retrieve in the past.

As first step I worked on the Firefox crash reporter support for geckodriver and we got it enabled in the 0.19.0 release. While this was fine and the crash reporter created minidump files for each of the crashes in the temporarily created user profile for Firefox, this data gets also removed together with the profile once the test has been finished. So copying the data out of the profile was impossible.

As of now I haven’t had the time to improve the user experience here, but I hope to be able to do it soon. The necessary work which already got started will be covered on bug 1433495. Once the patch on that bug has been landed and a new geckodriver version released, the environment variable “MINIDUMP_SAVE_PATH” can be used to specify a target location for the minidump files. Then geckodriver will automatically copy the files to this target folder before the user profile gets removed.

But until that happened a bit of manual work is necessary. Because I had to mention those steps a couple of time and I don’t want to repeat that in the near future again and again, I decided to put up a documentation in how to analyze the crash data, and how to send the data to us. The documentation can be found at:

https://firefox-source-docs.mozilla.org/testing/geckodriver/doc/geckodriver/CrashReports.html

I hope that helps!

How to fix the OOM killer crashes under Linux

Linux! Linux is great. Linux is Open Source. Any nerd wants to run Linux. But is any part of Linux really that great? This was a good question I wasn’t really able to answer until yesterday. Now I have mixed feelings but understanding the following problem better, gives even a bit more safety, also for my personal life.

During the whole last year I had a lot of situations when one of my virtual machines on the server died due to an OOM killer process. Those crashes were not predictable and happened randomly. Sometimes it didn’t happen for weeks but there were also situations when it crashed after 1 day again. Given that a good list of customers are hosting their websites on it, raises a lot of trouble for me. I did a lot of work in trying to fix particular running services on that host, but nothing helped to stop those crashes. Recently I have even doubled the memory for that machine but without success. It always ran into an out of memory crash.

Given all my former research and attempts to fix the problem, I wasn’t sure what else I could do. But thankfully I have found a website which has the explanation and even offered steps to solve the problem.

So what’s happened? The reason can be explained shortly: The Linux kernel likes to always allocate memory if applications asking for it. Per default it doesn’t really check if there is enough memory available. Given that behavior applications can allocate more memory as really is available. At some point it can definitely cause an out of memory situation. As result the OOM killer will be invoked and will kill that process:

Jun 11 11:35:21 vsrv03 kernel: [378878.356858] php-cgi invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Jun 11 11:36:11 vsrv03 kernel: [378878.356880] Pid: 8490, comm: php-cgi Not tainted 2.6.26-2-xen-amd64 #1

The downside of this action is that all other running processes are also affected. As result the complete VM didn’t work and needed a restart.

To fix this problem the behavior of the kernel has to be changed, so it will no longer overcommit the memory for application requests. Finally I have included those mentioned values into the /etc/sysctl.conf file, so they get automatically applied on start-up:

vm.overcommit_memory = 2
vm.overcommit_ratio = 80

The results look good so far and I hope it will stay that way. The lesson I have learned is to not trust any default setting of the Linux kernel. It really can result in a crappy and unstable behavior.