Tag Archives: jenkins

Review of automation work – Q4 2015

The last quarter of 2015 is gone and its time to reflect what happened in Q4. In the following you will find a full overview again for the whole quarter. It will be the last time that I will do that. From now on I will post in shorter intervals to specific topics instead of covering everything. This was actually a wish from our latest automation survey which I want to implement now. I hope you will like it.

So during the last quarter my focus was completely on getting our firefox-ui-tests moved into mozilla-central, and to use mozharness to execute firefox-ui-tests in mozmill-ci via the test archive. As result I had lesser time for any other project. So lets give some details…

Firefox UI Tests / Mozharness

One thing you really want to have with tests located in the tree is that those are not failing. So I spent a good amount of time to fix our top failures and all those regressions as caused by UI changes (like the security center) in Firefox as preparation for the move. I got them all green and try my best to keep that state now while we are in the transition.

The next thing was to clean-up the repository and split apart all the different sub folders into their own package. With that others could e.g. depend on our firefox-puppeteer package for their own tests. The whole work of refactoring has been done on bug 1232967. If you wonder why this bug is not closed yet it’s because we still have to wait with the landing of the patch until mozmill-ci production uses the new mozharness code. This will hopefully happen soon and only wait of some other bugs to be fixed.

But based on those created packages we were able to use exactly that code to get our harness, puppeteer, and tests landed on http://hg.mozilla.org/mozilla-central. We also package them into the common.tests.zip archive for use in mozmill-ci. Details about all that can be found on bug 1212609. But please be aware that we still use the Github repository as integration repository. I regularly mirror the code to hg, which has to happen until we can also use the test package for localized builds and update tests.

Beside all that there were also a couple of mozharness fixes necessary. So I implemented a better fetching of the tooltool script, added the uninstall feature, and also setup the handling of crash symbols for firefox-ui-tests. Finally the addition of test package support finished up my work on mozharness for Q4 in 2015.

During all the time I was also sheriffing our test results on Treeherder (e.g. mozilla-central) because we are still Tier-3 level and sheriffs don’t care about it.

Mozmill CI

Our Jenkins based CI system is still called mozmill-ci even it doesn’t really run any mozmill tests anymore. We decided to not change its name given that it will only be around this year until we can run all of our tests in TaskCluster. But lots of changes have been landed, which I want to announce below:

  • We followed Release Engineering and got rid of the OS X 10.8 testing. That means the used Mac minis were ready to get re-imaged with OS X 10.11. The transition worked seamlessly.
  • Enhancements for test report submission to Treeherder by switching over to Hawk credentials and more understandable group names and symbols.
  • Preparation of all the machines of our supported platforms (OSX: 10.6, 10.9, 10.10, 10.11 / Ubuntu: 14.04 / Windows: XP, 7, 8.1) to be able to handle mozharness driven tests.
  • Implemented and updated scripts to run firefox-ui-tests via mozharness and to allow Treeherder to fully parse our logs.

Addons / Tools

I also had some time to work on supporting tools. Together with the help of contributors we got the following done:

mozdownload

  • Release of mozdownload 1.18.1 and 1.19 to cope with the move of builds to AWS

Nightly Tester Tools

Memchaser

So all in all it was a productive quarter with lots of things accomplished. I’m glad that we got all of this done. Now in Q1 it will continue and more interesting work is in-front of me, which I’m excited about. I will announce that soon in my next blog post.

Until then I would like to give a little more insight into our current core team for Firefox automation. A picture taken during our all hands work week in Orlando early in December shows Syd, Maja, myself, and David:

Group Picture

Lets get started into 2016 with lots of ideas, discussions, and enough energy to get those things done.

Firefox Automation report – Q3 2015

It’s time for another Firefox Automation report! It’s incredible how fast a quarter passes by without that I have time to write reports more often. Hopefully it will change soon – news will be posted in a follow-up blog post.

Ok, so what happened last quarter for our projects.

Mozharness

One of my deliverables in Q3 was to create mozharness scripts for our various tests in the firefox-ui-tests repository, so that our custom runner scripts can be replaced. This gives us a way more stable system and additional features like crash report handling, which are necessary to reach the tier 2 level in Treeherder.

After some refactoring of the firefox ui tests, scripts for the functional and update tests were needed. But before those could be implemented I had to spent some time in refactoring some modules of mozharness to make them better configurable for non-buildbot jobs. All that worked pretty fine and finally the entry scripts have been written. Something special for them is that they even have different customers, so extra configuration files had to be placed. In detail it’s us who run the tests in Jenkins for nightly builds, and partly for release builds. On the other side Release Engineering want to run our update tests on their own hardware when releases have to be tested.

By the end of September all work has been finished. If you are interested in more details feel free to check the tracking bug 1192369.

Mozmill-CI

Our Jenkins instance got lots of updates for various new features and necessary changes. All in all I pushed 27 commits which affected 53 files.

Here a list of the major changes:

  • Refactoring of the test jobs has been started so that those can be used for mozharness driven firefox-ui-tests later in Q4. The work has not been finished and will be continued in Q4. Especially the refactoring for report submission to Treeherder even for aborted builds will be a large change.

  • A lot of time had to be spent in fixing the update tests for all the changes which were coming in with the Funsize project of Release Engineering. Due to missing properties in the Mozilla Pulse messages update tests could no longer be triggered for nightly builds. Therefore the handling of Pulse messages has been completely rewritten to allow the handling of similar Pulse messages as sent out from TaskCluster. That work was actually not planned and has been stolen me quite some time from other projects.

  • A separation of functional and remote tests didn’t make that much sense. Especially because both types are actually functional tests. As result they have been merged together into the functional tests. You can still run remote tests only by using --tag remote; similar for tests with local testcases by using `–tag local.

  • We stopped running tests for mozilla-esr31 builds due to Firefox ESR31 is no longer supported.

  • To lower the amount of machines we have to maintain and to getting closer what’s being run on Buildbot, we stopped running tests on Ubuntu 14.10. Means we only run on Ubuntu LTS releases from now on. Also we stopped tests for OS X 10.8. The nodes will be re-used for OS X 10.11 once released.

  • We experienced Java crashes due to low memory conditions of our Jenkins production master again. This was kinda critical because the server is not getting restarted automatically. After some investigation I assumed that the problem is due to the 32bit architecture of the VM. Given that it has 8GB of memory a 64bit version of Ubuntu should have been better used. So we replaced the machine and so far everything looks fine.

  • Totally surprising we had to release once more a bugfix release of Mozmill. This time the framework didn’t work at all due to the enforcement of add-on signing. So Mozmill 2.0.10.2 has been released.

Firefox Automation report – Q2 2015

It’s been a while since I wrote my last Firefox automation report, so lets do another one to wrap up everything happened in Q2 this year. As you may remember from my last report the team has been cut down to only myself, and beside that I was away the whole April. Means only 2 months worth of work will be covered this time.

In general it was a tough quarter for me. Working alone and having to maintain all of our infrastructure and keeping both of our tests (Mozmill and Marionette) green was more work as expected. But it still gave me enough time to finish my two deliverables:

  • Ensure better visibility of Firefox UI test results for sheriffs and developers by uploading them to treeherder
  • Finalize features for Firefox UI update tests, and help to get them running on RelEng hardware

Firefox UI Test Results on Treeherder

With the transition of Mozmill tests to Marionette we no longer needed the mozmill-dashboard. Especially not since nearly no job in our Mozmill CI is running Mozmill anymore. Given that we also weren’t happy with the dashboard solution and the usage of couchdb under the hood, I was happy that a great replacement exist and our Marionette tests can make use of.

Treeherder is the new reporting system for tests being run in buildbot continuous integration. Because of that it covers all products and their appropriate supported branches. That’s why it it’s kinda perfect for us.

Before I was able to get started a bit of investigation had to be done, and I also checked some other projects which make use of treeherder reporting. That information was kinda helpful even with lots of undocumented code. Given that our tests are located outside of the development tree in its own Github repository some integration can be handled more loosely compared to in-tree tests. With that respect we do not appear as Tier-1 or Tier-2 but results are listed under the Tier-3 level, which is hidden by default. But that’s fine given that our goal was to bring up reports to treeherder first. Going into details will be a follow-up task.

For reporting results to Treeherder the Python package treeherder-client can be used. It’s a collection of different classes which help with authentication, collecting job details, and finally uploading the data to the server. It’s documentation can be found on readthedocs and meanwhile contains a good number of example code which makes it easier to get your code implemented.

Before you can actually upload reports the names and symbols of the groups and jobs have to be defined. For our tests I have chosen the three group symbols “Fu”, “Ff”, and “Fr”. Each of those stays for “(F)irefox UI Tests” and the first letter of the testrun name. We currently have “updates”, “functional”, and “remote”. As job name the locale of Firefox will be used. That results in an output like the following:

Treeherder Results

Whether jobs are passing or failing it is recommended to always add the log files as generated by running the tests to the report. It will not happen via data URLs but as artifacts with a link to an upload location. In our case we make use of Amazon S3 as storage service.

With all the pieces implemented we can now report to treeherder and covering Nightly, Aurora (Developer Edition), Beta, and Release builds. As of now reporting only happens against the staging instance of Treeherder, but soon we will be able to report to production. If you want to have a sneak peak how it works, just follow this link for Nightly builds.

More details about Treeherder reporting I will do later this quarter when the next pieces have been implemented.

Firefox UI Update Tests under RelEng

My second deliverable was to assist Armen Zambrano in getting the Firefox UI Update tests run for beta and release builds on Release Engineering infrastructure. This was a kinda important goal for us given that until now it was a manually triggered process with lots of human errors on our own infrastructure. That means lots of failures if you do not correctly setup the configuration for the tests, and a slower processing of builds due to our limited available infrastructure. So moving this out of our area made total sense.

Given that Armen had already done a fair amount of work when I came back from my PTO, I majorly fixed issues for the tests and the libraries as pointed out by him. All that allowed us to finally run our tests on Release Engineering infrastructure even with a couple of failures at the beginning for the first beta. But those were smaller issues and got fixed quickly. Since then we seem to have good results. If you want to have a look in how that works, you should check the Marionette update tests wiki page.

Sadly some of the requirements haven’t been completely finished yet. So the Quality Engineering team cannot stop running the tests themselves. But that will happen once bug 1182796 has been fixed and deployed to production.

Oh, and if you wonder where the results are located… Those are not getting sent to Treeherder but to an internal mailing list as used for every other automation results.

Other Work

Beside the deliverables I got some more work done. Mainly for the firefox-ui-tests and mozmill-ci.

While the test coverage has not really been increased, I had a couple of regressions to fix as caused by changes in Firefox. But we also landed some new features thankfully as contributed by community members. Once all that was done and we agreed to have kinda stable tests, new branches have been created in the repository. That was necessary to add support for each supported version of Firefox down to ESR 38.0, and to be able to run the tests in our Mozmill CI via Jenkins. More about that you will find below. The only task I still haven’t had time for yet was the creation of proper documentation about our tests. I hope that I will find the time in Q3.

Mozmill CI got the most changes in Q2 compared to all the former quarters. This is related to the transition from Mozmill tests to Marionette tests. More details why we got rid of Mozmill tests can be found in this post. With that we decided to get rid of most of the tests and mainly start from scratch by only porting the security and update tests over to Marionette. The complete replacement in Mozmill and all its jobs can be seen on issue 576 on Github. In detail we have the following major changes:

  • Run all jobs with Marionette beside Firefox ESR 31.0 which is not supported by Marionette, and ondemand_update jobs because they still have to be run by Quality Engineering.
  • Reduced number of platforms. We got rid of Windows Vista, Ubuntu 14.10, and OS X 10.7 whereby the latter machines have been re-used for OS X 10.10.
  • No usage of a pre-configured environments anymore, but creating it from fresh for each test-run by installing Python packages from the internal PyPI mirror.
  • Sending test results to treeherder and giving public access for everyone.
  • Stopped sending emails for failures to our mozmill-ci mailing list in favor of having treeherder results.

All changes in Mozmill CI can be seen on Github.

Last but not least we also had two releases of mozdownload in Q2. Both had a good amount of features included. For details you can check the changelog.

I hope that gave you a good quick read on the stuff I was working on last quarter. Maybe in Q3 I will find the time to blog more often and in more detail. Lets see.

Firefox Automation report – week 47/48 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 47 and 48.

Highlights

Most of the work during those two weeks made by myself were related to get [Jenkins](http://jenkins-ci.org/ upgraded on our Mozmill CI systems to the most recent LTS version 1.580.1. This was a somewhat critical task given the huge number of issue as mentioned in my last Firefox Automation report. On November 17th we were finally able to get all the code changes landed on our production machine after testing it for a couple of days on staging.

The upgrade was not that easy given that lots of code had to be touched, and the new LTS release still showed some weird behavior when connecting slave nodes via JLNP. As result we had to stop using this connection method in favor of the plain java command. This change was actually not that bad because it’s better to automate and doesn’t bring up the connection warning dialog.

Surprisingly the huge HTTP session usage as reported by the Monitoring plugin was a problem introduced by this plugin itself. So a simple upgrade to the latest plugin version solved this problem, and we will no longer get an additional HTTP connection whenever a slave node connects and which never was released. Once we had a total freeze of the machine because of that.

Another helpful improvement in Jenkins was the fix for a JUnit plugin bug, which caused concurrent builds to hang, until the former build in the queue has been finished. This added a large pile of waiting time to our Mozmill test jobs, which was very annoying for QA’s release testing work – especially for the update tests. Since this upgrade the problem is gone and we can process builds a lot faster.

Beside the upgrade work, I also noticed that one of the Jenkins plugins in use, it’s actually the XShell plugin, failed to correctly kill the running application on the slave machine in case of an job is getting aborted. The result of that is that following tests will fail on that machine until the not killed job has been finished. I filed a Jenkins bug and did a temporary backout of the offending change in that plugin.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 47 and week 48.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 47 and week 48.

Firefox Automation report – week 45/46 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 45 and 46.

Highlights

In our Mozmill-CI environment we had a couple of frozen Windows machines, which were running with 100% CPU load and 0MB of memory used. Those values came from the vSphere client, and didn’t give us that much information. Henrik checked the affected machines after a reboot, and none of them had any suspicious entries in the event viewer either. But he noticed that most of our VMs were running a very outdated version of the VMware tools. So he upgraded all of them, and activated the automatic install during a reboot. Since then the problem is gone. If you see something similar for your virtual machines, make sure to check that used version!

Further work has been done for Mozmill CI. So were finally able to get rid of all the traces for Firefox 24.0ESR since it is no longer supported. Further we also setup our new Ubuntu 14.04 (LTS) machines in staging and production, which will soon replace the old Ubuntu 12.04 (LTS) machines. A list of the changes can be found here.

Beside all that Henrik has started to work on the next Jenkins v1.580.1 (LTS) version bump for the new and more stable release of Jenkins. Lots of work might be necessary here.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 45 and week 46.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 45 and week 46.

Firefox Automation report – week 43/44 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 43 and 44.

Highlights

In preparation for the QA-wide demonstration of Mozmill-CI, Henrik reorganized our documentation to allow everyone a simple local setup of the tool. Along that we did the remaining deployment of latest code to our production instance.

Henrik also worked on the upgrade of Jenkins to latest LTS version 1.565.3, and we were able to push this upgrade to our staging instance for observation. Further he got the Pulse Guardian support implemented.

Mozmill 2.0.9 and Mozmill-Automation 2.0.9 have been released, and if you are curious what is included you want to check this post.

One of our major goals over the next 2 quarters is to replace Mozmill as test framework for our functional tests for Firefox with Marionette. Together with the A-Team Henrik got started on the initial work, which is currently covered in the firefox-greenlight-tests repository. More to come later…

Beside all that work we have to say good bye to one of our SoftVision team members.October the 29th was the last day for Daniel on the project. So thank’s for all your work!

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 43 and week 44.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 43 and week 44.

Firefox Automation report – week 25/26 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 25 and 26.

Highlights

June the 11th was actually the last Automation Training day for our team in Q3. About the results you can read here. We will implement some changes for the next quarter, when we most likely want to host 2 of them.

Henrik finally got the time to upgrade our Mozmill-CI systems to the lastest LTS version of Jenkins. There were a bit of changes necessary but in general all went fine this time, and we can see some great improvements. Especially the long delays when sending out job results seem to be gone.

Further Henrik investigated the slow behavior with the mozmill-ci production master, when it is under load, e.g. QA runs ondemand update tests for releases of Firefox. The main problem stays with Java, which is taking up about 100% of the CPU. Because of this the integrated web server cannot serve pages in a timely manner. Adding a 2nd CPU to this node gave us way better response times.

Given that the new version of Ubuntu came out already in April, we want to have our Mozmill tests also run on that platform version. So we got new VM spun-up by IT, which we now have to puppetize and bring online. But this may still take a bit, given the remaining blockers for using PuppetAgain.

While talking about Puppet we got the next big change reviewed and landed. With bug 1021230 we now have our own user account, which can be customized to our needs. And that’s what we totally need, given that our infrastructure is so different from the Releng one.

Also for TPS we made progress, so the new TPS-CI production machine came online. Yet it cannot replace the current CI due to still a fair amount of open blockers, but hopefully by end of July we should be able to turn the switch.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 25 and week 26.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 25 and week 26.

Firefox Automation report – week 23/24 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 23 and 24.

Highlights

To continue the training for Mozilla related test frameworks, we had the 3rd automation training day on June 4th. This time lesser people attended, but we were still able to get a couple of tasks done on oneanddone.

Something which bothered us already for a while, is that for our mozmill-tests repository no push_printurl hook was setup. As result the landed changeset URL gets not printed to the console during its landing. Henrik fixed that on bug 1010563 now, which allows an easier copy&paste of the link to our bugs.

Our team started to work on the new continuous integration system for TPS tests. To be able to manage all the upcoming work ourselves, Henrik asked Jonathan Griffin to move the Coversheet repository from his own account to the Mozilla account. That was promptly done.

In week 24 specifically on June 11th we had our last automation training day for quarter 2 in 2014. Given the low attendance from people we might have to do some changes for future training days. One change might be to have the training on another day of the week. Andreea probably will post updates on that soon.

Henrik was also working on getting some big updates out for Mozmill-CI. One of the most important blockers for us was the upgrade of Jenkins to the latest LTS release. With that a couple of issues got fixed, including the long delays in sending out emails for failed jobs. For more details see the full list of changes.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 23 and week 24.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 23 and week 24.

Firefox Automation report – week 21/22 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 21 and 22.

Highlights

To assist everyone from our community to learn more about test automation at Mozilla, we targeted 4 full-day automation training days from mid of May to mid of June. The first training day was planned for May 21rd and went well. Lots of [people were present and actively learning more about automation[https://quality.mozilla.org/2014/05/automation-training-day-may-21st-results/). Especially about testing with Mozmill.

To support community members to get in touch with Mozmill testing a bit easier, we also created a set of one-and-done tasks. Those start from easy tasks like running Mozmill via the Mozmill Crowd extension, and end with creating the first simple Mozmill test.

Something, which hit us by surprise was that with the release of Firefox 30.0b3 we no longer run any automatically triggered Mozmill jobs in our CI. It took a bit of investigation but finally Henrik found out that the problem has been introduced by RelEng when they renamed the product from ”’firefox”’ to ”’Firefox”’. A quick workaround fixed it temporarily, but for a long term stable solution we might need a frozen API for build notifications via Mozilla Pulse.

One of our goals in Q2 2014 is also to get our machines under the control of PuppetAgain. So Henrik started to investigate the first steps, and setup the base manifests as needed for our nodes and the appropriate administrative accounts.

The second automation training day was also planned by Andreea and took place on May 28th. Again, a couple of people were present, and given the feedback on one-and-done tasks, we fine-tuned them.

Last but not least Henrik setup the new Firefox-Automation-Contributor team, which finally allows us now to assign people to specific issues. That was necessary because Github doesn’t let you do that for anyone, but only known people.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 21 and week 22.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 21 and week 22.

Firefox Automation report – week 19/20 2014

In this post you can find an overview about the work happened in the Firefox Automation team during week 19 and 20.

Highlights

When we noticed that our Mozmill-CI production instance is quickly filling up the /data partition, and having nearly no space left to actually run Jenkins, Henrik did a quick check, and has seen that the problem were the update jobs. Instead of producing log files with about 7MB in size, files with more than 100MB each were present. Inspecting those files revealed that the problem were all the SPDY log entries. As a fix Henrik reduced the amount of logging information, so it is still be helpful but it’s not exploding.

In the past days we also have seen a lot of JSBridge disconnects while running our Mozmill tests. Andrei investigated this problem, and it turned out that the reduced delay for add-on installations were the cause of it. Something is most likely messing up with Mozmill and our SOCKS server. Increasing the delay for that dialog fixed the problem for now.

We are using Bugsahoy for a long time now, but we never actually noticed that the Github implementation was somewhat broken when it comes to filtering for languages. To fix that Henrik added all the necessary language mappings. After updating the Github labels for all of our projects, we were seeing a good spike of new contributors interested to work with us.

Individual Updates

For more granular updates of each individual team member please visit our weekly team etherpad for week 19 and week 20.

Meeting Details

If you are interested in further details and discussions you might also want to have a look at the meeting agenda, the video recording, and notes from the Firefox Automation meetings of week 19 and week 20.