
Selenium and PHPUnit: Stop execution on failure
Closed, Declined · Public

Description

Follow-up to T225248: this is a proposal to use PHPUnit's --stop-on-failure option and Mocha's [bail](https://mochajs.org/api/mocha) configuration option to end PHPUnit and Selenium test runs early when there is a failure.

The advantages are:

  • reduced CI resource usage, as we don't need to wait for a 20-30 minute build when we already know it's going to fail
  • a little less impact from flaky tests, as a "recheck" can be started sooner
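
For concreteness, here is a minimal sketch (in Python, roughly in the spirit of how a CI runner such as Quibble invokes the suites, but not actual Quibble or MediaWiki code) of how the two options could be wired in. The MOCHA_BAIL environment-variable hand-off to wdio.conf.js is a hypothetical wiring for illustration only:

```python
import os
import subprocess

def run_phpunit(stop_on_failure=True):
    # PHPUnit's --stop-on-failure flag ends the run at the first failing test.
    cmd = ["php", "tests/phpunit/phpunit.php"]
    if stop_on_failure:
        cmd.append("--stop-on-failure")
    return subprocess.run(cmd).returncode == 0

def run_selenium(bail=True):
    # The Selenium suite is driven by WebdriverIO/Mocha; Mocha's `bail` option
    # stops the suite after the first failure. Exposing it via an environment
    # variable that wdio.conf.js reads into mochaOpts is one possible hand-off
    # (an assumption here, not the actual MediaWiki configuration).
    env = dict(os.environ)
    if bail:
        env["MOCHA_BAIL"] = "1"
    return subprocess.run(["npm", "run", "selenium-test"], env=env).returncode == 0
```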

Event Timeline

Change 517188 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[integration/quibble@master] Stop PHPUnit execution on failure

https://gerrit.wikimedia.org/r/517188

Change 517189 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/core@master] Stop Selenium execution after first test failure

https://gerrit.wikimedia.org/r/517189

Change 517191 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/core@master] Stop PHPUnit execution after first failure

https://gerrit.wikimedia.org/r/517191

Here's an example of Selenium failing early (note I think we'd need to update wdio.conf.js for extensions too?)

Main test build failed.

   wmf-quibble-core-vendor-mysql-hhvm-docker https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/17066/console : FAILURE in 6m 33s
   mwgate-node10-docker https://integration.wikimedia.org/ci/job/mwgate-node10-docker/4436/console : SUCCESS in 53s
   mediawiki-quibble-composer-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composer-mysql-php70-docker/20833/console : FAILURE in 3m 24s
   mediawiki-quibble-vendor-mysql-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php70-docker/21309/console : FAILURE in 3m 20s
   mediawiki-quibble-composertest-php70-docker https://integration.wikimedia.org/ci/job/mediawiki-quibble-composertest-php70-docker/20782/console : SUCCESS in 1m 54s
   mediawiki-core-jsduck-docker https://integration.wikimedia.org/ci/job/mediawiki-core-jsduck-docker/11036/console : SUCCESS in 18s
   mediawiki-core-php70-phan-docker https://integration.wikimedia.org/ci/job/mediawiki-core-php70-phan-docker/31749/console : SUCCESS in 1m 31s

I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR with one fix at a time and push it just to see what explodes next.

> I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR with one fix at a time and push it just to see what explodes next.

For this use case, what would you think about a "check all" command which forces a full run of all suites, gated extensions, browser tests, etc., not stopping on failure? We have to optimize for one or the other case, it seems.

Another variation would be that we drop the running job's priority after the first failure, but still let it continue to run. That might be difficult technically and not save us many resources, however.

> I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI).

I don’t doubt that this could be a problem and that it would be really annoying, but I’m wondering if you could share a patch or two with this kind of situation so we could look at a specific example?

And yeah I like the idea of a “check all” command to bypass the stop on failure.

Relatedly, IMO the stop on failure work would be much more worthwhile if a failure in one job in the test pipeline could immediately halt all the other jobs too.

Judging by what has been committed in the patches, this will limit my ability to run all tests in my development environment. Would it be possible to change only the WMF CI configuration to do this?

> I would actually object to this: imagine your change has caused multiple test failures that you weren't able to predict in your dev environment (because you didn't have all extensions installed or your environment is otherwise different from our CI). You'll have to amend your PR with one fix at a time and push it just to see what explodes next.

Yeah, this would make fixing multiple issues very annoying and time-consuming, and would make the life of new developers especially hard (so far we haven't forced them to set up the tests locally just to have a non-aggravating experience). I don't think it's worth it.

Ideally, the tests would report failure as soon as they find it but continue to run to the end. I doubt that's realistic with Jenkins though. (This post describes a possible way to do it via the cacheResult option, but it would require a much more recent PHPUnit version, and even so it seems very hacky.)

One thing that might be worthwhile is starting with the "local" tests (core tests in the case of core, the extension's own tests in the case of an extension) and only running tests belonging to other repos if the local tests succeed. Since it's almost always the local tests that fail, that would speed things up a lot without forcing too many re-runs on the developer.
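
A rough sketch of that ordering (illustrative only, with hypothetical helper names, not existing CI code):

```python
def run_repo_tests(local_repo, other_repos, run_suite):
    # Run the tests that belong to the repo under change first; only fan out
    # to the other gated repos if the local suite passes.
    if not run_suite(local_repo):
        return False
    return all(run_suite(repo) for repo in other_repos)
```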

Change 517191 abandoned by Kosta Harlan:
Stop PHPUnit execution after first failure

Reason:
Let's handle this in Quibble instead.

https://gerrit.wikimedia.org/r/517191

Change 517189 abandoned by Kosta Harlan:
Stop Selenium execution after first test failure

Reason:
As Awight noted, should be done in Quibble not here.

https://gerrit.wikimedia.org/r/517189

Change 517188 abandoned by Kosta Harlan:
Stop PHPUnit execution on failure

Reason:
We would need consensus on this before pursuing it further.

https://gerrit.wikimedia.org/r/517188

With regards to how Quibble executes the various phpunit and wdio tasks, I think we're in a fairly good balance between "all" and "until first failure". Each of these steps should only take a few minutes at most, so letting a sub-task finish doesn't hold up the results very much.

We already have Quibble abort the job as soon as one finished sub-task has failures, e.g. it won't start the wdio tests if phpunit --unit found errors. This means the job already terminates earlier, e.g. within 2-3 minutes instead of 5-10 minutes, in the case of such a failure. I don't think it's that important to get results even sooner than this, given the downside (mentioned by others) of then having a very partial result to work with and potentially having to bounce back and forth with CI many times to get to the bottom of it. The current compromise means you'll have to bounce at most 3 times (lint, unit, integration, more or less), which seems fairly good.
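
For reference, that early-abort-between-stages behaviour amounts to something like the following sketch (illustrative Python, not Quibble's actual implementation):

```python
def run_stages(stages):
    # stages: ordered list of (name, callable) pairs, e.g. lint, phpunit --unit,
    # phpunit (integration), wdio. Each callable returns True on success.
    for name, run in stages:
        if not run():
            print(f"{name} failed; skipping the remaining stages")
            return False
    return True
```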

There is, however, a different reason why we currently have to wait 20 minutes to get a known-failed result back to the user in Gerrit: Zuul. Zuul is currently configured to wait for all jobs in the pipeline before reporting back; it has no early abort. This means that on a patch for mediawiki, where we run 3 Quibble jobs (vendor+plain, vendor+extensions, composer), if one has failed at the linting stage and the job was terminated, Zuul still waits for the others to finish. Not only that, if CI is short on resources and the others haven't even started yet, we may end up waiting 20+ minutes for them to start, finish, and then report back.

Thoughts? Is the current compromise acceptable? Do we want Zuul to abort peer jobs if one has failed?
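
To make the question concrete, "abort peer jobs on first failure" would look roughly like this sketch (plain Python with hypothetical job callables, not Zuul configuration or behaviour):

```python
import concurrent.futures

def run_peer_jobs(jobs):
    # jobs: callables for the per-patch Jenkins jobs; each returns True on success.
    # As soon as one fails, report back and drop any jobs that haven't started yet,
    # instead of waiting for the whole set to finish.
    pool = concurrent.futures.ThreadPoolExecutor()
    futures = [pool.submit(job) for job in jobs]
    try:
        for fut in concurrent.futures.as_completed(futures):
            if not fut.result():
                return False
        return True
    finally:
        # cancel_futures needs Python 3.9+; queued jobs are dropped, while
        # already-running ones finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
```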

> Do we want Zuul to abort peer jobs if one has failed?

That’s what I would like to see, yes. See also T225730#5490128 and the follow up comment.

Instead of patching Zuul for the abort-peer-jobs-if-one-failed functionality (see @hashar's note in T225730#5490156), maybe we could do something with build triggers -- define a linter job for core & extensions that does phpcs and eslint, then the success of that job triggers the full ~20 minute set of jobs.

Krinkle closed this task as Declined. Edited Mar 25 2020, 11:30 PM

I'm declining this per the above, as it would imho negatively affect developer productivity to only get the first failure within a certain stage. As an extreme example, imagine if PHPCS only informed you about the first style violation in a file. That's not great.

About getting the feedback to the user sooner, we definitely need to do better here, but the problem imho is not that the job needs to be marked as failed earlier. We are already doing that. The problem is that we are running multiple jobs and waiting for all of them to complete.

I've filed T248531 to further explore that.