Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check for Jenkins results. #13533

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/bash_unit_testing.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ defaults:
jobs:
Test-gha-functions:
name: Tests in ci/gha_functions.sh
if: github.repository == 'daos-stack/daos'
runs-on: [self-hosted, light]
steps:
- name: Checkout code
Expand Down
16 changes: 16 additions & 0 deletions .github/workflows/jenkins-status.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
name: Jenkins status report

on:
pull_request:

jobs:
# Check and report Jenkins test results. Should use the check_suite trigger when stable, and
# test the PR that triggered it obviously.
jenkins_check:
name: Check Jenkins results
if: github.repository == 'daos-stack/daos'
runs-on: [self-hosted, light]
steps:
- uses: actions/checkout@v4
- name: Run check
run: ./ci/jenkins_status.py --pr ${{ github.event.pull_request.number }}
13 changes: 9 additions & 4 deletions .github/workflows/linting.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,14 @@ jobs:
# This checks .py files only so misses SConstruct and SConscript files are not checked, rather
# for these files check them afterwards. The output-filter will not be installed for this part
# so regressions will be detected but not annotated.

# In order for new checks to work when landed but when testing PRs which are not rebased then
# it is necessairy to check for the existence of scripts and files before calling them.
# This is because checks which annotate code have to check out the PR code rather than a merge
# commit so the line numbers are correct for reporting. Any checks which simply return failure
# can checkout the (default) merge commit and therefore do not need this protection.
# run: \[ ! -x ci/run_shellcheck.sh \] || ./ci/run_shellcheck.sh

isort:
name: Python isort
runs-on: ubuntu-latest
Expand Down Expand Up @@ -43,10 +51,7 @@ jobs:
- name: Add error parser
run: echo -n "::add-matcher::ci/shellcheck-matcher.json"
- name: Run Shellcheck
# The check will run with this file from the target branch but the code from the PR so
# test for this file before calling it to prevent failures on PRs where this check is
# in the target branch but the PR is not updated to include it.
run: \[ ! -x ci/run_shellcheck.sh \] || ./ci/run_shellcheck.sh
run: ./ci/run_shellcheck.sh

log-check:
name: Logging macro checking
Expand Down
212 changes: 212 additions & 0 deletions ci/jenkins_status.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
#!/usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably okay, but some distros point python -> python2 so personally I tend to use python3 here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we're beyond that now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully, but some older (still supported) distros still use python2 by default last I checked. Also, some distros don't symlink python at all. E.g.

$ cat /etc/os-release
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL="https://documentation.suse.com/"

$ which python
which: no python in ...

$ which python3
/usr/bin/python3


"""Parse Jenkins build test results"""

import argparse
import json
import sys
import urllib
from urllib.request import urlopen

JENKINS_HOME = "https://build.hpdd.intel.com/job/daos-stack"


class TestResult:
"""Represents a single Jenkins test result"""

def __init__(self, data, blocks):
name = data["name"]
if "-" in name:
try:
(num, full) = name.split("-", 1)
test_num = int(num)
if full[-5] == "-":
full = full[:-5]
name = f"{full} ({test_num})"
Comment on lines +18 to +25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the original data["name"] look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the weird 8 digits of sha hash at the end of the PR name which is making matching fail.

except ValueError:
pass
self.name = name
self.cname = data["className"]
self.skipped = False
self.passed = False
self.failed = False
assert data["status"] in ("PASSED", "FIXED", "SKIPPED", "FAILED", "REGRESSION")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to gracefully handle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an internal script I'm OK with this, if there is another status then we want to know about it so we handle it properly. As we're scraping Jenkins then we can stress-test this code against previous builds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with assert is that the entire script will fail without much debug info.

if data["status"] in ("PASSED", "FIXED"):
self.passed = True
elif data["status"] in ("FAILED", "REGRESSION"):
self.failed = True
elif data["status"] == "SKIPPED":
self.skipped = True
self.data = data
self.blocks = blocks

def info(self, prefix=""):
"""Return a string describing the test"""
return f"{prefix}{self.cname}\t\t{self.name}"

def full_info(self):
"""Return a longer string describing the test"""

tcl = []
if self.blocks is not None:
tcl.extend(reversed(self.blocks))

tcl.append(f"{self.cname}.{self.name}")
details = self.data["errorDetails"]
if details:
return " / ".join(tcl) + "\n" + details.replace("\\n", "\n")
return self.info()

# Needed for set operations to compare results across sets.
def __eq__(self, other):
return self.name == other.name and self.cname == other.cname

# Needed to be able to add results to sets.
def __hash__(self):
return hash((self.name, self.cname))

def __str__(self):
return self.name

def __repr__(self):
return f"Test result of {self.cname}"


def je_load(job_name, jid=None, what=None, tree=None):
"""Fetch something from Jenkins and return as native type."""

url = f"{JENKINS_HOME}/job/daos/job/{job_name}"

if jid:
url += f"/{jid}"
if what:
url += f"/{what}"

url += "/api/json"

if tree:
url += f"?tree={tree}"

with urlopen(url) as f: # nosec
return json.load(f)


def show_job(job_name, jid):
"""Parse one job

Return a list of failed test objects"""

if not job_name.startswith("PR-"):
jdata = je_load(job_name, jid=jid, tree="actions[causes]")
if (
"causes" not in jdata["actions"][0]
or jdata["actions"][0]["causes"][0]["_class"]
!= "hudson.triggers.TimerTrigger$TimerTriggerCause"
):
return None

try:
jdata = je_load(job_name, jid=jid, what="testReport")
except urllib.error.HTTPError:
print(f"Job {jid} of {job_name} has no test results")
return None

print(f"Checking job {jid} of {job_name}")

failed = []

assert not jdata["testActions"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graceful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could just remove this, it was me trying to understand the output.

for suite in jdata["suites"]:
for k in suite["cases"]:
tr = TestResult(k, suite["enclosingBlockNames"])
if not tr.failed:
continue
failed.append(tr)
return failed


def test_against_job(all_failed, job_name, count):
"""Check for failures in existing test runs

Takes set of failed tests, returns set of unexplained tests
"""
data = je_load(job_name)
lcb = data["lastCompletedBuild"]["number"]
main_failed = set()
ccount = 0
for build in data["builds"]:
jid = build["number"]
if jid > lcb:
print(f"Job {jid} is of {job_name} is still running, skipping")
failed = show_job(job_name, jid)
if not isinstance(failed, list):
continue
for test in failed:
main_failed.add(test)
ccount += 1
if count == ccount:
break

unexplained = all_failed.difference(main_failed)
if not unexplained:
print(f"Stopping checking at {ccount} builds, all failures explained")
break

ignore = all_failed.intersection(main_failed)
if ignore:
print(f"Tests which failed in the PR and have also failed in {job_name} builds.")
for test in ignore:
print(test.full_info())

return all_failed.difference(main_failed)


def main():
"""Check the results of a PR"""

parser = argparse.ArgumentParser(description="Check Jenkins test results")
parser.add_argument("--pr", type=int, required=True)

args = parser.parse_args()

job_name = f"PR-{args.pr}"

data = je_load(job_name)

lcb = data["lastCompletedBuild"]["number"]

all_failed = set()
for build in data["builds"]:
jid = build["number"]
if jid > lcb:
print(f"Job {jid} is of {job_name} is still running, skipping")
continue
failed = show_job(job_name, jid)
if not isinstance(failed, list):
continue
for test in failed:
all_failed.add(test)
Comment on lines +207 to +208
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI you can add a list to another list like this:

>>> list1 = [1,2,3]
>>> list2 = [4,5,6]
>>> list1.extend(list2)
>>> list1
[1, 2, 3, 4, 5, 6]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_failed is a set, not list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, then JFYI

>>> set1 = set([1,2,3])
>>> list2 = [4,5,6]
>>> set1.update(list2)
>>> set1
{1, 2, 3, 4, 5, 6}

break
if not all_failed:
print("No failed tests in PR, returning")
return

print(f"PR had failed {len(all_failed)} tests, checking against landings builds")

all_failed = test_against_job(all_failed, "daily-testing", 14)

if all_failed:
all_failed = test_against_job(all_failed, "weekly-testing", 4)

if all_failed:
all_failed = test_against_job(all_failed, "master", 14)

if all_failed:
print("Tests which only failed in the PR")
for test in all_failed:
print(test.full_info())
sys.exit(1)


if __name__ == "__main__":
main()