Update tools and jobs to publish curated data (#468)
The goal of this update is to create a curated data view along with the raw data
view and the npm packages views (see #277).

Curation means applying patches to the raw data and re-generating the
`idlparsed`, `idlnames` and `idlnamesparsed` folders. The latter two will only
contain IDL names targeted at browsers, although note that actual spec filtering
remains a TODO at this stage (see corresponding TODO comments in
`prepare-curated.js` and `prepare-packages.js`).

To create the curated data view, this update introduces new tools:
- a `prepare-curated.js` tool that copies the raw data to the given folder,
applies patches (CSS, elements, IDL) when needed, re-generates the `idlparsed`
folder, re-generates the `idlnames` and `idlnamesparsed` folders and adjusts the
`index.json` and `idlnames.json` files accordingly.
- a `prepare-packages.js` tool (replaces the now gone `packages/prepare.js`)
that copies relevant curated data from the curated folder to the packages
folder.
- a `commit-curated.js` tool that updates the `curated` branch with the contents
of the given curated folder.
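
The steps that `prepare-curated.js` is described as running can be sketched as below. This is a minimal illustration and not the actual tool: `copyFolder`, `prepareCurated`, the step objects and the `example.idl` file are hypothetical names, and the "patch" is a plain string replacement standing in for whatever patch format the real tool applies.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Recursively copy `src` into `dest` using plain fs calls.
function copyFolder(src, dest) {
  fs.mkdirSync(dest, { recursive: true });
  for (const entry of fs.readdirSync(src, { withFileTypes: true })) {
    const from = path.join(src, entry.name);
    const to = path.join(dest, entry.name);
    if (entry.isDirectory()) {
      copyFolder(from, to);
    } else {
      fs.copyFileSync(from, to);
    }
  }
}

// Hypothetical pipeline: copy the raw data, then run each curation step in
// order against the copy. The raw folder is never modified.
function prepareCurated(rawFolder, curatedFolder, steps) {
  copyFolder(rawFolder, curatedFolder);
  const completed = [];
  for (const step of steps) {
    step.run(curatedFolder);
    completed.push(step.name);
  }
  return completed;
}

// Demo on a throwaway folder with one made-up raw extract that lacks the
// final semicolon required by Web IDL.
const tmpFolder = fs.mkdtempSync(path.join(os.tmpdir(), 'curated-sketch-'));
const rawFolder = path.join(tmpFolder, 'ed');
const curatedFolder = path.join(tmpFolder, 'curated');
fs.mkdirSync(path.join(rawFolder, 'idl'), { recursive: true });
fs.writeFileSync(path.join(rawFolder, 'idl', 'example.idl'), 'interface Example {}\n');

const completedSteps = prepareCurated(rawFolder, curatedFolder, [
  {
    name: 'apply patches',
    // Stand-in for a real patch: fix the invalid IDL in the curated copy.
    run: folder => {
      const file = path.join(folder, 'idl', 'example.idl');
      const idl = fs.readFileSync(file, 'utf8');
      fs.writeFileSync(file, idl.replace('{}', '{};'));
    }
  },
  // The remaining steps are placeholders that only record the order.
  { name: 're-generate idlparsed', run: () => {} },
  { name: 're-generate idlnames and idlnamesparsed', run: () => {} },
  { name: 'adjust index.json and idlnames.json', run: () => {} }
]);
console.log(completedSteps.join(' -> '));
```

Copying first and patching the copy keeps the raw view intact, which is what allows the raw and curated views to be published side by side.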

The goal is to have the `curated` branch be the one published as GitHub Pages.

The test logic was partially re-written to run the tests against the curated
data, and against both the curated data and the NPM packages data when tests
may yield different results.

A new `curate.yml` job publishes the curated data whenever the crawl data is
updated. The job also takes care of preparing package release PRs as needed,
replacing the previous prepare-xxx-release jobs.

The release workflow becomes:
1. Crawled data is updated (`update-ed.yml`)
2. Curated data and package data get generated (`curate.yml`)
3. Curated data and package data get tested (`curate.yml`)
4. The `curated` branch gets updated with the curated data (`curate.yml`)
5. Npm package pre-release PR gets created (`curate.yml`)
6. Someone reviews and merges the PR
7. New versions of the npm packages are released (`release-package.yml`)
8. A `Raw data for @webref/ttt@vx.y.z` tag gets added to the relevant commit on
the `main` branch.
9. A `@webref/ttt@vx.y.z` tag gets added to the relevant commit on the `curated`
branch.
10. The `@webref/ttt@latest` tag gets updated to point to the relevant commit on
the `curated` branch.

Note that, in order for a release to be created, curated data needs to have
changed.
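
The "curated data needs to have changed" condition could be checked along the following lines. This is only a sketch under stated assumptions: `digestFiles` and `needsRelease` are hypothetical helpers, the file maps are made-up sample data, and `tools/prepare-release.js` may well detect changes differently (for instance by diffing against the published package).

```javascript
const crypto = require('crypto');

// Compute a stable digest of a { filename: content } map. File names are
// sorted so that the result does not depend on insertion order.
function digestFiles(files) {
  const hash = crypto.createHash('sha256');
  for (const name of Object.keys(files).sort()) {
    hash.update(name + '\0' + files[name] + '\0');
  }
  return hash.digest('hex');
}

// Hypothetical check: a release is only worth preparing when the curated
// content differs from what was last released.
function needsRelease(releasedFiles, curatedFiles) {
  return digestFiles(releasedFiles) !== digestFiles(curatedFiles);
}

// Made-up sample data
const released = { 'dom.idl': 'interface Node {};\n' };
const unchanged = { 'dom.idl': 'interface Node {};\n' };
const changed = { 'dom.idl': 'interface Node : EventTarget {};\n' };

console.log(needsRelease(released, unchanged)); // false
console.log(needsRelease(released, changed));   // true
```
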
tidoust committed Jan 31, 2022
1 parent f352baf commit edb3390
Showing 27 changed files with 2,412 additions and 564 deletions.
66 changes: 66 additions & 0 deletions .github/workflows/curate.yml
@@ -0,0 +1,66 @@
name: Curate data & Prepare package PRs

on:
  # Runs whenever the `ed` data gets updated on the default branch.
  # Crawl pushes to default branch but that push won't trigger the workflow
  # (a workflow cannot be triggered by a push made by another workflow). Hence
  # the need to react on "workflow_run".
  workflow_run:
    workflows:
      - "Update ED report"
    types:
      - completed
  push:
    branches:
      - main
    paths:
      - 'ed/**'
      - 'packages/**'
      - 'tools/**'
  workflow_dispatch:

jobs:
  prepare:
    runs-on: ubuntu-latest
    steps:
      - name: Setup node.js
        uses: actions/setup-node@v2
        with:
          node-version: '16'

      - name: Checkout webref
        uses: actions/checkout@v2

      - name: Prepare curated and packages data
        # Note that "ci" runs the "prepare" script
        run: npm ci

      - name: Test curated and packages data
        run: npm run test

      - name: Commit curated data
        run: node tools/commit-curated.js stay-on-curated-branch

      - name: Push changes to curated branch
        uses: ad-m/github-push-action@v0.6.0
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          branch: curated

      - name: Get back to main branch
        run: git checkout main

      - name: Prepare a pre-release PR for the @webref/css package if needed
        run: node tools/prepare-release.js css
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Prepare a pre-release PR for the @webref/elements package if needed
        run: node tools/prepare-release.js elements
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Prepare a pre-release PR for the @webref/idl package if needed
        run: node tools/prepare-release.js idl
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
47 changes: 0 additions & 47 deletions .github/workflows/prepare-css-release.yml

This file was deleted.

47 changes: 0 additions & 47 deletions .github/workflows/prepare-elements-release.yml

This file was deleted.

47 changes: 0 additions & 47 deletions .github/workflows/prepare-idl-release.yml

This file was deleted.

3 changes: 0 additions & 3 deletions .github/workflows/test.yml
@@ -1,9 +1,6 @@
name: Test
on:
  pull_request: {}
  push:
    branches:
      - main
jobs:
  test:
    name: Test
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ packages/elements/*.json
!packages/elements/package.json
packages/idl/*.idl
wpt/
curated
47 changes: 36 additions & 11 deletions README.md
@@ -6,14 +6,17 @@ This repository contains **machine-readable references of CSS properties, defini

Specifications covered by this repository are technical Web specifications that are directly implemented or that will be implemented by Web browsers; in other words, those that appear in [browser-specs](https://github.com/w3c/browser-specs).

This repository contains **raw** and **automatically-generated** extracts from web browser specifications. These extracts come with no guarantee on validity or consistency. For instance, if a specification defines invalid IDL snippets or uses an unknown IDL type, the corresponding IDL extract in this repository will be invalid as well.
The `main` branch of this repository contains **automatically-generated raw extracts** from web browser specifications. These extracts come with no guarantee on validity or consistency. For instance, if a specification defines invalid IDL snippets or uses an unknown IDL type, the corresponding IDL extract in this repository will be invalid as well.

Curated subsets of the repository content are published as NPM [packages](https://github.com/w3c/webref/tree/main/packages), updated on a weekly basis when the underlying content has changed:
The `curated` branch contains **curated extracts**. Curated extracts are generated from raw extracts in the [ed](ed) folder by applying manually-maintained patches to fix invalid content and provide [validity and consistency guarantees](#curation-guarantees). The `curated` branch is updated automatically whenever the `main` branch is updated, unless patches need to be modified (which requires manual intervention). Curated extracts are published under https://w3c.github.io/webref/ed/.

Additionally, subsets of the curated content get manually reviewed and published as **NPM [packages](https://github.com/w3c/webref/tree/main/packages)** on a weekly basis:
- [@webref/idl](https://www.npmjs.com/package/@webref/idl) contains a [curated](packages/idl#guarantees) version of the [ed/idl](ed/idl) folder.
- [@webref/css](https://www.npmjs.com/package/@webref/css) contains a [curated](packages/css#guarantees) version of the [ed/css](ed/css) folder.
- [@webref/elements](https://www.npmjs.com/package/@webref/elements) contains a [curated](packages/elements#guarantees) version of the [ed/elements](ed/elements) folder.

The NPM packages provide additional validity and consistency guarantees. Unless you are ready to deal with invalid content, we recommend that you rely on the content of these NPM packages instead of on the non-curated content in this repository.
**Important:** Unless you are ready to deal with invalid content, we recommend that you rely on the content of the `curated` branch or NPM packages instead of on the raw content in the `main` branch of this repository.


## Available extracts

@@ -27,6 +30,7 @@ The following subfolders contain individual machine-readable JSON or text files
- [ed/headings](ed/headings) and [tr/headings](tr/headings): Section headings. One file per specification.
- [ed/idl](ed/idl) and [tr/idl](tr/idl): Raw WebIDL index. One file per specification [series](https://github.com/w3c/browser-specs/#series).
- [ed/idlparsed](ed/idlparsed) and [tr/idlparsed](tr/idlparsed): Parsed WebIDL. One file per specification.
- [ed/ids](ed/ids) and [tr/ids](tr/ids): Fragments defined in the specification. One file per specification.
- [ed/links](ed/links) and [tr/links](tr/links): Links to other documents, along with targeted fragments. One file per specification.
- [ed/refs](ed/refs) and [tr/refs](tr/refs): Normative and informative references to other specifications. One file per specification.

@@ -39,6 +43,29 @@ This repository uses [Reffy](https://github.com/w3c/reffy), a Web spec explorati
Raw WebIDL extracts are used in [web-platform-tests](https://github.com/web-platform-tests/wpt), please see their [interfaces/README.md](https://github.com/web-platform-tests/wpt/blob/master/interfaces/README.md) for details.


## Curation guarantees

Data curation brings the following guarantees.

### Web IDL extracts

- All IDL files can be parsed by the version of [webidl2.js](https://github.com/w3c/webidl2.js/) referenced in `package.json`.
- `WebIDL2.validate` passes with the exception of the "no-nointerfaceobject" rule about `[LegacyNoInterfaceObject]`, which is in wide use.
- All types are defined by some specification.
- All extended attributes are defined by some specification.
- No duplicate top-level definitions or members.
- No missing or mismatched types in inheritance chains.
- No conflicts when applying mixins and partials.

### CSS extracts

- All CSS files can be parsed by the version of [CSSTree](https://github.com/csstree/csstree) referenced in `package.json`, with the exception of a handful CSS value definitions that, although valid, are not yet supported by CSSTree.

### Elements extracts

- All Web IDL interfaces referenced by elements exist in Web IDL extracts.


## Potential spec anomalies

This repository used to contain analyses of potential spec anomalies, such as missing references and invalid Web IDL definitions. These analyses are now published in the companion [w3c/webref-analysis](https://github.com/w3c/webref-analysis) repository.
@@ -60,14 +87,12 @@ GitHub Actions workflows are used to automate most of the tasks in this repo.

- [Update ED report](https://github.com/w3c/webref/actions/workflows/update-ed.yml) - crawls the latest version of Editor's Drafts and updates the contents of the [`ed`](ed) folder. Workflow runs every 6 hours. A typical crawl takes about **10mn** to complete.
- [Update TR report](https://github.com/w3c/webref/actions/workflows/update-tr.yml) - crawls the published version of Editor's Drafts and updates the contents of the [`tr`](tr) folder. Workflow runs once per week on Monday. A typical crawl takes about **10mn** to complete.
- [Test](https://github.com/w3c/webref/actions/workflows/test.yml): tests the contents of the repo. Runs each time there is a push against the default branch.
- [Clean up abandoned files](https://github.com/w3c/webref/actions/workflows/cleanup.yml) - Checks the contents of repository to detect orphan crawl files that are no longer targeted by the latest crawl's result and creates a PR to delete these files from the repository. Runs once per week on Wednesday. The crawl workflows does not delete these files automatically because crawl sometimes fails on a spec due to transient network or spec errors.

- [Curate data & Prepare package PRs](https://github.com/w3c/webref/actions/workflows/curate.yml) - runs whenever crawled data gets updated and updates the `curated` branch accordingly (provided all tests pass). The job also creates pull requests to release new versions of NPM packages when needed. Each pull request details the diff that would be released, and bumps the package version in the relevant `packages/xxx/package.json` file.
- [Clean up abandoned files](https://github.com/w3c/webref/actions/workflows/cleanup.yml) - checks the contents of repository to detect orphan crawl files that are no longer targeted by the latest crawl's result and creates a PR to delete these files from the repository. Runs once per week on Wednesday. The crawl workflows does not delete these files automatically because crawl sometimes fails on a spec due to transient network or spec errors.
- [Test](https://github.com/w3c/webref/actions/workflows/test.yml) - runs tests on pull requests.
- [Clean patches when issues/PR are closed](https://github.com/w3c/webref/actions/workflows/clean-patches.yml) - drops patches that no longer need to apply because underlying issues got fixed. Runs once per week.

### Releases to NPM

- [`@webref/css`: Prepare release PR if needed](https://github.com/w3c/webref/actions/workflows/prepare-css-release.yml) - Checks latest CSS extracts and create a pre-release PR if a new version of the `@webref/css` npm package should be released. Runs after each crawl and whenever a push is made to the default branch on CSS files (except when this push is on the `packages` folder to avoid re-entrance issues). The pre-release PR details the diff that would be released, and bumps the package version in [`packages/css/package.json`](packages/css/package.json).
- [`@webref/elements`: Prepare release PR if needed](https://github.com/w3c/webref/actions/workflows/prepare-elements-release.yml) - Checks latest elements extracts and create a pre-release PR if a new version of the `@webref/elements` npm package should be released. Runs after each crawl and whenever a push is made to the default branch on elements files (except when this push is on the `packages` folder to avoid re-entrance issues). The pre-release PR details the diff that would be released, and bumps the package version in [`packages/elements/package.json`](packages/elements/package.json).
- [`@webref/idl`: Prepare release PR if needed](https://github.com/w3c/webref/actions/workflows/prepare-idl-release.yml) - Checks latest IDL extracts and create a pre-release PR if a new version of the `@webref/idl` npm package should be released. Runs after each crawl and whenever a push is made to the default branch on IDL files (except when this push is on the `packages` folder to avoid re-entrance issues). The pre-release PR details the diff that would be released, and bumps the package version in [`packages/idl/package.json`](packages/idl/package.json).
- [`@webref` release: Request review of pre-release PR](https://github.com/w3c/webref/actions/workflows/request-pr-review.yml) - Assigns reviewers to pre-release CSS/IDL PRs if they exist. Runs once per week on Thursday.
- [Publish `@webref` package if needed](https://github.com/w3c/webref/actions/workflows/release-package.yml) - Publishes a new version of the `@webref/css`, `@webref/elements` or `@webref/idl` package to npm and tags the corresponding commit on the default branch. Runs whenever a pre-release PR is merged. Note that the released version is the version that appeared in `packages/css/package.json`, `packages/elements/package.json` or `packages/idl/package.json` **before** the pre-release PR is merged.
- [Publish @webref package if needed](https://github.com/w3c/webref/actions/workflows/release-package.yml) - publishes a new version of the `@webref/css`, `@webref/elements` or `@webref/idl` package to NPM, tags the corresponding commits on the `main` and `curated` branches, and updates the relevant `@webref/xxx@latest` tag to point to the right commit on the `curated` branch. Runs whenever a pre-release PR is merged. Note that the released version is the version that appeared in `packages/css/package.json`, `packages/elements/package.json` or `packages/idl/package.json` **before** the pre-release PR is merged.
- [@webref release: Request review of pre-release PR] - assigns reviewers to NPM package pull requests. Runs once per week.