Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Add cron job to update IGC dev driver only #13412

Closed
wants to merge 1 commit into from
Closed

Conversation

jsji
Copy link
Contributor

@jsji jsji commented Apr 15, 2024

We have weekly cron job to update the IGC drivers.
However, the igc dev driver use CI driver from intel-graphic-compiler repo, the CI artifacts will be deleted if the build is 1 week or older, so we need to update the igc dev driver more frequently, eg: at least twice a week to make sure that the driver we use are new enough, or else we will fail to download the artifacts when doing install_driver test or updating the devigc docker contains.

This is to add cron jobs to update igc dev driver only.

Tested: https://github.com/intel/llvm/actions/runs/8695012724

@jsji jsji self-assigned this Apr 15, 2024
@jsji jsji marked this pull request as ready for review April 15, 2024 19:46
@jsji jsji requested a review from a team as a code owner April 15, 2024 19:46
@aelovikov-intel
Copy link
Contributor

What is the process if the update isn't "green" in CI? How will we guarantee it will actually happen at least once a week?

Comment on lines +24 to +39
- name: Determine parameters based on the day of the week
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
# Use the input dayofweek when manually triggered
DAY_OF_WEEK="${{ github.event.inputs.dayofweek }}"
else
# Use the current day of the week when triggered by a schedule
DAY_OF_WEEK=$(date +%u)
fi

if [ "$DAY_OF_WEEK" = "2" ]; then
PARAM=""
else
PARAM="--igc-dev-only"
fi
echo "PARAM=$PARAM" >> $GITHUB_ENV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. I'd rather move common part into a re-usable workflow and then have two separate cron jobs in distinct .yml files. Also, why can't we just run the existing job frequently enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running existing job freq enough is also ok, but considering the situation of "red" CI -- it might not be realistic to push all driver update more than once per week.
But sure, I am more than happy to do so if we want to try updating ALL drivers more than once a week.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason we do it once a week is simply because released version aren't updated that frequently and we wanted to avoid running unnecessary task. That said, if nothing is there to update, it should be very fast, so I don't see any reasons for us not to do that.

+ @bader maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. NEO driver was released one a week in the past. These days, NEO release cadence is once a month.

@jsji
Copy link
Contributor Author

jsji commented Apr 15, 2024

What is the process if the update isn't "green" in CI? How will we guarantee it will actually happen at least once a week?

That is a good question, I think we will need to provide some way to tolerate such situations -- eg: if we failed to download the old package, we skip the container update instead of failing. What do you think?

@aelovikov-intel
Copy link
Contributor

What do you think?

I don't see any good solution here, sadly. My preference would be for the team(s) that want development builds of IGC to make a commitment to address any issues revealed in the update PR in 24 hours.

Maybe we can modify the PR creation job to automatically add them as reviewers too.

@jsji
Copy link
Contributor Author

jsji commented Apr 15, 2024

Maybe we can modify the PR creation job to automatically add them as reviewers too.

Sure, we can add them as reviewers, but I would say, there is not guarantee that the team can always address the issues in time. But if that is what we can do the best for now, I can update the job to add reviewers.

@aelovikov-intel
Copy link
Contributor

But if that is what we can do the best for now, I can update the job to add reviewers.

I think we need first to get that commitment from them. If they won't commit, then we cannot make these changes in the CI at all until we find some other solution for the issue.

@jsji
Copy link
Contributor Author

jsji commented Apr 15, 2024

But if that is what we can do the best for now, I can update the job to add reviewers.

I think we need first to get that commitment from them. If they won't commit, then we cannot make these changes in the CI at all until we find some other solution for the issue.

@intel/sycl-matrix-reviewers Can you please comment on the commitment here . Thanks.

@YuriPlyakhin
Copy link
Contributor

But if that is what we can do the best for now, I can update the job to add reviewers.

I think we need first to get that commitment from them. If they won't commit, then we cannot make these changes in the CI at all until we find some other solution for the issue.

@intel/sycl-matrix-reviewers Can you please comment on the commitment here . Thanks.

@aelovikov-intel , @jsji ,
I'm not sure, if sycl matrix reviewers can commit to it.

Instead, I'm suggesting the following:

  1. Keep the last good igc dev driver that we currently use in CI somewhere
  2. Keep the frequency for PRs to update dev igc driver once per week to reduce overhead on CI
  3. If new driver is bad and CI is red, do not merge this PR.
  4. if we fail to download the artifact when doing install_driver, because it was deleted from github, have fallback and use the driver we saved on step 1.

@jsji
Copy link
Contributor Author

jsji commented Apr 17, 2024

Close this in favor of #13441.

@jsji jsji closed this Apr 17, 2024
@jsji jsji deleted the updateigconlycron branch April 23, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants