Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing by threadid() should be avoided #2

Merged
merged 5 commits into from
Aug 6, 2023

Conversation

iuliadmtru
Copy link
Collaborator

@iuliadmtru iuliadmtru commented Jul 6, 2023

Indexing by Threads.threadid() can lead to race conditions, as explained in the blogpost mentioned in this PR. I added a rule to match this case.

threads/index_by_threadid.yml Outdated Show resolved Hide resolved
@aviks
Copy link
Member

aviks commented Jul 6, 2023

Actually, on second thoughts, shouldn't these go in lang/correctness?

@iuliadmtru
Copy link
Collaborator Author

@aviks I moved them.

@KristofferC
Copy link

A common use case for threadid is in code like https://github.com/JuliaLang/julia/blob/d9ad6d2e122ab4efcf8ddc773c1a3a421b64cb25/test/threads_exec.jl#L522-L530 where it is used to write a value from a thread back to a buffer in a thread-safe manner. Is this pattern really problematic?

@aviks
Copy link
Member

aviks commented Jul 7, 2023

So my understanding from that blog post was that with task migration, that code will not do what you want. Am I misunderstanding some nuance, or are you saying that that blog post overstates things?

More generally, I don't think any static analysis rule is fully universal. You need to decide on what works for your use case, and then choose a set of rules to enforce your polices automatically. So in my mind, the benchmark for a rule is "would at least a few people feasibly want to enforce this". None of these are meant to be universally applicable to everybody or every codebase.

@pfitzseb
Copy link
Member

The code Kristoffer linked should always be fine because of the :static preventing task migration. Without that it would (could) still be problematic, because setindex! isn't atomic.

More generally, I don't think any static analysis rule is fully universal. You need to decide on what works for your use case, and then choose a set of rules to enforce your polices automatically. So in my mind, the benchmark for a rule is "would at least a few people feasibly want to enforce this". None of these are meant to be universally applicable to everybody or every codebase.

Is there some way to mark this as a "low confidence check" or something like that? Or can we bump down the severity a notch?

@iuliadmtru
Copy link
Collaborator Author

Is there some way to mark this as a "low confidence check"

Something like this? https://semgrep.dev/docs/contributing/contributing-to-semgrep-rules-repository/#confidence

@aviks
Copy link
Member

aviks commented Jul 11, 2023

The code Kristoffer linked should always be fine because of the :static

OK, we could make this rule match only when :static is not used. Will that make this better?

@KristofferC
Copy link

Without that it would (could) still be problematic, because setindex! isn't atomic.

It depends if the code hits a yield point in setindex! no (which I guess most of the time it doesn't).

@pfitzseb
Copy link
Member

It depends if the code hits a yield point in setindex! no (which I guess most of the time it doesn't).

Yes, but you can't statically determine this.

OK, we could make this rule match only when :static is not used. Will that make this better?

I think so. Imho this rule should be a pattern-inside -- obviously that doesn't catch call cases, e.g.

f(x, y) = x[threadid()] = y
@threads for i in 1:10
  f(x, i)
end

but seems more robust. Excessive amounts of false positives will lead to users disabling rules very quickly.

@iuliadmtru
Copy link
Collaborator Author

@pfitzseb Something like this? https://semgrep.dev/playground/s/B955

@pfitzseb
Copy link
Member

yeah, or even https://semgrep.dev/playground/s/DWjj

@iuliadmtru
Copy link
Collaborator Author

I changed the rule according to @pfitzseb's suggestions (changed patterns to pattern-insides and added a low confidence label) and changed the test cases accordingly. I also changed the file extension from .yml to .yaml.

@aviks aviks merged commit 016b34f into JuliaComputing:main Aug 6, 2023
1 check passed
@iuliadmtru iuliadmtru deleted the index-by-threadid branch August 7, 2023 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants