Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(cache): simplify clean method #358

Merged
merged 2 commits into from
Jun 20, 2022

Conversation

agilgur5
Copy link
Collaborator

@agilgur5 agilgur5 commented Jun 12, 2022

Summary

Simplify TsCache's clean method to be more straightforward, less nesting, etc

Details

  • cache().clean() is only called during instantiation in options, so we can instead just call it inside of the constructor
  • there is no need to call this.init() in clean as it's only used in the constructor anyway, which already calls this.init()
    • and if noCache is true, init is basically a no-op too
      • so technically we don't need to call init at all if noCache, but that's a larger refactor that I'm splitting into a separate commit/PR

  • now that clean is just one conditional, we can invert it and return early instead
  • it's also not necessary and less efficient to call emptyDir before remove; remove will unlink all the contents anyway
    • docs here, though this also just normal FS behavior
  • IMO, it's also not necessary to check if it's a directory, we can remove it either way
    • and not necessary to log out if we don't clean it
  • then also just simplify the logic to use a filter instead of a nested if
    • which we already do in several places, so this follows existing code style

Future Reference

This refactor and some upcoming refactors mentioned above are going to make it a good bit easier to unit test this because the code will be simpler with less branches and less nesting.

- `cache().clean()` is only called during instantiation in `options`, so we can instead just call it inside of the constructor
- there is no need to call `this.init()` in `clean` as it's only used in the constructor anyway, which already calls `this.init()`
  - and if `noCache` is `true`, `init` is basically a no-op too
    - so technically we don't need to call `init` _at all_ if `noCache`, but that's a larger refactor that I'm splitting into a separate commit/PR

- now that `clean` is just one conditional, we can invert it and return early instead
- it's also not necessary and less efficient to call `emptyDir` before `remove`; `remove` will unlink all the contents anyway
  - docs here: https://github.com/jprichardson/node-fs-extra/blob/0220eac966d7d6b9a595d69b1242ab8a397fba7f/docs/remove-sync.md
    - though this also just normal FS behavior
- IMO, it's also not necessary to check if it's a directory, we can `remove` it either way
  - and not necessary to log out if we _don't_ clean it
- then also just simplify the logic to use a `filter` instead of a nested `if`
  - which we already do in several places, so this follows existing code style
@agilgur5 agilgur5 added kind: internal Changes only affect the internals, and _not_ the public API or external-facing docs scope: cache Related to the cache labels Jun 12, 2022
@ezolenko
Copy link
Owner

Making clean() private might be ok in current usage, but it makes cache class incomplete on its own. I guess contract changes so client should create new instance in this case, so that's fine.

Checking that what we delete is a directory is needed though, deleting cache is the most dangerous action this plugin does, so we should do as many sanity checks as we can. If the path is suddenly not a directory, something went very wrong. Ideally there should also be a manifest with a file list or something, and if it is well formed, we should be deleting only the files listed inside.

@agilgur5
Copy link
Collaborator Author

agilgur5 commented Jun 14, 2022

Making clean() private might be ok in current usage, but it makes cache class incomplete on its own. I guess contract changes so client should create new instance in this case, so that's fine.

Basically agreed. I recognized that in changing it, but it's only used like this anyway. From the Rollup perspective, clean will basically never be called imperatively outside of the constructor either.

That's more relevant if we were to ever split off the cache implementation into its own library, but otherwise, in the usage here, clean only happens on creation anyway.

I could leave it as public, but I feel that could give an incorrect impression to other readers and contributors if it is literally only used within the constructor.

Checking that what we delete is a directory is needed though, deleting cache is the most dangerous action this plugin does, so we should do as many sanity checks as we can. If the path is suddenly not a directory, something went very wrong

Mmm I see, I think we have different operating assumptions here. I'm thinking more in terms of unit testing this, where it's an invariant that it's a directory, so that situation should never happen and is therefore untestable. "Should never happen" also sounds like dead code to me, so my operating assumption is that it can be removed.
Basically if it's not a directory, I assume that the user or one of their other dev tools threw a file in there with the same prefix, and we definitely don't guarantee anywhere that the cache dir is safe to throw stuff in. Or in general, caches are considered ephemeral in most programming and do get corrupted from time-to-time. So nbd if we delete it.
We still check the prefix too, so it's not a completely random file in that sense either.

I see that your operating assumption is more that random stuff can happen and users can do random things, so be as safe as possible (#243 is a great example of a race condition that really shouldn't happen, but does during parallel processing. that being said, that specific race may actually be fixable and would be a performance improvement -- but we should probably check and not error hard anyway even if the race is handled).
And for what it's worth, that is particularly true for filesystems which themselves lack certain guarantees (due to write buffers, flushing, etc), so I don't necessarily disagree with that operating assumption.

Ideally there should also be a manifest with a file list or something, and if it is well formed, we should be deleting only the files listed inside.

Yea I was thinking something like this gives more guarantees if working from that operating assumption. But at the same time that seems like unnecessary complexity for a marginal safety benefit -- especially for something that really should be considered ephemeral.
From that "unnecessary complexity" perspective, I think you can see where "check if dir" can similarly derive to unnecessary complexity.

I'm fine going with your safety assumption though if that's what you'd prefer.

In that case, I would definitely add a comment that it shouldn't ever happen though, and probably throw a // istanbul-ignore-if comment in since it's effectively untestable without intentionally writing a file there. (which is not a bad test per se, but more a fault tolerance / chaos eng test than a unit test)

@ezolenko
Copy link
Owner

Yeah, cache is ephemeral, but only if that's really our cache folder and the root of it isn't somehow in a bad place (due to bugs in our code, or in the library we use to find cache root, or someplace else). There is a reason rm has a check for root folder, however limited it is.

- as requested in code review, will go with a safety-first operating assumption

- as such, with safety as modus operandus, make the logging more detailed in these scenarios, since they're not supposed to happen
  - and don't check code coverage on these as they're not supposed to happen in normal usage
@agilgur5
Copy link
Collaborator Author

@ezolenko made the requested changes 👍
I added the /* istanbul ignore if */ comments I mentioned for test coverage purposes and made the logging more detailed if entries are skipped, since we'd want more detailed logging if we're going with the safety operating assumption and if "something went very wrong" in these cases.

This is now a smaller change: an if inversion, not calling init unnecessarily, and only calling clean in the constructor (plus the above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: internal Changes only affect the internals, and _not_ the public API or external-facing docs scope: cache Related to the cache
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants