Recent changes broke types #1396
Comments
Hi @tonyhammainen, we really do have a duplicate on the |
Hi @omri374, thanks for the reply! The intended pattern is to use both libraries sequentially: first getting the recognized PII out of the analyzer, then passing it to the anonymizer to anonymize. Or have I misunderstood? If so, this means that we are inputting Tbh, I don't see why you are not including the analyzer as a dependency of the anonymizer, given that the main usage pattern for the anonymizer relies on the analyzer. If I have misunderstood how the anonymizer package should be used, do enlighten me! |
I can confirm the issue with pyright (pylance) by using the first example from the quickstart: https://microsoft.github.io/presidio/getting_started/
If the libraries MUST be independent, then the conversion between the two classes should be done by the user. For example this way:

    from presidio_analyzer import AnalyzerEngine
    from presidio_anonymizer import AnonymizerEngine, RecognizerResult

    text = "My phone number is 212-555-5555"

    # Set up the engine, loads the NLP module (spaCy model by default)
    # and other PII recognizers
    analyzer = AnalyzerEngine()

    # Call analyzer to get results
    results = analyzer.analyze(text=text, entities=["PHONE_NUMBER"], language="en")
    print(results)

    # Convert recognizer results to RecognizerResult objects from presidio_anonymizer
    results = [
        RecognizerResult(
            entity_type=result.entity_type,
            start=result.start,
            end=result.end,
            score=result.score,
        )
        for result in results
    ]

    # Analyzer results are passed to the AnonymizerEngine for anonymization
    anonymizer = AnonymizerEngine()
    anonymized_text = anonymizer.anonymize(text=text, analyzer_results=results)
    print(anonymized_text)

Over time, the difference between Another approach might be to refactor the anonymizer so that it depends on the protocol of the Additionally, I'd suggest deprecating one of the |
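For illustration, the protocol-based idea mentioned in this comment could look roughly like the sketch below. It is not Presidio code: `RecognizerResultLike`, `AnalyzerResult`, and `redact` are hypothetical names, and the stand-in class only mirrors the four attributes used in the conversion snippet above. The point is that a function typed against a `typing.Protocol` accepts any class with matching attributes, so two separately defined result classes would both type-check without a shared import.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class RecognizerResultLike(Protocol):
    """Hypothetical protocol: any object exposing these four read-only attributes."""

    @property
    def entity_type(self) -> str: ...
    @property
    def start(self) -> int: ...
    @property
    def end(self) -> int: ...
    @property
    def score(self) -> float: ...


@dataclass
class AnalyzerResult:
    """Stand-in for an analyzer-side result class (not the real Presidio class)."""

    entity_type: str
    start: int
    end: int
    score: float


def redact(text: str, results: Sequence[RecognizerResultLike]) -> str:
    """Toy anonymizer: replace each detected span with <ENTITY_TYPE>."""
    # Work from the last span to the first so earlier offsets stay valid.
    for r in sorted(results, key=lambda r: r.start, reverse=True):
        text = text[: r.start] + f"<{r.entity_type}>" + text[r.end :]
    return text


text = "My phone number is 212-555-5555"
redacted = redact(text, [AnalyzerResult("PHONE_NUMBER", 19, 31, 0.75)])
print(redacted)
```

`AnalyzerResult` never inherits from `RecognizerResultLike`; a type checker accepts it purely because the attribute names and types line up. `Sequence` is used for the parameter instead of `list` so that a `list[AnalyzerResult]` is accepted as well.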
Hi @eicca, thanks for your suggestions. We should also think of backward compatibility here. In my view, the optimal solution is to have a shared library. One option is to have the anonymizer as a dependency of the analyzer. The second is to create a presidio-core library which is used by both. |
Thanks for your reply @omri374!
This totally makes sense to me, especially since it won't bring a lot of unnecessary dependencies for people who want to use only analyzer (in fact, just
This also makes sense. This can also be done at a later time after the previous step is done. |
Thanks! A PR adding the anonymizer to the analyzer sounds great. |
Describe the bug
As of 2.2.33 I was not getting any type-related errors, but upgrading to the latest 2.2.354 resulted in getting them.
error: Call to untyped function "AnonymizerEngine" in typed context [no-untyped-call]
^ Initializing the AnonymizerEngine throws an error. I believe it is because the class has no typed init function
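A minimal sketch of the mechanism the reporter suspects, using hypothetical classes rather than the real `AnonymizerEngine` (whose actual `__init__` signature is not shown in this thread): mypy treats a function with no annotations at all as untyped, and under `disallow_untyped_calls` it flags calls to it from annotated code. For an `__init__` that takes no arguments, adding `-> None` is enough to make it typed.

```python
class UntypedEngine:
    # No annotations anywhere: mypy considers this constructor untyped.
    def __init__(self):
        self.operators = {}


class TypedEngine:
    # The explicit return annotation makes this constructor typed.
    def __init__(self) -> None:
        self.operators: dict[str, str] = {}


def build() -> TypedEngine:
    # With `disallow_untyped_calls` enabled, calling UntypedEngine() here
    # would be reported as [no-untyped-call]; TypedEngine() is accepted.
    return TypedEngine()


engine = build()
```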
error: Argument "ad_hoc_recognizers" to "analyze" of "AnalyzerEngine" has incompatible type "list[PatternRecognizer]"; expected "list[EntityRecognizer] | None" [arg-type]
^ I do not understand why mypy is not picking the fact that PatternRecognizer is a child class of EntityRecognizer
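A likely explanation for this one is that `list` is invariant in Python's type system: `list[PatternRecognizer]` is not a subtype of `list[EntityRecognizer]`, even though `PatternRecognizer` subclasses `EntityRecognizer`, because the callee could append some other `EntityRecognizer` to the list. The sketch below reproduces the pattern with stand-in classes and hypothetical functions (`analyze_list`, `analyze_seq`); it is not the Presidio source.

```python
from typing import Optional, Sequence


class EntityRecognizer:
    """Stand-in base class."""


class PatternRecognizer(EntityRecognizer):
    """Stand-in subclass."""


def analyze_list(recognizers: Optional[list[EntityRecognizer]] = None) -> int:
    # Parameter typed as a mutable list: invariant in its element type.
    return len(recognizers or [])


def analyze_seq(recognizers: Optional[Sequence[EntityRecognizer]] = None) -> int:
    # Parameter typed as a read-only Sequence: covariant in its element type.
    return len(recognizers or [])


patterns: list[PatternRecognizer] = [PatternRecognizer()]

# analyze_list(patterns)  # mypy would report [arg-type] here, as in the issue
count_seq = analyze_seq(patterns)  # accepted by mypy

# Caller-side workaround: annotate the list with the base class.
recognizers: list[EntityRecognizer] = [PatternRecognizer()]
count_list = analyze_list(recognizers)
```

Either typing the parameter as `Sequence[EntityRecognizer]` on the library side or annotating the list as `list[EntityRecognizer]` on the caller side should satisfy the checker.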
Argument "analyzer_results" to "anonymize" of "AnonymizerEngine" has incompatible type "list[presidio_analyzer.recognizer_result.RecognizerResult]"; expected "list[presidio_anonymizer.entities.engine.recognizer_result.RecognizerResult]" [arg-type]
^ I believe the above is a result of the RecognizerResult class in presidio-anonymizer being a copy of the class in presidio-analyzer, and mypy doesn't understand their equivalency
To Reproduce
Steps to reproduce the behavior:
AnonymizerEngine()
Expected behavior
No mypy errors when using the library as expected