Incorrect ASCII detection #9
Comments
Hi Clément, these two examples run detection on Unicode strings. The correct test would be:
By design, This holds true for the first test with
Hi Vladislav, indeed, your answer makes sense. When given a Python unicode string, it might make sense to either raise an error (as chardet.detect does) or return a specific reference to Python's internal unicode encoding (instead of CP1006, which is not really meaningful in that case). When processing an arbitrary byte string, there is some value in having Detector() tell you that the ASCII encoding is sufficient to decode it (as chardet does, which would allow charamel to be a drop-in replacement).
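To make the suggestion above concrete, here is a minimal sketch of that behaviour written as a wrapper around an arbitrary detector. The name `detect_or_ascii` and the `fallback` parameter are hypothetical and not part of charamel or chardet; the sketch only assumes that the underlying detector takes bytes and returns an encoding name.

```python
from typing import Callable, Union


def detect_or_ascii(data: Union[bytes, bytearray], fallback: Callable[[bytes], str]) -> str:
    """Hypothetical wrapper: reject text input and short-circuit pure ASCII."""
    if isinstance(data, str):
        # A str is already decoded, so there is no byte encoding left to detect.
        raise TypeError("expected bytes or bytearray, got str")
    raw = bytes(data)
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII bytes: hand over to the real detector (assumed callable).
        return fallback(raw)
    # Every byte is below 0x80, so ASCII is sufficient to decode the input.
    return "ascii"
```

With this shape, callers that pass a unicode string get an explicit error rather than an arbitrary encoding, and pure-ASCII input never reaches the statistical detector.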
Hi,
I think the test set for this package is too limited: the results for very simple strings are wrong.
echo $LANG
en_US.UTF-8
The first one should return ascii and the second one UTF-8.
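As a rough illustration of that expectation, the sketch below reports the narrowest of ASCII and UTF-8 that can decode a byte string. The helper `narrowest_encoding` and the two sample strings are hypothetical stand-ins, not the strings from the original report, and the check uses only the standard library rather than any detector.

```python
def narrowest_encoding(raw: bytes) -> str:
    """Return the first of ASCII, UTF-8 that decodes raw, else 'unknown'."""
    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"


# Hypothetical samples: one pure-ASCII string, one with an accented letter.
for text in ("hello", "h\u00e9llo"):
    raw = text.encode("utf-8")  # detection operates on bytes, not on str
    print(raw, narrowest_encoding(raw))
# b'hello' ascii
# b'h\xc3\xa9llo' utf-8
```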
Thanks in advance for looking into that,