Is it able to decode other languages such as Chinese? #2
Comments
Did the error say which line was causing the problems? I don't think I've ever tried it on Chinese characters / Kanji etc. Also, which Python version are you using?
I am running Python 3.8 but have tried 3.9 too. Thanks a bunch
Hi! I have the same problem when trying to use your script with Python 3.7.6 and books in English and Spanish:
Thanks
Hi @aiturri, thanks for raising this. Would it be possible for you to share the part of the clipping file that is causing the errors? I'd love to try and fix this but can't reproduce the error. Best,
Sure!
Hi @aiturri, I've just pushed a potential fix. Can you download the script again and try? Otherwise you can manually modify line 55 to specify an encoding:
with open(source_file, "r", encoding="utf8") as f:
Let me know whether or not it works. For the record, the original script worked fine on my machine with your clippings file, so I couldn't verify the issue. Best,
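For anyone following along, here is a minimal sketch of what the explicit `encoding` argument changes. It assumes the clippings file is stored as UTF-8; the sample text and the temporary file are made up for illustration and are not taken from the actual script.

```python
import os
import tempfile

# Hypothetical sample standing in for a clippings file with Chinese text.
sample = "活着 (余华)\n- Your Highlight on page 12\n"

# Write the sample to a temporary file as UTF-8 (assumed file encoding).
fd, source_file = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf8") as f:
    f.write(sample)

# Without encoding=..., open() uses the platform default, which on Windows
# is often a legacy code page and can raise the 'charmap' error.
# Naming the encoding makes the read behave the same on every platform.
with open(source_file, "r", encoding="utf8") as f:
    text = f.read()

os.remove(source_file)
print(text == sample)
```

If the file starts with a byte order mark, `encoding="utf-8-sig"` is a standard alternative that strips it while reading.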
Hi @robertmartin8, I tried again, and it is still not working:
Traceback (most recent call last):
I will attach my original clippings file here so you can try, but I will delete it as soon as you download it (please let me know so I can delete it, for privacy reasons!). Thanks again!
@aiturri OK, I've downloaded it. Feel free to remove it.
@aiturri I still can't reproduce it – I can parse your file, accents and all. I think it's a Mac/Windows issue. Can you try again? I forgot to add
Traceback (most recent call last):
@aiturri OK, it seems this is related to a particular Windows encoding. Other people seem to have had the same issue. (Please save your clippings file beforehand, just in case.) I've put in two fixes: the first just ignores the errors – have a go and see whether it works (the output might be garbled). The second is a new argument to specify the encoding:
It might solve your problem?
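A rough sketch of how the two fixes described above could look. The function `read_clippings` and the flags `--encoding` and `--ignore-errors` are hypothetical names chosen for illustration; the actual script may expose these options differently.

```python
import argparse
import os
import tempfile


def read_clippings(path, encoding="utf8", ignore_errors=False):
    """Read a text file, optionally dropping bytes that cannot be decoded."""
    errors = "ignore" if ignore_errors else "strict"
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


def build_parser():
    # Hypothetical command-line options, not the real script's interface.
    parser = argparse.ArgumentParser(description="Clippings reader sketch")
    parser.add_argument("source_file")
    parser.add_argument("--encoding", default="utf8")
    parser.add_argument("--ignore-errors", action="store_true")
    return parser


# Demo: a file containing one byte (0xe9) that is not valid UTF-8.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"caf\xe9 ok")

args = build_parser().parse_args([path, "--ignore-errors"])
lossy = read_clippings(args.source_file, args.encoding, args.ignore_errors)

# Naming the encoding the bytes were really written in keeps every character.
exact = read_clippings(path, encoding="latin-1")

os.remove(path)
print(repr(lossy), repr(exact))
```

Ignoring errors silently drops the undecodable bytes, so accented or non-Latin characters can vanish from the output. Passing the correct encoding is the safer option whenever it is known.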
I'm new to Python, but I think it's having an issue decoding Chinese. Maybe it needs encoding="utf-8"?
Error:
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 121: character maps to <undefined>
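The error above can probably be reproduced in a few lines. This sketch assumes the platform default was the Windows code page cp1252, which the thread does not confirm; the character is an arbitrary example whose UTF-8 form contains the byte 0x8d.

```python
# UTF-8 bytes of one Chinese character: b'\xe5\x8d\x8e'
raw = "华".encode("utf-8")

# cp1252 (assumed Windows default) has no character for byte 0x8d,
# so decoding the UTF-8 bytes with it fails.
try:
    raw.decode("cp1252")
    failed = False
    message = ""
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("utf-8")

print(failed, message)
print(decoded)
```

The position in the message differs from the one in the report because it depends on where the offending byte sits in the input.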