Is it able to decode other languages, such as Chinese? #2

l1lsl0th · 2021-03-22T19:19:18Z

I'm new to Python, but I think it's having issues decoding Chinese. Maybe it needs encoding="utf-8"?

Error:

    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 121: character maps to <undefined>
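For context: a 'charmap' error means Python decoded the file with a single-byte Windows code page rather than UTF-8 (the later tracebacks in this thread show cp1252). Some byte values, including 0x8d and 0x9d, have no character at all in cp1252, so decoding stops with exactly this message. A small sketch of the difference; `can_decode` is a hypothetical helper and the sample text is made up:

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `raw` decodes cleanly with `encoding`."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# 0x8d and 0x9d are undefined in cp1252, hence "character maps to <undefined>".
print(can_decode(b"\x8d", "cp1252"))
print(can_decode(b"\x9d", "cp1252"))

# Chinese text saved as UTF-8 decodes cleanly with the matching codec.
chinese = "中文".encode("utf-8")
print(chinese.decode("utf-8"))

# The wrong codec does not always raise: these particular bytes all exist
# in cp1252, so this decode silently yields mojibake instead of an error.
print(chinese.decode("cp1252"))
```

So the fix is to tell open() which encoding the file really uses, rather than relying on the platform default.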

robertmartin8 · 2021-03-23T13:54:00Z

Did the error say which line was causing the problems? I don't think I've ever tried it on Chinese characters / Kanji etc.

Also, which python version are you using?

l1lsl0th · 2021-03-27T14:28:08Z

I'm running Python 3.8, but I've tried 3.9 too. Thanks a bunch!

aiturri · 2021-05-12T18:35:51Z

Hi! I have the same problem trying to use your script with Python 3.7.6 and books in English and Spanish:

    Traceback (most recent call last):
      File "KindleClippings.py", line 116, in <module>
        parse_clippings(source_file, destination)
      File "KindleClippings.py", line 57, in parse_clippings
        for highlight in f.read().split("=========="):
      File "d:\Miniconda3\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1589: character maps to <undefined>

Thanks

robertmartin8 · 2021-05-12T18:37:44Z

Hi @aiturri,

Thanks for raising this. Would it be possible for you to share the part of the clippings file that is causing the errors? I'd love to fix this, but I can't reproduce the error.

Best,
Robert

aiturri · 2021-05-12T19:01:08Z

Sure!
Thanks in advance!

robertmartin8 · 2021-05-12T19:30:04Z

Hi @aiturri,

I've just pushed a potential fix. Can you download the script again and try?

Otherwise, you can manually modify line 55 to specify an encoding:

    with open(source_file, "r", encoding="utf8") as f:

Let me know whether or not it works. For the record, the original script worked fine on my machine with your clippings file, so I couldn't verify the issue.
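A minimal sketch of reading the clippings with the encoding stated explicitly, assuming the file is UTF-8. It splits on the same "==========" separator that appears in the earlier traceback; `read_highlights` and the sample file are hypothetical, not the script's actual code. "utf-8-sig" is used so that a leading byte-order mark, if the file happens to have one, is dropped rather than ending up in the first entry.

```python
import os
import tempfile

SEPARATOR = "=========="  # entry separator, as split on in the traceback


def read_highlights(source_file, encoding="utf-8-sig"):
    """Hypothetical helper: return the non-empty entries of a clippings file.

    "utf-8-sig" decodes like "utf-8" but also strips a leading byte-order
    mark, in case the file starts with one (an assumption about the input).
    """
    with open(source_file, "r", encoding=encoding) as f:
        chunks = f.read().split(SEPARATOR)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Usage with a tiny made-up file: BOM, Chinese and Spanish text.
sample = "\ufeffBook A\n- Highlight\n\n中文笔记\n==========\nLibro B\n- Nota\n\nacción\n==========\n"
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
entries = read_highlights(tmp.name)
os.remove(tmp.name)
print(len(entries))
```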

Best,
Robert

aiturri · 2021-05-12T20:00:41Z

Hi @robertmartin8, I tried again, and it's still not working:

    Traceback (most recent call last):
      File "KindleClippings.py", line 116, in <module>
        parse_clippings(source_file, destination)
      File "KindleClippings.py", line 88, in parse_clippings
        outfile.write(clipping_text + "\n\n...\n\n")
      File "d:\Miniconda3\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u03c3' in position 142: character maps to <undefined>
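This traceback is the write side of the same problem: the output file was opened without an encoding, so Python tried to encode σ (U+03C3, Greek small sigma) as cp1252, which has no such character. A sketch, assuming UTF-8 output is acceptable; `encodable`, `write_clipping` and the output file name are hypothetical, and only the "\n\n...\n\n" suffix is copied from the traceback:

```python
import os
import tempfile


def encodable(text, encoding):
    """Hypothetical helper: True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def write_clipping(path, clipping_text, encoding="utf-8"):
    """Hypothetical helper: append one clipping with an explicit encoding."""
    with open(path, "a", encoding=encoding) as outfile:
        outfile.write(clipping_text + "\n\n...\n\n")


text = "Greek sigma: \u03c3"
print(encodable(text, "cp1252"))  # False: cp1252 has no sigma
print(encodable(text, "utf-8"))   # True

out_path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_clipping(out_path, text)
with open(out_path, encoding="utf-8") as f:
    print(f.read())
```

Reads and writes both need the encoding argument; fixing only the read just moves the error to the write.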

I'll attach my original clippings file here so you can try it, but I'll delete it as soon as you've downloaded it, for privacy reasons. Please let me know when you have!

Thanks again!

robertmartin8 · 2021-05-12T20:03:12Z

@aiturri OK, I've downloaded it. Feel free to remove it.

robertmartin8 · 2021-05-12T20:10:45Z

@aiturri I still can't reproduce it: I can parse your file, accents and all. I think it's a Mac/Windows issue.

Can you try again? I forgot to add encoding="utf8" to a couple of the open() calls.
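The Mac/Windows difference comes from how open() picks a default: with no encoding argument it uses the locale's preferred encoding, which is typically UTF-8 on macOS and Linux but a code page such as cp1252 on many Western-European Windows setups. A quick way to check what a given machine uses; `default_text_encoding` is just an illustrative wrapper:

```python
import locale
import sys


def default_text_encoding():
    """Encoding open() falls back to in text mode when no encoding= is given."""
    return locale.getpreferredencoding(False)


# Typically something like "UTF-8" on macOS and "cp1252" on Western Windows.
print("open() default:", default_text_encoding())
print("UTF-8 Mode:", bool(sys.flags.utf8_mode))
```

Python 3.7+ also has a UTF-8 Mode (python -X utf8 KindleClippings.py, or PYTHONUTF8=1 in the environment) that makes UTF-8 the default, but passing encoding= explicitly, as the fix does, works regardless of how the interpreter is launched.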

aiturri · 2021-05-12T20:15:56Z

@robertmartin8

    Traceback (most recent call last):
      File "KindleClippings.py", line 117, in <module>
        parse_clippings(source_file, destination)
      File "KindleClippings.py", line 82, in parse_clippings
        current_text = textfile.read()
      File "d:\Miniconda3\lib\codecs.py", line 322, in decode
        (result, consumed) = self.buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 472: invalid start byte
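This error points the other way. 0x92 is not a valid UTF-8 start byte, but in cp1252 it is the curly apostrophe ’ (U+2019). One plausible reading, not confirmed in the thread, is that the file read on line 82 was written earlier with the Windows default encoding (for example by a run before the fix) and is now being read back as UTF-8. A sketch of a try-UTF-8-then-cp1252 fallback; `read_text_with_fallback` is a hypothetical helper, not part of the script:

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: decode with the first encoding that succeeds.

    Returns (text, encoding_used). Raises ValueError if none of them work.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path!r}")


# b"\x92" fails as UTF-8 ("invalid start byte") but is U+2019 in cp1252.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"It\x92s a highlight")
text, used = read_text_with_fallback(path)
os.remove(path)
print(used, text)
```

This is a fallback, not encoding detection: cp1252 accepts nearly every byte value, so a wrong guess can still produce mojibake instead of an error.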

robertmartin8 · 2021-05-12T20:28:06Z

@aiturri OK, it seems this is related to a particular Windows encoding. Other people seem to have had the same issue.

(Please save a copy of your clippings file beforehand, just in case.)

I've added two fixes. The first simply ignores the errors; have a go and see whether it works (the output might be garbled).
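For reference, Python's standard error handlers behave differently on the 0x92 byte from the previous traceback. Assuming the first fix passes errors="ignore" (the thread doesn't show the code), undecodable bytes are silently dropped; errors="replace" keeps a visible U+FFFD marker instead, which makes garbled spots easier to find afterwards:

```python
raw = b"It\x92s"  # 0x92: cp1252 apostrophe, not valid UTF-8

ignored = raw.decode("utf-8", errors="ignore")    # "Its" (byte dropped)
replaced = raw.decode("utf-8", errors="replace")  # "It\ufffds" (U+FFFD marker)
print(ignored, replaced)

# open() accepts the same argument, e.g.:
# open(source_file, "r", encoding="utf8", errors="replace")
```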

The second is a new argument to specify the encoding:

    python KindleClippings.py -encoding=cp1252

Hopefully it solves your problem.
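A minimal sketch of how such an option could be wired up with argparse. This is a hypothetical reconstruction, not the script's code; only the -encoding=cp1252 spelling comes from the comment above, and the "utf8" default mirrors the earlier open() fix. codecs.lookup rejects misspelled codec names up front, before any file is opened.

```python
import argparse
import codecs


def parse_args(argv=None):
    """Hypothetical reconstruction of an -encoding option."""
    parser = argparse.ArgumentParser(description="Parse Kindle clippings.")
    parser.add_argument(
        "-encoding",
        default="utf8",
        help="text encoding of the clippings file, e.g. cp1252",
    )
    args = parser.parse_args(argv)
    codecs.lookup(args.encoding)  # raises LookupError for an unknown codec
    return args


args = parse_args(["-encoding=cp1252"])
print(args.encoding)
```

argparse also accepts the space-separated form (-encoding cp1252). The chosen value would then need to reach every open() call, reads and writes alike.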
