epub3 fixed layout conversion #28

God-damnit-all · 2023-06-08T04:22:24Z

I have an epub v3 file that has a fixed layout. The only reader I've found that can even read it at all is Thorium Reader which uses the readium-js library for epub rendering.

Unfortunately, Thorium Reader has no way to print or convert an epub file, and it's honestly fairly clunky to use. Trying to convert one with Calibre is a mess, and it doesn't look like Calibre will support fixed layout any time soon.

Given the nature of this project, converting things to 1:1 cbz files, I was wondering if you could possibly add support for this.

ToofDerling · 2023-06-08T12:56:19Z

I have working code that converts epub files to cbz; I just need to clean it up and add it to CbzMage. So yeah, I can do that, but I won't have time to do it before the end of the month (it's vacation time :)).

In the meantime it's not super hard to do manually: epub files are actually zip archives. So rename the file to .zip and unpack it, then find the directory with the images (it's usually inside EPUB, OEBPS, or OPS), then create a new zip archive containing only the images, and finally rename that archive to .cbz.

And the lazy way to do it is to simply rename the epub to cbz. A reader like YacReader will read that without complaining.
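
The manual steps above can be sketched in a few lines of Python. This is only an illustration, not the CbzMage code: the function name `epub_images_to_cbz` and the list of image extensions are assumptions, and it relies on the image file names sorting into reading order, which is not guaranteed for every epub.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of page image types; extend it if a book uses something else.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def epub_images_to_cbz(epub_path, cbz_path):
    """Copy every image inside an epub into a new cbz; return the page count."""
    with zipfile.ZipFile(epub_path) as epub:
        # Assumption: sorting by archive name gives the reading order.
        names = sorted(
            name for name in epub.namelist()
            if PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
        )
        # Images are already compressed, so store them as-is.
        with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as cbz:
            for index, name in enumerate(names, start=1):
                suffix = PurePosixPath(name).suffix.lower()
                # Flatten the directory structure and number the pages.
                cbz.writestr(f"{index:04d}{suffix}", epub.read(name))
    return len(names)
```

Calling `epub_images_to_cbz("book.epub", "book.cbz")` (both file names hypothetical) would write the archive next to the script. Cover thumbnails and other non-page images get copied too, so the result may need a quick manual check.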

God-damnit-all · 2023-06-09T06:05:37Z

For CbzMage, I was under the impression the pages in the target file were rendered and the page was then "screenshotted" and then placed into the cbz, is that not the case?

For the epub file I want to convert, the text isn't part of the image but rather overlaid on top of it.

ToofDerling · 2023-06-09T09:17:59Z

Ah yes, I forgot about that. I have around 180 epub files (mostly European comics bought on Google Books) and only a couple of them need to be rendered like you describe.

The CbzMage pdf conversion works by rendering the pdf to a series of images. If we can find a tool or library that does something similar for epub files, we can handle the troublesome files.

I had a quick look and it seems MuPDF supports epub to cbz conversion directly. There's also a library that might be helpful. Perhaps you can start by checking out MuPDF and let me know how it goes?

God-damnit-all · 2023-06-09T13:18:51Z

Are you sure about that? Last I checked, MuPDF still does not support fixed layout epubs.

ToofDerling · 2023-06-09T15:04:12Z

No, I'm not sure at all as I only had a quick look :)

ToofDerling · 2023-06-13T22:38:06Z

I had time to do a little research. Here's one thing you can try: unpack the epub and find the directory with the .xhtml files. Try opening a few of them in a Chromium browser (Chrome, Edge, etc.) and see if they render okay. It worked for me with a book called Eve (by Una).
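
For a book with many pages, the .xhtml files can also be listed in reading order without unpacking anything by hand. A minimal sketch, assuming a standard epub package layout (META-INF/container.xml pointing at the package file, whose spine lists the pages); the function name `spine_documents` is made up for illustration and percent-encoded hrefs are not handled:

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(epub_path):
    """Return the archive paths of the content documents in spine order."""
    with zipfile.ZipFile(epub_path) as epub:
        # container.xml names the package (.opf) file.
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(epub.read(opf_path))

    # Manifest maps item ids to hrefs relative to the package file.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    base = posixpath.dirname(opf_path)
    # The spine lists item ids in reading order.
    return [
        posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")]))
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]
```

The returned paths are names inside the zip archive, so after unpacking they can be opened in the browser one by one.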

I tried MuPDF and like you said it doesn't handle fixed layout epubs. The library I found was also useless. But if we can render the book in a Chromium browser we can snapshot the pages from there.
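
Snapshotting from a Chromium browser could look roughly like this. It is a sketch under assumptions, not something CbzMage does today: it uses Chrome's headless `--screenshot` and `--window-size` switches, the helper names are hypothetical, and the browser executable name differs per platform, so it is passed in by the caller.

```python
import subprocess
from pathlib import Path


def screenshot_command(browser, page_path, output_png, width, height):
    """Build the command line that renders one unpacked page to a PNG."""
    return [
        browser,  # e.g. "chrome", "chromium", or "msedge" (assumption)
        "--headless",
        "--hide-scrollbars",
        f"--window-size={width},{height}",
        f"--screenshot={output_png}",
        # Chromium expects a URL, so turn the local path into file://...
        Path(page_path).resolve().as_uri(),
    ]


def snapshot_page(browser, page_path, output_png, width, height):
    """Run the browser once and raise if it exits with an error."""
    command = screenshot_command(browser, page_path, output_png, width, height)
    subprocess.run(command, check=True)
```

Starting a browser process per page is slow for a long book, so a real implementation would probably drive a single browser instance instead. The sketch is only meant to show that the rendering step can be scripted.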

God-damnit-all · 2023-06-14T16:21:16Z

As far as I can tell, that does work, yes. The problem is that there are so many pages, and for some reason almost all the elements seem to be lost if you try to print the page to PDF.

While it's a shame the ability to select the text will be lost, I think the only way of accurately converting it to something other than xhtml will be to take screenshots. Each page has a meta tag named viewport inside its head element that lists the exact dimensions of the page, so the screenshot could simply be cropped based on that.
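
Reading those dimensions is straightforward. A minimal sketch, assuming the viewport content looks like `width=1200, height=1600` (the numbers are made up); `viewport_size` is a hypothetical helper name:

```python
import re
from html.parser import HTMLParser


class _ViewportFinder(HTMLParser):
    """Remember the content attribute of the first viewport meta tag."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags end up here as well.
        if tag == "meta" and self.content is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "viewport":
                self.content = attributes.get("content")


def viewport_size(xhtml_text):
    """Return (width, height) in pixels, or None if the page has no fixed size."""
    finder = _ViewportFinder()
    finder.feed(xhtml_text)
    if finder.content is None:
        return None
    # Only numeric values count; "width=device-width" is ignored.
    values = dict(re.findall(r"(width|height)\s*=\s*(\d+)", finder.content))
    if "width" in values and "height" in values:
        return int(values["width"]), int(values["height"])
    return None
```

The resulting pair could be used either to crop a screenshot or to size the browser window before taking it, so that no cropping is needed.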

If you provide an email, I can email it to you for testing purposes. You can delete the comment immediately afterward, I'll see it in my email notifications.

God-damnit-all · 2023-06-14T16:24:38Z

As an aside, if you know of any advanced OCR tools that can do OCR based off a specific font & font size, let me know.

ToofDerling · 2023-06-14T16:34:37Z

> As an aside, if you know of any advanced OCR tools that can do OCR based off a specific font & font size, let me know.

The only experience I have with OCR is SubtitleEdit, which can scan image-based movie subtitles. I remember reading about some of the different OCR tools it uses in the documentation. That's all I got.

ToofDerling · 2023-06-14T16:46:32Z

> If you provide an email, I can email it to you for testing purposes. You can delete the comment immediately afterward, I'll see it in my email notifications.

Yeah, I'd like to have a look at it. You should have my email now.

Edit: got the book, thanks.

ToofDerling added the enhancement (New feature or request) label on Jun 8, 2023
