epub3 fixed layout conversion #28

Open
God-damnit-all opened this issue Jun 8, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@God-damnit-all

I have an epub v3 file that has a fixed layout. The only reader I've found that can read it at all is Thorium Reader, which uses the readium-js library for epub rendering.

Unfortunately, there isn't a way to print or convert an epub file in Thorium Reader, and it's honestly fairly clunky to use. Trying to convert one with Calibre is a mess, and it doesn't look like Calibre is going to get support for fixed layout any time soon.

Given the nature of this project, converting things to 1:1 cbz files, I was wondering if you could possibly add support for this.

@ToofDerling
Owner

ToofDerling commented Jun 8, 2023

I have working code that converts epub files to cbz, I just need to clean it up and add it to CbzMage. So yeah, I can do that but I won't have time to do it before the end of the month (it's vacation time :)).

In the meantime it's not super hard to do manually: epub files are actually zip archives. So rename the file to .zip and unpack it, then find the directory with the images (it's named EPUB, OEBPS, or OPS), then create a new zip archive containing only the images and finally rename that archive to .cbz.
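The manual steps above can be sketched in Python as follows. This is a minimal sketch, not CbzMage's code: the function name, the list of image extensions, and the choice to order pages by sorted file name are all assumptions, and the last one only gives the right page order when the images are named in reading order.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of image types; extend it if the book uses something else.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def epub_images_to_cbz(epub_path, cbz_path):
    """Copy every image in the epub into a new cbz and return the names written."""
    written = []
    with zipfile.ZipFile(epub_path) as epub, \
            zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as cbz:
        # Assumption: sorted archive names match the reading order.
        for name in sorted(epub.namelist()):
            if PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS:
                # Keep the original archive path so that files with the
                # same name in different directories cannot collide.
                cbz.writestr(name, epub.read(name))
                written.append(name)
    return written
```

Calling `epub_images_to_cbz("book.epub", "book.cbz")` reads the epub directly as a zip archive, so the rename-and-unpack steps are not needed. It only helps for books whose pages are plain images; it does nothing for text overlaid on the images.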

And the lazy way to do it is to simply rename the epub to cbz. A reader like YacReader will read that without complaining.

@ToofDerling ToofDerling added the enhancement New feature or request label Jun 8, 2023
@God-damnit-all
Author

God-damnit-all commented Jun 9, 2023

For CbzMage, I was under the impression the pages in the target file were rendered and the page was then "screenshotted" and then placed into the cbz, is that not the case?

For the epub file I want to convert, the text isn't part of the image but rather overlaid on top of it.

@ToofDerling
Owner

Ah yes, I forgot about that. I have around 180 epub files (mostly European comics bought on Google Books) and only a couple of them need to be rendered like you describe.

The CbzMage pdf conversion works by rendering the pdf to a series of images. If we can find a tool or library that does something similar for epub files, we can handle the troublesome files.

I had a quick look and it seems MuPDF supports epub to cbz conversion directly. There's also a library that might be helpful. Perhaps you can start by checking out MuPDF and let me know how it goes?

@God-damnit-all
Author

Are you sure about that? Last I checked, MuPDF still does not support fixed layout epubs.

@ToofDerling
Owner

No, I'm not sure at all as I only had a quick look :)

@ToofDerling
Owner

ToofDerling commented Jun 13, 2023

I had time to do a little research, here's one thing you can try: unpack the epub and find the directory with the .xhtml files. Try opening a few of them in a Chromium browser (Chrome, Edge etc) and see if they render okay. It worked for me with a book called Eve (by Una).
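For a book with many pages, it may help to list the .xhtml files in reading order rather than browsing the directory. The sketch below assumes a standard EPUB container, where `META-INF/container.xml` points at the package (OPF) file and that file's spine lists the pages in order. The function name is hypothetical, and percent-encoded hrefs are not decoded.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def pages_in_reading_order(epub_path):
    """Return the archive paths of the content documents in spine order."""
    with zipfile.ZipFile(epub_path) as epub:
        # container.xml names the package document (the .opf file).
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(epub.read(opf_path))

    # The manifest maps item ids to hrefs relative to the .opf file.
    hrefs = {item.get("id"): item.get("href")
             for item in package.findall("opf:manifest/opf:item", OPF_NS)}
    base = posixpath.dirname(opf_path)

    # The spine lists item ids in reading order.
    return [posixpath.normpath(posixpath.join(base, hrefs[ref.get("idref")]))
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)]
```

The returned paths are relative to the root of the unpacked epub, so they can be opened in the browser one by one in the order given.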

I tried MuPDF and like you said it doesn't handle fixed layout epubs. The library I found was also useless. But if we can render the book in a Chromium browser we can snapshot the pages from there.
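One way to snapshot pages from a Chromium browser is its headless mode, which can write a screenshot of a page to a PNG file. The following is a rough sketch under assumptions: the function names are hypothetical, the browser executable name depends on the system, and whether headless output matches what the regular browser window shows for a given book has not been verified here.

```python
import subprocess
from pathlib import Path


def screenshot_command(browser, page_path, png_path, width, height):
    """Build the command line for a headless Chromium screenshot of one page."""
    return [
        browser,                            # assumed name, e.g. "chromium" or "msedge"
        "--headless",
        "--hide-scrollbars",
        f"--window-size={width},{height}",  # page size in pixels
        f"--screenshot={png_path}",
        Path(page_path).resolve().as_uri(), # file:// URL of the unpacked .xhtml
    ]


def snapshot_page(browser, page_path, png_path, width, height):
    """Run the browser and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        screenshot_command(browser, page_path, png_path, width, height),
        check=True,
    )
```

Building the command separately from running it keeps the sketch easy to inspect: print the list first and try it by hand on one page before looping over a whole book.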

@God-damnit-all
Author

As far as I can tell, that does work, yes. The problem is that there are so many pages, and for some reason, almost all the elements seem to be lost if you try to print the page to PDF.

While it's a shame the ability to select the text will be lost, I think the only way of accurately converting it to something other than xhtml will be to take screenshots. There is a meta tag within the head tag called viewport that lists the exact dimensions of the pages, so the screenshot could simply be cropped based off that.
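Reading the page dimensions from the viewport meta tag could look like the sketch below. It assumes the tag has the form `<meta name="viewport" content="width=..., height=..."/>` with plain pixel numbers. The function name is hypothetical, and a regular expression is used instead of a full XHTML parser to keep the example short.

```python
import re


def viewport_size(xhtml_text):
    """Return (width, height) from the viewport meta tag, or None if not found."""
    # Find the meta tag whose name attribute is "viewport".
    meta = re.search(
        r"<meta\b[^>]*\bname\s*=\s*[\"']viewport[\"'][^>]*>",
        xhtml_text,
        re.IGNORECASE,
    )
    if meta is None:
        return None

    # Pull out the value of its content attribute.
    content = re.search(
        r"\bcontent\s*=\s*[\"']([^\"']*)[\"']", meta.group(0), re.IGNORECASE
    )
    if content is None:
        return None

    # Only numeric values count; "width=device-width" is ignored.
    values = dict(re.findall(r"(width|height)\s*=\s*(\d+)", content.group(1)))
    if "width" not in values or "height" not in values:
        return None
    return int(values["width"]), int(values["height"])
```

The resulting pair can be used either to crop a screenshot or to size the browser window before taking it, so that no cropping is needed.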

If you provide an email, I can email it to you for testing purposes. You can delete the comment immediately afterward, I'll see it in my email notifications.

@God-damnit-all
Author

As an aside, if you know of any advanced OCR tools that can do OCR based off a specific font & font size, let me know.

@ToofDerling
Owner

The only experience I have with OCR is SubtitleEdit, which can scan image-based movie subtitles. I remember reading about some of the different OCR tools it uses in the documentation. That's all I've got.

@ToofDerling
Owner

ToofDerling commented Jun 14, 2023

Yeah, I'd like to have a look at the book. You should have my email now.

Edit: got the book, thanks.
