
Refactored code and many improvements to speed and file reading #31

Open · wants to merge 8 commits into master

Conversation

Huggyturd

<<< I'm not a native English speaker >>>
Hello!
I discovered this project while searching for ways to machine-generate Anki decks from books.
I am a programmer with some experience, and I have made some improvements to the code and added new features.
You can now convert PDF, EPUB, and web pages cleanly, and a problem with GPU memory when using CUDA has been resolved.
I was able to convert a 900-page PDF file on a GTX 1050 Ti in approximately 1.2 hours.

- refactored code
- added CUDA cache clearing for GPU memory optimization (see the sketch after this list)
- fixed comment typos
- switched to a smaller model (improves speed and GPU memory usage)
- converted comments to docstring format
- improved PDF file conversion to question-answer pairs
- improved EPUB file conversion to question-answer pairs
- improved web page conversion to question-answer pairs
- resolved the pdftotext dependency in requirements.txt
- updated to the latest dependencies
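
A minimal sketch of what the CUDA cache clearing mentioned above might look like, assuming the models run through PyTorch; the helper name `clear_gpu_cache` is hypothetical and not taken from this PR:

```python
import gc

import torch


def clear_gpu_cache():
    """Release cached CUDA memory between large inference batches.

    Hypothetical helper: the PR describes clearing the CUDA cache,
    but the exact call site is not shown here.
    """
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Calling a helper like this between documents or batches keeps the allocator from holding on to memory that the previous inference pass no longer needs.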
autocards.py:
- added new dependencies
- refactored code
- switched to a smaller model
- improved GPU CUDA memory usage
- implemented new PDF conversion
- implemented new HTML web page conversion
- implemented new EPUB conversion (the general extraction approach is sketched after this list)
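
A rough sketch of the kind of plain-text extraction step the new converters could rely on before question generation. pdftotext comes from requirements.txt as noted above; ebooklib, BeautifulSoup, and requests are assumptions, and the function names are illustrative only, not the PR's actual code:

```python
import ebooklib                      # assumed EPUB reader
import pdftotext                     # PDF text extraction (listed in requirements.txt)
import requests                      # assumed HTTP client for web pages
from bs4 import BeautifulSoup        # assumed HTML parser
from ebooklib import epub


def pdf_to_text(path):
    """Extract plain text from every page of a PDF."""
    with open(path, "rb") as f:
        pages = pdftotext.PDF(f)
    return "\n\n".join(pages)


def epub_to_text(path):
    """Extract plain text from the HTML documents inside an EPUB."""
    book = epub.read_epub(path)
    chunks = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        chunks.append(soup.get_text(separator="\n"))
    return "\n\n".join(chunks)


def webpage_to_text(url):
    """Download a web page and keep only its visible text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n")
```

Whatever the actual implementation, normalizing all three input formats to plain text first lets the same question-generation pipeline handle PDF, EPUB, and web pages uniformly.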
@thiswillbeyourgithub
Collaborator

Hi and thanks a LOT for the interest in this project,

I was among the ones who tried to implement the PDF and EPUB features. I was very surprised at the lack of uniformity among those files, which makes it very tricky to always parse the text correctly.

Are you confident that your solution works? Have you tested it on different files?
