
Refactored code and many improvements to speed and file reading #31

Open · wants to merge 8 commits into master

Conversation

Huggyturd

<<< I'm not a native English speaker >>>
Hello!
I discovered this project while searching for ways to machine-generate Anki decks from books.
I am a programmer with some experience, and I have made some improvements to the code and added new features.
You can now convert PDF, EPUB, and web pages cleanly, and a problem with GPU memory when using CUDA has been resolved.
I was able to convert a 900-page PDF file on a GTX 1050 Ti in approximately 1.2 hours.

- refactored code
- added CUDA cache clearing for GPU memory optimization (see the sketch after this list)
- fixed comment typos
- switched to a smaller model (improves speed and GPU memory usage)
- converted comments to docstring format
- improved PDF file conversion to question-answer pairs
- improved EPUB file conversion to question-answer pairs
- improved web page conversion to question-answer pairs
- resolved the pdftotext dependency in requirements.txt
- updated to the latest dependencies
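
A minimal sketch of what the CUDA cache clearing mentioned above might look like, assuming the models run through PyTorch; the helper name `clear_gpu_cache` is hypothetical and not taken from this PR:

```python
import gc

import torch


def clear_gpu_cache():
    """Release cached CUDA memory between large inference batches.

    Hypothetical helper: the PR describes clearing the CUDA cache,
    but the exact call site is not shown here.
    """
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

Calling a helper like this between documents or batches keeps the allocator from holding on to memory that the previous inference pass no longer needs.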
autocards.py:
- added new dependencies
- refactored code
- switched to a smaller model
- improved GPU CUDA memory usage
- implemented new PDF conversion
- implemented new HTML web page conversion
- implemented new EPUB conversion (the general extraction approach is sketched after this list)
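
A rough sketch of the kind of plain-text extraction step the new converters could rely on before question generation. pdftotext comes from requirements.txt as noted above; ebooklib, BeautifulSoup, and requests are assumptions, and the function names are illustrative only, not the PR's actual code:

```python
import ebooklib                      # assumed EPUB reader
import pdftotext                     # PDF text extraction (listed in requirements.txt)
import requests                      # assumed HTTP client for web pages
from bs4 import BeautifulSoup        # assumed HTML parser
from ebooklib import epub


def pdf_to_text(path):
    """Extract plain text from every page of a PDF."""
    with open(path, "rb") as f:
        pages = pdftotext.PDF(f)
    return "\n\n".join(pages)


def epub_to_text(path):
    """Extract plain text from the HTML documents inside an EPUB."""
    book = epub.read_epub(path)
    chunks = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        chunks.append(soup.get_text(separator="\n"))
    return "\n\n".join(chunks)


def webpage_to_text(url):
    """Download a web page and keep only its visible text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n")
```

Whatever the actual implementation, normalizing all three input formats to plain text first lets the same question-generation pipeline handle PDF, EPUB, and web pages uniformly.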
@thiswillbeyourgithub
Collaborator

Hi and thanks a LOT for the interest in this project,

I was among the ones who tried to implement the PDF and EPUB features. I was very surprised at the lack of uniformity among those files, which makes it very tricky to always parse the text correctly.

Are you confident that your solution works? Have you tested it on different files?
