Support client-side rendered content #20
Comments
Hi! Thanks for your interest in autocards. I've contributed quite a lot to autocards through PRs (see, for example, the pending PR), but sadly I'm terrible at web design, so I will very probably not do this myself. If you provide a clean way to simply get text data from a URL, I can integrate it into the codebase very quickly if you want. Have a nice day!
I looked into it, and it seems that basically every solution requires either 1) integration with a web browser or 2) a paid service (which probably uses 1 under the hood). Here's one working example:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
# need Firefox installed, and the corresponding Firefox driver
# see https://selenium-python.readthedocs.io/installation.html#drivers
opts = FirefoxOptions()
# I'm using WSL, so I need this option
opts.add_argument("--headless")
url = "https://www.khanacademy.org/humanities/world-history/medieval-times/cross-cultural-diffusion-of-knowledge/a/the-golden-age-of-islam"
driver = webdriver.Firefox(options=opts)
driver.get(url)
# name the parser explicitly so bs4 doesn't warn and pick one for you
soup = BeautifulSoup(driver.page_source, "html.parser")
# quit() ends the whole browser session; close() only closes the current window
driver.quit()
Unfortunately, it requires having Firefox installed and the corresponding web driver on your PATH. There is also requests-html which is supposed to be a drop-in replacement for
This is to say that all of these methods are brittle, and trying to support them in the library itself would be a pain. But including instructions on how to do it somewhere might be useful.
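For anyone who wants the "text data from a URL" piece as a single function, here is a rough sketch along the lines of the Selenium example above. The names `html_to_text` and `fetch_rendered_text` are made up for illustration and are not part of autocards. The text extraction uses the standard library's `html.parser` rather than BeautifulSoup just to keep the helper dependency-free, and Selenium is imported lazily so the extraction part works without it. It still assumes Firefox and its driver are installed.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping <script>, <style>, and <noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_rendered_text(url):
    """Load url in headless Firefox and return the text of the rendered page.

    Hypothetical helper: requires Selenium, Firefox, and the Firefox driver.
    """
    # Imported lazily so html_to_text() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver import FirefoxOptions

    opts = FirefoxOptions()
    opts.add_argument("--headless")
    driver = webdriver.Firefox(options=opts)
    try:
        driver.get(url)
        return html_to_text(driver.page_source)
    finally:
        # Always shut the browser down, even if loading the page fails.
        driver.quit()
```

The string returned by `fetch_rendered_text(url)` could then be pasted into autocards like any other text. One caveat: `driver.get()` returns once the page has loaded, which is not necessarily when the client-side content has finished rendering, so some sites may need an explicit wait before reading `page_source`.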
Yes, that's my conclusion as well. I think dynamic websites can be exported to PDF or just copied and pasted into autocards, so that's "fine" :/ Thanks for looking into this!
Many sites aren't rendered server-side and so are unusable with consume_web, for example all the articles on Khan Academy: https://www.khanacademy.org/humanities/world-history/medieval-times/cross-cultural-diffusion-of-knowledge/a/the-golden-age-of-islam
Integration with Selenium, Splash, etc. would be one way to fix this.