Scripts to scrape quiz content off Edx, remove answers & consolidate to a single html for print-out/practice/review purposes.
This project requires:
- Python 3, and
- a Unix based operation system (e.g. MacOS or Linux). Sorry Windows users, you deserve a better operating system.
Get the source code onto your computer:
if you have git:
git clone git@github.com:weilu/towardmit.git
Otherwise click the green button "Clone or download" on this page, click "Download ZIP". Then unzip the file you downloaded.
Then open the Terminal app (MacOS) or the equivalent of a console thing on other unix systems and executing the following commands:
cd towardmit
pip install -r requirements.txt
cp scrape_sample.sh scrape.sh
In the scrape.sh file, the [your request headers] needs to be replaced by request headers obtained from your browser. The request headers will be a series of '-H' options, which includes your edX login details. The header you need to scrape the courses are the cookies, which can be detected because they start with the work Cookie (e.g. -H 'Cookie: __cf...').
These headers can be found by doing the following.
(instructions are with the Chrome or Chromium web-browser, and tested using Linux & MacOS)
- Open your edX dashboard, logging in as necessary
- Open 'Developer Tools', which is a sub-menu item from 'More tools' in the menu
- Choose the 'Console' tab in the Developer pane
- Enter the following command 'alert( document.cookie );' (only the text inside the single quotes)
- Copy the text in the alert window
- Paste the required header into your scrape.sh file inside the single quotes
You may need to edit the scrape.py file to adjust for the version of the course you are enrolled to. This can be achieved by doing the following.
- Open the scrape.py file in a text editor.
- Find the courses variable inside the "_main_" function.
- Adjust to the the courses you want to scrape. You have to be registered for that run of the course, which is denoted by the letters following the (+) 1T2020 -> term (1T for spring, 2T for summer and 3T for fall) and the year you enrolled. You can also check this information in the URL of any of your courses.
python scrape.py
The generated quiz html files can be found in your out
directory