This program reads content from a Moodle system, sanitizes it, and saves it as text or publishes it to an LLM vector store.
It works by searching for courses and then extracting each course's content into a .csv file or (todo) a vector store or search engine.
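As a rough illustration of the sanitize-and-save step, here is a minimal sketch using only the Python standard library. It is not the code in this repository: the function names (`sanitize_html`, `write_rows`) and the CSV column names are assumptions chosen for the example.

```python
import csv
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with a space, then collapse repeated whitespace.
        # Note: this also puts a space between adjacent inline elements.
        return " ".join(" ".join(self._chunks).split())


def sanitize_html(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def write_rows(rows, handle):
    """Write rows to an open text file, sanitizing the 'text' column.

    The column names are assumptions for this sketch.
    """
    writer = csv.DictWriter(handle, fieldnames=["course_id", "module_name", "text"])
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "text": sanitize_html(row["text"])})
```

To write to disk, open the target file with `open(path, "w", newline="", encoding="utf-8")` and pass the handle to `write_rows`.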
To install the Moodle Content API, follow these steps:
- Clone the repository:
git clone https://github.com/brianlmerritt/moodle-content-api.git
- Install the required dependencies:
pip install -r requirements.txt
You also need to set up Web Services (REST) in Moodle and generate a user token.
If that user does not have permission to view all courses and categories, request courses one at a time or search for courses by pattern instead of fetching all courses.
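The difference between fetching all courses and searching by pattern can be sketched against Moodle's REST endpoint as follows. This is an illustration, not this repository's code: the helper names are hypothetical and the site URL and token are placeholders. The web service functions `core_course_get_courses` and `core_course_search_courses` are standard Moodle functions, but they must be enabled for the service your token belongs to.

```python
import json
import urllib.parse
import urllib.request


def build_ws_request(base_url, token, function, **params):
    """Return (endpoint, form_fields) for a Moodle REST web service call."""
    endpoint = base_url.rstrip("/") + "/webservice/rest/server.php"
    fields = {
        "wstoken": token,
        "wsfunction": function,
        "moodlewsrestformat": "json",
        **params,
    }
    return endpoint, fields


def call_ws(base_url, token, function, **params):
    """POST the request and decode the JSON response (needs network access)."""
    endpoint, fields = build_ws_request(base_url, token, function, **params)
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with urllib.request.urlopen(endpoint, data=data, timeout=30) as response:
        return json.load(response)


def all_courses_request(base_url, token):
    # Requires a user who can view all courses and categories.
    return build_ws_request(base_url, token, "core_course_get_courses")


def search_courses_request(base_url, token, pattern):
    # Works for more restricted users: search course names by pattern.
    return build_ws_request(
        base_url,
        token,
        "core_course_search_courses",
        criterianame="search",
        criteriavalue=pattern,
    )
```

For example, `call_ws("https://moodle.example.org", token, "core_course_search_courses", criterianame="search", criteriavalue="anatomy")` would return the matching courses as decoded JSON, assuming the token is valid for that site.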
To use the Moodle Content API, you need to provide the necessary configuration settings. Update the .env
file with your Moodle web token and other required parameters, using .env_example as a guide.
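For readers unfamiliar with .env files, the sketch below shows one way such a file could be parsed with the standard library alone. The program itself may load its settings differently (for example through a dotenv package), and the variable names `MOODLE_URL` and `MOODLE_TOKEN` are assumptions; check .env_example for the real ones.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def load_env(path=".env"):
    """Read and parse a .env file from disk."""
    with open(path, encoding="utf-8") as handle:
        return parse_env(handle.read())
```

Keep the real .env file out of version control, since it contains your web service token.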
Once the configuration is set up, you can run the program using the following command:
python3 get_moodle_courses_data.py
Files are stored in course_data
- Pages
- Books
- Files (WIP)
- Folders (WIP)
- Labels
- Blocks
- Finish extracting data from Moodle for forums, lessons, assignments, and quizzes (to start)
- Extract the study map function if applicable (at RVC this is the strand map)
- Build text import routines to save sanitised data (plain text & .md format?) with metadata from course, section, module, and study map if applicable
- Set up a way for others to contribute
- Add LTI & other content via Selenium?
- Add lecture capture
Coming soon