MoocRadar is maintained by the Knowledge Engineering Group of Tsinghua University with the assistance of the Institute of Education, Tsinghua University. This repository consists of 2,513 exercises, 14,226 students, over 12 million behavioral records, and 5,600 fine-grained concepts, supporting the development of cognitive student modeling in MOOCs. The raw data is from XuetangX (https://www.xuetangx.com/).
The main features of MoocRadar are:
- Abundant Learning Context: MoocRadar provides the learning resources, course structures, and contents related to students' exercise behaviors, giving modeling methods a richer pool of candidate features.
- Fine-grained Knowledge Concepts: All fine-grained concepts have been manually annotated and checked by experts, which guarantees the quality of this specific knowledge.
- Cognitive Level Labels: We adopt Bloom's Cognitive Taxonomy to construct "Cognitive Level" tags for the exercises, which can be further explored in subsequent research.
We are still extending and annotating this repository.
With MoocRadar, developers can build a more informative profile for each student, as introduced in our paper.
- The number of exercises has been extended to 9,384!
- Our paper has been submitted to the SIGIR resource track!
- The annotation guidance for fine-grained concepts and cognitive labels has been updated.
The data is available at multiple levels, including:
| Dataset | Description | Download Link |
| --- | --- | --- |
| MoocRadar_Raw | All data of MoocRadar. | Raw link |
Researchers can implement the presented models with EduKTM and EduCDM.
We provide demos of several basic models, including:
We also report the performance improvements of DKVMN and NCDM when they are given side information (i.e., cognitive labels and videos).
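As a rough illustration of what side information can look like at the input level, the sketch below appends a cognitive-level one-hot vector and a video feature to an exercise's concept multi-hot vector before it is passed to a model such as DKVMN or NCDM. This is not the implementation behind the reported results: the six Bloom levels, the `video_watch_ratio` feature, the concept IDs, and all function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# Assumption: the six levels of the revised Bloom's taxonomy, lowest to highest.
# The exact label set used in MoocRadar may differ.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def multi_hot(indices: Sequence[int], size: int) -> List[float]:
    """Return a length-`size` vector with 1.0 at every position in `indices`."""
    vec = [0.0] * size
    for i in indices:
        vec[i] = 1.0
    return vec


def exercise_features(
    concept_ids: Sequence[str],
    concept_index: Dict[str, int],
    cognitive_level: str,
    video_watch_ratio: float,
) -> List[float]:
    """Concatenate concept, cognitive-level, and video features for one exercise.

    All inputs are hypothetical; map them from the actual MoocRadar fields.
    """
    concepts = multi_hot([concept_index[c] for c in concept_ids], len(concept_index))
    level = multi_hot([BLOOM_LEVELS.index(cognitive_level)], len(BLOOM_LEVELS))
    # Clamp the (hypothetical) fraction of related lecture video watched to [0, 1].
    video = [min(max(video_watch_ratio, 0.0), 1.0)]
    return concepts + level + video


if __name__ == "__main__":
    concept_index = {"K_limit": 0, "K_derivative": 1, "K_integral": 2}
    print(exercise_features(["K_derivative"], concept_index, "apply", 0.5))
    # [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.5]
```

Simple concatenation keeps the base model unchanged apart from a wider input layer; learned embeddings for the cognitive level are an equally plausible alternative.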
- Set `--mode` (options: Coarse/Middle/Fine) to select the data granularity.
- Set `--data_dir` to the data of the corresponding granularity from the table above. For example, for the `--mode Middle` setting, prepare the following files:
  - `./data/student-problem-middle.json`
  - `./data/problem.json`
- Then generate the train/test datasets in one of two ways:
  - Option 1: generate them by setting `--data_process` in the scripts.
  - Option 2: download them from there.
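The flags above can be sketched with Python's `argparse` as follows. This is a hypothetical stand-in for the repository scripts, not their actual code: the `student-problem-{mode}.json` naming for Coarse and Fine generalizes the documented Middle example, and the one-JSON-object-per-line file format, the 80/20 record-level split, and the output file names are all assumptions.

```python
import argparse
import json
import random
from pathlib import Path
from typing import List, Tuple

MODES = ("Coarse", "Middle", "Fine")


def data_files(data_dir: str, mode: str) -> Tuple[Path, Path]:
    """Return (student-problem file, problem file) for a granularity mode.

    Assumption: Coarse/Fine follow the documented Middle naming pattern.
    """
    if mode not in MODES:
        raise ValueError(f"--mode must be one of {MODES}, got {mode!r}")
    root = Path(data_dir)
    return root / f"student-problem-{mode.lower()}.json", root / "problem.json"


def read_json_lines(path: Path) -> List[dict]:
    """Parse a file with one JSON object per line (assumed format)."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def split_records(
    records: List[dict], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[dict], List[dict]]:
    """Shuffle records reproducibly and split them into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def main() -> None:
    parser = argparse.ArgumentParser(description="Hypothetical MoocRadar data preparation sketch")
    parser.add_argument("--mode", choices=MODES, default="Middle")
    parser.add_argument("--data_dir", default="./data")
    parser.add_argument("--data_process", action="store_true", help="generate train/test splits")
    args = parser.parse_args()

    behavior_file, problem_file = data_files(args.data_dir, args.mode)
    print(f"behavior data: {behavior_file}")
    print(f"problem data:  {problem_file}")
    if args.data_process:
        train, test = split_records(read_json_lines(behavior_file))
        for name, part in (("train", train), ("test", test)):
            # Hypothetical output names, e.g. ./data/train-middle.json
            out = Path(args.data_dir) / f"{name}-{args.mode.lower()}.json"
            out.write_text(json.dumps(part, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    main()
```

For example, `python prepare.py --mode Middle --data_dir ./data --data_process` (the script name is hypothetical). The fixed seed keeps the split reproducible across runs.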
We also provide several tools and guides for extending and using the data.
To extend the data from the MOOCCubeX knowledge base:
- Step 1: Download raw data from https://github.com/THU-KEG/MOOCCubeX.
- Step 2: Build the concepts with MOOCCube Concept Helper.
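Once the raw MOOCCubeX files are downloaded, one way to start working with them is to index which concepts each exercise is linked to. The sketch below assumes a hypothetical two-column, tab-separated relation file (`concept_id<TAB>problem_id`); the actual MOOCCubeX file names and layout may differ, and this is not the MOOCCube Concept Helper itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_concept_problem_links(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Build a problem -> concepts index from "concept_id<TAB>problem_id" lines.

    The two-column, tab-separated layout is an assumption; check the actual
    MOOCCubeX relation files before relying on it.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        concept_id, problem_id = line.split("\t")
        # Keep first-seen order and drop duplicate links.
        if concept_id not in index[problem_id]:
            index[problem_id].append(concept_id)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical IDs; with a real file, pass open(path, encoding="utf-8").
    sample = ["K_a\tPm_1\n", "K_b\tPm_1\n", "K_a\tPm_2\n"]
    print(load_concept_problem_links(sample))
```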
For further data annotation:
- The annotation guidance for fine-grained concepts and cognitive labels (currently Chinese only): https://cloud.tsinghua.edu.cn/f/cdfdeca0893e4ed0ac9c/
For more information:
- About learning sequence representation:
- About Bloom's Cognitive Taxonomy:
- About fine-grained concept extraction:
  - Ver 0.5: https://github.com/yujifan0326/Concept-Acquisition-Pipeline
  - Ver 1.0: In development.
Figure: The distribution of students' exercise behaviors, accuracy rates, and concept-linked exercises.
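Per-student and per-exercise accuracy rates like those summarized in this figure can be computed with a few lines of Python. The sketch assumes a hypothetical flattened record format `(student_id, exercise_id, is_correct)`; the real behavior files may be structured differently and would then need to be flattened first.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical flattened behavior record: (student_id, exercise_id, is_correct).
Record = Tuple[str, str, bool]
FIELDS = {"student": 0, "exercise": 1}


def accuracy_by(records: Iterable[Record], by: str = "student") -> Dict[str, float]:
    """Return the fraction of correct answers per student or per exercise."""
    idx = FIELDS[by]
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rec in records:
        group = rec[idx]
        total[group] += 1
        correct[group] += int(rec[2])
    return {g: correct[g] / total[g] for g in total}


if __name__ == "__main__":
    recs = [("s1", "e1", True), ("s1", "e2", False), ("s2", "e1", True), ("s2", "e2", True)]
    print(accuracy_by(recs, "student"))   # {'s1': 0.5, 's2': 1.0}
    print(accuracy_by(recs, "exercise"))  # {'e1': 1.0, 'e2': 0.5}
```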
@article{MOOCRadar,
title={MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs},
author={Jifan Yu and Mengying Lu and Qingyang Zhong and Zijun Yao and Shangqing Tu and Zhengshan Liao and Xiaoya Li and Manli Li and Lei Hou and Haitao Zheng and Juanzi Li and Jie Tang},
year={2023}
}