MoocRadar

MoocRadar is maintained by the Knowledge Engineering Group of Tsinghua University with the assistance of the Institute of Education, Tsinghua University. This repository consists of 2,513 exercises, 14,226 students, over 12 million behavioral records, and 5,600 fine-grained concepts, and supports the development of cognitive student modeling in MOOCs. The raw data is from XuetangX (https://www.xuetangx.com/).

We summarize the features of MoocRadar as:

Abundant Learning Context: MoocRadar provides the learning resources, structures, and contents related to the students' exercise behaviors, giving modeling methods a richer set of inputs to choose from.
Fine-grained Knowledge Concepts: All the fine-grained concepts have been manually annotated and checked by experts, which ensures the quality of this specific knowledge.
Cognitive Level Labels: We use the Bloom Cognitive Taxonomy to construct "Cognitive Level" tags for the exercises, which can be further explored in subsequent research.
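
As an illustration of how cognitive level tags might be used, here is a minimal sketch. It assumes the six levels of the revised Bloom taxonomy and a hypothetical `cognitive_level` key on each exercise; the labels in the released data may be named or encoded differently, so check the annotation guidance first. `BloomLevel` and `exercises_at_or_above` are names invented for this example.

```python
from enum import IntEnum


class BloomLevel(IntEnum):
    """Six levels of the revised Bloom taxonomy, ordered from low to high."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


def exercises_at_or_above(exercises, level):
    """Keep exercises whose (hypothetical) cognitive_level is >= level."""
    return [e for e in exercises if e["cognitive_level"] >= level]


# Toy exercises; the ids and the "cognitive_level" key are made up.
toy_exercises = [
    {"problem_id": "p1", "cognitive_level": BloomLevel.REMEMBER},
    {"problem_id": "p2", "cognitive_level": BloomLevel.APPLY},
    {"problem_id": "p3", "cognitive_level": BloomLevel.EVALUATE},
]
higher_order = exercises_at_or_above(toy_exercises, BloomLevel.APPLY)
print([e["problem_id"] for e in higher_order])
```

Using an ordered enum keeps comparisons such as "at or above Apply" explicit instead of relying on raw integers.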

We are still extending and annotating this repository.

Based on MoocRadar, developers can attempt to build a more informative profile for each student, as introduced in our paper.

News !!

The number of exercises has been extended to 9,384 !!
Our paper has been submitted to the SIGIR resource track !!
Updated the annotation guidance for fine-grained concepts and cognitive labels.

Data Access

Data is available at multiple levels, including:

Dataset	Description	Download Link
MoocRadar_Raw	All data of MOOC-radar.	Raw link

Reproduction Model

Researchers can set up the presented models with EduKTM and EduCDM.

We provide demos of several basic models, including:

Knowledge Tracing:
- DKT
- DKT+
- DKVMN
Cognitive Diagnosis:
- GDIRT
- MIRT
- NCDM

We also report how the performance of DKVMN and NCDM improves with side information (i.e. cognitive and video).

+Cognitive:
- DKVMN
- NCDM
+Video:
- DKVMN
- NCDM

Data for baselines reproduction:

--mode (options: Coarse/Middle/Fine) selects the granularity for your setting
--data_dir points to the data of the corresponding granularity from the table above

For example, for the --mode Middle setting, prepare the following files:
- ./data/student-problem-middle.json
- ./data/problem.json
Then generate the train/test datasets by setting --data_process in the scripts.
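
The train/test generation step can be pictured with the sketch below. It is only an illustration, not the repository's script: it assumes each interaction record is a dict with hypothetical keys `user_id`, `problem_id`, and `is_correct`, and the function name `split_by_student` and the 80/20 ratio are choices made for this example. Check the actual schema of `student-problem-middle.json` before reusing it.

```python
import random
from collections import defaultdict


def split_by_student(records, test_ratio=0.2, seed=0):
    """Split interaction records into train/test sets, student by student.

    Assumption: each record is a dict with a "user_id" key; the real
    MoocRadar schema may differ.
    """
    by_student = defaultdict(list)
    for rec in records:
        by_student[rec["user_id"]].append(rec)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for user_id in sorted(by_student):
        logs = by_student[user_id][:]  # copy so the input is not mutated
        rng.shuffle(logs)
        n_test = int(len(logs) * test_ratio)
        test.extend(logs[:n_test])
        train.extend(logs[n_test:])
    return train, test


# Toy records standing in for the contents of the JSON file.
toy = [
    {"user_id": u, "problem_id": p, "is_correct": (u + p) % 2}
    for u in range(3)
    for p in range(10)
]
train, test = split_by_student(toy)
print(len(train), len(test))
```

Splitting within each student keeps every student present in both sets, which suits cognitive diagnosis models. For knowledge tracing, where the order of a student's answers matters, one would more likely keep each sequence intact and split by student instead.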

Data for improvement reproduction with cognitive and video side information:

Option 1: generate the data by setting --data_process in the scripts.

Option 2: download the data from here.

Toolkit & Guidance

Several tools and guides are also available for extending and using the data.

To extend the data from the MOOCCubeX knowledge base:

Step 1: Download the raw data from https://github.com/THU-KEG/MOOCCubeX.
Step 2: Build the concepts with the MOOCCube Concept Helper.

For further data annotation:

The annotation guidance for fine-grained concepts and cognitive labels
(Currently Chinese Only) https://cloud.tsinghua.edu.cn/f/cdfdeca0893e4ed0ac9c/

For more information:

About learning sequence representation:
- https://arxiv.org/pdf/2208.04708
About Bloom Cognitive Taxonomy:
- http://www.edpsycinteractive.org/topics/cognition/bloom.pdf
About the Fine-grained Concept Extraction:
- Ver 0.5: https://github.com/yujifan0326/Concept-Acquisition-Pipeline
- Ver 1.0: Under development.

Feature

The distribution of students' exercise behaviors, accuracy rates, and concept-linked exercises.
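
For readers who want to recompute such statistics, here is a minimal sketch of a per-student accuracy rate (correct answers divided by total answers). The record keys `user_id` and `is_correct` are assumptions made for this example and may not match the released files.

```python
from collections import defaultdict


def accuracy_by_student(records):
    """Return {user_id: fraction of that student's answers that are correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["user_id"]] += 1
        correct[rec["user_id"]] += int(bool(rec["is_correct"]))
    return {u: correct[u] / total[u] for u in total}


# Toy records; real data would be loaded from the downloaded JSON files.
toy_logs = [
    {"user_id": "s1", "is_correct": 1},
    {"user_id": "s1", "is_correct": 0},
    {"user_id": "s2", "is_correct": 1},
]
rates = accuracy_by_student(toy_logs)
print(rates)
```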

Reference

@article{MOOCRadar,
  title={MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs},
  author={Jifan Yu and Mengying Lu and Qingyang Zhong and Zijun Yao and Shangqing Tu and Zhengshan Liao and Xiaoya Li and Manli Li and Lei Hou and Haitao Zheng and Juanzi Li and Jie Tang},
  year={2023}
}
