Canvas Data Integration

This application uses Instructure's Data Access Platform (DAP) to pull Canvas table data into data files that we can then insert into our database to perform operations and reporting on. Since DAP does not yet natively support Oracle for replicating and synchronizing data to a database (the preferred solution), we developed this application to support data replication to Oracle with a desired subset of data for further operations and reporting. The end result is a setup of Oracle tables and views that provide data to stakeholders in further derived reports.

The application runs in three distinct steps:

Retreive desired Canvas table data from DAP in JSONL format
Cleanup data from JSONL data files using pandas and export to final CSV files
Insert data from final CSV data files into Oracle tables

For our example setup below, we are looking to implement an early-alert system for students struggling in Canvas courses, so that we can forward them to Advising or other resources for assistance. We have retreived data from Canvas that is used to create tables and a view that we can use to gauge student's performance in their current Canvas course enrollments through their overall course score. Supporting information like their total time spent in the enrollment and their last activity date in the enrollment can help identify struggling students that may require assistance from Advising, etc.

Requirements

This application assumes you already have an Oracle Database and Canvas LMS in place.

Setup

Application setup

To begin, clone or download the canvas-data-integration application. For the application setup, proceed as follows:

(Optional) Set up a Python virtual environment to not pollute the base environment on the machine hosting the application.
1. Navigate to the application directory
2. Setup a virtual environment: python -m venv .venv
3. Activate the virtual environment (to install required packages in it): .\.venv\Scripts\activate
Next, navigate to the application directory (if not already there) and install the required Python packages for the application: python -m pip install -r requirements.txt
Next, either create system environment variables or create an environment variable file called .env alongside the config.yml file in the application's base directory, named like so: canva-data-integration/.env. We use environment variables for sensitive but necessary DAP and database authentication fields, those being:
```
DAP_API_URL=my-api-gateway-url
DAP_CLIENT_ID='my-client-id' # in single quotes to escape hashtags
DAP_CLIENT_SECRET=my-client-secret
DB_HOST=my-host
DB_PORT=my-port-number
DB_SERVICE=my-service-name
DB_USERNAME=my-oracle-username
DB_PASSWORD=my-oracle-password
```
Replace the values with your own connection and authentiation information for DAP and Oracle. DAP API tokens can be obtained at identity.instructure.com, but are temporary and will need to be refreshed occasionally.
Change the directory in run.ps1 to your project directory

Oracle setup

It is assumed that you are already aware of the Canvas data structure you are looking to retrieve. Use DAP Datasets to identify Canvas tables and fields you wish to retrieve. For the database table, view, and merge query setup, proceed as follows:

Create your Canvas table equivalents in Oracle with your desired columns. Oracle scripts can be derived and condensed from DAP API scripts for PostgreSQL, MySQL, Microsoft SQL Server. The sample statements below are the tables we need for our early-alert system:

canvas_course_sections - DAP dataset course_sections

CREATE TABLE canvas_course_sections (
    course_sections_id NUMBER(19) NOT NULL,
    course_sections_name VARCHAR2(255) NOT NULL,
    course_sections_course_id NUMBER(19) NOT NULL,
    course_sections_workflow_state VARCHAR2(255) NOT NULL,
    course_sections_ts TIMESTAMP ZONE NOT NULL,
    CONSTRAINT pk_course_sections PRIMARY KEY (course_sections_id)
    );

canvas_courses - DAP dataset courses

CREATE TABLE canvas_courses (
    courses_id NUMBER(19) NOT NULL,
    courses_sis_source_id VARCHAR2(255),
    courses_name VARCHAR2(255),
    courses_enrollment_term_id NUMBER(19) NOT NULL,
    courses_workflow_state VARCHAR2(255) NOT NULL,
    courses_is_public VARCHAR2(5),
    courses_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_courses PRIMARY KEY (courses_id)
    );

canvas_enrollment_terms - DAP dataset enrollment_terms

CREATE TABLE canvas_enrollment_terms (
    enrollment_terms_id NUMBER(19) NOT NULL,
    enrollment_terms_sis_source_id VARCHAR2(255),
    enrollment_terms_workflow_state VARCHAR2(255) NOT NULL,
    enrollment_terms_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_enrollment_terms PRIMARY KEY (enrollment_terms_id)
    );

canvas_enrollments - DAP dataset enrollments

CREATE TABLE canvas_enrollments (
    enrollments_id NUMBER(19) NOT NULL,
    enrollments_last_activity_at TIMESTAMP,
    enrollments_total_activity_time NUMBER(10),
    enrollments_course_section_id NUMBER(19) NOT NULL,
    enrollments_course_id NUMBER(19) NOT NULL,
    enrollments_role_id NUMBER(19) NOT NULL,
    enrollments_user_id NUMBER(19) NOT NULL,
    enrollments_sis_pseudonym_id NUMBER(19),
    enrollments_workflow_state VARCHAR2(255) NOT NULL,
    enrollments_type VARCHAR2(255) NOT NULL,
    enrollments_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_enrollments PRIMARY KEY (enrollments_id)
    );

canvas_pseudonyms - DAP dataset pseudonyms

CREATE TABLE canvas_pseudonyms (
    pseudonyms_id NUMBER(19) NOT NULL,
    pseudonyms_user_id NUMBER(19) NOT NULL,
    pseudonyms_workflow_state VARCHAR2(255) NOT NULL,
    pseudonyms_unique_id VARCHAR2(255) NOT NULL,
    pseudonyms_sis_user_id VARCHAR2(255),
    pseudonyms_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_pseudonyms PRIMARY KEY (pseudonyms_id)
    );

canvas_scores - DAP dataset scores

CREATE TABLE canvas_scores (
    scores_id NUMBER(19) NOT NULL,
    scores_current_score BINARY_DOUBLE,
    scores_enrollment_id NUMBER(19) NOT NULL,
    scores_workflow_state VARCHAR2(255) NOT NULL,
    scores_course_score VARCHAR2(5) NOT NULL,
    scores_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_scores PRIMARY KEY (scores_id)
    );

canvas_users - DAP dataset users

CREATE TABLE canvas_users (
    users_id NUMBER(19) NOT NULL,
    users_workflow_state VARCHAR2(255) NOT NULL,
    users_name VARCHAR2(255),
    users_ts TIMESTAMP NOT NULL,
    CONSTRAINT pk_users PRIMARY KEY (users_id)
    );

(Optional) In our case, we also want a final view for our joined Canvas table information to list out distinct Canvas student enrollments with all our desired data points:

canvas_data

create or replace view canvas_data as
select distinct
        substr(et.enrollment_terms_sis_source_id, 0, 6) TermCode,
        substr(et.enrollment_terms_sis_source_id, 8) PartOfTerm, 
        substr(c.courses_sis_source_id, 8) CRN,
        substr(substr(cs.course_sections_name, 1, instr(cs.course_sections_name, '_', -1, 1) -1), instr(substr(cs.course_sections_name, 1, instr(cs.course_sections_name, '_', -1, 1) -1), '_', -1, 1) +1) SubjectCodeCourseNumber,
        substr(cs.course_sections_name, instr(cs.course_sections_name, '_', -1, 1) +1) CourseName,
        to_char(e.enrollments_last_activity_at,'MM/DD/YYYY') LastActivityDate,
        round(e.enrollments_total_activity_time/60/60, 2) TotalActivityTimeHrs,
        u.users_name FullName,
        p.pseudonyms_sis_user_id XID,
        p.pseudonyms_unique_id Email,
        to_char(s.scores_current_score, 'FM99999990.00') CurrentScore,
        e.enrollments_workflow_state EnrollmentState,
        LISTAGG(tp.pseudonyms_sis_user_id, ', ') WITHIN GROUP (ORDER BY tp.pseudonyms_sis_user_id) AS TeacherXID,
        LISTAGG(tu.users_name, ', ') WITHIN GROUP (ORDER BY tu.users_name) AS TeacherFullName 
from    canvas_enrollment_terms et
join    canvas_courses c on c.courses_enrollment_term_id = et.enrollment_terms_id
join    canvas_course_sections cs on cs.course_sections_course_id = c.courses_id
join    canvas_enrollments e on e.enrollments_course_section_id = cs.course_sections_id
    and e.enrollments_course_id = c.courses_id
    and e.enrollments_workflow_state = 'active' -- active courses only
join    canvas_users u on u.users_id = e.enrollments_user_id
join    canvas_pseudonyms p on p.pseudonyms_user_id = u.users_id
    and p.pseudonyms_id = e.enrollments_sis_pseudonym_id
join    canvas_scores s on s.scores_enrollment_id = e.enrollments_id
    and s.scores_course_score = 'True'
left join canvas_enrollments te on te.enrollments_course_section_id = cs.course_sections_id
    and te.enrollments_course_id = c.courses_id
    and te.enrollments_type = 'TeacherEnrollment'
left join canvas_users tu on tu.users_id = te.enrollments_user_id
left join canvas_pseudonyms tp on tp.pseudonyms_user_id = tu.users_id
    and tp.pseudonyms_id = te.enrollments_sis_pseudonym_id
where   et.enrollment_terms_sis_source_id like '20%'
group by substr(et.enrollment_terms_sis_source_id, 0, 6),
        substr(et.enrollment_terms_sis_source_id, 8),
        substr(c.courses_sis_source_id, 8),
        substr(substr(cs.course_sections_name, 1, instr(cs.course_sections_name, '_', -1, 1) -1), instr(substr(cs.course_sections_name, 1, instr(cs.course_sections_name, '_', -1, 1) -1), '_', -1, 1) +1),
        substr(cs.course_sections_name, instr(cs.course_sections_name, '_', -1, 1) +1),
        to_char(e.enrollments_last_activity_at,'MM/DD/YYYY'),
        round(e.enrollments_total_activity_time/60/60, 2),
        u.users_name,
        p.pseudonyms_sis_user_id,
        p.pseudonyms_unique_id,
        to_char(s.scores_current_score, 'FM99999990.00'),
        e.enrollments_workflow_state;

In the application directory, modify config.yaml to contain a canvas_table configuration entry with the details of each table as demonstrated in the sample config.yml.
- For each table, fields accepts a list of desired columns from the Canvas table as defined in DAP datasets. See config.yml for examples.
- The db_query field should define your merge query that will update your Oracle table with the newest Canvas table information from each application run. See config.yml for examples.
- The query_type field ('incremental' or 'snapshot') defines which time-period DAP should retreive data for, for the specified Canvas table, as defined here. When intializing your Oracle database tables, it is recommended to first run each table in 'snapshot' mode to get the totality of records from the Canvas table from DAP. Warning: Certain Canvas tables can return large numbers of records when using 'snapshot' mode. You can test with 'incremental' mode first to see how many records are returned for a more specific period of time.
  - Afterwards, you can retreive the records changed in the past X days with the 'incremental' mode in combination with the past_days configuration entry.
(Optional) Timestamps retrieved from Canvas are formatted according to ISO-8601 standards and are in UTC time zone. These timestamps are used solely for comparison purposes in Oracle MERGE queries that insert or update data in our Oracle tables. Therefore, you can safely insert them directly into the corresponding TIMESTAMP fields in the tables. Should you wish to convert to your local time zone for further operations, you can adjust the setup as follows:
1. Modify each table's timestamp field to use the TIMESTAMP WITH TIME ZONE data type instead of the TIMESTAMP data type.
2. Adjust your MERGE queries to convert your timestamp fields to a different time zone. For example:
```
TO_TIMESTAMP(:4, 'YYYY-MM-DD"T"HH24:MI:SS.FF3"Z"')

to

FROM_TZ(TO_TIMESTAMP(:4, 'YYYY-MM-DD"T"HH24:MI:SS.FF3"Z"'),'UTC') AT TIME ZONE 'America/New_York'
```

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
canvas_data_integration		canvas_data_integration
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.yml		config.yml
requirements.txt		requirements.txt
run.ps1		run.ps1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Canvas Data Integration

Requirements

Setup

Application setup

Oracle setup

Resources

About

Releases 1

Languages

License

mzstanisic/canvas-data-integration

Folders and files

Latest commit

History

Repository files navigation

Canvas Data Integration

Requirements

Setup

Application setup

Oracle setup

Resources

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Languages