Data Flow Architecture
Data Flow is a multi-tier application that automates the conversion of spreadsheet data (CSV/Excel files) into the Ed-Fi ODS API format. Data Flow is a .NET C# application designed to run on Microsoft Azure, Windows virtual machines, and/or physical machines running Windows Server.
- Ed-Fi Operational Data Store API - Data Flow requires the Ed-Fi ODS API as a platform for secure data storage and access. The Ed-Fi Alliance provides a data standard and a full technology stack that entities serving K-12 education use to manage multiple sources of information. The Ed-Fi ODS API stack requires Windows Server with IIS and Microsoft SQL Server, or can be deployed to Microsoft Azure.
- Data Flow has its own MS SQL database for local configuration and job status information. Data Flow will also need access to an Azure storage account or a local file directory for storage of source files.
Data Flow's janitors are C# command-line applications that process data in the various stages of the workflow. Each janitor can be deployed under Windows Task Scheduler or as an Azure App Service WebJob.
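As a rough sketch of this deployment model, each janitor can be a plain console application that performs one pass of its stage and exits, which suits both Task Scheduler and triggered WebJobs. The skeleton below is illustrative, not Data Flow's actual code.

```csharp
using System;

internal static class Program
{
    private static int Main()
    {
        try
        {
            Console.WriteLine($"Janitor pass started at {DateTime.UtcNow:o}");
            // ... connect to the Data Flow database, process this stage's work, log results ...
            Console.WriteLine("Janitor pass finished.");
            return 0; // a non-zero exit code lets the scheduler flag failures
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);
            return 1;
        }
    }
}
```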
- File Transport janitor - connects to a pre-determined SFTP or FTPS location and pulls CSV/Excel files matching the provided file mask (see the SFTP sketch after this list). The Data Flow stack also contains a prototype Google Chrome extension that automates GET and POST operations from web sites as an additional method of obtaining source data.
- Transform / Load janitor - processes files pulled by the File Transport component and loads them into the Ed-Fi API, using the data maps defined in the Data Flow database to convert data into the Ed-Fi format (see the data-map sketch after this list).
- Reporting / Cleanup janitor - sends a daily report on operations within Data Flow and cleans up older data files based on a configurable number of days to keep source data.
Data Flow contains a C# ASP.NET administration panel to manage the various functions of the application. The Admin Panel is deployed either under Microsoft IIS or as an Azure App Service. With the Admin Panel, one can:
- Configure data agents to obtain data at pre-determined times.
- Map source data to the Ed-Fi ODS API using JSON-based data maps.
- View and control job status information for configured data agents.
- View roster and assessment data stored within the Ed-Fi ODS API.
Data Flow's key concepts and workflow are:
- An agent obtains and stores a file to process; the file is logged for processing in the Data Flow database. An agent could be an SFTP/FTPS site (via the File Transport janitor), a manual upload via the Admin Panel, or a file obtained via the Chrome extension.
- The Transform / Load janitor checks the file log in the Data Flow database for new files to process. If new files are found, the janitor converts each file to the Ed-Fi format and sends it to the ODS API (see the submission sketch after this list).
- Each agent has one or more "data maps" associated with it to drive the ETL process. A number of Ed-Fi entities need to be populated before results data is stored: typically, Education Organizations/Schools and Assessments are pre-configured before the ETL job is run. As the ETL process runs, it typically inserts Student, Student School Enrollment, and Student Assessment Result objects into the Ed-Fi API.
- As each operation and ETL step is transacted, Data Flow logs to the LogIngestions and NLog (application) tables to give the administrator details of each step. Once the operation has completed, successfully or not, its status is updated in the Files table.
- The Reporting / Cleanup janitor runs nightly to send a summary status report of file operations and to delete source files past the configured expiration date (see the cleanup sketch below).
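As an illustration of the transform-and-send step, the sketch below authenticates against the ODS API and POSTs a student record. The /oauth/token and /data/v3/ed-fi/students paths assume an Ed-Fi ODS API v3-style deployment with client-credentials OAuth; older ODS versions use different paths and a different token handshake, and Data Flow's actual loader is more elaborate.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public static class EdFiSubmitSketch
{
    public static async Task SubmitAsync(string baseUrl, string key, string secret, JObject student)
    {
        using (var http = new HttpClient())
        {
            // 1. Get a bearer token with the agent's key/secret.
            var tokenResponse = await http.PostAsync($"{baseUrl}/oauth/token",
                new FormUrlEncodedContent(new Dictionary<string, string>
                {
                    ["grant_type"] = "client_credentials",
                    ["client_id"] = key,
                    ["client_secret"] = secret,
                }));
            tokenResponse.EnsureSuccessStatusCode();
            var token = (string)JObject.Parse(
                await tokenResponse.Content.ReadAsStringAsync())["access_token"];

            http.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", token);

            // 2. POST in dependency order: schools and assessments are assumed to
            //    exist already; students come before enrollments and results.
            var response = await http.PostAsync($"{baseUrl}/data/v3/ed-fi/students",
                new StringContent(student.ToString(),
                    System.Text.Encoding.UTF8, "application/json"));
            response.EnsureSuccessStatusCode();
            // On success or failure, Data Flow would update the Files table and
            // write details to the LogIngestions / NLog tables.
        }
    }
}
```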
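And a minimal sketch of the nightly cleanup pass, assuming source files sit in a local directory and a hypothetical daysToKeep setting; an Azure storage account would be cleaned with the storage SDK instead.

```csharp
using System;
using System.IO;

public static class CleanupSketch
{
    public static void DeleteExpiredFiles(string sourceDirectory, int daysToKeep)
    {
        var cutoff = DateTime.UtcNow.AddDays(-daysToKeep);
        foreach (var path in Directory.EnumerateFiles(sourceDirectory))
        {
            if (File.GetLastWriteTimeUtc(path) < cutoff)
            {
                File.Delete(path);
                Console.WriteLine($"Deleted expired source file: {path}");
            }
        }
    }
}
```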