-
Notifications
You must be signed in to change notification settings - Fork 2
How to Create and Update the SIAL
Overviews and instructions on how to install and run the SIAL are found in the SIAL README.
This page contains the notes for the How to Workshops that SIA runs.
Social Investment Analytical Layer - standardised tables and views makes it easier to do cross sector analysis. Where cost information is missing we have asked the agencies to provide cost estimates (or to check our estimates) so that we can account for more of Government spend in the IDI.
[Put on whiteboard] The standardised table structure
- snz_uid (int)
- department (varchar3)
- datamart (varchar3)
- subject_area (varchar3)
- start_date (datetime)
- end_date (datetime)
- cost (float)
- event_type (varchar)
- event_type_2 (varchar) and so on.
An example of a record from a SIAL table is shown below
snz_uid | department | datamart | subject_area | start_date | end_date | cost | event_type |
---|---|---|---|---|---|---|---|
1234 | MSD | BEN | T1 | 01/01/2013 | 31/12/2013 | 987.65 | JSS |
[Put on whiteboard] We have SIAL tables/views for many of the schemas
- moh_clean
- moe_clean
- msd_clean
- acc_clean
- cyf_clean
- cor_clean
- moj_clean
- pol_clean
- ird_clean
- hnz_clean
SIAL_bqa_complete/incomplete – We ask all agencies to provide feedback on the SIAL tables. These folders reflect whether or not the agency has provided feedback on the tables. MOH has the most outstanding tables (because they have so many data marts so it takes longer to QA them all).
docs – has a list of data dictionaries and also a visual of the datasets and their availability you should have a look at this area first before using the SIAL tables to understand what the different event types are and whether or not we have discovered any additional data quality issues during the QA process.
logs – will store the logs in a text file because they are so large and we need to scan the logs for SIAL errors.
SIAL_dependencies – has all the csv and text files that have pricing information that is not available in the IDI (MOH MOE and MOJ). These csv and text files are bulk inserted into the database so that all the cost information is available to other users. It also contains the scripts to build the SIAL and remove it.
Running the SIAL (full version on README)
-
Go to https://github.com/nz-social-investment-agency/social_investment_analytical_layer (searching for social investment analytical layer github will return the relevant page as well)
-
Download the zip file
-
At this point you would email access2microdata@stats.govt.nz and ask them to drop the file into your project area. For this workshop they are already there so we do not need to email them.
-
Copy this folder into your personal folder that way if we accidentally overwrite things then it won’t matter.
-
Open social_investment_analytical_layer/SIAL_dependencies/main_sial.sas
-
We will add line numbers so it is easier to follow along Tools > options > sas program > editor options check the show line number box
-
There are two variables you need to change Set
targetschema
asSTATSNZ\<userid>
this is your personal schema no one else can write here Go to the file explorer and copy the path\\<file_sys_name>\MAA2016-15 Supporting the Social Investment Unit\social_investment_analytical_layer
set this assial_code_path
-
Run the views and only the school table. Do not run MSD T1 (otherwise you will be there forever). Either comment it out or delete it.
[Scroll quickly through this code and explain what it will do while the code is running]
Note: The server was unable to handle 8 people simultaneously running the SIAL and querying the same MOE tables. We experienced long run times and errors. Future workshops should have 2-3 instances running simultaneously to avoid this problem.
Explain why we need to run each refresh (updates the tables – views are always pointing to the latest refresh so they will always be up to date).
Open up sql go to your personal schema and check for the SIAL views and one of the SIAL tables (one of each so they know how to locate views and tables).
Open one of the tables or views and note the standard structure
Delete from your personal schema – Stats NZ prefer us not to write to this area. We were just doing this for illustrative purposes so that we can all run the SIAL without getting errors about locks etc. We will uninstall all the tables in the next step.
- Open social_investment_analytical_layer\SIAL_dependencies\uninstall_sial.sas
- Again, there are two parameters
targetschema
andsial_code_path
. Use the same two arguments that you specified in main_sial.sas - Note because we ran the earlier script which sends the log output to file we don’t see any log displayed
This should be run as a second workshop so that new people are not overwhelmed by all the content for the SIAL workshop.
We will not create a new table/view but we will step you through one of the SIAL views SIAL_MOE_itl_events which caused us a lot of headaches when it was first built.
- Before you start make sure you have enough cost information back to your first event start date. If you do not you may need to deflate existing costs to get these (this will need to be checked by the relevant agency to see if this is appropriate)
- Create a csv (or txt if there are annoying characters which was the case with the PRIMHD purchasing units cost) with the cost information
- Check that you have the following formats in the csv
- Start and end date in dd/mm/yyyy (the sql script will cast this into datetime)
- Make sure the costs are decimal values with no dollars or comma (otherwise the csv will start to skip columns),
- In the csv nulls have to be blank (i.e. two commas with nothing between them) not NULL otherwise they will be treated as strings
- Create the SIAL table and join on the cost table using dates and any other info needed to join
- Go to <sial_code_path>\SIAL_dependencies\load_pricing_tables_to_sandpit.sql this script will bulk insert your csv tables into the sandpit so that when you write a sql script and do a join these tables will exist
- Several changes will need to be made to the load_pricing_tables_to_sandpit.sql
- Declare and initialise the file path and the name of the csv
- Scroll down to the bottom of the script and create code that bulk inserts the csv into the table. You can use previous examples as guidance but remember that your data types may differ from the existing cost tables so they may need to be modified
- Changes will also need to be made to <sial_code_path>\SIAL_dependencies\uninstall_sial_tables_views.sql
- Add the newly created cost table to the list of pricing tables to drop
- Go to <sial_code_path>\SIAL_dependencies\main_sial.sas
- Update the main header to specify whether the new SIAL table is in fact a table or a view and whether a cost table is created with it
- Update the section further down that specifies all the tables/views created It starts with At this point, you will find a list of SIAL related
- If you are creating a SIAL view then you will need this final step
- Go to <sial_code_path>\SIAL_dependencies\create_sial_views.sql and add the view name to the commented-out sub directory tree. This is in case we need to go back and do some testing or if we build views that depend on other views then they will need to be run in a particular order.
- If you are creating a SIAL table then you will need this final step
- Refer to the bottom of <sial_code_path>\SIAL_dependencies\main_sial.sas you will need to specify a log name and print the output to the log to check for errors
- Then you will need to include the sas script that builds the table
- Then call the getErrors to get any errors that may occur during the table building process then write any of the output from getErrors to the log table.
- Examples of this for MSD T1 are shown at the bottom of the main script and can be copied and amended for the new SIAL table.
Adding a person to an IDI Project
Access to the SIA Github organisation
Graph databases and visualisations
How to Create and Update the SIAL
How to Use and Update the SIDF
How to Use and Update the SIAtoolbox
How to Use and Update the Markdown Reports
How to Use Git and Version Control