-
Notifications
You must be signed in to change notification settings - Fork 2
How to Use and Update the SIDF
Overviews and instructions on how to install and run the SIDF are found in the SIDF README.
This page contains the notes for the How to Workshops that SIA runs.
Social Investment Data Foundation – is a set of scripts designed to create an analysis-ready dataset. By specifying a group of people to analyse and a set of arguments to specify what variables are created, building datasets ready to analyse can be accomplished in under a day.
[Put on whiteboard]
- Characteristics (gender, age, ethnicity, iwi and so on)
- Service metrics such as cost and duration (this is where the SIAL comes in)
- Outcome variables that involve complex logic that don’t fit into the above two categories
- Highest qualification
- Household composition
- Mother smokes
- Potentially avoidable hospitalisations
- Te reo speaker
- Age a first birth
- Contact records
- Diabetes
- Disability needs
- ECE participation
- Frequency of address changes
- Household composition
- Mental health and addiction indicator
- Social housing indicator
- Before school check scores
- And potentially others
Based on parameters you specify in a control script a series of tables will be created. These tables will have names such as:
SI_PD_COHORT_CHAR – characteristics as per the data foundation
SI_PD_COHORT_CHAR_EXT – any extensions you made to the standard SI_PD_COHORT_CHAR table
<sial_table>_RLPW – service metric tables in a wide table format (long tables have _RLPL suffix)
Outcomes tables don’t have standard names. If it is not one of the above tables or the control file or cohort table then it is an outcomes table
Note: A single analysis-ready table is not created as the control script has the potential to create thousands of variables which is not useful for analysis. It was decided that the user should determine what variables they want so the final joins to create a single analysis-ready dataset are left to the used.
Characteristics have intuitive names such as snz_sex_code, as_at_age and so on
Service metrics use a code based system of naming. This is best illustrated with an example.
A sample of a SIAL table is shown below
snz_uid | department | datamart | subject_area | start_date | end_date | cost | event_type |
---|---|---|---|---|---|---|---|
1234 | MSD | BEN | T1 | 01/01/2013 | 31/12/2013 | 987.65 | JSS |
1234 | MSD | BEN | T1 | 30/06/2014 | 31/12/2014 | 123.45 | JSS |
The SIAL tables are summarised to create a series of service metrics that use the following notation
The SIAL tables are summarised to create a series of service metrics that use the following notation
<W>_<XXX>_<YYY>_<ZZZ>_<AAA>
- W = the window. This is a time window for the analysis you are interested in [draw a picture to illustrate history, profile, forecast and analysis windows]
- XXX = the department
- YYY = the datamart
- ZZZ = subject area
- AAA = one of the following metrics: count (cnt), duration (dur), cost (cst), days since the first event (dsf), days since the last event (del).
For the benefit example some service metrics could be
- P_MSD_BEN_T1_JSS_dur
- F_MSD_BEN_T1_JSS_dur
- P_MSD_BEN_T1_JSS_cnt
- F_MSD_BEN_T1_JSS_cnt
If you are interested in benefits in general then you can also create P_MSD_BEN_T1_dur and so on.
Full details can be found in Appendix B of the Social Housing Technical Report.
Outcome names can vary but tend to be more intuitive than the service metric names e.g disability_needs_flag.
The SIA has used the data foundation in the following examples:
- Social housing test case
- Mental health test case
- Vulnerable mothers work
- Outcomes framework
It is recommended that you look at these to see the data foundation in action.
[Note do this in the sial_repo_template_workshop Show in Github the submodules for some of these repositories and explain how we need submodules so we don’t have multiple copies of the data foundation that all need to be updated manually.]
Each of these is tweaked slightly to suit the analysis. For instance, the vulnerable mothers work has two main scripts – two create two analysis tables via the data foundation; one for the children and one for the mothers.
If you do not have a copy of the SIDF in your IDI project folder refer to the README for installation instructions Break into three groups and go to the readme and follow the example
Warning: Do not have more than 3 groups running the SIDF. We have encountered severe performance issues when we run such workshops with multiple instances.
si_get_cohort_example.sas
Note the datetime format of the as_at_date.
si_control_example_pd.sas
Step through each of the variables in the control file.
L specifies all our costs are lump sum – if you have inconsistent tables you would need a manual work around.
Draw a diagram of the as at date and the periods before and after and explain the concept of a year using the social housing propensity matching as an example.
Quickly go over the price adjustment columns and refer them to social housing if they wish to see inflation or discounting.
Note that the flags for the agencies run all the tables e.g. si_use_moh will build all the moh variables for all the tables.
Then talk about the measures we have in particular the two types of counts and days since variables.
The rest of the script creates these as sas macro variables that can be used by other parts of the SAS script. It starts off as a long file because it is easier for the viewer to read but needs to be transposed to a wide table so that it is easier to do the macro variable creation.
si_main_example_pd.sas Explain that si_get_characteristics is the slow changing characteristics. If you want to add anything to the characteristics like filter our those who are attached to the spine, do that in si_get_characteristics_extension
Explain why we write to the database – in database processing is more efficient when it comes to joining to the SIAL tables.
The macro %si_wrapper_sial_rollup
is where the service metrics are constructed.
The macro %si_get_outcomes_ext
is where all the outcome variables are constructed
Open the SAS server and have a look in the work library. You will see a series of tables that have been created. The ones that end in _RLPW (rollup wide) create a wide table with the service metrics. To create such a table you need to ensure your column names do not exceed 28 characters (SAS limit is 32 but we suffix _cst _dur etc to the variables. The easiest way to ensure this happens is to not have any rollups more granular than the subject area.
If you need to rollup at the event level and the event_type name is long then use the si_rollup_output_type
variable to Long. This creates a long table with the suffix _RLPL. It would be up to the user to rename these and transpose to a wide table.
Open up the table and have a look at the column names. They match the naming convention explained earlier. Note (P1 is one year before the intervention date, P2 is two years before, F0 is the intervention date, F1 is one year after the date and so on.
SIAL_qualifcations is an outcome table.
It is up to you to decide what variables you want and to join all the tables together to create a final dataset.
Most of the enhancements to date have been creating new outcome variables. These are stored in the /sasautos
folder.
These new scripts should have unit tests with them to demonstrate that they work correctly. These are stored in the /unittests
folder.
If you wish for several outcomes to be generated at once you can add their macros to si_get_outcomes_ext.sas
Adding a person to an IDI Project
Access to the SIA Github organisation
Graph databases and visualisations
How to Create and Update the SIAL
How to Use and Update the SIDF
How to Use and Update the SIAtoolbox
How to Use and Update the Markdown Reports
How to Use Git and Version Control