Extract2DDI

Purpose

Extract2DDI extracts DDI metadata from SPSS and Stata files. It supports the most commonly used versions of the DDI metadata standard:

DDI-Codebook 2.5
DDI-Lifecycle 3.2 (Instance)
DDI-Lifecycle 3.3 (Fragment) - currently in development

For the DDI-Lifecycle variants, it can also suppress summary statistics on selected variables (e.g. exclude frequencies), either because they are not appropriate or because they must be redacted for other reasons. An optional message can be attached to each variable to explain why.

Development History

This code base was originally forked from https://github.com/ncrncornell/ced2arddigenerator and uses the reader libraries:

https://github.com/ncrncornell/ced2arspssreader (SPSS)
https://github.com/ncrncornell/ced2ar-stata-reader (Stata)

Since the fork, the following have been added:

DDI-Lifecycle support
Redaction of summary statistics

Known issues

No support for redaction of summary statistics on the DDI-Codebook variant

This project contains Java classes that read several versions of Stata or SPSS data sets and generate DDI 3.3 XML files.

This Maven project generates two .jar files, one for developers and one for end users:

Extract2DDI-without-dependencies.jar (Developers): the normal jar file that you can include in other projects. This jar depends on ced2ar-stata-reader and ced2arspssreader. (This is the Maven project artifact.)
Extract2DDI.jar (End Users): the runnable jar file you can use from the command line.

Artifacts

Maven Central

Build

For Developers:

Clone the github repository to your machine.
Go to the root directory of the cloned repository.
Use Maven 2 to build the project. On the command line, enter the following command:

mvn clean install -Dgpg.skip

If publishing, omit -Dgpg.skip.

Usage

For Developers:

The best way to use this code is to include the jar file in an existing project, such as ced2ardata2ddi. The following code is from ced2ardata2ddi's DataFileRestController.java file:

  if (file.getOriginalFilename().toLowerCase().endsWith(".dta")) {
    // Stata data file
    StataCsvGenerator gen = new StataCsvGenerator();
    variablesCSV = gen.generateVariablesCsv(fileLocation, summaryStats, recordLimit);
  } else if (file.getOriginalFilename().toLowerCase().endsWith(".sav")) {
    // SPSS data file
    SpssCsvGenerator gen = new SpssCsvGenerator();
    variablesCSV = gen.generateVariablesCsv(fileLocation, summaryStats, recordLimit);
  }

For End Users:

Download Extract2DDI.jar (the runnable jar described above).
See Run Instructions in the next section.

Run Instructions

Run from a terminal:

java -jar Extract2DDI.jar -f <filename> [--config <filename>] --format [2.5 | 3.2] [-s <sumstats>] [-l <obsLimit>]

usage: Options are as follows...

 -f <arg>      (required) data file name and extension.

 -l <arg>      (optional) limit number of observations to process.   Default: Process all observations

 -s <arg>      (optional) generate summary statistics.  Values: TRUE|FALSE   Default: TRUE
 
--config <arg> (optional) use config file with specified path. Format of the config file:
    agency=uk.closer
    ddilang=en-GB
    stats=max,min,mean,valid,invalid,freq,stdev
    outputfile=example_file_name
    sumstats=TRUE
    obsLimit=1000
    dataset_short_description: short description
    data_description: Description of file
    dataset_URI: https:example.com
    is_public : [0|1]

 --format [2.5 | 3.2]  DDI version to output: 2.5 (DDI-Codebook) or 3.2 (DDI-Lifecycle).

 --exclude <arg> (optional) exclude statistics for the variables listed in the file at the specified path. Format of the exclude file:
    var_1=max:user message
    var_2=freq:removed frequencies
    
 --statistics (optional) Produce statistics file
 
 --frequencies (optional) Produce frequency file
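
To make the --exclude file format concrete, here is a minimal Java sketch that parses lines of the form variable=statistic:message into simple rule objects. The class ExcludeFileSketch, its Rule type, and the parse method are hypothetical names used only for illustration; they are not part of Extract2DDI. Treating the message as optional text after the first colon is an assumption based on the two sample lines above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Extract2DDI.
class ExcludeFileSketch {

    // One exclusion: the variable, the statistic to mute, and an optional message.
    static final class Rule {
        final String variable;
        final String statistic;
        final String message;

        Rule(String variable, String statistic, String message) {
            this.variable = variable;
            this.statistic = statistic;
            this.message = message;
        }
    }

    // Parses lines such as "var_1=max:user message"; blank lines are skipped.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected variable=statistic[:message] but got: " + line);
            }
            String variable = line.substring(0, eq).trim();
            String rest = line.substring(eq + 1);
            int colon = rest.indexOf(':');
            // Assumption: the message is everything after the first colon, if any.
            String statistic = (colon < 0 ? rest : rest.substring(0, colon)).trim();
            String message = colon < 0 ? "" : rest.substring(colon + 1).trim();
            rules.add(new Rule(variable, statistic, message));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList(
                "var_1=max:user message",
                "var_2=freq:removed frequencies"));
        for (Rule rule : rules) {
            System.out.println(rule.variable + " -> " + rule.statistic
                    + " (" + rule.message + ")");
        }
    }
}
```

A checker like this can be handy for validating an exclude file before passing it to --exclude; how the tool itself handles malformed lines is not documented here.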

2.5 format

java -jar Extract2DDI.jar -f filename.sav --format 2.5

3.2 format

java -jar Extract2DDI.jar -f filename.sav --config [filename] --format 3.2
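
The config file passed with --config is a list of key/value lines, some written with = and some with : in the sample shown under Run Instructions. As an illustration only, the sketch below reads such text with java.util.Properties, which accepts both separators. Whether Extract2DDI itself parses the file this way is an assumption, and ConfigFileSketch, load, and obsLimit are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Extract2DDI.
class ConfigFileSketch {

    // Reads key/value lines; java.util.Properties accepts '=' and ':' separators.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the obsLimit setting, or the fallback when the key is absent.
    static int obsLimit(Properties props, int fallback) {
        String value = props.getProperty("obsLimit");
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "agency=uk.closer",
                "ddilang=en-GB",
                "obsLimit=1000",
                "dataset_short_description: short description");
        Properties props = load(text);
        System.out.println(props.getProperty("agency"));
        System.out.println(obsLimit(props, -1));
        System.out.println(props.getProperty("dataset_short_description"));
    }
}
```

To read a real file instead of an in-memory string, the StringReader can be swapped for a reader obtained from java.nio.file.Files.newBufferedReader.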

Example

java -jar Extract2DDI.jar -f dataset.sav -s TRUE -l 1000

This run example generates the following files:

One DDI XML file: dataset.sav.xml
Two CSV files:
- dataset.sav.vars.csv
- dataset.sav_var_values.csv
One log file: ced2arstatareader.log
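
For scripted runs, the example command can also be assembled from Java and handed to ProcessBuilder. The sketch below is only an illustration: RunCommandSketch and buildCommand are hypothetical names, the jar path and data file are placeholders, and it uses only the -f, -s, and -l flags documented above. The launch itself is left commented out because it needs the jar and the data file to be present.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Extract2DDI.
class RunCommandSketch {

    // Builds the argument list for the runnable jar; obsLimit <= 0 means "no limit flag".
    static List<String> buildCommand(String jarPath, String dataFile,
                                     boolean sumStats, int obsLimit) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("-f");
        cmd.add(dataFile);
        cmd.add("-s");
        cmd.add(sumStats ? "TRUE" : "FALSE");
        if (obsLimit > 0) {
            cmd.add("-l");
            cmd.add(Integer.toString(obsLimit));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("Extract2DDI.jar", "dataset.sav", true, 1000);
        // Prints: java -jar Extract2DDI.jar -f dataset.sav -s TRUE -l 1000
        System.out.println(String.join(" ", cmd));
        // To launch it (requires the jar and data file to exist):
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```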

Version: 1.3.0 (7/31/18). Required: JDK 8.0
