<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" href="images/favicon.png">
<title>CSV Schema</title>
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://netdna.bootstrapcdn.com/bootstrap/3.0.2/css/bootstrap.min.css">
<!-- Custom styles for this template -->
<link href="css/tna.css" rel="stylesheet">
<!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
<script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script>
<![endif]-->
</head>
<body>
<!-- Wrap all page content here -->
<div id="wrap">
<!-- Fixed navbar -->
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="https://www.nationalarchives.gov.uk" title="Go to The National Archives homepage"><img src="images/logo-white.png" alt="The National Archives" id="logo"> Digital Preservation</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li><a href="https://digital-preservation.github.io/csv-validator">CSV Validator</a></li>
<li class="active"><a href="#">CSV Schema</a></li>
<li><a href="https://digital-preservation.github.io/csv-schema/csv-schema-1.1.html">CSV Schema Specification</a></li>
</ul>
</div><!--/.nav-collapse -->
</div>
</div>
<!-- Begin page content -->
<div class="container">
<div class="page-header">
<h1>CSV Schema</h1>
</div>
<p class="lead">A text-based schema language (<code>CSV Schema</code>) for describing data in CSV files for the purposes of validation. Released as Open Source under the <a href="https://www.mozilla.org/MPL/2.0/">Mozilla Public Licence version 2.0</a>.</p>
<div id="toc"></div>
<div>
<h2>Overview</h2>
<p>Firstly, we defined a Grammar which describes a language for expressing rules to validate a CSV file. We call an expression in this language a <code>CSV Schema</code>. The grammar itself is described more formally in <code>EBNF</code> and is available in the <a href="csv-schema-1.1.html">CSV Schema Specification</a>.</p>
<p>Secondly, we created a reference implementation, in the form of a Validator Tool and API (<a href="https://digital-preservation.github.io/csv-validator"><code>CSV Validator</code></a>), that takes a <a href="https://digital-preservation.github.io/csv-schema">CSV Schema</a> file and a CSV file, verifies that the CSV Schema itself is syntactically correct, and then asserts that each rule in the CSV Schema holds true for the CSV file.</p>
<p>The Schema and Validator can be considered separately: you do not need to be aware of the validation tool or API to author a CSV Schema.</p>
<div>
<h3>Background</h3>
<p>The National Archives receive Metadata along with Digitised or Born-Digital Collections. Whilst The National Archives typically process Metadata in XML and RDF, it was recognised that producing the desired metadata in XML and/or RDF was too difficult and/or expensive for many suppliers. It was therefore decided that Metadata would be received in CSV format.</p>
<p>Our experience shows that when suppliers are asked to produce metadata in XML or RDF there are several possible barriers:
<ul>
<li>Many content/document repository systems only export metadata in CSV, or generate XML or RDF in a non-desirable format which would then have to be transformed (at further cost).</li>
<li>Lack of technical knowledge in either XML or RDF.</li>
<li>Lack of experience of tools for producing and validating XML or RDF.</li>
<li>Cost. Installing new software tools comes at a significant cost for those Other Government Departments (OGDs) that have outsourced their IT support.</li>
<li>Best/Worst case, most suppliers already have Microsoft Excel (or an equivalent) installed which they know how to use to produce a CSV file.</li>
</ul>
</p>
<p>The National Archives set exacting requirements on the Metadata that they expect and on the format of that Metadata. Such constraints enable them to process it automatically, as the semantics of the metadata are already defined. Whilst bespoke tools have been developed in the past for validating data in various CSV files, it was felt that a generic open tool which could be shared with suppliers would offer several benefits:
<ul>
<li>A common CSV Schema language would enable The National Archives to define required Metadata formats precisely.</li>
<li>Developed CSV Schemas could be shared with suppliers and other archival sector organisations.</li>
<li>Suppliers could validate Metadata before sending it to The National Archives, by means of our <a href="https://github.com/digital-preservation/csv-validator">CSV Validator</a> tool, hopefully reducing mistakes and therefore costs for both parties.</li>
<li>The National Archives could use the same tool to ensure Metadata compliance automatically.</li>
<li>Although not of primary concern, it was recognised that this tool would also have value for anyone working with CSV as a data/metadata transfer medium.</li>
</ul>
</p>
</div>
<div>
<h2>CSV Schema Language</h2>
<p>
The CSV Schema Language is defined in the <a href="csv-schema-1.1.html">CSV Schema Language 1.1 specification</a> (this supersedes the original <a href="csv-schema-1.0.html">CSV Schema Language 1.0 specification</a> as of 25 January 2016).
It is suggested that the extension <code>.csvs</code> be used for CSV Schema Language files. There is also a working draft of <a href="csv-schema-1.2.html">CSV Schema Language 1.2</a>, with a few new features.
</p>
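<p>
As a rough illustration of what a schema in this language can look like, here is a minimal sketch. The column names <code>name</code>, <code>age</code> and <code>gender</code> are hypothetical and only a handful of rules are shown; the specification remains the authoritative reference for the syntax.
</p>
<pre><code>version 1.1
@totalColumns 3
name: notEmpty
age: range(0, 120)
gender: is("m") or is("f") or is("t") or is("n")</code></pre>
<p>
Under this sketch, a CSV file with a header row and three columns would be expected to pass when every row has a non-empty name, an age between 0 and 120, and one of the four listed gender codes; a row with an empty name or an age of 150 should be reported as an error.
</p>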
</div>
<div>
<h2>Reference Implementation</h2>
<p>For details of the CSV Validator tool and API see <a href="https://github.com/digital-preservation/csv-validator">https://github.com/digital-preservation/csv-validator</a>.</p>
</div>
<div>
<h2>Example CSV Schemas</h2>
<p>
In order to understand how to write CSV Schemas in practice, see the example CSV Schema file,
<a href="https://github.com/digital-preservation/csv-schema/blob/master/example-schemas/generic_digitised_surrogate_tech_acq_metadata_v1.1.csvs">digitised_surrogate_tech_acq_metadata_v1.1_TESTBATCH000.csvs</a>,
in the GitHub repository <a href="https://github.com/digital-preservation/csv-schema/tree/master/example-schemas">digital-preservation/csv-schema/example-schemas</a>.
In the <a href="https://github.com/digital-preservation/csv-schema/tree/master/example-schemas/example-data">example-data</a> subfolder you will find a CSV file,
<a href="https://github.com/digital-preservation/csv-schema/blob/master/example-schemas/example-data/digitised_surrogate_tech_acq_metadata_v1_TESTBATCH000.csv">digitised_surrogate_tech_acq_metadata_v1_TESTBATCH000.csv</a>,
which complies with the schema. This CSV file refers to XML files in the folder structure below <a href="https://github.com/digital-preservation/csv-schema/tree/master/example-schemas/example-data/TEST_1">TEST_1</a>.
</p>
</div>
<div>
<h2>For Software Developers</h2>
<p>See <a href="https://github.com/digital-preservation/csv-schema">https://github.com/digital-preservation/csv-schema</a>.</p>
</div>
</div>
</div>
</div>
<div id="footer">
<div class="container">
<p class="text-muted credit">Copyright © 2014 <a href="https://nationalarchives.gov.uk">The National Archives</a>.</p>
</div>
</div>
<!-- Bootstrap core JavaScript
================================================== -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://code.jquery.com/jquery-1.10.2.min.js"></script>
<script src="https://netdna.bootstrapcdn.com/bootstrap/3.0.2/js/bootstrap.min.js"></script>
<script src="js/jquery.toc.min.js"></script>
<script language="JavaScript">
<!--
$(document).ready(function(){
$('#toc').toc({
'selectors': 'h2,h3' //elements to use as headings
});
});
-->
</script>
</body>
</html>