Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIARD-DK: When exporting table meta data, we must change 'typeOriginal' accordingly, if type is changed (CLOBS or BLOBS) #128

Open
ThomasKristensen opened this issue Feb 15, 2016 · 3 comments
Labels
siard-dk Specific module
Milestone

Comments

@ThomasKristensen
Copy link
Contributor

@andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.

https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124

Example:

Orginal Mysql Table:

CREATE TABLE `datatypes` (
  `col1` blob COMMENT 'Test comment',
  `col_key` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column',
  PRIMARY KEY (`col_key`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

Table metadata in generated SIARD-DK archive:

<table>
            <name>datatypes</name>
            <folder>table1</folder>
            <description>Description should be entered manually</description>
            <columns>
                <column>
                    <name>col1</name>
                    <columnID>c1</columnID>
                    <type>INTEGER</type>
                    <typeOriginal>BLOB</typeOriginal>
                    <nullable>true</nullable>
                    <description>Test comment</description>
                    <functionalDescription>Dokumentidentifikation</functionalDescription>
                </column>
                <column>
                    <name>col_key</name>
                    <columnID>c2</columnID>
                    <type>INTEGER</type>
                    <typeOriginal>INT</typeOriginal>
                    <nullable>false</nullable>
                    <description>Comment Primary Key Column</description>
                </column>
            </columns>
            <primaryKey>
                <name>PRIMARY</name>
                <column>col_key</column>
            </primaryKey>
            <rows>1</rows>
        </table>


We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).

@chalkos chalkos added this to the 2.0.0-beta1 milestone Feb 15, 2016
@chalkos chalkos added siard-dk Specific module export labels Feb 15, 2016
@chalkos chalkos removed this from the 2.0.0-beta1 milestone Feb 15, 2016
@chalkos chalkos added export siard-dk Specific module and removed export siard-dk Specific module labels Feb 15, 2016
@andreaskring
Copy link
Contributor

I will fix this

2016-02-15 14:33 GMT+01:00 Thomas Kristensen notifications@github.com:

@andreaskring https://github.com/andreaskring When exporting columns
with CLOBS or BLOBS, we change the column type to integer (as it is
supposed to identify the file with the binary data). When we do this
transformation, we also need to change to 'orginalType' value, as the
import mechanism sometimes relies on the original type for determining the
type of the columns.

https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124

Example:

Orginal Mysql Table:

CREATE TABLE datatypes (
col1 blob COMMENT 'Test comment',
col_key int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column',
PRIMARY KEY (col_key)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

Table metadata in generated SIARD-DK archive:

datatypestable1Description should be entered manually col1 c1 INTEGER BLOB true Test comment Dokumentidentifikation col_key c2 INTEGER INT false Comment Primary Key Column PRIMARY col_key 1

We this archive is imported in MySQL, this will result in the 'col1'
column getting datatype 'blob', which is not what we want, as the data will
now be integers (ids of files with the exported blobs in the archive).


Reply to this email directly or view it on GitHub
#128.

Andreas Kring
Magenta Aps
Incuba, Åbogade 15
DK-8200 Aarhus N

tel +45 3336 9699
dir +45 8177 1698

http://www.magenta.dk
andreas@magenta.dk

@ThomasKristensen
Copy link
Contributor Author

@chalkos chalkos removed the export label Mar 9, 2016
@chalkos
Copy link
Contributor

chalkos commented May 11, 2016

Some parts of dbptk relied on the "original type", but this has proven to be a bad practice. Sure, it easily solves some problems for now, but in the long run relying on the original type will cause a lot of trouble due to the variety of values that can be present in that field.
The preferred way to recognise types is to use the SQL standard type (99, 2008, etc). This may lead to some type changes (ie, text fields becoming CLOBs) but that may be acceptable as long as no data is lost (for that the new type must be able to contain all values that could be represented in the original type).

@chalkos chalkos modified the milestones: Unplanned, Screening Jan 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
siard-dk Specific module
Projects
None yet
Development

No branches or pull requests

3 participants