GCP-CLEANER

Source code for a Docker image that cleans file data based on the data types.

Motivation

If you are migrating to BigQuery, you will often run into issues with the integrity of the data you send. This image detects mixed data in your file and replaces it with a null value.

How it works?

This image needs to know:

FILENAME=file.txt
SCHEMA=squema.json
DELIMITER=\t
HEADER=false
SPLIT_ROWS=100000

Pass these values to the container as environment variables.
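As a rough illustration of how the settings above could be consumed inside the container, here is a minimal Python sketch. It is not taken from run.py: the helper name `load_config`, the fallback defaults, and the handling of the literal `\t` are all assumptions made for the example.

```python
import os

# Assumed mapping from escaped delimiter text (as written in a .env file)
# to the real character. Only the tab case comes from the README example.
ESCAPED_DELIMITERS = {"\\t": "\t"}


def load_config(env=os.environ):
    """Hypothetical helper: read the five settings from environment variables."""
    raw_delimiter = env.get("DELIMITER", ",")  # default is an assumption
    return {
        "filename": env["FILENAME"],
        "schema": env["SCHEMA"],
        # "\t" arrives as two characters (backslash + t), so translate it.
        "delimiter": ESCAPED_DELIMITERS.get(raw_delimiter, raw_delimiter),
        # HEADER is text; treat only "true" (any case) as True.
        "header": env.get("HEADER", "false").strip().lower() == "true",
        "split_rows": int(env.get("SPLIT_ROWS", "100000")),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to try outside the container.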

Notes:

This image processes the file in chunks by default. I recommend 100000 rows as the chunk size limit, but the right number depends on your system specs.
The output file will be delimited with commas (,).
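The chunked cleaning described in these notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in run.py: the type names in `CASTS`, the functions `clean_value` and `clean_file`, and the choice of an empty field to stand in for null are all hypothetical.

```python
import csv
import io
from itertools import islice

# Assumed converters for a few type names; unknown types are treated as text.
CASTS = {"INTEGER": int, "FLOAT": float, "STRING": str}


def clean_value(raw, type_name):
    """Keep raw if it parses as type_name, otherwise return an empty field."""
    try:
        CASTS.get(type_name, str)(raw)
        return raw
    except ValueError:
        return ""  # assumption: an empty field represents null


def clean_file(src, dst, types, delimiter="\t", split_rows=100000):
    """Read src in chunks of split_rows rows and write comma-delimited rows to dst."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst)  # csv.writer uses a comma delimiter by default
    while True:
        chunk = list(islice(reader, split_rows))  # at most split_rows rows in memory
        if not chunk:
            break
        for row in chunk:
            writer.writerow([clean_value(v, t) for v, t in zip(row, types)])


# Tiny in-memory demo: "x" is not an INTEGER, so it becomes an empty field.
src = io.StringIO("1\tanna\nx\tbob\n3\tcarl\n")
dst = io.StringIO()
clean_file(src, dst, ["INTEGER", "STRING"], split_rows=2)
```

Only one chunk is held in memory at a time, which is why a smaller SPLIT_ROWS value suits machines with less RAM.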

Use with docker

Make sure to create the in, out and schema directories.
Put the input file in the "in" directory.
The result will be written to "out".
Put the schema file in the "schema" directory.

First pull the image

docker pull ricardotryit/file-cleaner

Run the Docker image

docker run --rm -v "$(pwd)/in:/home/in/" -v "$(pwd)/out:/home/out/" -v "$(pwd)/schema:/home/schema/" --env-file="$(pwd)/.env" ricardotryit/file-cleaner

Notes:

You can also use the Makefile in the repository and do everything above by typing make pull and then make run.
