Multiple input files support #6

Open: wants to merge 2 commits into base: master

mariusmilea

We use this tool heavily to convert a couple of GB of JSON files into Avro every day.
It was useful for me to have the tool accept multiple JSON files as input, hence my commit here.
If you need to convert a batch of JSON files, json2avro could originally only be used like this:

cat file1.json file2.json file3.json | json2avro -S schema_file output.avro

With this patch, json2avro can also be used like this:

json2avro -S schema_file file1.json file2.json file3.json output.avro

This eliminates the need for cat, or any other utility, to concatenate the input files.
Running json2avro with multiple input files saves between 1 and 1.5 seconds on a 160 MB batch of JSON files.
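
For readers who want to see the argument handling spelled out, here is a rough sketch in Python of the behaviour described above. It is not the code from this patch (json2avro itself is written in C). The helper names `split_positionals` and `iter_records` are hypothetical, and the sketch assumes the input holds one JSON record per line and that the last positional argument is always the output file.

```python
import sys
from typing import Iterator, List, Tuple


def split_positionals(positionals: List[str]) -> Tuple[List[str], str]:
    """Treat the last positional argument as the output file and
    everything before it as input files (possibly none)."""
    if not positionals:
        raise ValueError("an output file is required")
    return positionals[:-1], positionals[-1]


def iter_records(inputs: List[str]) -> Iterator[str]:
    """Yield non-empty lines from each input file, in the order given.

    With no input files, fall back to stdin, so the original
    `cat ... | json2avro` style of invocation keeps working.
    Assumption: one JSON record per line.
    """
    if not inputs:
        for line in sys.stdin:
            if line.strip():
                yield line.rstrip("\n")
        return
    for path in inputs:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield line.rstrip("\n")
```

With this shape, `split_positionals(["file1.json", "file2.json", "file3.json", "output.avro"])` gives three inputs and one output, while a single positional argument gives an empty input list and so selects the stdin path.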

Owner

grisha commented Jan 3, 2016

@spil-marius Sorry, I somehow never saw this pull request until now. How has this been working for you? Do you think it is OK to merge?
