Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(csv) Add a CsvParser.Feature.SKIP_EMPTY_LINES to allow skipping empty rows #15

Closed
cowtowncoder opened this issue Mar 20, 2017 · 17 comments
Labels
Milestone

Comments

@cowtowncoder
Copy link
Member

Looks like a commonly supported option with CSV (family of formats) is to allow skipping of empty rows; that is, ones that only contain white-space.

@cowtowncoder
Copy link
Member Author

note: Original had 5 thumbs-up, couple of +1s. So highly requested.

@kg29
Copy link

kg29 commented Apr 9, 2017

Is anyone working on this? I would be happy to pick it up.

@cowtowncoder
Copy link
Member Author

@kg29 No one is workign on this as far as I know, so help would be highly valued!

@kunickiaj
Copy link

This is the only thing preventing us from switching from commons-csv.

@cowtowncoder
Copy link
Member Author

@kunickiaj One thing that could help (I may have time to look into this after completing non-blocking json parser) would be a failing unit test that simply shows how you'd like this to work. That is, test that would pass after fix.

@kunickiaj
Copy link

Yeah, I may look into this once I get a free moment.

We did some benchmarks against jackson csv + guard clause on our side vs commons-csv and it pretty much wiped out any performance gains from moving to jackson.

Figure fixing it here should be a much better option.

@kunickiaj
Copy link

Here's a trivial test which would pass given the desired behavior.

https://github.com/kunickiaj/jackson-dataformats-text/commit/b34aa58b5cb0a6926485d70fc17d2bc296eac925

@cowtowncoder
Copy link
Member Author

Excellent thank you; I'll have a look.

@cowtowncoder cowtowncoder changed the title (csv) Add a CsvSchema option to allow skipping empty rows (csv) Add a CsvParser.Feature.SKIP_EMPTY_LINES to allow skipping empty rows Jun 16, 2017
@cowtowncoder
Copy link
Member Author

Quick note: I think this makes more sense as CsvParser.Feature than CsvSchema setting since it's not as much a property of document than handling (although I can see why someone might argue otherwise too). But I think this is more convenient from user perspective.

@cowtowncoder
Copy link
Member Author

Hmmh. Looking at code, this may not be easy to implement. As things are, START_OBJECT (or, without schema, START_ARRAY) is returned without looking at line. Logic would need to check for this setting, then do look-ahead... maybe that works.

@cowtowncoder
Copy link
Member Author

For documentation purposes, I'll note that feature itself was actually added in 2.9, but implementation not. This is unfortunate.

@ehills
Copy link

ehills commented Feb 7, 2018

Hi @cowtowncoder how did this go? Has this been implemented? Your last comment was a wee bit confusing.
Cheers,
Ed

@cowtowncoder
Copy link
Member Author

@ehills No, functionality has not been added and does not exist.

@feirorum
Copy link

feirorum commented Apr 5, 2019

Maybe I misunderstand something here, but from the description in the first comment, I think this should be described as "skip blank lines", i.e. lines with only whitespace. Maybe you'd like to add SKIP_BLANK_LINES instead of changing SKIP_EMPTY_LINES to do this, otherwise code which uses the library could change its behaviour when using the new version.

Also, in the linked unit test I don't find any blank (only whitespace) lines, just empty lines:

Current CSV string:
final String CSV = "1,"xyz"\n\ntrue,\n";

Test for blank line being removed would need something like:
final String CSV = "1,"xyz"\n\ntrue,\n \n";

The rest of the test would be identical.

Cheers, Anders

@jabengozar
Copy link

jabengozar commented Jul 1, 2019

Not works for me yet

@cowtowncoder
Copy link
Member Author

Good point on "skip blank" vs "skip empty".

@cowtowncoder cowtowncoder added 2.10 and removed 2.9 labels Oct 5, 2019
vboulaye added a commit to vboulaye/jackson-dataformats-text that referenced this issue Oct 5, 2019
vboulaye added a commit to vboulaye/jackson-dataformats-text that referenced this issue Oct 5, 2019
vboulaye added a commit to vboulaye/jackson-dataformats-text that referenced this issue Oct 6, 2019
vboulaye added a commit to vboulaye/jackson-dataformats-text that referenced this issue Oct 7, 2019
@cowtowncoder
Copy link
Member Author

And there was much rejoining for now @vboulaye's patch is merges into 2.10, and the feature shall be supported in patch release 2.10.1!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants