Option to include original csv line in resulting tuple returned by CSV.decode(stream, options) #78

Open
npac opened this issue Mar 21, 2018 · 2 comments

npac commented Mar 21, 2018

Hi,
Thanks for the great library.
The CSV.decode(...) function returns a tuple, either {:ok, map()} in case of success or {:error, binary()} in case of failure to decode.

My use case: read a CSV file and insert a row into the DB for each CSV line. Invalid CSV lines must be saved in a separate file for further analysis.

I need the original CSV line in both cases (decode success or failure).

  • If decoding fails, the problematic line is saved in a separate file for further analysis.
  • If decoding succeeds, I still need the original CSV line: the decoded result must be inserted into a database table, and if the insert fails I also need to save that line in a file for further analysis.

Do you think an option could be introduced in the library to return a tuple {:ok, map(), binary()} or {:error, binary(), binary()}, where the third element is the raw data from the input stream? I can try to submit a PR for that if it's OK.

So far I have come up with the following workaround for my case:

path
  # stream line by line from csv file
  |> File.stream!()
  # Start the stream transformation. We decode the CSV line by line.
  # Tradeoff: CSV.decode reports an incorrect line number when a line fails to decode. The error will always refer to line 1 :(
  |> Stream.transform(0, fn line, acc ->
    # Decode a CSV line. The result is either {:ok, map} or {:error, reason}
    [result] = CSV.decode([line], separator: ?,, headers: [:a, :b, :c]) |> Enum.take(1)
    # we need to keep original line in resulting tuple.
    # in case of an error we must save this line in a separate file
    {[Tuple.append(result, line)], acc + 1}
  end)
  # Process the decoding results using a parallel stream.
  # Here each stream element is a tuple:
  # either {:ok, %{...}, "foo, bar, baz"} on decode success
  # or {:error, "Row has ... - expected .. line 1", "foo, bar, baz"} on decode failure
  |> ParallelStream.each(&process_decoded(&1))
  |> Stream.run()
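
The pipeline above passes each tuple to process_decoded/1, which is not shown in the issue. Below is a minimal sketch of what such a handler might look like. The module name RejectLog and the `insert` and `reject` arguments are hypothetical and not part of the CSV library. They are passed in as functions so the example runs without a database or an output file, which is why this version takes three arguments instead of one.

```elixir
defmodule RejectLog do
  # Hypothetical helper module; none of these names come from the CSV library.

  # Handles one element of the stream built above:
  # {:ok, row, raw_line} or {:error, reason, raw_line}.
  #
  # `insert` stands in for the DB insert and is assumed to return
  # :ok or {:error, reason}.
  # `reject` receives every raw line that must be saved for later analysis.
  def process_decoded({:ok, row, raw_line}, insert, reject) do
    case insert.(row) do
      :ok ->
        :ok

      {:error, _reason} ->
        # Insert failed: keep the original line.
        reject.(raw_line)
        :rejected
    end
  end

  def process_decoded({:error, _reason, raw_line}, _insert, reject) do
    # Decode failed: keep the original line.
    reject.(raw_line)
    :rejected
  end
end
```

With `insert` and `reject` bound to suitable functions, the last stage of the pipeline could then read `|> ParallelStream.each(&RejectLog.process_decoded(&1, insert, reject))`. If the stage runs in parallel, `reject` would need to be safe to call from several processes at once.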
@beatrichartz
Owner

Hi, this is interesting. Could definitely start something on this, but would also accept a PR to make this optional behaviour.

@beatrichartz
Owner

Maybe have a look at PR #95; this might be what you are looking for.
