readr 1.1.0

@jimhester jimhester released this 03 Apr 14:02

This release consists mainly of bug fixes and feature improvements suggested by the community. Two of the more significant features are connection support for the write_*() functions and parse_factor(levels = NULL).

Connection support for the write_*() functions allows one to write directly to compressed formats such as .gz, .bz2 or .xz. readr will automatically open the appropriate connection if a filename with one of those suffixes is supplied as an argument.
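
As a minimal sketch of the suffix-based behavior described above, the following writes a small data frame to a path ending in .gz and reads it back. The data frame and the temporary path are hypothetical and exist only for illustration.

```r
library(readr)

# Hypothetical example data and a temporary output path ending in .gz.
df <- data.frame(
  x = c(1, 2, 3),
  y = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv.gz")

# The .gz suffix is what prompts readr to open a gzip connection.
write_csv(df, path)

# Reading the file back decompresses it based on the same suffix.
round_trip <- read_csv(path, col_types = "dc")
```

Swapping the suffix for .bz2 or .xz should select the corresponding compression in the same way.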

parse_factor(levels = NULL) will produce a factor column based on the levels in the data, which mimics parsing of factors in base R.
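
A short sketch of what this looks like in practice, assuming a hypothetical character vector as input:

```r
library(readr)

# Hypothetical input: the levels are not known ahead of time.
x <- c("b", "a", "c", "a")

# With levels = NULL the levels are taken from the values found in x.
f <- parse_factor(x, levels = NULL)

levels(f)
```

Supplying an explicit levels vector instead remains useful when values outside a known set should be flagged as parsing problems.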

New features

Parser improvements

  • parse_factor() gains an include_na argument, to include NA in the factor levels (#541).
  • parse_factor() now accepts levels = NULL, which allows one to generate factor levels based on the data (like stringsAsFactors = TRUE) (#497).
  • parse_numeric() now returns the full string if it contains no numbers (#548).
  • parse_time() now correctly handles 12 AM/PM (#579).
  • problems() now returns the file path in addition to the location of the error in the file (#581).
  • read_csv2() gives a message if it updates the default locale (#443, @krlmlr).
  • read_delim() now signals an error if given an empty delimiter (#557).
  • write_*() functions no longer write whole-number doubles with a trailing .0 (#526).

Whitespace / fixed width improvements

  • fwf_cols() allows for specifying the col_positions argument of
    read_fwf() with named arguments of either column positions or widths
    (#616, @jrnold).
  • fwf_empty() gains an n argument to control how many lines are read for whitespace to determine column structure (#518, @yeedle).
  • read_fwf() gives an error message if specifications have overlapping columns (#534, @gergness).
  • read_table() can now handle pipe() connections (#552).
  • read_table() can now handle files with many lines of leading comments (#563).
  • read_table2() allows any number of whitespace characters as delimiters, making it a more exact replacement for utils::read.table() (#608).
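
The two ways of calling fwf_cols() mentioned in the list above, by width and by start/end position, can be sketched as follows. The file contents and the column names (name, age) are hypothetical.

```r
library(readr)

# Hypothetical fixed-width data: a 5-character name field
# followed by a 3-character age field.
path <- tempfile(fileext = ".txt")
writeLines(c("Alice 30", "Bob   25"), path)

# Named arguments holding a single number are treated as column widths.
by_width <- read_fwf(path, fwf_cols(name = 5, age = 3), col_types = "ci")

# Named arguments holding two numbers are treated as start and end positions.
by_pos <- read_fwf(path, fwf_cols(name = c(1, 5), age = c(6, 8)), col_types = "ci")
```

Both calls are intended to describe the same layout, so the two results should match.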

Writing to connections

  • write_*() functions now support writing to binary connections. In addition, output filenames ending in .gz, .bz2 or .xz will automatically open the appropriate connection and write the compressed file (#348).
  • write_lines() now accepts a list of raw vectors (#542).

Miscellaneous features

  • col_euro_double(), parse_euro_double(), col_numeric(), and parse_numeric() have been removed.
  • guess_encoding() returns a tibble, and works better with lists of raw vectors (as returned by read_lines_raw()).
  • A ListCallback R6 class provides a more flexible return type for callback functions (#568, @mmuurr).
  • tibble::as.tibble() is now used to construct tibbles (#538).
  • read_csv(), read_csv2(), and read_tsv() gain a quote argument (#631, @noamross).

Bugfixes

  • parse_factor() now converts data to UTF-8 based on the supplied locale (#615).
  • read_*() functions with the guess_max argument now throw errors on inappropriate inputs (#588).
  • read_*_chunked() functions now properly end the stream if FALSE is returned from the callback.
  • read_delim() and read_fwf() now report the correct column name when columns are skipped using col_types (#573, @cb4ds).
  • spec() declarations that are long now print properly (#597).
  • read_table() does not print spec when col_types is not NULL (#630, @jrnold).
  • guess_encoding() now returns a tibble for all-ASCII input as well (#641).