Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/newline check #58

Merged
merged 11 commits into from
Jan 30, 2024
Merged

Feature/newline check #58

merged 11 commits into from
Jan 30, 2024

Conversation

AlexAxthelm
Copy link
Collaborator

reflows checking for newline at end of file.

On unix-ish (Linux + MacOS), uses tail -c1 to grab the last byte of the file, and then wc -l to count newlines in that string (of 1 byte).

Maintains previous (pure R) logic for windows systems.

Copy link

github-actions bot commented Dec 20, 2023

Coverage Report
file head main diff
Overall 57% 58% 🔻 -0.89%
R/canonize_path.R 91% 91% 0%
R/char_to_.R 0% 0% 0%
R/determine_headers.R 72% 72% 0%
R/get_csv_specs.R 0% 0% 0%
R/guess_delimiter.R 100% 100% 0%
R/guess_file_encoding.R 89% 89% 0%
R/guess_numerical_mark.R 94% 94% 0%
R/has_binary_null.R 100% 100% 0%
R/has_consistent_fields_per_line.R 100% 100% 0%
R/has_newline_at_end.R 66% 100% 🔻 -34%
R/is_file_accessible.R 100% 100% 0%
R/is_readable_file.R 100% 100% 0%
R/is_text_file.R 100% 100% 0%
R/is_valid_currency_code.R 100% 100% 0%
R/is_valid_cusip.R 100% 100% 0%
R/is_valid_isin.R 100% 100% 0%
R/read_portfolio_csv.R 86% 86% 0%
R/simplify_if_one_col_df.R 100% 100% 0%

@cjyetman
Copy link
Member

cjyetman commented Jan 9, 2024

This fails on the Windows test, which I assume should be resolved before merging.

@AlexAxthelm
Copy link
Collaborator Author

@cjyetman Now that I'm looking at these tests again, I think the file creation is failing on Windows (depends on echo, printf, etc.) Trying some different ideas now.

@AlexAxthelm
Copy link
Collaborator Author

@cjyetman Note that there is a (small) logical change in commit 69d37df to not accept \r as a final character (only \n).

I believe this is acceptable behavior, since both unix-ish and Windows end their line endings in \n: \n for unix, and \r\n for Windows. Wikipedia is suggesting that ending in \r is not used any more (\r for Commodore, Apple II, and classic Mac, and \n\r for similarly old OSs)

R/has_newline_at_end.R Outdated Show resolved Hide resolved
R/has_newline_at_end.R Outdated Show resolved Hide resolved
@cjyetman
Copy link
Member

last test run failed on macOS

Co-authored-by: CJ Yetman <cyetman@rmi.org>
@cjyetman
Copy link
Member

Since you're making big changes to has_newline_at_end(), and I failed to make tests for it before, I opened #60 to add tests for has_newline_at_end() (as it is). I'd like to see that merged first and then this PR updated to main so that it's easy to test that any changes made here do not break existing behavior. Then I'd like to see any new tests you make for file_has_newlines_at_end() live in that same test file and follow the same pattern/strategy, creating temporary files within the test that get auto destroyed using withr::local_tempfile(). Since these test files are easy to create in R, I would rather see them created within the test so that the specification of the test file is right next to the test which uses them, rather than having an additional directory full of static test files. If you need to convince yourself that it works, consider the following...

writeChar("A,B\n1,2", "no_newline.txt", eos = NULL)
writeChar("A,B\n1,2\n", "newline.txt", eos = NULL)
writeChar("A,B\r1,2\r", "return.txt", eos = NULL)
writeChar("A,B\r\n1,2\r\n", "return_newline.txt", eos = NULL)

readChar("no_newline.txt", 10)
#> [1] "A,B\n1,2"
readChar("newline.txt", 10)
#> [1] "A,B\n1,2\n"
readChar("return.txt", 10)
#> [1] "A,B\r1,2\r"
readChar("return_newline.txt", 10)
#> [1] "A,B\r\n1,2\r\n"

readBin("no_newline.txt", what = "raw", n = 10)
#> [1] 41 2c 42 0a 31 2c 32
readBin("newline.txt", what = "raw", n = 10)
#> [1] 41 2c 42 0a 31 2c 32 0a
readBin("return.txt", what = "raw", n = 10)
#> [1] 41 2c 42 0d 31 2c 32 0d
readBin("return_newline.txt", what = "raw", n = 10)
#>  [1] 41 2c 42 0d 0a 31 2c 32 0d 0a

pacta.portfolio.import:::has_newline_at_end("no_newline.txt")
#> [1] FALSE
pacta.portfolio.import:::has_newline_at_end("newline.txt")
#> [1] TRUE
pacta.portfolio.import:::has_newline_at_end("return.txt")
#> [1] TRUE
pacta.portfolio.import:::has_newline_at_end("return_newline.txt")
#> [1] TRUE

using that strategy, you can easily build tests like...

test_that("has_newline_at_end identifies no trailing newline or return", {
  no_newline <- withr::local_tempfile()
  writeChar("A,B\n1,2", no_newline, eos = NULL)
  expect_true(tail(readBin(no_newline, what = "raw", n = 10), n = 1) == charToRaw("2"))
  expect_false(has_newline_at_end(no_newline))
})
#> Test passed 🎉

test_that("has_newline_at_end identifies trailing newline", {
  newline <- withr::local_tempfile()
  writeChar("A,B\n1,2\n", newline, eos = NULL)
  expect_true(tail(readBin(newline, what = "raw", n = 10), n = 1) == charToRaw("\n"))
  expect_true(has_newline_at_end(newline))
})
#> Test passed 🥇

change a conditional that had been pinned to FALSE to check for unix platforms
Copy link
Member

@cjyetman cjyetman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@cjyetman cjyetman added this pull request to the merge queue Jan 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Jan 30, 2024
@AlexAxthelm AlexAxthelm added this pull request to the merge queue Jan 30, 2024
@cjyetman cjyetman removed this pull request from the merge queue due to a manual request Jan 30, 2024
@cjyetman cjyetman merged commit 54a8113 into main Jan 30, 2024
8 checks passed
@cjyetman cjyetman deleted the feature/newline-check branch January 30, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

encoding test crashes R session on macOS
2 participants