Added encoding argument to TextReuseCorpus and TextReuseTextDocument #89

Open: wants to merge 2 commits into base: master


@davidfuhry davidfuhry commented Jan 15, 2020

On Windows, readLines assumes text files are in the native encoding, which is typically Windows-1252, so non-ASCII characters in UTF-8 files can be misread.

This PR adds an optional encoding argument to TextReuseCorpus and TextReuseTextDocument, which can be used to explicitly specify the encoding of the input files (in most cases UTF-8).

Because the argument defaults to "unknown", which is also the default for readLines, this should maintain backward compatibility.
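For reference, a minimal base R sketch of what the encoding argument of readLines does. It does not use textreuse at all; the temporary file and variable names are made up for illustration. Note that readLines only marks the strings as UTF-8 and does not re-encode them.

```r
# "café\n" as raw UTF-8 bytes, so the example does not depend on
# the encoding of this script or of the current locale.
utf8_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9, 0x0a))
path <- tempfile(fileext = ".txt")
writeBin(utf8_bytes, path)

# Default: encoding = "unknown", bytes are assumed to be in the native encoding.
default_read <- readLines(path)

# Explicit: the bytes are kept as-is and the string is marked as UTF-8.
marked_read <- readLines(path, encoding = "UTF-8")

Encoding(default_read)
Encoding(marked_read)
```

On a Windows-1252 system, the first string would typically print as mojibake, while the second is treated as UTF-8.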

Edit: I forgot to mention that this specific issue can be worked around by setting options(encoding = "UTF-8") before creating the corpus. However, this has some side effects, so I still think an encoding argument is the better way to deal with this.
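A rough sketch of that workaround with the global option restored afterwards, to limit the side effects. The helper name read_utf8_lines is hypothetical and not part of textreuse; it only illustrates the pattern with base R.

```r
# Hypothetical helper: temporarily set the global encoding option,
# read the file, and restore the previous value on exit.
read_utf8_lines <- function(path) {
  old <- options(encoding = "UTF-8")
  on.exit(options(old), add = TRUE)
  # file connections opened inside readLines pick up getOption("encoding")
  readLines(path)
}

# Plain ASCII sample file, just to exercise the helper.
sample_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), sample_path)

encoding_before <- getOption("encoding")
lines_read <- read_utf8_lines(sample_path)
encoding_after <- getOption("encoding")
```

While the option is set it applies to every connection opened in the session, not just the corpus files, which is why a dedicated argument seems cleaner to me.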


codecov-io commented Jan 15, 2020

Codecov Report

Merging #89 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
+ Coverage   86.77%   86.81%   +0.04%     
==========================================
  Files          25       25              
  Lines         658      660       +2     
==========================================
+ Hits          571      573       +2     
  Misses         87       87

| Impacted Files | Coverage Δ |
|---|---|
| R/TextReuseTextDocument.R | 91.52% <100%> (ø) ⬆️ |
| R/TextReuseCorpus.R | 82.69% <100%> (+0.33%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a96fec3...5d68a16.
