Fixed encoding for MovieLens data
ankane committed Sep 30, 2023
1 parent f91b3f5 commit af04f3c
Showing 1 changed file with 1 addition and 1 deletion.
lib/disco/data.rb: 2 changes (1 addition, 1 deletion)
```diff
@@ -9,7 +9,7 @@ def load_movielens
 file_hash: "06416e597f82b7342361e41163890c81036900f418ad91315590814211dca490")
 
 # convert u.item to utf-8
-movies_str = File.read(item_path).encode("UTF-8", "binary", invalid: :replace, undef: :replace, replace: "")
+movies_str = File.read(item_path).encode("UTF-8", "ISO-8859-1")
 
 movies = {}
 CSV.parse(movies_str, col_sep: "|") do |row|
```
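The sketch below shows why swapping `"binary"` for `"ISO-8859-1"` matters. It assumes u.item is Latin-1 encoded, which the new line implies. The input row is a hypothetical, simplified u.item line, not real dataset data. When the source is treated as binary, any byte of 0x80 or above has no UTF-8 mapping, so `undef: :replace, replace: ""` silently drops it. Naming the real source encoding maps each byte to its character. The final `File.read(..., encoding: "ISO-8859-1:UTF-8")` variant is an alternative for comparison and is not what the commit does.

```ruby
require "csv"
require "tempfile"

# Hypothetical pipe-delimited row in ISO-8859-1 (Latin-1); byte 0xE9 is "é".
raw = "1|Caf\xE9 (1995)|01-Jan-1995\n".b

# Old conversion: bytes are read as "binary", so 0xE9 has no UTF-8 mapping
# and undef: :replace with replace: "" deletes it without any error.
lossy = raw.encode("UTF-8", "binary", invalid: :replace, undef: :replace, replace: "")

# New conversion: name the real source encoding so every byte maps to a character.
fixed = raw.encode("UTF-8", "ISO-8859-1")

lossy_title = CSV.parse(lossy, col_sep: "|").first[1]
fixed_title = CSV.parse(fixed, col_sep: "|").first[1]

puts lossy_title  # prints: Caf (1995)
puts fixed_title  # prints: Café (1995)

# Alternative (not used in the commit): let IO transcode Latin-1 to UTF-8 while reading.
Tempfile.create("u_item") do |f|
  f.binmode
  f.write(raw)
  f.flush
  via_io = File.read(f.path, encoding: "ISO-8859-1:UTF-8")
  puts via_io == fixed  # prints: true
end
```

Since every byte value is a valid ISO-8859-1 character, this conversion cannot hit invalid or undefined bytes. That is why the new line no longer needs the `invalid:`/`undef:` replacement options.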