Commit

FIX: Support UTF-8 encoding for JSON files (#1357)
* WIP: add ensure_ascii flag to _write_json

* Revert "WIP: add ensure_ascii flag to _write_json"

This reverts commit 4c47679.

* Don't force ASCII encoding in _write_json

* TST: Add a test

TIL that json.loads will always decode Unicode escapes. So to test that Unicode was properly encoded when writing to disk, I had to read the raw text on disk without going through the json module.
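
A minimal sketch of what that means in practice (the file name here is illustrative, not from the commit): json.loads decodes \u escapes either way, so only reading the raw text reveals whether the file stored escaped ASCII or real UTF-8.

import json

data = {"Authors": ["MNE Ł."]}

# Default json.dumps escapes non-ASCII characters ...
escaped = json.dumps(data)                  # '{"Authors": ["MNE \u0141."]}'
# ... while ensure_ascii=False keeps them as UTF-8 text.
raw = json.dumps(data, ensure_ascii=False)  # '{"Authors": ["MNE Ł."]}'

# json.loads decodes the escape either way, so both round-trip identically ...
assert json.loads(escaped) == json.loads(raw) == data

# ... which is why the test reads the raw text from disk instead of using json.load.
with open("dataset_description.json", "w", encoding="utf-8") as fid:
    fid.write(raw)
with open("dataset_description.json", encoding="utf-8") as fid:
    assert "MNE Ł." in fid.read()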

* DOC: update changelog

* Commit Dan's suggestion

Instead of closing and re-opening the file, rewind the "playhead" to the start of the open file, then use fid.read() as usual
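
A minimal sketch of that pattern (illustrative file name, assuming the file already exists on disk):

import json

with open("dataset_description.json", encoding="utf-8") as fid:
    parsed = json.load(fid)  # first pass: parse the JSON as usual
    fid.seek(0)              # rewind to the start instead of closing and reopening
    raw_text = fid.read()    # second pass: read the raw text exactly as written on disk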

Co-authored-by: Daniel McCloy <dan@mccloy.info>

---------

Co-authored-by: Daniel McCloy <dan@mccloy.info>
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
3 people authored Jan 1, 2025
1 parent 3492fa0 commit 3f59b0e
Showing 3 changed files with 12 additions and 3 deletions.
2 changes: 2 additions & 0 deletions doc/whats_new.rst
@@ -23,6 +23,7 @@ The following authors had contributed before. Thank you for sticking around!

* `Stefan Appelhoff`_
* `Daniel McCloy`_
* `Scott Huberty`_

Detailed list of changes
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -47,6 +48,7 @@ Detailed list of changes
^^^^^^^^^^^^

- :func:`mne_bids.read_raw_bids` can optionally return an ``event_id`` dictionary suitable for use with :func:`mne.events_from_annotations`, and if a ``values`` column is present in ``events.tsv`` it will be used as the source of the integer event ID codes, by `Daniel McCloy`_ (:gh:`1349`)
- :func:`mne_bids.make_dataset_description` now correctly encodes the dataset description as UTF-8 on disk, by `Scott Huberty`_ (:gh:`1357`)

⚕️ Code health
^^^^^^^^^^^^^^
11 changes: 9 additions & 2 deletions mne_bids/tests/test_write.py
@@ -376,7 +376,7 @@ def test_make_dataset_description(tmp_path, monkeypatch):
make_dataset_description(
path=tmp_path,
name="tst2",
authors="MNE B., MNE P.",
authors="MNE B., MNE P., MNE Ł.",
funding="GSOC2019, GSOC2021",
references_and_links="https://doi.org/10.21105/joss.01896",
dataset_type="derivative",
@@ -386,7 +386,14 @@

with open(op.join(tmp_path, "dataset_description.json"), encoding="utf-8") as fid:
dataset_description_json = json.load(fid)
assert dataset_description_json["Authors"] == ["MNE B.", "MNE P."]
assert dataset_description_json["Authors"] == ["MNE B.", "MNE P.", "MNE Ł."]
# If the text on disk is unicode, json.load will convert it. So let's test that
# the text was encoded correctly on disk.
fid.seek(0)
# don't use json.load here, as it would decode any \u escapes and hide how the text was stored
dataset_description_string = fid.read()
# Check that U+0141 was correctly encoded as Ł on disk
assert "MNE Ł." in dataset_description_string

# Check we raise warnings and errors where appropriate
with pytest.raises(
2 changes: 1 addition & 1 deletion mne_bids/utils.py
@@ -233,7 +233,7 @@ def _write_json(fname, dictionary, overwrite=False):
f'"{fname}" already exists. Please set overwrite to True.'
)

json_output = json.dumps(dictionary, indent=4)
json_output = json.dumps(dictionary, indent=4, ensure_ascii=False)
with open(fname, "w", encoding="utf-8") as fid:
fid.write(json_output)
fid.write("\n")
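
End to end, the fix means a call like the following (the dataset path is a hypothetical example) writes "MNE Ł." into dataset_description.json as UTF-8 text rather than as a \u0141 escape:

from pathlib import Path
from mne_bids import make_dataset_description

bids_root = Path("my_bids_root")  # hypothetical dataset root
bids_root.mkdir(exist_ok=True)

make_dataset_description(
    path=bids_root,
    name="example",
    authors="MNE B., MNE P., MNE Ł.",
    overwrite=True,
)

# The raw file now contains the literal character instead of a \u escape.
text = (bids_root / "dataset_description.json").read_text(encoding="utf-8")
assert "MNE Ł." in text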
