RDDLSimServer data.json file truncated #254

Open
GMMDMDIDEMS opened this issue Mar 11, 2024 · 3 comments

@GMMDMDIDEMS
Contributor

GMMDMDIDEMS commented Mar 11, 2024

With large instances (many objects and states) and correspondingly large data.json files, the data.json files are very often truncated: part of the content is missing, so the file is no longer valid JSON.

I cannot pinpoint the cause, but it does not appear to be the implementation. I can rule out a lack of disk space, and memory should not be the issue either, since a Docker container has no resource constraints by default. No error is raised when a file is written truncated, and there is no clear size threshold above which writing fails: some 13 MB files were written correctly, while others that should have been 8 MB were only written up to 2.4 MB.
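Since no error is raised at write time, one way to find affected files afterwards is to try parsing them. The following is a minimal sketch using only the standard library; the helper name `is_complete_json` is hypothetical and not part of the project.

```python
import json

def is_complete_json(path):
    """Return True if the file at `path` parses as JSON, False otherwise.

    A truncated file ends mid-document, so json.load raises
    json.JSONDecodeError (an empty file is rejected as well).
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True
```

Running this over a directory of data.json files would flag the truncated ones without needing to know their expected size. Note that it loads each file fully into memory.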

However, switching from json to orjson, a faster and more memory-efficient alternative, eliminates all problems.

Fix

Replace the existing

def dump_data(self, fn):

with

import orjson

def dump_data(self, fn):
    """Dumps the data to a JSON file."""
    # orjson.dumps returns bytes, so the file is opened in binary mode.
    json_content = orjson.dumps(self.logs)
    with open(fn, mode="wb") as f:
        f.write(json_content)
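
The cause of the truncation is not established in this thread. One possibility is that the write is interrupted before all data reaches disk (an assumption, not something confirmed here). In that case, a common defensive pattern is to write to a temporary file, flush it, and only then rename it over the target. The sketch below uses the standard `json` module to stay dependency-free; `dump_json_atomically` is a hypothetical standalone helper, not the project's `dump_data`.

```python
import json
import os
import tempfile

def dump_json_atomically(data, fn):
    """Write `data` as JSON to `fn` so that `fn` is never left half-written."""
    directory = os.path.dirname(os.path.abspath(fn))
    # Create the temporary file in the target directory so that the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the bytes to disk
        # Replace the target only after the full content has been written.
        os.replace(tmp_path, fn)
    except BaseException:
        # Clean up the partial temporary file and re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern an interrupted run leaves either the previous data.json or none at all, rather than a truncated one. It does not reduce memory use, and the file created by `mkstemp` has restrictive permissions by default.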
@mike-gimelfarb
Collaborator

Hmm, so it's not us. We noticed this a while ago when running the code on a compute server, and thought it was something on the server's end. Now you've pointed out that it is basically the json package at fault. We'll roll out your solution soon. Alternatively, if you wish to make a PR, you are welcome to do so. The extra analysis and debugging are also greatly appreciated, thanks!

@GMMDMDIDEMS
Contributor Author

I have tested a little further, and there are also a few cases with orjson where the JSON file is truncated. Currently I suspect that it is due to peak memory consumption when the server.logs dict is serialised to a JSON-formatted stream.
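
One way to check the peak-memory suspicion is to measure how much memory the serialisation step allocates. The sketch below uses the standard `tracemalloc` module, which only sees allocations made through Python's allocator (so it may miss memory that a C or Rust extension such as orjson allocates itself). The `logs` dict is a made-up stand-in for `server.logs`, and `peak_bytes_during` is a hypothetical helper.

```python
import json
import tracemalloc

def peak_bytes_during(func, *args):
    """Return the peak number of bytes allocated while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

# Hypothetical stand-in for server.logs; the real structure may differ.
logs = {
    f"episode_{i}": {"state": list(range(50)), "reward": float(i)}
    for i in range(2000)
}

peak = peak_bytes_during(json.dumps, logs)
print(f"peak allocation while serialising: {peak / 1e6:.2f} MB")
```

Comparing this figure against the memory actually available to the process would show whether serialisation alone can plausibly exhaust it.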

When I have some more time I will test further to verify the actual cause of the problem. Most likely I won't get a chance to do so in the next week.

@mike-gimelfarb
Collaborator

mike-gimelfarb commented Mar 22, 2024

Might it have something to do with a limit on the server where the code is running?
Maybe we should test whether writing the dict to the JSON file incrementally, key by key in append mode, causes the same problem. Alternatively, we could switch to CSV or, even better, use the built-in CSV logger we already have in pyrddlgym instead of JSON.
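
The key-by-key idea could look roughly like the sketch below. It keeps a single file handle open instead of reopening the file in append mode for each key, and it serialises only one top-level value at a time, so the full JSON string never exists in memory at once. `dump_dict_incrementally` is a hypothetical helper, and it assumes the top-level keys are strings.

```python
import json

def dump_dict_incrementally(data, fn):
    """Write dict `data` to `fn` as one JSON object, one top-level key at a time."""
    with open(fn, "w", encoding="utf-8") as f:
        f.write("{")
        for i, (key, value) in enumerate(data.items()):
            if i:
                f.write(",")
            # Keys are assumed to be strings; str() is a simplification
            # for other key types.
            f.write(json.dumps(str(key)))
            f.write(":")
            # Only this one value is serialised in memory at a time.
            f.write(json.dumps(value))
        f.write("}")
```

The output is still a single valid JSON object, so existing readers of data.json would not need to change. Whether this avoids the truncation remains to be tested.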
In any case, thanks again for all your help in debugging, it is really appreciated.
