reading sysfs files - reopen each time vs caching open filehandles #12

Open

fuzzycow opened this issue Oct 20, 2014 · 7 comments

@fuzzycow

This is not a bug, but a general observation:

It looks like most implementations of ev3dev language bindings choose to reopen sysfs files each time anew, rather than caching an open filehandle (open once, seek to the beginning before each read).

While the reopen approach is much "safer", it carries a serious performance penalty.

Unfortunately my primary experience is with language bindings that are not in this repo, but I believe the above applies to a lesser or greater degree to all implementations.

In a small Python test, using a cached filehandle is over 3.5x faster, and a Golang test with 3 concurrent readers showed a 10x performance improvement.
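
For reference, a minimal sketch of the two access patterns in Python; the sysfs path is a hypothetical example, not something prescribed by the spec:

```python
SYSFS_PATH = "/sys/class/tacho-motor/motor0/position"  # hypothetical attribute path

# Pattern 1: reopen on every read (safe, slower)
def read_reopen(path=SYSFS_PATH):
    with open(path, "rb") as f:
        return f.read().strip()

# Pattern 2: open once, then seek to the beginning before each read (faster)
class CachedReader:
    def __init__(self, path=SYSFS_PATH):
        self._f = open(path, "rb")

    def read(self):
        self._f.seek(0)  # sysfs attributes must be re-read from offset 0
        return self._f.read().strip()

    def close(self):
        self._f.close()
```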

@WasabiFan
Member

@fuzzycow Sorry for the late response; I have been trying to juggle a lot of different tasks over the last few weeks, so I am not as fast to respond as I would like to be.

At this time, the internal handling of file handles is left up to the implementation, and not specified in the spec, as it can be language-dependent.

As you said, the tradeoff is generally being safe and slow(er) vs possibly unsafe and fast(er). But the way I see it, there is not always a clear benefit to doing it either way:

  • In some languages (such as JavaScript and Node.js), the user gives up execution speed in exchange for a less work-intensive development cycle. Just starting the Node interpreter can take a second on the EV3, and I'm not even sure that you can maintain a file handle in the same way that one would in a language like C++.
  • Keeping file handles means that you use slightly more memory, and you have to check whether you already have a cached copy of the handle before reading from it. You also have to seek to the beginning of the file each time. These effects may be negligible, but it really depends on the language and implementation. I have found that even small things like this can make a huge difference on the EV3's processor.
  • If we have a file handle open to a device's property file and that device is unplugged, how would we handle that? We could check every time we read from a file to make sure that it's still valid, or catch the errors when they are thrown (see the sketch below this list). But that leads to more checks that we have to do.
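
A sketch of the catch-and-reopen option in Python, assuming a hypothetical attribute path; this is one possible approach, not a prescription for the spec:

```python
class ResilientReader:
    """Cached-handle reader that falls back to reopening on failure."""

    def __init__(self, path):
        self._path = path
        self._f = open(path, "rb")

    def read(self):
        try:
            self._f.seek(0)
            return self._f.read().strip()
        except (OSError, IOError):
            # A read can fail (e.g. ENODEV) if the device was unplugged;
            # try to reopen once. This raises if the device is really gone.
            self._f.close()
            self._f = open(self._path, "rb")
            return self._f.read().strip()
```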

So in the end, I see your point, but I don't think we can draw the clear-cut conclusion that every binding must maintain open file handles. Here's my proposal: we put a note in our spec that asks maintainers to be cognizant of the tradeoffs between the two options, and advises that they experiment with both before deciding how to implement it. Does that sound good?

@fdetro You are more knowledgeable on the subject of performance maximization than I am; jump in if you have anything to add.

@fdetro
Contributor

fdetro commented Oct 27, 2014

@WasabiFan I agree with your comments regarding CPU vs. resource/memory usage optimization. As one normally does not read sensor/motor values or attributes continuously from different threads, and typically does so at a fairly low frequency (some 10 Hz?), the small overhead of re-opening the file every time shouldn't really hurt.

On the other hand, with a larger number of sensors and motors connected, caching can leave you with quite a lot of open file handles, which consume memory and OS resources even if each is used only once.

So I would vote for not caching the handles in the standard case. If someone has a use case where read performance really matters, they can easily optimize that case with a custom implementation.

@fuzzycow
Author

Thanks for responding!

I agree with your comments; it's often best to default to the pragmatic approach, and to advise developers on the potential impact and on ways to optimize further.

I would still like to share some thoughts on the issue, for those who may want to do a lot of sysfs file I/O on the EV3:

(EDIT: corrected performance numbers for Python)
In my simple tests, a single-process/single-thread Python test managed to process on the order of ~600 open/read/close cycles per second.
If we consider a project reading ~5 sensor values and ~10 motor values (e.g. datalogging/charting), you end up with ~15 reads per iteration.
At 10 Hz that is ~150 reads per second, which translates into ~1/4 of the maximum available CPU time being spent on open/read/close.
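
A rough sketch of the kind of micro-benchmark behind these numbers; the path and iteration count are illustrative placeholders:

```python
import time

PATH = "/sys/class/msensor/sensor0/value0"  # hypothetical attribute path

def read_reopen():
    with open(PATH, "rb") as f:
        return f.read()

def bench(read_fn, n=2000):
    """Return the approximate number of reads per second for read_fn."""
    start = time.time()
    for _ in range(n):
        read_fn()
    return n / (time.time() - start)

print("reopen: ~%.0f reads/s" % bench(read_reopen))
```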

Golang-specific notes:
Single-coroutine (green/light thread) performance in Golang was actually lower than in Python.
I did some basic profiling on the test application, and it seems that a single system call in Golang's ARM file-open code (this bit of Golang is written in assembly) is "responsible".
The Golang concurrent-coroutine test without cached filehandles suggests that while one coroutine is performing an open, other coroutines cannot progress. I'm not sure whether only I/O operations are blocked process-wide or the whole process is paused.
The 10x performance improvement when using concurrent coroutines and cached filehandles seems to indicate that Golang's ARM concurrency plays well with read/write I/O, but not with open/close.

Regarding performance impact:
CPU impact: as the benchmarks show, CPU load is lower when using cached filehandles. Furthermore, cached filehandles reduce the amount of garbage collection in languages like Python and Golang.
Memory impact: I couldn't find exact numbers, but I think filehandle objects are small enough that, given the choice between keeping objects around and repeatedly creating garbage to collect, the first approach is often better for performance.

I will post my test code snippets and benchmark results as soon as I get a bit of time.

Lastly, I would like to re-emphasize that I'm not suggesting that existing code be rewritten to suit my particular needs or opinions. This information is provided as food for thought, in the hope that it will assist others implementing heavy sysfs file I/O.

@fuzzycow
Author

Benchmarks:
Python, reopen before each read (with open(filename, "rb") as f): ~600 reads per second
Python, cached filehandle: 1860 to 2300 reads per second

Golang, reopen each time (5 reader coroutines, 1 consumer coroutine): ~520 reads per second
Golang, cached filehandles (5 reader coroutines, 1 consumer coroutine): ~7000 reads per second

@ddemidov
Member

Looks like this was addressed in #25.

@dsharlet
Contributor

That PR only helps with this for C++; the other languages will need their own solutions.

@ddemidov
Member

The Python binding is based on the C++ one, so that leaves JS, Lua, and R.
