Potential pull request for pre-allocated buffer read? #152
Only Python 3 would be fine, but a default compile-time dependency on numpy would be problematic. Do you see a way to disentangle this? Would it be possible to make this optional at compile-time?
I wonder whether a hard dependency on numpy would really be a problem. What actual uses outside the scientific community (which uses numpy throughout) does pyaa have? Conversely, I somewhat doubt that buffer pre-allocation actually buys you a lot in this case, as even the tightest loop is (sample-)rate-limited. Anyway, one could do it without numpy while staying numpy-compatible by using the buffer protocol directly. That would make the usage somewhat uglier, though.
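A minimal sketch of that idea in pure Python, assuming a hypothetical fill_buffer() helper (not pyalsaaudio API): via memoryview the same code accepts a bytearray or a numpy array, which is roughly what a C-level implementation would obtain through PyObject_GetBuffer().

```python
import numpy as np

def fill_buffer(dst, raw):
    """Copy raw PCM bytes into any writable object supporting the buffer
    protocol. A C implementation would use PyObject_GetBuffer() the same
    way, so numpy would stay an optional rather than required dependency."""
    view = memoryview(dst).cast("B")   # flat, writable byte view; no copy
    view[:len(raw)] = raw
    return len(raw)

raw = bytes(1024 * 2 * 2)              # stand-in for one period of S16_LE stereo data

# Works for a plain bytearray ...
plain = bytearray(len(raw))
fill_buffer(plain, raw)

# ... and identically for a numpy array, which also exposes the buffer protocol.
npbuf = np.empty(1024 * 2, dtype=np.int16)
fill_buffer(npbuf, raw)
```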
Thanks for commenting back so quickly! I agree that the memory allocation is probably not a huge drain on resources. I think I can also potentially trigger garbage collection on the input buffer. My naive implementation using read() currently also involves copying the data from the input buffer into a numpy array and reshaping it, along with error handling if it's the wrong size, so I effectively end up doubling the memory allocation/garbage collection load. Any thoughts on that? I looked briefly at the buffer protocol documentation and haven't fully understood it yet. I'm not really a Python expert, so there are things I don't fully understand how to do. It seems like I should be able to effectively cast the raw buffer to a numpy array.
Ah, actually, reading more, it seems that numpy.frombuffer() is exactly what I need here.
Yes, my own code uses frombuffer() and reshape()s the result according to sample size and channel count.
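For reference, a small sketch of that approach; the dtype and channel count are assumptions, and the zero-filled buffer stands in for the bytes returned by read().

```python
import numpy as np

channels = 2
dtype = np.dtype("<i2")                             # assumed: 16-bit little-endian

rawdata = bytes(1024 * channels * dtype.itemsize)   # placeholder for pcm.read() data
samples = np.frombuffer(rawdata, dtype=dtype)       # zero-copy view of the bytes
samples = samples.reshape(-1, channels)             # shape: (frames, channels)
print(samples.shape)                                # (1024, 2)
```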
Yes, it can be done with the struct module:

data = np.array(struct.unpack(conversion_string, rawdata), dtype=self.dtype)

Here data will be the numpy array, rawdata is the bytes string returned by read(), and noofsamples is the number of samples in rawdata. The conversion string is built as

conversion_string = f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"

with some example conversion dicts (see the sketch below). I'm not sure how many copies are produced in the process, but memory usage can be kept reasonably low by keeping the number of samples low.
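A sketch of that approach, for concreteness. The conversion dicts below are illustrative guesses for two common ALSA formats (not the ones from the original comment), rawdata is a placeholder for the bytes returned by read(), and the output dtype is assumed.

```python
import struct

import numpy as np

# Illustrative conversion dicts (assumptions, not the originals).
CONVERSION_DICTS = {
    "S16_LE": {"endianness": "<", "formatchar": "h"},  # 16-bit signed, little-endian
    "S32_LE": {"endianness": "<", "formatchar": "i"},  # 32-bit signed, little-endian
}

conversion_dict = CONVERSION_DICTS["S16_LE"]
rawdata = bytes(480 * 2)                               # placeholder for pcm.read() bytes
noofsamples = len(rawdata) // struct.calcsize(conversion_dict["formatchar"])

conversion_string = (
    f"{conversion_dict['endianness']}{noofsamples}{conversion_dict['formatchar']}"
)
data = np.array(struct.unpack(conversion_string, rawdata), dtype=np.int16)
print(data.shape)                                      # (480,)
```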
I think so, but you can perhaps use frombuffer there. I would have to look into it to be sure. I think all these are for a single microphone channel.
I use read() in a tight loop where I write microphone data to a file using python-soundfile. I've found it helpful to create a version of the read function that takes a preallocated numpy buffer, so I can write code like this:
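A minimal sketch of that kind of loop, assuming the proposed read_into() method (not part of the current pyalsaaudio API), python-soundfile for output, and illustrative capture parameters:

```python
import alsaaudio                      # pyalsaaudio
import numpy as np
import soundfile as sf

# Assumed capture parameters; the PCM keyword arguments follow newer
# pyalsaaudio releases and may differ in older versions.
channels = 2
rate = 44100
periodsize = 1024

pcm = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, channels=channels, rate=rate,
                    format=alsaaudio.PCM_FORMAT_S16_LE, periodsize=periodsize)

buf = np.empty((periodsize, channels), dtype=np.int16)   # preallocated once

with sf.SoundFile("capture.wav", mode="w", samplerate=rate,
                  channels=channels, subtype="PCM_16") as outfile:
    for _ in range(1000):                  # tight capture loop
        nframes = pcm.read_into(buf)       # proposed API: fill the preallocated buffer
        outfile.write(buf[:nframes])       # no per-iteration allocation or copy
```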
The two caveats about the proposed read_into() are (1) the dependency on numpy and (2) that I only have working Python 3 code.