Use numpy.concatenate instead of hstack, vstack #1009

Open
chillenzer opened this issue Sep 15, 2022 · 5 comments

Comments

@chillenzer

chillenzer commented Sep 15, 2022

Hi everybody,
I was just reading through Episode 2 and was surprised to see numpy.hstack and numpy.vstack. Wouldn't it be more useful to introduce just numpy.concatenate with an appropriate axis kwarg? I personally never use the {h,v}stack functions because they lack the generality to handle some higher-dimensional cases (and whenever I did use them, it took me a while to sort out which of the 3, 4 or 5 axes of my array is considered "horizontal"). Even if the tutorial (at least at that point) is only concerned with 2D data, would it hurt to give learners the exact same functionality while sneaking in the generality they might need for their own use case? One could even argue that it is simpler:

  • to have only one function name to remember.
  • not to rely on them having the correct geometrical picture in mind when they could have an unambiguously enumerated axis instead.

Admittedly, this might be just personal preference (my own as well as that of the people I work with), so I would be interested to hear whether there are rational arguments for the current way of doing it. If not, I'm happy to provide this small patch myself.
Best,
Julian
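
To make the proposal above concrete, here is a minimal sketch of the equivalence for 2D arrays. The arrays a, b and c are made up for illustration and are not taken from the episode.

```python
import numpy as np

# Hypothetical example arrays, both of shape (2, 3).
a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

# For 2D inputs, vstack gives the same result as concatenate along axis 0 ...
stacked_rows = np.concatenate([a, b], axis=0)
same_as_vstack = np.array_equal(stacked_rows, np.vstack([a, b]))

# ... and hstack gives the same result as concatenate along axis 1.
stacked_cols = np.concatenate([a, b], axis=1)
same_as_hstack = np.array_equal(stacked_cols, np.hstack([a, b]))

# hstack and vstack take no axis argument; concatenate can name any axis,
# for example the last axis of a 3D array.
c = np.zeros((2, 3, 4))
joined_last = np.concatenate([c, c], axis=2)

print(same_as_vstack, same_as_hstack)
print(stacked_rows.shape, stacked_cols.shape, joined_last.shape)
```

The printed shapes show which axis grew in each case, which is the "unambiguously enumerated axis" argument in practice.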

@shermanlo77

It's an interesting point! I think numpy.hstack() and numpy.vstack() would help those who haven't grasped the idea of dimensions and axes yet. For example, putting one Lego block on top of another is easier to think about than which dimension is which.

But a note on numpy.concatenate() would be useful for the stronger students.

@chillenzer
Author

Okay, I see your point. When teaching this material, we had some discussions with the learners about how the data is laid out and what the provided axis actually means. This is particularly tricky for 2D arrays, where there is a coincidental symmetry between an axis argument and its complement. For example, when np.sum(..., axis=0) reduces a 5D array to a 4D array, it is pretty clear that the sum was taken along axis 0. For a 2D array reduced to 1D, the same call could be read either as "summed along axis 0" or as "only axis 0 is kept", and the resulting shape often cannot tell the two readings apart because such arrays are frequently symmetrical in shape.
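
A small sketch of that ambiguity, using made-up arrays of ones (the shapes are arbitrary assumptions chosen for illustration):

```python
import numpy as np

big = np.ones((2, 3, 4, 5, 6))  # hypothetical 5D array
small = np.ones((3, 3))         # hypothetical square 2D array

# Summing along axis 0 removes that axis; in 5D the result shape makes
# it obvious which axis disappeared.
big_sum = np.sum(big, axis=0)
print(big.shape, "->", big_sum.shape)

# For a square 2D array, both axis choices yield the same shape, so the
# shape alone does not reveal which axis was summed over.
sum0 = np.sum(small, axis=0)
sum1 = np.sum(small, axis=1)
print(sum0.shape, sum1.shape)
```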

I guess one could argue that the printed representation gives a reasonable intuition for 2D arrays, but this does not straightforwardly generalize to higher dimensions (at least in my head).

But I think the compromise you suggest could be okay. Shall I write up a short info box on this?

@shermanlo77

That's a good point. The episode goes on to explain the difference between numpy.mean(data, axis=0) and numpy.mean(data, axis=1), so the students should know about dimensions and axes.
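
For reference, a minimal sketch of that difference. The small array below is a made-up stand-in for the episode's dataset, not the actual data.

```python
import numpy as np

# Hypothetical 2D data: 2 rows, 3 columns.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# axis=0 averages across the rows, leaving one value per column.
col_means = np.mean(data, axis=0)

# axis=1 averages across the columns, leaving one value per row.
row_means = np.mean(data, axis=1)

print(col_means.shape, col_means)
print(row_means.shape, row_means)
```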

I think your/our suggestion on using concatenate() could be a good candidate for a pull request.

@chillenzer
Author

Great! I will write something up and create a PR. Doesn't have high priority though, so might take a while to arrive. =)

@chillenzer
Author

In fact, np.concatenate could make the whole axis thing even clearer than np.mean, because one can immediately follow the change in shape instead of manually inspecting the data, which at that point is definitely not 100% obvious to compare.
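
A rough sketch of what "following the change in shape" could look like, with two made-up arrays of different lengths (names and shapes are assumptions for illustration):

```python
import numpy as np

a = np.zeros((2, 3))  # hypothetical array with 2 rows
b = np.ones((4, 3))   # hypothetical array with 4 rows

# Along axis 0 the row counts add up while the other axis is unchanged.
rows_joined = np.concatenate([a, b], axis=0)
print(a.shape, "+", b.shape, "->", rows_joined.shape)

# Along axis 1 the arrays do not fit, because their sizes along axis 0
# differ; NumPy reports this as a ValueError.
try:
    np.concatenate([a, b], axis=1)
    axis1_error = None
except ValueError as err:
    axis1_error = err

print(type(axis1_error).__name__)
```

Only the size along the chosen axis changes, so learners can check their understanding of the axis numbering from the shapes alone, without comparing values.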


2 participants