FYI- this showed up in a tech forum. #3

Open
sxlijin opened this issue Apr 23, 2020 · 4 comments

Comments

sxlijin commented Apr 23, 2020

https://news.ycombinator.com/item?id=22957884

Comments are somewhat toxic, but may contain interesting notes about implementation decisions.

LowLevelMahn commented Apr 24, 2020

Claiming that C++ is slower than Go on the basis of this kind of code is bound to provoke some "feedback" :) but it's far from "toxic".

LowLevelMahn commented Apr 24, 2020

So the questions from Hacker News are:

Why such constructs?
https://github.com/ExaScience/elprep-bench/blob/master/cpp/filter_pipeline.cpp#L20-L33
auto alns = any_cast<shared_ptr<deque<shared_ptr<sam_alignment>>>>(data);

It is strange that there is no unique_ptr anywhere in the code. Is everything always shared?
Why no std::unique_ptr?
Do the pointers really need to be shareable across multiple threads, with thread-safe ref-counting?
shared_ptr is very costly (ref-counting, atomic operations...), unique_ptr is nearly free.

Is it clear that C++ code needs far fewer calls to new or make_shared/make_unique than Java?

It seems you used shared_ptr to implement some sort of move semantics, which also comes for free with unique_ptr.

The allocation overhead seems to be huge.
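To make the unique_ptr suggestion above concrete, here is a minimal sketch of a pipeline stage that passes ownership by move instead of sharing it. It is not taken from elprep-bench: the `alignment` struct, the `batch` alias, and the `drop_unmapped` and `make_batch` functions are hypothetical names made up for illustration, and the sketch assumes each alignment has exactly one owner at a time.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the real sam_alignment type.
struct alignment {
    std::string qname;
    int flag;
};

// A batch owns its alignments exclusively; there is no reference count.
using batch = std::deque<std::unique_ptr<alignment>>;

// Takes ownership of the incoming batch, keeps only mapped reads
// (SAM flag bit 0x4 means "segment unmapped"), and moves the survivors
// into the result. Dropped alignments are freed when `in` is destroyed.
batch drop_unmapped(batch in) {
    batch out;
    for (auto& aln : in) {
        assert(aln != nullptr);
        if ((aln->flag & 0x4) == 0) {
            out.push_back(std::move(aln));
        }
    }
    return out;
}

// Builds a small hypothetical batch: two mapped reads and one unmapped read.
batch make_batch() {
    batch b;
    b.push_back(std::make_unique<alignment>(alignment{"read1", 0}));
    b.push_back(std::make_unique<alignment>(alignment{"read2", 4}));
    b.push_back(std::make_unique<alignment>(alignment{"read3", 16}));
    return b;
}
```

Calling `drop_unmapped(make_batch())` hands the batch from one stage to the next without touching any atomic counter. This only works as long as no other stage needs to keep a reference to the same alignment.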

@maximegmd

It's hard to believe this kind of publication gets accepted when the variable that is actually measured here is the authors' relative competence in 3 languages.

@pcostanza
Contributor

@sxlijin Thanks a lot for the link, such notifications on discussions around elPrep are very much appreciated. However, we are currently very busy with working on the next release of elPrep, so we are focusing on that rather than participating in such discussions. Maybe we will comment sometime later. (There was a similar discussion on reddit some time ago, with similar criticisms which we already addressed at https://www.reddit.com/r/programming/comments/avsfc6/performance_comparison_of_go_c_and_java_for/ - many of our answers back then probably apply here as well, but we would have to double-check.)

@LowLevelMahn An important aspect of elPrep is that it is an open-ended framework where more filters can be added, including complex ones like the ones for marking duplicates or base-quality score recalibration, and combined in arbitrary ways. (We are currently working on other more complex ones.) This makes it impossible to predict the lifetimes of the objects involved, which is why you need something like shared_ptr in the general case (or garbage collection if available). We already had a version of elPrep with mostly manual memory management before, and this became impractical, which is why we did the study. Doing manual memory management in C++ wouldn't have improved our situation, so it wasn't a real option. This motivation for our work is discussed in the paper at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2903-5 , and it is important to assess our work in this light. You can find more information about the background of our work in our other papers about elPrep, namely https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209523 and https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132868
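The lifetime argument can be illustrated with a small sketch. It is not elPrep code: `alignment`, `duplicate_marker`, `output_queue`, and `lifetime_demo` are hypothetical names, and the two stages only stand in for filters that are combined at run time. The point is that neither stage can know whether it is the last one holding an alignment, so the object is freed by whichever stage lets go last.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real sam_alignment type.
struct alignment {
    std::string qname;
};

// Hypothetical filter that keeps every alignment it has seen until reset.
struct duplicate_marker {
    std::vector<std::shared_ptr<alignment>> seen;
    void observe(const std::shared_ptr<alignment>& aln) { seen.push_back(aln); }
    void reset() { seen.clear(); }
};

// Hypothetical output stage that holds alignments until they are written.
struct output_queue {
    std::deque<std::shared_ptr<alignment>> pending;
    void enqueue(const std::shared_ptr<alignment>& aln) { pending.push_back(aln); }
    void flush() { pending.clear(); }
};

// Returns {alive after the output stage flushed, alive after the marker reset}.
std::pair<bool, bool> lifetime_demo() {
    duplicate_marker marker;
    output_queue out;
    std::weak_ptr<alignment> watcher;  // observes without owning
    {
        auto aln = std::make_shared<alignment>(alignment{"read1"});
        watcher = aln;
        marker.observe(aln);
        out.enqueue(aln);
    }  // the local owner is gone; the two stages still share the object
    assert(watcher.use_count() == 2);

    out.flush();  // the output stage is done, the marker still needs the object
    bool alive_after_flush = !watcher.expired();

    marker.reset();  // the last owner lets go, so the alignment is freed here
    bool alive_after_reset = !watcher.expired();
    return {alive_after_flush, alive_after_reset};
}
```

If the stages were reordered or a third one were added, the point of deallocation would move with them. That is the situation in which a single-owner scheme such as unique_ptr stops being sufficient.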

As far as we can tell, reference counting across multiple threads didn't account for major performance losses. Performance is mostly lost during a long-running deallocation phase which is strictly sequential. For the remaining phases, the C++ version is actually on par with the other implementations. This is actually discussed in some detail in the paper.

@Yamashi Competence in programming languages is difficult to assess, but productivity is an important dimension for real-world projects. It has been known for quite some time that automatic memory management can drastically improve productivity. See https://ieeexplore.ieee.org/document/5387117 for example.

Feel free to assess the proficiency in other programming languages at our lab by looking at our other projects at https://github.com/exascience/
