Ugrep benchmarks show that it is faster than ripgrep most of the time. Is it true? #2597
The benchmark was created by the author of ugrep: https://github.com/Genivia/ugrep-benchmarks. I see in the README there that the results are completely opposite. Are there any new independent benchmarks that show the real performance of these tools?
I think it's really something you'll have to try for yourself. And ripgrep has more benchmarks than just what is in the README.

Look at most of the benchmarks in the repository you've linked. A lot of the timings are in the low milliseconds. And notice also that the benchmarks don't include match counts. Both of those things are important.

Firstly, very fast timings suggest that most of what you're measuring is process overhead. Usually the way around that is to increase the size of the haystack. In ugrep's single-file benchmarks, it's only searching a single ~100MB file. That's just not big enough. Here, let's take a look at how long it takes ripgrep to find a substring that only occurs on the last line of a small file versus a much bigger one:
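Roughly, the experiment looks something like this. The file names, pattern and sizes here are only illustrative sketches; the timings quoted below came from a run with a far more extreme size ratio.

```
# Tiny haystack: a single line containing the needle.
$ printf 'some rare needle\n' > small.txt

# Much bigger haystack: millions of filler lines, with the needle only on the last line.
$ yes 'nothing to see here, just filler' | head -n 10000000 > big.txt
$ printf 'some rare needle\n' >> big.txt

# Time a search that has to scan essentially the whole file in each case.
$ time rg -c 'some rare needle' small.txt
$ time rg -c 'some rare needle' big.txt
```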
So the search in the bigger file took about 4.6 times as much time as for the smaller file, yet the bigger file is 3,125,000 times bigger than the smaller file! So clearly overhead is likely coming into play when you're measuring 30 milliseconds for a single execution. It's not that being faster and having less overhead doesn't matter, but it's only one part of the picture.

Secondly, the benchmark results don't show match counts. This makes it quite difficult to match them up with your own use cases. Some use cases demand high match counts. For example, you might use ripgrep to highlight some part of every line in a file. But many use cases also have low match counts. For example, you might be looking for where a function is called 10 times in a codebase with 1GB of source code. Those have very very very different performance profiles. For example, ripgrep absolutely destroys ugrep in at least some of those cases:
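A low-match-count comparison looks roughly like the sketch below. The checkout path and pattern are stand-ins, and you should verify both tools report the same matches before comparing timings, since their directory filtering defaults differ (more on that next).

```
# Sanity-check that both tools find the same matches.
$ rg -c 'SomeRarelyCalledFunction' ./some-big-checkout
$ ugrep -r -c 'SomeRarelyCalledFunction' ./some-big-checkout

# Then time the low-match-count search.
$ hyperfine --warmup 3 \
    "rg 'SomeRarelyCalledFunction' ./some-big-checkout" \
    "ugrep -r 'SomeRarelyCalledFunction' ./some-big-checkout"
```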
The final thing I'd point out (I'm trying to cover very broad strokes here) is that ripgrep is quite a bit faster at respecting your gitignore files. ugrep doesn't do it by default, so you actually have to enable it. First, we'll compare how they do on a checkout of Chromium when searching literally everything:
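This is only a sketch of that comparison; the exact flags may vary with your ugrep version.

```
$ cd chromium/src
# rg -uuu is shorthand for --no-ignore --hidden --binary, i.e. search everything.
# ugrep searches binary files by default; --hidden asks it to include hidden files/dirs too.
$ hyperfine --warmup 3 \
    "rg -uuu 'PatternHere'" \
    "ugrep -r --hidden 'PatternHere' ."
```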
Now let's see how they do while only filtering out hidden files/directories and binary files:
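Again roughly, with the same caveats about flags and defaults:

```
# rg --no-ignore turns off gitignore handling, but hidden and binary files
# are still filtered out by default.
# ugrep -I skips binary files; hidden files/dirs are excluded by default in
# recent versions (check yours).
$ hyperfine --warmup 3 \
    "rg --no-ignore 'PatternHere'" \
    "ugrep -r -I 'PatternHere' ."
```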
And now with gitignore filtering (which is ripgrep's default):
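Same sketch again; --ignore-files is the ugrep option that turns on .gitignore handling.

```
# ripgrep respects .gitignore out of the box; ugrep needs --ignore-files.
$ hyperfine --warmup 3 \
    "rg 'PatternHere'" \
    "ugrep -r -I --ignore-files 'PatternHere' ."
```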
I'm not aware of any high quality independent benchmarks. Note that I also maintain benchmarks at the level of the regex engine: https://github.com/BurntSushi/rebar. I also published an in-depth benchmark analysis when I published ripgrep (which predates ugrep): https://blog.burntsushi.net/ripgrep/. The benchmarks in that blog post do support ugrep, and my most recent run of them can be found here: https://github.com/BurntSushi/ripgrep/blob/962d47e6a1208cf2187cd34c2a7f6cf32e2a4903/benchsuite/runs/2022-12-16-archlinux-duff/summary. My personal and biased assessment is that ugrep is competitive with ripgrep and is sometimes faster, but it still has a number of performance cliffs in common use cases that ripgrep doesn't have.