Ugrep benchmarks show that it is faster than ripgrep most of the time. Is it true? #2597
The benchmark was created by the author of ugrep: https://github.com/Genivia/ugrep-benchmarks. I see in the README there that the results are completely opposite. Are there any new independent benchmarks that show the real performance of these tools?
I think it's really something you'll have to try for yourself. And ripgrep has more benchmarks than just what is in the README.

Look at most of the benchmarks in the repository you've linked. A lot of the timings are in the low milliseconds. And notice also that the benchmarks don't include match counts. Both of those things are important.

Firstly, very fast timings suggest that most of what you're measuring is process overhead. Usually the way around that is to increase the size of the haystack. In ugrep's single-file benchmarks, it's only searching a single ~100MB file. That's just not big enough. Here, let's take a look at how long it takes ripgrep to find a substring that only occurs on the last line of a small file versus a much bigger one:
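Roughly, the experiment looks something like this. The file names, pattern and sizes here are only illustrative sketches; the timings quoted below came from a run with a far more extreme size ratio.

```
# Tiny haystack: a single line containing the needle.
$ printf 'some rare needle\n' > small.txt

# Much bigger haystack: millions of filler lines, with the needle only on the last line.
$ yes 'nothing to see here, just filler' | head -n 10000000 > big.txt
$ printf 'some rare needle\n' >> big.txt

# Time a search that has to scan essentially the whole file in each case.
$ time rg -c 'some rare needle' small.txt
$ time rg -c 'some rare needle' big.txt
```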
So the search in the bigger file took about 4.6 times as much time as for the smaller file, yet the bigger file is 3,125,000 times bigger than the smaller file! So clearly overhead is likely coming into play when you're measuring 30 milliseconds for a single execution. It's not that being faster and having less overhead doesn't matter, but it's only one part of the picture.

Secondly, the benchmark results don't show match counts. This makes it quite difficult to match them up with your own use cases. Some use cases demand high match counts. For example, you might use ripgrep to highlight some part of every line in a file. But many use cases also have low match counts. For example, you might be looking for where a function is called 10 times in a codebase with 1GB of source code. Those have very very very different performance profiles. For example, ripgrep absolutely destroys ugrep in at least some of those cases:
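A low-match-count comparison looks roughly like the sketch below. The checkout path and pattern are stand-ins, and you should verify both tools report the same matches before comparing timings, since their directory filtering defaults differ (more on that next).

```
# Sanity-check that both tools find the same matches.
$ rg -c 'SomeRarelyCalledFunction' ./some-big-checkout
$ ugrep -r -c 'SomeRarelyCalledFunction' ./some-big-checkout

# Then time the low-match-count search.
$ hyperfine --warmup 3 \
    "rg 'SomeRarelyCalledFunction' ./some-big-checkout" \
    "ugrep -r 'SomeRarelyCalledFunction' ./some-big-checkout"
```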
The final thing I'd point out (I'm trying to cover very broad strokes here) is that ripgrep is quite a bit faster at respecting your gitignore files. ugrep doesn't do it by default, so you actually have to enable it. First, we'll compare how they do on a checkout of Chromium when searching literally everything:
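This is only a sketch of that comparison; the exact flags may vary with your ugrep version.

```
$ cd chromium/src
# rg -uuu is shorthand for --no-ignore --hidden --binary, i.e. search everything.
# ugrep searches binary files by default; --hidden asks it to include hidden files/dirs too.
$ hyperfine --warmup 3 \
    "rg -uuu 'PatternHere'" \
    "ugrep -r --hidden 'PatternHere' ."
```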
Now let's see how they do while only filtering out hidden files/directories and binary files:
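Again roughly, with the same caveats about flags and defaults:

```
# rg --no-ignore turns off gitignore handling, but hidden and binary files
# are still filtered out by default.
# ugrep -I skips binary files; hidden files/dirs are excluded by default in
# recent versions (check yours).
$ hyperfine --warmup 3 \
    "rg --no-ignore 'PatternHere'" \
    "ugrep -r -I 'PatternHere' ."
```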
And now with gitignore filtering (which is ripgrep's default):
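Same sketch again; --ignore-files is the ugrep option that turns on .gitignore handling.

```
# ripgrep respects .gitignore out of the box; ugrep needs --ignore-files.
$ hyperfine --warmup 3 \
    "rg 'PatternHere'" \
    "ugrep -r -I --ignore-files 'PatternHere' ."
```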
I'm not aware of any high quality independent benchmarks. Note that I also maintain benchmarks at the level of the regex engine: https://github.com/BurntSushi/rebar. I also published an in-depth benchmark analysis when I published ripgrep (which predates ugrep): https://blog.burntsushi.net/ripgrep/. The benchmarks in that blog post do support ugrep, and my most recent run of them can be found here: https://github.com/BurntSushi/ripgrep/blob/962d47e6a1208cf2187cd34c2a7f6cf32e2a4903/benchsuite/runs/2022-12-16-archlinux-duff/summary. My personal and biased assessment is that ugrep is competitive with ripgrep and is sometimes faster, but it still has a number of performance cliffs in common use cases that ripgrep doesn't have.