Dr. Robert van Engelen edited this page Aug 22, 2023 · 120 revisions

"Take big bites. Anything worth doing is worth overdoing."      — Robert A. Heinlein, Time Enough for Love

🏆 Google Open Source Peer Bonus Award 🏆

Very honored to receive the Google OSPB 2022 award for my work on ugrep. But let's not forget all the people who offered suggestions, comments and otherwise contributed to the project!

Development roadmap

Ugrep has a clear roadmap. It is already the fastest and most feature-rich grep utility, but it is relatively new, so there is still room for new features and improvements:

  • the highest priority is testing and quality assurance, to keep ugrep reliable and free of bugs
  • make ugrep even faster; see my latest blog article, which demonstrates with a reproducible benchmark that ugrep beats GNU grep and ripgrep in raw performance
  • improve the interactive TUI with a split screen
  • listen to users to improve ugrep with new features
  • add file indexing to speed up cold search performance
  • share reproducible performance data with the community

Why did you build ugrep?

We were looking for an efficient grep tool to quickly dig through hundreds of zip- and tar-archived project repos with thousands of source code files, documentation files, images, and binary files. We wanted to do this without expanding the archives, to save time and storage. With ugrep we can search only source code (with option -t) while ignoring everything else in these huge zip and tar archives. Even better, ugrep can ignore matches in strings and comments in source code using "negative patterns", e.g. with the pre-defined patterns ugrep -f c++/zap_strings -f c++/zap_comments ... To keep ugrep's source code cleanly BSD-3 licensed and unencumbered by GPL or LGPL terms and conditions, I wrote my own tar, zip, pax and cpio unarchivers from scratch in C++. They call external decompression libraries that are linked with ugrep.
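
The idea behind "negative patterns" can be illustrated with a short Python sketch. It is not how ugrep implements them: the regular expressions for C++ strings and comments are simplified assumptions (they are not the contents of c++/zap_strings or c++/zap_comments), and the function name is hypothetical.

```python
import re

# Simplified, assumed approximations of C++ string literals and comments.
NEGATIVE = r'"(?:\\.|[^"\\\n])*"|//[^\n]*|/\*[\s\S]*?\*/'

def search_outside_strings_and_comments(pattern, source):
    """Return matches of `pattern` that are not inside strings or comments."""
    # The negative alternatives are tried first, so a string or comment is
    # consumed as a whole and its contents are never offered to `pattern`.
    combined = re.compile(f"(?:{NEGATIVE})|(?P<hit>{pattern})")
    return [m.group("hit") for m in combined.finditer(source)
            if m.group("hit") is not None]

code = '''int main() {
  // TODO in a comment
  puts("TODO in a string");
  int TODO = 1; /* TODO again */
}'''
print(search_outside_strings_and_comments(r"TODO", code))  # prints ['TODO']
```

Only the identifier TODO is reported. The occurrences inside the comments and the string literal are matched by the negative alternatives and then discarded.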

Later on, we started to make ugrep a lot faster (see the next section). After that, many users suggested more features, such as Boolean search queries, fuzzy search, an improved TUI, binary search with hexdumps, and file indexing.

Why is ugrep fast? Aren't all grep tools just as fast?

Ugrep uses the new method I presented in my talk at the Performance Summit IV. I explain the method and the performance results in more detail in my article. Ugrep is faster than all other grep tools for common search patterns and usage scenarios; see, for example, the performance comparisons. The method comes from our research: ugrep uses a new logic and arithmetic hashing technique to predict matches. When a possible match is predicted, a pattern match is performed with our RE/flex library. This DFA-based regex library matches patterns much faster than other libraries such as PCRE2, even when PCRE2's JIT is enabled. In addition, ugrep's worker threads are optimally load-balanced. We also use AVX/SSE/ARM-NEON/AArch64 instructions and efficient non-blocking asynchronous IO.
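
The general "predict first, then verify" structure can be sketched in Python. This is a generic prefilter written for illustration only, not ugrep's hashing technique: the hash function, the table size, and all names below are assumptions, and the sketch handles only literal alternatives of at least two bytes.

```python
import re

TABLE_BITS = 12  # assumed table size: 4096 one-byte flags

def pair_hash(a, b):
    # Cheap shift/xor hash of two byte values, reduced to TABLE_BITS bits.
    return ((a << 5) ^ b) & ((1 << TABLE_BITS) - 1)

def build_predictor(literals):
    """Flag the hash of the first two bytes of every literal alternative."""
    table = bytearray(1 << TABLE_BITS)
    for lit in literals:
        table[pair_hash(lit[0], lit[1])] = 1
    return table

def find_all(literals, data):
    """Return (hits, attempts); the regex runs only where a match is predicted."""
    table = build_predictor(literals)
    regex = re.compile(b"|".join(re.escape(lit) for lit in literals))
    hits, attempts = [], 0
    for pos in range(len(data) - 1):
        # A cleared flag proves that no alternative can start at `pos`.
        if table[pair_hash(data[pos], data[pos + 1])]:
            attempts += 1
            m = regex.match(data, pos)  # verify the prediction
            if m:
                hits.append((pos, m.group()))
    return hits, attempts

data = b"the quick brown fox jumps over the lazy dog"
hits, attempts = find_all([b"fox", b"dog"], data)
print(hits)
print(f"{attempts} regex attempts for {len(data)} bytes")
```

The predictor may report false positives (hash collisions), which the regex then rejects, but it never misses a position where a match starts. The speedup comes from skipping the expensive matcher at most positions.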

Is ugrep mature and stable?

We at our research lab (and many others) use, test, and evaluate ugrep regularly and we cannot accept errors. Our RE/flex library that is used by ugrep has been around for several years and is stable. Ugrep also meets the highest quality standards (A+) for C++ source code according to lgtm. We continue field-testing ugrep. If there is any problem, let us know by opening an issue, so everyone benefits!

What's new?

Some examples of what's new that other grep tools don't offer:

Option -Q opens a query UI to search files as you type (press F1 or CTRL-Z for help and options):


Option -t searches files by file type; predefined source code search patterns can be specified with option -f:


Option -z searches archives (cpio, pax, tar, zip) and compressed files and tarballs (zip, gz, bz2, xz, lzma, Z, lz4, zstd):
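
To show what searching an archive without expanding it means in practice, here is a minimal Python sketch using the standard zipfile module. It illustrates the concept only and says nothing about ugrep's own unarchivers; the function name grep_zip and the sample archive are made up for this example.

```python
import io
import re
import zipfile

def grep_zip(archive, pattern):
    """Yield (member name, line number, line) for matching lines in a zip archive."""
    regex = re.compile(pattern)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Each member is decompressed as a stream; nothing is written to disk.
            with zf.open(info) as member:
                for lineno, raw in enumerate(member, start=1):
                    line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                    if regex.search(line):
                        yield info.filename, lineno, line

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("src/main.cpp", "int main() {\n  return 0;\n}\n")
    zf.writestr("README.txt", "nothing to see here\n")
buf.seek(0)

results = list(grep_zip(buf, r"\bmain\b"))
print(results)  # prints [('src/main.cpp', 1, 'int main() {')]
```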


Options -U, -W and -X search binary files, displayed as hexdumps:


Option --filter searches pdf, office documents, and more:


Option -Z searches for fuzzy (approximate) matches within an optionally specified max error:
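
As a rough illustration of matching "within a max error", the following Python sketch uses the textbook dynamic-programming approach for approximate substring search. It is not ugrep's algorithm. It assumes that an error is one character insertion, deletion, or substitution, and the function names are hypothetical.

```python
def min_edit_distance_in(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`."""
    # prev[i] = cheapest way to match pattern[:i] against a substring of the
    # text that ends at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any text position at no cost
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(prev[i - 1] + (pc != ch),  # match or substitution
                           prev[i] + 1,               # extra character in text
                           cur[i - 1] + 1))           # pattern character missing
        best = min(best, cur[-1])
        prev = cur
    return best

def fuzzy_match(pattern, text, max_errors):
    """True if some substring of `text` is within `max_errors` edits of `pattern`."""
    return min_edit_distance_in(pattern, text) <= max_errors

print(fuzzy_match("foobar", "xx fobar yy", 1))  # prints True
print(fuzzy_match("foobar", "xx fobar yy", 0))  # prints False
```

The running time is proportional to the pattern length times the text length, which is fine for a demonstration but far from what a tuned search tool would use.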


Option --pretty enhances the output to the terminal. You can specify pretty in a .ugrep configuration file so that ug -l lists directory trees instead of the traditional flat grep list:


Context options -ABC also work with option -o to display the context of the only-matching pattern part on a line, by fitting the match in the specified number of columns. This is particularly useful when searching files with very long lines!

Are there any limitations?

Not really. We carefully designed and gradually implemented ugrep without limits, unlike some other grep tools that warn about potential truncated output under certain conditions. For example, unlike other grep tools, there are no practical limitations on the match size for multiline patterns, even when its context (option -C) is large. There is no limit on the file size, which may exceed 2GB. The maximum regex pattern length is 2GB. If the pattern causes excessive memory requirements due to its size and complexity, then an error message may be generated before ugrep starts searching. This should not happen in any practical use case.

Where can I find the tutorial, documentation, and examples?

It's all in one README on GitHub.

What does the initial U stand for in ugrep?

U name it. The U wasn’t used by any other grep tools I could find, so “ugrep” was a logical choice. But if you really must, take your pick:

  • User friendly grep (yes it is, but that's not the only goal)
  • Universal grep (yes, it supports features of competing greps, but what does Universal mean?)
  • Ultra grep (yes it is ultra fast, but ultra ... what?)
  • Ultimate grep (not there yet, but soon...?)
  • Uberty grep (sounds too über...)
  • Unzymotic grep (too fab...)
  • u grep (you grep? sounds just right!)

Can I help?

Absolutely! There are many ways to contribute. If you have a suggestion or if you're not happy with something then post it as an issue.

A shout out and a big thank you to our heroes, the project contributors: rbnor, ribalda, theUncanny, ucifs, NightMachinary, jonassmedegaard, cdluminate, grylem, ISO8807, 0x7FFFFFFFFFFFFFFF, bolddane, marc-guenther, rrthomas, illiliti, stdedos, bmwiedemann, pete-woods, paoloschi, mmuman, alex-bender, smac89, htgoebel, gaeulbyul, dicktyr, andresroldan, AlexanderS, NapVMk, chy-causer, camuffo, trantor, essays-on-esotericism, hanyfarid, reneeotten, wahjava, idigdoug, ericonr, juhopp, emaste, zoomosis, ChrisMoutsos, wimstefan, navarroaxel, korziner, carlwgeorge and others.

Please ⭐️ the project if you use ugrep (even occasionally) to thank the contributors for their hard work!

-- Robert
