Adds --TYPE support (resolves #40), user-defined type support (resolves #38, resolves #39).

* feature/ISSUE40-AddTYPESupport:
  Almost.
  Final README.md updates?
  More README.md updates.
  Added help on --[no]TYPE.
  More README.md updates.
  More README.md updates.
  More README.md updates.
  More README.md updates.
  Added TOC to README.md.
  More README.md updates.
  More README.md updates.
  More README.md changes.
  Some updates to README.md.
  Added support for --help-types/--list-file-types.  Resolves #39.
  Back to returning 1 (no match) for "no such file".
  Removed degenerate regex perf test: all cases performed similarly to other regexes and just made the test longer.
  Change return code of "no such file or dir" to a grep-like 2; changed sanity.at to match.  Added a perf test for an exponential-in-text-to-match regex.
  Removed debug print.  Improved "User-defined file type specs" test.
  Changed m_builtin_type_map to m_builtin_and_user_type_map.  Changed IsType() to use that instead of m_active_type_map for determining valid type names.  Changed TypeAdd*()/TypeDel() to add/del types to m_builtin_and_user_type_map in addition to m_active_type_map.  Passes test.
  Updated tests/type_inclusion.at to accommodate "last --TYPE wins" filtering.
  Fixed situation where "ucg --noenv --type=nocpp --type=nocc --type=hh '#endif' ~/src/boost_1_58_0" would give different results than "ucg --noenv --type=hh '#endif' ~/src/boost_1_58_0".
  Added some tests.  --TYPE/--noTYPE not working yet.
  "is" and "ext" filters implemented.  Added exception handler to main() to catch ArgParseExceptions.
  COPYING and AUTHORS are now docs and will be installed.  Fixed DOS line endings on COPYING.
  --TYPE/--noTYPE working.
gvansickle committed Dec 29, 2015
2 parents 4a68883 + 2d614ed commit 4a5a042
Showing 10 changed files with 1,161 additions and 709 deletions.
1,348 changes: 674 additions & 674 deletions COPYING


4 changes: 2 additions & 2 deletions Makefile.am
@@ -26,8 +26,8 @@ ACLOCAL_AMFLAGS = -I m4 --install
# Note that we can't list these other libraries in e.g. *_DEPENDENCIES because that replaces all Automake-generated dependencies.
SUBDIRS = third_party src tests

# Make sure README.md gets distributed and installed correctly.
dist_doc_DATA=README.md
# Make sure README.md and other docs get distributed and installed correctly.
dist_doc_DATA = README.md COPYING AUTHORS

# The Automake rules for the ucg executable.
bin_PROGRAMS=ucg
158 changes: 135 additions & 23 deletions README.md
@@ -1,28 +1,61 @@
# UniversalCodeGrep

UniversalCodeGrep (ucg) is another [Ack](http://beyondgrep.com/) clone. It is a grep-like tool specialized for searching large bodies of source code.

`ucg` is written in C++ and takes advantage of the C++11 and newer facilities of the language to reduce reliance on non-standard libraries, increase portability, and increase scanning speed.

As a consequence of `ucg`'s use of these facilities and its overall design for maximum concurrency and speed, `ucg` is extremely fast. Under Ubuntu 15.04, scanning the Boost 1.58.0 source tree with `ucg` 0.1.0, [`ag`](http://geoff.greer.fm/ag/) 0.28.0, and `ack` 2.14 produces the following results:
UniversalCodeGrep (ucg) is another [Ack](http://beyondgrep.com/) clone. It is an extremely fast grep-like tool specialized for searching large bodies of source code.

## Table of Contents

* [UniversalCodeGrep](#universalcodegrep)
* [Table of Contents](#table-of-contents)
* [Introduction](#introduction)
* [Speed](#speed)
* [License](#license)
* [Installation](#installation)
* [Prerequisites](#prerequisites)
* [gcc version 4.8 or greater.](#gcc-version-48-or-greater)
* [pcre version 8.2 or greater.](#pcre-version-82-or-greater)
* [Supported OSes and Distributions](#supported-oses-and-distributions)
* [Usage](#usage)
* [Command Line Options](#command-line-options)
* [Searching](#searching)
* [File presentation](#file-presentation)
* [File inclusion/exclusion:](#file-inclusionexclusion)
* [File type specification:](#file-type-specification)
* [Miscellaneous:](#miscellaneous)
* [Informational options:](#informational-options)
* [Configuration (.ucgrc) Files](#configuration-ucgrc-files)
* [Format](#format)
* [Location and Read Order](#location-and-read-order)
* [User-Defined File Types](#user-defined-file-types)
* [Extension List Filter](#extension-list-filter)
* [Literal Filename Filter](#literal-filename-filter)
* [Author](#author)

## Introduction

UniversalCodeGrep (ucg) is an extremely fast grep-like tool specialized for searching large bodies of source code. It is intended to be largely command-line compatible with [Ack](http://beyondgrep.com/), to some extent with [`ag`](http://geoff.greer.fm/ag/), and where appropriate with `grep`. Search patterns are specified as PCRE regexes.

### Speed
`ucg` is intended to address the impatient programmer's code searching needs. `ucg` is written in C++11 and takes advantage of the concurrency (and other) support of the language to increase scanning speed while reducing reliance on third-party libraries and increasing portability. Regex scanning is provided by the [PCRE library](http://www.pcre.org/), with its [JIT compilation feature](http://www.pcre.org/original/doc/html/pcrejit.html) providing a huge performance gain on most platforms.

As a consequence of its use of these facilities and its overall design for maximum concurrency and speed, `ucg` is extremely fast. Under Fedora 23, scanning the Boost 1.58.0 source tree with `ucg` 0.2.0, [`ag`](http://geoff.greer.fm/ag/) 0.30.0, and `ack` 2.14 produces the following results:

| Command | Approximate Real Time |
|---------|-----------------------|
| `time ucg '#endif' ~/src/boost_1_58_0` | ~ 3 seconds |
| `time ag '#endif' ~/src/boost_1_58_0` | ~ 10 seconds |
| `time ack '#endif' ~/src/boost_1_58_0` | ~ 19 seconds |
| `time ucg 'BOOST.*HPP' ~/src/boost_1_58_0` | ~ 0.53 seconds |
| `time ag 'BOOST.*HPP' ~/src/boost_1_58_0` | ~ 11.1 seconds |
| `time ack 'BOOST.*HPP' ~/src/boost_1_58_0` | ~ 18.3 seconds |

## License

[GPL (Version 3 only)](https://github.com/gvansickle/ucg/blob/master/COPYING)

## Installation

UniversalCodeGrep installs from the distribution tarball (available [here](https://github.com/gvansickle/ucg/releases/download/0.1.0/universalcodegrep-0.1.0.tar.gz)) in the standard autotools manner:
UniversalCodeGrep installs from the distribution tarball (available [here](https://github.com/gvansickle/ucg/releases/download/0.2.0/universalcodegrep-0.2.0.tar.gz)) in the standard autotools manner:

```sh
tar -xaf universalcodegrep-0.1.0.tar.gz
cd universalcodegrep-0.1.0
tar -xaf universalcodegrep-0.2.0.tar.gz
cd universalcodegrep-0.2.0
./configure
make
make install
@@ -36,48 +69,127 @@ This will install the `ucg` executable in `/usr/local/bin`. If you wish to inst

### Prerequisites

- `gcc` version 4.9 or greater.
#### `gcc` version 4.8 or greater.

Versions of `gcc` prior to 4.8 do not have sufficiently complete C++11 support to build `ucg`.

#### `pcre` version 8.2 or greater.

Versions of `gcc` prior to 4.9 are known to ship with an incomplete implementation of the standard `<regex>` library. Since `ucg` depends on this C++11 feature, `configure` attempts to detect a broken `<regex>` at configure-time.
This should be available from your Linux distro.

### Supported OSes and Distributions

UniversalCodeGrep should build and function anywhere there's a `gcc` 4.9 or greater available. It has been tested on the following OSes/distros:
UniversalCodeGrep should build and function anywhere the prerequisites are available. It has been built and tested on the following OSes/distros:

- Linux
- Ubuntu 15.04 (with gcc 4.9.2, the current default compiler on this distro)
- Windows 7 + Cygwin 64-bit (with gcc 4.9.3, the current default compiler on this distro)
- Ubuntu 15.04
- CentOS 7
- Fedora 22
- Fedora 23
- RHEL 7
- SLE 12
- openSUSE 13.2
- openSUSE Leap 42.1
- Windows 7 + Cygwin 64-bit (Note however that speed here is comparable to `ag`)

## Usage

Invoking `ucg` is the same as with `ack`:
Invoking `ucg` is the same as with `ack` or `ag`:

```sh
ucg [OPTION...] PATTERN [FILES OR DIRECTORIES]
```

...where `PATTERN` is an ECMAScript-compatible regular expression.
...where `PATTERN` is a PCRE-compatible regular expression.

If no `FILES OR DIRECTORIES` are specified, searching starts in the current directory.

### Options
### Command Line Options

Version 0.1.0 of `ucg` only supports a small subset of the options supported by `ack`. Future releases will have support for more options.
Version 0.2.0 of `ucg` supports a significant subset of the options supported by `ack`. Future releases will have support for more options.

#### Searching

| Option | Description |
|----------------------|------------------------------------------|
| `-i, --ignore-case` | Ignore case distinctions in PATTERN |

| `-i, --ignore-case` | Ignore case distinctions in PATTERN |
| `-Q, --literal` | Treat all characters in PATTERN as literal. |
| `-w, --word-regexp` | PATTERN must match a complete word. |

#### File presentation

| Option | Description |
|----------------------|------------------------------------------|
| `--color, --colour` | Render the output with ANSI color codes. |
| `--color, --colour` | Render the output with ANSI color codes. |
| `--nocolor, --nocolour` | Render the output without ANSI color codes. |

#### File inclusion/exclusion:
| Option | Description |
|----------------------|------------------------------------------|
| `--ignore-dir=name, --ignore-directory=name` | Exclude directories with this name. |
| `--noignore-dir=name, --noignore-directory=name` | Do not exclude directories with this name. |
| `-n, --no-recurse` | Do not recurse into subdirectories. |
| `-r, -R, --recurse` | Recurse into subdirectories (default: on). |
| `--type=[no]TYPE` | Include only [exclude all] TYPE files. Types may also be specified as `--[no]TYPE`. |

#### File type specification:
| Option | Description |
|----------------------|------------------------------------------|
| `--type-add=TYPE:FILTER:FILTERARGS` | Files FILTERed with the given FILTERARGS are treated as belonging to type TYPE. Any existing definition of type TYPE is appended to. |
| `--type-del=TYPE` | Remove any existing definition of type TYPE. |
| `--type-set=TYPE:FILTER:FILTERARGS` | Files FILTERed with the given FILTERARGS are treated as belonging to type TYPE. Any existing definition of type TYPE is replaced. |

#### Miscellaneous:
| Option | Description |
|----------------------|------------------------------------------|
| `-j, --jobs=NUM_JOBS` | Number of scanner jobs (std::thread<>s) to use. |
| `--noenv` | Ignore .ucgrc files. |

#### Informational options:
| Option | Description |
|----------------------|------------------------------------------|
| `-?, --help` | give this help list |
| `--help-types, --list-file-types` | Print list of supported file types. |
| `--usage` | give a short usage message |
| `-V, --version` | print program version |

## Configuration (.ucgrc) Files

UniversalCodeGrep supports configuration files with the name `.ucgrc`, in which command-line options can be stored on a per-user and per-directory-hierarchy basis.

### Format

`.ucgrc` files are text files with a simple format. Each line of text can be either:

1. A single-line comment. The line must start with a `#` and the comment continues for the rest of the line.
2. A command-line parameter. This must be exactly as if it was given on the command line.

### Location and Read Order

When `ucg` is invoked, it looks for command-line options from the following locations in the following order:

1. The `.ucgrc` file in the user's `$HOME` directory, if any.
2. The first `.ucgrc` file found, if any, by walking up the component directories of the current working directory. This traversal stops at either the user's `$HOME` directory or the root directory. This is called the project config file, and is intended to live in the top-level directory of a project directory hierarchy.
3. The command line itself.

Options read later will override earlier options.

## User-Defined File Types

`ucg` supports user-defined file types with the `--type-set=TYPE:FILTER:FILTERARGS` and `--type-add=TYPE:FILTER:FILTERARGS` command-line options. Only two FILTERs are currently supported, `ext` (extension list) and `is` (literal filename).

### Extension List Filter

The extension list filter allows you to specify a comma-separated list of file extensions which are to be considered as belonging to file type TYPE.
Example:
`--type-set=type1:ext:abc,xqz,def`

### Literal Filename Filter

The literal filename filter simply specifies a single literal filename which is to be considered as belonging to file type TYPE.
Example:
`--type-add=autoconf:is:configure.ac`

## Author

Gary R. Van Sickle
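
The README's "User-Defined File Types" section describes `--type-set`, `--type-add`, and `--type-del` in terms of the `ext` (extension list) and `is` (literal filename) filters. The C++ sketch below models only those documented semantics; it is not `ucg`'s implementation (the commit log mentions `m_builtin_and_user_type_map`, but that code is not part of this diff). The class `TypeMap` and its members are hypothetical, and two details are assumptions the README does not settle: a file's extension is taken to be the text after the last `.`, and an unknown FILTER is rejected with an exception.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical model of the README's user-defined type options (not ucg code).
class TypeMap
{
public:
    // spec is the text after "--type-add=", e.g. "type1:ext:abc,xqz,def".
    // Appends to any existing definition of TYPE.
    void Add(const std::string &spec) { Apply(spec, false); }

    // spec is the text after "--type-set="; replaces any existing definition.
    void Set(const std::string &spec) { Apply(spec, true); }

    // --type-del=TYPE: removes any existing definition of TYPE.
    void Del(const std::string &type) { m_types.erase(type); }

    // True if a bare filename (no directory part) belongs to TYPE.
    bool Matches(const std::string &type, const std::string &filename) const
    {
        const auto it = m_types.find(type);
        if (it == m_types.end())
        {
            return false;
        }
        const Filters &f = it->second;
        if (f.names.count(filename) != 0)
        {
            return true;  // "is" filter: literal filename match.
        }
        // "ext" filter. Assumption: the extension is the text after the last '.'.
        const std::string::size_type dot = filename.rfind('.');
        return dot != std::string::npos && f.exts.count(filename.substr(dot + 1)) != 0;
    }

private:
    struct Filters
    {
        std::set<std::string> exts;   // collected from "ext" filters
        std::set<std::string> names;  // collected from "is" filters
    };
    std::map<std::string, Filters> m_types;

    void Apply(const std::string &spec, bool replace)
    {
        // Split TYPE:FILTER:FILTERARGS at the first two colons.
        const std::string::size_type c1 = spec.find(':');
        const std::string::size_type c2 =
            (c1 == std::string::npos) ? std::string::npos : spec.find(':', c1 + 1);
        if (c2 == std::string::npos)
        {
            throw std::invalid_argument("expected TYPE:FILTER:FILTERARGS, got " + spec);
        }
        const std::string type = spec.substr(0, c1);
        const std::string filter = spec.substr(c1 + 1, c2 - c1 - 1);
        const std::string args = spec.substr(c2 + 1);
        // Validate before touching the map so a bad spec changes nothing.
        if (filter != "ext" && filter != "is")
        {
            throw std::invalid_argument("unsupported filter: " + filter);
        }

        Filters &f = m_types[type];
        if (replace)
        {
            f = Filters();  // --type-set discards the old definition.
        }
        if (filter == "is")
        {
            f.names.insert(args);
            return;
        }
        // "ext": split the comma-separated extension list.
        std::string::size_type start = 0;
        for (;;)
        {
            const std::string::size_type comma = args.find(',', start);
            const std::string ext = (comma == std::string::npos)
                ? args.substr(start) : args.substr(start, comma - start);
            if (!ext.empty())
            {
                f.exts.insert(ext);
            }
            if (comma == std::string::npos)
            {
                break;
            }
            start = comma + 1;
        }
    }
};
```

Because `Set()` resets the entry before applying the new filter, following `--type-set=type1:ext:abc,xqz,def` with `--type-set=type1:ext:cpp` would leave `type1` matching only `.cpp` files in this model, whereas `--type-add=type1:ext:cpp` would keep `abc`, `xqz`, and `def` as well.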
8 changes: 8 additions & 0 deletions main.cpp
@@ -112,6 +112,9 @@ int main(int argc, char **argv)
if(g.Error())
{
std::cout << "ucg: \"" << g.ErrorPath() << "\": No such file or directory" << std::endl;
// Both ack and ag return 1 in this situation, which indicates that "no matches were found".
// We'll follow their lead; this is really sort of an error, and grep would return 2 here,
// but I suppose it could be argued that there is no match here.
return 1;
}

@@ -122,4 +125,9 @@ int main(int argc, char **argv)
std::cerr << "ucg: Error during regex parsing: " << e.what() << std::endl;
return 255;
}
catch(const ArgParseException &e)
{
std::cerr << "ucg: Error during arg parsing: " << e.what() << std::endl;
return 255;
}
}