Support --regex-case and default to it for patterns with uppercase characters
vaeth committed May 28, 2023
1 parent 1f9f688 commit 3688489
Showing 16 changed files with 155 additions and 45 deletions.
5 changes: 5 additions & 0 deletions ChangeLog
@@ -1,5 +1,10 @@
# ChangeLog for eix - Ebuild IndeX for portage

*eix-0.36.7
Martin Väth <martin at mvath.de>:
- Support --regex-case and prefer it over --regex if the pattern has
uppercase characters, see https://github.com/vaeth/eix/issues/110

*eix-0.36.6
Martin Väth <martin at mvath.de>:
- Support *.gpkg.tar (BINPKG_FORMAT=gpkg), see
2 changes: 1 addition & 1 deletion configure.ac
@@ -41,7 +41,7 @@ dnl each item is listed in a separate line with indent level increased;
dnl in such a case the opening/closing braces are isolated.
dnl 2. The AC_INIT macro must be in one line since it is parsed by
dnl primitive scripts.
AC_INIT([eix], [0.36.6], [https://github.com/vaeth/eix/issues/], [eix], [https://github.com/vaeth/eix/])
AC_INIT([eix], [0.36.7], [https://github.com/vaeth/eix/issues/], [eix], [https://github.com/vaeth/eix/])
AC_PREREQ([2.64])

m4_ifdef([AC_CONFIG_MACRO_DIR],
24 changes: 14 additions & 10 deletions manpage/de-eix.1.in
@@ -957,12 +957,13 @@ Nur ein Algorithmus kann pro AUSDRUCK gewählt werden.
Falls keine dieser Optionen benutzt wird, wird die Vorgabe anhand einer
Heuristik ausgewählt, die von der Gestelt des Suchmusters und vom ausgewählten
Operandenfeld abhängt.
In den meisten Fällen wird diese Vorgabe B<--regex> sein, außer wenn das
Suchmuster aussieht wie "typischerweise" ein glob-Muster or ein Teilstring
(in diesem Fall wird der entsprechende Algorithmus die Vorgabe), oder falls
das ausgewählte Operandenfeld sich nur auf USE-flags, sets, EAPI oder SLOT
bezieht, so dass die meisten Benutzer vermutlich erwarten, dass auch tatsächlich
gegen den gesamten String getestet werden soll.
In den meisten Fällen wird diese Vorgabe B<--regex> sein (oder B<--regex-case>,
falls der Ausdruck Großbuchstaben enthält), außer wenn das Suchmuster aussieht
wie "typischerweise" ein glob-Muster or ein Teilstring (in diesem Fall wird der
entsprechende Algorithmus die Vorgabe), oder falls das ausgewählte
Operandenfeld sich nur auf USE-flags, sets, EAPI oder SLOT bezieht, so dass
die meisten Benutzer vermutlich erwarten, dass auch tatsächlich gegen den
gesamten String getestet werden soll.
Details der Heuristik werden in der später erklärten B<DEFAULT_MATCH_ALGORITHM>-Konfigurationsvariable festgelegt.
Je nach Wert dieser Variablen kann es auch passieren, dass eine der folgenden
Optionen explizit für jede Suche angegeben werden muss.
@@ -995,12 +996,15 @@ für weitere Details.
Vorsicht bei der Übergabe des Suchmusters in einer Shell (Quoting!).
.TP
.BR -r ", " --regex
Das Suchmuster ist ein regulärer Ausdruck.
Das Suchmuster ist ein regulärer Ausdruck (ohne Beachtung der Großschreibung).
Nur ein Teilstring muss passen (außer wenn ^ oder $ benutzt werden).
Das leere Suchmuster passt auf alles.
Weitere Informationen finden sich in
.BR regex (7).
Vorsicht bei der Übergabe des Suchmusters in einer Shell (Quoting!).
.TP
.BR --regex-case
Wie B<--regex>, aber mit Beachtung der Groß-/Kleinschreibung.
.\" }}}
.\" {{{ -------- Layouts festlegen
.SS Layouts festlegen \fP(siehe B<FORMATSTRING> weiter unten)
@@ -3288,9 +3292,9 @@ die benutzt wird, um die Vorgabe für den Matchalgorithmus
Dies ist analog wie bei B<DEFAULT_MATCH_FIELD>, haupstächlichr mit dem Unterschied,
dass natürlich I<Matchalgorithmus> den Vorgabe-Matchalgorithmus angibt.

Die möglichen Werte für I<Matchalgorithmus> sind:
B<regex>, B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>.
Sie entsprechen den analogen Kommandozeilenoptionen für die Wahl des Matchalgorithmus.
Die möglichen Werte für I<Matchalgorithmus> sind: B<regex>, B<regexcase>,
B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>. Sie entsprechen
den analogen Kommandozeilenoptionen für die Wahl des Matchalgorithmus.
Der spezielle Wert B<error> bedeutet, dass eix mit der Fehlermeldung abbricht,
dass der Matchalgorithmus nicht automatisch erkannt werden kann und explizit
angegeben werden muss.
20 changes: 12 additions & 8 deletions manpage/en-eix.1.in
@@ -937,11 +937,11 @@ Only one algorithm can be chosen for an expression.
If you do not specify some of these options, the default is chosen according
to some heuristic depending on the form of your search pattern and according
to the selected match field.
In most cases, the default will be B<--regex> unless your expression
"looks" like a glob pattern or a substring (in which case the corresponding
algorithm will be the default), or if the selected match field refers only
to USE-flags, sets, EAPI, or SLOT which perhaps most people would expect to
match the whole string.
In most cases, the default will be B<--regex> (or B<--regex-case> if your
expression contains capital letters) unless your expression "looks" like a glob
pattern or a substring (in which case the corresponding algorithm will be the
default), or if the selected match field refers only to USE-flags, sets, EAPI,
or SLOT which perhaps most people would expect to match the whole string.
Details of the heuristic are specified by the B<DEFAULT_MATCH_ALGORITHM>
configuration variable explained later.
Depending on the configuration of that variable, it might even be non-optional
@@ -975,12 +975,15 @@ for further information. Be sure to use single quotes around patterns (to preven
shell from intercepting any wildcards).
.TP
.BR -r ", " --regex
pattern is a regexp.
pattern is a regexp, ignoring case.
Only a substring must be matched (unless ^ or $ are used);
the empty pattern matches everything.
For further information, please read
.BR regex (7).
Again, be sure to use single quotes around patterns.
.TP
.BR --regex-case
As B<--regex>, but does not ignore case.
.\" }}}
.\" {{{ -------- Defining layouts
.SS Defining layouts \fP(see B<FORMATSTRING> below)
@@ -3218,8 +3221,9 @@ which is used to determine the default match algorithm.

The interpretation is essentially analogous to B<DEFAULT_MATCH_FIELD> with the main difference that I<match_algorithm>
specifies the default match algorithm chosen.
The possible values for I<match_algorithm> are B<regex>, B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>,
or B<error> which correspond to the analogous command line option for the match algorithm.
The possible values for I<match_algorithm> are B<regex>, B<regexcase>,
B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>, or B<error>
which correspond to the analogous command line option for the match algorithm.
The special value B<error> means that eix stops with an error message claiming
that a match algorithm is not autodetected and must be specified explicitly.
If no other default match algorithm default is specified, then B<regex> is used.
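A minimal sketch of the documented difference between --regex and --regex-case, assuming the patterns are compiled with POSIX regcomp (the manpage points to regex(7)). The matching code itself is not part of the hunks shown here, and regex_search_posix is a hypothetical helper name used only for illustration.

```cpp
#include <regex.h>

#include <string>

// Hypothetical helper, not taken from eix: returns true if the POSIX
// extended regular expression `pattern` matches some substring of `text`.
// With ignore_case == true this mimics the documented --regex behaviour,
// with ignore_case == false the documented --regex-case behaviour.
bool regex_search_posix(const std::string& pattern, const std::string& text,
                        bool ignore_case) {
  int flags = REG_EXTENDED | REG_NOSUB;
  if (ignore_case) {
    flags |= REG_ICASE;  // fold case while matching
  }
  regex_t re;
  if (regcomp(&re, pattern.c_str(), flags) != 0) {
    return false;  // a pattern that does not compile matches nothing here
  }
  const bool found = (regexec(&re, text.c_str(), 0, nullptr, 0) == 0);
  regfree(&re);
  return found;
}
```

Under these assumptions, the pattern ^Fire finds firefox when case is ignored but not when case is taken into account.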
20 changes: 12 additions & 8 deletions manpage/ru-eix.1.in
@@ -934,11 +934,11 @@ Only one algorithm can be chosen for an expression.
If you do not specify some of these options, the default is chosen according
to some heuristic depending on the form of your search pattern and according
to the selected match field.
In most cases, the default will be B<--regex> unless your expression
"looks" like a glob pattern or a substring (in which case the corresponding
algorithm will be the default), or if the selected match field refers only
to USE-flags, sets, EAPI, or SLOT which perhaps most people would expect to
match the whole string.
In most cases, the default will be B<--regex> (or B<--regex-case> if your
expression contains capital letters) unless your expression "looks" like a glob
pattern or a substring (in which case the corresponding algorithm will be the
default), or if the selected match field refers only to USE-flags, sets, EAPI,
or SLOT which perhaps most people would expect to match the whole string.
Details of the heuristic are specified by the B<DEFAULT_MATCH_ALGORITHM>
configuration variable explained later.
Depending on the configuration of that variable, it might even be non-optional
@@ -972,12 +972,15 @@ for further information. Be sure to use single quotes around patterns (to preven
shell from intercepting any wildcards).
.TP
.BR -r ", " --regex
pattern is a regexp.
pattern is a regexp, ignoring case.
Only a substring must be matched (unless ^ or $ are used);
the empty pattern matches everything.
For further information, please read
.BR regex (7).
Again, be sure to use single quotes around patterns.
.TP
.BR --regex-case
As B<--regex>, but does not ignore case.
.\" }}}
.\" {{{ -------- Defining layouts
.SS Defining layouts \fP(see B<FORMATSTRING> below)
@@ -3215,8 +3218,9 @@ which is used to determine the default match algorithm.

The interpretation is essentially analogous to B<DEFAULT_MATCH_FIELD> with the main difference that I<match_algorithm>
specifies the default match algorithm chosen.
The possible values for I<match_algorithm> are B<regex>, B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>,
or B<error> which correspond to the analogous command line option for the match algorithm.
The possible values for I<match_algorithm> are B<regex>, B<regexcase>,
B<pattern>, B<substring>, B<begin>, B<end>, B<exact>, B<fuzzy>, or B<error>
which correspond to the analogous command line option for the match algorithm.
The special value B<error> means that eix stops with an error message claiming
that a match algorithm is not autodetected and must be specified explicitly.
If no other default match algorithm default is specified, then B<regex> is used.
2 changes: 1 addition & 1 deletion meson.build
@@ -1,5 +1,5 @@
project('eix', 'cpp',
version : '0.36.6',
version : '0.36.7',
license : 'GPLv2',
default_options : [
'prefix=/usr',
38 changes: 31 additions & 7 deletions po/de.po
@@ -8,8 +8,8 @@ msgid ""
msgstr ""
"Project-Id-Version: eix\n"
"Report-Msgid-Bugs-To: https://github.com/vaeth/eix/issues/\n"
"POT-Creation-Date: 2023-01-29 14:08+0100\n"
"PO-Revision-Date: 2023-01-29 14:30+0100\n"
"POT-Creation-Date: 2023-05-28 20:45+0200\n"
"PO-Revision-Date: 2023-05-28 20:49+0200\n"
"Last-Translator: Martin Väth <martin@mvath.de>\n"
"Language-Team: German\n"
"Language: de\n"
@@ -1777,7 +1777,8 @@ msgid ""
" --installed-without-use disabled useflag (of installed package)\n"
"\n"
" Type of Pattern:\n"
" -r, --regex Pattern is a regexp (default)\n"
" -r, --regex Pattern is a regexp, ignoring case (default)\n"
" --regex-case Pattern is a regexp, taking case into account\n"
" -e, --exact Pattern is the exact string\n"
" -z, --substring Pattern is a substring\n"
" -b, --begin Pattern is the beginning of the string\n"
@@ -1963,7 +1964,10 @@ msgstr ""
" --installed-without-use deaktiviertes USEflag (installierter Pakete)\n"
"\n"
" Art des PATTERNs:\n"
" -r, --regex PATTERN ist regulärer Ausdruck (Vorgabe)\n"
" -r, --regex PATTERN ist regulärer Ausdruck,\n"
" ignoriert Großschreibung (Vorgabe)\n"
" --regex-case PATTERN ist regulärer Ausdruck,\n"
" beachtet Großschreibung\n"
" -e, --exact PATTERN ist exakter String\n"
" -z, --substring PATTERN ist ein Teilstring\n"
" -b, --begin PATTERN ist der Stringanfang\n"
@@ -4353,6 +4357,16 @@ msgstr ""
"set, eapi, slot, installed-slot, use, with-use, without-use, src-uri,\n"
"deps, depend, rdepend, pdepend, bdepend, idepend, error."

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX_CASE"
msgid ""
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase."
msgstr ""
"Diese Variable wird nur zur verzögerten Ersetzung benutzt.\n"
"Es handelt sich um das Kriterium, das von DEFAULT_MATCH_ALGORITHM für\n"
"„regexcase“ benutzt wird."

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX"
msgid ""
@@ -4403,23 +4417,33 @@ msgstr ""
"Es handelt sich um das Kriterium, das von DEFAULT_MATCH_ALGORITHM für\n"
"„begin“ benutzt wird."

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX_CASE_FALLBACK"
msgid ""
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase fallback."
msgstr ""
"Diese Variable wird nur zur verzögerten Ersetzung benutzt.\n"
"Es handelt sich um das Kriterium, das von DEFAULT_MATCH_ALGORITHM für\n"
"„regexcase“ als letzter Test benutzt wird."

#: src/eixrc/defaults.cc
msgctxt "DEFAULT_MATCH_ALGORITHM"
msgid ""
"This is a list of strings of the form (spec)regexp[ ]match_algorithm.\n"
"If spec matches the match field(s) and regexp matches the search pattern,\n"
"use match_algorithm as the default.\n"
"A fallback match_algorithm may be specified as the last entry in the list.\n"
"Admissible values for match_algorithm are: regex, pattern, substring,\n"
"begin, end, exact, fuzzy, error."
"Admissible values for match_algorithm are: regex, regexcase, pattern,\n"
"substring, begin, end, exact, fuzzy, error."
msgstr ""
"Dies ist eine Stringliste von Ausdrücken der Gestalt\n"
"(spec)regexp[ ]match_algorithm.\n"
"Falls spec auf das benutzte Operandenfeld und regexp auf das Suchmuster\n"
"zutrifft, wird match_algorithm als Vorgabe benutzt.\n"
"Ein Fallback-match_algorithm kann als letzter Eintrag der Liste auftreten.\n"
"Zulässige Werte für match_algorithm sind:\n"
"regex, pattern, substring, begin, end, exact, fuzzy, error"
"regex, regexcase, pattern, substring, begin, end, exact, fuzzy, error"

#: src/eixrc/defaults.cc
msgctxt "TEST_FOR_EMPTY"
29 changes: 23 additions & 6 deletions po/ru.po
@@ -10,8 +10,8 @@ msgid ""
msgstr ""
"Project-Id-Version: eix\n"
"Report-Msgid-Bugs-To: https://github.com/vaeth/eix/issues/\n"
"POT-Creation-Date: 2023-01-29 14:08+0100\n"
"PO-Revision-Date: 2023-01-29 14:28+0100\n"
"POT-Creation-Date: 2023-05-28 20:45+0200\n"
"PO-Revision-Date: 2023-05-28 20:50+0200\n"
"Last-Translator: Artem Vorotnikov <skybon@gmail.com>\n"
"Language-Team: русский <>\n"
"Language: ru\n"
@@ -1800,7 +1800,8 @@ msgid ""
" --installed-without-use disabled useflag (of installed package)\n"
"\n"
" Type of Pattern:\n"
" -r, --regex Pattern is a regexp (default)\n"
" -r, --regex Pattern is a regexp, ignoring case (default)\n"
" --regex-case Pattern is a regexp, taking case into account\n"
" -e, --exact Pattern is the exact string\n"
" -z, --substring Pattern is a substring\n"
" -b, --begin Pattern is the beginning of the string\n"
@@ -2003,7 +2004,9 @@ msgstr ""
" --installed-without-use отключённый флаг use (установленного пакета)\n"
"\n"
" Тип схемы :\n"
" -r, --regex Схема - регулярное выражение (по умолчанию)\n"
" -r, --regex Схема - регулярное выражение, ignoring case\n"
" (по умолчанию)\n"
" --regex-case Схема - регулярное выражение, not ignoring case\n"
" -e, --exact Схема - точная строка\n"
" -z, --substring Схема - подстрока\n"
" -b, --begin Схема - начало строки\n"
@@ -4064,6 +4067,13 @@ msgid ""
"idepend, error."
msgstr ""

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX_CASE"
msgid ""
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase."
msgstr ""

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX"
msgid ""
@@ -4099,15 +4109,22 @@ msgid ""
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for begin."
msgstr ""

#: src/eixrc/defaults.cc
msgctxt "MATCH_ALGORITHM_REGEX_CASE_FALLBACK"
msgid ""
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase fallback."
msgstr ""

#: src/eixrc/defaults.cc
msgctxt "DEFAULT_MATCH_ALGORITHM"
msgid ""
"This is a list of strings of the form (spec)regexp[ ]match_algorithm.\n"
"If spec matches the match field(s) and regexp matches the search pattern,\n"
"use match_algorithm as the default.\n"
"A fallback match_algorithm may be specified as the last entry in the list.\n"
"Admissible values for match_algorithm are: regex, pattern, substring,\n"
"begin, end, exact, fuzzy, error."
"Admissible values for match_algorithm are: regex, regexcase, pattern,\n"
"substring, begin, end, exact, fuzzy, error."
msgstr ""

#: src/eixrc/defaults.cc
4 changes: 3 additions & 1 deletion src/eix.cc
@@ -264,7 +264,8 @@ static void dump_help() {
" --installed-without-use disabled useflag (of installed package)\n"
"\n"
" Type of Pattern:\n"
" -r, --regex Pattern is a regexp (default)\n"
" -r, --regex Pattern is a regexp, ignoring case (default)\n"
" --regex-case Pattern is a regexp, taking case into account\n"
" -e, --exact Pattern is the exact string\n"
" -z, --substring Pattern is a substring\n"
" -b, --begin Pattern is the beginning of the string\n"
@@ -485,6 +486,7 @@ EixOptionList::EixOptionList() {
// Algorithms for a criterion
push_back(Option("fuzzy", 'f'));
push_back(Option("regex", 'r'));
push_back(Option("regex-case", O_REGEX_CASE));
push_back(Option("exact", 'e'));
push_back(Option("pattern", 'p'));
push_back(Option("begin", 'b'));
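Unlike its neighbours, --regex-case has no short form, so it is registered with the constant O_REGEX_CASE instead of a character. The definition of that constant is not among the hunks shown here. The sketch below only illustrates one common way to number such long-only options, namely above the range of unsigned char so they cannot collide with a short option letter. All names in it (LongOnlySketch, OptionSketch, find_option_id) are hypothetical.

```cpp
#include <climits>
#include <string>
#include <vector>

// Hypothetical ids for options without a short form. Starting above
// UCHAR_MAX keeps them distinct from every possible option character.
enum LongOnlySketch : int { kRegexCaseSketch = UCHAR_MAX + 1 };

struct OptionSketch {
  std::string long_name;
  int id;  // either a short option character or a LongOnlySketch value
};

// A reduced copy of the "algorithms for a criterion" list from the hunk.
std::vector<OptionSketch> algorithm_options() {
  return {
      {"fuzzy", 'f'},
      {"regex", 'r'},
      {"regex-case", kRegexCaseSketch},
      {"exact", 'e'},
  };
}

// Returns the id registered for `name`, or -1 if the name is unknown.
int find_option_id(const std::vector<OptionSketch>& options,
                   const std::string& name) {
  for (const OptionSketch& option : options) {
    if (option.long_name == name) {
      return option.id;
    }
  }
  return -1;
}
```

With this numbering, a dispatcher can switch on the id and treat 'r' and the long-only value as two separate cases.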
19 changes: 17 additions & 2 deletions src/eixrc/defaults.cc
@@ -913,6 +913,13 @@ AddOption(STRING, "DEFAULT_MATCH_FIELD",
"with-use, without-use, src-uri, deps, depend, rdepend, pdepend, bdepend,\n"
"idepend, error."));

AddOption(STRING, "MATCH_ALGORITHM_REGEX_CASE",
"[][^$|()].*[[:upper:]]|[.][*+?].*[[:upper:]]|"
"[[:upper:]].*[][^$|()]|[[:upper:]].*[.][*+?]",
P_("MATCH_ALGORITHM_REGEX_CASE",
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase."));

AddOption(STRING, "MATCH_ALGORITHM_REGEX",
"[][^$|()]|[.][*+?]", P_("MATCH_ALGORITHM_REGEX",
"This variable is only used for delayed substitution.\n"
@@ -938,19 +945,27 @@ AddOption(STRING, "MATCH_ALGORITHM_BEGIN",
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for begin."));

AddOption(STRING, "MATCH_ALGORITHM_REGEX_CASE_FALLBACK",
"[[:upper:]]",
P_("MATCH_ALGORITHM_REGEX_CASE_FALLBACK",
"This variable is only used for delayed substitution.\n"
"It is the criterion used in DEFAULT_MATCH_ALGORITHM for regexcase fallback."));

AddOption(STRING, "DEFAULT_MATCH_ALGORITHM",
"%{\\MATCH_ALGORITHM_REGEX_CASE} regexcase "
"%{\\MATCH_ALGORITHM_REGEX} regex "
"%{\\MATCH_ALGORITHM_PATTERN} pattern "
"%{\\MATCH_ALGORITHM_SUBSTRING} substring "
"%{\\MATCH_ALGORITHM_EXACT} exact "
"%{\\MATCH_ALGORITHM_BEGIN} begin "
"%{\\MATCH_ALGORITHM_REGEX_CASE_FALLBACK} regexcase "
"regex", P_("DEFAULT_MATCH_ALGORITHM",
"This is a list of strings of the form (spec)regexp[ ]match_algorithm.\n"
"If spec matches the match field(s) and regexp matches the search pattern,\n"
"use match_algorithm as the default.\n"
"A fallback match_algorithm may be specified as the last entry in the list.\n"
"Admissible values for match_algorithm are: regex, pattern, substring,\n"
"begin, end, exact, fuzzy, error."));
"Admissible values for match_algorithm are: regex, regexcase, pattern,\n"
"substring, begin, end, exact, fuzzy, error."));

AddOption(BOOLEAN, "TEST_FOR_EMPTY",
"true", P_("TEST_FOR_EMPTY",
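The DEFAULT_MATCH_ALGORITHM value above is an ordered list: the first criterion that matches the search pattern decides the algorithm, and the trailing bare regex is the fallback. A rough sketch of that lookup follows, assuming POSIX extended regular expressions and first-match-wins evaluation. It uses only the three criteria whose values appear in this diff; the pattern, substring, exact and begin criteria are not shown here and are left out, so results for inputs those criteria would catch can differ from real eix. The helper names (criterion_matches, guess_algorithm_sketch) are hypothetical.

```cpp
#include <regex.h>

#include <string>
#include <vector>

// Hypothetical helper: true if the POSIX extended regex `criterion`
// matches some substring of `search`.
static bool criterion_matches(const std::string& criterion,
                              const std::string& search) {
  regex_t re;
  if (regcomp(&re, criterion.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
    return false;  // an uncompilable criterion never matches
  }
  const bool found = (regexec(&re, search.c_str(), 0, nullptr, 0) == 0);
  regfree(&re);
  return found;
}

struct RuleSketch {
  std::string criterion;  // regex tested against the search pattern
  std::string algorithm;  // algorithm chosen if the criterion matches
};

// Walks the rules in order and returns the first matching algorithm.
std::string guess_algorithm_sketch(const std::string& search) {
  static const std::vector<RuleSketch> rules = {
      // MATCH_ALGORITHM_REGEX_CASE: regex metacharacters plus an uppercase letter
      {"[][^$|()].*[[:upper:]]|[.][*+?].*[[:upper:]]|"
       "[[:upper:]].*[][^$|()]|[[:upper:]].*[.][*+?]",
       "regexcase"},
      // MATCH_ALGORITHM_REGEX: regex metacharacters only
      {"[][^$|()]|[.][*+?]", "regex"},
      // (pattern, substring, exact and begin criteria omitted: not in this diff)
      // MATCH_ALGORITHM_REGEX_CASE_FALLBACK: any uppercase letter
      {"[[:upper:]]", "regexcase"},
  };
  for (const RuleSketch& rule : rules) {
    if (criterion_matches(rule.criterion, search)) {
      return rule.algorithm;
    }
  }
  return "regex";  // final fallback entry of the list
}
```

Under these assumptions, ^Foo and foo.*Bar select regexcase through the first rule, ^foo selects regex, a plain Foo reaches the uppercase fallback, and a plain foo ends at the final regex entry.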