Diffstat (limited to 'doc/html/pcre2grep.html')
-rw-r--r--  doc/html/pcre2grep.html  143
1 file changed, 97 insertions, 46 deletions
diff --git a/doc/html/pcre2grep.html b/doc/html/pcre2grep.html
index 29ab0311..bd12246a 100644
--- a/doc/html/pcre2grep.html
+++ b/doc/html/pcre2grep.html
@@ -21,7 +21,7 @@ please consult the man page, in case the conversion went wrong.
<li><a name="TOC6" href="#SEC6">OPTIONS</a>
<li><a name="TOC7" href="#SEC7">ENVIRONMENT VARIABLES</a>
<li><a name="TOC8" href="#SEC8">NEWLINES</a>
-<li><a name="TOC9" href="#SEC9">OPTIONS COMPATIBILITY</a>
+<li><a name="TOC9" href="#SEC9">OPTIONS COMPATIBILITY WITH GNU GREP</a>
<li><a name="TOC10" href="#SEC10">OPTIONS WITH DATA</a>
<li><a name="TOC11" href="#SEC11">USING PCRE2'S CALLOUT FACILITY</a>
<li><a name="TOC12" href="#SEC12">MATCHING ERRORS</a>
@@ -71,15 +71,16 @@ For example:
<pre>
pcre2grep some-pattern file1 - file3
</pre>
-By default, input files are searched line by line. Each line that matches a
-pattern is copied to the standard output, and if there is more than one file,
-the file name is output at the start of each line, followed by a colon.
-However, there are options that can change how <b>pcre2grep</b> behaves. For
-example, the <b>-M</b> option makes it possible to search for strings that span
-line boundaries. What defines a line boundary is controlled by the <b>-N</b>
-(<b>--newline</b>) option. The <b>-h</b> and <b>-H</b> options control whether or
-not file names are shown, and the <b>-Z</b> option changes the file name
-terminator to a zero byte.
+By default, input files are searched line by line, so pattern assertions about
+the beginning and end of a subject string (^, $, \A, \Z, and \z) match at
+the beginning and end of each line. When a line matches a pattern, it is copied
+to the standard output, and if there is more than one file, the file name is
+output at the start of each line, followed by a colon. However, there are
+options that can change how <b>pcre2grep</b> behaves. For example, the <b>-M</b>
+option makes it possible to search for strings that span line boundaries. What
+defines a line boundary is controlled by the <b>-N</b> (<b>--newline</b>) option.
+The <b>-h</b> and <b>-H</b> options control whether or not file names are shown,
+and the <b>-Z</b> option changes the file name terminator to a zero byte.
</P>
<P>
The amount of memory used for buffering files that are being scanned is
@@ -99,6 +100,10 @@ allow for buffering "before" and "after" lines. If the buffer size is too
small, fewer than requested "before" and "after" lines may be output.
</P>
<P>
+When matching with a multiline pattern, the size of the buffer must be at least
+half of the maximum match expected or the pattern might fail to match.
+</P>
+<P>
Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater.
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>. When there is more than one pattern
(specified by the use of <b>-e</b> and/or <b>-f</b>), each pattern is applied to
@@ -249,7 +254,7 @@ exactly the same as the number of lines that would have been output, but if the
suppressed lines than the count (that is, the number of matches).
<br>
<br>
-If no lines are selected, the number zero is output. If several files are are
+If no lines are selected, the number zero is output. If several files are
being scanned, a count is output for each of them and the <b>-t</b> option can
be used to cause a total to be output at the end. However, if the
<b>--files-with-matches</b> option is also used, only those files whose counts
@@ -314,6 +319,14 @@ end-of-file; in others it may provoke an error.
See <b>--match-limit</b> below.
</P>
<P>
+<b>-E</b>, <b>--case-restrict</b>
+When case distinctions are being ignored in Unicode mode, two ASCII letters (K
+and S) will by default match Unicode characters U+212A (Kelvin sign) and U+017F
+(long S) respectively, as well as their lower case ASCII counterparts. When
+this option is set, case equivalences are restricted such that no ASCII
+character matches a non-ASCII character, and vice versa.
+</P>
+<P>
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>, <b>--regexp=</b><i>pattern</i>
Specify a pattern to be matched. This option can be used multiple times in
order to specify several patterns. It can also be used as a way of specifying a
@@ -413,6 +426,11 @@ match in a line, each of them is shown separately. This option is mutually
exclusive with <b>--output</b>, <b>--line-offsets</b>, and <b>--only-matching</b>.
</P>
<P>
+<b>--group-separator</b>=<i>text</i>
+Output this text string instead of two hyphens between groups of lines when
+<b>-A</b>, <b>-B</b>, or <b>-C</b> is in use. See also <b>--no-group-separator</b>.
+</P>
+<P>
<b>-H</b>, <b>--with-filename</b>
Force the inclusion of the file name at the start of output lines when
searching a single file. The file name is not normally shown in this case.
@@ -449,7 +467,9 @@ Ignore binary files. This is equivalent to
</P>
<P>
<b>-i</b>, <b>--ignore-case</b>
-Ignore upper/lower case distinctions during comparisons.
+Ignore upper/lower case distinctions when pattern matching. This applies when
+matching path names for inclusion or exclusion as well as when matching lines
+in files.
</P>
<P>
<b>--include</b>=<i>pattern</i>
@@ -544,16 +564,24 @@ used. There is no short form for this option.
<P>
<b>-M</b>, <b>--multiline</b>
Allow patterns to match more than one line. When this option is set, the PCRE2
-library is called in "multiline" mode. This allows a matched string to extend
-past the end of a line and continue on one or more subsequent lines. Patterns
-used with <b>-M</b> may usefully contain literal newline characters and internal
-occurrences of ^ and $ characters. The output for a successful match may
-consist of more than one line. The first line is the line in which the match
-started, and the last line is the line in which the match ended. If the matched
-string ends with a newline sequence, the output ends at the end of that line.
-If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
-match has been handled, scanning restarts at the beginning of the line after
-the one in which the match ended.
+library is called in "multiline" mode, and a match is allowed to continue past
+the end of the initial line and onto one or more subsequent lines.
+<br>
+<br>
+Patterns used with <b>-M</b> may usefully contain literal newline characters and
+internal occurrences of ^ and $ characters, because in multiline mode these can
+match at internal newlines. Because <b>pcre2grep</b> is scanning multiple lines,
+the \Z and \z assertions match only at the end of the last line in the file.
+The \A assertion matches at the start of the first line of a match. This can
+be any line in the file; it is not anchored to the first line.
+<br>
+<br>
+The output for a successful match may consist of more than one line. The first
+line is the line in which the match started, and the last line is the line in
+which the match ended. If the matched string ends with a newline sequence, the
+output ends at the end of that line. If <b>-v</b> is set, none of the lines in a
+multi-line match are output. Once a match has been handled, scanning restarts
+at the beginning of the line after the one in which the match ended.
<br>
<br>
The newline sequence that separates multiple lines must be matched as part of
@@ -570,8 +598,11 @@ well as possibly handling a two-character newline sequence.
<br>
There is a limit to the number of lines that can be matched, imposed by the way
that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
-large processing buffer, this should not be a problem, but the <b>-M</b> option
-does not work when input is read line by line (see <b>--line-buffered</b>.)
+large processing buffer, this should not be a problem.
+<br>
+<br>
+The <b>-M</b> option does not work when input is read line by line (see
+<b>--line-buffered</b>.)
</P>
<P>
<b>-m</b> <i>number</i>, <b>--max-count</b>=<i>number</i>
@@ -661,11 +692,17 @@ pattern to match more than one line, only the first is preceded by its line
number. This option is forced if <b>--line-offsets</b> is used.
</P>
<P>
+<b>--no-group-separator</b>
+Do not output a separator between groups of lines when <b>-A</b>, <b>-B</b>, or
+<b>-C</b> is in use. The default is to output a line containing two hyphens. See
+also <b>--group-separator</b>.
+</P>
+<P>
<b>--no-jit</b>
If the PCRE2 library is built with support for just-in-time compiling (which
speeds up matching), <b>pcre2grep</b> automatically makes use of this, unless it
was explicitly disabled at build time. This option can be used to disable the
-use of JIT at run time. It is provided for testing and working round problems.
+use of JIT at run time. It is provided for testing and working around problems.
It should never be needed in normal use.
</P>
<P>
@@ -759,6 +796,18 @@ Specify a separating string for multiple occurrences of <b>-o</b>. The default
is an empty string. Separating strings are never coloured.
</P>
<P>
+<b>-P</b>, <b>--no-ucp</b>
+Starting from release 10.43, when UTF/Unicode mode is specified with <b>-u</b>
+or <b>-U</b>, the PCRE2_UCP option is used by default. This means that the
+POSIX classes in patterns match more than just ASCII characters. For example,
+[:digit:] matches any Unicode decimal digit. The <b>--no-ucp</b> option
+suppresses PCRE2_UCP, thus restricting the POSIX classes to ASCII characters,
+as was the case in earlier releases. Note that there are now more fine-grained
+option settings within patterns that affect individual classes. For example,
+when in UCP mode, the sequence (?aP) restricts [:word:] to ASCII letters, while
+allowing \w to match Unicode letters and digits.
+</P>
+<P>
<b>-q</b>, <b>--quiet</b>
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
@@ -796,11 +845,11 @@ total would always be zero.
</P>
<P>
<b>-u</b>, <b>--utf</b>
-Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
-with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
-<b>--include</b> options) and all lines that are scanned must be valid strings
-of UTF-8 characters. If an invalid UTF-8 string is encountered, an error
-occurs.
+Operate in UTF/Unicode mode. This option is available only if PCRE2 has been
+compiled with UTF-8 support. All patterns (including those for any
+<b>--exclude</b> and <b>--include</b> options) and all lines that are scanned
+must be valid strings of UTF-8 characters. If an invalid UTF-8 string is
+encountered, an error occurs.
</P>
<P>
<b>-U</b>, <b>--utf-allow-invalid</b>
@@ -883,25 +932,27 @@ ends of output lines that are copied from the input is not converted to
standard output must end with "\r\n". For all other operating systems, and
for all messages to the standard error stream, "\n" is used.
</P>
-<br><a name="SEC9" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
+<br><a name="SEC9" href="#TOC1">OPTIONS COMPATIBILITY WITH GNU GREP</a><br>
<P>
-Many of the short and long forms of <b>pcre2grep</b>'s options are the same
-as in the GNU <b>grep</b> program. Any long option of the form
-<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
-(PCRE2 terminology). However, the <b>--depth-limit</b>, <b>--file-list</b>,
-<b>--file-offsets</b>, <b>--heap-limit</b>, <b>--include-dir</b>,
-<b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>, <b>-M</b>,
-<b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--om-separator</b>,
-<b>--output</b>, <b>-u</b>, <b>--utf</b>, <b>-U</b>, and <b>--utf-allow-invalid</b>
-options are specific to <b>pcre2grep</b>, as is the use of the
-<b>--only-matching</b> option with a capturing parentheses number.
+Many of the short and long forms of <b>pcre2grep</b>'s options are the same as
+in the GNU <b>grep</b> program. Any long option of the form <b>--xxx-regexp</b>
+(GNU terminology) is also available as <b>--xxx-regex</b> (PCRE2 terminology).
+However, the <b>--case-restrict</b>, <b>--depth-limit</b>, <b>-E</b>,
+<b>--file-list</b>, <b>--file-offsets</b>, <b>--heap-limit</b>,
+<b>--include-dir</b>, <b>--line-offsets</b>, <b>--locale</b>, <b>--match-limit</b>,
+<b>-M</b>, <b>--multiline</b>, <b>-N</b>, <b>--newline</b>, <b>--no-ucp</b>,
+<b>--om-separator</b>, <b>--output</b>, <b>-P</b>, <b>-u</b>, <b>--utf</b>,
+<b>-U</b>, and <b>--utf-allow-invalid</b> options are specific to
+<b>pcre2grep</b>, as is the use of the <b>--only-matching</b> option with a
+capturing parentheses number.
</P>
<P>
Although most of the common options work the same way, a few are different in
<b>pcre2grep</b>. For example, the <b>--include</b> option's argument is a glob
-for GNU <b>grep</b>, but a regular expression for <b>pcre2grep</b>. If both the
-<b>-c</b> and <b>-l</b> options are given, GNU grep lists only file names,
-without counts, but <b>pcre2grep</b> gives the counts as well.
+for GNU <b>grep</b>, but in <b>pcre2grep</b> it is a regular expression to which
+the <b>-i</b> option applies. If both the <b>-c</b> and <b>-l</b> options are
+given, GNU grep lists only file names, without counts, but <b>pcre2grep</b>
+gives the counts as well.
</P>
<br><a name="SEC10" href="#TOC1">OPTIONS WITH DATA</a><br>
<P>
@@ -1065,9 +1116,9 @@ Cambridge, England.
</P>
<br><a name="SEC16" href="#TOC1">REVISION</a><br>
<P>
-Last updated: 21 November 2022
+Last updated: 22 December 2023
<br>
-Copyright &copy; 1997-2022 University of Cambridge.
+Copyright &copy; 1997-2023 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
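
The case-equivalence behaviour that the new -E (--case-restrict) text describes can be illustrated outside pcre2grep. The following is a minimal sketch that uses Python's re module as a stand-in, which is an assumption made purely for illustration: Python's re.ASCII flag is only a rough analogue of PCRE2's case restriction, not the same feature, but for the two characters named in the text (U+212A and U+017F) it shows the same contrast. The helper name caseless_match is hypothetical and appears nowhere in PCRE2.

```python
import re

KELVIN_SIGN = "\u212a"  # U+212A, caselessly equivalent to ASCII "k"/"K"
LONG_S = "\u017f"       # U+017F, caselessly equivalent to ASCII "s"/"S"


def caseless_match(pattern: str, subject: str, restrict: bool = False) -> bool:
    """Return True if pattern matches the whole subject, ignoring case.

    With restrict=True, re.ASCII is added so that caseless matching is
    ASCII-only. This is an analogue of --case-restrict for illustration,
    not an implementation of it.
    """
    flags = re.IGNORECASE | (re.ASCII if restrict else 0)
    return re.fullmatch(pattern, subject, flags) is not None


if __name__ == "__main__":
    for name, char, letter in (("Kelvin sign", KELVIN_SIGN, "k"),
                               ("long s", LONG_S, "s")):
        print(name,
              "default:", caseless_match(letter, char),
              "restricted:", caseless_match(letter, char, restrict=True))
```

Under the default Unicode caseless rules the ASCII letter matches its non-ASCII equivalent; with the restriction in place it matches only its ASCII upper and lower case forms, which is the effect the option text describes for pcre2grep.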