pdfgrep 1.4.0 is now available and contains many improvements and new features. Thanks to everyone who helped with this release!
Here is an overview of the changes:
New regex implementations
pdfgrep finally supports searching for fixed strings as well as Perl-compatible regular expressions (PCRE). PCRE allows for much more complex searches:
pdfgrep -P "(a|b)c\1" foo.pdf
Fixed strings, on the other hand, make simple searches easier, such as looking for the literal string .* without escaping it:
pdfgrep -F ".*" foo.pdf
More grep compatibility
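The --null and --only-matching switches from grep have found their way into pdfgrep. The first one in particular makes scripting more robust: it terminates file names with a NUL byte, which, unlike a colon or a newline, cannot occur in a file name.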
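To illustrate the scripting use, here is a minimal Bash sketch that collects the PDFs below a directory whose text matches a pattern. It assumes that, as with grep, combining --null with --files-with-matches (-l) ends each listed name with a NUL byte instead of a newline; the function name list_matching_pdfs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every PDF below DIR whose text matches PATTERN.
# Assumes pdfgrep terminates each name printed by --files-with-matches with a
# NUL byte when --null is given (as grep does), so names containing spaces,
# colons or even newlines are read back intact.
list_matching_pdfs() {
    local pattern=$1
    local dir=${2:-.}
    local file
    # read -d '' splits the input on NUL bytes instead of newlines.
    while IFS= read -r -d '' file; do
        # Replace this with the real per-file work (copying, indexing, ...).
        printf '%s\n' "$file"
    done < <(pdfgrep --recursive --files-with-matches --null -- "$pattern" "$dir")
}
```

Called as list_matching_pdfs invoice ~/Documents, it prints one matching file per line. The --only-matching switch, in turn, prints only the matched text instead of the whole line, e.g. pdfgrep --only-matching -P "\d{4}-\d{2}-\d{2}" report.pdf to pull out dates.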
Usability improvements
pdfgrep now optionally prints a warning (with --warn-empty) if a PDF file contains no searchable text. This prevents surprises when searching, e.g., scanned documents, which usually consist only of images even though they appear to contain text.
You can now change the separator that follows the file name and page number (a colon by default) with --match-prefix-separator:
$ pdfgrep -n --match-prefix-separator "|" foo foo.pdf
foo.pdf|4|foobar
This is especially useful if your filenames frequently contain colons, as is the case on Windows.
Also, it is now possible to search multiple PDFs encrypted with different passwords by passing more than one --password argument to pdfgrep. Each password will be tried on each PDF.
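When the passwords are kept in a file, a small wrapper can turn each line into its own --password option. The Bash sketch below assumes a plain-text file with one password per line; search_encrypted is a hypothetical name, not part of pdfgrep.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: search PDFs protected by different passwords.
# Usage: search_encrypted PATTERN PASSWORD_FILE FILE...
# Each non-empty line of PASSWORD_FILE becomes a separate --password option;
# pdfgrep then tries every one of them on every PDF.
search_encrypted() {
    local pattern=$1
    local pwfile=$2
    shift 2                       # the remaining arguments are the PDF files
    local -a args=()
    local pw
    # The "|| [ -n ... ]" test also keeps a final line without a trailing newline.
    while IFS= read -r pw || [ -n "$pw" ]; do
        [ -n "$pw" ] && args+=(--password "$pw")
    done < "$pwfile"
    pdfgrep "${args[@]}" -- "$pattern" "$@"
}
```

For example, search_encrypted invoice passwords.txt *.pdf tries every password from passwords.txt on every PDF. Keep in mind that options given on the command line may be visible to other users on the same machine, e.g. through ps.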