pdfgrep 1.4.0 is now available and contains many improvements and new features. Thanks to everyone who helped with this release!
Here is an overview of the changes:
New regex implementations
pdfgrep finally supports searching for fixed strings as well as Perl-compatible regular expressions (PCRE). PCRE allows for much more complex searches:
pdfgrep -P "(a|b)c\1" foo.pdf
But also simpler ones, such as searching for the literal string .*:
pdfgrep -F ".*" foo.pdf
More grep compatibility
The --null and --only-matching switches from grep have found their way into pdfgrep. The first in particular allows for more robust scripting.
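As a rough illustration of why --null helps in scripts, here is a minimal bash sketch. It assumes that pdfgrep follows grep's behaviour, i.e. that with --null the file name prefix (forced here with -H) ends in a NUL byte instead of the usual separator. The function name print_matches is made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every match as "<file> => <text>".
# Assumption: with --null, pdfgrep writes a NUL byte after the file name,
# as grep does, so file names containing ":" cannot be misparsed.
print_matches() {
    local pattern=$1
    shift
    local file line
    pdfgrep -H --null "$pattern" "$@" |
        while IFS= read -r -d '' file && IFS= read -r line; do
            # "file" is everything up to the NUL, "line" the rest of the line
            printf '%s => %s\n' "$file" "$line"
        done
}
```

It could then be called as: print_matches foo *.pdf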
Usability improvements
pdfgrep now optionally prints a warning (with --warn-empty) if a PDF file contains no searchable text. This prevents surprises when searching, for example, scanned documents, which usually consist only of images although they appear to contain text.
You can now change the prefix separator to something else with --match-prefix-separator:
$ pdfgrep -n --match-prefix-separator "|" foo foo.pdf
foo.pdf|4|foobar
This is especially useful if your filenames frequently contain colons, as is the case under Windows.
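With an unambiguous separator, the output becomes easy to split in a script. The following bash sketch reads lines in the file|page|text form shown above and reformats them. The function name format_matches is made up for this example, and it assumes that "|" does not occur in the file names themselves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "file|page|text" lines into "file (page N): text".
# Assumption: the file name contains no "|"; the matched text may contain it,
# because the last variable given to read receives the rest of the line.
format_matches() {
    local file page text
    while IFS='|' read -r file page text; do
        printf '%s (page %s): %s\n' "$file" "$page" "$text"
    done
}
```

It could then be used as: pdfgrep -n --match-prefix-separator "|" foo foo.pdf | format_matches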
Also, it is now possible to search multiple PDFs encrypted with different passwords by passing more than one --password argument to pdfgrep. Each password will be tried on each PDF.
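If the list of passwords is long, the repeated --password arguments can be built by a small wrapper. This is only a sketch: the function name search_encrypted and the idea of keeping one password per line in a text file are assumptions made for this example, not something pdfgrep provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: search_encrypted PATTERN PASSWORD_FILE PDF...
# Reads one password per line from PASSWORD_FILE (an assumed convention)
# and passes each one to pdfgrep as a separate --password argument.
search_encrypted() {
    local pattern=$1 pwfile=$2
    shift 2
    local args=() pw
    # "|| [ -n "$pw" ]" also picks up a last line without trailing newline
    while IFS= read -r pw || [ -n "$pw" ]; do
        if [ -n "$pw" ]; then
            args+=(--password "$pw")
        fi
    done < "$pwfile"
    pdfgrep "${args[@]}" "$pattern" "$@"
}
```

It could then be called as: search_encrypted foo passwords.txt a.pdf b.pdf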