News

pdfgrep 1.4.0 released

by Hans-Peter Deifel on August 14, 2015

pdfgrep 1.4.0 is now available and contains many improvements and new features. Thanks to everyone who helped with this release!

Here is an overview of the changes:

New regex implementations

pdfgrep finally supports searching for fixed strings as well as Perl-compatible regular expressions (PCRE). The latter make much more complex searches possible:

pdfgrep -P "(a|b)c\1" foo.pdf

Fixed strings make simpler searches easier too, such as searching for the literal string .*:

pdfgrep -F ".*" foo.pdf
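To illustrate what the two example patterns match, here is a rough analogy using Python's re module, whose backreference syntax agrees with PCRE for this pattern. pdfgrep itself uses the PCRE library, not Python, and the sample strings are made up.

```python
import re

# -P example: (a|b) captures one letter, "c" must follow, and \1 must
# repeat exactly the letter that was captured.
backref = re.compile(r"(a|b)c\1")
print(bool(backref.search("aca")))  # True: captured "a" is repeated
print(bool(backref.search("bcb")))  # True: captured "b" is repeated
print(bool(backref.search("acb")))  # False: "b" differs from captured "a"

# -F example: a fixed string is matched literally. Escaping the pattern
# gives the same effect with a regex engine.
literal = re.compile(re.escape(".*"))
print(bool(literal.search("match .* here")))  # True
print(bool(literal.search("no wildcard")))    # False

# Unescaped, ".*" is a regex that matches every line.
print(bool(re.search(".*", "no wildcard")))   # True
```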

More grep compatibility

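The --null and --only-matching switches from grep have found their way into pdfgrep. --only-matching prints only the matched parts of a line instead of the whole line. --null in particular makes scripting more robust: it ends each file name with a NUL byte, which cannot occur in a file name, instead of a colon, which can.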
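A script consuming --null output can split each line at the first NUL byte without worrying about colons in the file name or the matched text. The Python sketch below assumes each output line consists of the file name, a NUL byte, and then the rest of the line; the parse_null_output helper and the sample input are hypothetical, not captured pdfgrep output.

```python
def parse_null_output(data: bytes) -> list[tuple[str, str]]:
    """Split --null style output into (file name, rest of line) pairs.

    Assumes one result per line, with a NUL byte directly after the file name.
    """
    results = []
    for line in data.splitlines():
        if not line:
            continue
        # partition() splits at the first NUL only, so colons in the file
        # name or in the matched text cannot confuse the parser.
        name, sep, rest = line.partition(b"\0")
        if not sep:
            raise ValueError(f"no NUL separator in line: {line!r}")
        results.append((name.decode(), rest.decode()))
    return results


# Hypothetical input: a file name containing a colon, and a match containing one.
sample = b"report: 2015.pdf\0foobar\nnotes.pdf\0foo: baz\n"
for name, text in parse_null_output(sample):
    print(name, "->", text)
```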

Usability improvements

pdfgrep now optionally prints a warning (with --warn-empty) if a PDF file contains no searchable text. This prevents surprises when searching, e.g., scanned documents, which usually consist only of images even though they appear to contain text.

You can now replace the separator between the fields of the match prefix (a colon by default) with --match-prefix-separator:

$ pdfgrep -n --match-prefix-separator "|" foo foo.pdf
foo.pdf|4|foobar

This is especially useful if your file paths frequently contain colons, as is the case on Windows (e.g. drive letters such as C:).
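As a sketch of the scripting problem, the Python below splits one result line into file name, page number, and matching text, following the layout of the -n example above. The split_result helper and the Windows path are hypothetical.

```python
def split_result(line: str, sep: str = ":") -> tuple[str, int, str]:
    """Split a `pdfgrep -n` style line into (file, page, text).

    Splitting from the left with maxsplit=2 keeps separators inside the
    matched text intact, but breaks if the file name contains the separator.
    """
    name, page, text = line.split(sep, 2)
    return name, int(page), text


# Default ":" separator with a Windows path: the drive letter's colon is
# mistaken for the end of the file name, so int() fails on the next field.
try:
    split_result(r"C:\docs\foo.pdf:4:foobar")
except ValueError as err:
    print("cannot parse:", err)

# With --match-prefix-separator "|", the fields split cleanly.
print(split_result(r"C:\docs\foo.pdf|4|foobar", sep="|"))
```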

Also, it is now possible to search multiple PDFs encrypted with different passwords by passing more than one --password argument to pdfgrep. Each password will be tried on each PDF.
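The password handling can be pictured as a simple loop: for every file, try the supplied passwords in turn until one opens it. The Python sketch below illustrates that idea only; find_password is hypothetical, try_open stands in for a PDF library's decryption call, and the files and passwords are made up.

```python
from typing import Callable, Optional, Sequence


def find_password(path: str, passwords: Sequence[str],
                  try_open: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first password that opens `path`, or None if none does."""
    for password in passwords:
        if try_open(path, password):
            return password
    return None


# Hypothetical stand-in: each file accepts exactly one password.
known = {"a.pdf": "secret1", "b.pdf": "secret2"}


def fake_try_open(path: str, password: str) -> bool:
    return known.get(path) == password


passwords = ["secret1", "secret2"]  # e.g. two --password arguments
for pdf in ["a.pdf", "b.pdf", "c.pdf"]:
    print(pdf, "->", find_password(pdf, passwords, fake_try_open))
```

Passing the passwords as a sequence rather than a one-shot iterator lets the same list be reused for every file.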