pdfgrep 1.4.0 is now available and contains many improvements and new features. Thanks to everyone who helped with this release!
Here is an overview of the changes:
New regex implementations
pdfgrep finally supports searching for fixed strings as well as Perl-compatible regular expressions (PCRE). This allows for much more complex searches:
pdfgrep -P "(a|b)c\1" foo.pdf
But also simpler ones, such as searching for the literal string ".*":
pdfgrep -F ".*" foo.pdf
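To see why the fixed-string mode matters, here is a minimal sketch that counts matching lines for the same pattern in both modes. It uses plain grep on a temporary text file as a stand-in, on the assumption that -F means the same thing in pdfgrep as it does in grep; the sample contents are made up for illustration.

```shell
# grep stands in for pdfgrep so the sketch runs without a PDF at hand.
sample=$(mktemp)
printf 'a.*b\nplain text\n' > "$sample"

# As a regex, ".*" matches every line, including ones without a dot or star.
regex_count=$(grep -c '.*' "$sample")

# With -F the pattern is taken literally, so only lines containing ".*" match.
fixed_count=$(grep -cF '.*' "$sample")

printf 'regex: %s, fixed: %s\n' "$regex_count" "$fixed_count"
rm -f "$sample"
```

With the two sample lines above, the regex search counts both lines while the fixed-string search counts only the first.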
More grep compatibility
--only-matching switches from grep have found their way into pdfgrep. Especially the first option allows for more robust scripting.
pdfgrep now optionally prints a warning (with --warn-empty) if a PDF file contains no searchable text. This prevents surprises when searching, e.g., scanned documents, which usually consist only of images although they appear to contain text.
You can now change the prefix separator to something else with --match-prefix-separator:
$ pdfgrep -n --match-prefix-separator "|" foo foo.pdf
foo.pdf|4|foobar
This is especially useful if your filenames frequently contain colons, as is the case on Windows.
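A script can then split such output safely even when the filename contains a colon. The following is a small sketch, not part of pdfgrep itself: the sample line and the Windows-style path are hypothetical, and in practice the line would come from a pdfgrep call like the one above.

```shell
# Hypothetical output line in the file|page|text format, with a colon in the path.
line='C:\docs\foo.pdf|4|foobar'

# Split on "|" only. The last variable receives the rest of the line,
# so a "|" inside the matched text is kept intact.
IFS='|' read -r file page text <<EOF
$line
EOF

printf 'file=%s page=%s text=%s\n' "$file" "$page" "$text"
```

Splitting the same line on ":" would have cut the path after the drive letter, which is exactly the problem the custom separator avoids.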
Also, it is now possible to search multiple PDFs encrypted with different passwords by passing more than one --password argument to pdfgrep. Each password will be tried on each PDF.
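When the passwords come from a list, the repeated --password arguments can be built in a loop. This is only a sketch: the passwords, the search term and the file names are all placeholders, and the search itself is skipped unless pdfgrep and both files are present.

```shell
# Placeholder passwords, for illustration only.
set --
for pw in "first-secret" "second-secret"; do
  # Append one --password argument per candidate password.
  set -- "$@" --password "$pw"
done

# "$@" now holds: --password first-secret --password second-secret
if command -v pdfgrep >/dev/null 2>&1 && [ -f a.pdf ] && [ -f b.pdf ]; then
  # Every password is tried on every file, so the order does not have to match.
  pdfgrep "$@" "invoice" a.pdf b.pdf
fi
```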