After a year of waiting, pdfgrep 2.1.0 has finally been released. The tarball can be downloaded from the download page. As always, thanks to everyone who helped with this release.
This release is packed with new features that bring pdfgrep closer to parity with GNU grep:
These two related options open up new possibilities in scripting. Since they return only file names, without page numbers or matched text, their output can be used as input for other programs or even for pdfgrep itself. As such, they are especially useful in combination with
For example, to search for PDFs in the current directory that don’t contain “foo” but contain “bar”, run:
pdfgrep -Z --files-without-match "foo" *.pdf | xargs -0 pdfgrep -H bar
pdfgrep -RilZ rilz | fzf --read0 --print0 | xargs -0 evince
The --page-range option limits the search to certain pages. For example, to search for a PDF that contains “foo” on its title page, run:
pdfgrep --page-range 1 foo *.pdf
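In scripts, the same option can be wrapped in a small helper. The sketch below is only an illustration: the function name search_first_pages is made up, and it assumes that --page-range accepts a range written as FIRST-LAST (such as 1-3) in addition to a single page number, so check the man page of your pdfgrep version before relying on it.

```shell
# Hypothetical helper: search only the first N pages of each given PDF.
# Assumption: --page-range accepts "FIRST-LAST" ranges such as "1-3".
search_first_pages() {
    pages=$1      # number of leading pages to search
    pattern=$2    # pattern to look for
    shift 2       # the remaining arguments are the PDF files
    pdfgrep --page-range "1-$pages" "$pattern" "$@"
}

# Example call (not executed here):
#   search_first_pages 3 foo *.pdf
```

Keeping the range in one place makes it easy to change how many pages count as the "front matter" of a document.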
Since its first release, pdfgrep could only search for a single pattern. While it is possible to combine multiple search strings into a single regular expression using the | operator, this is fiddly to do in scripts. Now there are better options (pun intended).
The --regexp argument can be specified multiple times, and --file allows providing a list of patterns in a file. Both can be mixed, and all patterns are combined implicitly with
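As a rough sketch of how the two options might be mixed in a script: the file name patterns.txt and the patterns themselves are made up for illustration, and the guard around the pdfgrep call is only there so that the snippet does nothing harmful on a machine without pdfgrep or without PDFs in the current directory.

```shell
# Write one pattern per line into a (hypothetical) pattern file.
printf '%s\n' 'foo' 'ba[rz]' > patterns.txt

# Only search if pdfgrep is installed and the directory contains PDFs.
if command -v pdfgrep >/dev/null 2>&1 && ls ./*.pdf >/dev/null 2>&1; then
    # One pattern comes from the command line, two more from the file.
    pdfgrep --regexp 'qux' --file patterns.txt ./*.pdf \
        || echo "no match (or an error occurred)"
fi
```

Generating the pattern file from a script avoids having to escape and join the individual search strings by hand.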