Becoming a grep Power User
Today I have stepped up my game in grep usage. I have been doing a major
refactoring and I have learned several neat things along the way.
Perl regex, lookahead, and lookbehind
The first thing I wanted to do was to use lookahead and lookbehind in grep. I
have used both of those briefly in Vim, however Vim has its own syntax
for them. It turns out that grep does not support them in its default
(basic and extended) regex syntax. Luckily, grep has a -P flag that enables
Perl regular expressions.
Looking up the Perl syntax for lookahead and lookbehind, I was pleasantly surprised,
as it turned out to be a lot more intuitive than I had expected. For
lookbehind there is (?<=foo)bar and (?<!foo)bar for match and no match,
respectively. Then, if you omit the less-than sign you have lookaheads: foo(?=bar)
and foo(?!bar). I remember the difference by the less-than sign resembling
a left arrow and suggesting left/behind.
Lookbehind and lookahead do exactly what their names say. Using the regexes from the previous paragraph as an example:
- lookbehind will match any bar preceded (?<=) or not preceded (?<!) by foo, and
- lookahead will match any foo followed (?=) or not followed (?!) by bar.
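The four assertions can be tried side by side. This is a minimal sketch, assuming a grep built with PCRE support (the -P flag, as in GNU grep); the three sample words are made up for illustration.

```shell
# Made-up sample input: one candidate word per line.
input='foobar
bazbar
foobaz'

# Positive lookbehind: lines with a "bar" that IS preceded by "foo".
behind_pos=$(printf '%s\n' "$input" | grep -P '(?<=foo)bar')
# Negative lookbehind: lines with a "bar" that is NOT preceded by "foo".
behind_neg=$(printf '%s\n' "$input" | grep -P '(?<!foo)bar')
# Positive lookahead: lines with a "foo" that IS followed by "bar".
ahead_pos=$(printf '%s\n' "$input" | grep -P 'foo(?=bar)')
# Negative lookahead: lines with a "foo" that is NOT followed by "bar".
ahead_neg=$(printf '%s\n' "$input" | grep -P 'foo(?!bar)')

printf '(?<=foo)bar -> %s\n' "$behind_pos"
printf '(?<!foo)bar -> %s\n' "$behind_neg"
printf 'foo(?=bar)  -> %s\n' "$ahead_pos"
printf 'foo(?!bar)  -> %s\n' "$ahead_neg"
```

Without -o, grep prints the whole matching line, which makes it easy to see which word each assertion picked.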
“How is it useful?” you might wonder. This syntax helps to avoid complicated
exclusion patterns with grep -v. Moreover, it goes really well with the -o
(or --only-matching) flag. Let’s roll an example using a use case that I
faced today…
Let’s assume that we want to extract all top-level assignments from a given R file that looks as follows:
SOME_CONSTANT <- list()
my_function <- function() {}
This can be done using a positive lookahead:
grep -P '^[A-Za-z_]+(?= <-)' file.R
This will match any ^[A-Za-z_]+, a variable or function name at the start of a
line, followed by <-, an assignment operator preceded by a space:
SOME_CONSTANT <- list()
my_function <- function() {}
Now we have to pipe it to another grep to retrieve just the names, right?
Wrong! Just add the -o flag to the initial grep and we are good.
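To see the effect of -o, here is a small sketch that assumes a grep with -P support and writes the sample R lines to a temporary file. The indented third line is an extra, hypothetical assignment added to show that only names at the start of a line are picked up.

```shell
# Temporary sample file mirroring the R snippet above, plus one indented line.
file=$(mktemp)
cat > "$file" <<'EOF'
SOME_CONSTANT <- list()
my_function <- function() {}
  nested <- 1
EOF

# -o prints only the matched text; the lookahead checks for " <-"
# without making it part of the match.
names=$(grep -oP '^[A-Za-z_]+(?= <-)' "$file")
printf '%s\n' "$names"

rm -f "$file"
```

This prints SOME_CONSTANT and my_function, one per line, with no second grep needed.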
I have used it to check whether any functions defined in file X are called in
file Y. We already have the hardest part of the recipe written down, so all
that’s left is adding some xargs:
grep -oP '^[A-Za-z_]+(?= <-)' X | xargs -I {} grep '{}(' Y
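The recipe can be tried end to end on throwaway files. This is a sketch under a few assumptions: X.R and Y.R are hypothetical stand-ins for files X and Y, grep supports -P, and -F is added so that the name followed by "(" is searched as a fixed string rather than as a regex.

```shell
dir=$(mktemp -d)

# Hypothetical file X: defines two functions.
cat > "$dir/X.R" <<'EOF'
helper <- function() {}
unused <- function() {}
EOF

# Hypothetical file Y: calls only one of them.
cat > "$dir/Y.R" <<'EOF'
result <- helper()
EOF

# xargs -I {} runs one grep per extracted name.
# xargs exits non-zero when any of those greps finds nothing,
# so "|| true" keeps a strict (set -e) script going.
calls=$(grep -oP '^[A-Za-z_]+(?= <-)' "$dir/X.R" \
  | xargs -I {} grep -F '{}(' "$dir/Y.R" || true)
printf '%s\n' "$calls"

rm -rf "$dir"
```

Note that this is a substring search, so a name such as helper would also match a call to another_helper(.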
pcregrep -M for multiline matching
The next use case that I had was listing all unexported functions from a file. An exported function in R has an annotation on the line above it:
#' @export
my_function <- function() {}
unexported_function <- function() {}
Using a lookbehind immediately came to my mind, so I eagerly ran the following command:
grep -oP '(?<!@export\n)[A-Za-z_]+(?= <-)'
Too bad… this won’t work. By default grep matches one line at a time, so it
does not support multiline matching. However, pcregrep does! pcregrep is mostly
analogous to grep -P but it has an -M flag which enables multiline matching.
Knowing this, I updated the last command and ran it:
pcregrep -oM '(?<!@export\n)[A-Za-z_]+(?= <-)'
For some reason it still did not work! I found out that moving \n out
of the lookbehind makes it work:
pcregrep -oM '(?<!@export)\n[A-Za-z_]+(?= <-)'
It is not ideal, as it omits a function on the very first line of the file if one is present (and I have just realized this while writing this post!), but it definitely sufficed.
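One way around the first-line gap is to swap the lookbehind for a tool that remembers the previous line. The sketch below uses awk instead of pcregrep, so it is a different technique rather than a fix to the command above; the sample file and the first_function name are hypothetical.

```shell
# Hypothetical sample file with an unexported function on the very first line.
file=$(mktemp)
cat > "$file" <<'EOF'
first_function <- function() {}
#' @export
my_function <- function() {}
unexported_function <- function() {}
EOF

# For every top-level assignment, print the name unless the previous
# line contains @export. "prev" is empty on the first line, so a
# function defined there is reported too.
unexported=$(awk '
  /^[A-Za-z_]+ <-/ && prev !~ /@export/ { print $1 }
  { prev = $0 }
' "$file")
printf '%s\n' "$unexported"

rm -f "$file"
```

This prints first_function and unexported_function, and skips my_function because of the @export annotation above it.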
Edit matched files with Vim
Quite often, grep goes hand in hand with editing its matches for me. Vim being my
editor of choice and grep having the -l (or --files-with-matches) flag makes it a
breeze! Simply running:
vim $(grep -lr 'foo')
will descend directories and open all files matching the pattern as buffers in
Vim. You can add a -p, -o, or -O flag to vim to open those files as
tabs, splits, or vertical splits, respectively. However, if numerous files are
matched, it is best to know how to use Vim’s buffers or to perform a grep
dry-run first to not run into trouble!
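A dry run can be as simple as listing and counting the matched files before handing them to Vim. The following sketch builds a throwaway directory tree (all file names are made up) and leaves the actual vim call commented out.

```shell
# Throwaway tree: two files contain "foo", one does not.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'foo\n' > "$dir/a.txt"
printf 'bar\n' > "$dir/b.txt"
printf 'foo again\n' > "$dir/sub/c.txt"

# Dry run: see which files match and how many there are.
matches=$(grep -lr 'foo' "$dir" | sort)
count=$(grep -lr 'foo' "$dir" | wc -l)
printf '%s\n' "$matches"
echo "files that would be opened: $count"

# Once the list looks reasonable, open the files, e.g. as tabs:
# vim -p $(grep -lr 'foo' "$dir")

rm -rf "$dir"
```

Because the command substitution is split on whitespace, this pattern assumes file names without spaces.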
Other grep flags I find useful
- -A n, -B n - print n lines after or before the matched line
- -h - omit file names in output
- -i - case insensitive search
- -I - omit binary files
- -n - number lines
- -r - descend directories recursively
- -R - like -r but follows symlinks
- -v - select not matching lines
- --exclude, --exclude-dir, --include - self-explanatory, especially useful with library folders (vendor for Go, renv for R)
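Several of these flags combine naturally. As a sketch, assuming a grep that supports --include and --exclude-dir (GNU grep does), the following searches a made-up Go project for TODO notes while skipping its vendor folder.

```shell
# Made-up project layout with a vendor folder that should be skipped.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/vendor"
printf 'package main\n// TODO: refactor\nfunc main() {}\n' > "$dir/src/main.go"
printf '// todo: upstream\n' > "$dir/vendor/lib.go"

# -r recurse, -i ignore case, -n number lines, -h omit file names;
# only look at *.go files and never descend into vendor/.
hits=$(grep -rinh --include='*.go' --exclude-dir=vendor 'todo' "$dir")
printf '%s\n' "$hits"

rm -rf "$dir"
```

Only the match from src/main.go is printed, prefixed with its line number.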
Summary
Command line tools come with unparalleled flexibility and I feel like
grep in particular is the backbone of those tools. I find myself using it on a
daily basis, and having just learned new ways of using it makes me look forward
to using it in the future!