Becoming a grep Power User

Today I stepped up my grep game. I have been doing a major refactoring and learned several neat things along the way.

Perl regex, lookahead, and lookbehind

The first thing I wanted to do was use lookahead and lookbehind in grep. I had used both briefly in Vim, but Vim has its own syntax for them. It turns out that grep's default regular expressions do not support them. Luckily, GNU grep has a -P flag that enables Perl-compatible regular expressions.

Looking up the Perl syntax for lookahead and lookbehind, I was pleasantly surprised: it turned out to be a lot more intuitive than I had expected. For lookbehind there are (?<=foo)bar and (?<!foo)bar for match and no match, respectively. If you omit the less-than sign, you get lookaheads: foo(?=bar) and foo(?!bar). I remember the difference by the less-than sign resembling a left arrow, suggesting left/behind.

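Lookbehind and lookahead do exactly what their names say. Using the regexes from the previous paragraph as an example: (?<=foo)bar matches bar only when it is preceded by foo, (?<!foo)bar matches bar only when it is not preceded by foo, foo(?=bar) matches foo only when it is followed by bar, and foo(?!bar) matches foo only when it is not followed by bar. In every case the text inside the parentheses is only checked, never included in the match.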
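To see the four variants side by side, here is a small sketch. It assumes GNU grep built with PCRE support (the -P flag); the three sample words and the variable names are made up for illustration.

```shell
# Three made-up sample words, one per line.
words=$(printf '%s\n' foobar bazbar foobaz)

# Positive lookbehind: "bar" only when preceded by "foo".
lookbehind_pos=$(printf '%s\n' "$words" | grep -P '(?<=foo)bar')
# Negative lookbehind: "bar" only when NOT preceded by "foo".
lookbehind_neg=$(printf '%s\n' "$words" | grep -P '(?<!foo)bar')
# Positive lookahead: "foo" only when followed by "bar".
lookahead_pos=$(printf '%s\n' "$words" | grep -P 'foo(?=bar)')
# Negative lookahead: "foo" only when NOT followed by "bar".
lookahead_neg=$(printf '%s\n' "$words" | grep -P 'foo(?!bar)')

# Lookarounds are zero-width: with -o only "foo" is printed, not "foobar".
only_match=$(printf 'foobar\n' | grep -oP 'foo(?=bar)')

printf 'positive lookbehind: %s\n' "$lookbehind_pos"
printf 'negative lookbehind: %s\n' "$lookbehind_neg"
printf 'positive lookahead:  %s\n' "$lookahead_pos"
printf 'negative lookahead:  %s\n' "$lookahead_neg"
printf 'with -o:             %s\n' "$only_match"
```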

“How is it useful?” you might wonder. This syntax helps to avoid complicated exclusion patterns with grep -v. Moreover, it goes really well with the -o (or --only-matching) flag. Let’s roll an example using a use case that I faced today…

Let’s assume that we want to extract all top-level assignments from a given R file that looks as follows:

SOME_CONSTANT <- list()

my_function <- function() {}

This can be done with a positive lookahead:

grep -P '^[A-Za-z_]+(?= <-)' file.R

This matches ^[A-Za-z_]+, a variable or function name at the start of a line, but only when it is followed by a space and the assignment operator <-. grep prints the matching lines in full:

SOME_CONSTANT <- list()
my_function <- function() {}

Now we have to pipe it to another grep to retrieve just the names, right? Wrong! Just add the -o flag to the initial grep and we are good.
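Here is a runnable sketch of the difference -o makes, assuming GNU grep with -P. It uses a slightly extended copy of the sample file: the indented local_var line is my addition, there only to show that nested assignments are skipped because of the ^ anchor.

```shell
# Write a sample R file to a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/file.R" <<'EOF'
SOME_CONSTANT <- list()

my_function <- function() {
  local_var <- 1
}
EOF

# Without -o, grep prints the matching lines in full.
full_lines=$(grep -P '^[A-Za-z_]+(?= <-)' "$tmp/file.R")

# With -o, only the matched text is printed. The lookahead is zero-width,
# so the " <-" part is checked but left out of the output.
names=$(grep -oP '^[A-Za-z_]+(?= <-)' "$tmp/file.R")

printf '%s\n' "$names"
rm -r "$tmp"
```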

I used this to check whether any functions defined in file X are called in file Y. We already have the hardest part of the recipe written down, so all that’s left is adding some xargs:

grep -oP '^[A-Za-z_]+(?= <-)' X | xargs -I {} grep '{}(' Y
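To try the pipeline without touching real code, here is a sketch with two throwaway files. The file names X.R and Y.R and the helper_a / helper_b functions are hypothetical, and GNU grep with -P is assumed.

```shell
tmp=$(mktemp -d)

# X.R defines two functions (hypothetical names).
cat > "$tmp/X.R" <<'EOF'
helper_a <- function(x) x + 1
helper_b <- function(x) x * 2
EOF

# Y.R calls only one of them.
cat > "$tmp/Y.R" <<'EOF'
result <- helper_a(1)
print(result)
EOF

# For every name defined in X.R, search Y.R for "name(".
# xargs returns non-zero when any grep finds nothing, hence "|| true".
calls=$(grep -oP '^[A-Za-z_]+(?= <-)' "$tmp/X.R" \
  | xargs -I {} grep '{}(' "$tmp/Y.R" || true)

printf '%s\n' "$calls"
rm -r "$tmp"
```

Only the helper_a call shows up, since helper_b is never called in Y.R.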

pcregrep -M for multiline matching

The next use case I had was listing all unexported functions in a file. An exported function in R has an annotation on the line above it:

#' @export
my_function <- function() {}

unexported_function <- function() {}

Using a lookbehind immediately came to mind, so I eagerly ran the following command:

grep -oP '(?<!@export\n)[A-Za-z_]+(?= <-)'

Too bad… this won’t work. grep matches one line at a time, so a pattern can never see the previous line. However, pcregrep can! pcregrep is mostly analogous to grep -P, but it has an -M flag that enables multiline matching. Knowing this, I updated the last command and ran it:

pcregrep -oM '(?<!@export\n)[A-Za-z_]+(?= <-)'

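It still does not work! As far as I can tell, the culprit is that the pattern is not anchored: when the lookbehind rejects the first letter of an exported name, the regex engine simply retries one character later, where the lookbehind passes, so the rest of the name still matches. Moving \n out of the lookbehind pins the match to the start of a line and makes it work: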

pcregrep -oM '(?<!@export)\n[A-Za-z_]+(?= <-)'

It is not ideal, as it omits a function on the very first line of the file if one is present (and I have just realized this while writing this post!), but it definitely sufficed.
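One possible way around the first-line problem is sketched below. Note that it swaps pcregrep -M for a different technique: GNU grep's -z flag, which treats the whole file as a single record, combined with (?m) so that ^ matches at the start of every line. It assumes a reasonably recent GNU grep with -P, and the first_helper function in the sample file is made up for illustration.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file.R" <<'EOF'
first_helper <- function() {}

#' @export
my_function <- function() {}

unexported_function <- function() {}
EOF

# -z:   read the file as one NUL-terminated record, so \n can be matched.
# (?m): let ^ match at the start of each line inside that record.
# The lookbehind rejects names whose previous line ends with "@export".
# With -z each match is terminated by NUL, so tr turns those into newlines.
unexported=$(grep -Pzo '(?m)(?<!@export\n)^[A-Za-z_]+(?= <-)' "$tmp/file.R" \
  | tr '\0' '\n')

printf '%s\n' "$unexported"
rm -r "$tmp"
```

Because ^ also matches at the very beginning of the file, a function on the first line is no longer skipped.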

Edit matched files with Vim

Quite often, grepping goes hand in hand with editing the matches. With Vim as my editor of choice and grep's -l (or --files-with-matches) flag, this is a breeze! Simply run:

vim $(grep -lr 'foo')

This descends into directories and opens all files matching the pattern as buffers in Vim. You can add a -p, -o, or -O flag to vim to open those files as tabs, splits, or vertical splits, respectively. However, if numerous files are matched, it is best to know how to use Vim’s buffers or to perform a grep dry run first to avoid trouble!
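A dry run can be as simple as listing and counting the files before handing them to Vim. The sketch below builds a throwaway directory (all file names are made up) so it can be run anywhere; the actual vim call is left commented out.

```shell
# Throwaway directory with two matching files and one non-matching file.
tmp=$(mktemp -d)
mkdir "$tmp/sub"
printf 'foo here\n' > "$tmp/a.txt"
printf 'nothing\n'  > "$tmp/b.txt"
printf 'more foo\n' > "$tmp/sub/c.txt"

# Dry run: which files would be opened, and how many?
matches=$(grep -lr 'foo' "$tmp" | sort)
count=$(grep -lr 'foo' "$tmp" | wc -l | tr -d ' ')

printf '%s file(s) would be opened:\n%s\n' "$count" "$matches"

# Happy with the list? Then open the files, e.g. as tabs:
# vim -p $(grep -lr 'foo' "$tmp")

rm -r "$tmp"
```

Keep in mind that the unquoted $(...) is split on whitespace, so file names containing spaces will not survive this form.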

Other grep flags I find useful

Summary

Command line tools come with unparalleled flexibility, and I feel that grep in particular is the backbone of those tools. I find myself using it on a daily basis, and having just learned new ways of using it makes me look forward to using it even more!