Font-locking with custom matchers

Date        Change
2017-06-21  The regexp was fixed to match $FOO_BAR and to skip escaped dollars (see this PR).

Previously I wrote about font-lock anchored matchers. Today I came across another problem, and another solution, involving Emacs's font-lock mechanism. This trend is starting to annoy me.

I now work in a DevOps position, so lately I have been writing a lot of shell scripts to glue together all the stuff we do. One missing feature kept bugging me: the font-locking of interpolated variables in sh-mode… that is, the fact that there wasn't any!

FOO="hello"
# ugly brown!
BAR="hello $FOO"
# I would really like to see that $FOO is *not* being interpolated
BAZ='hello $FOO'
# in regular usage things are highlighted, but you should always quote, right?
bash $FOO $BAR ${BAZ}

The problem with shell programming is that you quote 90% of the time, so the existing font-locking is wasted :/ You might say: well, just throw in a regexp, right? Not quite. We do not want to highlight variables inside single-quoted strings, where they are not interpolated. This means we need to match variables only in a certain syntactic context, namely inside double-quoted strings.
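To make the distinction concrete, here is a small sketch using the same variables as above. It assumes bash, though any POSIX shell behaves the same here. It shows which occurrence of $FOO the shell actually interpolates:

```shell
#!/usr/bin/env bash
FOO="hello"

# Double quotes: $FOO is expanded to its value.
BAR="hello $FOO"
# Single quotes: $FOO is kept as literal text.
BAZ='hello $FOO'

echo "$BAR"   # prints: hello hello
echo "$BAZ"   # prints: hello $FOO
```

Only the double-quoted $FOO is a real variable reference, so it is the only one that deserves the variable-name face.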

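In the previous post I mentioned that you can use custom matchers. These are regular elisp functions that font-lock calls in place of a regexp, as long as they conform to the interface of re-search-forward. Such a function takes the search limit as its only argument. On success, it sets the match data, moves point past the match and returns non-nil. So that is exactly what I've done.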

The following function is a bit convoluted because it needs to check the parse state. Note that the function should only "process" one match per call. Font-lock calls it in a loop until it returns nil or moves point past the limit.

  1. Look for the next string matching the variable syntax. That is a $ not preceded by a backslash and followed by one of:
    • letters, digits or underscores (so $FOO_BAR is a single variable),
    • text enclosed in {}, or
    • one of the special parameter characters (*, @, #, ?, -, $, !, 0, _).
  2. If there is no match, return nil. This tells the font-lock engine there is nothing more to do up to limit. It happens either when re-search-forward fails right away or when it eventually runs out of matches before limit.
  3. If there is a match, check whether we are inside a double-quoted string.
    • If so, great: announce the match by throwing the point where it ended. Returning the point is not strictly necessary; any non-nil value will do.
    • If not, GOTO 1.
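(require 'dash)       ; provides `-when-let'
(require 'sh-script)  ; provides `sh-mode-syntax-table'

(defun my-match-variables-in-quotes (limit)
  "Match variables in double-quotes in `sh-mode'."
  (with-syntax-table sh-mode-syntax-table
    (catch 'done
      (while (re-search-forward
              ;; `rx' is cool, mkay.
              (rx (or line-start (not (any "\\")))  ; skip escaped \$
                  (group "$")
                  (group
                   (or (and "{" (+? nonl) "}")      ; ${FOO}
                       (and (+ (any alnum "_")))    ; $FOO, $FOO_BAR
                       ;; special parameters: $*, $@, $#, $?, ...
                       (and (any "*" "@" "#" "?" "-" "$" "!" "0" "_")))))
              limit t)
        ;; Element 3 of the parse state is nil outside strings, otherwise
        ;; the char that closes the string (or t for a generic string
        ;; delimiter), so compare with `eq' rather than `='.
        (-when-let (string-syntax (nth 3 (syntax-ppss)))
          (when (eq string-syntax ?\")
            (throw 'done (point))))))))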
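The regexp alternatives follow how the shell itself reads a parameter expansion. The sketch below assumes bash, and the variable names A to D are purely illustrative. It covers the cases behind the 2017-06-21 fix, plus braces and a special parameter:

```shell
#!/usr/bin/env bash
FOO="hello"
FOO_BAR="world"

# "_" is part of a name, so this expands FOO_BAR, not $FOO followed by "_BAR".
A="$FOO_BAR"
# Braces delimit the name explicitly, so only FOO is expanded here.
B="${FOO}_BAR"
# A backslash-escaped dollar is literal, even inside double quotes.
C="\$FOO"
# Special parameters such as $# (number of positional arguments) expand too.
set -- one two
D="$# args"

printf '%s\n' "$A" "$B" "$C" "$D"
# prints:
# world
# hello_BAR
# $FOO
# 2 args
```

Each variable exercises one part of the rx form:

- A uses (+ (any alnum "_")).
- B uses the {...} branch.
- C is rejected by the (not (any "\\")) prefix.
- D uses the special-parameter set.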

To add the support to the current buffer only, pass nil as the first argument. To enable it for sh-mode globally, pass 'sh-mode:

(font-lock-add-keywords
 'sh-mode '((my-match-variables-in-quotes
             ;; OVERRIDE = t: replace the string face that is already there
             (1 'default t)                           ; the "$"
             (2 font-lock-variable-name-face t))))    ; the variable name

It's quite simple, and the outcome is very satisfying; in my opinion it makes reading shell scripts a lot easier. If any of you cares to submit this upstream, go ahead. I have signed the contributor papers, but I hereby withdraw all claims on the above code so you don't have to jump through hoops :)

FOO="hello"
# yay
BAR="hello $FOO and also ${FOO}"
# No interpolation in single-quotes
BAZ='hello $FOO'

Published at: 2017-06-11 20:58 Last updated at: 2023-02-08 15:59