Node.js is Too Inefficient to Monitor Files, So Use Bash Scripts Instead

September 6th, 2014 Permalink

The Node.js ecosystem offers a range of project build managers, scaffolds, and task runners, such as Grunt, Gulp, Yeoman and so forth. A core feature of most of these tools is the ability to watch for file system changes and take some action in response, such as running lint tools, building the project, or restarting a local server to pick up new code. You can also roll your own watcher in Node.js using packages such as node-supervisor or nodemon.

This all works out of the box for small projects. But if your project is even of moderate size, containing a few hundred files that have to be monitored, then a Node.js monitoring tool is just too inefficient. Any tool that makes use of the minimatch package will peg the processor at maximum capacity when monitoring a project of this size - unless you set a very long check interval, which somewhat defeats the point of having automation at all. Anything more than a second or two of response time is annoying and even counterproductive for active development use of these tools. Even the best of the pack, node-supervisor, takes a heavy toll on the processor in these circumstances. It all becomes far worse if you are using a fairly standard Vagrant setup in which your project directory is shared between the host machine and guest VM, with a server and restart monitors running on the guest and your IDE running on the host.

The solution here is simple: don't use Node.js tools to monitor files. Even the lowest-level and least expensive operations available, such as the fs.stat() calls used in node-supervisor, are too costly in this context. For greater efficiency, switch to a simple Bash script built around find. If you have to invoke Node.js code in response to changes, run it directly from the script, and there you have it: problem solved. This approach supports monitoring thousands or tens of thousands of files with minimal processor cost and a much faster response time than is possible with the tools available in Node.js.
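The underlying primitive is cheap: create a reference file with a known timestamp, then let a single find call report anything modified after it. A minimal sketch of the idea, with illustrative paths:

```shell
#!/bin/bash
# Minimal sketch of the technique: touch a reference file, then ask find
# for anything modified after it. Paths here are illustrative.
WATCH_DIR="/tmp/watch-demo"
REF_FILE="/tmp/watch-demo-ref"

mkdir -p "${WATCH_DIR}"
touch "${REF_FILE}"                        # reference point: "now"
sleep 1                                    # ensure a later mtime
echo "body { }" > "${WATCH_DIR}/site.css"  # simulate an edit

# A single inexpensive find call identifies every update since the reference.
UPDATES=$(find "${WATCH_DIR}" -type f -newer "${REF_FILE}")
echo "${UPDATES}"
```

The script below wraps this check in a polling loop.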

Here is an example of an efficient monitoring Bash script that can be tailored to your particular circumstances:

#!/bin/bash
#
# Place this script into your project and run it to monitor specific project
# directories and all their subdirectory contents for file updates.
#
# To make this useful, you will have to:
#
# 1) Adjust the PROJECT_DIR variable as appropriate depending on
#     where this script is located in the project.
#
# 2) Add directories to the MONITOR array.
#
# 3) Add the set of actions to be taken when a change happens, such
#     as running a task, restarting a server process, and so forth.
#
# Note that this script blocks and runs until killed, so you may want to
# launch it as a background task.
#

# The absolute path of the directory containing this script.
DIR="$( cd "$( dirname "$0" )" && pwd)"
# Where is the top level project directory relative to this script?
PROJECT_DIR="${DIR}/.."

# Set up a list of directories to monitor.
MONITOR=()
MONITOR+=( "${PROJECT_DIR}/css" )
MONITOR+=( "${PROJECT_DIR}/js" )
MONITOR+=( "${PROJECT_DIR}/template" )

# This file will be used as a timestamp reference point.
TIMESTAMP_FILE="/tmp/file-monitor-ts"

# The interval in seconds between each check on monitored files.
INTERVAL_SECONDS=1
# How long in the past to set the timestamp on the reference file
# used for comparison. This is probably overkill, but when running
# Vagrant VMs with synced folders you can run into all sorts of
# interesting behavior with regard to updating timestamps.
LOOKBACK_SECONDS=5

# The last set of updates. We keep this for comparison purposes.
# Since the lookback covers multiple cycles of monitoring for changes
# we need to be able to update only if there are fresh changes in
# the present cycle.
LAST_UPDATES=""

# Loop indefinitely. Killing this process is the only way to exit it,
# which is fine, but you may want to add some sort of check on other
# criteria so that it can shut itself down in response to circumstances.
while true ; do
  # OS X has a date command signature that differs significantly from
  # that used in Linux distros.
  if [[ ${OSTYPE} =~ ^darwin ]]; then
    TIMESTAMP=`date +%s`
    TIMESTAMP=$(( ${TIMESTAMP} - ${LOOKBACK_SECONDS} ))
    TIMESTAMP=`date -r ${TIMESTAMP} +%m%d%H%M.%S`
  else
    TIMESTAMP=`date -d "-${LOOKBACK_SECONDS} sec" +%m%d%H%M.%S`
  fi

  # Create or update the reference timestamp file.
  touch -t ${TIMESTAMP} "${TIMESTAMP_FILE}"

  # Identify updates by comparison with the reference timestamp file.
  UPDATES=`find "${MONITOR[@]}" -type f -newer "${TIMESTAMP_FILE}"`

  if [[ "${UPDATES}" ]] ; then
    # Pass the updates through ls or stat in order to add a timestamp for
    # each result. Thus if the same file is updated several times over several
    # monitor cycles it will still trigger when compared to the prior set of
    # updates.
    if [[ ${OSTYPE} =~ ^darwin ]]; then
      UPDATES=`stat -F ${UPDATES}`
    else
      UPDATES=`ls --full-time ${UPDATES}`
    fi

    # Only take action if there are new changes in this monitor cycle.
    if [[ "${UPDATES}" != "${LAST_UPDATES}" ]] ; then

      # Take action here. E.g. run a grunt task, restart a server, or run
      # linting and building operations. For example (task name illustrative):
      #
      # grunt build

    fi
  fi

  LAST_UPDATES="${UPDATES}"
  sleep ${INTERVAL_SECONDS}
done
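As the header comment notes, the script blocks until killed, so the usual pattern is to launch it in the background and keep its PID around for later shutdown. A sketch of that pattern, using a stand-in loop so the example is self-contained; substitute the path to your copy of the monitor script:

```shell
#!/bin/bash
# Background-launch pattern for a blocking monitor. A stand-in loop takes
# the place of the real monitor script so this example is self-contained.
( while true ; do sleep 1 ; done ) > /tmp/monitor.log 2>&1 &
MONITOR_PID=$!

# kill -0 probes the process without signalling it; it exits 0 while the
# process is still alive.
kill -0 ${MONITOR_PID}

# Shut the monitor down when finished and reap it.
kill ${MONITOR_PID}
wait ${MONITOR_PID} 2>/dev/null || true
```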