Redis and Node Part 2: Shared State


This is part two of a four-part series on using Redis with Node.js. If you're wondering how we got here, take a look at Redis and Node Part 1: The Basics. The content of these posts is partially adapted from my book, Advanced Microservices. There is also a companion presentation, Node, Redis, and You!, which I've given at several Meetups and a conference.

If we only ever had a single instance of Node running, and if we had no risk of our application crashing, then we might not need something like Redis. However, we absolutely have a risk of our application code crashing, and we absolutely want to be able to run multiple application instances, both for stability and to handle increasing traffic. A typically safe assumption is that the software you write for a project (e.g. a Node app) is less stable than Open Source software used by thousands of companies (e.g. Redis). With this assumption we try to push state out of our code whenever possible.

Simple Visitor Counter

In this post we're going to be looking at a very simple application, one which counts the number of visitors to a website and prints this value to the user.

As a naïve approach we first decide to keep this value in memory. After all, Node is a long-running process, and a variable created outside of the request callback is shared across requests. As you can see, with each request we increment the counter variable and send the result back to the user.

We're also sending back the Process ID (PID). For now this PID is always the same as we always have a single Node instance running. Later this value will be more interesting.

const http = require('http');

let counter = 0;
http.createServer((req, res) => {
  res.end(JSON.stringify({
    counter: ++counter,
    pid: process.pid
  }));
}).listen(9999);

With each new request we make to our server (achieved here by issuing curl requests) we see that the counter value properly increments in each case.

$ curl http://localhost:9999
{"counter":1,"pid":20000}
$ curl http://localhost:9999
{"counter":2,"pid":20000}
$ curl http://localhost:9999
{"counter":3,"pid":20000}

Let's now look at a more complex situation.

Visitor Counter with Cluster

The application we've written today will one day need to scale. Turns out people are really interested in finding out what their visitor number is for our super awesome website. The way we scale Node applications is to run multiple instances of Node processes. This is a common approach used by many different technologies.

Here is a diagram of how this works:

Simple Load Balancer

Here we see that a User Request enters the concern of our application. The request first hits a Load Balancer (LB). It is the job of an LB to keep track of appropriate application instances to route the requests to. In this case there are two instances running, #1 and #2, and so the request is then routed to one of them.

Sometimes we use dedicated software as a Load Balancer, such as HAProxy or Nginx. However, for the sake of a simple blog post, we'll make use of the cluster module which is built into Node. The way this module works is that when we first execute a Node script, that script acts as the Master (meaning it plays the role of the LB). Within this master we can then choose to fork some more processes. Those processes will not be master; they are the workers. HTTP requests made to the Master are then handed to the workers using a pattern called Round Robin. This means each new request goes to the next process in the list, wrapping back to the first after the last.
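
The rotation itself can be sketched in a few lines. This is only an illustration of the Round Robin idea, not how the cluster module is implemented internally; createRoundRobin and the worker labels are made-up names for this example.

```javascript
// Hypothetical helper: returns a function that hands out workers in rotation.
function createRoundRobin(workers) {
  let next = 0;
  return function pick() {
    const worker = workers[next];
    // Advance the cursor, wrapping back to the start after the last worker.
    next = (next + 1) % workers.length;
    return worker;
  };
}

const pick = createRoundRobin(['worker #1', 'worker #2']);
const order = [pick(), pick(), pick(), pick()];
console.log(order);
```

With two workers, consecutive requests simply alternate between them.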

const cluster = require('cluster');
const http = require('http');

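if (cluster.isMaster) {
  // Master: fork two workers (newer Node releases call this cluster.isPrimary)
  cluster.fork();
  cluster.fork();
} else {
  // Worker: each forked process gets its own copy of this variable
  let counter = 0;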
  http.createServer((req, res) => {
    res.end(JSON.stringify({
      counter: ++counter,
      pid: process.pid
    }));
  }).listen(9999);
}

Unfortunately we've introduced a bug into our application! There are now three Node instances running: one is the Master (which isn't doing too much) and the other two are worker processes. Each worker process has its own counter variable. As requests are made we cycle through the two workers, which causes the counter to appear to increment only half of the time.

Also notice in the output how the PID changes as we hit each process. The same PID will never report the same counter value twice and is always incrementing. Everything is behaving as it should, just not as we want it to.

$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":1,"pid":20002}
$ curl http://localhost:9999
{"counter":2,"pid":20001}
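
The same behavior can be reproduced in a single process, which makes the cause easier to see. In this rough sketch each "worker" is a closure with a private counter, standing in for a forked process with its own memory; createWorker and handleRequest are hypothetical names, and the PIDs are just the ones from the output above.

```javascript
// Each "worker" owns a private counter, like a forked process owns its memory.
function createWorker(pid) {
  let counter = 0;
  return () => ({ counter: ++counter, pid });
}

const workers = [createWorker(20001), createWorker(20002)];
let turn = 0;

// Route each request to the next worker in rotation.
function handleRequest() {
  const worker = workers[turn];
  turn = (turn + 1) % workers.length;
  return worker();
}

const responses = [handleRequest(), handleRequest(), handleRequest()];
console.log(responses.map((r) => JSON.stringify(r)).join('\n'));
```

No worker ever sees the other's counter, so the total number of visitors is not stored anywhere.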

Now let's see how we can make use of Redis to fix this catastrophe.

Visitor Counter with Redis and Cluster

In this example we're still going to use the same cluster module, but we're now including the Redis library and offloading the counting to Redis. Previously we kept the counter value in Node memory. This approach was simple and very fast but had two drawbacks. The first, which we already covered, is that each process keeps its own separate value. The second is that if one of the processes were killed and restarted, its counter value would be lost forever.

Here's a new diagram for our application with shared state in Redis:

Load Balancer Shared State

The requests from the user still enter the system and are distributed to our application instances via round-robin. However with each request the Node application tells Redis to perform an increment against the counter value.

As we saw in Part 1 of this series, Redis treats a missing key as the number 0 when incrementing, so the very first increment returns 1. This is nice because we never have to initialize the value ourselves. Imagine if we were constantly adding and removing application instances: if each instance set the counter to 0 when it started, we would wipe out the value with every instance restart!
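
To make the difference concrete, here is a small sketch that mimics those semantics. FakeStore is a hypothetical in-memory stand-in written for this example, not a Redis client; only its incr method is meant to mirror how Redis handles a missing key.

```javascript
// Hypothetical in-memory stand-in for Redis, for illustration only.
class FakeStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  // Like Redis INCR: a missing key is treated as 0 before incrementing.
  incr(key) {
    const value = (this.data.get(key) || 0) + 1;
    this.data.set(key, value);
    return value;
  }
}

// Safe: instances never initialize the key, they only increment it.
const safe = new FakeStore();
safe.incr('counter'); // instance A serves a request
safe.incr('counter'); // instance B serves a request
const afterSafe = safe.incr('counter'); // a restarted instance serves a request

// Unsafe: every instance resets the key to 0 when it boots.
const unsafe = new FakeStore();
unsafe.set('counter', 0); // instance A starts
unsafe.incr('counter');
unsafe.incr('counter');
unsafe.set('counter', 0); // instance B restarts and wipes the count
const afterUnsafe = unsafe.incr('counter');

console.log({ afterSafe, afterUnsafe });
```

The increment-only version keeps counting across restarts, while the version that resets on startup loses the two earlier visits.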

Here's the code required to get our functioning and scalable user counting application running:

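const cluster = require('cluster');
const http = require('http');
// Callback-style client, as provided by version 3 and earlier of the redis package
const redis = require('redis').createClient();

if (cluster.isMaster) {
  // The master only forks workers; it serves no requests itself
  cluster.fork();
  cluster.fork();
} else {
  http.createServer((req, res) => {
    // INCR adds 1 atomically and returns the new value; a missing key starts at 0
    redis.incr('counter', (error, data) => {
      if (error) {
        res.statusCode = 500;
        return res.end(JSON.stringify({error: 'counter unavailable'}));
      }
      res.end(JSON.stringify({counter: data, pid: process.pid}));
    });
  }).listen(9999);
}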

Now when we make requests, the alternating PIDs show that we're still being routed between multiple application instances. We also see that the counter value is always increasing. GeoCities would be proud:

$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":2,"pid":20002}
$ curl http://localhost:9999
{"counter":3,"pid":20001}
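
Newer releases of the redis package (version 4 and later) return promises from commands such as incr and expect an explicit connect() call before use, so the callback style above may not match the version you have installed. Below is a minimal sketch of a promise-based handler under that assumption. createCounterHandler and fakeStore are hypothetical names; the handler accepts any object with an incr(key) method that returns a promise, which also lets us try it without a Redis server.

```javascript
// Builds a request handler around any store whose incr(key) returns a promise.
function createCounterHandler(store) {
  return async (req, res) => {
    try {
      const counter = await store.incr('counter');
      res.end(JSON.stringify({ counter, pid: process.pid }));
    } catch (error) {
      // If the store is unreachable, report a failure instead of hanging.
      res.statusCode = 500;
      res.end(JSON.stringify({ error: 'counter unavailable' }));
    }
  };
}

// Stand-in store so the handler can be exercised without a Redis server.
let value = 0;
const fakeStore = { incr: async () => ++value };
const handler = createCounterHandler(fakeStore);
```

In a worker process, the handler would be passed to http.createServer(handler).listen(9999), with a connected Redis client supplied in place of fakeStore.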

That's the end of part two; it will be the shortest post in this series. Next we'll look at atomicity concerns. If you found this content useful, please check out my book Advanced Microservices.


Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.