Node, Redis, and You!

Presented by @tlhunter@mastodon.social

Distributed Systems with Node.js: bit.ly/34SHToF

Beginner Use Case:

Website Visitor Counter

Simple Visitor Counter

  • Naïve approach: I can simply keep data in Node memory!
const http = require('http');

let counter = 0;
http.createServer((req, res) => {
  res.end(JSON.stringify({counter: ++counter, pid: process.pid}));
}).listen(9999);
  • Counter happily increments on every request:
$ curl http://localhost:9999
{"counter":1,"pid":20000}
$ curl http://localhost:9999
{"counter":2,"pid":20000}
$ curl http://localhost:9999
{"counter":3,"pid":20000}
  • Life is perfect, what could ever possibly go wrong?

Visitor Counter with Cluster

  • What happens when we want to scale to many processes?
const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) { // deprecated alias of cluster.isPrimary since Node 16
  cluster.fork(); cluster.fork();
} else {
  let counter = 0;
  http.createServer((req, res) => {
    res.end(JSON.stringify({counter: ++counter, pid: process.pid}));
  }).listen(9999);
}
  • Oh No! Two sets of data! What can we do to fix it?!
$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":1,"pid":20002}
$ curl http://localhost:9999
{"counter":2,"pid":20001}
  • But first, let's learn a little more about Redis.

Introduction to Redis

  • In-Memory Key/Value store with several data structure types
  • Create if not exist, destroy if empty philosophy
  • Much like Node, Redis is single-threaded
  • Communication is simple: you could telnet in and run commands
  • Authentication is disabled by default, so bind to a local interface
  • Can set per-key expiry, or use a global Least Recently Used (LRU) eviction policy
$ brew install redis # sudo apt-get install redis
$ redis-cli
127.0.0.1:6379> SET xyz 'Hello'
OK
127.0.0.1:6379> GET xyz
"Hello"
127.0.0.1:6379> DEL xyz
(integer) 1
127.0.0.1:6379> GET xyz
(nil)
127.0.0.1:6379> QUIT
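
The "simple communication" point can be made concrete. Clients such as redis-cli frame each command as a RESP array of bulk strings, and that framing is small enough to write by hand. The sketch below is illustrative only; encodeCommand is a hypothetical helper, not part of any library.

```javascript
// Encode a command as a RESP array of bulk strings.
// encodeCommand is an illustrative helper, not a library API.
function encodeCommand(...args) {
  const parts = args.map((arg) => {
    const text = String(arg);
    // Bulk string: $<byte length>\r\n<payload>\r\n
    return `$${Buffer.byteLength(text)}\r\n${text}\r\n`;
  });
  // Array header: *<number of elements>\r\n
  return `*${parts.length}\r\n${parts.join('')}`;
}

// JSON.stringify makes the \r\n separators visible when printed.
console.log(JSON.stringify(encodeCommand('SET', 'xyz', 'Hello')));
// "*3\r\n$3\r\nSET\r\n$3\r\nxyz\r\n$5\r\nHello\r\n"
```

Lengths are counted in bytes rather than characters, which is why the sketch uses Buffer.byteLength.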

Redis Data Structures

  • Strings - Simple key/value pair
    • Can Increment/Decrement numeric strings
  • Lists - Ordered list of data (Linked List)
    • Push, Pop, Shift, Unshift, similar to JavaScript Array
  • Hashes - Field/Value pairs within a single Redis key
    • Set/Get fields, similar to a JavaScript object
  • Sets - Unordered collection of unique strings
    • Add/Remove items, similar to JavaScript ES2015 Set
  • Sorted Sets - Strings sorted by numeric scores
    • Values are unique, query based on ranges
  • GeoLocation - Collections of Lat/Lon to String pairs
    • Query items based on distance
  • Pub/Sub - Broadcast messages to channels
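
To make the Sorted Set semantics concrete, here is a toy in-memory model in plain JavaScript. It only illustrates "unique members, queried by score range"; TinySortedSet and its methods are hypothetical names that mirror ZADD and ZRANGEBYSCORE, and Redis itself uses a far more efficient structure.

```javascript
class TinySortedSet {
  constructor() {
    this.scores = new Map(); // member -> score, so members stay unique
  }

  // Like ZADD: insert a member or update its score.
  zadd(score, member) {
    this.scores.set(member, score);
  }

  // Like ZRANGEBYSCORE: members with min <= score <= max, ordered by score.
  zrangebyscore(min, max) {
    return [...this.scores.entries()]
      .filter(([, score]) => score >= min && score <= max)
      .sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : 1))
      .map(([member]) => member);
  }
}

const board = new TinySortedSet();
board.zadd(30, 'carol');
board.zadd(10, 'alice');
board.zadd(20, 'bob');
board.zadd(25, 'alice'); // same member again: score is updated, no duplicate
console.log(board.zrangebyscore(15, 30)); // [ 'bob', 'alice', 'carol' ]
```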

Using Redis with Node

  • Install Redis NPM module:
$ npm install --save redis@3 # callback API used below; v4+ is Promise-based
  • Require and call Redis commands:
const redis = require('redis').createClient();

redis.set('hello', 'world', (err) => {
  if (err) { throw err; }
  redis.get('hello', (err, data) => {
    if (err) { throw err; }
    console.log(`Hello, ${data}!`); // outputs 'Hello, world!'
    redis.quit();
  });
});
  • Commands are queued, no need for connection callback
  • But how can we use Redis to fix our visitor counter?
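
The "commands are queued" behaviour can be pictured with a small stand-in. This is a sketch of the idea only, not the redis module's actual internals; QueuedClient, send, and markConnected are hypothetical names.

```javascript
class QueuedClient {
  constructor(transport) {
    this.transport = transport; // function that actually delivers a command
    this.connected = false;
    this.pending = [];          // commands issued before the connection is ready
  }

  send(command) {
    if (this.connected) {
      this.transport(command);
    } else {
      this.pending.push(command); // hold it until the connection is ready
    }
  }

  markConnected() {
    this.connected = true;
    // Flush in the order the commands were issued.
    for (const command of this.pending.splice(0)) {
      this.transport(command);
    }
  }
}

const sent = [];
const client = new QueuedClient((command) => sent.push(command));
client.send('SET hello world'); // issued before connecting: queued
client.send('GET hello');
console.log(sent.length);       // 0
client.markConnected();
console.log(sent);              // [ 'SET hello world', 'GET hello' ]
```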

Visitor Counter with Redis and Cluster

const cluster = require('cluster');
const http = require('http');
const redis = require('redis').createClient();

if (cluster.isMaster) { // deprecated alias of cluster.isPrimary since Node 16
  cluster.fork(); cluster.fork();
} else {
  http.createServer((req, res) => {
    redis.incr('counter', (error, data) => {
      if (error) { res.statusCode = 500; return res.end(); } // Redis unavailable
      res.end(JSON.stringify({counter: data, pid: process.pid}));
    });
  }).listen(9999);
}
  • GeoCities would be proud:
$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":2,"pid":20002}
$ curl http://localhost:9999
{"counter":3,"pid":20001}
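
The handler can also be factored into a function that accepts any client exposing a callback-style incr, which makes the error path easy to exercise without a server. A minimal sketch; makeCounterHandler and the stub objects are hypothetical, not part of the redis module.

```javascript
// Build a request handler around any client exposing incr(key, callback).
function makeCounterHandler(client, key = 'counter') {
  return (req, res) => {
    client.incr(key, (error, value) => {
      if (error) {
        // Report the failure instead of sending an undefined counter.
        res.statusCode = 500;
        res.end(JSON.stringify({ error: 'counter unavailable' }));
        return;
      }
      res.end(JSON.stringify({ counter: value, pid: process.pid }));
    });
  };
}

// Stub client and response object, for demonstration only.
const stubClient = {
  total: 0,
  incr(key, callback) { callback(null, ++this.total); },
};
const stubResponse = { statusCode: 200, body: '', end(text) { this.body = text; } };

makeCounterHandler(stubClient)({}, stubResponse);
console.log(JSON.parse(stubResponse.body).counter); // 1
```

With a real client the same factory would be passed straight to the server: http.createServer(makeCounterHandler(redis)).listen(9999).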

Intermediate Use Case:

Distributed Job Scheduler

Job Scheduler using Callbacks

  • Naïve approach: Nest related Redis commands using callbacks
const redis = require('redis').createClient();
const JOBS = 'jobs'; // Sorted Set

redis.zadd(JOBS, Date.now() + 5 * 1000, 'email user 1');
redis.zadd(JOBS, Date.now() + 10 * 1000, 'email user 2');

setInterval(() => {
  let now = Date.now();
  redis.zrangebyscore(JOBS, 0, now, (err, jobList) => {          // get jobs until now
    if (err) { throw err; }
    redis.zremrangebyscore(JOBS, 0, now, (err) => {              // delete jobs until now
      if (err) { throw err; }
      console.log('jobs', jobList.length ? jobList : 'N/A', process.pid); // perform work
    });
  });
}, 1 * 1000);
  • ZRANGEBYSCORE: Get list of jobs scheduled until now
  • ZREMRANGEBYSCORE: Delete list of jobs scheduled until now

Job Scheduler Callback Output

  • Commands being issued to Redis (tracked via MONITOR)
1474247436.153554 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247436153"
1474247436.153999 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247436153"
1474247437.155540 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247437155"
1474247437.156171 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247437155"
1474247437.157580 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247437157"
1474247437.158185 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247437157"
1474247438.161422 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247438160"
1474247438.161558 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247438160"
1474247438.162285 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247438160"
1474247438.162373 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247438160"
1474247439.164502 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247439164"
1474247439.165080 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247439164"
  • Race Conditions in console output, note the duplicated work!
jobs N/A 20000
jobs N/A 20001
jobs N/A 20000
jobs ['email user 1'] 20000
jobs ['email user 1'] 20001
jobs N/A 20001
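
The duplicated work follows from the interleaving visible in the MONITOR log: both workers read before either one deletes. The sketch below replays that ordering against a plain array standing in for the sorted set; readDue and removeDue are hypothetical stand-ins for ZRANGEBYSCORE and ZREMRANGEBYSCORE, and the small scores replace real timestamps.

```javascript
// Jobs as { score, member } pairs; a stand-in for the Redis sorted set.
let jobs = [
  { score: 5, member: 'email user 1' },
  { score: 10, member: 'email user 2' },
];

// Stand-in for: ZRANGEBYSCORE jobs 0 now
const readDue = (now) => jobs.filter((job) => job.score <= now).map((job) => job.member);
// Stand-in for: ZREMRANGEBYSCORE jobs 0 now
const removeDue = (now) => { jobs = jobs.filter((job) => job.score > now); };

const now = 6;
// Interleaving seen in the log: read, read, delete, delete.
const seenByWorkerA = readDue(now);
const seenByWorkerB = readDue(now);
removeDue(now);
removeDue(now);

console.log(seenByWorkerA); // [ 'email user 1' ]
console.log(seenByWorkerB); // [ 'email user 1' ]  (same job claimed twice)
```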

Job Scheduler using MULTI/EXEC

  • This time we wrap commands using a MULTI block:
const redis = require('redis').createClient();
const JOBS = 'jobs'; // Sorted Set

redis.zadd(JOBS, Date.now() + 5 * 1000, 'email user 1');
redis.zadd(JOBS, Date.now() + 10 * 1000, 'email user 2');

setInterval(() => {
  let now = Date.now();
  redis.multi()                             // Like a DB transaction, but without rollback
    .zrangebyscore(JOBS, 0, now)            // get jobs until now
    .zremrangebyscore(JOBS, 0, now)         // delete jobs until now
    .exec((error, data) => {
      if (error) { return console.error(error); } // data is not usable on failure
      let jobList = data[0];
      console.log('jobs', jobList.length ? jobList : 'N/A', process.pid); // perform work
    });
}, 1 * 1000);
  • Now the two Redis commands will be run atomically
  • No other client's commands can run in between

Job Scheduler MULTI/EXEC Output

  • Commands being issued to Redis (tracked via MONITOR)
1474250213.374094 [0 127.0.0.1:20000] "multi"
1474250213.374140 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474250213373"
1474250213.374174 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474250213373"
1474250213.374200 [0 127.0.0.1:20000] "exec"
1474250213.377766 [0 127.0.0.1:20001] "multi"
1474250213.377821 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474250213377"
1474250213.377872 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474250213377"
1474250213.377899 [0 127.0.0.1:20001] "exec"
1474250214.380577 [0 127.0.0.1:20000] "multi"
1474250214.380623 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474250214380"
1474250214.380657 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474250214380"
1474250214.380682 [0 127.0.0.1:20000] "exec"
  • Console output, duplicated work is now impossible
jobs N/A 20000
jobs ['email user 1'] 20001
jobs N/A 20000
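
The MULTI block can be wrapped in a small reusable helper that passes errors to its callback. claimDueJobs is a hypothetical name; it assumes a client whose multi() returns a chainable object ending in exec(callback), as on the slide. The fake client below only mimics that shape so the sketch runs without Redis.

```javascript
// Atomically fetch and delete every job scheduled up to `now`.
function claimDueJobs(client, key, now, callback) {
  client.multi()
    .zrangebyscore(key, 0, now)
    .zremrangebyscore(key, 0, now)
    .exec((error, replies) => {
      if (error) { return callback(error); }
      callback(null, replies[0]); // reply of the first queued command
    });
}

// Fake client for demonstration: queued steps run together inside exec().
function createFakeClient(entries) {
  let jobs = entries.slice();
  return {
    multi() {
      const steps = [];
      const chain = {
        zrangebyscore(key, min, max) {
          steps.push(() => jobs
            .filter((job) => job.score >= min && job.score <= max)
            .map((job) => job.member));
          return chain;
        },
        zremrangebyscore(key, min, max) {
          steps.push(() => {
            const before = jobs.length;
            jobs = jobs.filter((job) => job.score < min || job.score > max);
            return before - jobs.length; // number of removed members
          });
          return chain;
        },
        exec(callback) { callback(null, steps.map((step) => step())); },
      };
      return chain;
    },
  };
}

const fake = createFakeClient([{ score: 5, member: 'email user 1' }]);
claimDueJobs(fake, 'jobs', 6, (err, due) => console.log('worker A', due)); // worker A [ 'email user 1' ]
claimDueJobs(fake, 'jobs', 6, (err, due) => console.log('worker B', due)); // worker B []
```

Because the read and the delete happen inside one exec, the second caller finds nothing left to claim.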

Advanced Use Case:

GeoLocation City Lookup by Distance

Logic in Redis with SCRIPT/EVALSHA

  • Unlike MULTI, command output can be used for command input
  • SCRIPT LOAD accepts a Lua script and returns the SHA1 hash
  • EVALSHA executes a Lua script, accepting hash and arguments
  • Use a library like lured to manage scripts and their SHA1 hashes
  • Need to specify all affected keys as arguments
  • Think of it as a DB Stored Procedure
-- get-cities.lua: Find cities within 10km of query

local key_geo = KEYS[1]
local key_hash = KEYS[2]

local longitude = ARGV[1]
local latitude = ARGV[2]

local city_ids = redis.call('GEORADIUS', key_geo, longitude, latitude, 10, 'km')
if #city_ids == 0 then return {} end -- HMGET needs at least one field
return redis.call('HMGET', key_hash, unpack(city_ids))
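
The SHA1 that SCRIPT LOAD returns is the digest of the script text itself, so it can also be computed on the client. A minimal sketch using Node's built-in crypto module; scriptSha is a hypothetical helper and the one-line script is only a placeholder.

```javascript
const { createHash } = require('crypto');

// Hex SHA1 digest of a Lua script's source text.
function scriptSha(source) {
  return createHash('sha1').update(source, 'utf8').digest('hex');
}

const sha = scriptSha('return 1');
console.log(sha.length); // 40 (hex characters)
```

This value is what EVALSHA takes as its first argument. If the server has not cached the script it replies with a NOSCRIPT error, at which point the script needs to be loaded again.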

Logic in Redis with SCRIPT/EVALSHA

const redis = require('redis').createClient();
const fs = require('fs');
const GEO = 'geo-city-locations', HASH = 'hash-city-data';
let lua = {
  find: {
    script: fs.readFileSync(`${__dirname}/get-cities.lua`, {encoding: 'utf8'})
  }
};
const lured = require('lured').create(redis, lua);

redis.geoadd(GEO, -122.419103, 37.777068, 'san-francisco');
redis.hset(HASH, 'san-francisco', JSON.stringify({name: 'San Francisco', temp: 65}));

redis.geoadd(GEO, -122.272938, 37.807235, 'oakland');
redis.hset(HASH, 'oakland', JSON.stringify({name: 'Oakland', temp: 72}));

const BERKELEY = {lon: -122.273412, lat: 37.869103};
lured.load((err) => {
  if (err) { throw err; }
  redis.evalsha(lua.find.sha, 2, GEO, HASH, BERKELEY.lon, BERKELEY.lat, (e, data) => {
    if (e) { throw e; }
    console.log('cities near Berkeley', data.map(JSON.parse));
    // [ {name:"Oakland",temp:72} ]
  });
});
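
Why only Oakland comes back can be checked with a great-circle estimate. The sketch below uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption here and may differ slightly from the constant Redis uses; distanceKm is a hypothetical helper.

```javascript
const EARTH_RADIUS_KM = 6371; // assumed mean Earth radius

// Great-circle distance between two { lon, lat } points, in kilometres.
function distanceKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

const berkeley = { lon: -122.273412, lat: 37.869103 };
const cities = {
  'san-francisco': { lon: -122.419103, lat: 37.777068 },
  oakland: { lon: -122.272938, lat: 37.807235 },
};

// Same filter the Lua script applies: everything within 10 km of the query point.
const within10km = Object.keys(cities)
  .filter((id) => distanceKm(berkeley, cities[id]) <= 10);
console.log(within10km); // [ 'oakland' ]
```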

Conclusion

  1. Redis offers several useful data structures
  2. Keep state synchronized across multiple Node processes
  3. Use MULTI/EXEC for independent atomic operations
  4. Use SCRIPT/EVALSHA for chainable atomic operations

Follow me at @tlhunter@mastodon.social
