Node, Redis, and You!

Presented by @tlhunter

I'm writing a book on Microservices: bit.ly/2hlATo2

Beginner Use Case:

The Proverbial Visitor Counter

Simple Visitor Counter

  • Naïve approach: I can simply keep data in Node memory!
var http = require('http');

var counter = 0;
http.createServer(function(request, response) {
  response.end(JSON.stringify({counter: ++counter, pid: process.pid}));
}).listen(9999);
  • Counter happily increments every request:
$ curl http://localhost:9999
{"counter":1,"pid":20000}
$ curl http://localhost:9999
{"counter":2,"pid":20000}
$ curl http://localhost:9999
{"counter":3,"pid":20000}
  • Life is perfect, what could ever possibly go wrong?

Visitor Counter with Cluster

  • What happens when we want to scale to many processes?
var cluster = require('cluster');
var http = require('http');

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) { cluster.fork(); }
} else {
  var counter = 0;
  http.createServer(function(req, res) {
    res.end(JSON.stringify({counter: ++counter, pid: process.pid}));
  }).listen(9999);
}
  • Oh No! Two sets of data! What can we do to fix it?!
$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":1,"pid":20002}
$ curl http://localhost:9999
{"counter":2,"pid":20001}
  • But first, let's learn a little more about Redis.

Introduction to Redis

  • In-Memory Key/Value store with several data structure types
  • Create if not exist, destroy if empty philosophy
  • Much like Node, Redis executes commands on a single thread
  • The protocol is plain text, one can telnet in and issue commands
  • Authentication is disabled by default, so bind to a local interface
  • Can set per-key expiry, or use global Least Recently Used (LRU) eviction
$ brew install redis # sudo apt-get install redis
$ redis-cli
127.0.0.1:6379> SET xyz 'Hello'
OK
127.0.0.1:6379> GET xyz
"Hello"
127.0.0.1:6379> DEL xyz
(integer) 1
127.0.0.1:6379> GET xyz
(nil)
127.0.0.1:6379> QUIT
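
The "create if not exist, destroy if empty" philosophy can be imitated in plain JavaScript. The sketch below is only an in-memory illustration, not a Redis client: createStore and its methods are hypothetical helpers, named after the RPUSH, RPOP and EXISTS commands they mimic.

```javascript
// In-memory imitation of how Redis creates and removes list keys.
// createStore is a hypothetical helper for illustration, not part of any Redis library.
function createStore() {
  var keys = Object.create(null);
  return {
    // Like RPUSH: the list is created on first push; returns the new length.
    rpush: function(key, value) {
      if (!(key in keys)) { keys[key] = []; }
      keys[key].push(value);
      return keys[key].length;
    },
    // Like RPOP: removing the last element also removes the key itself.
    rpop: function(key) {
      if (!(key in keys)) { return null; }
      var value = keys[key].pop();
      if (keys[key].length === 0) { delete keys[key]; }
      return value;
    },
    // Like EXISTS: 1 if the key is present, 0 otherwise.
    exists: function(key) { return key in keys ? 1 : 0; }
  };
}

var store = createStore();
var before = store.exists('queue');      // key has never been written
var length = store.rpush('queue', 'a');  // first push creates the list
var during = store.exists('queue');
var value = store.rpop('queue');         // popping the last item removes the key
var after = store.exists('queue');
console.log(before, length, during, value, after);
```

There is no separate "create list" or "delete empty list" step here, which is the point of the philosophy: the key exists exactly as long as it holds data.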

Redis Data Types

  • Strings - Simple key/value pair
    • Can Increment/Decrement numeric strings
  • Lists - Ordered list of data (Linked List)
    • Push, Pop, Shift, Unshift, similar to JavaScript Array
  • Hashes - Key/Value pairs within a single Redis key
    • Set/Get properties, similar to a JavaScript object
  • Sets - Unordered collection of unique strings
    • Add/Remove items, similar to JavaScript ES2015 Set
  • Sorted Sets - Strings sorted by numeric score values
    • Values are unique, query based on ranges
  • GeoLocation - Collections of Lat/Lon to String pairs
    • Query items based on distance

Using Redis with Node

  • Install Redis NPM module:
$ npm install --save redis
  • Require and call Redis commands:
var redis = require('redis').createClient();

redis.set('hello', 'world', function(err) {
  if (err) { throw err; }
  redis.get('hello', function(err, data) {
    if (err) { throw err; }
    console.log('hello', data); // outputs 'hello world'
    redis.quit();
  });
});
  • Commands are queued until the client connects, no need for a connection callback (callback-style client, node_redis 3.x and earlier)
  • But how can we use Redis to fix our visitor counter?

Visitor Counter with Redis and Cluster

var cluster = require('cluster');
var http = require('http');
var redis = require('redis').createClient();

if (cluster.isMaster) {
  for (var i = 0; i < 2; i++) { cluster.fork(); }
} else {
  http.createServer(function(req, res) {
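    redis.incr('counter', function(error, data) { // INCR is atomic across all workers
      if (error) { res.statusCode = 500; return res.end(JSON.stringify({error: error.message})); }
      res.end(JSON.stringify({counter: data, pid: process.pid}));
    });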
  }).listen(9999);
}
  • GeoCities would be proud:
$ curl http://localhost:9999
{"counter":1,"pid":20001}
$ curl http://localhost:9999
{"counter":2,"pid":20002}
$ curl http://localhost:9999
{"counter":3,"pid":20001}
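
The difference between the two cluster versions can be simulated without Redis or real processes. In this sketch the "workers" are plain functions, the shared store is an in-memory object standing in for Redis, and requests are assumed to alternate between workers; all helper names are hypothetical.

```javascript
// Worker that keeps its own counter, like the first cluster example.
function makeLocalWorker(pid) {
  var counter = 0;
  return function handle() { return {counter: ++counter, pid: pid}; };
}

// In-memory stand-in for Redis: a single counter shared by every worker.
function makeSharedStore() {
  var data = {};
  return {
    incr: function(key) {
      data[key] = (data[key] || 0) + 1;
      return data[key];
    }
  };
}

// Worker that asks the shared store to increment, like the Redis example.
function makeSharedWorker(pid, store) {
  return function handle() { return {counter: store.incr('counter'), pid: pid}; };
}

// Send n requests, alternating between workers (an assumption about scheduling).
function roundRobin(workers, n) {
  var responses = [];
  for (var i = 0; i < n; i++) { responses.push(workers[i % workers.length]()); }
  return responses;
}

var local = roundRobin([makeLocalWorker(20001), makeLocalWorker(20002)], 3);
var store = makeSharedStore();
var shared = roundRobin([makeSharedWorker(20001, store), makeSharedWorker(20002, store)], 3);

console.log('local counters ', local.map(function(r) { return r.counter; }));
console.log('shared counters', shared.map(function(r) { return r.counter; }));
```

The local workers repeat counter values because each holds its own copy, while the shared version counts every request once regardless of which worker answers.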

Intermediate Use Case:

Distributed Task Scheduler

Job Scheduler using Callbacks

  • Naïve approach: Nest related Redis commands using callbacks
var redis = require('redis').createClient();
var JOB_QUEUE = 'jobs';

redis.zadd(JOB_QUEUE, Date.now() + 5 * 1000, 'email user 1');
redis.zadd(JOB_QUEUE, Date.now() + 10 * 1000, 'email user 2');

setInterval(function() {
  var now = Date.now();
  redis.zrangebyscore(JOB_QUEUE, 0, now, function(err, jobList) { // get jobs until now
    if (err) { throw err; }
    redis.zremrangebyscore(JOB_QUEUE, 0, now, function(err) { // delete jobs until now
      if (err) { throw err; }
      console.log('jobs', jobList.length ? jobList : 'N/A', process.pid); // perform work
    });
  });
}, 1 * 1000);
  • ZRANGEBYSCORE: Get list of jobs scheduled until now
  • ZREMRANGEBYSCORE: Delete list of jobs scheduled until now

Job Scheduler Callback Output

  • Commands being issued to Redis (tracked via MONITOR)
1474247436.153554 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247436153"
1474247436.153999 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247436153"
1474247437.155540 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247437155"
1474247437.156171 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247437155"
1474247437.157580 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247437157"
1474247437.158185 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247437157"
1474247438.161422 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474247438160"
1474247438.161558 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247438160"
1474247438.162285 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474247438160"
1474247438.162373 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247438160"
1474247439.164502 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474247439164"
1474247439.165080 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474247439164"
  • Race condition visible in console output: both processes read the job before either deletes it, so the work is duplicated!
jobs N/A 20000
jobs N/A 20001
jobs N/A 20000
jobs ['email user 1'] 20000
jobs ['email user 1'] 20001
jobs N/A 20001

Job Scheduler using MULTI/EXEC

  • This time we wrap commands using a MULTI block:
var redis = require('redis').createClient();
var JOB_QUEUE = 'jobs';
redis.zadd(JOB_QUEUE, Date.now() + 5 * 1000, 'email user 1');
redis.zadd(JOB_QUEUE, Date.now() + 10 * 1000, 'email user 2');

setInterval(function() {
  var now = Date.now();
  redis.multi() // Similar to a DB transaction, but without rollback
    .zrangebyscore(JOB_QUEUE, 0, now) // get jobs until now
    .zremrangebyscore(JOB_QUEUE, 0, now) // delete jobs until now
    .exec(function(error, data) {
      if (error) { throw error; }
      var jobList = data[0]; // reply of the first queued command
      console.log('jobs', jobList.length ? jobList : 'N/A', process.pid); // perform work
    });
}, 1 * 1000);
  • Now the two Redis commands are executed atomically
  • Commands from other clients cannot run in between them
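
Why this removes the duplication can be shown with a small in-memory model. This is a sketch under stated assumptions: createServer is a hypothetical stand-in for Redis whose exec method runs a whole batch back to back, which is the property MULTI/EXEC provides; it is not how node_redis or Redis are implemented.

```javascript
// In-memory model of a server that runs a batch of commands without interruption.
function createServer() {
  var entries = []; // sorted-set entries: {score, member}
  var commands = {
    zadd: function(score, member) {
      entries.push({score: score, member: member});
      return 1;
    },
    zrangebyscore: function(min, max) {
      return entries
        .filter(function(e) { return e.score >= min && e.score <= max; })
        .map(function(e) { return e.member; });
    },
    zremrangebyscore: function(min, max) {
      var before = entries.length;
      entries = entries.filter(function(e) { return e.score < min || e.score > max; });
      return before - entries.length;
    }
  };
  return {
    // Run every command in the batch in order and return one reply per command.
    // Nothing else can touch `entries` until the whole batch has finished.
    exec: function(batch) {
      return batch.map(function(cmd) {
        return commands[cmd[0]].apply(null, cmd.slice(1));
      });
    }
  };
}

var server = createServer();
server.exec([['zadd', 5000, 'email user 1'], ['zadd', 10000, 'email user 2']]);

var now = 6000; // only the first job is due

// Each worker submits read + delete as one batch.
var repliesA = server.exec([['zrangebyscore', 0, now], ['zremrangebyscore', 0, now]]);
var repliesB = server.exec([['zrangebyscore', 0, now], ['zremrangebyscore', 0, now]]);

console.log('worker A jobs', repliesA[0]);
console.log('worker B jobs', repliesB[0]);
```

Because the read and the delete travel together, the second worker's read can only happen after the first worker's delete, so it finds nothing left to do.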

Job Scheduler MULTI/EXEC Output

  • Commands being issued to Redis (tracked via MONITOR)
1474250213.374094 [0 127.0.0.1:20000] "multi"
1474250213.374140 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474250213373"
1474250213.374174 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474250213373"
1474250213.374200 [0 127.0.0.1:20000] "exec"
1474250213.377766 [0 127.0.0.1:20001] "multi"
1474250213.377821 [0 127.0.0.1:20001] "zrangebyscore" "jobs" "0" "1474250213377"
1474250213.377872 [0 127.0.0.1:20001] "zremrangebyscore" "jobs" "0" "1474250213377"
1474250213.377899 [0 127.0.0.1:20001] "exec"
1474250214.380577 [0 127.0.0.1:20000] "multi"
1474250214.380623 [0 127.0.0.1:20000] "zrangebyscore" "jobs" "0" "1474250214380"
1474250214.380657 [0 127.0.0.1:20000] "zremrangebyscore" "jobs" "0" "1474250214380"
1474250214.380682 [0 127.0.0.1:20000] "exec"
  • Console output: each job is now picked up by only one process
jobs N/A 20000
jobs ['email user 1'] 20001
jobs N/A 20000

Advanced Use Case:

GeoLocation City Lookup by Distance

Logic in Redis with EVAL/EVALSHA

  • Unlike MULTI, the output of one command can be used as input to the next
  • EVAL executes a Lua script, sending the entire script each time
  • EVALSHA executes a previously loaded Lua script, sending just its SHA1 hash
  • Use a library like lured to manage scripts and their SHA1 hashes
  • All keys the script touches must be passed as arguments
  • Think of it as a DB Stored Procedure
-- get-cities.lua: Find cities within 10km of query

local key_geo = KEYS[1]
local key_data = KEYS[2]

local longitude = ARGV[1]
local latitude = ARGV[2]

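local city_ids = redis.call('GEORADIUS', key_geo, longitude, latitude, 10, 'km')
if #city_ids == 0 then
  return {} -- HMGET needs at least one field, so return early when nothing is nearby
end
return redis.call('HMGET', key_data, unpack(city_ids))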
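
The hash that EVALSHA expects is the SHA1 hex digest of the script source. A minimal sketch of computing it with Node's built-in crypto module follows; scriptSha is a hypothetical helper name and the one-line script body is only an example, not the get-cities.lua file.

```javascript
var crypto = require('crypto');

// SHA1 hex digest of a script's source text, the identifier used by EVALSHA.
function scriptSha(source) {
  return crypto.createHash('sha1').update(source, 'utf8').digest('hex');
}

var script = "return redis.call('GET', KEYS[1])"; // example script body (assumption)
var sha = scriptSha(script);

console.log(sha.length, sha);
```

The digest is always 40 hex characters, so sending it is cheaper than resending a long script on every call. A library such as lured takes care of loading the script and tracking this value.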

Logic in Redis with EVAL/EVALSHA

var redis = require('redis').createClient();
var fs = require('fs');
var CITY = 'geo-city', DATA = 'hash-city';
var lua = {
  find: {
    script: fs.readFileSync(__dirname + '/get-cities.lua', {encoding:'utf8'})
  }
};
var lured = require('lured').create(redis, lua);

redis.geoadd(CITY, -122.419103, 37.777068, 'san-francisco');
redis.hset(DATA, 'san-francisco', JSON.stringify({name: 'San Francisco', temp: 65}));

redis.geoadd(CITY, -122.272938, 37.807235, 'oakland');
redis.hset(DATA, 'oakland', JSON.stringify({name: 'Oakland', temp: 72}));

var BERKELEY = {lon: -122.273412, lat: 37.869103};
lured.load(function(err) {
  if (err) { throw err; }
  redis.evalsha(lua.find.sha, 2, CITY, DATA, BERKELEY.lon, BERKELEY.lat, function(e, data) {
    if (e) { throw e; }
    console.log('cities near Berkeley', data.map(JSON.parse));
    // [ {name:"Oakland",temp:72} ]
  });
});
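
To see why only Oakland comes back, the distances can be checked by hand with the haversine formula. This is a sketch in plain JavaScript, not what Redis runs internally: EARTH_RADIUS_KM is an assumed mean radius, so the figures may differ slightly from what GEORADIUS computes, and distanceKm is a hypothetical helper.

```javascript
// Assumed mean Earth radius in kilometres; Redis uses its own constant.
var EARTH_RADIUS_KM = 6371;

function toRadians(degrees) { return degrees * Math.PI / 180; }

// Great-circle distance in kilometres between two {lon, lat} points (haversine formula).
function distanceKm(a, b) {
  var dLat = toRadians(b.lat - a.lat);
  var dLon = toRadians(b.lon - a.lon);
  var h = Math.pow(Math.sin(dLat / 2), 2) +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) *
    Math.pow(Math.sin(dLon / 2), 2);
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

var BERKELEY = {lon: -122.273412, lat: 37.869103};
var cities = {
  'san-francisco': {lon: -122.419103, lat: 37.777068},
  'oakland': {lon: -122.272938, lat: 37.807235}
};

// Same filter as the Lua script: keep cities within 10 km of the query point.
var nearby = Object.keys(cities).filter(function(id) {
  return distanceKm(BERKELEY, cities[id]) <= 10;
});

Object.keys(cities).forEach(function(id) {
  console.log(id, distanceKm(BERKELEY, cities[id]).toFixed(1) + ' km');
});
console.log('within 10 km:', nearby);
```

Oakland is roughly 7 km from the Berkeley coordinates and San Francisco roughly 16 km, so a 10 km radius keeps only Oakland.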

Conclusion

  1. Redis supports many useful data structures and is fast to query
  2. Use it to keep data shared across multiple Node processes
  3. Use MULTI/EXEC for performing simple atomic operations
  4. Use EVAL/EVALSHA for atomic operations that chain one command's output into the next

Follow me at @tlhunter
