@tlhunter


Friends don't let Friends Cluster

Thomas Hunter II


Adapted from Distributed Systems with Node.js:
bit.ly/34SHToF

Situation: You're Building a Web Service

Problem: One Process isn't Enough

  • A single process can get overwhelmed
  • CPU-intensive work can block the event loop
  • App might have high memory requirements
  • Node.js processes can crash

Solution: Run more than one Process

  • Also known as Horizontal Scaling
  • In theory, two processes can handle twice the load
  • If one process crashes, others may survive

What tools are available
to run multiple processes?

Introducing cluster

cluster makes Scaling Easy

const cluster = require('cluster');
const http = require('http');
const LUCKY_NUMBER = Math.random();

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);
  cluster.fork(); cluster.fork(); // Assumes 2 CPUs
} else {
  http.createServer((_req, res) => {
    res.writeHead(200);
    res.end(`Hello World! ${LUCKY_NUMBER}`);
  }).listen(3000);
  console.log(`Worker ${process.pid} started`);
}

cluster makes Scaling too Easy

  • The docs make it look like a single global
  • Is the LUCKY_NUMBER variable shared?
  • Does the master process use the http module?
  • How does this thing actually work?!

A Clearer cluster Example

// master.js
const cluster = require('cluster');
console.log(`Master ${process.pid} is running`);
cluster.setupMaster({exec: __dirname+'/worker.js'});
cluster.fork(); cluster.fork();
// worker.js
const LUCKY_NUMBER = Math.random();
require('http').createServer((_req, res) => {
  res.writeHead(200);
  res.end(`Hello World! ${LUCKY_NUMBER}`);
}).listen(3000);
console.log(`Worker ${process.pid} started`);

cluster uses N+1 Node.js Processes

  • Each Worker tells Master what port(s) to listen on
  • Master distributes incoming connections, round robin, to Workers
  • If Workers listen on port 0 (random high port), the port is consistent across Workers

cluster has its Shortcomings

  • Master and Workers must be on the same machine!
    • Still left with RAM and CPU contention
  • Requires 3 Node.js instances instead of 2
    • In practice, many apps load all modules in Master
  • Layer 4 routing: long-lived connections aren't balanced
    • Routes the TCP connection, unaware of individual “messages”
    • For example, gRPC over HTTP/2
    • All calls from one client are sent to the same Worker

Introducing the Reverse Proxy

A Reverse Proxy can...

  • Proxy incoming requests to backend services
  • Conditionally route to different backends
  • Perform encoding duties (TLS, gzip)
  • Sanitize requests, sticky sessions, modify headers

HAProxy is a Reverse Proxy

  • Routes using round robin by default
  • Configuration is declarative
  • HAProxy is very efficient (single-threaded event loop)
  • 13MB binary written in C

HAProxy Configuration

  • Route incoming requests to two backends
defaults
  mode http

frontend inbound
  bind localhost:3000
  default_backend web-api

backend web-api
  server web-api-1 localhost:3001
  server web-api-2 localhost:3002
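One common extension, not shown above: active health checks, so HAProxy stops routing to a backend whose process has crashed. A hedged sketch (the `/health` endpoint is a hypothetical route your app would need to expose):

```
backend web-api
  option httpchk GET /health
  server web-api-1 localhost:3001 check
  server web-api-2 localhost:3002 check
```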

Why Prefer HAProxy over cluster?

  • Able to route requests between separate machines
  • Smaller resource footprint
  • Offload complexity to code you don't maintain!
    • Harder to crash than a Node.js application
    • Simplifies application (no gzip, TLS, certs)
  • Is both Layer 4 and Layer 7 aware
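Layer 7 awareness is what cluster fundamentally lacks: HAProxy can inspect the HTTP request itself and route on it. A sketch extending the earlier config (the `static-assets` backend name is hypothetical):

```
frontend inbound
  bind localhost:3000
  # A Layer 7 decision: route on the HTTP path, not just the TCP connection
  use_backend static-assets if { path_beg /static/ }
  default_backend web-api
```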

Performance Comparison

Testing Approach

  • Each test puts HAProxy in front of Node.js
  • HAProxy introduces latency via network hop
  • Perform encoding in either HAProxy or Node.js
  • Uses Fastify with routing and JSON serialization, not just the bare http module

Node.js v13.6.0 | HAProxy v2.1.0 | fastify v2.12.0 | fastify-compress v2.0.1 | bit.ly/39AgMxS

HAProxy vs Node.js: TLS Termination

  • 17kb payload: HAProxy is ~13% higher throughput

  • 170kb payload: HAProxy is ~18% higher throughput


HAProxy vs Node.js: gzip Compression

  • 17kb payload: HAProxy is ~601% higher throughput

  • 170kb payload: HAProxy is ~407% higher throughput


Fin