Setting Open Graph Tags without Server Side Rendering

I've been on a bit of a writing hiatus lately as I'm focusing all my efforts on Radar Chat, a mobile application for creating and communicating via custom maps. But, I wanted to write up this guide on one of the features I just finished implementing. Radar Chat is an application that is built using Vue. It can be referred to as a Single Page Application (SPA) and a Progressive Web App (PWA). This guide should work for you even if you're not using Vue.

Using a Single HTML File

Up until now, Radar Chat used the exact same HTML file when handling any request that represents an application route (e.g. not for an image). That said, the URL does matter as far as the application is concerned. The application code updates the URL when navigating using push state, and when a full page reload happens, the application maintains the same screen. The Vue router parses the URL when rendering the page to open the correct screen and handles all of the push state magic.

Configuration of the Vue Router is done like so:

const router = VueRouter.createRouter({
  history: VueRouter.createWebHistory(),
  routes: [ ... ], // Vue Router expects an array of route records
});

Here are some of the dozens of routes used by Radar Chat:

With traditional SPAs, the path in the URL wouldn't change. Instead, only the fragment would change. For example, with the URL https://example.com/#/bar, the path of the URL is /, and the URL fragment is #/bar. When updating the fragment and refreshing the page, the URL might change to https://example.com/#/blam. When this URL is first loaded by the browser it still sends a request to the server for the path /. The fragment is only used by the JavaScript running in the browser to do things like determine a logical application "path". In fact, Vue Router can do this too by using the following configuration:

const router = VueRouter.createRouter({
  history: VueRouter.createWebHashHistory(),
  routes: [ ... ], // Vue Router expects an array of route records
});

I would recommend that you do not use this configuration. It will cause you headaches in the future, not to mention it's entirely incompatible with the rest of this document: the fragment is never sent to the server, so there is no way to return different Open Graph tags for different URLs.
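
To make the path versus fragment distinction concrete, here is a small sketch using the URL class that ships with Node.js and browsers. The two URLs are just the example.com placeholders from above.

```javascript
// Parse a hash-style URL and a full-path URL with the built-in URL class.
const hashStyle = new URL('https://example.com/#/bar');
const pathStyle = new URL('https://example.com/bar');

// Only the pathname (plus any query string) goes out in the HTTP request;
// the fragment never leaves the browser.
console.log(hashStyle.pathname, hashStyle.hash); // "/" and "#/bar"
console.log(pathStyle.pathname, pathStyle.hash); // "/bar" and an empty string
```

With the hash-style URL the server only ever sees a request for /, which is why it has nothing to base per-page metadata on.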

To support full URLs, a server usually needs to be configured to support a "catch-all" route. This means that URLs with dynamic paths like https://example.com/bar are all captured and return some form of HTML that can be used to bootstrap the frontend application. Since I use Netlify, I'm using the following netlify.toml configuration file to support this:

[[redirects]]
  from = "/*"
  to = "/index.html"
  status = 200

Traditional servers like Apache and Nginx have their configuration equivalents.

This takes any URL that doesn't resolve to a file on disk and instead returns the contents of a file named index.html in the root of the project. So, regardless of whether the browser requests /map/global or /settings, the same HTML file is returned.

But, therein lies the problem. When someone shares a link on an external service, such as Twitter, a crawler reaches out and makes a request for that URL to pull in metadata. When every HTML response is exactly the same the embedded links aren't going to captivate users and get them to click.
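
The catch-all behavior, and why it is a problem for crawlers, can be sketched roughly like this. It is a toy model rather than how Netlify actually works, and the file contents and the resolve() helper are made up for illustration.

```javascript
// Hypothetical stand-in for the files produced by the frontend build.
const files = new Map([
  ['/index.html', '<html><head><title>Radar Chat</title></head><body></body></html>'],
  ['/logo.png', '(binary image data)'],
]);

// Serve the file when one exists at the path, otherwise fall back to index.html.
function resolve(path) {
  return files.has(path) ? files.get(path) : files.get('/index.html');
}

// Two different application routes produce byte-for-byte identical HTML,
// so a crawler sees the same <title> and metadata for both.
console.log(resolve('/map/global') === resolve('/settings')); // true
```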

Dynamically Generated HTML

Normally, at this point the developer would choose to implement something called Server Side Rendering (SSR). This is essentially what PHP and Ruby applications have been doing for decades but with a slight twist. With SSR, the server replies with either a completely-formed or mostly-formed version of the webpage. To pull this off, there needs to be code that runs on both the server and the client which is capable of performing application logic. There are some theoretical performance benefits to doing this too, as the browser has a little less work to do once it gets the HTML. The twist is that the frontend framework is able to handle the rendered DOM without the need to re-render it.

Personally, I don't care all that much about rendering the application on the server. This kind of stuff has been referred to as isomorphic or universal JavaScript, depending on the era.

Since all I want to do is render some very basic HTML tags in the <head> of the document, implementing a complete SSR solution feels like overkill. Instead, something really basic that updates or otherwise injects the tags would be ideal. I'm also already hosting the frontend application (static content) on Netlify and ideally want to support this while doing as little work as possible, and also using the same repository. For Radar Chat, tons of effort goes into keeping API server performance smooth, but when it comes to social media tags, I'm willing to do something simpler.

Introducing Netlify Functions

Netlify Functions are essentially a wrapper around AWS Lambda. Under the hood it uses Node.js but the interface is a bit different than, say, an Express app. The code runs somewhere magically and you don't really need to care much about the underlying infrastructure. You're able to install packages from npm by including them in a package.json file in the root of the project.

Functions that you create end up having a proxy automatically created for them at a pre-determined URL. The netlify.toml file allows you to create rewrites so that a nice URL you choose can then be mapped to this Function proxy URL.

To support this redirect, I made the following change to my netlify.toml where my existing catch-all rule was configured:

[functions]
  directory = "./functions"

[[redirects]]
  from = "/test/*"
  to = "/.netlify/functions/content/:splat"
  status = 200

The first [functions] rule reconfigures the default functions directory to one named functions/ in the root of the project. This is just my preference. The second [[redirects]] rule, much like the previous one, only rewrites a request if it would otherwise result in a 404 error. In this case, the request is rewritten to a path which represents a proxy to the function.

All of the functions are available to the public at /.netlify/functions/. The next segment, content in this case, is the name of the function. Finally, the /:splat means that the incoming request path is provided to the function. This means that a request to /map/global is rewritten to /.netlify/functions/content/map/global. In other words, the content function will know that the original path request was for /map/global.

Also, this is configured for testing your application. URLs under the /test/* path will trigger the Netlify Function. When you're ready to use this in production, change the from clause to /* instead.
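
If the splat substitution feels abstract, here is a simplified model of it. This is only an illustration of the wildcard mapping, not Netlify's actual matching code, and applySplatRule() is a name I made up.

```javascript
// Simplified model of a redirect rule whose "from" ends in a wildcard.
function applySplatRule(rule, requestPath) {
  if (!rule.from.endsWith('*')) return null;
  const prefix = rule.from.slice(0, -1); // "/test/*" becomes "/test/"
  if (!requestPath.startsWith(prefix)) return null;
  const splat = requestPath.slice(prefix.length); // the part matched by "*"
  // A replacer function keeps any "$" in the path from being treated specially.
  return rule.to.replace(':splat', () => splat);
}

const to = '/.netlify/functions/content/:splat';
const testingRule = { from: '/test/*', to };
const productionRule = { from: '/*', to };

// Both calls yield "/.netlify/functions/content/map/global".
console.log(applySplatRule(testingRule, '/test/map/global'));
console.log(applySplatRule(productionRule, '/map/global'));
```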

Next, it's time to create the function.

The "Content" Function Code

Your Function is going to need data from your backend server in some manner. It could be retrieved via SQL queries. In my case, the data is retrieved via an HTTP API. You'll need to modify the Function to suit your needs. Netlify does support secrets for things like DB credentials or API auth. This example assumes no secrets are required.

Sadly, the Function doesn't have access to the index.html file during the deployment. The code that runs on AWS Lambda is a subset of the code in your repository. This could probably be fixed using some sort of complex build step that copies the file contents into the Function file. In this example the Function will download the index.html file from the web server over HTTP.

Your Function will probably also want a cache. The index.html file should be cached for starters. The upstream data that is retrieved in order to generate the social media tags should also be cached. Who knows how often Twitter might request the exact same document when your app starts trending.

And finally, assuming you need to support several different routes like I do, your Function will need some sort of router. This part is kind of lame as you'll need to recreate some of your application routes within the function. This can lead to drift between the frontend and backend routing, a non-issue with true SSR. But, in my case, the URLs don't change often, and I only support a small subset of overall routes, so it's an acceptable risk for me.

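To install the required packages you'll need to run the following (the code below loads them with require(), and node-fetch 3.x is ESM-only, so install node-fetch@2 instead if you hit a require() error):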

$ npm install node-fetch lru-cache url-router

The Function file lives at functions/content.js. Even the stripped-down version that I'm providing in this post is pretty long so I'm going to go over small chunks of the file at a time. Here's the first chunk:

const fetch = require('node-fetch');
const Cache = require('lru-cache');
const Router = require('url-router').default;

const HTML_URL = 'https://app.radar.chat/index.html';
const API_ROOT = 'http://api.example.org';

const http_cache = new Cache({
  max: 100,
  ttl: 1000 * 60 * 1,
});

This first chunk requires the three packages, defines the location of the HTML file, and defines the root URL used by the API. After that a Least Recently Used (LRU) cache is defined using the lru-cache package. Essentially this is a cache that has been configured to store at most 100 entries, each of which expires after one minute. The cache is a simple key/value store. In this case the key is the URL, and the value is the response payload.

Note that this cache is on a per-Function-instance basis. For more information on how these Functions run, check out my other post Basic Node.js Lambda Function Concepts. Basically, when there are tons of requests happening at the same time, or ramping up heavily, other Functions will get instantiated and those will have empty caches. But, when requests are received at a slower or a very steady rate, the caches should all be utilized.
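
For anyone unfamiliar with how a size-limited, expiring cache behaves, here is a tiny sketch of the idea. It is illustrative only: the real lru-cache package is far more sophisticated, and TinyCache and its injectable clock are my own inventions for the demo.

```javascript
// A minimal cache with a maximum size and a per-entry time to live.
class TinyCache {
  constructor({ max, ttl, now = Date.now }) {
    this.max = max;
    this.ttl = ttl;
    this.now = now;           // injectable clock, handy for testing
    this.entries = new Map(); // a Map remembers insertion order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttl) {
      this.entries.delete(key); // the entry has expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    if (this.entries.size >= this.max) {
      // The first key in the Map is the least recently used one.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

let clock = 0;
const cache = new TinyCache({ max: 2, ttl: 60000, now: () => clock });
cache.set('/a', 'A');
cache.set('/b', 'B');
cache.get('/a');      // touch "/a" so "/b" is now the least recently used
cache.set('/c', 'C'); // the cache is full, so "/b" is evicted
```

Each Function instance would hold its own copy of a structure like this, which is why a freshly started instance begins with nothing cached.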

Here's the next chunk of the function, used for route configuration:

const ROUTE_POST = 'route-post';
// define other route constants here

const router = new Router({
  '/post/:map_name/:post_id': ROUTE_POST,
  // define other routes here
});

This instantiates the router using the url-router package and defines a single route. Routes are named using constants (in this case the only route is named ROUTE_POST). The router itself is configured using key value pairs where the key is the route and the value is the "handler". It could be a callback but in my case I'm just referencing the constant. This will be used later to determine how to handle a given request.
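
To show what a lookup against that route table amounts to, here is a rough stand-in matcher. It is not url-router's implementation, matchRoute() is a hypothetical helper, and the path /post/global/123 is a made-up example.

```javascript
// Match a path against patterns such as "/post/:map_name/:post_id".
function matchRoute(routes, path) {
  const pathParts = path.split('/').filter(Boolean);
  for (const [pattern, handler] of Object.entries(routes)) {
    const patternParts = pattern.split('/').filter(Boolean);
    if (patternParts.length !== pathParts.length) continue;
    const params = {};
    const matched = patternParts.every((part, i) => {
      if (part.startsWith(':')) {
        params[part.slice(1)] = pathParts[i]; // capture the named parameter
        return true;
      }
      return part === pathParts[i]; // literal segments must match exactly
    });
    if (matched) return { handler, params };
  }
  return null; // no route matched
}

const ROUTE_POST = 'route-post';
const routes = { '/post/:map_name/:post_id': ROUTE_POST };

const hit = matchRoute(routes, '/post/global/123');
const miss = matchRoute(routes, '/settings');
console.log(hit, miss);
```

Here hit carries the handler constant plus the map_name and post_id parameters, while miss is null because /settings is not in the table.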

This next Function file chunk handles outbound request caching:

async function cacheFetch(url, json = false) {
  // Serve from the cache when this URL has been fetched recently
  if (http_cache.has(url)) return http_cache.get(url);

  const response = await fetch(url);

  if (response.status >= 400) throw new Error(`unable to retrieve URL ${url}`);

  if (json) {
    // Cache the parsed object rather than the raw text
    const payload = await response.json();
    http_cache.set(url, payload);
    return payload;
  }

  const body = await response.text();
  http_cache.set(url, body);
  return body;
}

The cacheFetch() function takes two arguments. The first is the URL to be loaded, and the second is whether the response should be parsed as JSON. The function checks to see if the URL response is in the cache. If so, it serves the response. If not, it requests the file, parses the JSON if needed, puts it in the cache, and returns the response.

Note that if you load the same URL once with JSON and once without JSON, you'll get the wrong data type back. But you probably shouldn't be doing that anyway.
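
If you did want to guard against that mix-up, one option is to include the requested format in the cache key. The following is a sketch under that assumption: makeCacheFetch() and fakeFetch are hypothetical, and the fetch implementation and cache are passed in so the example runs without any packages.

```javascript
// Build a cacheFetch() that keys entries by both format and URL.
function makeCacheFetch(fetchImpl, cache) {
  return async function cacheFetch(url, json = false) {
    // Prefix the key so text and JSON responses never collide.
    const key = `${json ? 'json' : 'text'}:${url}`;
    if (cache.has(key)) return cache.get(key);

    const response = await fetchImpl(url);
    if (response.status >= 400) throw new Error(`unable to retrieve URL ${url}`);

    const payload = json ? await response.json() : await response.text();
    cache.set(key, payload);
    return payload;
  };
}

// Stub standing in for a real HTTP client, for demonstration only.
let calls = 0;
const fakeFetch = async () => {
  calls++;
  return {
    status: 200,
    text: async () => '{"ok":true}',
    json: async () => ({ ok: true }),
  };
};

// A plain Map offers the same has/get/set methods used above.
const cacheFetch = makeCacheFetch(fakeFetch, new Map());
```

The cost is that the same URL can be downloaded twice, once per format, which seems like a fair trade for never getting the wrong type back.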

The next chunk of the file modifies the HTML document:

function setTitle(html, title) {
  return html
    .replace(/<title>.*<\/title>/, `<title>${title}</title>`)
    .replace('</head>', `<meta property="og:title" content="${title}"></head>`);
}

function setDesc(html, description) {
  description = description.replaceAll(/[<>"]/g, ''); // requires a polyfill

  return html
    .replace(/<meta name="description" content="[^"]*"/, `<meta name="description" content="${description}"`)
    .replace('</head>', `<meta property="og:description" content="${description}"></head>`);
}

function setUrl(html, url) {
  return html.replace('</head>', `<meta property="og:url" content="${url}"></head>`);
}

function setImage(html, image_url) {
  return html.replace('</head>', `<meta property="og:image" content="${image_url}"></head>`);
}

Not my proudest moment, let me assure you.

These functions use regular expressions to modify the HTML document and to inject new HTML. If you are following along then you'll need to modify these regular expressions based on your HTML document. You may find that the build process for your site mangles the index.html file that you have on disk and that what is downloaded from the internet is different (download the file from your domain to your dev machine and check).

This introduces a tight coupling between the layout of your index.html file and this Function code. You may find that at some point the tags stop being overwritten properly. Be sure to create an acceptance test for this code to prevent it from happening.

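Also note that the String#replaceAll() method is missing in the version of Node.js that is used by the Function, so you'll need to paste in a polyfill to support it. Alternatively, because the pattern here is already a regular expression with the g flag, String#replace() produces the same result and needs no polyfill.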

A better way to do all of this would be to use a Node.js DOM parser, such as the jsdom package. Upon downloading the index.html file, parse the DOM, and store that in the cache. Then, when you want to do manipulations, use the DOM parsing library to manipulate the document, cache the result, and serve that as the response. It'll be slower but much safer.

Also note that calling setTitle() and setDesc() multiple times is not an idempotent action. Doing so will create duplicate tags in the DOM which may anger the social media gods.
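
One way around the duplicate-tag problem is a helper that replaces an existing tag when there is one and only inserts otherwise. This is a sketch, not what Radar Chat runs: setOgTag() and escapeAttr() are hypothetical names, and it assumes the property is a simple word like title or url.

```javascript
// Escape characters that would break out of an attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Replace the Open Graph tag if it already exists, otherwise insert it.
function setOgTag(html, property, content) {
  const tag = `<meta property="og:${property}" content="${escapeAttr(content)}">`;
  const existing = new RegExp(`<meta property="og:${property}" content="[^"]*">`);
  // Replacer functions keep "$" in the content from being interpreted.
  if (existing.test(html)) return html.replace(existing, () => tag);
  return html.replace('</head>', () => `${tag}</head>`);
}

const page = '<html><head><title>App</title></head><body></body></html>';
const once = setOgTag(page, 'title', 'First');
const twice = setOgTag(once, 'title', 'Second');
console.log(twice); // contains a single og:title tag, now set to "Second"
```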

For the final chunk we have the Function handler. This is the code that is executed once for every incoming request:

exports.handler = async ({ path }) => {
  if (path.startsWith('/test')) path = path.slice(5); // for testing

  let html = await cacheFetch(HTML_URL);

  const route = router.find(path);

  html = setUrl(html, `https://app.radar.chat${path}`);

  try {
    switch (route.handler) {
      // declare other routes here
      case ROUTE_POST:
        {
          const { map_name, post_id } = route.params;

          const data = await cacheFetch(`${API_ROOT}/getmappost/${map_name}/${post_id}`, true);

          html = setTitle(html, `A post in #${data.ch} by @${data.uname}`);
          html = setDesc(html, `A post by @${data.uname} in the #${data.ch} map.`);
        }
        break;
    }
  } catch (err) {
    console.error(err);
  }

  return {
    statusCode: 200,
    body: html,
    headers: {
      'Content-Type': 'text/html',
    },
  };
};

The philosophy for this project is that, at the very least, we can always comfortably fall back to serving up the existing index.html file without modifications. If something goes wrong, or if we don't know the route being requested, just return the HTML document without any of the Open Graph tags and let the frontend application run as usual.

To that end, almost everything is wrapped in a try/catch. Everything save for the initial request to download the index.html document. If that fails, we don't have anything to serve to the user anyway, so we simply throw an error.

Within the try/catch is a switch that handles the routes. In here you create a separate case for each of the routes. In this example file there is a single route. The router extracts the named parameters from the path and those are provided in route.params. Next, the input variables are used to make an outbound API request. Once we get the response we call the appropriate functions to modify the HTML document.

After the try/catch is the code that returns the HTML document. If an error occurs, it'll return the HTML. If no routes are processed, it'll return the HTML. If a route is processed and fails half-way, it'll return HTML, though not all replacements might have happened. And if the route is processed successfully, it'll return HTML.
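
The fallback pattern can be boiled down to a few lines. This sketch uses a hypothetical renderWithFallback() helper and stubbed enrich functions, and it differs slightly from the handler above: on failure it returns the completely unmodified HTML instead of a partially modified copy.

```javascript
// enrich is any function that returns modified HTML (or a promise of it), or throws.
async function renderWithFallback(baseHtml, enrich) {
  let html = baseHtml;
  try {
    html = await enrich(html);
  } catch (err) {
    console.error(err); // log it and carry on with the unmodified HTML
  }
  return {
    statusCode: 200,
    body: html,
    headers: { 'Content-Type': 'text/html' },
  };
}

const base = '<html><head></head></html>';

// Stubs standing in for the real route handling, for demonstration only.
const working = (html) =>
  html.replace('</head>', '<meta property="og:url" content="https://example.com/"></head>');
const broken = async () => {
  throw new Error('upstream API unavailable');
};

renderWithFallback(base, working).then((res) => console.log(res.body));
renderWithFallback(base, broken).then((res) => console.log(res.body === base)); // true
```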


And that's that. This is a simpler version of SSR that is only used to set metadata / Open Graph tags in the head of the document. Depending on your hosting setup you will likely find that an approach like this is simpler than implementing a complete SSR solution. It does have its drawbacks, but it's a quick and dirty way to get social media embeds working for an SPA.

If you have a moment, do me a favor and sign up for Radar Chat. It's a mobile app available in the Android Play Store and the iOS App Store. It's also available as a PWA. Radar Chat lets you communicate based on geolocation. You can create custom maps that other users can subscribe to and that you can embed in your own website. I'll be opening up the API at some point in the future and you can bet it'll have decent documentation.

Thomas Hunter II

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.