Why you should avoid ORMs (with examples in Node.js)

14 min read Oct 31, 2018

Visit the original version of this article on Log Rocket.

In this post, we'll consider several reasons why you should avoid using ORM (Object Relational Mapping) in your project. While the concepts discussed in this post are applicable to every language and platform, the code examples will be written in Node.js-flavored JavaScript and we will be considering packages obtained from the npm repository.

First and foremost: I don't intend to diss any of the modules mentioned in this post. A lot of hard work has gone into each and every one of them. They are used by production applications all around the world which merrily respond to plenty of requests every day. I've also deployed applications using ORMs and regret nothing.

Follow Along

ORMs are powerful tools. The ORMs we'll be examining in this post are able to communicate with SQL backends such as SQLite, PostgreSQL, MySQL, and MSSQL. The examples in this post will make use of PostgreSQL, which is a very powerful open source SQL server. There are ORMs capable of communicating with NoSQL backends, such as the Mongoose ORM backed by MongoDB, but we won't be considering those in this post.

First, run the following commands to start an instance of PostgreSQL locally. It will be configured in such a way that requests made to the default PostgreSQL port on localhost:5432 will be forwarded to the container. It'll also write the files to disk in your home directory so that subsequent instantiations will retain the data we're already created.

mkdir -p ~/data/pg-node-orms
docker run \
  --name pg-node-orms \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=hunter12 \
  -e POSTGRES_USER=orm-user \
  -e POSTGRES_DB=orm-db \
  -v ~/data/pg-node-orms:/var/lib/postgresql/data \
  -d \
  postgres

Now that you've got a database running we need to add some tables and data to the database. This will allow us to query against the data and get a better understanding of the various layers of abstraction. Run the next command to start an interactive PostgreSQL prompt:

docker run \
  -it --rm \
  --link pg-node-orms:postgres \
  postgres \
  psql \
  -h postgres \
  -U orm-user \
  orm-db

At the prompt type in the password from the previous code block, hunter12. Now that you're connected, copy and paste the following queries into the prompt and press enter.

CREATE TYPE item_type AS ENUM (
  'meat', 'veg', 'spice', 'dairy', 'oil'
);

CREATE TABLE item (
  id    SERIAL PRIMARY KEY,
  name  VARCHAR(64) NOT NULL,
  type  item_type
);

CREATE INDEX ON item (type);

INSERT INTO item VALUES
  (1, 'Chicken', 'meat'), (2, 'Garlic', 'veg'), (3, 'Ginger', 'veg'),
  (4, 'Garam Masala', 'spice'), (5, 'Turmeric', 'spice'),
  (6, 'Cumin', 'spice'), (7, 'Ground Chili', 'spice'),
  (8, 'Onion', 'veg'), (9, 'Coriander', 'spice'), (10, 'Tomato', 'veg'),
  (11, 'Cream', 'dairy'), (12, 'Paneer', 'dairy'), (13, 'Peas', 'veg'),
  (14, 'Ghee', 'oil'), (15, 'Cinnamon', 'spice');

CREATE TABLE dish (
  id     SERIAL PRIMARY KEY,
  name   VARCHAR(64) NOT NULL,
  veg    BOOLEAN NOT NULL
);

CREATE INDEX ON dish (veg);

INSERT INTO dish VALUES
  (1, 'Chicken Tikka Masala', false), (2, 'Matar Paneer', true);

CREATE TABLE ingredient (
  dish_id   INTEGER NOT NULL REFERENCES dish (id),
  item_id   INTEGER NOT NULL REFERENCES item (id),
  quantity  FLOAT DEFAULT 1,
  unit      VARCHAR(32) NOT NULL
);

INSERT INTO ingredient VALUES
  (1, 1, 1, 'whole breast'), (1, 2, 1.5, 'tbsp'), (1, 3, 1, 'tbsp'),
  (1, 4, 2, 'tsp'), (1, 5, 1, 'tsp'),
  (1, 6, 1, 'tsp'), (1, 7, 1, 'tsp'), (1, 8, 1, 'whole'),
  (1, 9, 1, 'tsp'), (1, 10, 2, 'whole'), (1, 11, 1.25, 'cup'),
  (2, 2, 3, 'cloves'), (2, 3, 0.5, 'inch piece'), (2, 13, 1, 'cup'),
  (2, 6, 0.5, 'tsp'), (2, 5, 0.25, 'tsp'), (2, 7, 0.5, 'tsp'),
  (2, 4, 0.5, 'tsp'), (2, 11, 1, 'tbsp'), (2, 14, 2, 'tbsp'),
  (2, 10, 3, 'whole'), (2, 8, 1, 'whole'), (2, 15, 0.5, 'inch stick');

You now have a populated database. You can type \quit to disconnect from the psql client and get control of your terminal back. If you ever want to run raw SQL commands again you can run that same docker run command again.

Finally, you'll also need to create a file named connection.json containing the following JSON structure. This will be used by the Node applications later to connect to the database.

{
  "host": "localhost",
  "port": 5432,
  "database": "orm-db",
  "user": "orm-user",
  "password": "hunter12"
}

Layers of Abstraction

Before diving into too much code let's clarify a few different layers of abstraction. Just like everything in computer science, there are tradeoffs as we add layers of abstraction. With each added layer of abstraction we attempt to trade a decrease in performance with an increase in developer productivity (though this is not always the case).

Low Level: Database Driver

This is basically as low-level as you can get-short of manually generating TCP packets and delivering them to the database. A database driver is going to handle connecting to a database (and sometimes connection pooling). At this level you're going to be writing raw SQL strings and delivering them to a database, and receiving a response from the database. In the Node.js ecosystem there are many libraries operating at this layer. Here are three popular libraries:

mysql: MySQL (13k stars / 330k weekly downloads)
pg: PostgreSQL (6k stars / 520k weekly downloads)
sqlite3: SQLite (3k stars / 120k weekly downloads)

Each of these libraries essentially works the same way: take the database credentials, instantiate a new database instance, connect to the database, and send it queries in the form of a string and asynchronously handle the result. Here is a simple example using the pg module to get a list of ingredients required to cook Chicken Tikka Masala:

#!/usr/bin/env node

// $ npm install pg

const { Client } = require('pg');
const connection = require('./connection.json');
const client = new Client(connection);

client.connect();

const query = `SELECT
  ingredient.*, item.name AS item_name, item.type AS item_type
FROM
  ingredient
LEFT JOIN
  item ON item.id = ingredient.item_id
WHERE
  ingredient.dish_id = $1`;

client
  .query(query, [1])
  .then(res => {
    console.log('Ingredients:');
    for (let row of res.rows) {
      console.log(`${row.item_name}: ${row.quantity} ${row.unit}`);
    }

    client.end();
});

Middle Level: Query Builder

This is the intermediary level between using the simpler Database Driver module vs a full-fledged ORM. The most notable module which operates at this layer is Knex. This module is able to generate queries for a few different SQL dialects. This module depends on one of the aforementioned libraries-you'll need to install the particular ones you plan on using with Knex.

knex: Query Builder (8k stars / 170k weekly downloads)

When creating a Knex instance you provide the connection details, along with the dialect you plan on using and are then able to start making queries. The queries you write will closely resemble the underlying SQL queries. One nicety is that you're able to programmatically generate dynamic queries in a much more convenient way than if you were to concatenate strings together to form SQL (which often introduces security vulnerabilities).

Here is a simple example using the knex module to get a list of ingredients required to cook Chicken Tikka Masala:

#!/usr/bin/env node

// $ npm install pg knex

const knex = require('knex');
const connection = require('./connection.json');
const client = knex({
  client: 'pg',
  connection
});

client
  .select([
    '*',
    client.ref('item.name').as('item_name'),
    client.ref('item.type').as('item_type'),
  ])
  .from('ingredient')
  .leftJoin('item', 'item.id', 'ingredient.item_id')
  .where('dish_id', '=', 1)
  .debug()
  .then(rows => {
    console.log('Ingredients:');
    for (let row of rows) {
      console.log(`${row.item_name}: ${row.quantity} ${row.unit}`);
    }

    client.destroy();
});

High Level: ORM

This is the highest level of abstraction we're going to consider. When working with ORMs we typically need to do a lot more configuration ahead of time. The point of an ORM, as the name implies, is to map a record in a relational database to an object (typically, but not always, a class instance) in our application. What this means is that we're defining the structure of these objects, as well as their relationships, in our application code.

sequelize: (16k stars / 270k weekly downloads)
bookshelf: Knex based (5k stars / 23k weekly downloads)
waterline: (5k stars / 20k weekly downloads)
objection: Knex based (3k stars / 20k weekly downloads)

In this example, we're going to look at the most popular of the ORMs, Sequelize. We're also going to model the relationships represented in our original PostgreSQL schema using Sequelize. Here is the same example using the Sequelize module to get a list of ingredients required to cook Chicken Tikka Masala:

#!/usr/bin/env node

// $ npm install sequelize pg

const Sequelize = require('sequelize');
const connection = require('./connection.json');
const DISABLE_SEQUELIZE_DEFAULTS = {
  timestamps: false,
  freezeTableName: true,
};

const { DataTypes } = Sequelize;
const sequelize = new Sequelize({
  database: connection.database,
  username: connection.user,
  host: connection.host,
  port: connection.port,
  password: connection.password,
  dialect: 'postgres',
  operatorsAliases: false
});

const Dish = sequelize.define('dish', {
  id: { type: DataTypes.INTEGER, primaryKey: true, autoIncrement: true },
  name: { type: DataTypes.STRING },
  veg: { type: DataTypes.BOOLEAN }
}, DISABLE_SEQUELIZE_DEFAULTS);

const Item = sequelize.define('item', {
  id: { type: DataTypes.INTEGER, primaryKey: true, autoIncrement: true },
  name: { type: DataTypes.STRING },
  type: { type: DataTypes.STRING }
}, DISABLE_SEQUELIZE_DEFAULTS);

const Ingredient = sequelize.define('ingredient', {
  dish_id: { type: DataTypes.INTEGER, primaryKey: true },
  item_id: { type: DataTypes.INTEGER, primaryKey: true },
  quantity: { type: DataTypes.FLOAT },
  unit: { type: DataTypes.STRING }
}, DISABLE_SEQUELIZE_DEFAULTS);

Item.belongsToMany(Dish, {
  through: Ingredient, foreignKey: 'item_id'
});

Dish.belongsToMany(Item, {
  through: Ingredient, foreignKey: 'dish_id'
});

Dish.findOne({where: {id: 1}, include: [{model: Item}]}).then(rows => {
  console.log('Ingredients:');
  for (let row of rows.items) {
    console.log(
      `${row.dataValues.name}: ${row.ingredient.dataValues.quantity} ` +
      row.ingredient.dataValues.unit
    );
  }

  sequelize.close();
});

Now that you've seen an example of how to perform similar queries using the different abstraction layers let's dive into the reasons you should be wary of using an ORM.

Reason 1: You're learning the wrong thing

A lot of people pick up an ORM because they don't want to take the time to learn the underlying SQL (Structured Query Language). The belief often being that SQL is hard to learn and that by learning an ORM we can simply write our applications using a single language instead of two. At first glance, this seems to hold up. An ORM is going to be written in the same language as the rest of the application, while SQL is a completely different syntax.

There is a problem with this line of thinking, however. The problem is that ORMs represent some of the most complex libraries you can get your hands on. The surface area of an ORM is very large and learning it inside and out is no easy task.

Once you have learned a particular ORM this knowledge likely won't transfer that well. This is true if you switch from one platform to another, such as JS/Node.js to C#/.NET. But perhaps even less obvious is that this is true if you switch from one ORM to another within the same platform, such as Sequelize to Bookshelf with Node.js. Consider the following ORM examples which each generate a list of all recipe items which are vegetarian:

Sequelize:

#!/usr/bin/env node

// $ npm install sequelize pg

const Sequelize = require('sequelize');
const { Op, DataTypes } = Sequelize;
const connection = require('./connection.json');
const DISABLE_SEQUELIZE_DEFAULTS = {
  timestamps: false,
  freezeTableName: true,
};

const sequelize = new Sequelize({
  database: connection.database,
  username: connection.user,
  host: connection.host,
  port: connection.port,
  password: connection.password,
  dialect: 'postgres',
  operatorsAliases: false
});

const Item = sequelize.define('item', {
  id: { type: DataTypes.INTEGER, primaryKey: true, autoIncrement: true },
  name: { type: DataTypes.STRING },
  type: { type: DataTypes.STRING }
}, DISABLE_SEQUELIZE_DEFAULTS);

// SELECT "id", "name", "type" FROM "item" AS "item"
//     WHERE "item"."type" = 'veg';
Item
  .findAll({where: {type: 'veg'}})
  .then(rows => {
    console.log('Veggies:');
    for (let row of rows) {
      console.log(`${row.dataValues.id}\t${row.dataValues.name}`);
    }
    sequelize.close();
});

Bookshelf:

#!/usr/bin/env node

// $ npm install bookshelf knex pg

const connection = require('./connection.json');
const knex = require('knex')({
  client: 'pg',
  connection,
  // debug: true
});
const bookshelf = require('bookshelf')(knex);

const Item = bookshelf.Model.extend({
  tableName: 'item'
});

// select "item".* from "item" where "type" = ?
Item
  .where('type', 'veg')
  .fetchAll()
  .then(result => {
    console.log('Veggies:');
    for (let row of result.models) {
      console.log(`${row.attributes.id}\t${row.attributes.name}`);
    }
    knex.destroy();
});

Waterline:

#!/usr/bin/env node

// $ npm install sails-postgresql waterline

const pgAdapter = require('sails-postgresql');
const Waterline = require('waterline');
const waterline = new Waterline();
const connection = require('./connection.json');

const itemCollection = Waterline.Collection.extend({
  identity: 'item',
  datastore: 'default',
  primaryKey: 'id',
  attributes: {
    id: { type: 'number', autoMigrations: {autoIncrement: true} },
    name: { type: 'string', required: true },
    type: { type: 'string', required: true },
  }
});

waterline.registerModel(itemCollection);

const config = {
  adapters: {
    'pg': pgAdapter
  },

  datastores: {
    default: {
      adapter: 'pg',
      host: connection.host,
      port: connection.port,
      database: connection.database,
      user: connection.user,
      password: connection.password
    }
  }
};

waterline.initialize(config, (err, ontology) => {
  const Item = ontology.collections.item;
  // select "id", "name", "type" from "public"."item"
  //     where "type" = $1 limit 9007199254740991
  Item
    .find({ type: 'veg' })
    .then(rows => {
      console.log('Veggies:');
      for (let row of rows) {
        console.log(`${row.id}\t${row.name}`);
      }
      Waterline.stop(waterline, () => {});
    });
});

Objection:

#!/usr/bin/env node

// $ npm install knex objection pg

const connection = require('./connection.json');
const knex = require('knex')({
  client: 'pg',
  connection,
  // debug: true
});
const { Model } = require('objection');

Model.knex(knex);

class Item extends Model {
  static get tableName() {
    return 'item';
  }
}

// select "item".* from "item" where "type" = ?
Item
  .query()
  .where('type', '=', 'veg')
  .then(rows => {
    for (let row of rows) {
      console.log(`${row.id}\t${row.name}`);
    }
    knex.destroy();
});

The syntax for a simple read operation varies greatly between these examples. As the operation you're trying to perform increases in complexity, such as operations involving multiple tables, the ORM syntax will vary from between implementations even more.

There are at least dozens of ORMs for Node.js alone, and at least hundreds of ORMs for all platforms. Learning all of those tools would be a nightmare!

Lucky for us, there are really only a few SQL dialects to worry about. By learning how to generate queries using raw SQL you can easily transfer this knowledge between different platforms.

Reason 2: Complex ORM calls can be Inefficient

Recall that the purpose of an ORM is to take the underlying data stored in a database and map it into an object that we can interact within our application. This often comes with some inefficiencies when we use an ORM to fetch certain data.

Consider, for example, the queries we first looked at in the section on layers of abstraction. In that query, we simply wanted a list of ingredients and their quantities for a particular recipe. First we made the query by writing SQL by hand. Next, we made the query by using the Query Builder, Knex. Finally, we made a query by using the ORM, Sequelize. Let's take a look at the queries which have been generated by those three commands:

Hand-written with "pg" Driver:

This first query is exactly the one we wrote by hand. It represents the most succinct method to get exactly the data we want.

SELECT
  ingredient.*, item.name AS item_name, item.type AS item_type
FROM
  ingredient
LEFT JOIN
  item ON item.id = ingredient.item_id
WHERE
  ingredient.dish_id = ?;

When we prefix this query with EXPLAIN and send it to the PostgreSQL server, we get a cost operation of 34.12.

Generated with "knex" Query Builder:

This next query was mostly generated for us, but due to the explicit nature of the Knex Query Builder, we should have a pretty good expectation of what the output will look like.

select
  *, "item"."name" as "item_name", "item"."type" as "item_type"
from
  "ingredient"
left join
  "item" on "item"."id" = "ingredient"."item_id"
where
  "dish_id" = ?;

Newlines have been added by me for readability. Other than some minor formatting and unnecessary table names in my hand-written example, these queries are identical. In fact, once the EXPLAIN query is run, we get the same score of 34.12.

Generated with "Sequelize" ORM:

Now let's take a look at the query generated by an ORM:

SELECT
  "dish"."id", "dish"."name", "dish"."veg", "items"."id" AS "items.id",
  "items"."name" AS "items.name", "items"."type" AS "items.type",
  "items->ingredient"."dish_id" AS "items.ingredient.dish_id",
  "items->ingredient"."item_id" AS "items.ingredient.item_id",
  "items->ingredient"."quantity" AS "items.ingredient.quantity",
  "items->ingredient"."unit" AS "items.ingredient.unit"
FROM
  "dish" AS "dish"
LEFT OUTER JOIN (
  "ingredient" AS "items->ingredient"
  INNER JOIN
  "item" AS "items" ON "items"."id" = "items->ingredient"."item_id"
) ON "dish"."id" = "items->ingredient"."dish_id"
WHERE
  "dish"."id" = ?;

Newlines have been added by me for readability. As you can tell this query is a lot different from the previous two queries. Why is it behaving so differently? Well, due to the relationships we've defined, Sequelize is trying to get more information than what we've asked for. In particular, we're getting information about the Dish itself when we really only care about the Ingredients belonging to that Dish. The cost of this query, according to EXPLAIN, is 42.32.

Reason 3: An ORM Can't Do Everything

Not all queries can be represented as an ORM operation. When we need to generate these queries we have to fall back to generating the SQL query by hand. This often means a codebase with heavy ORM usage will still have a few handwritten queries strewn about it. The implications here are that, as a developer working on one of these projects, we end up needing to know BOTH the ORM syntax as well as some underlying SQL syntax.

A common situation which doesn't work too well with ORMs is when a query contains a subquery. Consider the situation where I know that I have already purchased all the ingredients for Dish #2 in our database, however, I still need to purchase whatever ingredients are needed for Dish #1. In order to get this list I might run the following query:

SELECT *
FROM item
WHERE
  id NOT IN
    (SELECT item_id FROM ingredient WHERE dish_id = 2)
  AND id IN
    (SELECT item_id FROM ingredient WHERE dish_id = 1);

To the best of my knowledge, this query cannot be cleanly represented using the aforementioned ORMs. To combat these situations it's common for an ORM to offer the ability to inject raw SQL into the query interface.

Sequelize offers a .query() method to execute raw SQL as if you were using the underlying database driver. With both the Bookshelf and Objection ORMs you get access to the raw Knex object which you provide during instantiation and can use that for its Query Builder powers. The Knex object also has a .raw() method to execute raw SQL. With Sequelize you also get a Sequelize.literal() method which can be used to intersperse raw SQL in various parts of a Sequelize ORM call. But in each of these situations, you still need to know some underlying SQL to generate certain queries.

Query Builders: The Sweet Spot

Using the low-level Database Driver modules is rather enticing. There is no overhead when generating a query for the database as we are manually writing the query. The overall dependencies our project relies upon is also minimized. However, generating dynamic queries can be very tedious, and in my opinion is the biggest drawback of using a simple Database Driver.

Consider, for example, a web interface where a user can select criteria when they want to retrieve items. If there is only a single option that a user can input, such as color, our query might look like the following:

SELECT * FROM things WHERE color = ?;

This single query works nicely with the simple Database Driver. However, consider if the color is optional and that there's a second optional field called is_heavy. We now need to support a few different permutations of this query:

SELECT * FROM things; -- Neither
SELECT * FROM things WHERE color = ?; -- Color only
SELECT * FROM things WHERE is_heavy = ?; -- Is Heavy only
SELECT * FROM things WHERE color = ? AND is_heavy = ?; -- Both

However, due to the aforementioned reasons, a fully featured ORM isn't the tool we want to reach for either.

The Query Builder ends up being a pretty nice tool in these situations. The interface exposed by Knex is so close to the underlying SQL query that we are forced to always know what the SQL query looks like. This relationship is similar to how something like TypeScript translates to JavaScript.

Using a Query Build is a fine solution as long as you fully understand the underlying SQL it is generating. Never use it as a tool to hide from what is happening at a lower layer. Only use it as a matter of convenience and in situations where you know exactly what it's doing. If you ever find yourself questioning what a generated query actually looks like you can add a debug field to the Knex() instantiation call. Doing so looks like this:

const knex = require('knex')({
  client: 'pg',
  connection,
  debug: true // Enable Query Debugging
});

In fact, most of the libraries mentioned in this post include some sort of method for debugging the calls being executed.

We've looked at three different layers of abstracting database interactions, namely the low-level Database Drivers, Query Builders, and the high-level ORMs. We've also examined the tradeoffs of using each layer as well as the SQL queries being generated: this includes the difficulty of generating dynamic queries with a Database Driver, the added complexity of ORMs, and finally the sweet spot of using a Query Generator.

Thank you for reading and be sure to take this into consideration when you build your next project.

Once you're done following along you may run the following commands to completely remove the docker container and remove the database files from your computer:

docker stop pg-node-orms
docker rm pg-node-orms
sudo rm -rf ~/data/pg-node-orms

Discussion:

Log Rocket

blog.logrocket.com

Hacker News

news.ycombinator.com

Reddit

/r/node

Reddit

/r/programming

Tags: #nodejs #database #guestblog

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.