Techblog

Migrations with Node.js and PostgreSQL

Von Raphael Pigulla
12. April 2021

Every project that persists data has to inevitably face the fact that the structure of that data is not static and needs to adapt. The reasons for this are manifold: business requirements change, entities grow too large and need to be broken apart or performance issues require additional indices or denormalization. Today we will be looking at how to tackle the problem of managing structural changes in the context of Node.js and PostgreSQL.

Managing changes to the database and executing them safely is essential – a botched migration can result in inconsistencies, loss of data or even bring down the entire system. For this reason, we establish the following hard constraints:

  • All or nothing
    Just like most business operations, a migration must be done in an all-or-nothing fashion: either the entire data set must be migrated, or no changes must be applied at all. While practically all relational databases are ACID-compliant and support transactions, not all do so for structural changes. For instance, the widely popular MySQL does not.
     
  • Rolling back
    Even if a migration was run successfully, it could still be necessary to quickly revert the changes made. Suppose that a migration introduced a database constraint that does not properly align with the business requirements or the assumptions made in other parts in the code, resulting in users not being able to actually save some records. Restoring the database to a recent backup is a risky option as it will probably cause the loss of data unrelated to the issue at hand. Instead, a down migration that cleanly reverses the applied changes is preferable.
     
  • Infrastructure as Code
    The structure of the database, i.e. the tables, columns, foreign keys, constraints and so on, is very closely coupled to the application itself. Even if object-relational mapping (ORM) is employed, and the persistence implementation is cleanly decoupled from the rest of the code base, a structural change to the database will typically require at least some small change to the application itself or its configuration. For this reason, the migrations should be part of the code base so that the (desired) state of the database always aligns with the implementation for any point in time.

 

The Weapon of Choice

Using an ORM like Sequelize usually gets us a migration framework for free, and many Database Abstraction Layers (DBAL) or query builders like knex also come with migration support baked in. However, unless supporting multiple databases is actually a requirement or desired feature for the application, doing so will slow down development for little tangible benefit. Switching databases halfway through a project rarely happens, and when it does it never is quite as seamless as the marketing slides of the abstraction layers make us believe.

Consequently, in environments where support for or switching between databases is not an immediate concern, using non-generic tools leads to fewer layers of abstraction. This, in turn, results in code that is easier to understand and simpler to debug. On top of that, database-specific features, which typically require falling back to raw SQL when using a DBAL, are much less awkward to use.

In our example we will be using PostgreSQL, which is not only a very mature RDBMS but also widely available as a managed or even serverless solution on all major cloud platforms. Needless to say, it satisfies all the previously defined constraints, making it a rock-solid choice for our use case. On the software side of things, we go with node-pg-migrate (pg-migrate for short), which integrates well with our fictional TypeScript application.

 

Setting up pg-migrate

After installing the library, we must first configure how it connects to the database. This can be done in a variety of ways, such as defining a DATABASE_URL environment variable, using node-config or dotenv. While pg-migrate has built-in support for the latter two, they must be installed explicitly; if this is omitted, the library will not complain but just silently ignore the corresponding files.

Going forward, we will be using node-config and a PostgreSQL Docker container with its default settings and postgres as the password:

config/default.json

{
    "db": {
        "url": "postgres://postgres:postgres@localhost:5432/postgres",
        "tsconfig": "./tsconfig.json",
        "migration-filename-format": "utc"
    }
}

The main ingredient here is the database URL (in a real-world scenario we wouldn't store credentials in a plain-text config file, but that is a different story). Luckily, pg-migrate comes with support for TypeScript right out of the box; all we have to do is point it to the right configuration file. Finally, merely as a cosmetic change, we want migration files to be prefixed with a human-readable date and time string instead of the default Unix timestamp.

 

Getting started

At its core, the way migrations work is pretty straightforward:

  • each migration is a file in the /migrations directory which exports an up() and optionally a down() function
  • a special pgmigrations table in the database (which is created automatically) keeps track of which migrations were already applied
  • executing or reverting migrations simply means sequentially invoking the required up() and down() methods, respectively

Behind the scenes, pg-migrate takes care of enumerating and loading the migration files, detecting which of them (if any) need to run, and wrapping everything in a transaction. Once the setup is done, writing migrations becomes a very easy task.

Our application requires a simple user table, so let's set this up by creating our first migration:

$ ./node_modules/.bin/node-pg-migrate create user-table

Note that all this does is create a scaffold for our migration, so we should see a new file in the /migrations folder. The user-table argument is only used for generating the file name, and we can change it as desired until the migration is actually run. As a matter of fact, at this point pg-migrate has not yet talked to the database at all!

After removing some cruft, our migration file will look something like this:

import { MigrationBuilder } from 'node-pg-migrate';

export async function up(pgm: MigrationBuilder): Promise<void> {
}

The scaffolded down() function was intentionally removed for reasons that will be explained in a minute. As we can see, everything revolves around the MigrationBuilder that the library provides to us. There is a lot that can be done with it but let's start with something very basic:

export async function up(pgm: MigrationBuilder): Promise<void> {
    pgm.createTable('users', {
        id: 'id',
        email: { type: 'string', notNull: true, unique: true },
        date_of_birth: { type: 'date' },
    })
}

This creates a new users table with three columns. The value id is a convenient shortcut for creating a primary key column of type serial. Not explicitly setting notNull to true makes date_of_birth nullable by default. It is important to understand that the command only instructs the library to create a table, it does not actually execute the corresponding statement directly (which is why we don't have to await it).

Now let's fire up a Docker container and test the migration:

$ docker run --detach --env POSTGRES_PASSWORD=postgres --publish 5432:5432 postgres
$ ./node_modules/.bin/node-pg-migrate up
> Migrating files:
> - 20210309072213194_user-table
### MIGRATION 20210309072213194_user-table (UP) ###
CREATE TABLE "users" (
"id" serial PRIMARY KEY,
"email" text UNIQUE NOT NULL,
"date_of_birth" date
);
INSERT INTO "public"."pgmigrations" (name, run_on) VALUES ('20210309072213194_user-table', NOW());

Migrations complete!

The up command runs all pending migrations, which in our case is only one. If we try to run it again, pg-migrate will detect that it has already been applied and do nothing:

$ ./node_modules/.bin/node-pg-migrate up
No migrations to run!
Migrations complete!

 

What goes up...

Let's assume that after deploying everything to production, the business team decides that they want to enable users to optionally add their real name to their profiles. As far as migrations are concerned, all we need to do is create a new one:

$ ./node_modules/.bin/node-pg-migrate create add-user-name

And implement the change:

export async function up(pgm: MigrationBuilder): Promise<void> {
    pgm.addColumn('users', {
        name: { type: 'string' },
    })
}

After applying the new migration, just as we did previously, we now have an additional name column that we can use. Great success!

 

...must come down

When testing the new feature on the staging environment, the business team realized that it would be much better to store given and family name separately. As it turns out, this is actually not trivial, so the entire feature was scrapped for the time being. Luckily, rolling this back in the database is easy as pie:

$ ./node_modules/.bin/node-pg-migrate down
> Migrating files:
> - 20210312130143080_add-user-name
### MIGRATION 20210312130143080_add-user-name (DOWN) ###
ALTER TABLE "users"
DROP "name";
DELETE FROM "public"."pgmigrations" WHERE name='20210312130143080_add-user-name';

Migrations complete!

Note that while up runs all pending migrations, down only reverts the most recent one.

But wait a minute - we never implemented the down() function! What is happening? As it turns out, many methods like pgm.createTable() or pgm.addColumn() "know" how to undo their own changes, which enables pg-migrate to conveniently generate the reverse migration for us. Unfortunately, many of the more frequently used functions, most prominently pgm.alterColumn() and pgm.alterTable(), do not fall into that category. Attempting to migrate down will then fail; it is in these circumstances that one has to manually implement the down() function:

export async function up(pgm: MigrationBuilder): Promise<void> {
    pgm.alterColumn('users', 'name', { type: 'varchar(255)' })
}

export async function down(pgm: MigrationBuilder): Promise<void> {
    pgm.alterColumn('users', 'name', { type: 'text' })
}

A special case here is pgm.addConstraint(), which does not support inferring the down migration if we rely on the auto-generated constraint name, but will work just fine if we don't. There are also some features that are not yet supported by the MigrationBuilder (such as granting permissions or inserting, deleting or updating rows), in which case we have to resort to raw SQL statements with pgm.sql(). Executing arbitrary SQL obviously means that it is impossible for the library to automatically infer the down migration, so it will again throw an error and tell us as much if we ask it to do so. As a side note, pg-migrate does not come with support for escaping, so using an additional library such as pg-escape is recommended.

Sometimes it is plain impossible to undo a migration, for instance if you drop a column or a table and there is no way to reconstruct the data once it is gone. In this case we can inform the library like so:

export const down = false;

 

Conclusion

If PostgreSQL is the database-basket you put your eggs in and you don't use an ORM that comes with a migration framework, pg-migrate is a rock-solid, lightweight and simple-to-use choice. There are still a few issues to solve in practice, such as how and when to actually run the migrations. While this is not trivial to answer in general terms, finding a practical solution in a concrete project with a known architecture is usually pretty straightforward.