Moving Django Apps Between Databases With Zero Downtime

If you've been following Cedar’s Tech Decode blog, you know that we use Django, and that we've been hard at work addressing performance bottlenecks as we scale our platform. In mid-2021, Cedar reached a particularly insidious bottleneck: the limit of our ability to vertically scale our core production Postgres database in AWS. The ‘default’ database that supports our core Django project was housed on the largest RDS Postgres instance available, and its CPU utilization started exceeding 80%, threatening platform stability.

This wasn't a surprise and Cedar wasn't altogether unprepared. You don't upgrade your core production database to the largest instance available in AWS without having a plan in place to horizontally scale your platform. Lack of planning wasn't the problem; the problem was that we hadn't anticipated just how quickly that largest RDS Postgres instance would start to alarm.

Cedar's engineering team needed a stopgap. That's where Django support for multiple databases comes in. Cedar's platform team quickly decided to isolate some high-traffic tables in our core database and migrate them to adjacent Postgres instances. We implemented horizontal scaling within a single Django project without needing a maintenance window or causing production downtime by taking the following steps:

  1. Know when to move apps to a new database, and when not to
  2. Isolate each app prior to routing it to a new database
  3. Instantiate each app model's relation schema on the new database
  4. Replicate data onto the new database
  5. Warehouse data from the new database prior to cutting over database routing

Preparing an app for its new database

Isolating an app in the context of having multiple Django databases in one project means removing foreign key relationships that will span those databases. Removing these relationships makes it impossible to write a Django query that joins across databases. The same result could be achieved in application code by never writing a query that joins on one of these foreign keys, but at Cedar we prefer to be explicit over implicit.

To see what this looks like, let's say we have the following Django model for our Tech Decode blog:

from django.db import models
 
class TechDecode(models.Model):
   id = models.BigAutoField(primary_key=True)
   author = models.ForeignKey(
       "engineering.Engineer",
       null=True,
       on_delete=models.PROTECT,
   )

Let's assume an engineer's primary key is an integer. Django represents this foreign key in the database as a nullable integer column 'author_id' with a foreign key constraint. To isolate our Django app we want to remove the constraint and prevent Django from managing the relationship between the ‘Engineer’ model and the ‘TechDecode’ model. We want to make this foreign key relationship identical to an integer field:

from django.db import migrations, models
 
 
class Migration(migrations.Migration):
 
   dependencies = [
       ("blog", "0001_initial"),
   ]
 
   operations = [
       migrations.AlterField(
           model_name="techdecode",
           name="author",
           field=models.ForeignKey(
               to="engineering.Engineer",
               null=True,
               on_delete=models.deletion.DO_NOTHING,
               db_constraint=False,
           ),
       ),
   ]

Next, we can alter Django's model state and tell it that the 'author' field is just an 'author_id' integer field. This is done in a second, separate migration to ensure the foreign key constraint is dropped before the field type is changed. The following migration does not alter the database column; it only stops Django from using a related manager on a field it can no longer relate to:

from django.db import migrations, models
 
 
class Migration(migrations.Migration):
 
   dependencies = [
       ("blog", "0002"),
   ]
 
   operations = [
       migrations.SeparateDatabaseAndState(
           state_operations=[
               migrations.RemoveField(
                   model_name="techdecode",
                   name="author",
               ),
               migrations.AddField(
                   model_name="techdecode",
                   name="author_id",
                   field=models.IntegerField(null=True),
               ),
           ]
       )
   ]
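
Once 'author' is a plain integer, Django can no longer follow it with a join or a related manager, so related rows have to be fetched with a second query. The sketch below illustrates the idea outside of Django, using two in-memory SQLite databases from Python's standard library as stand-ins for the two Postgres instances. The table contents, the 'name' column and the 'author_names_by_post' helper are all hypothetical:

```python
import sqlite3

# Two independent in-memory databases stand in for the two Postgres instances.
default_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")

# The 'engineering' app stays on the default database.
default_db.execute(
    "CREATE TABLE engineering_engineer (id INTEGER PRIMARY KEY, name TEXT)"
)
default_db.execute("INSERT INTO engineering_engineer VALUES (1, 'Ada')")

# The 'blog' app lives on the new database; author_id is just an integer,
# with no foreign key constraint behind it.
new_db.execute(
    "CREATE TABLE blog_techdecode (id INTEGER PRIMARY KEY, author_id INTEGER)"
)
new_db.executemany(
    "INSERT INTO blog_techdecode VALUES (?, ?)", [(10, 1), (11, None)]
)


def author_names_by_post(post_ids):
    """Map post id -> author name using two queries instead of one join."""
    marks = ",".join("?" * len(post_ids))
    posts = new_db.execute(
        f"SELECT id, author_id FROM blog_techdecode WHERE id IN ({marks})",
        post_ids,
    ).fetchall()

    # Collect the distinct, non-null author ids and look them up separately.
    author_ids = sorted({a for _, a in posts if a is not None})
    marks = ",".join("?" * len(author_ids))
    names = dict(
        default_db.execute(
            f"SELECT id, name FROM engineering_engineer WHERE id IN ({marks})",
            author_ids,
        ).fetchall()
    )
    return {post_id: names.get(author_id) for post_id, author_id in posts}


print(author_names_by_post([10, 11]))
```

The trade-off is an extra round trip per lookup in exchange for tables that no longer need to live on the same database.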

At this point, with this pattern applied to any foreign key relationship that will span databases, we're ready to instantiate our app, in this case 'blog,' onto a new database.

Preparing a database for its new app

Let's assume we have our new database and it's configured properly. We can even assume Django knows about it, even if it isn't using it yet. The app doesn't know about the database and the database doesn't have any tables from the app. The next step is to set up replication from our default database to this new database.
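
As a rough illustration of what that replication setup can involve, the sketch below builds the two Postgres statements used for logical replication: a publication on the default database and a subscription on the new one. The helper functions, the 'blog_pub' and 'blog_sub' names and the connection string are hypothetical, and the string formatting is for illustration only (it does no quoting or escaping):

```python
def publication_sql(name, tables):
    """Statement to run on the publishing (default) database."""
    return f"CREATE PUBLICATION {name} FOR TABLE {', '.join(tables)};"


def subscription_sql(name, publication, conninfo):
    """Statement to run on the subscribing (new) database."""
    return (
        f"CREATE SUBSCRIPTION {name} "
        f"CONNECTION '{conninfo}' "
        f"PUBLICATION {publication};"
    )


# 'blog_techdecode' is Django's default table name for blog.TechDecode.
print(publication_sql("blog_pub", ["blog_techdecode"]))
print(subscription_sql("blog_sub", "blog_pub", "host=default-db dbname=app"))
```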

If you use logical replication in Postgres, the subscribing database needs an equivalent table for each model in the app we're moving (that is, for each table the default database will publish). Logical replication is preferred here over physical replication, which replicates data at the block (disk) level and cannot discriminate between tables. The natural way to initialize these tables is with their Django migrations, but that isn't likely to work automatically. Most migrations are written to run at a point in time and to avoid causing downtime in production, not to be re-run in a fresh, isolated environment.

We’ll cover prepping an app’s migrations before moving it onto a new database in another post. In short, one must apply retroactive continuity to that app's migrations, so that it's as if the app never related to any other app and its migrations only contain idempotent (or no) data operations. The best way to accomplish the former is by squashing the app's migrations. There isn't necessarily a "best way" to accomplish the latter, but at Cedar we found that apps that were strong candidates to have their own database had migration histories with data operations that were easy to make idempotent (or delete altogether).

With our app's migrations ready, we need to run them against the new database. Django routes operations (read, write and migrate) with database routers. Cutting over our app from the default to the new database means updating or adding a database router to tell Django where to route these operations. We aren't there yet; for now, we just want to temporarily allow 'migrate' operations for our app to run against both its current and its future database. This can be done with a router that looks like the following:

class TemporaryDbMoveRouter:
   def allow_migrate(self, db, app_label, model_name=None, **hints):
       if db == "new_database" and app_label == "blog":
           return True
       return None

This router should be given precedence in our project's Django settings:

DATABASE_ROUTERS = [
   "our_project.blog.router.TemporaryDbMoveRouter",
   # There may be many reasons to route database operations
   # to other databases.
   ...,
]
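
Precedence matters because Django asks each router in order and takes the first answer that isn't 'None'. The following is a simplified sketch of that resolution, not Django's actual code; 'DefaultOnlyRouter' is a hypothetical stand-in for a router that already exists in the project, and 'resolve_allow_migrate' is a hypothetical helper:

```python
class TemporaryDbMoveRouter:
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if db == "new_database" and app_label == "blog":
            return True
        return None


class DefaultOnlyRouter:
    """Hypothetical pre-existing router: migrate everything on 'default' only."""

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == "default"


def resolve_allow_migrate(routers, db, app_label, **hints):
    """Return the first non-None opinion; allow the migration if nobody has one."""
    for router in routers:
        method = getattr(router, "allow_migrate", None)
        if method is None:
            continue  # routers may omit methods they have no opinion on
        allowed = method(db, app_label, **hints)
        if allowed is not None:
            return allowed
    return True


routers = [TemporaryDbMoveRouter(), DefaultOnlyRouter()]

# Temporary router first: 'blog' may migrate onto the new database.
print(resolve_allow_migrate(routers, "new_database", "blog"))

# Order reversed: the older router answers first and blocks the migration.
print(resolve_allow_migrate(list(reversed(routers)), "new_database", "blog"))
```

Because the temporary router returns 'None' for everything else, 'blog' can still be migrated on the default database in the meantime.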

Then, either from a production command-line interface or through some other method for running Django management commands in production, we can do the following:

$ ./manage.py migrate blog --database new_database

Without the temporary router, this operation would be disallowed: although the command may populate the new database's 'django_migrations' table with 'blog' entries, it will not create the tables we need for replication.

Cutover!

With all of this, we're ready to cut over our app to a new database. The new database has a copy of our production data, and our data warehouse is kept up to date from it. All we need to do is add a router for the 'blog' app, 'DbRouter', to our Django settings:

DATABASE_ROUTERS = [
   "our_project.blog.router.DbRouter",
   ...,
]

Releasing this router will cut our 'blog' app over from its old to its new database, taking some pressure off the core database. This router is intentionally rudimentary and is meant to demonstrate Django routing functionality. The ‘db_for_read’ and ‘db_for_write’ methods only provide the direction that reading or writing for models in the ‘blog’ app should be routed to our ‘new_database.’ The ‘allow_migrate’ method lets the Django ‘migrate’ command know that the ‘blog’ app should be migrated onto the ‘new_database’ and should not be migrated anywhere else. Each of these methods returns ‘None’ otherwise, indicating that this router does not have an opinion on any other operations in our project.
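
A minimal sketch of a router matching that description might look like the following. It assumes the 'blog' app label and the 'new_database' alias used throughout this post; the 'SimpleNamespace' objects at the bottom are hypothetical stand-ins for Django model classes, used only to exercise the router outside a Django project:

```python
from types import SimpleNamespace


class DbRouter:
    app_label = "blog"
    db_name = "new_database"

    def db_for_read(self, model, **hints):
        # Route reads for 'blog' models to the new database.
        if model._meta.app_label == self.app_label:
            return self.db_name
        return None  # no opinion on other apps

    def db_for_write(self, model, **hints):
        # Route writes for 'blog' models to the new database.
        if model._meta.app_label == self.app_label:
            return self.db_name
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # 'blog' migrates onto the new database and nowhere else.
        if app_label == self.app_label:
            return db == self.db_name
        return None


# Stand-ins that expose only the attribute the router reads.
blog_model = SimpleNamespace(_meta=SimpleNamespace(app_label="blog"))
other_model = SimpleNamespace(_meta=SimpleNamespace(app_label="engineering"))

router = DbRouter()
print(router.db_for_read(blog_model), router.db_for_read(other_model))
print(router.allow_migrate("default", "blog"))
```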

There we have it, a clean and (relatively) succinct process for horizontally scaling an application within a Django project. If you found this interesting, stay tuned! We have a few follow-on pieces in the works. We’ll discuss our move to service-oriented architecture, to a point-of-delivery production model and most relevantly our discovery of some insidious behavior in our Django connections and subsequent implementation of a connection pooler.

Colin Payne-Rogers

Colin is a Senior Software Engineer with Cedar's Platform team.