Managing Multiple Databases in a Single Rails Application

What? Why?

It may sound crazy at first, since most of the Rails applications we tend to work with connect to just one database. But, it actually isn't that unusual to have a single app communicate with more than one database. Consider the following use cases:

source

You have a legacy system connecting to a legacy database. You spin up a shiny new Rails 5 app and incorporate all the new hotness, but still need to query and incorporate data from your legacy system. Enter...the second database.

source

Your enterprise system relies on a workhorse of a database with hundreds of thousands of intergalactic prisoners I mean records. Certain queries that your end-user makes are extremely slow and negatively impact user experience. You decide to create a data warehouse––a brand new DB in which you aggregate and index data from various database tables in your original database in order to increase query times. Your brand new single-purpose data warehouse app needs to be able to connect to and grab records from the original DB in order to populate them in your shiny new warehouse db.

Now that we see it is not at all unusual to design a Rails application that can manage connections to multiple databases, let's take a look at how we can implement such a design in an efficient manner.

How?

As a first-time multiple database user, my basic understanding of how Rails + Active Record manages the interactions with a single database went something like this:

Rails gives us a database.yml file where we write some mysterious configurations.
Active Record does the things (technical term) and magically connects the app to a database.
Active Record maps each of my ActiveRecord::Base inherited models to a database table.
More magic ensues.

As you can see from the number of times I just used the word "magic", Rails and Active Record abstract away a lot of the work of managing database connections. In order to configure our app to connect to more than one database, we're going to have to dive a little deeper.

Let's get started!

Domain Model

Let's say we're helping Rick Sanchez build an app to catalogue his many inventions. Our inventory app catalogues inventions in an inventions table. Some inventions, however, deal with inter-dimensional travel and can't be stored in our regular database--they are stored in the 4th dimension. These inventions need to be stored in their very own database all the way over in the 4th dimension.

source

Our domain model is pretty simple. We have an Invention model and a InterdimensionalInvention model. The Invention model will correspond to an
inventions table in the main database for our application, and the InterdimensionalInvention model will correspond to a interdimensional_inventions table in our fourth dimension database.

Let's say that Rick's fourth dimension database also keeps track of the invention's permits, since the time cops of the fourth dimension are really big sticklers for that kind of thing. So we'll also have a Permit model and a permits table in the fourth dimension db as well. An interdimensional invention has many permits and a permit belongs to an interdimensional invention.

First things first, we'll create a new YAML file to hold our second database's configuration.

Configuring the Second Database

Our app will be configured to connect to our second database with the help of a YAML file, fourth_dimension_database.yml.

# config/fourth_dimension_database.yml
default: &default
  encoding: utf8
  adapter: mysql2
  port: 5500

development:
  <<: *default
  database: fourth_dimension_db_prod
  host:
  username:
  password:

test:
  <<: *default
  host: localhost
  username:
  password:
  database: fourth_dimension_db_test

Here, we assume that the fourth dimension DB has already been created, and that it can be found under a name of fourth_dimension_db_prod, running on port 5500.

The purpose of this configuration file is to have a nice clean place to store our database connection configuration options to be passed to the Active Record method we will call to establish the connection.

Connecting the the Second Database

Active Record provides us with an API for connecting to databases and managing those connections.

The ActiveRecord::Base.establish_connection method creates an instance of ActiveRecord::ConnectionAdapters::ConnectionPool using the configuration hash that we pass as an argument:

ActiveRecord::Base.establish_connection(config_options)

We stored our configuration options in a YAML file, so let's create an initializer to read the YAML file into a Ruby hash and store it in an environment variable.

# config/initializers/fourth_dimension_database.rb

FOURTH_DIMENSION_DB = YAML.load_file(File.join(Rails.root, "config", "fourth_dimension_database.yml"))[Rails.env.to_s]

Now we can call .establish_connection with an argument of FOURTH_DIMENSION_DB.

But where should we invoke this method?

We have to tell any and all classes that need to map to database tables in our separate fourth dimension database to establish and use a connection to that database. We could call .establish_connection in both of the models that need to use this connection:

class InterdimensionInvention < ActiveRecord::Base
  establish_connection FOURTH_DIMENSION_DB
end

class Permit < ActiveRecord::Base
  establish_connection FOURTH_DIMENSION_DB
end

This isn't terribly efficient though, and will result in more database connections than we bargained for.

To understand why, we need to take a closer look at the .establish_connection method and learn how Active Record manages database connections in the context of our Rails app.

Active Record and the Database Connection Pool

Establishing the Connection Pool

The .establish_connection method returns an instance of ActiveRecord::ConnectionAdapters::ConnectionPool

def establish_connection(spec)
  owner_to_pool[spec.name] = ConnectionAdapters::ConnectionPool.new(spec)
end

The connection pool instance manages the connections that your application opens up to the database. The default pool size is 5, although we can specify some other pool size via the optional pool: attribute in our YAML configuration file.

"Pool size" refers to the number of connections that your application will allow to be opened to the database. Active Record will only open a new connection to the database when a new thread or process attempts to communicate to the database by executing a SQL query.

Connecting to the Database

When this happens, Active Record will grab the connection pool instance and invoke another method on it, the #connection method.

def connection          
 @thread_cached_conns[connection_cache_key(Thread.current)] ||= checkout
end

The #connection method looks up up a stored map, @thread_cached_conns, in which the keys represent thread ids and the values represent database connections.

This map is created, empty, when the connection pool instance initializes, and gets populated with database connections when the connections are created.

Let's take a look at a portion of the connection pool instance's #initialize method:

def initialize(spec)
  super()
  @spec = spec
  ...
  @thread_cached_conns = Concurrent::Map.new(:initial_capacity => @size)
  @connections = []
end

So, the #connection_method introspects on the @threaded_cached_conns map. If the map already contains a key representing the current thread, #connection will return the connection, otherwise it will call another method, #checkout to open and store such a connection.

The #checkout method will in turn, eventually, invoke the #new_connection method, which does the actual work of creating our database connection:

def new_connection
  Base.send(spec.adapter_method, spec.config).tap do |conn|
    conn.schema_cache = schema_cache.dup if schema_cache
  end
end

Managing Connections via the Connection Pool

Active Record will use the connection pool instance that we created when we called establish_connection to manage the creation and use of each database connection.

When we initialized our connection pool instance via .establish_connection, it set an instance variable, @size, equal to the pool size defined in the configuration that we passed as an argument to .establish_connection, defaulting it to 5 if no such option is included in the configuration.

Here's a look at the relevant portion of the #initialize method:

def initialize(spec)
  super()
  @spec = spec
  ...
  # default max pool size to 5
  @size = (spec.config[:pool] && spec.config[:pool].to_i) || 5
end

So, the pool is aware of the max connection size, which defaults to 5, and will only allow 5 connections to be "checked out", i.e. created and/or used, at a given time. The connection pool's #checkout method implements some conditional logic, more or less saying: "if the number of open connections is less than the number of allowed connections, call #new_connection, else, throw an error."

The error that is raised if your app tries to use more connections than your pool has made available will look something like this:

ActiveRecord::ConnectionTimeoutError - could not obtain a database connection

So, if our app is trying to connect to our fourth dimension database by enacting a SQL query with any fourth_dimension_database_prod-connected tables, Active Record will look for available connections and use them if they are found. Or it will open a new connection if no open connections are available. The app determines whether or not it can open a new connection to the database by asking the connection pool how many connections are already opened and how many connections are allowed to be open.

But, if we call .establish_connection twice, once in each of the InterdimensionalInvention and Permit models, then we are creating two connection pools for our one fourth dimension databse.

Both connection pools will do their work at the same time, effectively allowing twice as many database connections to be opened between our app and our fourth dimension database than we have anticipated or intended. If our fourth dimension database isn't configured to allow for that many concurrent connections, our database will error out and we won't be able to interact with it at all! Oh no!

The fourth dimension time cops would not stand for this--look what they did to Albert Einstein because he overloaded their database:

source

We need to ensure that we only establish one connection pool instance to manage connections between our application and our fourth dimension db.

To do this, we'll implement a base class to handle the connection pool and we'll inherit the appropriate models from this base class.

Handling the Connection Pool in a Base Class

# app/models/fourth_dimension_db_base.rb

class FourthDimensionDbBase < ActiveRecord::Base
  self.abstract_class = true
  establish_connection FOURTH_DIMENSION_DB
end

# app/models/fourth_dimension_db_base.rb

class InterdimensionalInvention < FourthDimensionDbBase
  has_many :permits
end

# app/models/fourth_dimension_db_base.rb

class Permit < FourthDimensionDbBase
  belongs_to :interdimensional_invention
end

Note that we've made our parent class an abstract class. This leverages an ActiveRecord::Base class method, .abstract_class which tells Active Record that we do not want this class to map to a database table. Active Record will not look for and query a table with the name of this class and it will understand that this class does not have any attributes.

That should do it for our refactor. Our app is now all set up to efficiently manage connections to two different databases!