Wednesday, May 09, 2007

Moving models to a different database

There many reasons to use multiple databases (DBs) and when this is done, there is often a case when a model needs to be moved from one DB to another. The impetus could be that part of the data is referential and this is being reflected by moving it to a read-only DB. Another possibility is that we want to protect some data with an additional layer of security, so we extract it to a secure DB. In all cases, the challenge is to migrate the existing data. When the amount of data is considerably large, there is no choice but to do it via SQL data loaders or similar techniques. On the other hand, if it is acceptable to leverage rails db migrations and you prefer to do any data manipulation through them, there are some challenges to face.

Often, you need to have an access to old and new models during data migration. One solution is to move or copy an existing model to a separate namespace and put the new model instead at the old namespace. Let's look at a couple of examples:


Extracting referential data

I have a model Fruit in our main DB which gets its data from an external source, so we only access it read-only. We want to enforce it by moving the data to a DB which we access with a read-only account. First, I create a referential_db entry in database.yml:

dbs:
database: main

referential_db:
database: referential

Then, I copy the original model, Fruit, to a dedicated namespace, so the model becomes RetiredModels::Fruit. I add establish_connection to the original namespace model:
# create_table :fruits do |t|
# t.column :name, :string
# end
class Fruit < ActiveRecord::Base
establish_connection configurations[RAILS_ENV]['referential_db']
end

Everything is set for migration. Since it is a referential data, the migration needs to preserve data integrity so the models belonging to Fruit can still reference it by an old id:
def self.up
RetiredModels::Fruit.find(:all).each do |old_record|
Fruit.new(old_record.attributes) { |new_record| new_record.id = old_record.id }.save!
end
end

After the successful migration run, all data is replicated to a new DB. The retired model can be removed during next deployment, and the original table dropped.

There is one caveat for development and test modes. If you don't want to bother with multiple databases in those modes, you need to take care of having no table name clashing. So, the new model would have to use different table names via set_table_name.


Securing sensitive data

One of the models belonging to Fruit is SecretFruit. It contains a secret name for every fruit out there. Our legal department asked the development team to protect that data in case our DB is stolen. We decided to migrate the existing SecretFruit data to a protected DB and keep sensitive data encrypted with help from Acts As Secure. First, I create a secure_db entry in database.yml:
dbs:
database: main

secure_db:
database: secure
host: protected_host

Then, I copy the original model, SecretFruit, to a dedicated namespace, so the model becomes RetiredModels::SecretFruit. I modify the model in the original namespace to reflect the new requirements:
# create_table :secret_fruits do |t|
# t.column :name, :binary
# t.column :fruit_id, :integer
# end
class SecretFruit < ActiveRecord::Base
establish_connection configurations[RAILS_ENV]['secure_db']
acts_as_secure :crypto_provider => MasterKeyProvider
belongs_to :fruit
end

Since data encryption is done on-the-fly and there are no data integrity requirements, the migration is straightforward:
def self.up
RetiredModels::SecretFruit.find(:all).each { |old| SecretFruit.create!(old.attributes) }
end

I can now safely delete RetiredModels::SecretFruit and associated data.

No comments: