Content Migration: Data Models, Data Mappings, & Data Migration

Migrating a website can be a challenging task. Sufficient planning is the most important prerequisite to a successful site migration. It is important to have a thorough understanding of both the structure of the content in your current site and how you are structuring the content in your new site.

Data Models

It is important to understand the structure of your content on your site before you can begin any migration. It may not be feasible to create a complete model of your current site’s data, but you should at the very least create a model of existing content types with their fields and high-level type information.

What is a Data Model?

A data model is simply a definition of the structure of your content. Creating a visual representation of your content structure and relationships will help you define your data migrations. A data model should include each type of content, the individual fields (and types) on each type of content, and any content relationships.

If you have a site with a lot of related content, it may also be helpful to draw a simple Entity Relationship Diagram (ERD) to help visualize the various relationships. This will be helpful as you develop your migrations to understand migration dependencies.

Creating a Data Model

The easiest way to get started with creating a data model is to use a spreadsheet to define each type of content. This sheet should be created as you are planning your new site to define the content types and relationships you will be creating. These models often become outdated once the site has been in existence for some time and changes are made, but they should be considered the “source of truth” in the early phases of development.

A separate sheet that lists all entity types, along with key information about each, gives you an overview of the types of content your site contains. It should include things like:

  • Machine name
  • Human readable label
  • Type of content (node, media, paragraph, taxonomy term)
  • Will there be a content moderation workflow?
  • Any default path alias pattern
  • Any rules for automatically generated titles

Next, create a sheet for each type of content from your list above and document each field that appears on the content type. It should include things like:

  • Bundle (from above)
  • Field machine name
  • Field label
  • Field group, if any
  • Type (text, integer, long text, entity reference, boolean, link, etc.)
  • Required?
  • Cardinality (i.e. the number of values allowed for the field, a numeric value or “unlimited”)
  • Type of form widget (textbox, link, radio buttons/checkboxes, etc.)
  • Any default value
  • For list type fields, specify the allowed values
  • For entity reference fields, the types of entities which may be referenced
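
The two sheets described above can also be kept as structured data, which makes them easy to check and reuse later. The following is a minimal sketch in Python, not Drupal code; the class names, the UNLIMITED sentinel, and the sample "article" content type with its path pattern are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

UNLIMITED = -1  # hypothetical sentinel standing in for "unlimited" cardinality


@dataclass
class FieldDefinition:
    """One row of a per-content-type field sheet."""
    machine_name: str
    label: str
    field_type: str                  # e.g. "text", "entity_reference", "boolean"
    required: bool = False
    cardinality: int = 1             # number of allowed values, or UNLIMITED
    widget: Optional[str] = None
    default_value: Optional[str] = None
    allowed_values: list[str] = field(default_factory=list)   # list fields only
    target_bundles: list[str] = field(default_factory=list)   # entity references only


@dataclass
class ContentType:
    """One row of the overview sheet, together with its field sheet."""
    machine_name: str
    label: str
    entity_type: str                 # "node", "media", "paragraph", "taxonomy_term"
    moderated: bool = False
    path_pattern: Optional[str] = None
    fields: list[FieldDefinition] = field(default_factory=list)

    def references(self) -> set[str]:
        """Bundles this content type points at through entity reference fields."""
        return {bundle for f in self.fields for bundle in f.target_bundles}


# Illustrative example only; these names do not come from a real site.
article = ContentType(
    machine_name="article",
    label="Article",
    entity_type="node",
    moderated=True,
    path_pattern="/news/[node:title]",
    fields=[
        FieldDefinition("field_summary", "Summary", "text_long"),
        FieldDefinition("field_tags", "Tags", "entity_reference",
                        cardinality=UNLIMITED, target_bundles=["tags"]),
    ],
)
```

Calling article.references() lists the bundles that must exist before articles can be migrated, which is the same information an ERD would show.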

Other items can be added to your content model definition even though they are less essential for migrations. One example is a sheet for moderation workflows that defines the states and transitions for each workflow and who has permission to move content from one moderation state to another.

Mapping Your Data

Part of your migration plan should include documenting how you will map your existing content to your new site’s content model. This can be as simple as a direct copy if your data structure is not changing, but often requires some amount of transformation if your new site is not simply a copy of your old site.

Your migration map will also help you understand migration dependencies so that you can accurately specify them in your migration plan. For instance, if you plan to retain the identity of your node authors, you must migrate your users first.

Defining a Migration Map

Once again, a spreadsheet is your friend when defining your migration map. For simple entities like taxonomy terms, often you can migrate them all at once if they all have the same fields (which you will know because you created those content models earlier). You can usually use a single map for all bundles of simple types like taxonomy term, media entity, and files.

For more complex entity types, the following information should be included:

  • ID of the migration responsible for this field
  • Destination
    • Entity type
    • Bundle
    • Field machine name
  • Source
    • Entity type
    • Bundle
    • Field machine name
  • Notes
    • Migration to use to look up entity reference for entity reference fields
    • Note if custom processing will need to be created
    • Note if a map from current values to new values will need to be created
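
A row of the migration map, including an optional map from current values to new values, might be represented as in the sketch below. This is Python used for illustration only; FieldMapping, map_value, and the sample field names and states are hypothetical and not part of any Drupal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldMapping:
    """One row of a migration map sheet (all names here are illustrative)."""
    migration_id: str
    source_field: str
    dest_field: str
    lookup_migration: Optional[str] = None  # migration used to resolve references
    value_map: Optional[dict] = None        # current value -> new value
    notes: str = ""


def map_value(mapping, value):
    """Translate a source value, failing loudly on values the map missed."""
    if mapping.value_map is None:
        return value  # direct copy
    if value not in mapping.value_map:
        raise ValueError(f"{mapping.source_field}: unmapped value {value!r}")
    return mapping.value_map[value]


article_map = [
    FieldMapping("articles", "title", "title"),
    FieldMapping("articles", "author_id", "uid", lookup_migration="users"),
    FieldMapping("articles", "state", "moderation_state",
                 value_map={"live": "published", "hidden": "draft"}),
]
```

Raising an error on an unmapped value, rather than passing it through, is a deliberate choice in this sketch: it surfaces gaps in the value map while you are still planning.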

If you have many migrations with many dependencies, it will be helpful to have an overview sheet which lists all the migrations which will be created along with their dependencies.
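
Such an overview sheet is effectively a dependency graph, and a valid run order can be derived from it. The sketch below uses Python's standard graphlib module (Python 3.9+); the migration names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each migration maps to the set of migrations that must run before it.
# These IDs are hypothetical examples.
dependencies = {
    "users": set(),
    "tags": set(),
    "files": {"users"},
    "articles": {"users", "files", "tags"},
}


def run_order(deps):
    """Return the migrations in an order that satisfies every dependency."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as error:
        # A cycle means two migrations depend on each other and the plan
        # needs rethinking, for example by splitting one migration in two.
        raise ValueError(f"circular migration dependency: {error.args[1]}") from error


order = run_order(dependencies)
```

In the resulting order, "users" always comes before "files", and "articles" comes last, so node authors exist before the nodes that reference them.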

Migrating Your Data

Once you have completed the documentation phase of your migration project, you can begin to create your specific migration implementation. There are several considerations to take into account, including what kind of CMS (if any) your current site is using, how you will access that content, and if there is content that will need extra attention either before, during, or after the migration.

Migration Considerations

Before you begin writing any automated migration scripts, the following scenarios should be considered to decide the best approach.

Primarily static HTML

If you are not using a CMS, or you are using ColdFusion, classic ASP, or some other technology to create a template into which content is added, you will need to scrape your current site in some way if you decide to go the automated migration route.

If the content has been created with a template, this will be somewhat easier, because you should be able to identify a consistent HTML element that wraps the content portion of the page markup. If there is no template and little consistency in the page structure, it may be easier to manually migrate the content into the new site. You can still use a migration map to define which page sections belong in which fields if you are moving to more structured content.
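
As a rough illustration of scraping a templated page, the sketch below pulls the text out of a single wrapper element using only Python's standard html.parser module. It assumes the template marks the content region with an id attribute (here "content", a made-up value); real pages may need a more robust parser and will usually keep the inner markup rather than plain text.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0        # greater than 0 while inside the content container
        self.done = False     # set once the container has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_content(html, container_id="content"):
    """Return the text found inside the element with the given id."""
    parser = ContentExtractor(container_id)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<html><body><div id="nav">Home</div>'
    '<div id="content"><h1>About</h1><p>We make<br>things.</p></div>'
    '<div id="footer">Bye</div></body></html>'
)
body_text = extract_content(page)
```

The navigation and footer text are ignored because they sit outside the wrapper element, which is exactly what a consistent template makes possible.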

Other CMS/Custom Application

In the case where you are storing content in a database, you will need to determine whether you wish to migrate directly from the database with a set of custom SQL migration sources, or export your existing content into CSV files which can be used as migration sources. 

One consideration in this case is how your migration would access the current database. Often, legacy databases will not be accessible to your automated migration, and you need to either create a dump of the database that can be imported into a secondary database accessible by your migrations, or export the data into CSV. Creating a CSV export is often the easier route, as you do not need to create custom migration sources. If your source database is not MySQL or an equivalent, creating compatible database dumps is also difficult.

If you go the route of exporting content to CSV, your source data for your migration map will simply be the columns of your CSV file. You can store a multiple value field in a single cell, with the values separated by a delimiter, and use your migration to split them.
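
The idea of splitting a multiple value cell can be sketched as follows. In a Drupal migration this would normally happen in the process section (core ships an explode process plugin for this purpose); the Python below only illustrates the technique, and the column names and the "|" delimiter are assumptions.

```python
import csv
import io


def read_rows(csv_text, multi_value_columns, delimiter="|"):
    """Yield CSV rows as dicts, splitting the named columns into lists."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column in multi_value_columns:
            raw = row.get(column) or ""
            # Drop empty fragments so "a||b" and "" do not create blank values.
            row[column] = [v.strip() for v in raw.split(delimiter) if v.strip()]
        yield row


export = (
    "id,title,tags\n"
    "1,First post,news|events\n"
    "2,Second post,\n"
)
rows = list(read_rows(export, ["tags"]))
```

Choose a delimiter that cannot appear in the values themselves; a comma would also work here because the csv module handles quoting, but it makes the export harder to read.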

Drupal 7

If your current site is Drupal 7, you can simply create a database dump which you can add to your new Drupal 9 site and use the provided Drupal 7 migration sources. If you define your database connection properly in your Drupal 9 site, you do not even need to specify the source database for Drupal 7 migration sources.

When evaluating your Drupal 7 site for migration readiness, you should consider the modules you are using to provide field types and ensure they have support for the equivalent field type in Drupal 9. If you are using Field Collection, for instance, you will need to convert those to paragraph entities in Drupal 9.

If you are using Panels and Panelizer in your Drupal 7 site, you will more than likely be migrating to Layout Builder in Drupal 9. This is not a trivial exercise, and it may be more efficient to recreate the D7 panels pages in D9 as a Layout Builder enabled content type and manually move the content contained in your D7 panels. If you are primarily embedding custom blocks in D7, you may be able to migrate the block content automatically and then hook the blocks up post-migration.

Migrating Files

You should also make a plan for how to physically migrate your files into your new Drupal 9 site. These files are those uploaded by content authors when creating content and can include files embedded in content as well as files that may be attached to content in Drupal 7.

It is more efficient if you can simply copy the files from your current site to a directory outside of your Drupal codebase that is accessible to the site and use a local copy to shift them to their desired end location.

Another consideration for file migration is whether the files will be served from local public files, local private files, or will be using some other service, like S3. You also should consider if you will be changing the destination location relative to your original file destination. If you are moving from D7 to D9 and retaining the same URLs, you will be able to use existing links.

Moving files from static sites will require that you extract image and file links from the source markup and copy them as part of your content migration.
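
A minimal sketch of that extraction step, again using Python's standard library, is shown below. The base URL, the list of file extensions, and the sample markup are assumptions for illustration; copying the collected files is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FileLinkCollector(HTMLParser):
    """Collects absolute URLs of images and linked files found in markup."""

    def __init__(self, base_url, extensions=(".pdf", ".doc", ".docx", ".zip")):
        super().__init__()
        self.base_url = base_url
        self.extensions = tuple(extensions)
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            # Only keep links that point at a file, not at another page.
            path = urlparse(attrs["href"]).path.lower()
            if path.endswith(self.extensions):
                self._add(attrs["href"])

    def _add(self, link):
        url = urljoin(self.base_url, link)  # resolve relative links
        if url not in self.urls:
            self.urls.append(url)


def collect_file_links(html, base_url):
    collector = FileLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls


markup = (
    '<p><img src="img/team.png"> See the '
    '<a href="/files/report.pdf">report</a> or '
    '<a href="/contact">contact us</a>.</p>'
)
links = collect_file_links(markup, "https://example.com/about/")
```

Resolving each link against the page URL matters for static sites, where relative paths are common and the same file may be referenced in several different ways.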

Defining Automated Migrations

The Drupal 9 Migrate API is used to migrate content into Drupal. Migrations are defined as Extract – Transform – Load (ETL) processes using YAML files. Each file defines every section of the process, and all components of a migration are defined as plugins.

  • Extract: Defined in the “source” section, this is the mechanism that retrieves the source data one row at a time through the source plugin. Examples are SQL or CSV.
  • Transform: Defined in the “process” section, this specifies the mapping from source field to destination field, and contains a “process chain” which describes each transformation that must be performed. This can be as simple as a direct map from source to destination, or include complex data transformations or lookups for referenced entities.
  • Load: Defined in the “destination” section, this specifies how the data will be stored in the database, including the entity type each data row will be stored in.

Your migration plan will be invaluable when you begin creating your migrations as the source of mappings for the process section. If you have fields which will be populated as references to other migrated entities, you will need to specify that migration as a migration dependency of the migration in which the field is defined.
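
To make the three stages concrete, here is a conceptual sketch of an extract, transform, load loop in Python. It is not the Migrate API and does not use YAML; run_migration, the field names, and the use of a plain list as the destination are all assumptions chosen to mirror the source, process, and destination sections described above.

```python
def run_migration(source, process, destination):
    """Push every source row through its process chains into the destination."""
    for row in source:                      # extract: one row at a time
        record = {}
        for dest_field, (source_field, *steps) in process.items():
            value = row[source_field]
            for step in steps:              # transform: apply the chain in order
                value = step(value)
            record[dest_field] = value
        destination.append(record)          # load: a list stands in for storage
    return destination


source_rows = [
    {"headline": "  Hello world ", "tags": "news|events"},
    {"headline": "Second post", "tags": ""},
]

# Destination field -> (source field, followed by zero or more transformations).
process = {
    "title": ("headline", str.strip),
    "field_tags": ("tags", lambda value: [t for t in value.split("|") if t]),
}

migrated = run_migration(source_rows, process, [])
```

A mapping with no transformation steps, such as ("headline",), behaves as a direct copy, which corresponds to the simplest kind of row in your migration map.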

Migration Strategies

Once you have a good understanding of where you are and where you are going, you can begin to plan how you will migrate your data. You may use a combination of strategies, depending on the complexity of your content, the amount of content you need to migrate, and how you will be accessing the data to import.

“Lift and Shift”

This type of migration attempts to recreate your existing site in Drupal 9 with the same look and feel and functionality as your existing site. This can be a valid option if you are happy with the functionality and appearance of your existing site. 

If you are moving from Drupal 7 to Drupal 9, this will be the easiest route provided your Drupal 7 site is not particularly complex and all functionality can be reproduced in Drupal 9. If the site is simple enough, you may be able to take advantage of the Migrate Upgrade module to generate migrations for you, or even to complete your migration. Note that if you go this route, you must have your new Drupal 9 site installed with all required modules.

Incremental Migration

If your current site is primarily unstructured content, it may make sense to use this strategy to simply import the unstructured content into a content type that is also unstructured and then manually split it out into something more structured after the content has been migrated. This can allow you to take advantage of the benefits of moving to Drupal 9 and allow you to incrementally create new more structured content which can be deployed to replace the existing content over time.

Replatform

This is the most complex scenario of the three, as you may be restructuring your content model in addition to changing the appearance of your site. This may be the only option available to you if your site features cannot be easily replicated in Drupal 9. 

Sufficient planning is key to a successful replatform, especially if you are changing your content model. You will need to plan how to transform existing content into the new data structure; for example, content that has been split up into multiple entities may be combined into a single entity. In that case, it may require multiple migration runs to generate a single new piece of content.
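
The idea of several runs contributing to one piece of content can be sketched like this. The example is plain Python with made-up data: each pass supplies some fields for a shared key, and later passes fill in or overwrite fields on the same record.

```python
def merge_passes(*passes, key="id"):
    """Combine rows from several passes into one record per key value."""
    merged = {}
    for rows in passes:
        for row in rows:
            # The first pass creates the record; later passes update it.
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical source data: one entity per person, one per biography.
people = [{"id": 7, "name": "Ada"}, {"id": 8, "name": "Grace"}]
biographies = [{"id": 7, "bio": "Mathematician"}]

profiles = merge_passes(people, biographies)
```

The order of the passes matters whenever two sources supply the same field, so it is worth recording that order alongside the dependencies in your migration plan.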

In Conclusion…

The single most important thing you can do to help ensure a successful content migration is to sufficiently plan ahead of time. It is tempting to just jump in and start writing and running migrations, but you will have better results with less need to rework if you make a plan first.

You should be familiar with the structure of your source data as well as have a good understanding of what the content contains. When you are developing your migration plan, you should take some time to review your existing content for anomalies and cases where the content doesn’t quite match what you are expecting.
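
Part of that review can be automated. The sketch below is one possible approach in Python, with hypothetical field names and limits: it checks exported rows against the required fields and cardinality limits from your data model and reports anything that does not fit.

```python
def find_anomalies(rows, required=(), max_values=None):
    """Return (row number, field, problem) tuples for rows that break the model."""
    max_values = max_values or {}
    problems = []
    for number, row in enumerate(rows, start=1):
        for name in required:
            if not row.get(name):
                problems.append((number, name, "missing required value"))
        for name, limit in max_values.items():
            values = row.get(name) or []
            if len(values) > limit:
                problems.append(
                    (number, name, f"{len(values)} values, limit {limit}"))
    return problems


sample = [
    {"title": "First post", "tags": ["news"]},
    {"title": "", "tags": ["news", "events", "local"]},
]
report = find_anomalies(sample, required=["title"], max_values={"tags": 2})
```

Running a check like this before writing any migrations gives you a list of content to clean up at the source or to handle with custom processing.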
