Notes on Exporting Large Movable Type Databases
It is old news now that Movable Type fell off the map in the competition between blogging platforms. Versions 4 and 5 were the last openly available distributions, and can no longer be updated or patched easily. Version 6 is closed, an enterprise product. If forced to pick one root cause for the failure of Movable Type to compete with the likes of WordPress, it would be that Movable Type is written in Perl. The number of people capable of - and also interested in - working in Perl to produce quality plugins and the other tools of an open ecosystem for a web application is small in comparison to those who use PHP, and Perl is so different from PHP that there won't be much crossover at the middling skill levels that are most prevalent in the development community. It is a lesson to consider when choosing a language for a new project.
A few years on from all that, and we're now in a time at which legacy Movable Type 4.* and 5.* installations are painful to maintain, as opposed to merely inconvenient. The cost-benefit evaluation for migration to another platform, such as WordPress, looks more favorable now. The hole was dug, and it is time to climb out of it. Unfortunately extracting data from Movable Type in a useful way is not a well supported process. You'll find it challenging to identify a reliable recipe even for the well-worn path of export from Movable Type and import to WordPress, and things only become more challenging if you have a large database - a thousand posts or more. The principal problems here are:
- The built-in Movable Type export tool in the blog administrative web UI will time out for a database of any size.
- The built-in Movable Type export format still leaves you managing assets and other items manually.
- Even the better WordPress plugins for Movable Type import require careful handling and clean-up work after import. They will also cheerfully time out if used via the web interface.
I was recently working on the migration of a ~10,000 post Movable Type 4.* installation encrusted with a decade of custom additions, something that was put off many times in the past when it would have been less work than is the case now. I tinkered with a number of approaches to extracting the data, of course finding all of the pitfalls along the way by dint of walking into them.
A Perl Export Script for Movable Type
As a method of evading web application timeouts and gaining greater control over the process of backing up to a Movable Type export format file, I threw together a quick script that runs the Movable Type export directly. Perl isn't a language I spend any significant time with, so this is probably terrible in several ways, but it works for Movable Type 4.*, and will probably do just fine for 5.* versions as well. There wasn't all that much difference between those releases:
#!/usr/bin/perl
#
# A command line interface to export a Movable Type blog to the standard output
# format.
#
# Developed for Movable Type 4.*, but will probably work for 5.* with minor
# alterations.
#
# This bypasses all the issues with exporting from the web interface, such as
# timeouts for large blogs. It will only work if you have a correctly configured
# MT instance set up on this server, and can point this script to the MT_HOME
# location.
#
# Usage: mt-cli-export.pl <mt_home> <blog_id> > export.txt
#
# E.g. mt-cli-export.pl /var/www/html/cgi-bin/mt 1
#

use warnings;
use strict;

BEGIN {
  # Abort if lacking all of the arguments.
  if ($#ARGV < 1) {
    print <<EOF;
Usage: mt-cli-export.pl <mt_home> <blog_id> > export.txt
E.g. mt-cli-export.pl /var/www/html/cgi-bin/mt 1
EOF
    exit 1;
  }

  # Necessary for the code to locate the configuration file and other odds and
  # ends.
  #
  # Since I'm going to use this in use statements, it has to be in the BEGIN
  # block.
  $ENV{MT_HOME} = $ARGV[0];
}

# Wherever you happen to be keeping the Movable Type installation.
use lib "$ENV{MT_HOME}/lib";
use lib "$ENV{MT_HOME}/extlib";

require MT::Blog;
require MT::ImportExport;
require MT::ObjectDriverFactory;

my $blog_id = $ARGV[1];
my $blog = MT::Blog->load($blog_id);

MT::ImportExport->export(
  # Object representation of the blog to export.
  $blog,
  # Callback.
  sub { print @_ }
) or exit 1;

exit 0;
The downside of this is that the output is the standard Movable Type export format, which isn't ideal for import to other blog platforms: there isn't much in the way of good up-to-date tooling out there now that Movable Type is effectively dead for small-scale blogging.
Create a Template that Forms a Backup File in another Format
Most blog platforms have some sort of import/export format, though the degree to which they actually support it varies. WordPress has the WXR format based on RSS, for example, but this isn't formally specified anywhere and the related tool ecosystem is pretty varied in quality. I'll use it as an example here to illustrate that you can, in theory, generate files in any backup format you like in Movable Type by creating a suitable index template. In the case of WXR a kind soul has done this work for us and put it up on GitHub under a GPL license. Here is a copy for posterity:
<?xml version="1.0" encoding="<$mt:PublishCharset$>"?>
<$mt:Var name="WXR_VERSION" value="1.1"$>
<$mt:Var name="number_of_entries" value="50"$>
<$mt:Var name="entries_offset" value="0"$>
<$mt:Var name="export_entries" value="1"$>
<$mt:Var name="export_pages" value="1"$>
<$mt:Var name="export_assets" value="1"$>
<!-- This is a WordPress eXtended RSS file generated by Movable Type <$mt:Version encode_xml="1"$> as an export of your site. -->
<!-- It contains information about your site's posts, pages, comments, categories, and other content. -->
<!-- You may use this file to transfer that content from one site to another. -->
<!-- This file is not intended to serve as a complete backup of your site. -->
<!-- To import this information into a WordPress site follow these steps: -->
<!-- 1. Log in to that site as an administrator. -->
<!-- 2. Go to Tools: Import in the WordPress admin panel. -->
<!-- 3. Install the "WordPress" importer from the list. -->
<!-- 4. Activate & Run Importer. -->
<!-- 5. Upload this file using the form provided on that page. -->
<!-- 6. You will first be asked to map the authors in this export file to users -->
<!-- on the site. For each author, you may choose to map to an -->
<!-- existing user on the site or to create a new user. -->
<!-- 7. WordPress will then import each of the posts, pages, comments, categories, etc. -->
<!-- contained in this file into your site. -->
<!-- generator="<$mt:ProductName encode_xml="1"$>/<$mt:Version encode_xml="1"$>" created="<$mt:Date format="%Y-%m-%d %H:%M" encode_xml="1"$>" -->
<rss version="2.0"
  xmlns:excerpt="http://wordpress.org/export/<$mt:Var name="WXR_VERSION"$>/excerpt/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wp="http://wordpress.org/export/<$mt:Var name="WXR_VERSION"$>/"
>
<channel>
  <title><$mt:BlogName encode_xml="1"$></title>
  <link><$mt:BlogURL encode_xml="1"$></link>
  <description><$mt:BlogDescription encode_xml="1"$></description>
  <pubDate><$mt:Date format_name="rfc822" utc="1" encode_xml="1"$></pubDate>
  <generator><$mt:ProductName encode_xml="1"$> <$mt:Version encode_xml="1"$></generator>
  <language><$mt:BlogLanguage encode_xml="1"$></language>
  <wp:wxr_version><$mt:Var name="WXR_VERSION"$></wp:wxr_version>
  <wp:base_site_url><mt:BlogParentWebsite><$mt:WebsiteURL encode_xml="1"$></mt:BlogParentWebsite></wp:base_site_url>
  <wp:base_blog_url><$mt:BlogURL encode_xml="1"$></wp:base_blog_url>
  <mt:Authors>
  <wp:author>
    <wp:author_id><$mt:AuthorID$></wp:author_id>
    <mt:If tag="AuthorName"><wp:author_login><$mt:AuthorName encode_xml="1"$></wp:author_login></mt:If>
    <mt:If tag="AuthorEmail"><wp:author_email><$mt:AuthorEmail encode_xml="1"$></wp:author_email></mt:If>
    <mt:If tag="AuthorDisplayName"><wp:author_display_name><$mt:AuthorDisplayName encode_xml="1"$></wp:author_display_name></mt:If>
  </wp:author>
  </mt:Authors>
  <mt:Categories>
  <wp:category>
    <wp:term_id><$mt:CategoryID$></wp:term_id>
    <wp:category_nicename><$mt:CategoryBasename separator="-" encode_xml="1"$></wp:category_nicename>
    <mt:If tag="ParentCategory"><wp:category_parent><mt:ParentCategory><$mt:CategoryBasename separator="-" encode_xml="1"$></mt:ParentCategory></wp:category_parent></mt:If>
    <mt:If tag="CategoryLabel"><wp:cat_name><$mt:CategoryLabel encode_xml="1"$></wp:cat_name></mt:If>
    <mt:If tag="CategoryDescription"><wp:category_description><$mt:CategoryDescription encode_xml="1" $></wp:category_description></mt:If>
  </wp:category>
  </mt:Categories>
  <mt:Tags>
  <wp:tag>
    <wp:term_id><$mt:TagID$></wp:term_id>
    <wp:tag_slug><$mt:TagName dirify="-" encode_xml="1"$></wp:tag_slug>
    <wp:tag_name><$mt:TagName encode_xml="1"$></wp:tag_name>
  </wp:tag>
  </mt:Tags>
  <mt:If var="export_entries">
  <mt:Entries sort_by="authored_on" sort_order="ascend" limit="$number_of_entries" offset="$entries_offset">
  <item>
    <title><$mt:EntryTitle encode_xml="1"$></title>
    <link><$mt:EntryPermalink encode_xml="1"$></link>
    <pubDate><$mt:EntryDate format_name="rfc822" encode_xml="1"$></pubDate>
    <dc:creator><$mt:EntryAuthorUsername encode_xml="1"$></dc:creator>
    <guid isPermaLink="false"><$mt:EntryPermalink encode_xml="1"$></guid>
    <description></description>
    <content:encoded><$mt:EntryBody encode_xml="1"$><mt:EntryIfExtended><!--more--><$mt:EntryMore encode_xml="1"$></mt:EntryIfExtended></content:encoded>
    <mt:If tag="EntryExcerpt"><excerpt:encoded><$mt:EntryExcerpt encode_xml="1"$></excerpt:encoded></mt:If>
    <wp:post_id><$mt:EntryID encode_xml="1"$></wp:post_id>
    <wp:post_date><$mt:EntryDate format="%Y-%m-%d %H:%M:%S" encode_xml="1"$></wp:post_date>
    <wp:post_date_gmt><$mt:EntryDate format="%Y-%m-%d %H:%M:%S" utc="1" encode_xml="1"$></wp:post_date_gmt>
    <wp:comment_status><mt:EntryIfAllowComments>open<mt:Else>closed</mt:EntryIfAllowComments></wp:comment_status>
    <wp:ping_status><mt:EntryIfAllowPings>open<mt:Else>closed</mt:EntryIfAllowPings></wp:ping_status>
    <wp:post_name><$mt:EntryBasename separator="-" encode_xml="1"$></wp:post_name>
    <wp:status><mt:If tag="EntryStatus" eq="Publish">publish<mt:ElseIf tag="EntryStatus" eq="Review">pending<mt:ElseIf tag="EntryStatus" eq="Future">future<mt:Else>draft</mt:If></wp:status>
    <wp:post_parent>0</wp:post_parent>
    <wp:menu_order>0</wp:menu_order>
    <wp:post_type>post</wp:post_type>
    <wp:post_password></wp:post_password>
    <wp:is_sticky>0</wp:is_sticky>
    <mt:If tag="EntryCategory"><mt:EntryCategories>
    <category domain="category" nicename="<$mt:CategoryBasename separator="-" encode_xml="1"$>"><$mt:CategoryLabel encode_xml="1"$></category>
    </mt:EntryCategories></mt:If>
    <mt:If tag="EntryTags"><mt:EntryTags>
    <category domain="post_tag" nicename="<$mt:TagName dirify="-" encode_xml="1"$>"><$mt:TagName encode_xml="1"$></category>
    </mt:EntryTags></mt:If>
    <mt:Comments>
    <wp:comment>
      <wp:comment_id><$mt:CommentID encode_xml="1"$></wp:comment_id>
      <wp:comment_author><$mt:CommentAuthor encode_xml="1"$></wp:comment_author>
      <mt:If tag="CommentEmail"><wp:comment_author_email><$mt:CommentEmail encode_xml="1"$></wp:comment_author_email></mt:If>
      <mt:If tag="CommentURL"><wp:comment_author_url><$mt:CommentURL encode_xml="1"$></wp:comment_author_url></mt:If>
      <mt:If tag="CommentIP"><wp:comment_author_IP><$mt:CommentIP encode_xml="1"$></wp:comment_author_IP></mt:If>
      <wp:comment_date><$mt:CommentDate format="%Y-%m-%d %H:%M:%S" encode_xml="1"$></wp:comment_date>
      <wp:comment_date_gmt><$mt:CommentDate format="%Y-%m-%d %H:%M:%S" utc="1" encode_xml="1"$></wp:comment_date_gmt>
      <wp:comment_content><$mt:CommentBody encode_xml="1"$></wp:comment_content>
      <wp:comment_approved><mt:CommentIfModerated>1<mt:Else>0</mt:CommentIfModerated></wp:comment_approved>
      <wp:comment_type></wp:comment_type>
      <mt:IfCommentParent><wp:comment_parent><mt:CommentParent><$mt:CommentID encode_xml="1"$></mt:CommentParent></wp:comment_parent></mt:IfCommentParent>
      <mt:IfCommenterIsAuthor><wp:comment_user_id><$mt:CommenterID encode_xml="1"$></wp:comment_user_id></mt:IfCommenterIsAuthor>
    </wp:comment>
    </mt:Comments>
  </item>
  </mt:Entries>
  </mt:If>
  <mt:If var="export_pages"><mt:Pages sort_by="authored_on" sort_order="ascend">
  <item>
    <title><$mt:PageTitle encode_xml="1"$></title>
    <link><$mt:PagePermalink encode_xml="1"$></link>
    <pubDate><$mt:PageDate format_name="rfc822" encode_xml="1"$></pubDate>
    <dc:creator><$mt:EntryAuthorUsername encode_xml="1"$></dc:creator>
    <guid isPermaLink="false"><$mt:PagePermalink encode_xml="1"$></guid>
    <description></description>
    <content:encoded><$mt:PageBody encode_xml="1"$><$mt:PageMore encode_xml="1"$></content:encoded>
    <mt:If tag="PageExcerpt"><excerpt:encoded><$mt:PageExcerpt encode_xml="1"$></excerpt:encoded></mt:If>
    <wp:post_id><$mt:PageID encode_xml="1"$></wp:post_id>
    <wp:post_date><$mt:PageDate format="%Y-%m-%d %H:%M:%S" encode_xml="1"$></wp:post_date>
    <wp:post_date_gmt><$mt:PageDate format="%Y-%m-%d %H:%M:%S" utc="1" encode_xml="1"$></wp:post_date_gmt>
    <wp:comment_status><mt:IfCommentsAccepted>open<mt:Else>closed</mt:IfCommentsAccepted></wp:comment_status>
    <wp:ping_status><mt:IfPingsAccepted>open<mt:Else>closed</mt:IfPingsAccepted></wp:ping_status>
    <wp:post_name><$mt:PageBasename separator="-" encode_xml="1"$></wp:post_name>
    <wp:status><mt:PageIfTagged tag="@draft">draft<mt:Else>publish</mt:PageIfTagged></wp:status>
    <wp:post_parent>0</wp:post_parent>
    <wp:menu_order>0</wp:menu_order>
    <wp:post_type>page</wp:post_type>
    <wp:post_password></wp:post_password>
    <wp:is_sticky>0</wp:is_sticky>
    <mt:If tag="PageFolder"><mt:PageFolder>
    <category domain="folder" nicename="<$mt:FolderBasename separator="-"$>"><$mt:FolderLabel encode_xml="1"$></category>
    </mt:PageFolder></mt:If>
    <mt:PageIfTagged>
    <mt:PageTags><category domain="post_tag" nicename="<$mt:TagName dirify="-" encode_xml="1"$>"><$mt:TagName encode_xml="1"$></category></mt:PageTags>
    </mt:PageIfTagged>
    <mt:Comments>
    <wp:comment>
      <wp:comment_id><$mt:CommentID encode_xml="1"$></wp:comment_id>
      <wp:comment_author><$mt:CommentAuthor encode_xml="1"$></wp:comment_author>
      <mt:If tag="CommentEmail"><wp:comment_author_email><$mt:CommentEmail encode_xml="1"$></wp:comment_author_email></mt:If>
      <mt:If tag="CommentURL"><wp:comment_author_url><$mt:CommentURL encode_xml="1"$></wp:comment_author_url></mt:If>
      <mt:If tag="CommentIP"><wp:comment_author_IP><$mt:CommentIP encode_xml="1"$></wp:comment_author_IP></mt:If>
      <wp:comment_date><$mt:CommentDate format="%Y-%m-%d %H:%M:%S" encode_xml="1"$></wp:comment_date>
      <wp:comment_date_gmt><$mt:CommentDate format="%Y-%m-%d %H:%M:%S" utc="1" encode_xml="1"$></wp:comment_date_gmt>
      <wp:comment_content><$mt:CommentBody encode_xml="1"$></wp:comment_content>
      <wp:comment_approved><mt:CommentIfModerated>1<mt:Else>0</mt:CommentIfModerated></wp:comment_approved>
      <wp:comment_type></wp:comment_type>
      <mt:IfCommentParent><wp:comment_parent><mt:CommentParent><$mt:CommentID encode_xml="1"$></mt:CommentParent></wp:comment_parent></mt:IfCommentParent>
      <mt:IfCommenterIsAuthor><wp:comment_user_id><$mt:CommenterID encode_xml="1"$></wp:comment_user_id></mt:IfCommenterIsAuthor>
    </wp:comment>
    </mt:Comments>
  </item>
  </mt:Pages></mt:If>
  <mt:If var="export_assets"><mt:Assets sort_by="created_on" sort_order="ascend">
  <item>
    <title><$mt:AssetLabel encode_xml="1"$></title>
    <link><$mt:AssetURL encode_xml="1"$></link>
    <pubDate><$mt:AssetDateAdded format_name="rfc822" encode_xml="1"$></pubDate>
    <dc:creator><$mt:AssetAddedBy encode_xml="1"$></dc:creator>
    <guid isPermaLink="false"><$mt:AssetURL encode_xml="1"$></guid>
    <description></description>
    <mt:If tag="AssetDescription"><content:encoded><$mt:AssetDescription encode_xml="1"$></content:encoded></mt:If>
    <wp:post_id><$mt:AssetID encode_xml="1"$></wp:post_id>
    <wp:post_date><$mt:AssetDateAdded format="%Y-%m-%d %H:%M:%S" encode_xml="1"$></wp:post_date>
    <wp:post_date_gmt><$mt:AssetDateAdded format="%Y-%m-%d %H:%M:%S" utc="1" encode_xml="1"$></wp:post_date_gmt>
    <wp:post_name><$mt:AssetFilename encode_xml="1"$></wp:post_name>
    <wp:status>inherit</wp:status>
    <wp:post_parent>0</wp:post_parent>
    <wp:menu_order>0</wp:menu_order>
    <wp:post_type>attachment</wp:post_type>
    <wp:post_password></wp:post_password>
    <wp:is_sticky>0</wp:is_sticky>
    <wp:attachment_url><$mt:AssetURL encode_xml="1"$></wp:attachment_url>
    <mt:AssetIfTagged><mt:AssetTags>
    <category domain="post_tag" nicename="<$mt:TagName dirify="-" encode_xml="1"$>"><$mt:TagName encode_xml="1"$></category></mt:AssetTags></mt:AssetIfTagged>
  </item>
  </mt:Assets></mt:If>
</channel>
</rss>
Note that you can create several of these templates, setting the entries_offset and number_of_entries variables at the top of each to control which slice of the entries it exports, and so the size of each file. This is a crude but effective way to work around timeout constraints on publishing these templates. An alternative approach for very large export files is as follows:
- Create the template, and set publishing to Static mode, but don't publish it.
- Disable the Movable Type tools/run-periodic-tasks cron job if you have it set up.
- Set the template to be published via the Publish Queue mode.
- Edit one post to set up queued jobs to run.
- Manually run the periodic tasks command and wait for it to complete:
# Or whatever your Movable Type directory is.
cd /var/www/html/cgi-bin/mt
perl ./tools/run-periodic-tasks -verbose
This might take a long time for a very large database. After it is done, set the template back to Static mode and re-enable the cron job. Then download the file directly from where it was created on the server.
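Before handing a downloaded file to an importer, it is worth confirming that it is well-formed XML and contains roughly what you expect. Here is a minimal Python sketch of such a check, assuming the file uses the WXR 1.1 namespace set by the WXR_VERSION variable in the template above. The function name summarize_wxr is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_wxr(source, wxr_version="1.1"):
    """Count items by wp:post_type, plus comments, in a WXR file.

    `source` is a file path or a binary file object. Hypothetical helper:
    the namespace URL is built from the version the template declares.
    """
    ns = {"wp": "http://wordpress.org/export/%s/" % wxr_version}
    # ET.parse raises xml.etree.ElementTree.ParseError if the file is not
    # well-formed, which is the first thing worth knowing before an import.
    root = ET.parse(source).getroot()
    counts = Counter()
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="unknown",
                                  namespaces=ns)
        counts[post_type] += 1
        counts["comments"] += len(item.findall("wp:comment", ns))
    return dict(counts)
```

Calling summarize_wxr on the path of the downloaded file returns a dictionary of counts keyed by post type (post, page, attachment) along with a comments total, which can be compared against the numbers shown in the Movable Type administrative UI. A parse error at this stage usually points at a specific line and column, which is much easier to chase down than a failed import.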
Or Write Your Own Export/Import Process at the Database Level
If you have the time there is a lot to be said for writing an import/export script in your language of choice that reads from the Movable Type database and writes to the destination blog platform database. This should not be exceptionally hard, but in the case of WordPress, for example, it is a cost-benefit question as to whether it is worth the time versus just manually patching up the issues following WXR import using the standard WordPress Importer plugin. It is easy to quickly try the WXR export / import process for a Movable Type to WordPress migration, possibly for just a fraction of your content, so you can at least get a sense of whether or not you should forge ahead.
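A trial run on a fraction of the content, or a full export split across several template copies, comes down to choosing entries_offset and number_of_entries values for the WXR template. As a small Python sketch, with the function names export_windows and template_vars invented for illustration, the pairs can be computed rather than worked out by hand:

```python
def export_windows(total_entries, per_file):
    """Return (entries_offset, number_of_entries) pairs covering all entries.

    Hypothetical helper: each pair is meant for one copy of the template.
    """
    if per_file <= 0:
        raise ValueError("per_file must be positive")
    return [
        (offset, min(per_file, total_entries - offset))
        for offset in range(0, total_entries, per_file)
    ]


def template_vars(offset, count):
    """Render the two variable lines to paste at the top of a template copy."""
    return (
        '<$mt:Var name="number_of_entries" value="%d"$>\n'
        '<$mt:Var name="entries_offset" value="%d"$>' % (count, offset)
    )
```

For a 10,000 post blog split into files of 2,500 entries, export_windows(10000, 2500) gives four pairs starting at offsets 0, 2500, 5000, and 7500. Since the template's pages and assets sections are not limited by these variables, it probably makes sense to set export_pages and export_assets to 0 in all but one of the copies so that those items are not exported repeatedly.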
For the 10,000+ post Movable Type installation I worked with it turned out that importing via WXR created a few annoying issues but none of them were show-stoppers or even particularly hard to manually resolve. Thus I'm afraid to say that I have no database-to-database script to illustrate here, for all that it would be a much more effective approach overall.
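For anyone who does want to go down the database route, the heart of it is a mapping from one row shape to another. The following Python sketch shows only that mapping step, with no database connection. The Movable Type column names (entry_title, entry_text, entry_text_more, entry_excerpt, entry_basename, entry_authored_on, entry_status, entry_class), the numeric status codes, and the WordPress field names are all assumptions recalled from the Movable Type 4 and WordPress schemas, so verify them against your own databases before relying on any of this. The function name mt_entry_to_wp_post is invented for illustration.

```python
# Assumed Movable Type status codes: 1 = hold, 2 = release, 3 = review,
# 4 = future. The WordPress values mirror the WXR template's status mapping.
MT_STATUS_TO_WP = {1: "draft", 2: "publish", 3: "pending", 4: "future"}


def mt_entry_to_wp_post(row):
    """Map one mt_entry row (as a dict) to a dict of wp_posts fields.

    Hypothetical helper: all column names here are assumptions to verify.
    """
    body = row.get("entry_text") or ""
    more = row.get("entry_text_more") or ""
    # Join the extended body with the WordPress "more" marker, as the WXR
    # template does.
    content = body + ("\n<!--more-->\n" + more if more.strip() else "")
    return {
        "post_title": row.get("entry_title") or "",
        "post_content": content,
        "post_excerpt": row.get("entry_excerpt") or "",
        # Movable Type basenames tend to use underscores; WordPress slugs
        # conventionally use hyphens.
        "post_name": (row.get("entry_basename") or "").replace("_", "-"),
        "post_date": row["entry_authored_on"],
        "post_status": MT_STATUS_TO_WP.get(row.get("entry_status"), "draft"),
        "post_type": "page" if row.get("entry_class") == "page" else "post",
    }
```

A real script would wrap this in a loop over rows selected from the Movable Type database, and would also need to carry across authors, categories, tags, comments, and assets, which is where most of the effort lies.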