Migrating email from Gmail to Maildir (Dovecot)

In this post I explain how I moved my email from Gmail to my own mail server running Dovecot with the emails in Maildir format.

Introduction

This is not the only way to do this. This might not even be the best way, but it worked for me and I had full control over each step, as well as certainty that the mails were imported in a way that made them (almost) impossible to differentiate from emails already on the server.

Given how specific this topic is, I am not going to give an introduction to the concepts referenced in this post.

The Maildir Format

I highly recommend reading up on the Maildir format to understand the directories and what each part of the filename does. Dovecot have a good explanation on their website (archive link).

In case you do not, I briefly explain the bits that are relevant to the migration as they come up.

Note About IMAP Method

Dovecot's website documents another way to do this, using doveadm and IMAP. I did not try it.

Make a Backup

Before doing anything, make a backup of the Maildir on your server. I take no responsibility for anything going wrong. I found most of this process quite scary, but knowing that restoring a backup is as easy as extracting a tarball brings peace of mind.
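As a minimal sketch of such a backup, something along these lines should do. The function name backup_maildir and the example paths are made up for illustration, and it is probably wise to make sure no mail is being delivered while it runs.

```shell
# backup_maildir SRC DEST_DIR
# Archives the directory SRC into DEST_DIR/maildir-backup-<timestamp>.tar.gz
# and prints the path of the archive it created.
backup_maildir() {
    src=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/maildir-backup-$stamp.tar.gz"
    # -C makes the paths inside the archive relative to the Maildir's parent.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Listing the archive is a cheap check that it can be read back.
    tar -tzf "$archive" > /dev/null || return 1
    printf '%s\n' "$archive"
}

# Example (hypothetical paths):
# backup_maildir /home/me/Maildir /root/backups
```

Restoring is then a matter of extracting the archive with tar -xzf in the parent directory of the Maildir.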

Exporting From Gmail

The first step of this journey is to get our emails out of Gmail. Google makes this easy with their Takeout feature. It lets you download your emails in the mbox format, one file per Gmail category.

Head over there and export your emails. I will not explain the exact steps here, as the process is fairly straightforward and might change with time. If the above link is broken, Google (or duck) "Google Takeout".

For the remainder of the steps, I recommend doing one mbox file at a time.
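Before converting a file, it can help to know roughly how many messages it holds, so the result can be checked later. A small sketch, assuming the export uses the usual mbox layout where every message starts with a line beginning with "From " (the function name is my own invention):

```shell
# count_mbox_messages FILE
# Prints the number of lines starting with "From ", the mbox message separator.
# Body lines that begin with "From " are normally escaped as ">From ", so they
# are not counted; treat the result as a close estimate rather than a guarantee.
count_mbox_messages() {
    grep -c '^From ' "$1"
}

# Example (hypothetical file name):
# count_mbox_messages Archived.mbox
```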

Converting to Maildir

There is a script called mb2md.pl currently available on Dovecot's GitHub tools section.

The script is unmaintained and can be found in various places online. It is also packaged by some Linux distributions. At the time of writing, the easiest way to get it is from the links above.

All this script does is convert an mbox file to Maildir. I had to look at the source to figure out how to use it. The top of the file has a pretty useful section of comments explaining everything.

The main thing to be aware of is that the directories you pass in are relative to your user's home directory. All I needed was to give it a source and destination directory, and I threw in -R to make it recursively search through the source folder.

I created a folder called input where I put my mbox file, and an empty folder called output for the script to place the Maildir files. Then I ran the script.

$ ./mb2md.pl -s downloads/gmail/Takeout/input -R -d \
downloads/gmail/Takeout/output
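A quick sanity check after the conversion is to count the message files that ended up in the output folder and compare that with the number of "From " separator lines in the mbox file (grep -c '^From '). This is only a sketch; count_maildir_messages is a name I made up.

```shell
# count_maildir_messages DIR
# Counts regular files inside any cur/ or new/ directory below DIR,
# including those of nested mailboxes. Files in tmp/ are ignored.
count_maildir_messages() {
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}

# Example, relative to the home directory as in the command above:
# count_maildir_messages ~/downloads/gmail/Takeout/output
```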

Preparing Maildir Files

Now that we have our emails as Maildir files, a few things are still missing. The conversion script does not give us the size attributes that Dovecot normally puts in its Maildir filenames: the S and W attributes. Normally, Maildir filenames contain the server name as well, whereas the script gives us a placeholder, mbox.

We are also missing what I call the category: the number towards the end of the filename, which is (usually) 2 in my case. In the Maildir filename format, this :2, marks the start of the flags section. Given the emails are old, you will probably want them marked as read, which is done with an S (seen) flag at the end of the filename.

Finally, all the emails will have the UNIX timestamp of the time you ran the conversion script. This results in Dovecot serving up the emails as if they were received today (unless your client is smart enough to work out the real date some other way).
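To make the parts of a filename concrete, here is a small sketch that splits one into the pieces discussed above. The function name and the sample filename are invented for illustration, and the layout (timestamp.id.hostname, then ,S= and ,W=, then :2, and the flags) is how I understand Dovecot names its files.

```shell
# describe_maildir_name NAME
# Prints the timestamp, host, size, wsize and flags found in a Maildir filename.
describe_maildir_name() {
    name=$1
    base=${name%%:*}                 # everything before ":2,"
    case $name in
        *:2,*) flags=${name##*:2,} ;;
        *)     flags= ;;             # no flags section at all
    esac
    unique=${base%%,*}               # "timestamp.id.hostname"
    timestamp=${unique%%.*}
    rest=${unique#*.}                # drop the timestamp
    host=${rest#*.}                  # drop the unique id
    size=$(printf '%s\n' "$base" | sed -n 's/.*,S=\([0-9]*\).*/\1/p')
    wsize=$(printf '%s\n' "$base" | sed -n 's/.*,W=\([0-9]*\).*/\1/p')
    printf 'timestamp=%s host=%s size=%s wsize=%s flags=%s\n' \
        "$timestamp" "$host" "$size" "$wsize" "$flags"
}

describe_maildir_name '1700000000.M123456P7890.mail.example.org,S=4321,W=4400:2,S'
# prints: timestamp=1700000000 host=mail.example.org size=4321 wsize=4400 flags=S
```

Running it on a file straight out of the conversion script should show mbox as the host and empty size fields, which is exactly what the next steps fix.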

Size and Server Name Attributes

There is a script on GitHub to add the S (size) attribute, but it does not add W (wsize), so it is not ideal. I found the easiest way to fix this was just to import the files to my mail server. It has the added bonus of setting the correct server name as well.

I realised while writing this that you could potentially skip the previous step and just import the mbox file here (replace maildir: with mbox: in the command). I have not tested this though.

The key here is to tell the importer to put the imported emails in a separate mailbox in your Maildir, so you can easily move them out. This should not affect the rest of your emails.

I placed my emails on the root of the filesystem and changed the ownership to the user used by Dovecot. Then I ran the import.

# doveadm -D import -u <EMAILUSER> maildir:/gmail/NewArchived/ \
NewArchived ALL

This imported all of my emails into a new mailbox in my Maildir called NewArchived. Once it finished I moved the folder (.NewArchived for me) out of my Maildir to my home directory.

I then made a tarball of the emails, copied it to my laptop and extracted it, ready for the next step.

Category and Read Attributes

For this step we need to add :2,S to all the email files in the directory called new. The emails in the cur folder will already have this.

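I had a lot of emails and I have an old laptop, so I just kicked off a find | xargs from inside the new folder and went to bed. You could also achieve this with rename, but there are two incompatible flavours of it (the Perl one and the util-linux one), so the exact command depends on your distribution.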

$ find . -type f -print0 | xargs -0 -I '{}' mv '{}' '{}:2,S'

Once this finishes, move all the files from new into cur. All your mails should now be in the cur folder, and at this point you can get rid of everything else.
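The rename and the move can also be done in one pass. The following is a sketch rather than what I actually ran; mark_read_and_move is a made-up name, and it assumes a Maildir with new and cur folders side by side.

```shell
# mark_read_and_move MAILDIR
# Moves every file from MAILDIR/new to MAILDIR/cur, appending ":2,S" to names
# that have no flags section yet. Names that already contain ":2," are moved
# unchanged so that a second run does not append the suffix twice.
mark_read_and_move() {
    maildir=$1
    for f in "$maildir"/new/*; do
        [ -f "$f" ] || continue      # also covers an empty new/ folder
        name=$(basename "$f")
        case $name in
            *:2,*) target=$name ;;
            *)     target="$name:2,S" ;;
        esac
        # -n refuses to overwrite a file that already exists in cur/.
        mv -n "$f" "$maildir/cur/$target"
    done
}

# Example (hypothetical path):
# mark_read_and_move ~/NewArchived
```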

Timestamp Attribute

To fix the timestamps on our emails, there is another script we need to run. It is called maildir-timestamp-fix.sh and is available as a GitHub Gist.

I downloaded it and ran it on the cur folder. This might take a while if you have a lot of emails.

$ ./maildir-timestamp-fix.sh cur/
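To give an idea of what such a fix involves, here is a sketch of the general technique. It is not the Gist's code. It assumes GNU touch (which can parse the date format used in email headers) and that Dovecot takes the received date of a Maildir message from the file's modification time, which is my understanding. The function name is invented.

```shell
# set_mtime_from_date_header FILE
# Sets FILE's modification time to the value of its first Date: header.
# Returns non-zero if no Date: header is found or the date cannot be parsed.
set_mtime_from_date_header() {
    file=$1
    # Look only at the header block (everything up to the first blank line);
    # tr strips carriage returns in case the file uses CRLF line endings.
    header_date=$(tr -d '\r' < "$file" \
        | sed -n '/^$/q; s/^[Dd]ate:[[:space:]]*//p' \
        | head -n 1)
    [ -n "$header_date" ] || return 1
    touch -d "$header_date" "$file"
}

# Example: apply it to every message in cur/ and report the failures.
# for f in cur/*; do
#     set_mtime_from_date_header "$f" || echo "no usable date: $f" >&2
# done
```

Folded (multi-line) Date headers are not handled here, so a real script would need a bit more care.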

Importing files to Dovecot

I wanted to import all the emails from Gmail into an existing mailbox on my server, rather than a new one, so that is what I will describe here.

I made a tarball from the cur folder from the previous step and used scp to get it on to my mail server. I then extracted it on my server and changed the ownership of all the files to the user Dovecot runs as.

Then I moved all the files from the cur folder into the cur folder of the mailbox I wanted the files to end up in, .Archive in my case.
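For reference, the pack, unpack and move steps could look roughly like this. It is a sketch with made-up function names; the point worth noting is that tar records and restores each file's modification time by default, and mv keeps it too, so the timestamps fixed in the previous step survive the trip.

```shell
# pack_cur MAILDIR ARCHIVE
# Archives MAILDIR/cur; tar stores each file's modification time.
pack_cur() {
    tar -czf "$2" -C "$1" cur
}

# unpack_cur ARCHIVE DEST_DIR
# Extracts into DEST_DIR/cur; modification times are restored unless -m is used.
unpack_cur() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# merge_cur SRC_CUR DEST_CUR
# Moves every file across, refusing to overwrite one that already exists.
merge_cur() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        dest="$2/$(basename "$f")"
        if [ -e "$dest" ]; then
            printf 'skipping, already exists: %s\n' "$dest" >&2
            continue
        fi
        mv "$f" "$dest"
    done
}

# Example (hypothetical paths; changing the ownership still has to be done separately):
# pack_cur ~/NewArchived ~/gmail-cur.tar.gz          # on the laptop
# unpack_cur gmail-cur.tar.gz /tmp/incoming          # on the server
# merge_cur /tmp/incoming/cur /home/me/Maildir/.Archive/cur
```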

Recreating Dovecot index

However, I then realised that emails were not showing up properly in my client. Some new mails were missing despite being on the server, and there was some general weirdness going on.

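To fix this, all I did was delete the Dovecot files (the ones whose names start with dovecot) in the .Archive folder. Dovecot rebuilds them the next time the mailbox is opened.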

# rm dovecot*
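If deleting feels too final, a slightly more cautious variant is to move those files aside instead, so they can be put back if something looks wrong. A sketch, with an invented function name and hypothetical paths:

```shell
# stash_dovecot_files MAILBOX_DIR STASH_DIR
# Moves every entry whose name starts with "dovecot" (index, cache, uidlist
# and so on) out of MAILBOX_DIR into STASH_DIR. The cur, new and tmp folders
# are left untouched.
stash_dovecot_files() {
    mailbox=$1
    stash=$2
    mkdir -p "$stash" || return 1
    for f in "$mailbox"/dovecot*; do
        [ -e "$f" ] || continue      # nothing matched
        mv "$f" "$stash"/
    done
}

# Example (hypothetical paths):
# stash_dovecot_files /home/me/Maildir/.Archive /root/archive-dovecot-files
```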

Conclusion

If you followed the steps, all your emails should now load into your client in the correct order, both old and new. If you did not follow it step by step, then I hope at least the information and methods used were helpful in some way.