Migrating email from Gmail to Maildir (Dovecot)
In this post I explain how I moved my email from Gmail to my own mail server running Dovecot with the emails in Maildir format.
Introduction
This is not the only way to do this. This might not even be the best way, but it worked for me and I had full control over each step, as well as certainty that the mails were imported in a way that made them (almost) impossible to differentiate from emails already on the server.
Given how specific this topic is, I am not going to give an introduction to the concepts referenced in this post.
The Maildir Format
I highly recommend reading up on the Maildir format to understand the directories and what each part of the filename does. Dovecot have a good explanation on their website (archive link).
In case you do not, I do briefly explain the bits that are relevant to the migration.
Note About IMAP Method
There is another way, documented on Dovecot's website, to do this with doveadm and IMAP. I did not try this.
Make a Backup
Before doing anything, make a backup of the Maildir on your server. I take no responsibility for anything going wrong. I found most of this process quite scary, but knowing that restoring a backup is as easy as extracting a tarball brings peace of mind.
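As a rough sketch of what such a backup could look like, the function below archives a Maildir into a dated tarball and then lists the archive to confirm it can be read back. The name backup_maildir and both paths are placeholders of mine, not anything Dovecot provides.

```shell
# Sketch: archive a Maildir into a dated tarball and check it is readable.
# backup_maildir is a hypothetical helper name; adjust paths to your setup.
backup_maildir() {
    src=$1       # Maildir to back up, e.g. /home/user/Maildir
    dest_dir=$2  # existing directory that will receive the tarball
    archive="$dest_dir/maildir-backup-$(date +%Y%m%d-%H%M%S).tar.gz"

    # -C makes the paths inside the archive relative to the parent directory.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Listing the archive is a cheap check that it can be read back.
    tar -tzf "$archive" > /dev/null || return 1

    # Print the archive path so the caller knows where it went.
    printf '%s\n' "$archive"
}
```

Calling it as backup_maildir /home/user/Maildir /root/backups would print the path of the new tarball. Stopping Dovecot first, or at least making sure no mail is being delivered while tar runs, is probably wise.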
Exporting From Gmail
The first step of this journey is to get our emails out of Gmail. Google makes this easy with their Takeout feature. It lets you download your emails in the mbox format, one file per Gmail category.
Head over there and export your emails. I will not explain the exact steps here, as the process is fairly straightforward and might change with time. If the above link is broken, Google (or duck) "Google Takeout".
For the remainder of the steps, I recommend doing one mbox file at a time.
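Before converting an mbox file, it can help to know roughly how many messages it holds, so there is a number to compare against later. In the mbox format each message starts with a line beginning with "From " (with a trailing space), so counting those lines gives an estimate. The function name below is a placeholder of mine, and the count is only approximate if a message body contains an unescaped "From " line.

```shell
# Sketch: estimate the number of messages in an mbox file.
# count_mbox_messages is a hypothetical helper name.
count_mbox_messages() {
    # Each message starts with a separator line beginning "From " (note the space).
    # Header lines such as "From: someone" do not match because of the colon.
    grep -c '^From ' "$1"
}
```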
Converting to Maildir
There is a script called mb2md.pl currently available on Dovecot's GitHub tools section.
This is an unmaintained script and can be found in various places online. It is also available in some Linux distribution packages. As of writing this the easiest way is to grab it off the links above.
All this script does is convert an mbox file to Maildir. I had to look at the source to figure out how to use it. The top of the file has a pretty useful section of comments explaining everything.
The main thing to be aware of is that the directories you pass in are relative to your user's home directory. All I needed was to give it a source and destination directory, and I threw in -R to make it recursively search through the source folder.
I created a folder called input where I put my mbox file, and an empty folder called output for the script to place the Maildir files. Then I ran the script.
$ ./mb2md.pl -s downloads/gmail/Takeout/input -R -d \
downloads/gmail/Takeout/output
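To check that the conversion produced about as many messages as expected, the files in the resulting Maildir can be counted. This is a small sketch with a function name of my own choosing; it assumes the converted messages land in the cur and new subdirectories of the destination.

```shell
# Sketch: count the message files in a Maildir.
# count_maildir_messages is a hypothetical helper name.
count_maildir_messages() {
    # Messages live as one file each under cur/ and new/.
    # Errors for a missing subdirectory are silenced; tr strips wc's padding.
    find "$1/cur" "$1/new" -type f 2>/dev/null | wc -l | tr -d ' '
}
```

Running count_maildir_messages on the output directory should print a number close to the message count of the source mbox file.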
Preparing Maildir Files
Now that we have our emails as Maildir files, a few things are missing. The conversion script does not give us the size attributes normally present in Maildir filenames: the S and W attributes. Normally, Maildir filenames include the server name as well, whereas the script gives us a placeholder, mbox.
We are also missing the category, which is the number towards the end of the filename; this is (usually) 2 in my case. (Strictly, the 2 after the colon marks the standard flag format in the Maildir specification.) Given the emails are old, you will probably want them marked as read, which is done with an S at the end of the filename.
Finally, all the emails will have the UNIX timestamp of the time you ran the conversion script. This results in Dovecot serving you up the emails as if they were received today (unless your client is smart enough maybe).
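To make the filename parts above more concrete, here is a small sketch that splits a Maildir filename into its leading timestamp, the S and W attributes, and the flags after ":2,". The function name and the example filenames are made up for illustration; real names on your server will differ.

```shell
# Sketch: print the parts of a Maildir filename discussed above.
# describe_maildir_name is a hypothetical helper name.
describe_maildir_name() {
    name=$1
    base=${name%%:*}        # everything before the first ":"
    case $name in
        *:2,*) flags=${name##*:2,} ;;  # flags follow the ":2," marker
        *)     flags= ;;               # no marker, so no flags yet
    esac
    timestamp=${base%%.*}   # leading UNIX timestamp, up to the first "."
    # S= and W= are the size attributes; empty if the name lacks them.
    size=$(printf '%s\n' "$base" | sed -n 's/.*,S=\([0-9]*\).*/\1/p')
    wsize=$(printf '%s\n' "$base" | sed -n 's/.*,W=\([0-9]*\).*/\1/p')
    printf 'timestamp=%s size=%s wsize=%s flags=%s\n' \
        "$timestamp" "$size" "$wsize" "$flags"
}
```

For a name such as 1136214245.M20046P2137.mailhost,S=4500,W=4611:2,S this prints timestamp=1136214245 size=4500 wsize=4611 flags=S, while a freshly converted file without those parts would show empty size, wsize and flags fields.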
Size and Server Name Attributes
There is a script on GitHub to add the S (size) attribute, but it does not add W (wsize), so it is not ideal. I found the easiest way to fix this was just to import the files to my mail server. It has the added bonus of setting the correct server name as well.
I realised while writing this that you could potentially skip the previous step and just import the mbox file here (replace maildir: with mbox: in the command). I have not tested this though.
The key here is to tell the importer to put the imported emails in a separate mailbox in your Maildir, so you can easily move them out. This should not affect the rest of your emails.
I placed my emails on the root of the filesystem and changed the ownership to the user used by Dovecot. Then I ran the import.
# doveadm -D import -u <EMAILUSER> maildir:/gmail/NewArchived/ \
NewArchived ALL
This imported all of my emails into a new mailbox in my Maildir called NewArchived. Once it finished I moved the folder (.NewArchived for me) out of my Maildir to my home directory.
I then made a tarball of the emails and imported them to my laptop, then extracted them, ready for the next step.
Category and Read Attributes
For this step we need to add :2,S to all the email files in the directory called new. The emails in the cur folder will already have this.
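I had a lot of emails and I have an old laptop, so I just kicked off a find | xargs and went to bed. Run it from inside the new directory, since find . picks up every file below the current directory. You could also achieve this with rename, but there are two flavours of it (the Perl one and the util-linux one) and their syntax differs.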
$ find . -type f | xargs -I '{}' mv '{}' '{}':2,S
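If the command gets interrupted, running it again would append the suffix a second time to files that were already renamed. As an alternative sketch (not what I ran), the loop below only touches regular files directly inside the given directory and skips any name that already contains ":2,". The function name mark_as_read is my own.

```shell
# Sketch: append ":2,S" to every file in a Maildir "new" directory,
# skipping files that already carry a ":2," suffix.
# mark_as_read is a hypothetical helper name.
mark_as_read() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skips subdirectories and an empty glob
        case ${f##*/} in
            *:2,*) continue ;;       # already flagged, leave it alone
        esac
        mv -- "$f" "$f:2,S"
    done
}
```

Calling mark_as_read new from the mailbox directory should be safe to repeat, since already renamed files are left untouched.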
Once this finishes, move all the files from new into cur. All your mails should be in this folder now, so you can get rid of everything else except this folder at this point.
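Before deleting anything, a quick check that every file in cur really carries the suffix might be worthwhile. The sketch below prints any file whose name lacks ":2,"; no output means nothing was missed. The function name is a placeholder of mine.

```shell
# Sketch: list files whose names are missing the ":2," marker.
# find_unflagged is a hypothetical helper name.
find_unflagged() {
    # Prints one path per offending file; prints nothing when all are flagged.
    find "$1" -type f ! -name '*:2,*'
}
```

Running find_unflagged cur and getting an empty result suggests the rename step covered every message.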
Timestamp Attribute
To fix the timestamps on our emails, there is another script we need to run. It is called maildir-timestamp-fix.sh available as a GitHub Gist.
I downloaded it and ran it on the cur folder. This might take a while if you have a lot of emails.
$ ./maildir-timestamp-fix.sh cur/
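I will not reproduce the script here, but the general idea of this kind of fix can be sketched as follows: read the Date header of a message and set the file's modification time to it. This is my own illustration, not the Gist's code. It assumes GNU touch (for -d with a free-form date), only changes the modification time, and does not rename the file. The function name is hypothetical.

```shell
# Sketch: set a message file's mtime from its Date header.
# set_mtime_from_date_header is a hypothetical helper name.
# Assumes GNU touch, whose -d option parses email-style dates.
set_mtime_from_date_header() {
    file=$1
    # Take the first Date header; stop at the blank line ending the headers.
    # The first rule strips a trailing CR in case the file uses CRLF endings.
    date_value=$(awk '{ sub(/\r$/, "") }
        /^$/ { exit }
        tolower($0) ~ /^date:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$file")
    [ -n "$date_value" ] || return 1   # no Date header found
    touch -d "$date_value" -- "$file"
}
```

Something like for f in cur/*; do set_mtime_from_date_header "$f"; done would apply it to a whole folder. Messages with a missing or unparseable Date header are left unchanged and make the function return a non-zero status.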
Importing files to Dovecot
I wanted to import all the emails from Gmail into an existing mailbox on my server, rather than a new one, so that is what I will describe here.
I made a tarball from the cur folder from the previous step and used scp to get it on to my mail server. I then extracted it on my server and changed the ownership of all the files to the user Dovecot runs as.
Then I moved all the files from the cur folder into the cur folder of the mailbox I wanted the files to end up in, .Archive in my case.
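Since the destination mailbox already holds mail, a cautious version of this move refuses to overwrite anything that is already there. The sketch below is one way to do that; merge_into_cur is a name I made up, and a filename collision is unlikely but cheap to guard against.

```shell
# Sketch: move message files into an existing cur directory without
# overwriting anything already present.
# merge_into_cur is a hypothetical helper name.
merge_into_cur() {
    src=$1   # cur directory holding the imported mail
    dst=$2   # cur directory of the target mailbox
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if [ -e "$dst/$name" ]; then
            # Leave both copies in place and report the clash on stderr.
            printf 'skipping existing file: %s\n' "$name" >&2
            continue
        fi
        mv -- "$f" "$dst/$name"
    done
}
```

Anything reported as skipped stays in the source folder and can be inspected by hand afterwards.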
Recreating Dovecot index
After this, however, I realised that emails were not showing up properly on my client. Some new mails were missing despite being on the server, and there was some general weirdness going on.
To fix this, all I did was delete the Dovecot files in the .Archive folder.
# rm dovecot*
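A slightly more defensive variant of that removal is sketched below. It only deletes regular files named dovecot* at the top level of the given folder, prints each one as it goes, and never descends into cur, new or tmp. It assumes a find that supports -maxdepth (GNU and BSD find both do), and the function name is my own.

```shell
# Sketch: remove Dovecot's metadata files from one mailbox folder only.
# remove_dovecot_metadata is a hypothetical helper name.
remove_dovecot_metadata() {
    dir=$1
    # -maxdepth 1 keeps find out of cur/, new/ and tmp/, so mail is untouched.
    # -print shows each file before it is removed.
    find "$dir" -maxdepth 1 -type f -name 'dovecot*' -print -exec rm -- {} +
}
```

In my case Dovecot recreated these files by itself the next time the mailbox was accessed.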
Conclusion
If you followed the steps, all your emails should now load into your client in the correct order, both old and new. If you did not follow it step by step, then I hope at least the information and methods used were helpful in some way.