For many years, we’ve allowed retrieving email from Hotmail accounts via the Pop Links screen. To retrieve the email, we use the same mechanism that Outlook Express uses (the httpmail protocol, which is based on WebDAV). A couple of years ago, Microsoft announced that they would be disabling external Outlook Express access. It appears their process of actually disabling external access to accounts has been slow and haphazard. Most existing accounts created before the change in policy still work, but most newly created accounts do not.
On top of this, Microsoft now seems to be moving to a new system again. Windows Live Mail Desktop Beta seems to use yet another protocol and service endpoint (http://mail.services.live.com/DeltaSync_v1.0.0/sync.aspx). It seems all new Live Mail and Hotmail accounts can be accessed via this new protocol, but all access to these accounts via the old protocol has been disabled.
From what I can tell, this new protocol isn’t actually documented anywhere, so I’ve spent some time trying to reverse engineer it so that we can retrieve email from these accounts. I came quite close: I was able to authenticate, retrieve a list of folders and messages, and even retrieve each message. Unfortunately, the final message data is compressed in some format that I can’t find any documentation or examples for. This means I’ve currently reached a dead end and can’t see how we can retrieve Hotmail messages using the new protocol :(
For others interested in helping work out what’s needed, here’s a summary of where I’ve got to:
1. You first have to get an authentication token by sending a request to https://login.live.com/RST.srf. An example of this is available here: http://msnpiki.msnfanatic.com/index.php/MSNP13:SOAPTweener. You use the endpoint mail.services.live.com.
2. Once you’ve got the authentication ticket, you use the DeltaSync endpoint. Using the retrieved ticket, you send a POST request to “http://mail.services.live.com/DeltaSync_v1.0.0/Sync.aspx?$ticket” with the following body to get a new set of sync tokens.
<?xml version="1.0" encoding="utf-8"?><Sync xmlns="AirSync:" xmlns:A="EMAIL:" xmlns:B="HMMAIL:" xmlns:C="HMFOLDER:" xmlns:D="HMSYNC:"><Collections><Collection><Class>Email</Class><SyncKey>0</SyncKey></Collection><Collection><Class>Folder</Class><SyncKey>0</SyncKey></Collection></Collections></Sync>
The lack of spaces and newlines seems to be important. You may also get a redirect; if you do, follow that first. The returned result will have two SyncKey values, one for the Email class and one for the Folder class. Extract both of those sync keys.
3. Repeat the request in 2, but replace the “0” values with the SyncKeys returned by that request. You should get a complete list of emails and folders back.
4. You can then retrieve a particular email by sending a POST to “/DeltaSync_v1.0.0/ItemOperations.aspx” (use whatever host you were redirected to) with the following request.
<?xml version="1.0" encoding="utf-8"?><ItemOperations xmlns="ItemOperations:" xmlns:A="HMMAIL:"><Fetch><Class>Email</Class><A:ServerId>$ServerId</A:ServerId><A:Compression>hm-compression</A:Compression></Fetch></ItemOperations>
Replace $ServerId with one of the IDs returned in the list from step 3.
5. The returned result will be DIME encoded. Using a DIME parser, you find the payload with an id of “uuid:$ServerId” and get its content data.
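As a rough illustration of the request bodies in steps 2 to 4, here is a minimal Python sketch. The function names are made up for this example, and the response parsing rests on an assumption: that the Sync response contains Collection elements with Class and SyncKey children, mirroring the request. I haven't documented the exact response layout above, so treat that part as a guess to adapt.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Bodies are kept on one line with no extra whitespace, as in the steps above.
SYNC_TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<Sync xmlns="AirSync:" xmlns:A="EMAIL:" xmlns:B="HMMAIL:" '
    'xmlns:C="HMFOLDER:" xmlns:D="HMSYNC:"><Collections>'
    '<Collection><Class>Email</Class><SyncKey>{email}</SyncKey></Collection>'
    '<Collection><Class>Folder</Class><SyncKey>{folder}</SyncKey></Collection>'
    '</Collections></Sync>'
)

FETCH_TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<ItemOperations xmlns="ItemOperations:" xmlns:A="HMMAIL:"><Fetch>'
    '<Class>Email</Class><A:ServerId>{server_id}</A:ServerId>'
    '<A:Compression>{compression}</A:Compression></Fetch></ItemOperations>'
)


def build_sync_body(email_key="0", folder_key="0"):
    """Body for Sync.aspx: "0" keys for step 2, returned keys for step 3."""
    return SYNC_TEMPLATE.format(email=escape(email_key), folder=escape(folder_key))


def build_fetch_body(server_id, compression="hm-compression"):
    """Body for ItemOperations.aspx (step 4)."""
    return FETCH_TEMPLATE.format(
        server_id=escape(server_id), compression=escape(compression)
    )


def _local_name(tag):
    # ElementTree writes namespaced tags as "{namespace}name".
    return tag.rsplit("}", 1)[-1]


def extract_sync_keys(response_xml):
    """Map class name -> SyncKey. ASSUMES a Collection/Class/SyncKey layout."""
    keys = {}
    for elem in ET.fromstring(response_xml).iter():
        if _local_name(elem.tag) != "Collection":
            continue
        fields = {_local_name(child.tag): (child.text or "") for child in elem}
        if "Class" in fields and "SyncKey" in fields:
            keys[fields["Class"]] = fields["SyncKey"]
    return keys
```

Under that assumption, step 3 becomes something like `keys = extract_sync_keys(response)` followed by `build_sync_body(keys["Email"], keys["Folder"])`. The HTTP side (the POST itself, the ticket in the query string, following the redirect) is deliberately left out.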
At this point we seem to have the email content, but it appears to be compressed in some format that I haven’t been able to identify. In the request we pass “<A:Compression>hm-compression</A:Compression>”, and I’ve tried a number of alternatives: removing the tags, leaving the text between them empty, and using the text “none” or “raw”. Every one of them just returns a fault code.
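For anyone who wants to get at the still-compressed bytes from step 5, here is a minimal DIME reader sketch in Python. It follows the record layout from the DIME Internet-Draft as I understand it (a 12-byte big-endian header, then options, id, type and data, each padded to a multiple of 4 bytes). The names `parse_dime` and `find_payload` are made up for this example, and the chunk handling is untested against real DeltaSync responses.

```python
import struct

# flags+version, type byte, options length, id length, type length, data length
HEADER = struct.Struct(">BBHHHI")
FLAG_ME = 0x02  # "message end": last record in the message
FLAG_CF = 0x01  # "chunk flag": the data continues in the next record


def parse_dime(data):
    """Split a DIME message (bytes) into a list of {id, type, data} dicts."""
    records = []
    pos = 0
    continuing = False
    while pos < len(data):
        if pos + HEADER.size > len(data):
            raise ValueError("truncated DIME header")
        flags, _type_t, opt_len, id_len, type_len, data_len = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        if flags >> 3 != 1:  # the top 5 bits hold the version
            raise ValueError("unsupported DIME version")
        fields = []
        for length in (opt_len, id_len, type_len, data_len):
            if pos + length > len(data):
                raise ValueError("truncated DIME record")
            fields.append(data[pos:pos + length])
            pos += (length + 3) & ~3  # skip padding up to a 4-byte boundary
        _options, rec_id, rec_type, payload = fields
        if continuing:
            # Later chunks of a chunked record: append to the previous payload.
            records[-1]["data"] += payload
        else:
            records.append({
                "id": rec_id.decode("utf-8", "replace"),
                "type": rec_type.decode("utf-8", "replace"),
                "data": payload,
            })
        continuing = bool(flags & FLAG_CF)
        if flags & FLAG_ME:
            break
    return records


def find_payload(records, server_id):
    """Return the data of the record whose id is "uuid:<server_id>", or None."""
    wanted = "uuid:" + server_id
    for record in records:
        if record["id"] == wanted:
            return record["data"]
    return None
```

With the raw response body in `body`, `find_payload(parse_dime(body), server_id)` should give you the compressed message bytes to experiment with, for example by dumping the first few bytes and comparing them against known compression headers.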
If anyone wants to do some experimentation and work out what’s going on here, email me at email@example.com with what you find out.
Of course, there are other ways of accessing Hotmail and Yahoo accounts that involve screen scraping, but screen scraping techniques are notoriously fragile. Every time Microsoft, Yahoo and others change their web interface slightly, it can break the screen scraping software. There may also be language differences that aren’t picked up, so it might work for some users but not others. It’s a constant battle to keep it up to date and to deal with the large number of support problems it can generate when something does break. For these reasons, we don’t intend to use screen scraping solutions.