Mimsy Were the Borogoves

Hacks: Articles about programming in Python, Perl, PHP, and whatever else I happen to feel like hacking at.

Using appscript with Apple Mail to get emails and attachments

Jerry Stratton, October 16, 2011

I had a need recently to watch over a folder in Apple Mail, looking for incoming messages and saving attachments. Apparently, there used to be an easier way of getting attachments via appscript, but something in Snow Leopard changed it.

The core of this script is going to be grabbing the messages of a known folder in OS X Mail, and then looping through them.

  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import appscript
  • mail = appscript.app(u'Mail')
  • mailbox = mail.mailboxes[u'Archives'].mailboxes[u'Alarms']
  • messages = mailbox.messages[(appscript.its.read_status == False).AND(appscript.its.deleted_status == False)]
  • unPublishedMessages = messages.get()
  • #loop through all unread messages
  • for message in unPublishedMessages:
    • print message.subject.get()

This very simple script just loops through all unread messages[1] in the “Alarms” folder of the “Archives” folder, and prints the subject.

Here’s a longer version that gets a bit more information about each message:

  • #!/usr/bin/python
  • # -*- coding: utf-8 -*-
  • import appscript
  • import sys
  • from optparse import OptionParser
  • parser = OptionParser()
  • parser.add_option('-c', '--commit', help='commit action with mails', action='store_true')
  • (options, args) = parser.parse_args()
  • mail = appscript.app(u'Mail')
  • mailbox = mail.mailboxes[u'Archives'].mailboxes[u'Alarms']
  • messages = mailbox.messages[(appscript.its.read_status == False).AND(appscript.its.deleted_status == False)]
  • messageCount = messages.count(each=appscript.k.item)
  • if not messageCount:
    • print 'No messages to process.'
    • sys.exit()
  • unPublishedMessages = messages.get()
  • #make sure the earliest ones are done first, in case they reference each other
  • def messageSortKey(message):
    • return message.date_sent.get()
  • unPublishedMessages.sort(key=messageSortKey)
  • class MessageParser(object):
    • def __init__(self, message):
      • self.message = message
      • self.sender = message.sender.get()
      • self.subject = message.subject.get()
      • self.dateSent = message.date_sent.get()
      • self.body = message.content.get()
      • self.messageId = message.message_id.get()
      • self.replyTo = None
    • def parse(self):
      • #get rid of image placeholders
      • self.body = self.body.replace(unichr(65532), '')
      • #get the Id of the message this is in reply to
      • self.parseReplyTo()
    • def parseReplyTo(self):
      • replyTo = self.message.headers[u'in-reply-to']
      • if replyTo.exists():
        • self.replyTo = replyTo.content.get()
  • #loop through all unread messages
  • for message in unPublishedMessages:
    • #make sure the message is still untouched
    • if message.read_status.get() == True:
      • #exit the loop--another process has already started up and is dealing with this and subsequent mails
      • break
    • #mark it as read so that if we somehow take too long, the next process doesn't grab this one
    • if options.commit:
      • message.read_status.set(True)
    • post = MessageParser(message)
    • post.parse()
    • print post.subject
    • print post.dateSent.strftime('%Y-%m-%d %H:%M:%S')
    • print post.sender
    • print post.messageId
    • if post.replyTo:
      • print 'In reply to', post.replyTo
    • print

It will print out the subject, the time the message was sent, the sender, and the message’s unique id. If there is an In-Reply-To: id, it will print that as well.
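To see what that header looks like outside of Mail, here is a small sketch that uses Python’s standard email module on a made-up message. The addresses and ids are invented for illustration. The point is only that In-Reply-To carries the id of the parent message, while Reply-To is a different header that holds an address.

```python
import email

# A made-up raw message; every address and id here is hypothetical.
RAW = """From: alice@example.com
To: alarms@example.com
Subject: Re: Sensor offline
Message-ID: <reply-2@example.com>
In-Reply-To: <original-1@example.com>
Reply-To: alice-replies@example.com

Still offline this morning.
"""

message = email.message_from_string(RAW)

# the id of the message this one answers
inReplyTo = message['In-Reply-To']
# the address that replies should be sent to; not an id at all
replyTo = message['Reply-To']
```

Matching the reply’s In-Reply-To against the message id of an earlier mail is one way to thread them. Whether Mail hands back both strings in exactly the same form is worth checking against real messages.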

It also marks each message as read as it loops through them, so that if this runs as a cron job, the next run won’t try to handle the same message.

For testing purposes, it only marks messages as read if called with “--commit”.
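The mark-as-read logic can be tried without touching Mail. This is a rough sketch, not the script itself: FakeMessage and claimMessages are hypothetical stand-ins I made up, with a plain attribute taking the place of Mail’s read_status. Only the option definition is copied from the script above.

```python
from optparse import OptionParser

class FakeMessage(object):
    """Hypothetical stand-in for a Mail message; it only tracks read status."""
    def __init__(self, subject, read=False):
        self.subject = subject
        self.read = read

def buildParser():
    parser = OptionParser()
    parser.add_option('-c', '--commit', help='commit action with mails', action='store_true')
    return parser

def claimMessages(messages, commit):
    """Return the subjects this run would handle."""
    claimed = []
    for message in messages:
        # an already-read message means another run got here first
        if message.read:
            break
        # only a committed run marks the message as taken
        if commit:
            message.read = True
        claimed.append(message.subject)
    return claimed
```

Passing an explicit list to parse_args, such as buildParser().parse_args(['--commit']), makes it possible to exercise both modes from a test. Without the flag, repeated runs see the same messages; with it, the second run finds nothing left to do.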

Another thing it does is get rid of image placeholders. When you grab the body of a message, Mail does not return the attachments, but it does mark where they were using the Unicode “object replacement character” (U+FFFC, which is what unichr(65532) produces). You might find those markers useful if you’re going to reformat the mail into a web page, but for now I’m going to get rid of them.
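If you would rather keep the markers, something like the following sketch could swap each one for a visible label. The function name markPlaceholders is made up, and it assumes the placeholders appear in the same order as the attachments, which I have not verified against Mail.

```python
# -*- coding: utf-8 -*-

# U+FFFC, the same character as unichr(65532) in the script above
OBJECT_REPLACEMENT = u'\ufffc'

def markPlaceholders(body, labels):
    """Replace each placeholder, in order, with a bracketed label.

    Assumption: the nth placeholder belongs to the nth label.
    Placeholders beyond the end of the label list are simply dropped.
    """
    parts = body.split(OBJECT_REPLACEMENT)
    result = [parts[0]]
    for index, part in enumerate(parts[1:]):
        if index < len(labels):
            result.append(u'[' + labels[index] + u']')
        result.append(part)
    return u''.join(result)
```

Calling it with an empty label list has the same effect as the replace() call in the parse method: every placeholder disappears.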

The next step is to start extracting the attachments. Let’s make an Attachment class to handle them.

  • class Attachment(object):
    • def __init__(self, attachment):
      • self.attachment = attachment
      • #attachment information
      • properties = attachment.properties()
      • #filename info
      • self.file = properties[appscript.k.name]
      • self.fileName, self.fileExtension = os.path.splitext(self.file)
      • if self.fileExtension:
        • self.fileExtension = self.fileExtension[1:]
      • #filetype info
      • self.fileType = properties[aem.AEType('attp')]
      • self.fileKind, self.fileFormat = self.fileType.split('/')
      • self.path = None
    • def save(self, directory):
      • #save attachment
      • self.makePath(directory)
      • destination = appscript.mactypes.File(self.path)
      • mail.save(self.attachment, in_=destination)
    • def makePath(self, directory):
      • #only add the dot if the file actually has an extension
      • suffix = ''
      • if self.fileExtension:
        • suffix = '.' + self.fileExtension
      • path = os.path.join(directory, self.fileName + suffix)
      • counter = 1
      • while os.path.exists(path):
        • path = os.path.join(directory, self.fileName + '-' + str(counter) + suffix)
        • counter = counter + 1
      • self.path = path

This will accept an attachment (which we’ll get to in a moment) and then extract properties from it, such as its MIME type, name, and extension. It creates a unique filename for it in the given directory, turns that path into an AppleScript path, and then tells Mail to save the attachment to that path.[2]
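The naming scheme is easy to try on its own, away from Mail. Here is a minimal sketch of the same idea as a standalone function; uniquePath and demo are names I made up for illustration, and the demo works in a temporary directory that it removes afterwards.

```python
import os
import shutil
import tempfile

def uniquePath(directory, fileName):
    """Return a path in directory that does not exist yet,
    adding -1, -2, and so on before the extension as needed."""
    base, extension = os.path.splitext(fileName)
    path = os.path.join(directory, fileName)
    counter = 1
    while os.path.exists(path):
        path = os.path.join(directory, base + '-' + str(counter) + extension)
        counter += 1
    return path

def demo():
    """Save the same file name three times and report what was used."""
    directory = tempfile.mkdtemp()
    names = []
    try:
        for attempt in range(3):
            path = uniquePath(directory, 'photo.jpg')
            # create an empty file so the next call sees the name as taken
            open(path, 'w').close()
            names.append(os.path.basename(path))
    finally:
        shutil.rmtree(directory)
    return names
```

One caveat: there is a gap between checking that a path is free and writing to it, so two processes running at the same moment could still choose the same name.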

In order to use this, replace the top two “import” lines with:

  • import appscript, aem
  • import os, sys

Initialize a blank attachment list in the __init__ method of MessageParser:

  • def __init__(self, message):
    • self.replyTo = None
    • self.attachments = []

Add two new methods to MessageParser, one to parse out the attachments and make a list of them, and another to save the attachments to a specified directory:

  • def parseAttachments(self):
    • #it appears that this is supposed to work, but it doesn't
    • #attachments = self.message.mail_attachments.get()
    • attachments = self.message.AS_aemreference.property('attc')
    • for attachment in mail.AS_newreference(attachments).get():
      • self.attachments.append(Attachment(attachment))
  • def saveAttachments(self, directory):
    • for attachment in self.attachments:
      • attachment.save(directory)

From things I’ve seen on the net, this used to be much easier: it used to be possible to get attachments the same as getting anything else from the message, by using, in this case, message.mail_attachments.get(). That doesn’t work now, however, so we have to go into a lower-level API to access them. That’s what “AS_aemreference.property('attc')” is doing, and why we had to import “aem” along with appscript.

And in the “parse” method of MessageParser, call the parseAttachments method after the other two sections:

  • def parse(self):
    • #are there any attachments?
    • self.parseAttachments()

Finally, in the “loop through all unread messages” loop, add a call to saveAttachments, giving it a directory to save the attachments in:

  • post = MessageParser(message)
  • post.parse()
  • post.saveAttachments('/path/to/destination/folder')

Any attachments on unread messages in your chosen OS X Mail folder should now show up in that folder when you run the script. If you run the script more than once without committing, it will start appending ‘-1’, ‘-2’, etc., to the filenames.

I’m going to talk later about getting the dimensions of attached images, but if you want to get started early, take a look at /usr/bin/sips.

October 20, 2011: Image dimensions and orientation in Mac OS X Python

Often you’ll want to get the height and width of attachments that are images. I was able to find two command-line programs on Mac OS X that will provide an image’s dimensions: /usr/bin/sips and /usr/bin/mdls. “sips” stands for Scriptable Image Processing System, while “mdls” lists a file’s metadata attributes.

In my tests, sips was more accurate: the metadata appears to not get filled out immediately. It’s fast enough for humans, but not fast enough for scripts. The height and width metadata were empty when using mdls in the script, whereas sips was always able to provide it.

Import subprocess and re at the top of the script, then add these two methods to the Attachment class:

  • def dimensions(self):
    • if not self.isImageSaved():
      • return None, None
    • #get dimensions from sips
    • dimensionArgs = ['/usr/bin/sips', '-g', 'pixelHeight', '-g', 'pixelWidth', self.path]
    • sips = subprocess.Popen(dimensionArgs, stdout=subprocess.PIPE)
    • dimensions = sips.stdout.read()
    • dimensions = re.findall(r'pixel(Height|Width): ([0-9]+)', dimensions)
    • if len(dimensions) == 2:
      • label, height = dimensions[0]
      • label, width = dimensions[1]
      • height = int(height)
      • width = int(width)
      • if height and width:
        • return width, height
    • return False, False
  • def isImageSaved(self):
    • if not self.path:
      • print 'Attachment', self.file, 'has not yet been saved.'
      • return False
    • if self.fileKind != 'image':
      • print 'Attachment', self.path, 'is not an image:', self.fileKind
      • return False
    • return True

The “isImageSaved” method just checks to make sure that the attachment has been saved and that it is in fact an image. The “dimensions” method uses “/usr/bin/sips -g pixelHeight -g pixelWidth image path” and parses the response with a regular expression. It returns (width, height) if it can find them, (None, None) if the attachment has not been saved or is not an image, and (False, False) if sips returned nothing usable. Pretty basic stuff.
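The parsing step can be checked without running sips at all. The sketch below applies the same regular expression to a sample string; the sample is my assumption of what the output looks like, inferred from the pattern, and parseSipsOutput is a hypothetical helper rather than part of the script. It collects the matches into a dictionary so that the order of the two lines does not matter, and it decodes bytes first, since a pipe returns bytes under Python 3.

```python
import re

# Assumed shape of the sips output; the path and numbers are made up.
SAMPLE_OUTPUT = """/path/to/image.png
  pixelHeight: 600
  pixelWidth: 800
"""

def parseSipsOutput(output):
    """Return (width, height) from sips-style text, or (None, None)."""
    if isinstance(output, bytes):
        output = output.decode('utf-8', 'replace')
    # build {'Height': '600', 'Width': '800'} regardless of line order
    found = dict(re.findall(r'pixel(Height|Width): ([0-9]+)', output))
    width = int(found.get('Width', 0))
    height = int(found.get('Height', 0))
    if width and height:
        return width, height
    return None, None
```

This version returns (None, None) for every failure, which is a simplification compared with the method above.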

At the bottom of the script, in the “for message in unPublishedMessages:” loop, add an attachments loop:

  • if post.replyTo:
    • print 'In reply to', post.replyTo
  • if post.attachments:
    • print 'Attachments:'
    • for attachment in post.attachments:
      • print "\t", attachment.path, attachment.dimensions()
  • print

Not too bad. You could use this to do further things with the attachments, depending on what kind of files they are, how big they are, etc.
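As one illustration of “further things”, here is a sketch of a routing rule based on the kind of file and its size. Everything in it is hypothetical: the function name, the action labels, and the five-megabyte threshold are placeholders. The kind would come from the attachment’s fileKind, and the size could come from os.path.getsize on the saved path.

```python
def chooseAction(fileKind, sizeInBytes, maxImageBytes=5 * 1024 * 1024):
    """Hypothetical routing rule; it returns a label, not a real action."""
    if fileKind == 'image':
        # large images might need shrinking before they go anywhere
        if sizeInBytes > maxImageBytes:
            return 'resize'
        return 'publish'
    if fileKind == 'text':
        return 'inline'
    # anything else just gets linked to
    return 'link'
```

Keeping the decision separate from the code that acts on it makes the rule easy to test with plain values.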

  1. And undeleted—deleted messages can sometimes hang around for a while in AppleScript requests, so I’m specifically requesting undeleted messages here.

  2. Note that this version of the script trusts that the file’s name is safe. You’ll probably want to run it through some safety filter first; I use Django’s slugify function, because I’m using this as part of a Django app.
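For anyone not using Django, a rough stand-in for that safety filter might look like the sketch below. It is not Django’s slugify and makes no attempt to behave like it. The function name safeFileName and the fallback name are my own assumptions. It drops directory components and replaces anything outside a small set of characters.

```python
import os
import re

def safeFileName(name, default='attachment'):
    """Reduce an untrusted attachment name to a plain file name."""
    # treat backslashes as separators too, then keep only the last component
    name = name.replace('\\', '/')
    name = os.path.basename(name)
    base, extension = os.path.splitext(name)
    # collapse runs of anything unusual into a single hyphen
    base = re.sub(r'[^A-Za-z0-9_-]+', '-', base).strip('-')
    extension = re.sub(r'[^A-Za-z0-9]+', '', extension)
    if not base:
        base = default
    if extension:
        return base + '.' + extension
    return base
```

The result could be passed through os.path.splitext in the Attachment class in place of the raw name.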
