Processing Bounced Email (in ColdFusion, For My Sins)

November 6th, 2010

I have sent out my own email newsletter on a weekly basis since 2003 or so, and the list has grown to a few thousand subscribers over the years. I wrote and then tinkered with all of my own newsletter subscription, editing, sending, and bounce processing code back then - but in ColdFusion, of all things. In my defense, at the time it seemed like a valid choice as a platform to learn while working on a personal side project. If there is a single common theme to the technological progress of mankind over the past century, it is the eternal need to deal with the consequences of earlier choices. In this case the consequence is the looming rewrite for an open platform such as PHP, a task I can't put off for too much longer.

The email environment has changed massively since 2003, and keeping up has been at times a frustrating journey of discovery for someone like myself, running a small list on my own software and sending through a variety of hosts over that time. Hosting companies typically have little patience for small customers who send more than personal email, and the towering ecosystems of email service providers, black hole lists, and large-scale spammers locked in a life-and-death struggle will cheerfully devour a well-intentioned amateur.

Fail to remove dead addresses in a timely manner? Black holed. Send a newsletter that contains a word or term that spammers have suddenly started using? Black holed. Fail to recognize a new form of delivery refusal from a lesser email provider? Black holed. When you're the little guy, it's no fun trying to find out why a major email provider such as Yahoo! or Gmail suddenly started blocking or bouncing your newsletter - and even less fun trying to obtain a resolution.

[This is a very different set of circumstances from, say, email delivery for a gaming company with tens of thousands of subscribers, hundreds of thousands of free players, and potentially millions of emails sent a month. Ensuring delivery under those conditions is vastly more costly, and you wind up having to develop and maintain good relations with all of the large email providers individually. Making sure that your customer emails show up where and when you want them to becomes an ongoing business process and a big time sink. It isn't a problem you can just solve and let run: the anti-spam ecosystem is so reactive and complex that there won't be a day gone by without delivery to some provider falling off the cliff for reasons that will never be completely clear. An organization with big delivery needs by necessity requires dedicated delivery staff to keep up, and has the economic means to do so.]

I will say that matters are much better now than they were back in 2006 - when Yahoo! started in on greylisting incoming mail, and a variety of other providers did similarly disruptive things in response to the growing spam deluge. It has been a couple of years now since I've had any major issues with delivery, a state of affairs which I ascribe to a combination of (a) the way in which my present host, Hosting.com, deals with list-scale outgoing email, and (b) a general improvement in the state of the fight against spam: better processes and better software.

But improvement or otherwise, if you are maintaining a newsletter yourself then processing and reacting to non-delivery reports - bounces - is critical to your ongoing ability to deliver email. If you don't process bounces well then you make yourself look like a spammer to the grinding, impersonal, massive systems that are expressly designed to rapidly destroy a spammer's ability to deliver email. You don't want that to happen, because once you're there it can be very hard to get out again, and you will make yourself persona non grata with your hosting provider. They don't want their mail servers and IP addresses marked as spam sources any more than you do.

When I send out a newsletter, I set the bounce address to a different email address from the one used to send the emails. Then as non-delivery reports turn up, they can be processed by retrieving mail from that account. I also add a unique identifying header value to each email, and store that value in a database table of sent mails. When I see that header in a non-delivery report email, I can match it up to the recipient and the specific newsletter sent to that recipient.

<cfset unique_mail_id = hash("salt-goes-here" & now() & recipient_id)>
<cfset unique_mail_id = replace(unique_mail_id, "4", "s", "ALL")>
<cfset unique_mail_id = replace(unique_mail_id, "5", "r", "ALL")>

<cfmail to="#recipient_email#"
    from="#header_from#"
    failto="newsletter-bounce@mydomain.com"
    subject="#header_subject#"
    server="#mail_server#"
    username="#mail_server_user#"
    password="#mail_server_password#"
    port="25"
    timeout="20">
<cfmailparam name="x-unique-mail-id" value="#unique_mail_id#">
#email_body_goes_here#
</cfmail>

...

<cfquery name="insertintosent" datasource="#datasource#">
insert into lm_mail_sent (
    unique_mail_id,
    recipient_id,
    newsletter_id
    ) values (
    '#unique_mail_id#',
    '#recipient_id#',
    '#reference_to_the_newsletter_sent_here#'
)
</cfquery>

You'll notice I'm replacing some digits in the mail identifier - this is because the diagnostic code that should be provided in a bounce email per the RFCs typically looks like "4.5.1", "551", or similar, and will have to be parsed out. I don't want to make the process of identifying the type of delivery refusal any more complex than it already is by allowing my identifier to accidentally form a valid rejection code.
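The same idea can be sketched outside ColdFusion. The Python below is only an illustration: the function name, the use of MD5 as a cheap fingerprint, and the timestamp format are my assumptions, not a description of what `hash()` does internally. It builds an identifier and then checks that nothing in it could be mistaken for an SMTP code by the kind of loose pattern used later in this post.

```python
import hashlib
import re
from datetime import datetime, timezone

# The loose pattern used to spot SMTP codes: a 4 or 5, then two digits,
# each optionally preceded by one separator character ("451", "4.5.1").
SMTP_CODE_LIKE = re.compile(r"[45].?[0-9].?[0-9]")


def make_unique_mail_id(recipient_id, salt="salt-goes-here", now=None):
    """Return a per-message identifier that never contains the digits 4 or 5."""
    now = now or datetime.now(timezone.utc)
    raw = f"{salt}{now.isoformat()}{recipient_id}".encode("utf-8")
    # MD5 is assumed here purely as a fingerprint, not for security.
    digest = hashlib.md5(raw).hexdigest().upper()
    # Same substitution as the ColdFusion above: 4 -> s, 5 -> r.
    return digest.replace("4", "s").replace("5", "r")


mail_id = make_unique_mail_id(12345)
# With no 4 or 5 left in the identifier, the code pattern has nowhere to start.
assert SMTP_CODE_LIKE.search(mail_id) is None
```

Because every match of the code pattern has to begin with a 4 or a 5, removing those two digits is enough; the other digits in the identifier are harmless.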

I use the following code to classify the non-delivery emails that show up after I send a newsletter, and thereby decide what to do next. It started with a consideration of the email RFCs that describe SMTP codes, and then grew a little beyond that as new and interesting departures from the RFCs turned up. One of the first things learned in writing your own bounce processing code is that the RFCs seem to be treated as more of a guideline than a set of hard and fast rules in the world of SMTP implementations - at least in those portions pertaining to rejected email, and thus also to spam.

<!---
   if you developed in the ColdFusion 5 - 7 era, you'll no doubt recall how
   horribly broken the default CFPOP tag was. It would die unpredictably
   when encountering UTF-7 encoding, for example. So everyone used
   the POP3 CFX tag instead.
--->
<cfx_pop3 server = "#pop_server#"
      username = "#bounce_account_login#"
      password = "#bounce_account_password#"
      action = "getAll"
      name = "bounced_emails"
      attachmentPath = "#attachment_path#"
      timeout = "20"
      maxrows = "#limit#"
      generateUniqueFilenames = "yes">

CFX_POP3 gives you a result you can loop over like a query, where each row contains data for one email: subject, body, path to temporarily stored attached files, and so forth. Parsing out the unique ID and the SMTP error code is the goal, but they might be anywhere: there is no real standard as to how providers choose to lay out a non-delivery report, and some seem to delight in being difficult. Within the loop, the search for meaning looks something like this:

<cfset unique_mail_id = "">
<!---
   look inside the message body elements for the
   identifying x-unique-mail-id header from the bounced email
--->
<cfif REFindNoCase("x-unique-mail-id.*?\n", textBody)>
   <cfset sLenPos=REFindNoCase("x-unique-mail-id.*?\n", textBody, 1, "True")>
   <cfset matched = mid(textBody, sLenPos.pos[1], sLenPos.len[1])>
   <cfset unique_mail_id = trim(listlast(matched, ':'))>
<cfelseif REFindNoCase("x-unique-mail-id.*?\n", HTMLBody)>
   <cfset sLenPos=REFindNoCase("x-unique-mail-id.*?\n", HTMLBody, 1, "True")>
   <cfset matched = mid(HTMLBody, sLenPos.pos[1], sLenPos.len[1])>
   <cfset unique_mail_id = trim(listlast(matched, ':'))>
</cfif>

<!---
   If bounced emails have been attached as files, and I haven't
   yet found what I'm looking for, then I need to look inside them.
--->
<cfif not len(unique_mail_id) and len(attachments)>
   <cfloop index="file_path" list="#attachmentfiles#" delimiters="#Chr(9)#">

      <cffile action="read" FILE="#file_path#" VARIABLE="file_text">
      <cfif REFindNoCase("x-unique-mail-id.*?\n", file_text)>
         <cfset sLenPos=REFindNoCase("x-unique-mail-id.*?\n", file_text, 1, "True")>
         <cfset matched = mid(file_text, sLenPos.pos[1], sLenPos.len[1])>
         <cfset unique_mail_id = trim(listlast(matched, ':'))>
         <cfbreak>
      </cfif>

   </cfloop>
</cfif>

<cfif len(unique_mail_id)>
   <!---
      find the SMTP error code. Note that I'm mixing and matching codes
      from different RFCs here - which is not as horrible as you might
      think in practice for this sort of less rigorous use.
   --->
   <cfset smtp_error_code = "">
   <cfset smtp_regex = "[45].?[0-9].?[0-9]">
   <cfif REFindNoCase(smtp_regex, textBody)>
      <cfset sLenPos=REFindNoCase(smtp_regex, textBody, 1, "True")>
      <cfset smtp_error_code = mid(textBody, sLenPos.pos[1], sLenPos.len[1])>
   <cfelseif REFindNoCase(smtp_regex, HTMLBody)>
      <cfset sLenPos=REFindNoCase(smtp_regex, HTMLBody, 1, "True")>
      <cfset smtp_error_code = mid(HTMLBody, sLenPos.pos[1], sLenPos.len[1])>
   <cfelse>

      <cfloop index="file_path" list="#attachmentfiles#" delimiters="#Chr(9)#">
         <cffile action="read" FILE="#file_path#" VARIABLE="file_text">
         <cfif REFindNoCase(smtp_regex, file_text)>
            <cfset sLenPos=REFindNoCase(smtp_regex, file_text, 1, "True")>
            <cfset smtp_error_code = mid(file_text, sLenPos.pos[1], sLenPos.len[1])>
            <cfbreak>
         </cfif>
      </cfloop>

   </cfif>

   <cfif REFindNoCase("5.?5.?[0134]", smtp_error_code)>
      <!---
         hard bounce - most likely the recipient address no longer exists.
      --->
      <cfset bounce_action = 'remove'>
   <cfelseif REFindNoCase("5.?0.?[01234]", smtp_error_code)>
      <!---
         hard bounce - forms of server error. Rare nowadays. If one of these
         shows up a few times in a row from a small domain, write them off.
      --->
      <cfset bounce_action = 'remove after three'>
   <cfelseif REFindNoCase("[45].?5.?[2]", smtp_error_code)>
      <!---
         the soft and hard bounce forms of out of space error. A lot of the more
         legitimate mass mailers treat both as a soft bounce, but I treat both as
         hard bounces and remove the recipient.
      --->
      <cfset bounce_action = 'remove'>
   <cfelseif REFindNoCase("4.?5.?1", smtp_error_code)>
      <!---
         Most likely greylisting. Wait an hour or two and resend these mails.
      --->
      <cfset bounce_action = 'resend soon'>
   <cfelseif (
      (
         REFindNoCase("yahoo", textBody)
         or REFindNoCase("yahoo", HTMLBody)
      ) and (
         REFindNoCase("Resources temporarily unavailable", textBody)
         or REFindNoCase("Resources temporarily unavailable", HTMLBody)
      ))>
      <!---
         Early Yahoo! greylisting - it was more reliable to grep for the
         text than to figure things out from the codes (or lack of codes)
         at one point. This was a real pain when they were first iterating
         their greylisting techniques. I can imagine the gnashing of teeth in the
         delivery groups for large companies at the time.
      --->
      <cfset bounce_action = 'resend soon'>
   <cfelseif REFindNoCase("4.?5.?0", smtp_error_code)>
      <!---
         Varying forms of soft bounce. I'll try resending once.
      --->
      <cfset bounce_action = 'resend soon'>
   <cfelseif REFindNoCase("4.?2.?2", smtp_error_code)>
      <!---
         A less commonly seen form of out of space error.
      --->
      <cfset bounce_action = 'remove'>
   <cfelseif REFindNoCase(smtp_regex, smtp_error_code)>
      <!---
          Everything else. In practice, I see "everything else" rarely enough that
          I can afford not to pay more careful attention. That wouldn't be the case
          if I was sending more mail. If I keep seeing errors from a given address,
          I drop it.
      --->
      <cfset bounce_action = 'remove after three'>
   <cfelse>
      <!---
          We didn't actually find a code. This happens. You'd be surprised at
          what turns up in an account supposedly restricted to bounce notices.
          But better safe than sorry - if we see a few of these oddities, remove
          the recipient from the list.
      --->
      <cfset bounce_action = 'remove after three'>
   </cfif>

   <!---
      update the database if we have something to update it with
   --->
   <cfif len(smtp_error_code)>
      <cfquery name="mail" datasource="#datasource#">
         UPDATE lm_mail_sent
         SET bounce_action = '#bounce_action#',
             smtp_code = '#smtp_error_code#'
         WHERE unique_mail_id = '#unique_mail_id#'
      </cfquery>
   </cfif>

</cfif>
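For comparison, here is a rough Python sketch of the same two searches: first the identifying header, then the first thing that looks like an SMTP code, checking the text body, the HTML body, and any attachment text in that order. The function names and the sample report are made up for illustration, and the code pattern is a somewhat stricter variant of my own rather than the one used in the ColdFusion above.

```python
import re

# The identifying header, wherever it appears in the report.
ID_HEADER = re.compile(r"x-unique-mail-id:\s*(\S+)", re.IGNORECASE)

# A dotted enhanced status code (4.5.1) or a three-digit reply code (451).
# The lookarounds reject candidates embedded in longer numbers, which makes
# dates, ports, and IP addresses less likely to match. This is an assumption
# about what bounce reports look like, not a guarantee.
SMTP_CODE = re.compile(r"(?<![\d.])([45]\.\d{1,3}\.\d{1,3}|[45]\d\d)(?!\.?\d)")


def first_match(pattern, parts):
    """Return group 1 of the first match found in any part, else ''."""
    for part in parts:
        if not part:
            continue
        match = pattern.search(part)
        if match:
            return match.group(1)
    return ""


def find_bounce_details(text_body, html_body="", attachment_texts=()):
    """Return (unique_mail_id, smtp_error_code); either may be ''."""
    parts = [text_body, html_body, *attachment_texts]
    return first_match(ID_HEADER, parts), first_match(SMTP_CODE, parts)


sample_report = (
    "Delivery to the following recipient failed permanently.\n\n"
    "550 User unknown\n\n"
    "----- Original message headers -----\n"
    "X-Unique-Mail-Id: A1B2C3D6E7\n"
    "Subject: Newsletter\n"
)
mail_id, code = find_bounce_details(sample_report)
```

Returning empty strings rather than raising keeps the caller's logic the same as in the ColdFusion version: no identifier means the report is skipped, and no code falls through to the catch-all action.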

At the simplest level of consideration, there are "hard" bounces, permanent error codes in the 500s, and "soft" bounces, temporary error codes in the 400s. A hard bounce implies that you should remove the recipient from your list, while a soft bounce implies that you should resend later or take no action and keep the recipient on your list. This is, however, an oversimplification. Permanent error codes generated after mail server errors are very rare for major providers, and don't reflect on the viability of the recipient address when they do occur. Temporary error codes include out of space messages - a legacy of past years in which storage space was at a premium, but most accounts would have to be abandoned for a long time to generate an out of space bounce nowadays.
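To make the gap between the naive hard/soft split and my actual rules concrete, here is a small Python sketch that puts the two side by side. The rule table paraphrases the code-based branches of the ColdFusion above (the Yahoo! text match is left out), and the function names are invented for the example.

```python
import re

# Ordered rules: the first pattern that matches the three-digit code wins.
RULES = [
    (r"55[0134]", "remove"),            # address most likely gone
    (r"50[0-4]", "remove after three"),  # server-side errors
    (r"[45]52", "remove"),              # out of space, soft or hard
    (r"45[01]", "resend soon"),         # greylisting and other soft bounces
    (r"422", "remove"),                 # less common out of space form
]
DEFAULT_ACTION = "remove after three"   # unknown code, or no code at all


def normalize(code):
    """Strip separators so that '4.5.1' and '451' compare equal."""
    return re.sub(r"[^0-9]", "", code)


def naive_kind(code):
    """The textbook split: 5xx is hard, 4xx is soft."""
    digits = normalize(code)
    if digits.startswith("5"):
        return "hard"
    if digits.startswith("4"):
        return "soft"
    return "unknown"


def bounce_action(code):
    """Map a code to an action using the ordered rule table."""
    digits = normalize(code)
    for pattern, action in RULES:
        if re.fullmatch(pattern, digits):
            return action
    return DEFAULT_ACTION


for code in ("550", "452", "5.0.3", "451"):
    print(code, naive_kind(code), bounce_action(code))
```

The two columns disagree exactly where the paragraph above says they should: 452 is nominally soft but gets removed, while 5.0.3 is nominally hard but is given a few chances.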

Once you have identified the type of bounce notification received, you essentially have three options: (a) resend the mail later (for some definition of "later"), (b) remove the recipient from future mailings, or (c) do nothing. In practice, what you should do with a given type of bounce varies by provider - you don't treat gmail.com addresses the same as yahoo.com addresses, and both are treated differently from bob@bobsownsmalldomain.com. The popular wisdom is that one should also give some consideration to the past bounce history of the recipient, which again is more or less important for different providers. If you see a soft bounce - such as code 452, an indication that the recipient's mailbox is full - then don't resend the present mail but keep the recipient on your list. If it happens again, remove them the next time around (or the third, fourth, or fifth, depending on how optimistic you are, and how tolerant of such retries their email provider is).
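A history-aware version of that policy might look something like the Python sketch below. Everything specific in it is an assumption made for illustration: the class name, the per-domain limits, and the default are placeholder values, not recommendations and not figures from any provider.

```python
from collections import defaultdict

# Placeholder tolerances: how many soft bounces in a row a recipient may
# accumulate before removal. These numbers are invented for the example.
SOFT_BOUNCE_LIMITS = {"gmail.com": 3, "yahoo.com": 2}
DEFAULT_LIMIT = 3


class BounceHistory:
    """Track consecutive soft bounces per address, in memory only."""

    def __init__(self):
        self._soft_counts = defaultdict(int)

    def record_delivery(self, address):
        """A mailing that did not bounce resets the recipient's count."""
        self._soft_counts.pop(address.lower(), None)

    def record_soft_bounce(self, address):
        """Return 'keep' or 'remove' after counting one more soft bounce."""
        key = address.lower()
        self._soft_counts[key] += 1
        domain = key.rsplit("@", 1)[-1]
        limit = SOFT_BOUNCE_LIMITS.get(domain, DEFAULT_LIMIT)
        return "remove" if self._soft_counts[key] >= limit else "keep"


history = BounceHistory()
first = history.record_soft_bounce("someone@yahoo.com")
second = history.record_soft_bounce("someone@yahoo.com")
```

In a real system the counts would live in the database alongside the sent-mail records rather than in memory, and a limit of 1 reproduces the remove-on-first-bounce approach.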

Personally, I just remove recipients that return an out of space non-delivery code the first time it happens - in this day and age of rapidly increasing storage space, it seems safe to assume an account with no space has been defunct and ignored for a while. For my needs in any case; yours will of course be different.

All in all, the handling of non-delivery reports and consequent gardening of your email list is a deep rabbit hole to explore - deliverability of email is a specialized domain with its own esoteric lore and body of practice, all of which is changing rapidly year after year. Still, if you run your own newsletter with your own bounce processing code, then you're going to be heading on down that hole. Look on it as a learning experience, one which might make you more sympathetic towards the prices charged by email marketing companies.