How to Write an Email Bounce Processing Module for Drupal

February 17th, 2011

Update for 6/15/2012: The original post below outlines the construction of a Drupal 6 module. A completed Drupal 7 module called Bounce ultimately emerged from the notes here, however. Bounce is a full-featured and extensible way to manage non-delivery reports generated by outgoing email - take a look if the subject interests you.


How much email is your Drupal website sending out into the world? If you are using the Notifications module, the answer to that question might be "a lot." Many other modules also send out a fair few emails to users in the course of their operation. When you find yourself in the business of sending email, it is a very good idea to make sure that you respect the bounces - the non-delivery reports - otherwise you'll soon enough find yourself persona non grata. You look like a spammer when you continue sending emails to addresses that bounce, and that means that eventually all sorts of automated systems will step in and start to cut off your ability to deliver any email.

Here, I'll walk through the components of a hypothetical module for Drupal 6 that will keep track of non-delivery reports and prevent mail from being sent to addresses that are generating too many of the wrong sort of non-delivery report.

What You'll Need to Set Up First

You will need to have the following items in hand:

  • A Drupal 6 test instance - meaning you have some sort of functioning PHP development environment, Apache server, and MySQL database up and running.
  • The IMAP package installed to provide PHP functions for connecting to IMAP and POP3 servers.
  • The toolkit code from a previous post for quick and easy POP3 usage.
  • An email account for receiving non-delivery reports. Let's call it drupal-bounce@mydomain.com.
  • The ability for the server running your test Drupal site to connect to a POP3 service on the mail server that hosts drupal-bounce@mydomain.com.

Most people who are writing modules from scratch will probably already be working with a registered domain that has its own web and mail server, and for which the firewalls are suitably configured to allow traffic between the two. A bounce processing module as described here will still work under other circumstances, such as email provided by a domain registrar and a shared hosting service, but you may have to do some more legwork to make sure that everything can talk to everything else.

Create Your Module

Let's call the module "bounce_processing." You'll create a folder /sites/all/modules/bounce_processing, and put three files in there:

1. bounce_processing.info

name = Bounce Processing
description = Prevents mail from being sent to bouncing email addresses.
core = 6.x

2. bounce_processing.install

<?php

/**
 * Implementation of hook_install().
 */
function bounce_processing_install() {
   drupal_install_schema('bounce_processing');
}

/**
 * Implementation of hook_uninstall().
 */
function bounce_processing_uninstall() {
   drupal_uninstall_schema('bounce_processing');
}

/**
 * Implementation of hook_schema().
 */
function bounce_processing_schema() {
   // TODO: fill this out with the desired schema definitions
}

3. bounce_processing.module

<?php
   // nothing here yet!

You'll be adding code to this bare minimum as you move forward - the first thing you'll want to do is copy in the POP3 access toolkit, as you'll be using those functions later.

Set the Return-Path Mail Header

An email address for non-delivery reports is only useful if other mail servers know about it. In order for this to happen, every email sent out must have the Return-Path header set to show your drupal-bounce@mydomain.com address. This header is quite different in function from the From, Reply-To, and other address-related headers, and after you have set the Return-Path header, those other headers will remain as they were before. Most mail clients will not show the Return-Path header value in their normal displays, and it is only used to determine where to send non-delivery reports.

Setting the Return-Path header is accomplished via hook_mail_alter in your bounce_processing.module file, as shown below:

/**
 * Implementation of hook_mail_alter().
 */
function bounce_processing_mail_alter(&$message) {
   $return_path = variable_get('bounce_processing_account', '');
   if( $return_path ) {
      $message['headers']['Return-Path'] = $return_path;
   }
}

We would like to be able to configure the module in the administration interface and enter the address of the account used there. That will require a menu entry for a configuration page:

/**
 * Implementation of hook_menu().
 */
function bounce_processing_menu() {
   $items['admin/settings/bounce_processing'] = array(
      'title' => 'Bounce Processing Settings',
      'description' => 'Configure settings for the Bounce Processing module.',
      'page callback' => 'drupal_get_form',
      'page arguments' => array('bounce_processing_settings'),
      'access arguments' => array('administer site configuration'),
   );
   return $items;
}

/**
 * A configuration form for module settings.
 */
function bounce_processing_settings() {
   $form['bounce_processing_account'] = array(
      '#type' => 'textfield',
      '#title' => t('Non-Delivery Report Address'),
      '#description' => t('Email address to receive non-delivery reports.'),
      '#default_value' => variable_get('bounce_processing_account', ''),
   );
   return system_settings_form($form);
}

To be neat and tidy we should remove the variable when the module is uninstalled, and we do this by adding a line to the implementation of hook_uninstall():

/**
 * Implementation of hook_uninstall().
 */
function bounce_processing_uninstall() {
   drupal_uninstall_schema('bounce_processing');
   variable_del('bounce_processing_account');
}

Nice and simple, yes? Well, unfortunately that isn't quite the case. As things stand, you may run into issues.

Return-Path Complications

Depending on how your Drupal instance is set up to send mail, any Return-Path header set in PHP may get overwritten by the mail software on the server (e.g. sendmail) or by a mail server en route to the final destination. The most common circumstance, where you are sending using the PHP mail() function, is almost certainly going to see you losing the Return-Path set in the code above.

Here are two straightforward approaches to work around this issue:

  • If you install and configure the SMTP Authentication Support module then Drupal will send mail directly to an outgoing mail server. If you're still losing the Return-Path header at that point, then the problem is a mail server configuration issue, not a Drupal issue.
  • Install the Return-Path module, which tweaks the way in which Drupal talks to sendmail. This may or may not work for you, depending on the exact details of your environment and needs for outgoing mail.
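
The underlying issue in both cases is the envelope sender address, which the receiving server normally uses to write the final Return-Path header. With PHP's mail() function, the envelope sender is typically set through the fifth argument, as the wrapper code later in this post does with "-f". As a rough sketch, assuming a sendmail-compatible binary behind mail() and using a hypothetical helper name that is not part of the module, the parameter might be built like this:

```php
<?php
/**
 * Hypothetical helper (not part of the module in this post): builds the
 * additional-parameters string for mail() that sets the envelope sender.
 * Returns an empty string if the address does not look valid, so nothing
 * unexpected is handed on to the sendmail command line.
 */
function example_envelope_sender_param($return_path) {
   if( filter_var($return_path, FILTER_VALIDATE_EMAIL) === false ) {
      return '';
   }
   return '-f' . $return_path;
}

// Usage sketch: the result would be passed as the fifth argument to mail():
// mail($to, $subject, $body, $headers, example_envelope_sender_param($addr));
echo example_envelope_sender_param('drupal-bounce@mydomain.com'), "\n";
// prints -fdrupal-bounce@mydomain.com
```

Whether the mail software honors the -f value still depends on how the server is configured, so treat this as something to test in your own environment.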

Set Up the Database Schema

Processing bounce emails generates a fair amount of data, and we'll need somewhere to store that data while working on it. The following SQL for the MySQL database outlines the needed tables. Converting it into a Drupal schema definition is left as an exercise for the reader, with a cautionary note that the specific size of those mediumtext columns does in fact matter. I've written it up as SQL here because the schema definitions are neither compact nor easy to read at a glance.

# first of all, we have to keep track of who has been
# removed from receiving further email

create table bounce_processing_addresses (
   mail varchar(255) not null,
   ut_created int(11) not null,
   primary key(mail)
);

# we need to mark each sent mail with a unique header,
# and keep a record of that header to simplify the later
# identification of the address that is generating non-delivery reports.

create table bounce_processing_mails (
   mail_id int not null auto_increment,
   header_id varchar(255) not null,
   mail varchar(255) not null,
   ut_created int(11) not null,
   primary key(mail_id),
   key(header_id),
   key(mail),
   key(ut_created)
);

# each incoming non-delivery report must be analyzed, and
# the records of that analysis kept to decide whether we need
# to stop sending to specific users.

create table bounce_processing_analysis (
   analysis_id int not null auto_increment,
   mail varchar(255) not null,
   smtp_code varchar(32) not null,
   ut_created int(11) not null,
   primary key(analysis_id),
   key(mail),
   key(smtp_code),
   key(ut_created)
);

# to help decide when to stop sending mail to an address,
# we keep a scorecard for different types of non-delivery
# report. The rows inserted into this table are far from
# a complete list of codes and types, but I find it does
# cover 99% of what is likely to show up at the door.

create table bounce_processing_scores (
   smtp_code varchar(32) not null,
   description varchar(255) not null,
   score int not null,
   primary key(smtp_code)
);
insert into bounce_processing_scores (smtp_code, score, description) values
('421', 0, 'Probably greylist response'),
('422', 0, 'Server out of space error'),
('450', 0, 'Varying soft bounces'),
('451', 0, 'Greylist response'),
('452', 50, 'Soft account out of space bounce'),
('454', 0, 'Temporary server error'),
('500', 25, 'Server error'),
('501', 25, 'Server error'),
('502', 25, 'Server error'),
('503', 25, 'Server error'),
('504', 25, 'Server error'),
('550', 50, 'Definitive hard bounce'),
('551', 50, 'Definitive hard bounce'),
('553', 50, 'Definitive hard bounce'),
('554', 50, 'Definitive hard bounce'),
('552', 50, 'Hard account out of space bounce'),
('4.1.1', 25, 'No such mailbox'),
('4.2.1', 25, 'Mailbox disabled'),
('4.2.2', 25, 'Mailbox full'),
('4.7.1', 0, 'Greylist response'),
('5.1.1', 50, 'No such mailbox'),
('5.2.1', 50, 'Mailbox disabled'),
('5.2.2', 50, 'Mailbox full');
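
As a starting point for the schema exercise, here is a sketch of how the first table might be expressed for hook_schema(), assuming the standard Drupal 6 Schema API keys ('fields', 'primary key', 'type', 'length', 'not null'). The function name is hypothetical, and the other three tables are still left to the reader:

```php
<?php
/**
 * Sketch only: a Drupal 6 style schema array for bounce_processing_addresses.
 * In the module, this array would be one entry in the array returned by
 * bounce_processing_schema().
 */
function example_bounce_processing_addresses_schema() {
   return array(
      'description' => 'Addresses that should receive no further mail.',
      'fields' => array(
         'mail' => array(
            'type' => 'varchar',
            'length' => 255,
            'not null' => TRUE,
         ),
         'ut_created' => array(
            'type' => 'int',
            'not null' => TRUE,
         ),
      ),
      'primary key' => array('mail'),
   );
}

// hook_schema() returns an array keyed by table name.
$schema = array(
   'bounce_processing_addresses' => example_bounce_processing_addresses_schema(),
);
```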

Some personal opinion is injected into the scores in the bounce_processing_scores table. For example, out of space errors should be very rare in this age of massive and cheap hard drives, so I treat a full mailbox as the sign of a long-abandoned account rather than as a transient error to be ignored. Similarly, it is my opinion that the many SMTP response codes missing from the list above are largely unimportant if all you care about is not looking like a spammer.
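
To make the effect of those scores concrete, here is a small worked example in plain PHP. It assumes a trigger threshold of 50, which is the default the processing code later in this post uses, and copies a few rows from the table above into an array. The function name is hypothetical:

```php
<?php
// A few rows copied from bounce_processing_scores above.
$scores = array('421' => 0, '451' => 0, '500' => 25, '550' => 50, '4.2.2' => 25);

/**
 * Hypothetical helper: total the scores for a list of SMTP codes seen for
 * one address. Codes missing from the table count as zero, mirroring the
 * inner join in the module's SQL, which drops unmatched rows.
 */
function example_total_score(array $codes, array $scores) {
   $total = 0;
   foreach( $codes as $code ) {
      if( isset($scores[$code]) ) {
         $total += $scores[$code];
      }
   }
   return $total;
}

$trigger_score = 50;

// One definitive hard bounce is enough to stop sending.
var_dump(example_total_score(array('550'), $scores) >= $trigger_score);

// Greylisting never adds up, however often it happens.
var_dump(example_total_score(array('421', '451', '451'), $scores) >= $trigger_score);

// Two softer failures together reach the threshold.
var_dump(example_total_score(array('500', '4.2.2'), $scores) >= $trigger_score);
```

If the defaults feel too aggressive for your list, lowering individual scores is usually a gentler adjustment than raising the threshold, since the threshold applies to every kind of bounce at once.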

Mark Outgoing Mail With Unique Headers

Adding a unique identifying header to each outgoing mail greatly simplifies the later identification of the offending address. A correctly functioning mail server should include the headers of the original email in its response, so it is possible to parse out the identifier. You could just parse email addresses from the text and check them against local records of sent mail, but there will be more than one email address in a non-delivery report, so it is a slower and less efficient process, and the returned copy of the message might itself contain unrelated email addresses included by the original sender.

To add the header and store the data, we'll extend the implementation of hook_mail_alter() a little more:

/**
 * Implementation of hook_mail_alter().
 */
function bounce_processing_mail_alter(&$message) {
   $return_path = variable_get('bounce_processing_account', '');
   if( $return_path ) {
      $message['headers']['Return-Path'] = $return_path;

      // extract emails from the "to" string
      $emails = _bounce_processing_emails_from_text($message['to']);

      // if there's more than one email we can't set the same header for all
      // of them - so just record them with no header.
      if( count($emails) == 1 ) {
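         // gen_uuid() is not a PHP built-in; it is assumed to be a helper
         // you provide that returns a unique string.
         $unique = gen_uuid();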
         // trivial replacements to avoid issues with the
         // search for SMTP response codes.
         $unique = str_replace('4', '!', $unique);
         $unique = str_replace('5', '?', $unique);
         $message['headers']['x-bounce-identifier'] = $unique;
      } else {
         $unique = '';
      }

      foreach( $emails as $email ) {
         $sql = "insert into {bounce_processing_mails}"
               . " (header_id, mail, ut_created) values ('%s', '%s', '%d')";
         db_query($sql, $unique, $email, time());
      }
   }
}
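
The gen_uuid() call above is not a PHP built-in, and this post does not show its definition. If your codebase does not already provide one, a minimal sketch of a random version 4 UUID generator might look like the following. It uses mt_rand(), which is not cryptographically strong; that is assumed to be acceptable here because the value only needs to be unique enough to look up a sent mail, not to be secret:

```php
<?php
/**
 * Sketch of a random (version 4) UUID generator. This is an assumption
 * about what gen_uuid() should do, not the original author's code.
 */
function gen_uuid() {
   return sprintf(
      '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
      mt_rand(0, 0xffff), mt_rand(0, 0xffff),
      mt_rand(0, 0xffff),
      // the top four bits of this group hold the version number, 4
      mt_rand(0, 0x0fff) | 0x4000,
      // the top two bits of this group are the variant bits, 10
      mt_rand(0, 0x3fff) | 0x8000,
      mt_rand(0, 0xffff), mt_rand(0, 0xffff), mt_rand(0, 0xffff)
   );
}

// The same replacements that hook_mail_alter() applies above.
$unique = gen_uuid();
$unique = str_replace('4', '!', $unique);
$unique = str_replace('5', '?', $unique);
echo $unique, "\n";
```

Note that the version digit of such a UUID is itself a 4, so every identifier will contain at least one "!" after the replacements.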

Write the Processing Code

What should the core of the module actually be doing? Non-delivery report processing has the following components:

  • Retrieve an email from the Return-Path account.
  • Link the email to a specific user.
  • Identify what sort of non-delivery report it is.
  • Write the appropriate data to the database.
  • Decide whether to mark the user to prevent future delivery of email.

The following functions run through those steps. They make use of the POP3 toolkit I mentioned earlier, with the pop3_login function modified in the obvious way to accept parameters in place of other configuration methods.

/**
 * Empty out the bounced mail account and examine the contents.
 * Then record the information discovered, and take action as needed.
 */
function bounce_processing_run_processing() {
   $host = variable_get('bounce_processing_mail_server', '');
   $ssl = variable_get('bounce_processing_is_ssl', true);
   $folder = variable_get('bounce_processing_folder', 'INBOX');
   $user = variable_get('bounce_processing_account', '');
   $pass = variable_get('bounce_processing_account_password', '');
   $limit = variable_get('bounce_processing_limit_per_run', 50);

   if( !pop3_login($host, $ssl, $folder, $user, $pass) ) {
      $error = 'POP3 Login Failure: ' . imap_last_error();
      watchdog('bounce_processing', $error);
      return;
   }
   $info = pop3_get_mailbox_info();
   if( $info['Nmsgs'] ) {
      $limit = min($limit, $info['Nmsgs']);
      for( $message_id = 1; $message_id <= $limit; $message_id++ ) {
         // note that the message will be an array of data, with the
         // first element being headers we can largely ignore
         $message = pop3_get_message($message_id);
         pop3_mark_message_for_deletion($message_id);
         _bounce_processing_process_message($message);
      }
   }
   pop3_logout();

   // take action as required by the data we've gathered.
   bounce_processing_mark_users();

   // lastly, clear out old data we're not going to use any more
   bounce_processing_clean_old_data();
}

/**
 * Analyze the contents of one non-delivery report email.
 */
function _bounce_processing_process_message($message) {
   $analysis = array(
      'mail' => 0,
      'smtp code' => '',
   );

   if( !is_array($message) || count($message) < 2 ) {
      return $analysis;
   }

   // the first element of the array will be headers for the non-delivery report mail
   // these don't generally reveal anything helpful
   array_shift($message);

   // now loop through the rest of the mail in search of information.
   foreach( $message as $part ) {

      // first see if we can find the unique header ID for a given mail

      if( empty($analysis['mail']) ) {
         $match = array();
         $matched = preg_match(
            '/x-bounce-identifier.?:(.*)\r?\n/i',
            $part['data'],
            $match
         );
         if( $matched ) {
            $header_id = trim($match[1]);
            $sql = "select mail from {bounce_processing_mails}"
                    . " where header_id = '%s'";
            $result = db_query($sql, $header_id);
            if( ($row = db_fetch_array($result)) && $row['mail'] ) {
               $analysis['mail'] = $row['mail'];
            }
         }
      }

      // if we don't have that, then we'll try identifying the user the hard way,
      // by examining all of the email addresses we can find.

      if( empty($analysis['mail']) ) {
         $emails = _bounce_processing_emails_from_text($part['data']);
         foreach( $emails as $email ) {
            // note that this may pick up mails you don't want it to - such as the
            // Return-Path account, or system accounts you add to mail footers.
            // Think about adding an if-statement here to exclude those accounts.
            $email = strtolower($email);
            $sql = "select count(1) as c from {bounce_processing_mails}"
                    . " where mail = '%s'";
            $result = db_query($sql, $email);
            if( ($row = db_fetch_array($result)) && $row['c'] ) {
               $analysis['mail'] = $email;
               break;
            }
         }
      }

      // lastly, we try to find out what sort of non-delivery report this is.
      // we'll be using smtp codes, plus some other general categories

      if( empty($analysis['smtp code']) ) {
         $code = _bounce_processing_smtp_code_from_text($part['data']);
         $analysis['smtp code'] = $code;
      }
   }

   // store the results if something was found
   if( $analysis['mail'] && $analysis['smtp code'] ) {
      $sql = "insert into {bounce_processing_analysis}"
            . " (mail, smtp_code, ut_created) values ('%s', '%s', '%d')";
      db_query($sql, $analysis['mail'], $analysis['smtp code'], time());
   }

   return $analysis;
}

/**
 * Helper function to parse out email addresses from text.
 */
function _bounce_processing_emails_from_text($text) {
   $matches = array();
   preg_match_all(
      "/([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})/i",
      $text,
      $matches
   );
   return $matches[1];
}

/**
 * Helper function to parse out an SMTP response code from text.
 */
function _bounce_processing_smtp_code_from_text($text) {
   // rfc821 return code e.g. 550
   $matches = array();
   if( preg_match('/\b([45][01257][012345])\b/', $text, $matches) ) {
      return $matches[1];
   }
   // rfc1893 return code e.g. 5.1.1
   if( preg_match('/([45]\.[01234567]\.[012345678])/', $text, $matches) ) {
      return $matches[1];
   }
   return '';
}

/**
 * Run through the non-delivery report data and see if we have to mark
 * any users to prevent future delivery. We are doing this by counting up
 * past reports within a given time frame and marking users who have a
 * high enough total.
 */
function bounce_processing_mark_users() {
   $trigger_score = variable_get('bounce_processing_trigger_score', 50);
   $trigger_age = variable_get('bounce_processing_trigger_days', 100);
   $trigger_age = $trigger_age * 60 * 60 * 24; // convert to seconds

   $sql = "select tmp.mail from"
      . " ("
      . " select sum(s.score) as score, a.mail"
      . " from {bounce_processing_analysis} a"
      . " inner join {bounce_processing_scores} s using (smtp_code)"
      . " where a.ut_created > unix_timestamp() - %d"
      . " group by a.mail"
      . " ) tmp"
      . " where tmp.score >= '%d'";
   $result = db_query($sql, $trigger_age, $trigger_score);
   while( $row = db_fetch_array($result) ) {
      $sql = "replace into {bounce_processing_addresses} (mail, ut_created)"
            . " values ('%s', '%d')";
      db_query($sql, $row['mail'], time());
      $sql = "delete from {bounce_processing_analysis} where mail = '%s'";
      db_query($sql, $row['mail']);
   }
}

/**
 * Clear out the old data that we're no longer using.
 */
function bounce_processing_clean_old_data() {
   $trigger_age = variable_get('bounce_processing_trigger_days', 100);
   $trigger_age = $trigger_age * 60 * 60 * 24; // convert to seconds

   $sql = "delete from {bounce_processing_analysis}"
            . " where ut_created < unix_timestamp() - %d";
   db_query($sql, $trigger_age);
   $sql = "delete from {bounce_processing_mails}"
            . " where ut_created < unix_timestamp() - %d";
   db_query($sql, $trigger_age);
}
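
The two parsing helpers are the easiest part of this code to check outside Drupal. The sketch below copies their logic into stand-alone functions (renamed with an example_ prefix so they cannot clash with the module's own) and runs them against a made-up fragment of a non-delivery report. Real reports vary a great deal between mail servers, so this is a sanity check rather than proof of coverage:

```php
<?php
// Stand-alone copy of the email extraction helper.
function example_emails_from_text($text) {
   $matches = array();
   preg_match_all(
      '/([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})/i',
      $text,
      $matches
   );
   return $matches[1];
}

// Stand-alone copy of the SMTP code extraction helper.
function example_smtp_code_from_text($text) {
   $matches = array();
   // three-digit reply code, e.g. 550
   if( preg_match('/\b([45][01257][012345])\b/', $text, $matches) ) {
      return $matches[1];
   }
   // enhanced status code, e.g. 5.1.1
   if( preg_match('/([45]\.[0-7]\.[0-8])/', $text, $matches) ) {
      return $matches[1];
   }
   return '';
}

// Invented sample text, for illustration only.
$report = "Final-Recipient: rfc822; nobody@example.com\n"
   . "Action: failed\n"
   . "Status: 5.1.1\n";

print_r(example_emails_from_text($report));
echo example_smtp_code_from_text($report), "\n";                 // prints 5.1.1
echo example_smtp_code_from_text("550 5.1.1 User unknown"), "\n"; // prints 550
```

Because the three-digit pattern is tried first, a message containing both forms is recorded under the three-digit code. That is worth remembering when reading the scores table, as both forms appear there.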

It will be necessary to allow the site admins to set the Drupal variables used in the processing functions above, so the administration settings form will expand out to include those other values:

/**
 * A configuration form for module settings.
 */
function bounce_processing_settings() {
  $form['bounce_processing_account'] = array(
    '#type' => 'textfield',
    '#title' => t('Non-Delivery Report Account'),
    '#description' => t('Email address of the account to receive non-delivery reports.'),
    '#default_value' => variable_get('bounce_processing_account', ''),
  );
  $form['bounce_processing_account_password'] = array(
    '#type' => 'password',
    '#title' => t('Password for POP3 Access'),
    '#description' => t('Password for the account to receive non-delivery reports.'),
    '#default_value' => variable_get('bounce_processing_account_password', ''),
  );
  $form['bounce_processing_mail_server'] = array(
    '#type' => 'textfield',
    '#title' => t('POP3 Server'),
    '#description' => t('e.g. mail.mydomain.com.'),
    '#default_value' => variable_get('bounce_processing_mail_server', ''),
  );
  $form['bounce_processing_folder'] = array(
    '#type' => 'textfield',
    '#title' => t('Mail Folder Name'),
    '#description' => t("Don't change this unless you know what you are doing."),
    '#default_value' => variable_get('bounce_processing_folder', 'INBOX'),
  );
  $form['bounce_processing_is_ssl'] = array(
    '#type' => 'select',
    '#title' => t('Use Secure SSL Connection?'),
    '#description' => t('Are you connecting to a POP3 or POP3 SSL server?'),
    '#options' => array(
      true => t('POP3-SSL (Encrypted connection)'),
      false => t('POP3 (No encryption used)'),
    ),
    '#default_value' => variable_get('bounce_processing_is_ssl', true),
  );
  $form['bounce_processing_limit_per_run'] = array(
    '#type' => 'textfield',
    '#title' => t('Number of Non-Delivery Reports Per Cron Run'),
    '#description' => t('Maximum number of bounce reports processed per cron run.'),
    '#default_value' => variable_get('bounce_processing_limit_per_run', '50'),
  );
  $form['bounce_processing_trigger_score'] = array(
    '#type' => 'textfield',
    '#title' => t('Trigger Score'),
    '#description' => t('The total bounce score needed to stop sending a user mail.'),
    '#default_value' => variable_get('bounce_processing_trigger_score', 50),
  );
  $form['bounce_processing_trigger_days'] = array(
    '#type' => 'textfield',
    '#title' => t('Keep Non-Delivery Reports For This Many Days'),
    '#description' => t('Older non-delivery reports will be cleared from the database.'),
    '#default_value' => variable_get('bounce_processing_trigger_days', 100),
  );

  return system_settings_form($form);
}

Lastly, these new variables should be deleted on uninstall. So hook_uninstall() now looks like this:

/**
 * Implementation of hook_uninstall().
 */
function bounce_processing_uninstall() {
  drupal_uninstall_schema('bounce_processing');
  variable_del('bounce_processing_account');
  variable_del('bounce_processing_mail_server');
  variable_del('bounce_processing_is_ssl');
  variable_del('bounce_processing_folder');
  variable_del('bounce_processing_account_password');
  variable_del('bounce_processing_limit_per_run');
  variable_del('bounce_processing_trigger_days');
  variable_del('bounce_processing_trigger_score');
}

Set Up Bounce Processing to Run With Cron

With the bounce processing code written, we now want it to run regularly - so an implementation of hook_cron() is needed.

/**
 * Implementation of hook_cron().
 */
function bounce_processing_cron() {
   bounce_processing_run_processing();
}

Add Data to the User Object

Now that the bounce processing code is adding addresses to the bounce_processing_addresses table, we need to make sure that data appears in the user object when a user's email address matches one or more rows in the table. That is done via hook_user():

/**
 * Implementation of hook_user().
 */
function bounce_processing_user($op, &$edit, &$account, $category = NULL) {
   switch( $op ) {
      case 'load':
         $account->bounce_processing_marked
              = bounce_processing_is_marked($account->mail);
         break;

      case 'delete':
         // the user is going away, so clear out everything
         // associated with this account.
         $sql = "delete from {bounce_processing_analysis} where mail = '%s'";
         db_query($sql, strtolower($account->mail));
         $sql = "delete from {bounce_processing_mails} where mail = '%s'";
         db_query($sql, strtolower($account->mail));
         $sql = "delete from {bounce_processing_addresses} where mail = '%s'";
         db_query($sql, strtolower($account->mail));
         break;
   }
}

/**
 * Return true if the email address provided is marked by the bounce processing
 * module for no further sending.
 */
function bounce_processing_is_marked($email) {
   $sql = "select count(1) as c from {bounce_processing_addresses}"
         . " where mail = '%s'";
   $result = db_query($sql, strtolower($email));
   $row = db_fetch_array($result);
   return !empty($row['c']);
}

You will probably want to further extend hook_user() to at least allow for an administrator to edit a user's "bounce_processing_marked" value.

Prevent Mail Going Out to Marked Addresses

While many different modules can safely tinker with the contents and headers of a single outgoing mail, putting in place code to decide whether or not to send that mail requires a little more care. The way in which this is done in Drupal is to change the "smtp_library" variable, then write your own drupal_mail_wrapper() function to take the place of the default behavior. You can see how this is done in modules like Devel, SMTP Authentication Support, and Return-Path. If you are using one of these modules, however, then you have to be careful in how your replacement for drupal_mail_wrapper() interacts with the other possible replacements - as only one can actually be live, but you may still want the functionality provided by the other modules.

Here is an example of possible code that checks for the existence of the SMTP Authentication Support module and then plays nice. First, hook_install() must tell Drupal that the bounce_processing module is the designated SMTP library:

/**
 * Implementation of hook_install().
 */
function bounce_processing_install() {
   drupal_install_schema('bounce_processing');
   variable_set('smtp_library', drupal_get_filename('module', 'bounce_processing'));
}

Then in bounce_processing.module, we check that assignment and write the wrapper function if so:

$smtp_library = variable_get('smtp_library', '');
if( $smtp_library == drupal_get_filename('module', 'bounce_processing') ) {

   // this module is the designated smtp library, so we must define this function
   function drupal_mail_wrapper($message) {
      // $message['to'] might be a list of addresses, and we have to check each one,
      // and then rebuild the list afterwards. Expect RFC 2822 format.
      $addresses = explode(",", $message['to']);
      foreach($addresses as $i => $address) {
         $emails = _bounce_processing_emails_from_text($address);
         if( !empty($emails) && bounce_processing_is_marked($emails[0]) ) {
            unset($addresses[$i]);
         }
      }
      if( count($addresses) ) {
         $message['to'] = implode(',',$addresses);

         // here we check to see that the SMTP Authentication Support module
         // is installed. If not, we have to do the work of sending ourselves
         if( module_exists('smtp') ) {
            return smtp_drupal_mail_wrapper($message);
         } else {
           // this is a direct copy of the drupal_mail_wrapper function from
           // the Return-Path module. Unfortunately it can't just be called
           // directly because of the choice of function name in that module.
           unset($message['headers']['From']);
           $mimeheaders = array();
           foreach ($message['headers'] as $name => $value) {
              $mimeheaders[] = $name .': '. mime_header_encode($value);
           }

           return mail(
              $message['to'],
              mime_header_encode($message['subject']),
              // Note: e-mail uses CRLF for line-endings, but PHP's API requires LF.
              // They will appear correctly in the actual e-mail that is sent.
              str_replace("\r", '', $message['body']),
              // For headers, PHP's API suggests that we use CRLF normally,
              // but some MTAs incorrectly replace LF with CRLF. See #234403.
              join("\n", $mimeheaders),
              // Adds the sendmail -f command - necessary to preserve the
              // Return-Path header we set earlier
              "-f". $message['headers']['Return-Path']
           );
         }

      }
   }
}
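
The recipient-filtering step at the top of drupal_mail_wrapper() can also be exercised outside Drupal. Here is a minimal sketch, with a plain array standing in for the bounce_processing_is_marked() database lookup. The function name and sample addresses are hypothetical:

```php
<?php
/**
 * Hypothetical stand-alone version of the filtering loop: drop every
 * recipient whose address appears in $marked (lower-case addresses).
 */
function example_filter_recipients($to, array $marked) {
   $kept = array();
   foreach( explode(',', $to) as $address ) {
      $match = array();
      $found = preg_match(
         '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i',
         $address,
         $match
      );
      if( $found && in_array(strtolower($match[0]), $marked, true) ) {
         continue; // marked address: leave it out of the rebuilt list
      }
      $kept[] = trim($address);
   }
   return implode(',', $kept);
}

$marked = array('bob@example.com');
echo example_filter_recipients('Alice <alice@example.com>, Bob@example.com', $marked), "\n";
// prints Alice <alice@example.com>
```

Like the wrapper above, this splits the list on commas, so a quoted display name that itself contains a comma would be split incorrectly. If your site sends to such addresses, a proper RFC 2822 address parser would be a safer choice.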

Further Additions

Plenty more could be done to flesh out this module beyond the few items mentioned so far. For example, more error logging, or admin report pages to show types of bounces, or ways to export non-delivery report counts by SMTP response code. As things stand, it's a little crude, but it'll get the job done.