Pop Goes the POPfile

One of my friends recently installed an open-source program called POPFile for automatically classifying e-mail messages. POPFile takes a fairly clever approach to this problem, in that it acts as a local proxy for a POP3 server, but instead of just passing the messages through to the user, it captures them, analyzes them, and marks them as belonging to a user-defined category or “bucket.” The usual relationship between a mail client and its server looks like this:

[Figure: Mail client and POP3 server]

In other words, the user’s mail client, running on their own computer, connects over the network to a POP server at port 110 on some other machine, and downloads any new messages that have arrived. POPFile also runs on the user’s computer, but it interposes itself between the mail client and the remote server, like this:

[Figure: Mail client, POPFile, and POP3 server]

Here, instead of connecting directly to the remote POP3 server, the mail client connects to POPFile, which acts just like a POP3 server, and which talks to the real one. Thus, POPFile acts as a middleman (“proxy”) between the mail client and the mail server, so it can do some extra work on your behalf without changing the rules of the transaction.
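To make the "extra work" a little more concrete, here is a minimal Python sketch of how a proxy could mark a message with its bucket on the way through. The function name tag_message is hypothetical, and recording the bucket in a header called X-Text-Classification is an assumption made for illustration; none of this is taken from POPFile's source.

```python
from email import message_from_string


def tag_message(raw_message, bucket, header="X-Text-Classification"):
    """Return raw_message with a header naming the bucket it was filed under.

    The header name is an assumption for illustration only.
    """
    msg = message_from_string(raw_message)
    del msg[header]       # drop any earlier tag; no error if it is absent
    msg[header] = bucket  # add the new tag after the existing headers
    return msg.as_string()


if __name__ == "__main__":
    raw = "From: mary@example.com\nSubject: peppermint\n\nHello!\n"
    print(tag_message(raw, "personal"))
```

Because the tag travels as an ordinary header, a mail client needs no special support: an everyday filing rule that matches on that header is enough.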

Any decent e-mail program gives you a way to invent rules that influence how messages will be organized. For example, you can say things like “if a message was sent by my cousin Mary, and has the word ‘peppermint’ in the Subject line, show the message in red on the list of new messages.” Such rules are helpful, but ad hoc filing rules are not very good at expressing other things you’d like to say, such as “advertisements for bargain-rate watches, penny stocks, penis growth products, and weight-loss fads should be filed in the ‘Burn Before Reading’ folder.” Such messages do not usually share any simple common features, and they change over time, so it’s hard to write rules for them. This is where Bayesian text classification comes in—it’s a simple technique for finding statistical patterns in messages, and recognizing those patterns when they show up again later on. POPFile lets you use this technique in any e-mail program that knows about the POP3 protocol, even if it doesn’t know anything whatsoever about Bayesian text classification.
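For the curious, the idea can be sketched in a few lines of Python. This is a toy naive Bayes classifier with add-one smoothing, not POPFile's implementation; the class name BucketClassifier, the tokenizer, and the training messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class BucketClassifier:
    """Toy naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages seen
        self.vocabulary = set()                  # every word seen in training

    def train(self, bucket, text):
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.message_counts[bucket] += 1
        self.vocabulary.update(words)

    def score(self, bucket, text):
        """Return log P(bucket) plus the sum of log P(word | bucket)."""
        counts = self.word_counts[bucket]
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        log_prob = math.log(self.message_counts[bucket] / total_messages)
        denominator = total_words + len(self.vocabulary)
        for word in tokenize(text):
            log_prob += math.log((counts[word] + 1) / denominator)
        return log_prob

    def classify(self, text):
        """Return the trained bucket with the highest score for this text."""
        return max(self.message_counts, key=lambda b: self.score(b, text))


if __name__ == "__main__":
    clf = BucketClassifier()
    clf.train("spam", "cheap watches and penny stocks, buy now")
    clf.train("spam", "lose weight fast with this bargain offer")
    clf.train("personal", "Mary says the peppermint tea arrived on Tuesday")
    clf.train("personal", "lunch on Friday with cousin Mary")
    print(clf.classify("bargain watches, buy now"))  # prints: spam
    print(clf.classify("peppermint tea with Mary"))  # prints: personal
```

Notice that nobody wrote a rule about watches or penny stocks. The classifier only counted which words turned up in which bucket, which is why this approach keeps working as the junk changes over time, so long as you keep correcting its mistakes.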

So far, so good. Unfortunately, while POPFile is a very well-written program, and seems to work well once installed, the installation process contains some subtle points that are not well explained. Like me, my friend uses a Macintosh, so the easy-to-install Windows installer wasn’t an option. And, although detailed MacOS installation instructions were provided, the two of us had a devil of a time getting the program to work right. So, I’d like to document what we had to do, in case someone else runs across the same problems.

In what follows, I assume that you have a copy of POPFile’s MacOS installation instructions available, and can refer to them at need.

System Requirements

Steps 1 and 2 of the MacOS Installation Instructions can be followed without change. I confess that I do not know how anybody can get things done without having the Developer Tools installed anyway—though I’m sure my views on this subject are influenced by my tendency to program my way out of almost any situation I can’t talk my way out of. So, anyway, do those steps as written.

Steps 3 and 4 ask you to install a bunch of Perl modules needed by POPFile itself.1 But rather than downloading and installing them manually (a very tedious process), I recommend you use MacPorts instead. You still have to download a few things, but most of the process is much more automatic this way. Grab and run the latest MacPorts installer (1.5.0 as I write this), then pop open a Terminal window and run these commands (the user% shown on each line represents the command prompt):

user% sudo port selfupdate
# If you don't need SSL support, you can omit p5-io-socket-ssl
user% sudo port install p5-timedate p5-html-tagset \
   p5-dbi p5-mime-base64 p5-html-template p5-io-socket-ssl

This will take a while, but apart from entering your password to start the whole thing off, you can just sit back and drink tea while you wait. I myself like a nice hot cup of Earl Grey with clover honey (the scent of bergamot is quite relaxing).

You do have to install the DBD::SQLite2 package by hand, though, since MacPorts doesn’t have a portfile for that one. Follow the links to download the latest source (DBD-SQLite2-0.33.tar.gz as I write this), and save it on your Desktop. Then,

user% sudo -s
Password: ••••••••
root% mkdir -p /usr/local/lib/perl5/site_perl/5.8.8
root% mkdir -p /usr/local/lib/perl5/site_perl/darwin-2level
root% (cd /usr/local/lib/perl5/site_perl/5.8.8 && ln -s ../darwin-2level .)
root% ln -s /usr/local/lib/perl5/site_perl /opt/local/lib/perl5
root% exit
user% cd ~/Desktop
user% tar -zxf DBD-SQLite2-0.33.tar.gz
user% cd DBD-SQLite2-0.33
user% /opt/local/bin/perl Makefile.PL LIB=/opt/local/lib/perl5/site_perl
... much output ensues ...
user% make
... further output ensues ...
user% sudo make install
... still further output ensues ...

Basically, all the steps marked “root” above make it possible to install this module without corrupting your MacPorts installation. The rest is mostly the same as shown in Step 4 of the installation instructions. Be grateful you don’t have to do this for all those Perl packages! Once this is done, you can safely delete the DBD-SQLite2-0.33 directory from your Desktop, along with the .tar.gz file it came in.

Installing POPFile

The latest versions of MacOS are picky about the ownership and permissions on the scripts that go into StartupItems. That is good, from a security perspective—startup scripts get run as root, so you don’t want them to be writable by others. Think of root as a big, dumb executioner: Anybody whose name winds up on his list gets summarily slaughtered. You wouldn’t want your name added to that list as a prank, right? So, you make sure access to that list is carefully regulated. MacOS enforces this, by refusing to run StartupItems that have the wrong permissions.

It appears, however, that the SystemStarter also checks the permissions along the directory path containing the startup script. Again, this is to the good, but it means you probably cannot install into ~/Library/POPfile as the instructions recommend, because ~ (your home directory) is owned by you rather than by root. So, instead, I recommend you unpack the POPFile source into /Library/POPfile, or move it there once unpacked. For example:

user% sudo -s
Password: ••••••••
root% mkdir /Library/POPfile
root% cd /Library/POPfile
root% unzip ~/Desktop/popfile-0.22.5.zip # or whatever version is current
root% exit
user%

Once you have done this, the rest of the testing instructions should work as advertised, save that you must write /Library/POPfile where they write ~/Library/POPfile.

Creating a POPfile Startup Item

Before you do this step, make sure POPFile works when you start it up manually according to the Test Your POPFile Installation section of the instructions. If that doesn’t work, then it almost certainly will not work at startup. Fortunately, running POPFile manually gives you a lot of feedback, so you should be able to figure out what went wrong.

Happily, the Startup Item instructions can be followed almost without change. You do need to make three small changes to the POPfile startup script:

  1. Replace the line reading PFPATH='/Users/yourname/Library/POPfile' with the line PFPATH='/Library/POPfile'.
  2. After the PFPATH line, insert a new line reading PERL=/opt/local/bin/perl.
  3. Find the two lines that read “perl popfile.pl > /dev/null 2>&1 &” and replace the word “perl” with the word $PERL.

Another change is in Step 7, in which you set permissions—here is the sequence of commands, with my changes:

user% sudo -s
Password: ••••••••
root% chown -R root:wheel /Library/StartupItems/POPfile # This is the main change
root% cd /Library/StartupItems/POPfile
root% chmod 0755 POPfile
root% chmod 0644 StartupParameters.plist

Otherwise, their instructions should work as advertised. Make sure you do not skip Step 7 of the Startup Item instructions. That is where you set the permissions, and as I mentioned above, the system is justifiably picky about that.
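If you would like to check ownership before rebooting, a short Python sketch along these lines can walk a path and report anything suspicious. The rules it applies (every component owned by UID 0, nothing world-writable) are my guess at what the system cares about, not a documented specification of the SystemStarter check, and the function name path_security_problems is made up.

```python
import os
import stat


def path_security_problems(path, owner_uid=0):
    """List ownership and permission problems for path and each directory above it.

    Assumed rules (a guess, not the documented check): every component must
    be owned by owner_uid and must not be writable by "others".
    """
    problems = []
    current = os.path.abspath(path)
    while True:
        info = os.stat(current)
        if info.st_uid != owner_uid:
            problems.append("%s: owned by UID %d, expected %d"
                            % (current, info.st_uid, owner_uid))
        if info.st_mode & stat.S_IWOTH:
            problems.append("%s: world-writable (mode %o)"
                            % (current, stat.S_IMODE(info.st_mode)))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return problems


if __name__ == "__main__":
    for line in path_security_problems("/Library/StartupItems/POPfile/POPfile"):
        print(line)
```

An empty result does not prove the Startup Item will run, but any line of output is worth fixing with chown or chmod before you reboot.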

Diagnosing Problems

Once you start up POPFile, if everything works as advertised, you should be able to run a quick little test, as shown here. I’m using Gmail as an example, but in principle any POP3 e-mail account should work, provided you know your login and password. Open a Terminal window, and try the following sequence of commands. You type the telnet command and the USER, PASS, STAT, and QUIT lines; everything else is printed by the computer.

user% telnet localhost 110
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
+OK POP3 POPFile (v0.22.5) server ready
USER pop.gmail.com:your.name@gmail.com:ssl
+OK send PASS
PASS your-secret-password
+OK Welcome.
STAT
+OK 62 1027704
QUIT
+OK Farewell.
Connection closed by foreign host.
user%

The proxy parses the text “pop.gmail.com:your.name@gmail.com:ssl” and establishes an SSL connection to the POP3 server at pop.gmail.com. It issues the POP3 USER command with your.name@gmail.com as the user name. Your password travels inside the encrypted SSL connection, so it does not appear in plain text on the network.2 If you are not able to complete this transaction, something is wrong, and you can check the output from POPFile in the window where you started it.
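The same check can be scripted with Python's standard poplib module. The sketch below assumes the server:account:ssl form of the USER argument shown in the transcript; split_proxy_user is only my guess at how such a string could be taken apart, not POPFile's actual parsing, and all three function names are made up.

```python
import poplib


def build_proxy_user(server, account, use_ssl=True):
    """Compose the USER argument in the 'server:account[:ssl]' form."""
    parts = [server, account]
    if use_ssl:
        parts.append("ssl")
    return ":".join(parts)


def split_proxy_user(text):
    """Split 'server:account[:ssl]' into (server, account, use_ssl).

    Illustrative guess at the parsing; assumes exactly two fields plus an
    optional trailing 'ssl'.
    """
    parts = text.split(":")
    use_ssl = len(parts) > 2 and parts[-1] == "ssl"
    if use_ssl:
        parts = parts[:-1]
    if len(parts) != 2:
        raise ValueError("expected server:account[:ssl], got %r" % text)
    return parts[0], parts[1], use_ssl


def stat_via_proxy(server, account, password,
                   proxy_host="localhost", proxy_port=110):
    """Log in through the local proxy; return (message_count, mailbox_size)."""
    conn = poplib.POP3(proxy_host, proxy_port, timeout=30)
    try:
        conn.user(build_proxy_user(server, account))
        conn.pass_(password)
        return conn.stat()
    finally:
        conn.quit()
```

Calling stat_via_proxy("pop.gmail.com", "your.name@gmail.com", "your-secret-password") should return the same two numbers that STAT printed in the telnet session. If the login fails, poplib raises poplib.error_proto carrying the server's -ERR response.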

Once you’ve created the Startup Item for POPFile, the best way to test it is to reboot your machine. If the above test worked when you started POPFile manually, it should also work after reboot, assuming the Startup Item was installed correctly. If, however, you get something like this:

user% telnet localhost 110
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
user%

… it means POPFile did not successfully start up at boot time. If permissions were the problem, you can sometimes find out by running Console.app (in /Applications/Utilities) and consulting the system.log, where you may find an entry like this:

SystemStarter[1234]: "/Library/StartupItems/POPfile" failed security check:
  not owned by UID 0

Otherwise, sad to say, there is not a whole lot of diagnostic feedback you can use. But usually, I think ownership and permissions are going to be the main problems.
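If you find yourself rebooting more than once, it may be handy to script the "is anything listening?" check rather than typing the telnet command each time. Here is a minimal Python sketch; the function name pop3_greeting is made up, and the defaults simply mirror the localhost and port 110 used above.

```python
import socket


def pop3_greeting(host="localhost", port=110, timeout=5.0):
    """Return the server's greeting line, or None if the connection fails.

    None covers both "connection refused" and a server that accepts the
    connection but sends nothing before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = conn.makefile("rb").readline()
    except OSError:
        return None
    return data.decode("ascii", "replace").strip()


if __name__ == "__main__":
    greeting = pop3_greeting()
    if greeting is None:
        print("Nothing answered on port 110; POPFile is probably not running.")
    else:
        print("Server says:", greeting)
```

A greeting that begins with +OK means the proxy came up at boot; None corresponds to the "Connection refused" case, and the next stop is system.log.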

As you can see, there are some complications in getting the system installed; however, once it does work, it seems to be a very nice piece of software. You can control the proxy via your web browser, by connecting to http://localhost:8080/, which is another good test that it is working properly.


1 POPFile is written in Perl. But, unlike many Perl programs, it is written in a clear and legible style. Kudos to the POPFile developers for that.

2 Not all POP3 servers speak SSL, and you should not send your password to such servers in the clear. POP3 services that do not support SSL usually support an APOP command that lets you prove your identity without directly revealing your password. It’s harder to test an APOP login manually, however, so I used one that speaks SSL.