I recently switched from SpamAssassin to DSpam. SpamAssassin works very well for me, but it is slow and consumes most of my processor. I like to be able to do other things while my mail is being fetched.
DSpam's website has some good documentation on running DSpam for a single user, but it is all old, and not terribly helpful for my setup. I'll explain my setup, and what I did to make it work.
Credit: Some things come from the DSpam page. Specifically, the mbox SpamAssassin conversion is from a post on DSpam for SA Users.
My Setup
I have two primary machines, my laptop and my desktop. Both run Debian Testing (etch). My home directories are synchronized with unison, and I only use one machine at a time. Generally I want my experience to be exactly the same on both machines. SpamAssassin stores its Bayesian db in your home directory, so my spam filters stay updated and working on either machine. Since I store my mail locally, I want to be sure both machines run the same version of DSpam trained on the same data, so they stay up to date with each other.
Setting up DSpam
We're going to want DSpam to store its db in your home directory. This should work in a multi-user environment as well, but I have not tested it. To configure:
./configure --sysconfdir=/etc --enable-homedir --with-dspam-group=dspam \
    --with-dspam-home-group=dspam
You'll need to have created the dspam group and put your user in it. After configuring, do the standard make and make install.
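Creating the group and checking the result might look like the sketch below. The groupadd and usermod commands are left as comments because they need root, and in_group is a hypothetical helper name of mine, not something DSpam provides.

```shell
#!/bin/sh
# One-time setup, run as root (left as comments because they need privileges):
#   groupadd dspam
#   usermod -a -G dspam username
# Log out and back in afterwards so your session picks up the new group.

# in_group USER GROUP: succeed if USER is a member of GROUP.
# (Hypothetical helper, not part of DSpam.)
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```

For example, in_group username dspam && echo "ready to build" only prints once the membership is in place.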
Now you need to get the database created. The easiest way is to run dspam_stats username, replacing username with your username. This should be done as your normal user. Now you should have a directory ~/.dspam.
Edit your /etc/dspam.conf to have the following lines:
Trust username
Preference "signatureLocation=headers"
Trust will give your user account access to do most DSpam things without being root. I set the DSpam signature location to be in my headers. This is only good if you plan to fix errors by piping the entire message (headers and all) to a local dspam process. If you plan to retrain in any other fashion, and your mail client doesn't handle preserving headers well, don't change the signature location.
Training DSpam from SpamAssassin Messages
When SpamAssassin scans a message it adds headers, and if it finds spam it might attach the original message to its warning. We don't want DSpam to consider SpamAssassin tokens as part of a spam/not-spam decision, so we need to clean up our messages. There are two methods, depending on whether you're using mbox or maildir.
mbox
You'll want to make a cleaned version of each mbox you want DSpam to consider for training purposes. The command to run for a spam box is:
formail -s spamassassin -d < spam.inbox > cleaned.spam.inbox
dspam_corpus --addspam username cleaned.spam.inbox
If the mbox is not a spam box then:
formail -s spamassassin -d < other.inbox > cleaned.other.inbox
dspam_corpus username cleaned.other.inbox
This will clean your mbox and then train DSpam correctly. dspam_stats -H username should report corpus training correctly.
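If you have several mboxes to convert, the two steps can be wrapped in a small function. This is only a sketch: clean_and_train and the DSPAM_USER variable are names I made up, and the formail and dspam_corpus invocations are the same ones shown above.

```shell
#!/bin/sh
# clean_and_train MBOX [spam]
# Strips SpamAssassin markup from every message in MBOX, writes the result
# to cleaned.<name> next to the original, then feeds it to dspam_corpus.
# DSPAM_USER is an assumed variable name; it defaults to the current login.
clean_and_train() {
    mbox=$1
    kind=${2:-innocent}
    out="$(dirname "$mbox")/cleaned.$(basename "$mbox")"
    # Stop here if the cleaning step fails, so we never train on a partial file.
    formail -s spamassassin -d < "$mbox" > "$out" || return 1
    if [ "$kind" = spam ]; then
        dspam_corpus --addspam "${DSPAM_USER:-$(id -un)}" "$out"
    else
        dspam_corpus "${DSPAM_USER:-$(id -un)}" "$out"
    fi
}
```

Call it as clean_and_train spam.inbox spam for a spam box, or clean_and_train other.inbox for anything else.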
maildir
Maildir is a bit trickier. You'll want to make a temporary set of maildirs for each box you plan to train. For example:
mkdir -p /tmp/spam/{cur,new}
mkdir -p /tmp/inbox/{cur,new}
Once you've done that, you'll get to convert over the messages. Use the following bash command to pull that off:
for i in ~/maildir/spam/new/*; do
    spamassassin -d < "$i" > "/tmp/spam/new/$(basename "$i")"
done
Change "~/maildir/spam/new/" and "/tmp/spam/new/" as needed. This command takes about a second per message, so I'd go watch TV or something.
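The loop above only covers new/, while messages you have already read normally sit in cur/. A sketch that handles both directories follows; clean_maildir and the SA_FILTER variable are hypothetical names, with SA_FILTER defaulting to the same spamassassin -d call.

```shell
#!/bin/sh
# clean_maildir SRC DST
# Copies every message from SRC/cur and SRC/new into the matching
# directory under DST, passing each one through a filter on the way.
# SA_FILTER is an assumed variable name; it defaults to "spamassassin -d".
clean_maildir() {
    src=$1
    dst=$2
    for sub in cur new; do
        mkdir -p "$dst/$sub"
        for msg in "$src/$sub"/*; do
            # Skip the unexpanded pattern when a directory is empty.
            [ -f "$msg" ] || continue
            # Left unquoted on purpose so "spamassassin -d" splits into
            # a command and its argument.
            ${SA_FILTER:-spamassassin -d} < "$msg" > "$dst/$sub/$(basename "$msg")"
        done
    done
}
```

For example, clean_maildir ~/maildir/spam /tmp/spam. Quoting "$msg" matters here because maildir filenames in cur/ contain characters such as ":" and ",".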
Now that you've converted your messages, you need to train DSpam on them. Once again we'll use a nice bash for loop:
for i in /tmp/spam/new/*; do
    cat "$i" | dspam --user username --class=spam \
        --source=corpus --mode=teft --feature=chained,noise
done
Or in the case of non-spam training:
for i in /tmp/inbox/new/*; do
    cat "$i" | dspam --user username --class=innocent \
        --source=corpus --mode=teft --feature=chained,noise
done
After this, dspam_stats -H username should show a good number of corpusfed messages.
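Since the spam and non-spam loops differ only in the class and the directory, they can be folded into one function. This is a sketch under my own naming: train_dir and DSPAM_USER are hypothetical, and the dspam options are copied from the loops above.

```shell
#!/bin/sh
# train_dir CLASS DIR
# Feeds every file in DIR to dspam as corpus training for CLASS
# ("spam" or "innocent") and reports how many messages were fed.
# DSPAM_USER is an assumed variable name; it defaults to the current login.
train_dir() {
    class=$1
    dir=$2
    count=0
    for msg in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -f "$msg" ] || continue
        dspam --user "${DSPAM_USER:-$(id -un)}" --class="$class" \
              --source=corpus --mode=teft --feature=chained,noise < "$msg"
        count=$((count + 1))
    done
    echo "$count messages fed to dspam as $class from $dir"
}
```

With the temporary maildirs from earlier, that would be train_dir spam /tmp/spam/new followed by train_dir innocent /tmp/inbox/new. The count is only the number of files fed in, so compare it against dspam_stats to confirm the training took.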
Procmail Configuration
Procmail configuration is simple and straightforward:
:0fw
| dspam --user spr --stdout --deliver=innocent,spam

:0
* ^X-DSPAM-Result: spam
$HOME/Mail/spam/
Adjust as needed (spr is my username).
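Before trusting the recipe with real mail, it can help to run a saved message through the same dspam command by hand and look at the header the second recipe matches on. The helper below is a sketch; dspam_result is a hypothetical name, and it only reads the header block of whatever message arrives on stdin.

```shell
#!/bin/sh
# dspam_result: read one message on stdin and print the value of its
# X-DSPAM-Result header. Only the header block is searched, so a body
# that merely quotes the header is ignored.
dspam_result() {
    awk '
        /^$/ { exit }                       # a blank line ends the headers
        tolower($0) ~ /^x-dspam-result:/ {
            sub(/^[^:]*:[ \t]*/, "")        # drop the header name
            print
            exit
        }
    '
}
```

Something like dspam --user spr --stdout --deliver=innocent,spam < saved-message | dspam_result then shows how DSpam classified that message, where saved-message stands for any message file you have on disk.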
Mutt Configuration
I added the following to my .muttrc so I could easily re-train when I've had errors:
# DSpam management
macro index "\cx" "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=spam --source=error\n<delete-message> <enter-command>set wait_key=yes\n" 'Spam Message'
macro pager "\cx" "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=spam --source=error\n<delete-message> <enter-command>set wait_key=yes\n" 'Spam Message'
macro index "\ca" "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=innocent --source=error\n <enter-command>set wait_key=yes\n<save-message>=" 'Non-Spam Message'
macro pager "\ca" "<enter-command>set wait_key=no\n<pipe-message>dspam --user spr --class=innocent --source=error\n <enter-command>set wait_key=yes\n<save-message>=" 'Non-Spam Message'
So now when I hit ctrl-x, a message is re-trained as spam and deleted. When I hit ctrl-a, a message is re-trained as innocent and then I'm prompted to save it to a mailbox.

