Mail Enhancement

After reading Kiri’s analysis of her email, I was curious about my email. But first…

Dr. T sent me mail earlier today because I have hit the Big Time. Yes, my obscure, PG-13 blog is deliberately blocked by a Fortune 500 company:

Access Denied (policy_denied)
Your system policy has denied access to the requested URL.
The above Website is blocked. For assistance please Call XXX 853 5555-Option 1 and then 5 – Network Support Team

The best working theory I have is I’ve insulted the product manager by making unflattering observations about marketing partnerships, 5-gallon buckets of paint, or dryers.
But which?

Since I’m already banned, I might as well get on with the topic today. For the last week, I’ve been analyzing my personal mailbox to determine:

  • How much mail do I get?
  • How effective is my spam filtering?
  • How was the non-spam handled?
  • What general characteristics of spam were observed?

  • How much mail do I get?
    There were 518 conversations, 266 (51%) that were not spam and 252 (49%) spam.
    Gmail records things by “conversations.” Thus if this transpired:

    Will you pick me up a flux capacitor on your way to work?

    We might go back and forth a few times:

    How large? How many teslas?

    Standard size. At least ten. Twenty if they’re not too expensive.

    Is it my turn to buy milk this week?


    It counts as only one “conversation.”

  • How effective is my spam filtering?
    Surprisingly good. Gmail missed three obvious spams and had one “false positive,” from a poorly crafted “Word of the Day.” Its false positive rate is 0.2% (one of the 518 conversations). It might be interesting to compare this with my Yahoo account. I’ve had it for more than 13 years, using it only sporadically but at “maximum spam deletion.”
  • How was the non-spam handled?
    I have used Kiri’s categorizations as a template:

    Of the legitimate mail, I was surprised that 31% was deleted without ever being read. These were almost always digests (Seattle Randonneur, Movable Type’s Pro-Net), columns (WSJ’s Jeremy Wagstaff) or news headline services.

    I’ve had this disclaimer around for at least a year to set expectations, but it was still surprising to see how low my response rate actually is. I try to be prompt with friends, but if it requires a thoughtful response, it could take a while for me to carve out a block of time. (Again: don’t be offended if I’m slow in responding.)
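
As a quick check of the percentages above, here is a minimal Python sketch that uses only the counts quoted in this post. It assumes the three missed spams are included in the 252 spam conversations, which the post does not actually say, and the variable names are made up for illustration. The quoted 0.2% corresponds to one false positive over all 518 conversations; measured against only the 266 legitimate conversations it would be closer to 0.4%.

```python
# Counts quoted in the post.
total = 518            # conversations for the week
legit = 266            # not spam
spam = 252             # spam (assumed to include the three Gmail missed)
false_positives = 1    # legitimate mail flagged as spam
missed_spam = 3        # spam that reached the inbox

def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

share_legit = pct(legit, total)               # share of conversations that were legitimate
share_spam = pct(spam, total)                 # share of conversations that were spam
fp_over_all = pct(false_positives, total)     # the figure quoted in the post
fp_over_legit = pct(false_positives, legit)   # stricter definition of false positive rate
miss_over_spam = pct(missed_spam, spam)       # share of spam that slipped through

print(share_legit, share_spam, fp_over_all, fp_over_legit, miss_over_spam)
```

Either way of counting, the filter comes out looking good for a week of mail.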

For the spam category, I looked at several things:

  • Recipient
    Most (59%) of my spam was sent using the Bcc method, that is, no recipient is overtly displayed. Slightly over 30% was sent to a list of recipients. I often see patterns where several clusters of recipients will be on the line, almost always to the same host.
  • English versus bad English versus UTF-8

    To thwart Bayesian filtering, spammers resort to what’s known as “Bayesian poisoning,” creating gibberish emails to hide their true purpose, embedding the product pitch in a small attachment. 97 of the messages contained an image attachment (87 .gif, 6 .png, 4 .jpg). The three or four I looked at were photos of various pharmaceutical products and a web page link offering “discrete” packaging.

    The gibberish text is combed from a variety of sources, sometimes even passages from legitimate news articles. At times, it has a strange, nearly-artistic beauty to it like Car Henge. The best will make its way into Spam Poetry competitions.

    Approximately ten percent of the spams were written in a UTF-8, Asian-looking character set. The only non-scripty thing was the web site link.

  • Spam Topics

    I glanced through the spams, making a quick determination of their type. This was tedious! Because I’m not current on my brand name mega-pharma products, I ultimately lumped all of the herbal elixirs, brand-name, discount pharmaceuticals and other means that purportedly would enhance my boy-parts into the same category. The sites all have the same general pitch, though I did find one sliver of honesty:

    When should you stop taking [These Stupid] Pills?
    Whenever you feel comfortable with the way you look just stop taking our product.

    In distant second place were “lonely women” with webcams seeking “enhancement” in the recurring subscription revenue sense of the word. The remaining spams were a smattering of “discount” software, refinancing/phishing schemes, and the old favorite, the 419 scheme.
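
For anyone wanting to filter on those Asian-looking character sets, a rough Python sketch follows. It is not how Gmail or any real filter works; the function names, the list of script prefixes, and the 0.5 threshold are all assumptions made up for illustration. It drops web links first, since the link was the only non-scripty part of those messages, then measures what share of the remaining letters are CJK, Hangul, or kana.

```python
import re
import unicodedata

# Unicode character-name prefixes treated as "Asian-looking" (an assumption).
SCRIPT_PREFIXES = ("CJK", "HANGUL", "HIRAGANA", "KATAKANA")

def script_ratio(text):
    """Return the share of letters in text whose Unicode name starts with one of the prefixes."""
    text = re.sub(r"https?://\S+", " ", text)  # ignore web links
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(SCRIPT_PREFIXES)
    )
    return hits / len(letters)

def looks_asian_script(text, threshold=0.5):
    """Flag text when at least `threshold` of its letters are in those scripts."""
    return script_ratio(text) >= threshold
```

Run over a column of subject lines or content previews, something like `looks_asian_script(subject)` would give a first-pass count to compare against the eyeballed ten percent. A threshold that low will also flag legitimate mail in those languages, so it only suits a mailbox that never expects any.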


  1. I’d think it’d take a long time to rummage through both my earthlink and yahoo accounts to determine spam counts. How long did this analysis take you?

    I think I probably get more “Asian” spam than you, because I have posted to Asian-related fora in my past. I have set up several filters to screen for Asian languages. For example, if a message has a capital C with a hook under it (I forgot what this is called), it’s probably in Korean and it’s really not personally for me.

  2. I love the title of this post… clever. 🙂

    Your graphs are prettier than mine. Either your Excel foo exceeds mine (very likely) or you’re using a nicer tool (also likely).

    It’s interesting to compare our results! I guess with gmail, a “file” category isn’t relevant?

    I’m impressed that you actually analyzed the spam. I didn’t have the stomach (or time) for it. 🙂

  3. Claire: It took about an hour. I pasted the email headers and content previews into a delimited file so I could slice and dice them. I should investigate filtering on those character sets.

    Kiri: Although gmail tries to encourage retention of email, I regularly delete and purge because it will otherwise clutter up search.

    As for the spamalysis, it was out of morbid curiosity. Yes, I felt I needed to wash my hands afterwards.

  4. Would you mind doing this analysis at my company? 🙂 I’m so happy I don’t have to actually go through corporate spam, one-by-one. We let the users take care of whatever they get, after Abaca has had its way with it.

  5. I find since I’ve switched from Yahoo to gmail that my sp@m has 1) gone to nearly nil, and 2) is up on the one gmail account that I use for business related things where – shockingly – I’m actually subscribed to several newsletters. I still think the best sp@m e-mail I ever got had the subject line “You will be happy.” I never opened it because I wanted the subject to be true 😉
