As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/

More than two years ago, I posted "Finding large emails in Gmail using Python IMAP with XOAuth", which was really not an easy method if you don't know how to run a Python script.

Now, Gmail finally supports new operators for such a task:

size:
  Search for messages larger than the specified size in bytes.
  Example: size:1000000
  Meaning: All messages larger than 1MB (1,000,000 bytes) in size.

larger:, smaller:
  Similar to size: but allows abbreviations for numbers.
  Example: larger:10M
  Meaning: All messages of at least 10M bytes (10,000,000 bytes) in size.

And this is my test:

It also supports date ranges, so there is no need to browse through pages for old emails anymore. It's fast; it only took more than two years to develop.
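To combine the size and date operators without retyping them, a query string can be assembled in a few lines. This is only a sketch written for Python 3: the helper name gmail_query and its parameters are made up for illustration, and the after:/before: operators with YYYY/MM/DD dates are assumed to be the date-range operators meant here. The result is meant to be pasted into the Gmail search box.

```python
from datetime import date
from typing import Optional


def gmail_query(larger: Optional[str] = None, smaller: Optional[str] = None,
                after: Optional[date] = None, before: Optional[date] = None) -> str:
    """Build a Gmail search-box query from optional size and date limits.

    gmail_query is a hypothetical helper, not part of any Google library.
    """
    parts = []
    if larger:
        parts.append('larger:%s' % larger)    # e.g. larger:10M
    if smaller:
        parts.append('smaller:%s' % smaller)  # e.g. smaller:20M
    if after:
        parts.append('after:%s' % after.strftime('%Y/%m/%d'))
    if before:
        parts.append('before:%s' % before.strftime('%Y/%m/%d'))
    return ' '.join(parts)


if __name__ == '__main__':
    # Messages of at least 10M bytes received before 2012
    print(gmail_query(larger='10M', before=date(2012, 1, 1)))
```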

6:45am, just woke up, showered, a cup of coffee in my hand wiping away the cute sleepy bugs in my head. Checking other stuff first, skimming over blog posts, updates, blah and blahs.

It was time to deal with the mail. I knew from my notification program that I had a few, but I hadn't peeked at the subjects.

Well, 3 spam messages out of 5 emails. That's quite a surprise, considering this is Gmail, which is said to have the most incredible algorithm, perhaps even A.I.?

Even more surprising, they were all marked as Important. WTF indeed. Cheers to the glorious, superb filter for showing me how important they were, so I could spot them right away as spam and send them into the spam folder. I get it, that's the whole Gmail spam prevention strategy: let users deal with the spam after marking it as important.

Is Gmail officially screwed by spam?

title: Where is dat important words in this phishing email, Gmail?

Gmail failed to filter out a phishing email again, big time:

Oh, c'mon, its body is empty, where is dat important words? It doesn't even have a subject line. Alright, it has one: cc. Really, Gmail? What, the attachment filename? You serious?

Here is a screenshot of that email. I'm glad that Google Docs provides viewing on the net, so I don't need to download it and worry about whether it contains a virus, though Gmail said it had scanned it. But even if it really has a virus, it would need to be specifically designed to attack Linux.

The ridiculous content is old, but the method, using an attachment, is a little bit new to me. Poor Coca Cola, a victim as well.

1   Archive

1.1   Email headers

Received: by with SMTP id ct5csp181759wib;
        Wed, 18 Apr 2012 00:01:23 -0700 (PDT)
Received: by with SMTP id h8mr1003624yhe.79.1334732483236;
        Wed, 18 Apr 2012 00:01:23 -0700 (PDT)
Received: from ( [])
        by with ESMTP id q25si22785285yhj.122.2012.;
        Wed, 18 Apr 2012 00:01:23 -0700 (PDT)
Received-SPF: neutral ( is neither permitted nor denied by best guess record for domain of client-ip=;
Authentication-Results:; spf=neutral ( is neither permitted nor denied by best guess record for domain of
X-Spam-Rating: None
X_CMAE_Category: 0,0 Undefined,Undefined
X-CNFS-Analysis: v=1.1 cv=+PD7zhiQh4wHAkX2ildB6Hz7oVUY6cTH2eYUHJ1YceI= c=1 sm=0 a=-4BUNljfCKEA:10 a=FKkrIqjQGGEA:10 a=AhRLOILGsKkA:10 a=gv4l6aEeuxxzeCLns_sA:9 a=K-QaQ4hbBhWg8AMYVz4A:7 a=QEXdDO2ut3YA:10 a=_W_S_7VecoQA:10 a=aIyur2oi7UP9Z7IZqwkA:9 a=IKIoO-ieCDEA:10 a=QLvOlBIuGJjmAZ5IHHaCwQ==:117
X-CM-Score: 0
X-Scanned-by: Cloudmark Authority Engine
Authentication-Results:; spf=neutral
Received-SPF: neutral ( is neither permitted nor denied by domain of
Received: from [] ([]
 by (envelope-from )
 (ecelerity r(29895/29896)) with ESMTP
 id 07/63-15061-0C66E8F4; Wed, 18 Apr 2012 03:01:20 -0400
Date: Wed, 18 Apr 2012 03:01:20 -0400 (EDT)
From: Roland Mkemoff
Message-ID: <>
In-Reply-To: <>
Subject: cc
MIME-Version: 1.0
Content-Type: multipart/mixed;
X-Originating-IP: []
X-Mailer: Zimbra 6.0.5_GA_2328.RHEL5_64 (ZimbraWebClient - SAF3 (Win)/6.0.15_GA_2995)

1.2   Text of attachment, award.docx

         This is to inform you that your email address has won prize money of (500,000.00) GBP for been an active web-email user. This Lottery promotion was organized by COCA COLA PLC.

A cheque of 500,000.00 GBP has been issued against your winning email and has been forward to Fair Ways Courier Company for delivery to your country of residence.
You are required to contact us with the details below to claim your winnings

1. Full name:
2. Contact Address:
3. Age:
4. Telephone Number
5. Sex:
6. Occupation:
7. State:
8. Country:
9. Nationality:


MR Dave Dawes

I received this LOL email:
Received: by with SMTP id ct5csp139659wib;
        Tue, 10 Apr 2012 01:26:35 -0700 (PDT)
Received: by with SMTP id k25mr4094241bku.72.1334046395436;
        Tue, 10 Apr 2012 01:26:35 -0700 (PDT)
Received: from ([])
        by with ESMTPS id zw9si11860501bkb.48.2012.
        (version=TLSv1/SSLv3 cipher=OTHER);
        Tue, 10 Apr 2012 01:26:35 -0700 (PDT)
Received-SPF: neutral ( is neither permitted nor denied by best guess record for domain of client-ip=;
Authentication-Results:; spf=neutral ( is neither permitted nor denied by best guess record for domain of
Received: from apache by with local (Exim 4.69)
 (envelope-from )
 id 1SHWJs-0002f7-V9
 for; Tue, 10 Apr 2012 12:20:56 +0400
Subject: Compliments
Date: Tue, 10 Apr 2012 12:20:56 +0400
From: Wanis Al-qaddafi 
X-Priority: 3
X-Mailer: PHPMailer ( [version ]
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="iso-8859-1"
Sender: Apache 

Dear Sir,
I am an Aid to late Muammar Gaddafi's Intelligence Chief, Abdullah Al-Senussi who is in detention 
after he was arrested at Nouakchott airport Mauritania. The Libyan prosecutor general has sent an 
extradition request to the Mauritanian government through Interpol for him to return home for fair trial 
in Libya. As regards to this, he has asked me to move some of his funds to an offshore account hence 
my contact with you. I want to solicit your attention to receive funds on his behalf considering your 
experience in implementing corporate solution and vast years of business intelligence and because 
my status would not permit me to do this alone putting into consideration the currents events in Libya. 
If you are interested to help us on this transaction, be aware that you will be well compensated. 
Let me know if you can help us so i can discuss among other things the security and procedures to 
move the fund to you as soon as possible.

Awaiting your reply

Wanis Al-Quaddafi
Notice the From field? "," that is such a joke. It was actually sent from Russia and it didn't get into the Spam folder in Gmail. I have received a few spam emails this week that didn't get put into the Spam folder.

It said "late", and the Libyan prosecutor general wants to prosecute a dead body? Well, that guy is still alive and this is last month's news. Proofread your phishing email, idiot! Don't just copy your old template.

I can't find Wanis Al-Quaddafi, but I do find Wanis al-Qaddafi, who has been dead for nearly three decades. You really need to make up a better fake name.

Did anyone even fall for such a brainlessly written email? I don't think so.

In the last few days, I got three Disqus comments and Disqus did email me about them. I use email notification to know if I have new comments.

Two of them were marked as spam by Gmail.

But both times, I had to see the comment on the post's page. The first time, I was editing that post. The second time, moments ago, I scrolled down the home page to see how many posts I had published this month using the Archive dropdown list. I saw that the post at the bottom had one comment, and that's how I learned of that comment.

In the first case, the comment has six links, five to YouTube and one to Vimeo. It's a real comment, not spam at all. The second comment has no links, only a simple question: "What is the table id?"

It's a real one, too. As I said before, spam detection isn't the solution; it's not fighting but avoiding the truth, which is that we have lots of spam bombing us. It's like someone who hates cockroaches or mice but doesn't kill them; he catches them and moves them out of his house. But they keep coming back and breeding more and more. All this guy's energy is spent moving them out. Silly.

These two incidents were not the first. I had rescued a few from the spam folder a few times before, and they were the lucky ones. I currently have 450 over the last 30-day period, so that's 15 spam emails a day. I guess I will clean and check my spam folder every day from now on.

I now subscribe to the Disqus comments feed of this blog. Just in case.

May I call myself a Spam Detection Victim if someday someone really does want to give me one million dollars and Gmail puts it into the spam folder?


Gmail supports new operators for size range searching, see my blog post about them. (2012-11-14)

After I posted about using Google's Python XOAuth library to get the unread mail count and list, I finally found a good reason to use IMAP: you can search based on the message size, which you can't do in the web interface.

typ, data = imap_conn.search(None, '(SMALLER %d) (LARGER %d)' % (MAXSIZE * 1000, MINSIZE * 1000))

That is just great but not awesome, because Gmail's IMAP server does not support the SORT command, which is an IMAP4rev1 extension command according to the Python documentation.
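Since SORT is not available, the ordering has to happen on the client. The following is a rough sketch for Python 3 rather than anything from the original script: the three helper names are invented for illustration, and the shape of the FETCH response lines (for example b'1 (RFC822.SIZE 2000)') is an assumption about what a server typically returns for a '(RFC822.SIZE)' fetch. It has not been run against Gmail.

```python
import re

# Assumed shape of one FETCH response line: b'1 (RFC822.SIZE 2000)'
_SIZE_RE = re.compile(rb'(\d+) \(.*?RFC822\.SIZE (\d+)')


def size_search_criteria(min_kb, max_kb):
    """Return an IMAP SEARCH criterion for messages between min_kb and max_kb.

    SMALLER and LARGER take sizes in bytes; 1 KB is treated as 1000 bytes,
    the same convention the post uses.
    """
    if min_kb < 0 or max_kb <= min_kb:
        raise ValueError('Wrong size range!')
    return '(SMALLER %d) (LARGER %d)' % (max_kb * 1000, min_kb * 1000)


def parse_sizes(fetch_data):
    """Map message sequence numbers to sizes from a '(RFC822.SIZE)' fetch."""
    sizes = {}
    for item in fetch_data:
        if isinstance(item, bytes):
            match = _SIZE_RE.match(item)
            if match:
                sizes[int(match.group(1))] = int(match.group(2))
    return sizes


def largest_first(sizes):
    """Sort (message number, size) pairs by size, biggest first."""
    return sorted(sizes.items(), key=lambda pair: pair[1], reverse=True)


# With a connected and authenticated imaplib.IMAP4_SSL object, roughly:
#   typ, data = imap_conn.search(None, size_search_criteria(1000, 5000))
#   ids = b','.join(data[0].split()).decode()
#   typ, data = imap_conn.fetch(ids, '(RFC822.SIZE)')
#   ranked = largest_first(parse_sizes(data))
```

Sorting locally costs one extra FETCH of the sizes, but it does not depend on any server extension.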

The entire source code is similar to the one in my previous post:

#!/usr/bin/env python
# Copyright 2010 Yu-Jie Lin
# BSD license

import email
import email.header
import imaplib
import sys

import xoauth

scope = ''
consumer = xoauth.OAuthEntity('anonymous', 'anonymous')
imap_hostname = ''

# How many messages will be fetched for listing?
MAX_FETCH = 20

try:
  import config
except ImportError:
  # No saved config yet, start with an empty one
  class Config():
    pass
  config = Config()

def get_access_token():

  request_token = xoauth.GenerateRequestToken(
      consumer, scope, nonce=None, timestamp=None,
      google_accounts_url_generator=config.google_accounts_url_generator)

  oauth_verifier = raw_input('Enter verification code: ').strip()
  try:
    access_token = xoauth.GetAccessToken(
        consumer, request_token, oauth_verifier, config.google_accounts_url_generator)
  except ValueError:
    # Could indicate failure of authentication because verifier is incorrect
    print 'Incorrect verification code?'
    sys.exit(1)
  return access_token

def main():

  # Checking user email and access token
  if not hasattr(config, 'user') or not hasattr(config, 'access_token'):
    config.user = raw_input('Please enter your email address: ')
    config.google_accounts_url_generator = xoauth.GoogleAccountsUrlGenerator(config.user)
    access_token = get_access_token()
    config.access_token = {'key': access_token.key, 'secret': access_token.secret}
    # XXX save token, this is not a good way, I'm too lazy to use something
    # like shelve.
    f = open('', 'w')
    f.write('user = %s\n' % repr(config.user))
    f.write('access_token = %s\n' % repr(config.access_token))
    print '\n\ written.\n\n'

  config.google_accounts_url_generator = xoauth.GoogleAccountsUrlGenerator(config.user)
  access_token = xoauth.OAuthEntity(config.access_token['key'], config.access_token['secret'])

  # Generate xoauth string
  class ImBad():
    # I'm bad because I'm going to shut xoauth's mouth up. So you won't see these debug messages:
    # signature base string:
    # GET&
    # xoauth string (before base64-encoding):
    # GET oauth_co...
    def write(self, msg): pass
  sys.stdout = ImBad()
  xoauth_string = xoauth.GenerateXOauthString(
      consumer, access_token, config.user, 'IMAP',
      xoauth_requestor_id=None, nonce=None, timestamp=None)
  sys.stdout = sys.__stdout__

  MINSIZE = int(raw_input('Larger than in KB [1000]? ') or 1000)
  MAXSIZE = int(raw_input('Smaller than in KB [5000]? ') or 5000)
  if MINSIZE > MAXSIZE:
    print >> sys.stderr, 'Wrong size range!'
    sys.exit(1)
  imap_conn = imaplib.IMAP4_SSL(imap_hostname)
  imap_conn.authenticate('XOAUTH', lambda x: xoauth_string)
  imap_conn.select('[Gmail]/All Mail', readonly=True)
  typ, data = imap_conn.search(None, '(SMALLER %d) (LARGER %d)' % (MAXSIZE * 1000, MINSIZE * 1000))
  # No SORT command on Gmail IMAP server
  #typ, data = imap_conn.sort('(REVERSE SIZE)', 'UTF-8', '(LARGER %d)' % SIZE)
  unreads = data[0].split()
  print '%d messages are between %d and %d KB.' % (len(unreads), MINSIZE, MAXSIZE)
  ids = ','.join(unreads[:MAX_FETCH])
  if ids:
    print 'Listing %d messages:' % min(len(unreads), MAX_FETCH)
    typ, data = imap_conn.fetch(ids, '(RFC822.HEADER)')
    for item in data:
      if isinstance(item, tuple):
        raw_msg = item[1]
        msg = email.message_from_string(raw_msg)
        # Some email's header are encoded, for example: '=?UTF-8?B?...'
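        # The arguments of this print were lost; sender and subject are assumed here
        print '\033[1;35m%s\033[0m: \033[1;32m%s\033[0m' % (
            email.header.decode_header(msg['from'])[0][0],
            email.header.decode_header(msg['subject'])[0][0])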

if __name__ == '__main__':
  main()

The output would look like:

% python2.5 ./
Larger than in KB [1000]?
Smaller than in KB [5000]?

23 messages are between 1000 and 5000 KB.

Listing 20 messages:
[messages here]

The search would take quite a lot of time to complete, up to minutes. So, please be patient.

I wanted to find those big emails because I couldn't figure out why 9,085 emails could take up 543 MB in my Gmail. I found the biggest mail: 15,189 KB, 2.80% of the used space. The second and third take 9,366 and 7,659 KB, together 3.14%.
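The percentages can be rechecked with a few lines of Python 3. The only assumption is that 1 MB is counted as 1000 KB, which is what makes the numbers above come out; the variable and function names are just for illustration.

```python
USED_MB = 543                    # space used in Gmail, from the post
SIZES_KB = [15189, 9366, 7659]   # the three biggest mails, from the post

# Assumption: 1 MB = 1000 KB
used_kb = USED_MB * 1000


def share(kb):
    """Percentage of the used space taken by kb kilobytes."""
    return 100.0 * kb / used_kb


biggest = '%.2f%%' % share(SIZES_KB[0])
next_two = '%.2f%%' % share(SIZES_KB[1] + SIZES_KB[2])

print(biggest)    # 2.80%
print(next_two)   # 3.14%
```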

I set up SSMTP with Gmail because I wanted to get mails for cron results.

I have these in /etc/ssmtp/ssmtp.conf:


Make sure this is only root-and-ssmtp-readable.

And /etc/ssmtp/revaliases:


You can test with

echo mailbody | mail -v -s "mail subject"
echo mailbody | sendmail -v

I don't have this mail command on my Gentoo, but it seems popular on every page I have read.

If you want to send a more complete test email via sendmail, you can use:

echo -e "Subject: mail subject\nTo:\n\nmailbody" | sendmail -v
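The same kind of test message can also be built and piped from Python 3, which is handy inside a cron job that is already a Python script. This is a sketch under assumptions: the helper names are hypothetical, user@example.com is a placeholder recipient, and it assumes the sendmail command on the PATH is the one provided by SSMTP and accepts -t to read recipients from the headers.

```python
import subprocess
from email.message import EmailMessage


def build_test_message(to_addr, subject='mail subject', body='mailbody'):
    """Build a minimal message with To and Subject headers and a text body."""
    msg = EmailMessage()
    msg['To'] = to_addr
    msg['Subject'] = subject
    msg.set_content(body)
    return msg


def send_via_sendmail(msg, sendmail='sendmail'):
    """Pipe the message to sendmail; -t takes the recipients from the headers."""
    subprocess.run([sendmail, '-t', '-v'], input=msg.as_bytes(), check=True)


# Example (placeholder address, replace with a real recipient):
#   send_via_sendmail(build_test_message('user@example.com'))
```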