As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/


The site is dead and some links have been removed. (2015-12-14T06:43:25Z)

Yesterday, I was looking into an old thing I created using Google App Engine, which evolved from a Bash script back in December 2009. Here is a screenshot:

Yes, I shamelessly Google/Yahoo/Bing my username and yes, I unblushingly made a record for the number of search results with a chart. (Cough, two charts and I have this.)

When I opened that page, I noticed the records stopped on October 3, 2011. At first, I thought it might be the result fetching limit in my program, so I went to update the code, but I realized that was not the cause.

I knew something had gone wrong when I saw a huge number of unsuccessful tasks in the task queue. Retry counts were in the thousands. I looked into the logs and found out the problem was with the Yahoo Search BOSS API: the domain of the v1 API is gone.

So, I googled and found this announcement for the v2 API. v1 was scheduled to be shut down on July 20, 2011, but it lasted until August 27. Then on September 20 it was back and lasted until October 3, and it has been gone for good since then.

My program requires all three API calls to return successfully before it writes data into the datastore. I should have checked each task's retry count and, if it was too high, dropped the task and sent myself an email. But I didn't code it that way, because until yesterday I didn't think there was any chance of having thousands of retries. I lost about 6 months of data from Google and Bing. Will I add the email notification for something like this? Nah.
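For reference, the retry check described above could look roughly like this. It is only a sketch: should_give_up and MAX_RETRIES are hypothetical names, and it assumes the task handler can read the request headers as a plain mapping. App Engine push tasks carry their retry count in the X-AppEngine-TaskRetryCount header.

```python
# Hypothetical limit; pick whatever suits the task.
MAX_RETRIES = 10


def should_give_up(headers, max_retries=MAX_RETRIES):
    """Return True when a task has been retried more than max_retries times.

    headers is any mapping of request header names to values. A plain dict
    is case-sensitive, so the header name must match exactly here.
    """
    try:
        retries = int(headers.get('X-AppEngine-TaskRetryCount', 0))
    except (TypeError, ValueError):
        # Missing or malformed header: treat it as a first attempt
        retries = 0
    return retries > max_retries
```

A handler that decides to give up would then return normally (a 2xx response) so the queue stops retrying the task, and send the email at that point.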

Yahoo Search BOSS v1 is not the only one gone; the Google Web Search API will soon be shut down too, around 2013. It was declared deprecated before Yahoo's, but it has a longer transition time for developers, which is 3 years.

I think they are both moving to paid versions of the API, v2 and Custom Search. I don't use the APIs for making money, so when they are gone, my program stops updating.

I need to find an IP whose geographic location is already known from another source. Normally, it's the other way around, but this situation is different. I will explain in a later post. (Updated at 2012-03-04T01:00:14Z: I used this to find an IP from the known geographical location of a malicious ad clicker.)

First, download the access log from GAE. It is simple; this is the command I ran:

python google_appengine/ --num_days=3 request_logs project/ access.txt

--num_days specifies 3 days of logs; according to the documentation, the default is the logs of the current calendar date. The options of request_logs can be found further down that page.

Next is to find the geographic location by IP address. You need to install the MaxMind Python binding with the C core on your system. Here is the core snippet I use to generate results:

import GeoIP

#gi =
gi = GeoIP.open("/usr/share/GeoIP/GeoLiteCity.dat", GeoIP.GEOIP_STANDARD)
gi6 = GeoIP.open("GeoLiteCityv6.dat", GeoIP.GEOIP_STANDARD)

with open('ips.txt') as f:
  for IP in f:
    IP = IP.rstrip('\n')
    # IPv6 addresses contain colons and need the IPv6 database
    if ':' in IP:
      gir = gi6.record_by_addr_v6(IP)
    else:
      gir = gi.record_by_addr(IP)
    if gir is None:
      # No record for this address
      continue
    print('%s: %s, %s, %s' % (IP, gir['city'], gir['region_name'], gir['country_name']))

GAE accepts both IPv4 and IPv6 connections, so you may also want to look up IPv6 addresses; otherwise you will need to filter them out and drop them. You can download a free city database for IPv6. Note that the Python binding may need to be version 1.2.7 for IPv6; I know 1.2.4 does not support IPv6.

Before you run this snippet, you need to process access.txt to have unique IPs:

cut -f 1 -d ' ' access.txt| sort | uniq > ips.txt

On Linux, it's as simple as that. You could also do this in Python; I think it only needs one line.
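A rough Python equivalent of that pipeline, as a sketch only: unique_ips is a hypothetical helper, and it assumes the client IP is the first space-separated field of each log line, which is what the cut command above relies on.

```python
def unique_ips(lines):
    """Return the sorted unique first fields (client IPs) of log lines."""
    fields = (line.rstrip('\n').split(' ', 1)[0] for line in lines)
    # Skip empty lines, then de-duplicate and sort like `sort | uniq`
    return sorted(set(field for field in fields if field))


# Possible usage, writing the same ips.txt as the shell pipeline:
# with open('access.txt') as log, open('ips.txt', 'w') as out:
#     out.write('\n'.join(unique_ips(log)) + '\n')
```

The ordering may differ slightly from sort on systems where the locale affects collation, which does not matter for the lookup that follows.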

So, I ran: | grep '<CITY NAME>'

I got no results. I am sure that if the requests were made, they must lie within those 3 days. Something is really fishy.

Although I didn't find that IP, I wrote this post.

I need a counter which allows a few increments within a second, with the number stored in the datastore. However, it doesn't have to be very accurate. Errors can come from eviction on memcache, or from the cron job failing so that the number in memcache could not be added to the datastore in time.

I came up with a ping-pong mechanism. The number is increased by memcache.incr on one of two slots: the ping slot and the pong slot. While increments go to one slot, a cron job adds the number from the other slot to the datastore, and empties that slot if it successfully adds the number.

The time interval of the slot change is 10 minutes, so part of the number could go missing if memcache suddenly stops working. Every ten minutes, a cron job kicks in. So, normally, if memcache doesn't work, the lost number would come from somewhere between 0 seconds and 10 minutes ago, if it's just a short glitch in Google App Engine. If memcache is out of order longer than that, Google App Engine is probably down, so it doesn't matter.

When the datastore is down or under scheduled maintenance, as long as memcache still works, the number keeps counting and won't be lost as long as it isn't evicted from memcache. Once the datastore is back with write capability, the cron job can add the number to the datastore.

There is no race condition when adding the number to the datastore, because of the ping-pong slots and the cron job. I believe there is also no race condition with memcache.incr, if it's processed on the memcache server.
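To make the mechanism concrete, here is a small self-contained simulation. It is not the project's code: FakeMemcache is an in-memory stand-in for memcache, the totals dict stands in for the datastore, and all the names are made up for illustration. The real memcache.incr also behaves differently on missing keys, so treat this as a sketch of the slot logic only.

```python
SLOT_SECONDS = 600  # slots switch every 10 minutes, as described above
SLOTS = ('ping', 'pong')


class FakeMemcache(object):
    """In-memory stand-in for memcache; only what the sketch needs."""

    def __init__(self):
        self.data = {}

    def incr(self, key, delta=1):
        self.data[key] = self.data.get(key, 0) + delta
        return self.data[key]

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)


def active_slot(now):
    """Name of the slot that receives increments at time now (seconds)."""
    return SLOTS[int(now // SLOT_SECONDS) % 2]


def idle_slot(now):
    """Name of the slot the cron job may flush at time now (seconds)."""
    return SLOTS[(int(now // SLOT_SECONDS) + 1) % 2]


def count(cache, now):
    """Record one hit in the currently active slot."""
    cache.incr('counter_' + active_slot(now))


def flush(cache, totals, now):
    """Cron job: move the idle slot's number into the persistent total."""
    key = 'counter_' + idle_slot(now)
    pending = cache.get(key)
    if pending:
        # Stands in for the datastore write
        totals['counter'] = totals.get('counter', 0) + pending
        # The slot is emptied only after the write has succeeded
        cache.delete(key)
```

In a real handler, now would be time.time(), so increments and the cron job agree on which slot is active without any coordination.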

Here is my code; it's from the source of my project. It's unmodified and not generalized for you to use out of the box. Read it and modify it before you use it.

1   Counting

Every time Google App Engine releases a new version of the SDK, I read the release notes. I was aware of

Results of datastore count() queries and offsets for all datastore queries are no longer capped at 1000,

when version 1.3.6 was released. I wrote some quick code to test it, something like the following:

count = TestModel.all().count()

data1 = TestModel.all().fetch(1100, offset=5)
data2 = TestModel.all().fetch(1, offset=1100)

Using 1.3.8, the first part still got me 1000 at most. With the second part, I could fetch more than 1000 entities, and the offset could be bigger than 1000 without problems.

I finally saw that someone had asked, and the correct way is:

count = TestModel.all().count(limit=None)

Now, I am reading the documentation again:

count() has no maximum limit. If you don't specify a limit, the datastore continues counting until it finishes counting or times out.

This doesn't sound like the behavior on the development server. I didn't try it on the production server.

(I used to use this way to count.)

2   Randomly fetching

Before 1.3.6, there were some ways to get a random entity from the datastore. All suck and are awkward. Now offset can be supplied with a number bigger than 1000. Does that solve it? Read this line first:

The query has performance characteristics that correspond linearly with the offset amount plus the limit.

Well, I don't care at this moment since none of my data sets are too large.
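With large offsets allowed, picking a random entity comes down to drawing a random offset. A minimal sketch, assuming the total count is already known (for example from count(limit=None)); random_offset is a hypothetical helper and the fetch call appears only as a comment:

```python
import random


def random_offset(total, rng=random):
    """Pick a uniformly random offset in [0, total)."""
    if total <= 0:
        raise ValueError('no entities to choose from')
    return rng.randrange(total)


# Possible usage with the model from the example above:
# total = TestModel.all().count(limit=None)
# entity = TestModel.all().fetch(1, offset=random_offset(total))[0]
```

Per the quoted line, the cost of such a fetch grows with the offset, so this only stays cheap while the data set is small.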

I think Google is Einstein and Google App Engine is Quantum Theory. ;)


The project is dead and some links have been removed. (2015-12-08T01:15:53Z)

It's code I wrote more than two years ago, my first Google App Engine project. I stopped maintaining it after a few months. Now, I have revived it and given it a new face, though the code is still !#$%^.

It's kind of a game. You get one word, then you enter another to create a link. You get another new word, then you just keep doing the same thing: whatever word pops up in your mind after you read the word you are going to link with, you type. I don't know what this app could achieve; it's silly, without a solid purpose. Just for fun, I guess.

Go to play next -> word.

This article, Updating Your Model's Schema, is already great and clear, but it does not have a complete code example. I decided to make one and write down some explanations, just in case I might need it later.

Removing a property from a data model takes four steps:
  1. Inherit from db.Expando if the model does not already inherit from it.
  2. Remove the obsolete property from the model definition.
  3. Delete the attribute (the property) of each entity with del entity.obsolete.
  4. Inherit from db.Model again if the model originally inherited from it.

How to actually do it:

Assume a model looks like:
class MyModel(db.Model):
  foo = db.TextProperty()
  obsolete = db.TextProperty()

Re-define the model as:
class MyModel(db.Expando):
#class MyModel(db.Model):
  foo = db.TextProperty()
#  obsolete = db.TextProperty()

Make sure the model inherits from db.Expando, and comment out (or just delete) the line of the obsolete property.

Here is the example code to delete the attribute, the property:

from google.appengine.ext import db
from google.appengine.runtime import DeadlineExceededError

def del_obsolete(self):

  count = 0
  last_key = ''
  try:
    q = MyModel.all()
    cont = self.request.get('continue')
    if cont:
      # Resume from the key where the previous request stopped
      q.filter('__key__ >=', db.Key(cont))
    entities = q.fetch(100)
    while entities:
      for entity in entities:
        last_key = str(entity.key())
        try:
          del entity.obsolete
          # Save the entity without the obsolete property
          entity.put()
        except AttributeError:
          # This entity has no obsolete property
          pass
        count += 1
      q.filter('__key__ >', entities[-1].key())
      entities = q.fetch(100)
  except DeadlineExceededError:
    self.response.out.write('%d processed, please continue to %s?continue=%s' % (count, self.request.path_url, last_key))
    return
  self.response.out.write('%d processed, all done.' % count)

Note that this snippet is meant to be used as a webapp.RequestHandler's get method, which is why it has self.request and self.response.

It uses the entities' keys to walk through every entity, which is efficient and safe. You may also want to put your application under maintenance to prevent other code from adding new entities. Even though key values seem to only increase for new entities, there is no need to waste CPU time on them, since new entities have no obsolete property.

Because it has to go through all entities, it takes a lot of time to process, so a mechanism to continue the process on the rest of the entities is necessary. The code catches google.appengine.runtime.DeadlineExceededError if it cannot finish in one request, and then returns a link which allows you to continue if you follow it. If you have lots of entities, you may want to use tasks instead of manual continuation. You may also want to set a maximum number of entities to process in one request, such as 1000.

Once it has done its job, change the model definition back to db.Model and remove the obsolete property line:
class MyModel(db.Model):
  foo = db.TextProperty()

That's it.

I need to count how many entities of kind Blog have the boolean property accepted set to True, but I suddenly realized that OFFSET in a query is of no use to me (in fact, it is not really useful at all).

In SDK 1.1.0, OFFSET does what you would expect on the Development Server if you are new to GAE and have experience with SQL, but it still behaves differently on the Production Server.

Basically, say you have 1002 entities in Blog and you want to get the 1002nd entity. The following will not get you that entity:
q = Blog.all()
# Doing filter here
# Order here
# Then fetch
r = q.fetch(1, 0)[0] # 1st
r = q.fetch(1, 1)[0] # 2nd
r = q.fetch(1, 999)[0] # 1000th
r = q.fetch(1, 1000)[0] # 1001st
r = q.fetch(1, 1001)[0] # 1002nd

You will get an exception on the last one like:
BadRequestError: Offset may not be above 1000.
BadRequestError: Too big query offset.
The first one is on the Production Server, the second is on the Development Server.

The OFFSET takes effect after:
  1. filter data (WHERE clause)
  2. sort data (ORDER clause)
  3. truncate to first 1001 entities (even though count() only returns 1000 at most)
After filtering, sorting, and truncating to the first 1001 entities, you can have your OFFSET. If you have read Updating Your Model's Schema, it warns you:
A word of caution: when writing a query that retrieves entities in batches, avoid OFFSET (which doesn't work for large sets of data) and instead limit the amount of data returned by using a WHERE condition.
The only way is to filter the data (WHERE clause), and you will need a unique property if you need to walk through all entities.

An amazing thing is that you don't need to create a new property: there is already one in all of your Kinds, the Key, which is __key__ in a query.

The benefits of using it:
  • No additional property,
  • No additional index (Because it's already created by default), and
  • Combination of two above, you don't need to use additional datastore quota. Index and Property use quota.
Here is a code snippet that I use to count Blog entities; you should be able to adapt it if you need to process data:
def get_count(q):
  r = q.fetch(1000)
  count = 0
  while True:
    count += len(r)
    if len(r) < 1000:
      # A short batch means there is nothing left to fetch
      break
    q.filter('__key__ >', r[-1])
    r = q.fetch(1000)
  return count

q = db.Query(blog.Blog, keys_only=True)
total_count = get_count(q)

q = db.Query(blog.Blog, keys_only=True)
q.filter('accepted =', True)
accepted_count = get_count(q)

q = db.Query(blog.Blog, keys_only=True)
q.filter('accepted =', False)
blocked_count = get_count(q)

Note that
  • Remove keys_only=True if you need to process data. You will then need to use r[-1].key() in the filter.
  • Add a resuming functionality, because it really uses a lot of CPU time if it works on a large set of data.
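The resuming idea can be sketched without the datastore. In this illustration, count_in_batches and make_fetcher are hypothetical names, and an in-memory sorted list stands in for a keys-only query ordered by __key__. The function stops after a given number of batches and hands back the last key seen, so a later request can pick up from there.

```python
def make_fetcher(keys):
    """Build a fetch function over an in-memory sorted list of keys."""
    def fetch_after(last_key, limit):
        # Mimics filtering on '__key__ >' followed by fetch(limit)
        rest = [k for k in keys if last_key is None or k > last_key]
        return rest[:limit]
    return fetch_after


def count_in_batches(fetch_after, batch_size=1000, start_after=None,
                     max_batches=None):
    """Count keys in ascending order, optionally stopping early.

    Returns (count, resume_key). resume_key is None once everything has
    been counted; otherwise pass it back as start_after to continue.
    """
    count = 0
    last_key = start_after
    batches = 0
    while max_batches is None or batches < max_batches:
        keys = fetch_after(last_key, batch_size)
        count += len(keys)
        batches += 1
        if len(keys) < batch_size:
            # A short batch means there is nothing left to fetch
            return count, None
        last_key = keys[-1]
    return count, last_key
```

The partial counts of successive runs add up to the total, so the caller only has to remember the running sum and the resume key between requests.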

I just downloaded the data from one of my App Engine applications by following Uploading and Downloading. I used this new and experimental method to download the data into a sqlite3 database. You don't need to create the Loader/Exporter classes with this new method.

It does explain how to download and upload, but the uploading part only covers the production server. You have to look into the command line options; it's not complicated.

Here is a complete example to dump data:
$ python googleappengine/python/ --dump --kind=Kind --url= --filename=app-id-Kind.db /path/to/app.yaml/
[INFO ] Logging to bulkloader-log-20091111.001712
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Opening database: bulkloader-progress-20091111.001712.sql3
[INFO ] Opening database: bulkloader-results-20091111.001712.sql3
[INFO ] Connecting to
Please enter login credentials for
Password for
.[INFO ] Kind: No descending index on __key__, performing serial download
[INFO ] Have 2160 entities, 0 previously transferred
[INFO ] 2160 entities (0 bytes) transferred in 134.6 seconds

And the following uploads to the Development Server using the sqlite3 database which we just downloaded (not the CSV):
$ python googleappengine/python/ --restore --kind=Kind --url=http://localhost:8080/remote_api --filename=app-id-Kind.db --app_id=app-id
[INFO ] Logging to bulkloader-log-20091111.004013
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Opening database: bulkloader-progress-20091111.004013.sql3
Please enter login credentials for localhost
Email: <- This does not matter, type anything
Password for <- Does not matter
[INFO ] Connecting to localhost:8080/remote_api
[INFO ] Starting import; maximum 10 entities per post
[INFO ] 2160 entites total, 0 previously transferred
[INFO ] 2160 entities (0 bytes) transferred in 31.3 seconds
[INFO ] All entities successfully transferred

You will need to specify the app id, which must match the one the Development Server is running.

This may be no need once the is stable.

I just tried to add two entity counts to my app's statistics page. Then I found out that the statistics API (released on 10/13/2009, version 1.2.6) is not available on the development server.

You can run the following code without errors:
from google.appengine.ext.db import stats
global_stat = stats.GlobalStat.all().get()

But global_stat is always None.

So I ended up with code as follows:
db_blog_count = memcache.get('db_blog_count')
if db_blog_count is None:
  blog_stat = stats.KindStat.all().filter('kind_name =', 'Blog').get()
  if blog_stat is None:
    db_blog_count = 'Unavailable'
  else:
    db_blog_count = blog_stat.count
  memcache.set('db_blog_count', db_blog_count, 3600)

The documentation didn't explicitly mention whether the statistics are available on the development server or not (maybe I didn't read carefully), and neither did the Release Notes.

PS. I know the code is awful, str / int types mixed, terrible. But I am too lazy to add an if clause in the template file to check whether db_blog_count is None or something like -1, or anything else that represents the data not being available.

PS2. The code should be just if blog_stat: (fourth line) and swap the next two statements if you know what I meant.


I Thank is dead and some links have been removed from this post. (2015-12-13T03:24:41Z)

At the end of February, I started a new project, I Thank. It's built on Google App Engine. I put a lot of things into it which I hadn't done before, such as

  • Google Account authentication,
  • sharding counters for calculating all entities,
  • Django's i18n, custom template tag and feed generator,
  • pagination,
  • unittest using GAEUnit,
  • and other small bits.

You can access the code (BSDed) at Google Code hosting. Believe it or not, this is my prettiest project in terms of style. It's nearly Pink! There are also many things that I didn't take care of, e.g. I think it looks ugly in IE and maybe other non-Firefox web browsers.

It's still in the development stage. I hope I can get some feedback from you, and you can go check it out and thank someone!


This post was written only for Django 0.96.1 in GAE.

Two days ago, I started to create another Google App Engine application. This application will be internationalized when it's finished. I tried searching for a solution, then I realized that there is no very simple way to achieve this.

Normally, you can handle gettext stuff on your own, but our Google App Engine applications usually use the templating from the SDK, which actually comes from Django. One way or another, we have to partially incorporate Django.

The goal here is:

  • Use minimal Django stuff; only import the essentials in order to get Django's I18N support to work.
  • Messages in templates must be translated, too.
  • Be able to decide the language from the cookie, django_language, or the request header, HTTP_ACCEPT_LANGUAGE.

I have already made a sample code, which you can read here and you can see it at

Note is gone. (2015-12-14T06:40:18Z)

Before we go into the code, please read the I18N[1] and Settings[2] documentation of Django.

1   Setting Up

We need to use Django Settings to make I18N work. The reason for using Settings is that Django's gettext helper requires the Settings module and decides the location of message files from the location of the Settings module.

If we want to use Django Settings, we must run the following code:

import os

from google.appengine.ext.webapp import template

os.environ['DJANGO_SETTINGS_MODULE'] = 'conf.settings'
from django.conf import settings
# Force Django to reload settings
settings._target = None

Note that you must import the google.appengine.ext.webapp.template module first, or you might get an error saying conf.settings cannot be imported.

We need to set the environment variable DJANGO_SETTINGS_MODULE to the location of the Settings module, conf.settings in this case. conf is the package and settings is a module file, our Settings module.

Why conf? Because when we generate message files from Python scripts and templates (we will see how to generate them later), the Django message file generator will create files under conf/locale/ from where it's run.

2   Settings

What do we need in conf/

USE_I18N = True

# Dummy marker, so this module does not have to import Django's translation code
_ = lambda s: s

# Valid languages
LANGUAGES = (
    # 'en', 'zh_TW' match the directories in conf/locale/*
    ('en', _('English')),
    ('zh_TW', _('Chinese')),

    # or ('zh-tw', _('Chinese')), # But the directory must still be conf/locale/zh_TW
    )

# This is a default language
LANGUAGE_CODE = 'en'

3   Mark the messages

Wrap the messages that need to be translated with _("message") in Python scripts and {% trans "message" %} in template files. Please read I18N[1] for more usages.

4   Generate message files

Before you run the helper script, you need to create conf/locale; the helper won't create it for you.

Make sure you are at the root of the Google App Engine application's directory, then run:

$ PYTHONPATH=/path/to/googleappengine/python/lib/django/ /path/to/googleappengine/python/lib/django/django/bin/ -l en

/path/to/googleappengine/ is the Google App Engine SDK's location. This command should generate conf/locale/en/LC_MESSAGES/django.po. Now you can open it to translate.

Don't forget to set CHARSET. Usually UTF-8 will be fine; the line would read like:

"Content-Type: text/plain; charset=UTF-8\n"

Once you finish translating, you need to run:

$ PYTHONPATH=/path/to/googleappengine/python/lib/django/ /path/to/googleappengine/python/lib/django/django/bin/

It will generate files in each language directory. You also need to update the message files when you modify scripts or templates; run:

$ PYTHONPATH=/path/to/googleappengine/python/lib/django/ /path/to/googleappengine/python/lib/django/django/bin/ -a

This will update all languages in conf/locale.

5   Working?

If you run your application, it should now show the language set in conf.settings.LANGUAGE_CODE.

This is a per-application setting, which is not normally what we want. We expect each user to be able to choose their own language. Django has a helper called LocaleMiddleware that can do the job; unfortunately, it needs Django's request and response classes to work normally.

6   Do the dirty job

In order to do what LocaleMiddleware does, we need to make Google App Engine's request/response objects behave the same as Django's. To reduce the complexity, we create a new class, I18NRequestHandler, which inherits from google.appengine.ext.webapp.RequestHandler. You only need to replace webapp.RequestHandler with it in your handlers.

import os

from google.appengine.ext import webapp
from django.utils import translation

class I18NRequestHandler(webapp.RequestHandler):

  def initialize(self, request, response):

    webapp.RequestHandler.initialize(self, request, response)

    # Give the request the attributes Django's helper looks for
    self.request.COOKIES = Cookies(self)
    self.request.META = os.environ
    self.reset_language()

  def reset_language(self):

    # Decide the language from Cookies/Headers
    language = translation.get_language_from_request(self.request)
    translation.activate(language)
    self.request.LANGUAGE_CODE = translation.get_language()

    # Set headers in response
    self.response.headers['Content-Language'] = translation.get_language()
#    translation.deactivate()

Cookies comes from (dead link to the long-gone Cookbook). When a request comes in, the handler automatically activates the language that the cookies or headers specify.
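To show what deciding the language from the cookie or the header amounts to, here is a simplified standalone sketch. It is not Django's implementation: parse_accept_language and pick_language are hypothetical helpers, and supported is assumed to be a set of lowercase language codes such as 'en' and 'zh-tw'.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language value, best first."""
    ranked = []
    for position, part in enumerate(header.split(',')):
        pieces = part.strip().split(';')
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _sep, value = param.strip().partition('=')
            if name.strip() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by quality (descending), then by position in the header
            ranked.append((-quality, position, tag))
    return [tag for _quality, _position, tag in sorted(ranked)]


def pick_language(cookies, accept_language, supported, default='en'):
    """Cookie first, then the Accept-Language header, then the default."""
    candidates = []
    if cookies.get('django_language'):
        candidates.append(cookies['django_language'].lower())
    candidates.extend(parse_accept_language(accept_language or ''))
    for tag in candidates:
        if tag in supported:
            return tag
        # Fall back from e.g. 'en-us' to 'en'
        base = tag.split('-')[0]
        if base in supported:
            return base
    return default
```

For example, pick_language({}, 'zh-TW,zh;q=0.8,en;q=0.6', set(['en', 'zh-tw'])) picks 'zh-tw', while a django_language cookie set to 'en' would win over the header.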

7   Caching problem

It's not perfect. I have noticed a problem on the development server. If you change the code and/or the message file and recompile the message file while the server is still running, the messages in the entry script may not be translated to reflect a change of the django_language cookie. I believe that is about caching.

I am not sure about the nature of the problem, so I couldn't solve it. However, this may not be a severe problem.

8   Encoding

If you use a unicode string (not a str string) in the {% blocktrans %} template tag, you may get an error. Encode it to UTF-8 first, e.g. s.encode('utf-8').

9   Language Code

You must use an underscore, not a dash, in the name of a message directory, e.g. aa_BB; Django would not recognize a directory named aa-BB or aa-bb. But in conf.settings you can use aa-bb. This means the language code and the directory name can be different, e.g. zh-tw for the language code in Python and zh_TW as the message directory name.
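The mapping between the two forms is mechanical. A tiny sketch, where to_locale_dir is a hypothetical helper written for this note rather than something taken from Django:

```python
def to_locale_dir(language_code):
    """Turn a language code like 'zh-tw' into a directory name like 'zh_TW'."""
    lang, sep, country = language_code.partition('-')
    if not sep:
        # No country part, e.g. 'en'
        return lang.lower()
    return lang.lower() + '_' + country.upper()
```

So the messages for zh-tw are expected under conf/locale/zh_TW, and plain en stays en.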

10   Conclusion

Although this works, it may break if there are any changes to the Django framework within Google App Engine. There isn't a good solution for I18N in Google App Engine as long as Google doesn't natively support it.

11   Updates

  • 2009-11-25: Added a note about importing the template module first and about the encoding issue, and updated the path of the Python lib in the GAE SDK.
  • 2009-12-24: Added a note about Language Code format, thanks BRAGA, again.
  • 2010-02-04: Added a note about the Language Code and message directory name.
  • 2013-02-17: fix dead links and typos.
  • 2013-07-24: remove .rst from title, update link.
[1](1, 2) Django 0.96 documentation is gone, the link is for Django 1.4.
[2]Django 0.96 documentation is gone, the link is for Django 1.4.

Google App Engine just announced that the free quota will be reduced in 90 days, by 2009-05-25. The detailed changes are:
  • CPU Time: 46.3 down to 6.5 hours, about 14% remaining.
  • Bandwidth In/Out: 10.0 GB down to 1.0 GB, 10 % remaining.
It's not all reductions; they also doubled the storage quota from 0.5 GB to 1.0 GB.

If you have just signed in to the dashboard, you will need to agree to the new Terms of Service, then you will see the new billing section. The most important change in the ToS is possibly 4.4: You may not develop multiple Applications to simulate or act as a single Application or otherwise access the Service in a manner intended to avoid incurring fees.

Even though they cut a lot of the free quota on CPU Time and Bandwidth, my apps will stay within the free quota. They are not hot. :)

Recently, my GAE application started to get a few timeouts on datastore operations.

Here is a sample traceback:
datastore timeout: operation took too long.
Traceback (most recent call last):
File "/base/python_lib/versions/1/google/appengine/ext/webapp/", line 498, in __call__
File "/base/data/home/apps/brps/1.330624965687476780/", line 104, in get
p = post.get(blog_id, post_id)
File "/base/data/home/apps/brps/1.330624965687476780/brps/", line 85, in get
p = db.run_in_transaction(transaction_update_relates, blog_id, post_id, relates)
File "/base/python_lib/versions/1/google/appengine/api/", line 1451, in RunInTransaction
raise _ToDatastoreError(err)
File "/base/python_lib/versions/1/google/appengine/api/", line 1637, in _ToDatastoreError
raise errors[err.application_error](err.error_detail)
Timeout: datastore timeout: operation took too long.

Here is how you can catch it:
from google.appengine.api.datastore_errors import Timeout

try:
  p = post.get(blog_id, post_id)
except Timeout:
  pass  # Handle the timeout here, e.g. retry or report an error
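One common way to handle such a timeout is to retry the operation a few times. A minimal sketch under stated assumptions: the Timeout class below is only a local stand-in so the example runs without the SDK (in an application it would be the one imported from google.appengine.api.datastore_errors), and retry_on_timeout is a hypothetical helper.

```python
import time


class Timeout(Exception):
    """Local stand-in for google.appengine.api.datastore_errors.Timeout."""


def retry_on_timeout(operation, attempts=3, delay=0.0):
    """Call operation(), retrying when it raises Timeout.

    The last Timeout is re-raised once all attempts have failed.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Timeout:
            if attempt == attempts - 1:
                raise
            # Wait a little before trying again
            time.sleep(delay)
```

With the call from the traceback above, this could be used as retry_on_timeout(lambda: post.get(blog_id, post_id)). Retries eat into the request deadline, so the number of attempts should stay small.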

On Development Server

When I use the Mail API with sendmail, following the example in the Sending Mail doc, the recipient has to be a pure email address:
cannot be
User <>
Or sendmail complains:
INFO     2008-10-29 06:57:53,884] MailService.Send
INFO     2008-10-29 06:57:53,884]   From:
INFO     2008-10-29 06:57:53,885]   To: User
INFO     2008-10-29 06:57:53,885]   Subject: Your account has been approved
INFO     2008-10-29 06:57:53,885]   Body:
INFO     2008-10-29 06:57:53,885]     Content-type: text/plain
INFO     2008-10-29 06:57:53,885]     Data length: 261
/bin/sh: -c: line 0: syntax error near unexpected token `newline'
/bin/sh: -c: line 0: `sendmail User <>'
ERROR    2008-10-29 06:57:53,927] Error sending mail using sendmail: [Errno 32] Broken pipe
I think this can be fixed by patching the
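Short of patching the SDK, the application itself can strip the display name before sending when it runs on the development server. A small sketch using the standard library; bare_address is a hypothetical helper and user@example.com is a placeholder address:

```python
from email.utils import parseaddr


def bare_address(recipient):
    """Return only the address part of 'Name <user@example.com>'."""
    _name, address = parseaddr(recipient)
    return address
```

For example, bare_address('User <user@example.com>') gives 'user@example.com', and an address that is already bare is returned unchanged.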

On Production Server

Sender must be:
The sender must be the email address of a registered administrator for the application, or the address of the current signed-in user.