I need to count how many entity of kind Blog has boolean property accepted set to True, but I suddenly realized that OFFSET in query is no use for me (In fact, it is not really useful).
In SDK 1.1.0, OFFSET does what you think on Development Server if you first use GAE and have experience of SQL, but it's still different than on Production Server.
Basically, if you have 1002 entities in Blog and you want to get the 1002nd entity. The follows will not get you that entity:
You will get an exception on the last one like:
The OFFSET takes effective after:
An amazing thing is you don't need to create new property, there is already one in all of you Kinds, the __key__ in query, the Key.
The benefits of using it:
Note that
In SDK 1.1.0, OFFSET does what you think on Development Server if you first use GAE and have experience of SQL, but it's still different than on Production Server.
Basically, if you have 1002 entities in Blog and you want to get the 1002nd entity. The follows will not get you that entity:
q = Blog.all()
# Doing filter here
# Order here
# Then fetch
r = q.fetch(1, 0)[0] # 1st
r = q.fetch(1, 1)[0] # 2nd
r = q.fetch(1, 999)[0] # 1000th
r = q.fetch(1, 1000)[0] # 1001st
r = q.fetch(1, 1001)[0] # 1002nd
You will get an exception on the last one like:
BadRequestError: Offset may not be above 1000.First one is on Production Sever, second is on Development Server.
BadRequestError: Too big query offset.
The OFFSET takes effective after:
- filter data (WHERE clause)
- sort data (ORDER clause)
- truncate to first 1001 entities (even though count() only returns 1000 at most)
A word of caution: when writing a query that retrieves entities in batches, avoid OFFSET (which doesn't work for large sets of data) and instead limit the amount of data returned by using a WHERE condition.The only way is to filtering data (WHERE clause), you will need a unique property if you need to walk through all entities.
An amazing thing is you don't need to create new property, there is already one in all of you Kinds, the __key__ in query, the Key.
The benefits of using it:
- No additional property,
- No additional index (Because it's already created by default), and
- Combination of two above, you don't need to use additional datastore quota. Index and Property use quota.
def get_count(q):
r = q.fetch(1000)
count = 0
while True:
count += len(r)
if len(r) < 1000:
break
q.filter('__key__ >', r[-1])
r = q.fetch(1000)
return count
q = db.Query(blog.Blog, keys_only=True)
q.order('__key__')
total_count = get_count(q)
q = db.Query(blog.Blog, keys_only=True)
q.filter('accepted =', True)
q.order('__key__')
accepted_count = get_count(q)
q = db.Query(blog.Blog, keys_only=True)
q.filter('accepted =', False)
q.order('__key__')
blocked_count = get_count(q)
Note that
- Remove keys_only=True if you need to process data. And you will need to use r[-1].key() to filter.
- Add a resuming functionality because it really uses a lot of CPU time if it works on large set of data.
0 comments:
Post a Comment