Google App Engine Tuning Note

June 1st, 2012

This post documents some of my experience and thoughts on tuning Google App Engine for production (push queues in particular), and explains some of the performance characteristics and catches I've learned along the way.

Please feel free to correct and complete this note. After reading it, you should understand what a GAE push queue is and have the basic but important concepts needed to set up your own queues. That is my original goal, at least.

A GAE push queue provides a wide range of parameters that define its behavior, and a set of metrics for monitoring. A good understanding of these parameters is essential for making sound judgements and educated guesses during performance tuning.

Let's start with some essential concepts:

Q. What is GAE push queue?

A. Essentially it is a weak FIFO queue plus a pusher that respects your configuration (rate, max concurrency, etc.) and delivers tasks (the items in the queue) to your service endpoint automatically. As a result, you don't need another fabric to control the workflow or the rate at which items are taken out of the queue. Reference: https://developers.google.com/appengine/docs/java/taskqueue/overview-push

Q. What are the parameters of a push queue?

A. In GAE you need to set up at least the following essential parameters.

Maximum Rate:
This is the OVERALL parameter that controls how fast items are delivered.
Example: deliver 5 items per second.

Bucket Size:
Rate control uses the token bucket algorithm: tokens are refilled at the configured rate, and each delivered task consumes one token. The bucket size (the maximum number of tokens) therefore controls the maximum burst rate when you have sufficient workers available.

Max Concurrency:
The maximum number of tasks allowed to run concurrently. Traditionally it equals the number of instances available (more on that later), BUT if you are using Java or Python 2.7 with concurrent requests enabled, the story is a bit different.
Depending on the nature of the tasks, the achievable concurrency on a single instance is determined by whether they are CPU bound or IO bound. For example, a SINGLE instance can handle more concurrent tasks when they are heavily IO bound than when they are heavily CPU bound.
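
To make the interaction between rate and bucket size concrete, here is a minimal token bucket sketch in Python. It is not GAE's implementation: the class, the function name, and the assumption that the bucket starts full are mine, and max concurrency is ignored on purpose.

```python
class TokenBucket:
    """Toy token bucket: refills `rate` tokens per second, capped at `bucket_size`."""

    def __init__(self, rate, bucket_size):
        self.rate = float(rate)
        self.bucket_size = float(bucket_size)
        self.tokens = float(bucket_size)  # assumption: the bucket starts full
        self.last = 0.0                   # time of the last refill, in seconds

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last = now

    def try_take(self, now):
        """Consume one token if available; return True when a task may be sent."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def dispatched_in_burst(rate, bucket_size, backlog, now=0.0):
    """Count how many of `backlog` queued tasks can be sent at one instant."""
    bucket = TokenBucket(rate, bucket_size)
    sent = 0
    for _ in range(backlog):
        if not bucket.try_take(now):
            break
        sent += 1
    return sent
```

With a rate of 5 per second and a bucket size of 10, a backlog of 100 tasks sends 10 tasks immediately and then settles at 5 per second, which is why the bucket size is the knob for bursts and the rate is the knob for sustained throughput.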

Q: What are instances in GAE?

A: A GAE instance is a lightweight processing unit for servicing your requests. Traditionally ONE instance serves a SINGLE request at a time (so a request blocks the instance until it finishes), but in Python 2.7+ and Java a single instance can serve concurrent requests.
There are two types of instances: backend and frontend. The essential differences are:
A frontend instance is the default type. It needs no special setup and is fully managed by Google. Each task request must finish within 10 minutes.
A backend instance has more memory and CPU power but requires special setup (like backends.yaml in Python). You can also specify the instance class (B2/B4/B8, each with a different $/instance hour) and the mode (resident or dynamic). See https://developers.google.com/appengine/docs/python/config/backends#Types_of_Backends for details.

Q: How do I choose the right parameters for my task, how many instances do I need, and what kind of instances do I need?

A: It depends on your task. If your task is IO bound, I prefer more instances over larger instances to get maximum throughput. For example, I prefer 8 B2 instances to 4 B4 instances for downloading images from a provider (obviously, you need to make sure the number of concurrent requests doesn't kill your provider). I would also monitor the QPS (queries per second) and latency metrics on the instances closely, since they help me decide what works best.
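
A rough way to reason about this trade-off is Little's law: sustained throughput is about the number of requests in flight divided by the latency of one request. The sketch below is only a back-of-the-envelope estimate. The function names and the numbers are hypothetical, and it assumes an IO-bound task whose latency and per-instance concurrency do not change with the instance class.

```python
def estimated_throughput(instances, concurrency_per_instance, latency_seconds):
    """Little's law estimate: tasks per second = requests in flight / latency."""
    in_flight = instances * concurrency_per_instance
    return in_flight / latency_seconds


def total_concurrent_requests(instances, concurrency_per_instance):
    """Load the upstream provider sees when every instance is busy."""
    return instances * concurrency_per_instance


# Hypothetical numbers: 10 concurrent downloads per instance, 0.5 s per download.
eight_small = estimated_throughput(8, 10, 0.5)  # 8 smaller instances
four_large = estimated_throughput(4, 10, 0.5)   # 4 larger instances
```

Under these assumptions the eight smaller instances sustain twice the throughput of the four larger ones, and they also put twice as many concurrent requests on the provider, so check that number against what the provider can take.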

Q: How can I know if my task is IO bound or CPU bound?

A: Good question, it depends! You need to have monitoring set up (GAE doesn't provide machine-level transparency). If your instance is too small (like B2), your task can easily be CPU bound, and when you move to B4 you may see the latency cut in half. However, when you move to B8, the improvement may be much less visible, because the task has now become IO bound. Deciding when to add more instances (scale out) or use larger instances (scale up) is the art of perf tuning. If you cannot measure, you cannot improve!
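
The B2 to B4 to B8 experiment above can be turned into a small check on measured latencies. This is a sketch, not a GAE API: the function names, the 0.5 threshold, and the assumption that each step up in instance class doubles the available CPU are all mine.

```python
def scaling_efficiency(latency_before, latency_after, cpu_ratio):
    """Fraction of the ideal speedup achieved after a CPU upgrade.

    1.0 means latency dropped in proportion to the extra CPU,
    0.0 means the extra CPU made no difference.
    """
    speedup = latency_before / latency_after
    return (speedup - 1.0) / (cpu_ratio - 1.0)


def classify(latency_before, latency_after, cpu_ratio, threshold=0.5):
    """Label a task 'cpu-bound' if more CPU helped a lot, else 'io-bound'.

    The threshold is an arbitrary cut-off chosen for illustration.
    """
    efficiency = scaling_efficiency(latency_before, latency_after, cpu_ratio)
    return "cpu-bound" if efficiency >= threshold else "io-bound"


# Hypothetical latencies in milliseconds.
b2_to_b4 = classify(400.0, 200.0, cpu_ratio=2.0)  # latency halved
b4_to_b8 = classify(200.0, 180.0, cpu_ratio=2.0)  # barely improved
```

With these made-up measurements the first step is classified as CPU bound and the second as IO bound, which matches the pattern described above: once extra CPU stops paying off, adding instances is the better lever.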

Q: What if I feel my queue is running too slow, and how can I identify the bottleneck?

A: First check whether the enforced rate of your queue is the same as its max rate. If it is, the rate setting itself is capping how fast items are pushed to your instances. If the enforced rate is significantly lower than the max rate, look at the following:

Take a look at the queue definition first, and make sure the bucket holds more tokens than you have concurrent instances (so you can have more than one request per instance at the same time).
Take a look at the max concurrent requests setting, and make sure it is NOT maxed out. If it is, you may need to raise it (if you allow concurrent requests per instance, make sure it is larger than your number of instances).
Take a look at the request latency (as seen from the instances), the QPS, and the average CPU (in my experience it is better to stay below 80% CPU usage), and figure out whether you need more instances or larger instances (depending on whether the task is IO bound or CPU bound).
Make sure your upstream and downstream dependencies are in good shape to serve your requests.
For background on tokens, see the wiki: http://en.wikipedia.org/wiki/Token_bucket
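
The checklist above can be sketched as a small function. Every parameter name, the returned hint codes, and the 95% tolerance for "enforced rate equals max rate" are hypothetical. You would feed it numbers read off the queue and instance dashboards yourself.

```python
def diagnose(enforced_rate, max_rate, bucket_size, max_concurrent,
             running_requests, instances, avg_cpu):
    """Return a list of hint codes following the checklist, in order."""
    # Enforced rate at (or very near) the max rate: the rate itself is the cap.
    if enforced_rate >= 0.95 * max_rate:
        return ["rate_capped"]

    hints = []
    # Too few tokens to keep more than one request per instance in flight.
    if bucket_size <= instances:
        hints.append("raise_bucket_size")
    # The concurrency limit is already fully used.
    if running_requests >= max_concurrent:
        hints.append("raise_max_concurrent")
    # CPU above 80%: more or larger instances, depending on the workload.
    if avg_cpu > 0.80:
        hints.append("add_or_upsize_instances")
    # Nothing obvious in the queue settings: look at latency and dependencies.
    if not hints:
        hints.append("check_latency_and_dependencies")
    return hints
```

For example, `diagnose(2.0, 5.0, 4, 20, 20, 4, 0.9)` flags the bucket size, the concurrency limit, and the CPU usage all at once, while a queue running at its max rate returns only `rate_capped`.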

Q: My app is running happily, what else do I need to look at?

A: Running fine doesn't mean running efficiently. You should try to make the best use of your money and trade off what you actually need against your budget. I generally prefer dynamic over resident instances for spotty, bursty processing. Sometimes a few more smaller instances do the job: for example, 6 x B4 may have the same throughput as 4 x B8. Another option is to split multi-phase tasks, putting the IO-bound phases on lower-end instances and the CPU-bound phases on higher-end ones.
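
To compare setups like 6 x B4 and 4 x B8, it helps to put them on a common cost scale. The sketch below assumes that the hourly price scales linearly with the class number (B2 costs twice B1, B4 four times, B8 eight times). Treat that as an assumption and check the current price list. The function name is hypothetical, and the base price is a parameter rather than a real figure.

```python
# Assumed relative billing weight of each backend class (B1 = 1 unit).
CLASS_WEIGHT = {"B1": 1, "B2": 2, "B4": 4, "B8": 8}


def hourly_cost(instances, instance_class, base_price_per_hour):
    """Cost per hour of `instances` running instances of one class.

    `base_price_per_hour` is the price of a single B1 instance hour.
    """
    return instances * CLASS_WEIGHT[instance_class] * base_price_per_hour


# Use 1.0 as the base price to compare in relative units.
six_b4 = hourly_cost(6, "B4", 1.0)
four_b8 = hourly_cost(4, "B8", 1.0)
saving = 1.0 - six_b4 / four_b8
```

In these relative units 6 x B4 costs 24 and 4 x B8 costs 32, so if the two setups really deliver the same throughput for your workload, the smaller instances are about 25% cheaper.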

Just be creative: saving a few instance hours here and there gets better mileage out of our $$.
