The original test has a flaw regarding the presence of BeautifulSoup; please read the Regarding BeautifulSoup section.

Last time, I ran a test on simplejson 2.1.3. It was actually a prelude to this post, in which I want to show that using feedparser takes quite some time.

Last month, feedparser 5.0 was released, and just a few days ago 5.0.1 followed. Before version 5, I was using 4.1, which I found slow. But version 5.0 is even slower in Python 2.5, and much slower in Python 2.6. 5.0 claims to support Python 3, but Python 3 says the module has invalid syntax, so I won't know whether it runs faster in Python 3.

Here is how I tested. I downloaded 500 entries of the Google Blog using the Blogger API in both Atom and RSS formats; I also downloaded the JSON for comparison:

http://www.blogger.com/feeds/10861780/posts/default?max-results=500&alt=atom
http://www.blogger.com/feeds/10861780/posts/default?max-results=500&alt=rss
http://www.blogger.com/feeds/10861780/posts/default?max-results=500&alt=json

Because the execution/parsing time is really long, I simply used Bash's time built-in to get the time. The commands are as follows:

time python -c 'import feedparser as fp; fp.parse("test.atom")'
time python -c 'import feedparser as fp; fp.parse("test.rss")'
time python -c 'import simplejson as json; json.load(open("test.json"))'
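The same kind of measurement can also be taken in-process, which leaves interpreter start-up out of the figures. The following is only a minimal sketch and not what produced the numbers in this post: `time_call` is a hypothetical helper, the standard library's json module stands in for simplejson so the snippet is self-contained, and it is written for Python 3.

```python
import json
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls.

    Hypothetical helper for illustration; not part of feedparser or simplejson.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # A small synthetic document stands in for test.json here.
    doc = json.dumps({"feed": {"entry": [{"title": "t%d" % i} for i in range(500)]}})
    print("json.loads: %.6fs" % time_call(json.loads, doc))
    # With feedparser installed, the feeds could be timed the same way:
    #   import feedparser
    #   print(time_call(feedparser.parse, "test.atom"))
```

Taking the best of a few runs reduces noise from other processes, at the cost of hiding any first-run effects.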

Here is the result:

Library   feedparser 5.0.1     feedparser 4.1       simplejson 2.1.3 + C
Python    2.5       2.6        2.5       2.6        2.5       2.6
Atom      14.515s   54.676s    5.939s    5.929s
RSS       14.205s   53.185s    5.249s    5.286s
JSON                                                1.383s    1.372s

I don't think I need to explain the numbers; they are very clear. I listed simplejson because I also did this:

simplejson + YQL for 100 entries: 4.445s

Using a YQL query like:

select * from atom where url="http://www.blogger.com/feeds/10861780/posts/default?max-results=100&start-index=1&alt=atom"

and:

time python -c 'import simplejson as json; from urllib2 import urlopen; json.load(urlopen("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20atom%20where%20url%3D%22http%3A%2F%2Fwww.blogger.com%2Ffeeds%2F10861780%2Fposts%2Fdefault%3Fmax-results%3D100%26start-index%3D1%26alt%3Datom%22&format=json&callback="))'

I found that if I requested 500 entries, YQL returned an empty result. 250 seems to be the limit, but simplejson reports an error in the JSON. I don't know whether YQL causes that or not, but 100 entries per request with 5 requests seems fine. Each request takes around 1 to 5 seconds, so multiplied by 5 requests, the estimate is up to 25 seconds in Python 2.6. And most of that time is spent waiting for the response from YQL, which doesn't use much CPU.
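Splitting the 500 entries into 5 requests of 100 comes down to stepping start-index through 1, 101, 201, 301 and 401. Here is a rough sketch of how those request URLs could be generated in Python 3; `yql_page_urls` is a hypothetical helper, the endpoint and feed URL are the ones quoted above, and the sketch only builds the URLs without performing any network access.

```python
from urllib.parse import quote

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"
FEED_URL = "http://www.blogger.com/feeds/10861780/posts/default"


def yql_page_urls(total=500, page_size=100):
    """Build one YQL request URL per page of `page_size` entries.

    Hypothetical helper for illustration only.
    """
    urls = []
    # Blogger's start-index is 1-based: 1, 101, 201, ...
    for start in range(1, total + 1, page_size):
        feed = "%s?max-results=%d&start-index=%d&alt=atom" % (
            FEED_URL, page_size, start)
        query = 'select * from atom where url="%s"' % feed
        # safe="" also percent-encodes "/", ":", "?", "&" and "=" in the query.
        urls.append("%s?q=%s&format=json&callback=" % (
            YQL_ENDPOINT, quote(query, safe="")))
    return urls


if __name__ == "__main__":
    for url in yql_page_urls():
        print(url)
```

Each URL could then be passed to urlopen and json.load in the same way as the single request above.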

YQL is one of the great tools I know. You can load a cross-domain XML file or get the HTTP error code for cross-domain JSONP. You can convert any format it supports to JSON.

feedparser is a great tool, too, but it's really slow. I noticed because I have a script which loads many feeds using feedparser every ten minutes, and it is always the top CPU time user. After this test, I don't think I will use 5.0, and I will probably go back to using Python 2.5 to run the script. I might even switch to processing JSON with YQL's help; since the data has to be downloaded anyway, it would be just a URL change.

I know it's unfair to say feedparser is slow, but if you only look at the time without considering the formats or the implementation, then it is slow. And speed is the only thing I care about. If there were a library implemented in C, I am sure the time would improve hugely. I tried to search for one, but I couldn't find any.

1   Regarding BeautifulSoup

Added at 2011-09-16T17:17:30Z

Note

While checking out a faster alternative, speedparser, I found out that since version 5.2.1 (2015-07-24), FeedParser no longer depends on BeautifulSoup and runs 326.57% faster than 5.1.3. (2015-08-21T23:44:43Z)

A search keyword in the visitor report led me to Google it, and I found this issue [1], which explained why FeedParser 5.0.1 is much slower in my Python 2.6: the reason is BeautifulSoup. At the time I tested, BeautifulSoup was installed as a dependency of lxml on my Gentoo. I didn't notice and had no idea it would cause such a performance slump.
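Since the package can arrive silently as a dependency, it may be worth checking whether it is importable before trusting a benchmark. The following is a small sketch for Python 3, where `is_installed` is a hypothetical helper; under Python 2, wrapping `import BeautifulSoup` in a try/except ImportError would serve the same purpose.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found on the import path.

    Hypothetical helper for illustration only.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # BeautifulSoup 3.x installs a top-level module named "BeautifulSoup".
    print("BeautifulSoup importable:", is_installed("BeautifulSoup"))
```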

Now, here is an updated test result for FeedParser 5.0.1:

BeautifulSoup 3.2.0   Python 2.5.4         Python 2.6.6         Python 2.7.1
                      Atom      RSS        Atom      RSS        Atom      RSS
With                  23.026s   23.785s    48.466s   50.220s    20.864s   21.373s
Without               14.425s   14.658s    11.556s   12.001s    12.983s   13.275s

Though 5.0.1 is still slower than 4.1, it is actually slightly faster with Python 2.6 than with Python 2.5 or Python 2.7 (if you don't have BeautifulSoup installed), contrary to the wrong judgment I made in the original test.
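To put the slump into proportion, the slowdown factor is simply the time with BeautifulSoup divided by the time without it. The sketch below does that arithmetic on the figures from the updated table; the `TIMINGS` dictionary and the `slowdown` helper are illustrative names, not part of any library.

```python
# Timings in seconds copied from the updated table:
# (with BeautifulSoup, without BeautifulSoup) for FeedParser 5.0.1.
TIMINGS = {
    ("2.5.4", "Atom"): (23.026, 14.425),
    ("2.5.4", "RSS"): (23.785, 14.658),
    ("2.6.6", "Atom"): (48.466, 11.556),
    ("2.6.6", "RSS"): (50.220, 12.001),
    ("2.7.1", "Atom"): (20.864, 12.983),
    ("2.7.1", "RSS"): (21.373, 13.275),
}


def slowdown(with_bs, without_bs):
    """Return how many times slower parsing is with BeautifulSoup installed."""
    return with_bs / without_bs


if __name__ == "__main__":
    for (version, fmt), (with_bs, without_bs) in sorted(TIMINGS.items()):
        print("Python %s %-4s %.2fx" % (version, fmt, slowdown(with_bs, without_bs)))
```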

[1] http://code.google.com/p/feedparser/issues/detail?id=300 is gone.