I got an email from Jay yesterday commenting on Text processing fun. His mail server bounced my reply and also my request for permission to publish his message.
Jay created an elegant generator that, over the same dataset of 1m rows and 180 columns, processed the data a full 5 seconds faster than the list comprehensions in my code.
Here is my reply:
G'day Jay,
Great to hear from you.
Your code is very fast by comparison to the example, and very pythonic.
The whole thing can pretty much be achieved in one line:
rows = csv.DictReader(open('BigFile.csv', 'rb'))
but that doesn't really make an interesting post about using cProfile ;)
Thanks for your note - I hope you stick with python.
cheers, Cam.
I guess the post succeeded in one respect: Jay used cProfile/profile to prove his more elegant code was faster! But I missed the mark in trying to show that, by profiling a quick but poorly performing hack, you can satisfice and move on.
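Since I can't publish Jay's code without his permission, here is a rough sketch of the kind of comparison involved, not his actual generator. It profiles a list-comprehension reader against a generator built on csv.DictReader, using cProfile. The function names (make_sample, rows_as_list, rows_as_generator, consume, profile) and the tiny in-memory dataset are made up for illustration, and it is written for Python 3 rather than the Python 2 idiom in the email above.

```python
import cProfile
import csv
import io
import pstats


def make_sample(n_rows=1000, n_cols=5):
    """Build a small in-memory CSV (hypothetical stand-in for the real file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["col%d" % i for i in range(n_cols)])
    for r in range(n_rows):
        writer.writerow([r * n_cols + c for c in range(n_cols)])
    return buf.getvalue()


def rows_as_list(text):
    """List-comprehension version: builds every row dict up front."""
    lines = text.splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]


def rows_as_generator(text):
    """Generator version: yields one row dict at a time."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row


def consume(make_rows, text):
    """Walk every row so that lazy generators do their work while profiled."""
    return sum(1 for _ in make_rows(text))


def profile(func, *args):
    """Run func under cProfile; return its result and a pstats.Stats object."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    return result, pstats.Stats(profiler)


if __name__ == "__main__":
    sample = make_sample()
    for make_rows in (rows_as_list, rows_as_generator):
        count, stats = profile(consume, make_rows, sample)
        print(make_rows.__name__, "rows:", count)
        stats.sort_stats("cumulative").print_stats(5)
```

Which version wins will depend on the data and on what is done with each row, so treat the profiler output, not this sketch, as the evidence.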
Anyhow, thanks Jay. I'd love to publish your note if you get in contact.

Comments 2
Hi! I read with interest your post on text processing in python. I was actually trying to do the same thing - but I managed to get it down in Perl! My question is - that I have tried searching a lot for XML processing modules in python for really large XML files but haven't quite found anything similar (in terms of speed) to XML::Twig in Perl... Could you please offer some suggestions on XML processing modules for very large files in Python?
Thanks!
Posted May 30, 2008 at 6:45 a.m.
Hi Bart,
Sorry I missed your comment in my spam bucket :(
I hope you found lxml
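For anyone else with the same question, the usual approach for very large XML files is incremental parsing. Below is a minimal sketch using iterparse from the standard library's xml.etree.ElementTree, chosen only so it runs without extra installs; lxml.etree provides an iterparse with a similar interface. The iter_records helper and the item tag are hypothetical names for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield (attributes, text) for each <tag> element as parsing proceeds."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Copy what we need before clearing the element.
            yield dict(elem.attrib), (elem.text or "").strip()
            # Drop the element's contents so memory use stays low.
            elem.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root><item id='1'>a</item><item id='2'>b</item></root>"
    )
    for attrs, text in iter_records(sample, "item"):
        print(attrs, text)
```

Cleared elements still hang off the root as empty children, so a truly huge file may need extra cleanup of the parent as well; I haven't measured how this compares with XML::Twig.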
Posted June 5, 2008 at 2:43 p.m.