Text processing fun Pt 2.

cProfile good, example code bad.

I got an email from Jay yesterday commenting on Text processing fun. His mail server bounced my reply and also my request for permission to publish his message.

Jay wrote an elegant generator that, over the same dataset of 1m rows and 180 columns, processed the data a full 5 seconds faster than the list comprehensions in my code.

Here is my reply:

G'day Jay,

Great to hear from you.

Your code is very fast by comparison to the example, and very pythonic.

The whole thing can pretty much be achieved in one line:

rows = csv.DictReader(open('BigFile.csv', 'rb'))

but that doesn't really make an interesting post about using cProfile ;)

Thanks for your note - I hope you stick with python.

cheers, Cam.
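
A side note on that one-liner for anyone on Python 3: the 'rb' mode is what the csv module wanted in Python 2, whereas Python 3 expects a text-mode file opened with newline=''. Here is a rough sketch of the same idea; the iter_rows name is just something I made up for illustration, and BigFile.csv is whatever file you point it at.

```python
import csv


def iter_rows(path):
    """Yield each row of a CSV file as a dict keyed by the header line."""
    # Python 3's csv module wants text mode with newline='' rather than 'rb'.
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            yield row
```

Usage would be along the lines of `for row in iter_rows('BigFile.csv'): ...`, which reads one row at a time instead of loading the whole file.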

I guess the post succeeded in one respect: Jay used cProfile/profile to prove his more elegant code was faster! But I missed the mark in trying to show that, by profiling a quick but poorly performing hack, you can satisfice and move on.
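
If you want to try that kind of comparison yourself, here is a minimal sketch. To be clear, this is neither Jay's code (which I can't publish) nor the code from the original post: the function names, the tiny in-memory sample, and the Python 3 syntax are all assumptions made up for illustration.

```python
import cProfile
import io
import pstats

# Hypothetical stand-in for the big file: a small in-memory CSV.
SAMPLE = "id,name\n1,ann\n2,bob\n3,cy\n"


def rows_as_list(text):
    """Build every row at once with a list comprehension."""
    lines = text.splitlines()
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]


def rows_as_generator(text):
    """Yield one row at a time instead of building the whole list."""
    lines = iter(text.splitlines())
    header = next(lines).split(",")
    for line in lines:
        yield dict(zip(header, line.split(",")))


def profile(func, text):
    """Run func under cProfile; return its rows and the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = list(func(text))  # list() drains the generator so its work is counted
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats()
    return result, out.getvalue()


list_rows, list_report = profile(rows_as_list, SAMPLE)
gen_rows, gen_report = profile(rows_as_generator, SAMPLE)
print(list_report)
print(gen_report)
```

On a sample this small the timings mean nothing; swap in your own data and let the reports tell you whether the difference is worth chasing.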

Anyhow, thanks Jay. I'd love to publish your note if you get in contact.

Comments 2

  1. Bart J wrote:

    Hi! I read with interest your post on text processing in python. I was actually trying to do the same thing - but I managed to get it down in Perl! My question is - that I have tried searching a lot for XML processing modules in python for really large XML files but haven't quite found anything similar (in terms of speed) to XML::Twig in Perl... Could you please offer some suggestions on XML processing modules for very large files in Python?

    Thanks!

    Posted May 30, 2008 at 6:45 a.m.

  2. Cam wrote:

    Hi Bart,

    Sorry I missed your comment in my spam bucket :(

    I hope you found lxml

    Posted June 5, 2008 at 2:43 p.m.

This is the personal website of Cam MacRae. Any opinions expressed here are entirely my own, and have jack to do with my employer.


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.