Using a recursive generator and ZopeFind to walk the Zope database

To migrate a very large Plone 3 instance to Plone 4, we wanted to walk through the Zope database without using the Plone catalog. Looping over the results of ZopeFind with search_sub=1 (which includes subfolders) means waiting a long time while it builds one massive list of results before you can do anything with them. With a large database this also uses a lot of RAM. What we needed was a recursive generator, but it took me a long time to wrap my head around how to write one. This article explains it nicely: http://linuxgazette.net/100/pramode.html.

I didn’t get it until I realized that calling the function doesn’t run it; it hands you back a generator (because the function body contains a yield). This is why you have to loop over your recursive call and yield each value again. Python generators were introduced about 10 years ago, and I’m only starting to realize what I’ve been missing!
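To make that concrete without Zope, here is a minimal sketch of the same pattern on plain nested lists. The function name flatten and the sample data are made up for illustration.

```python
def flatten(items):
    """Yield every leaf value from arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            # flatten(item) only returns a generator object; nothing is
            # produced until we iterate over it and re-yield each value.
            for leaf in flatten(item):
                yield leaf
        else:
            yield item


nested = [1, [2, [3, 4]], 5]
result = list(flatten(nested))
```

If you drop the inner loop and just call flatten(item), the nested values silently disappear, which is exactly the mistake I kept making. On Python 3.3 and later the inner loop can be written as `yield from flatten(item)`.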

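# app is the Zope application root, e.g. as available at a debug prompt
portal = app.portal

def walk(node):
    # ZopeFind returns (id, object) pairs; search_sub=0 keeps it to one level
    for obj_id, sub_node in node.ZopeFind(node, search_sub=0):
        yield sub_node
        # Only descend into folderish content
        if getattr(sub_node, "meta_type", "") in ['ATBTreeFolder', 'ATFolder']:
            # walk() returns a generator, so loop over it and re-yield
            for sub_sub_node in walk(sub_node):
                yield sub_sub_node

# Nothing is traversed yet; objects are fetched as you iterate over walker
walker = walk(portal)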
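To try the walker outside a Zope instance, the sketch below uses two stand-in classes, FakeFolder and FakeDocument. They are hypothetical and only mimic the one thing walk relies on: a ZopeFind method returning (id, object) pairs. The content names are made up too.

```python
from itertools import islice


class FakeDocument(object):
    """Hypothetical stand-in for a non-folderish content object."""
    meta_type = 'ATDocument'

    def __init__(self, name):
        self.name = name


class FakeFolder(object):
    """Hypothetical stand-in for a folder; mimics ZopeFind's return shape."""
    meta_type = 'ATFolder'

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def ZopeFind(self, obj, search_sub=0):
        # Assumption: only the (id, object) pair shape matters here
        return [(child.name, child) for child in obj.children]


def walk(node):
    for obj_id, sub_node in node.ZopeFind(node, search_sub=0):
        yield sub_node
        if getattr(sub_node, "meta_type", "") in ['ATBTreeFolder', 'ATFolder']:
            for sub_sub_node in walk(sub_node):
                yield sub_sub_node


portal = FakeFolder('portal', [
    FakeFolder('news', [FakeDocument('item1'), FakeDocument('item2')]),
    FakeDocument('front-page'),
])

walker = walk(portal)
# Pull just the first two objects; the rest of the tree is not visited yet
first_two = [node.name for node in islice(walker, 2)]
# The same generator resumes where it left off
rest = [node.name for node in walker]
```

Because the generator keeps its position, you can process a batch, do something else (commit a transaction, say), and then carry on from the same spot.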

Incidentally, I came across a very creative solution to traversing the Zope database in collective.solr:

https://github.com/Jarn/collective.solr/blob/master/src/collective/solr/utils.py#L126

The trick here is that it loops over a list of paths, which it updates inside the loop!
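Here is a rough sketch of that idea, not the collective.solr code itself. It assumes the tree is a plain dict mapping a path to the names of its children; walk_paths and the sample data are made up for illustration.

```python
def walk_paths(tree, root='/'):
    """Yield every path in tree without recursion."""
    paths = [root]
    # A for loop over a list picks up items appended during iteration,
    # so the list doubles as the queue of work still to do.
    for path in paths:
        yield path
        for name in tree.get(path, []):
            paths.append(path.rstrip('/') + '/' + name)


# Hypothetical tree: each path maps to the names of its children
tree = {
    '/': ['a', 'b'],
    '/a': ['c'],
}
visited = list(walk_paths(tree))
```

Appending to the end visits the tree level by level; inserting new paths right after the current position instead would give a depth-first order. Note that the list keeps every path it has seen, but those are only strings, not the objects themselves.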