Parse big (xx GB) XML files via cElementTree

cElementTree beats SAX on ease of use and DOM on memory footprint.

"Overview":http://effbot.org/zone/element-index.htm

Example: parsing an Entrezgene XML file generated by gene2xml:

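try:
    import cElementTree as ElementTree  # standalone cElementTree package
except ImportError:
    # Fallback: the standard-library ElementTree (Python 2.5+)
    import xml.etree.ElementTree as ElementTree

source = 'Entrezgene.xml'
# iterparse yields an ('end', element) pair as each closing tag is read
for event, elem in ElementTree.iterparse(source):
    if elem.tag == 'Entrezgene':
        # Process the Entrezgene element
        geneid = elem.findtext('Entrezgene_track-info/Gene-track/Gene-track_geneid')
        print('Gene id %s' % geneid)

        # Throw away the element to release memory.
        # Keep clear() inside the 'if': clearing every element would empty
        # the children before their parent Entrezgene element is processed.
        elem.clear()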
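
Why the clear() call has to stay inside the 'if' can be seen on a tiny in-memory document. This is only a sketch: the sample XML, the helper name read_ids and its clear_everything flag are made up for illustration, and it uses the standard-library xml.etree.ElementTree rather than the standalone cElementTree package.

```python
import io
import xml.etree.ElementTree as ElementTree

# Hypothetical miniature input, shaped like the paths used in the example above.
SAMPLE = b"""<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>1234</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""

PATH = 'Entrezgene_track-info/Gene-track/Gene-track_geneid'


def read_ids(clear_everything):
    """Collect gene ids; optionally clear every element, not only Entrezgene."""
    ids = []
    for event, elem in ElementTree.iterparse(io.BytesIO(SAMPLE)):
        if elem.tag == 'Entrezgene':
            ids.append(elem.findtext(PATH))
            elem.clear()  # safe: this record has already been processed
        elif clear_everything:
            # Equivalent to calling clear() outside the 'if': children are
            # wiped at their own end event, before the parent is handled.
            elem.clear()
    return ids


print(read_ids(clear_everything=False))  # ['1234']
print(read_ids(clear_everything=True))   # [None]
```

Child elements finish (and fire their 'end' event) before their parent does, so clearing all of them leaves nothing for findtext to find once the Entrezgene element arrives.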

cElementTree is a faster, less memory-hungry C implementation of ElementTree.

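Watch out for namespaces in the XML file (12-29-05): a declaration such as 'xmlns="my_space"' in the top-level XML tag prefixes every element's tag with '{my_space}'. The tag to test for becomes '{my_space}Entrezgene', and each step of a search path needs the prefix too: '{my_space}Entrezgene_track-info/{my_space}Gene-track/{my_space}Gene-track_geneid'.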
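
One possible way to cope with a default namespace is to build the qualified names once and reuse them. The sketch below assumes the namespace URI is literally 'my_space' as in the note above; the sample document and the helpers qualify and gene_ids are hypothetical, and the standard-library xml.etree.ElementTree stands in for cElementTree.

```python
import io
import xml.etree.ElementTree as ElementTree

# Hypothetical miniature input with a default namespace on the top-level tag.
SAMPLE = b"""<Entrezgene-Set xmlns="my_space">
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>1234</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""

NS = '{my_space}'  # ElementTree writes namespaces as '{uri}' in front of the tag


def qualify(path, ns=NS):
    """Prefix every step of a slash-separated path with the namespace."""
    return '/'.join(ns + step for step in path.split('/'))


def gene_ids(source):
    """Stream through source and return the gene id of every Entrezgene record."""
    id_path = qualify('Entrezgene_track-info/Gene-track/Gene-track_geneid')
    ids = []
    for event, elem in ElementTree.iterparse(source):
        if elem.tag == NS + 'Entrezgene':
            ids.append(elem.findtext(id_path))
            elem.clear()  # release the processed record
    return ids


print(gene_ids(io.BytesIO(SAMPLE)))  # ['1234']
```

Without the prefix, the comparison elem.tag == 'Entrezgene' never matches and no record is processed.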
