Parsing big (xx GB) XML files with cElementTree
Its iterparse interface is easier to use than SAX and needs far less memory than DOM.
"Overview":http://effbot.org/zone/element-index.htm
Example: parsing an Entrezgene XML file generated by gene2xml:
import cElementTree as ElementTree

source = 'Entrezgene.xml'
for event, elem in ElementTree.iterparse(source):
    if elem.tag == 'Entrezgene':
        # Process the Entrezgene element.
        geneid = elem.findtext('Entrezgene_track-info/Gene-track/Gene-track_geneid')
        print 'Gene id', geneid
        # Throw away the element to release memory.
        # Keep clear() inside the 'if': calling it on every element would
        # empty the children before their Entrezgene parent is processed.
        elem.clear()
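
On Python 3 the standalone cElementTree module is not needed, because xml.etree.ElementTree in the standard library picks up the C accelerator automatically. Below is a minimal sketch of the same loop under that assumption. The function name iter_gene_ids and the tiny in-memory sample are made up for illustration. The root.clear() call is an addition to the original loop: it also drops the emptied Entrezgene shells that would otherwise pile up under the root element.

```python
import io
import xml.etree.ElementTree as ET

GENEID_PATH = 'Entrezgene_track-info/Gene-track/Gene-track_geneid'


def iter_gene_ids(source):
    """Yield the gene id of each Entrezgene element, one at a time."""
    # Request 'start' events as well, so the root element can be captured first.
    context = iter(ET.iterparse(source, events=('start', 'end')))
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == 'Entrezgene':
            yield elem.findtext(GENEID_PATH)
            # Drop the references the root keeps to already processed elements.
            root.clear()


# Hypothetical miniature input; a real gene2xml file has many more fields.
SAMPLE = b"""<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>1</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>2</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""

gene_ids = list(iter_gene_ids(io.BytesIO(SAMPLE)))
for geneid in gene_ids:
    print('Gene id', geneid)
```

For a real file, pass the file name (for example 'Entrezgene.xml') instead of the BytesIO object; iterparse accepts either.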
cElementTree is a faster, more memory-efficient C implementation of ElementTree.
Watch out for namespaces in the XML file (12-29-05): something like 'xmlns="my_space"' in the top-level XML tag causes every element's tag to be prefixed with the namespace, so tags and paths become '{my_space}Entrezgene/{my_space}Entrezgene_track-info'.
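
A small sketch of one way to cope with this, assuming Python 3 and the standard-library xml.etree.ElementTree. The helper qualify and the sample document are hypothetical, and the helper only handles plain slash-separated paths (no predicates or wildcards).

```python
import io
import xml.etree.ElementTree as ET

NS = '{my_space}'  # prefix that xmlns="my_space" puts in front of every tag

# Hypothetical miniature input with a default namespace on the top-level tag.
SAMPLE_NS = b"""<Entrezgene-Set xmlns="my_space">
  <Entrezgene>
    <Entrezgene_track-info>
      <Gene-track>
        <Gene-track_geneid>42</Gene-track_geneid>
      </Gene-track>
    </Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""


def qualify(path, ns=NS):
    """Prefix every step of a simple slash-separated path with the namespace."""
    return '/'.join(ns + step for step in path.split('/'))


geneid_path = qualify('Entrezgene_track-info/Gene-track/Gene-track_geneid')

ns_gene_ids = []
for event, elem in ET.iterparse(io.BytesIO(SAMPLE_NS)):
    # The tag comparison needs the namespace prefix as well.
    if elem.tag == NS + 'Entrezgene':
        ns_gene_ids.append(elem.findtext(geneid_path))
        elem.clear()

print(ns_gene_ids)
```

Printing elem.tag for the first few elements of an unfamiliar file is a quick way to find out whether a namespace prefix is present at all.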