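While building a web crawler recently, I ran into the need to parse XML data.
After looking into it, I recommend ElementTree for the job; it has worked quite well for me.
For example, suppose you want to parse: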
<?xml version="1.0"?>
<doc>
<branch name="testing" hash="1cdf045c">
text,source
</branch>
<branch name="release01" hash="f200013e">
<sub-branch name="subrelease01">
xml,sgml
</sub-branch>
</branch>
<branch name="invalid">
</branch>
</doc>
To print the attributes of every branch node (with the XML above saved as doc1.xml):
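# Python 2 code; on Python 3, import xml.etree.ElementTree and call print() as a function
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='doc1.xml')  # load and parse the file
# iter() walks the whole tree and yields every element whose tag is 'branch'
for elem in tree.iter(tag='branch'):
    print elem.tag, elem.attrib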
This prints:
branch {'hash': '1cdf045c', 'name': 'testing'}
branch {'hash': 'f200013e', 'name': 'release01'}
branch {'name': 'invalid'}
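The same idea can be sketched for Python 3. This is only an illustration under a few assumptions of mine: the sample XML is inlined as a string (the name XML_TEXT is made up) so that no doc1.xml file is needed, and it uses xml.etree.ElementTree rather than cElementTree. It also shows that iter() reaches nested elements such as sub-branch, and that get() returns None for a missing attribute.

```python
import xml.etree.ElementTree as ET

# The sample document from above, inlined so the sketch needs no file on disk.
# XML_TEXT is a hypothetical name chosen for this example.
XML_TEXT = """<?xml version="1.0"?>
<doc>
<branch name="testing" hash="1cdf045c">
text,source
</branch>
<branch name="release01" hash="f200013e">
<sub-branch name="subrelease01">
xml,sgml
</sub-branch>
</branch>
<branch name="invalid">
</branch>
</doc>"""

# fromstring() parses the text and returns the root element (<doc>).
root = ET.fromstring(XML_TEXT)

# Collect (name, hash) per branch; get() returns None if the attribute is absent.
branches = [(elem.get('name'), elem.get('hash')) for elem in root.iter('branch')]

# iter() walks the whole subtree, so nested sub-branch elements are found too.
sub_texts = [elem.text.strip() for elem in root.iter('sub-branch')]

for name, hash_value in branches:
    print(name, hash_value)
print(sub_texts)
```

Reading attributes with get() instead of indexing elem.attrib avoids a KeyError on the third branch, which has no hash attribute.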
References:
- http://pycoders-weekly-chinese.readthedocs.org/en/latest/issue6/processing-xml-in-python-with-element-tree.html
- https://docs.python.org/2/library/xml.etree.elementtree.html