Parsing a specific field in an XML file in Python
I have an XML file that looks like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="http://data.treasury.gov:8001/feed.svc/" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns="http://www.w3.org/2005/atom">
  <title type="text">dailytreasuryyieldcurveratedata</title>
  <id>http://data.treasury.gov:8001/feed.svc/dailytreasuryyieldcurveratedata</id>
  <updated>2015-08-30t15:17:09z</updated>
  <link rel="self" title="dailytreasuryyieldcurveratedata" href="dailytreasuryyieldcurveratedata" />
  <entry>
    <id>http://data.treasury.gov:8001/feed.svc/dailytreasuryyieldcurveratedata(6404)</id>
    <title type="text"></title>
    <updated>2015-08-30t15:17:09z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="dailytreasuryyieldcurveratedatum" href="dailytreasuryyieldcurveratedata(6404)" />
    <category term="treasurydatawarehousemodel.dailytreasuryyieldcurveratedatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:id m:type="edm.int32">6404</d:id>
        <d:new_date m:type="edm.datetime">2015-08-03t00:00:00</d:new_date>
        <d:bc_1month m:type="edm.double">0.03</d:bc_1month>
        <d:bc_3month m:type="edm.double">0.08</d:bc_3month>
        <d:bc_6month m:type="edm.double">0.17</d:bc_6month>
        <d:bc_1year m:type="edm.double">0.33</d:bc_1year>
        <d:bc_2year m:type="edm.double">0.68</d:bc_2year>
        <d:bc_3year m:type="edm.double">0.99</d:bc_3year>
        <d:bc_5year m:type="edm.double">1.52</d:bc_5year>
        <d:bc_7year m:type="edm.double">1.89</d:bc_7year>
        <d:bc_10year m:type="edm.double">2.16</d:bc_10year>
        <d:bc_20year m:type="edm.double">2.55</d:bc_20year>
        <d:bc_30year m:type="edm.double">2.86</d:bc_30year>
        <d:bc_30yeardisplay m:type="edm.double">2.86</d:bc_30yeardisplay>
      </m:properties>
    </content>
  </entry>
  <entry>
    <id>http://data.treasury.gov:8001/feed.svc/dailytreasuryyieldcurveratedata(6405)</id>
    <title type="text"></title>
    <updated>2015-08-30t15:17:09z</updated>
    <author>
      <name />
    </author>
    <link rel="edit" title="dailytreasuryyieldcurveratedatum" href="dailytreasuryyieldcurveratedata(6405)" />
    <category term="treasurydatawarehousemodel.dailytreasuryyieldcurveratedatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
    <content type="application/xml">
      <m:properties>
        <d:id m:type="edm.int32">6405</d:id>
        <d:new_date m:type="edm.datetime">2015-08-04t00:00:00</d:new_date>
        <d:bc_1month m:type="edm.double">0.05</d:bc_1month>
        <d:bc_3month m:type="edm.double">0.08</d:bc_3month>
        <d:bc_6month m:type="edm.double">0.18</d:bc_6month>
        <d:bc_1year m:type="edm.double">0.37</d:bc_1year>
        <d:bc_2year m:type="edm.double">0.74</d:bc_2year>
        <d:bc_3year m:type="edm.double">1.08</d:bc_3year>
        <d:bc_5year m:type="edm.double">1.6</d:bc_5year>
        <d:bc_7year m:type="edm.double">1.97</d:bc_7year>
        <d:bc_10year m:type="edm.double">2.23</d:bc_10year>
        <d:bc_20year m:type="edm.double">2.59</d:bc_20year>
        <d:bc_30year m:type="edm.double">2.9</d:bc_30year>
        <d:bc_30yeardisplay m:type="edm.double">2.9</d:bc_30yeardisplay>
      </m:properties>
    </content>
  </entry>
</feed>
How can I parse out the '2.16' from 'bc_10year'? I've been looking at other examples using ElementTree and lxml, but I can't seem to match the XML format in those examples to my file.
The last thing I tried was:
from lxml import etree
doc = etree.parse(yield_xml)
memoryelem = doc.find('content')
print memoryelem.text         # element text
print memoryelem.get('type')  # attribute
I get this error: AttributeError: 'NoneType' object has no attribute 'text'
Is there a simple way to do this?
You may try the built-in split method, applied to the file's contents as a string:
>>> [data.split('>')[1].split('<')[0] for data in str(xml_file).split('<d:') if 'bc_10year' in data][0]
'2.16'
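If you would rather use a real XML parser, the likely reason doc.find('content') returned None is that every element in the feed sits in a namespace (the default Atom namespace, plus the d: and m: prefixes), and content is nested inside entry rather than being a direct child of the root. Below is a minimal sketch with the standard library's xml.etree.ElementTree. The function name ten_year_yields and the trimmed SAMPLE string are made up for illustration, and the tag names are spelled exactly as in the sample pasted above, so adjust the case if your actual file differs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the <feed> element of the sample in the question.
NS = {
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}

# Hypothetical, trimmed-down copy of the feed (one entry, two properties).
SAMPLE = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
      xmlns="http://www.w3.org/2005/atom">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:new_date m:type="edm.datetime">2015-08-03t00:00:00</d:new_date>
        <d:bc_10year m:type="edm.double">2.16</d:bc_10year>
      </m:properties>
    </content>
  </entry>
</feed>"""


def ten_year_yields(xml_text):
    """Return a list of (new_date, bc_10year) pairs, one per entry."""
    root = ET.fromstring(xml_text)
    rows = []
    # ".//" searches at any depth; the prefixes are resolved through NS.
    for props in root.iterfind(".//m:properties", NS):
        date = props.findtext("d:new_date", namespaces=NS)
        rate = props.findtext("d:bc_10year", namespaces=NS)
        if rate is not None:  # skip entries that lack the field
            rows.append((date, float(rate)))
    return rows


print(ten_year_yields(SAMPLE))  # [('2015-08-03t00:00:00', 2.16)]
```

To read from a file instead of a string, ET.parse(path).getroot() gives the same kind of root element. Unlike the split approach, this keeps each rate paired with its date and does not depend on how the file happens to be laid out.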