XPath/Python - How to get different HTML tags and text inside a <div>

I'm trying to scrape the HTML content at this URL: http://www.dlib.org/dlib/november14/beel/11beel.html with the following Python code:

    import json

    import requests
    from lxml import html

    s = "http://www.dlib.org/dlib/november14/beel/11beel.html"
    content = requests.get(s)
    tree = html.fromstring(content.text)
    # titoli = titles, contenuti = contents
    titoli = tree.xpath('/html/body/form/table[3]/tr/td/table[5]/tr/td/table[1]/tr/td[2]/h3/text()')
    par = tree.xpath('/html/body/form/table[3]/tr/td/table[5]/tr/td/table[1]/tr/td[2]/p/text()')
    articoli = json.dumps({'titoli': titoli, 'contenuti': par})
    print("Content-Type: application/json")
    print()  # a bare `print` does nothing in Python 3; print() emits the blank line ending the headers
    print(articoli)

My main goal is to find an XPath query that returns every tag, the content of those tags, and the text inside the useful div of the page. The div can be found at the path /html/body/form/table[3]/tr/td/table[5], or with a web inspector, just under the comment line `<!-- content table -->`. With the code I posted above I can't get the entire content of the div, only the titles and the text inside the p elements, and I can't find a way to do it.
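The code above returns only fragments because an XPath step ending in `/text()` selects just the text nodes that are direct children of the matched element: text inside nested tags such as `<i>` or `<a>` is skipped, the surrounding text comes back in pieces, and the tags themselves are never returned. A minimal sketch of this, using a small hypothetical fragment (`SAMPLE`, with an assumed `id="content"` on the div) instead of the live dlib.org page, compares `text()` with walking the div's child elements and reading each one's full text via lxml.html's `text_content()`:

```python
from lxml import html

# Hypothetical fragment standing in for the "content table" div;
# the real page's markup is not reproduced here.
SAMPLE = """
<div id="content">
  <h3>Introduction</h3>
  <p>Research-paper <i>recommender</i> systems.</p>
  <h3>Related work</h3>
  <p>See <a href="#ref1">[1]</a> for details.</p>
</div>
"""

tree = html.fromstring(SAMPLE)

# text() returns only the text nodes that are direct children of each <p>,
# so "recommender" and "[1]" (inside <i> and <a>) are missing.
direct = tree.xpath('//p/text()')

# Walking the div's child elements keeps each tag name together with
# the full text of everything nested inside it.
blocks = [(el.tag, el.text_content().strip())
          for el in tree.xpath('//div[@id="content"]/*')]

print(direct)
print(blocks)
```

Keeping `(tag, text)` pairs preserves the order of headings and paragraphs, which a separate list of titles and a separate list of paragraphs loses.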

To get the actual HTML content of a section of the website with Python and XPath, it is easier to use `from lxml import etree` instead of `from lxml import html`. Once you have built the element tree, `etree.tostring()` returns the HTML markup of an element (tags included) rather than only its text content, which is what `text()` gives you. The code is as follows:

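    from lxml import etree
    import requests

    s = "http://www.dlib.org/dlib/november14/beel/11beel.html"
    page = requests.get(s)
    # etree.HTML (upper case) parses the page and returns the root <html> element
    tree = etree.HTML(page.text)
    # the path is relative to <html>, so it starts at ./body
    element = tree.xpath('./body/form/table[3]/tr/td/table[5]')
    # serialize the matched element (tags and text) back to an HTML string
    content = etree.tostring(element[0], method="html", encoding="unicode")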

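`tree.xpath` returns a list of the selected elements. In this case, because the XPath is so specific, it returns a list containing only one element. Therefore you have to use `etree.tostring(element[0])` to access the first element of the list and return the HTML content of that element as a string (by default `etree.tostring` returns `bytes`; passing `encoding="unicode"` gives a `str`).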
