javascript - Scrapy - making selections from a dropdown (e.g. date) on a webpage


I'm new to Scrapy and Python, and I'm trying to scrape data from the following start URL.

After logging in, this is the start URL:

start_urls = ["http://www.flightstats.com/go/historicalflightstatus/flightstatusbyflight.do?"]

(a) From there, I need to interact with the webpage to select ---by-airport---

and then make the ---airport, date, time period selection---

How can I do that? I would like to loop over all time periods and past dates.

  • I have used Firebug to see the page source, but I cannot show it here because I do not have enough points to post images.

  • I read a post mentioning the use of Splinter.
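
One possible way to approach (a), as a rough sketch only: a dropdown is normally just a form field, so looping over dates and time periods can be reduced to generating one set of form values per combination. The field names (`airport`, `departureDate`, `timePeriod`), the time-period values, the date format and the airport code `SIN` below are all assumptions for illustration; the real ones would have to be read from the `<select>` elements in the page source.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical option values; the real ones come from the <option> tags
# of the time-period dropdown on the page.
TIME_PERIODS = ("0", "6", "12", "18")


def past_dates(last_day, days_back):
    """Yield last_day and the days before it, newest first (days_back dates in total)."""
    for offset in range(days_back):
        yield last_day - timedelta(days=offset)


def build_formdata(airport, last_day, days_back, time_periods=TIME_PERIODS):
    """Return one form-data dict per (date, time period) combination."""
    return [
        {
            "airport": airport,                # hypothetical field name
            "departureDate": day.isoformat(),  # hypothetical field name and date format
            "timePeriod": period,              # hypothetical field name
        }
        for day, period in product(past_dates(last_day, days_back), time_periods)
    ]


if __name__ == "__main__":
    # "SIN" is only an example airport code.
    for formdata in build_formdata("SIN", date(2015, 3, 1), days_back=2):
        print(formdata)
```

Inside a Scrapy callback, each of these dicts could then be submitted with `FormRequest.from_response(response, formdata=formdata, callback=...)`, the same call the login step uses. If the page builds the form with JavaScript, this may not be enough, and a browser-driving tool such as Splinter might still be needed.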

(b) After the selections, I am led to a page with links to the eventual pages that hold the information I want. How do I collect those links and make Scrapy visit every one of them to extract the information?

- Using rules? Where should I insert the rules / LinkExtractor function?

I am willing to try this myself, and I hope I can be pointed to posts that can guide me. I am a student and have spent more than a week on this. I have done the Scrapy tutorial and a Python tutorial, read the Scrapy documentation, and searched previous posts on Stack Overflow, but I did not manage to find any that cover this.

A million thanks.

My code so far, which logs in and defines the items to scrape via XPath from the eventual target page:

```python
from scrapy.http import FormRequest, Request
from scrapy.spiders.init import InitSpider

from tutorial.items import FlightItem


# init_request() and initialized() are InitSpider hooks; a plain
# scrapy.Spider never calls them, so the spider derives from InitSpider.
class FlightSpider(InitSpider):
    name = "flight"
    allowed_domains = ["flightstats.com"]
    login_page = 'https://www.flightstats.com/go/login/login_input.do;jsessionid=0dd6083a334aade3fd6923acb8ddcaa2.web1:8009?'
    start_urls = [
        "http://www.flightstats.com/go/historicalflightstatus/flightstatusbyflight.do?"
    ]

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(
            response,
            formdata={'loginform_email': 'marvxxxxxx@hotmail.com', 'password': 'xxxxxxxx'},
            callback=self.check_login_response,
        )

    def check_login_response(self, response):
        """Check the response returned by the login request to see if we are successfully logged in."""
        # response.text is a str; the comparison is case-insensitive.
        if "sign out" in response.text.lower():
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin.
            return self.initialized()  # **** this line fixed the last problem ****
        else:
            self.log("\n\n\nFailed, bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        for sel in response.xpath('/html/body/div[2]/div[2]/div'):
            item = FlightItem()
            # Paths start with "./" so they are relative to sel, not the document root.
            item['flight_number'] = sel.xpath('./div[1]/div[1]/h2').extract()
            item['aircraft_make'] = sel.xpath('./div[4]/div[2]/div[2]/div[2]').extract()
            item['dep_date'] = sel.xpath('./div[2]/div[1]/div').extract()
            item['dep_airport'] = sel.xpath('./div[1]/div[2]/div[2]/div[1]').extract()
            item['arr_airport'] = sel.xpath('./div[1]/div[2]/div[2]/div[2]').extract()
            item['dep_gate_scheduled'] = sel.xpath('./div[2]/div[2]/div[1]/div[2]/div[2]').extract()
            item['dep_gate_actual'] = sel.xpath('./div[2]/div[2]/div[1]/div[3]/div[2]').extract()
            item['dep_runway_actual'] = sel.xpath('./div[2]/div[2]/div[2]/div[3]/div[2]').extract()
            item['dep_terminal'] = sel.xpath('./div[2]/div[2]/div[3]/div[2]/div[1]').extract()
            item['dep_gate'] = sel.xpath('./div[2]/div[2]/div[3]/div[2]/div[2]').extract()
            item['arr_gate_scheduled'] = sel.xpath('./div[3]/div[2]/div[1]/div[2]/div[2]').extract()
            item['arr_gate_actual'] = sel.xpath('./div[3]/div[2]/div[1]/div[3]/div[2]').extract()
            item['arr_terminal'] = sel.xpath('./div[3]/div[2]/div[3]/div[2]/div[1]').extract()
            item['arr_gate'] = sel.xpath('./div[3]/div[2]/div[3]/div[2]/div[2]').extract()
            yield item
```

