Python 2.7 - Scrapy - not returning data
I am new to Scrapy and am trying to extract tweets from https://twitter.com/abctv. I know there is an API, but this is a learning exercise. My code returns 0 tweets. My item definition:
import scrapy


class Tweet(scrapy.Item):
    username = scrapy.Field()
    data = scrapy.Field()
    created_at = scrapy.Field()
    retweet_count = scrapy.Field()
My crawler definition:
import scrapy
from scrapy.selector import Selector

# Tweet is the item class defined above.


class TwitterSpider(scrapy.Spider):
    name = "twitter"
    allowed_domains = ["twitter.com"]  # bare domain name, not a full URL
    start_urls = ["https://twitter.com/mycpl"]

    def parse(self, response):
        sel = Selector(response)
        # Each tweet is wrapped in a <div class="content"> container.
        tweets = sel.xpath('//div[@class="content"]')
        items = []
        for tweet in tweets:
            item = Tweet()
            # All paths below are relative to the current tweet container.
            item['username'] = tweet.xpath('.//*[starts-with(@class,"username")]//text()').extract()
            item['created_at'] = tweet.xpath('.//*[starts-with(@class,"_timestamp")]//@data-time-ms').extract()
            item['retweet_count'] = tweet.xpath('.//*[starts-with(@class,"ProfileTweet-actionCountForPresentation")]/text()').extract()
            item['data'] = tweet.xpath('p//text()').extract()
            items.append(item)
        return items
Please help me learn by explaining why I am not receiving any responses when I run scrapy crawl twitter -o test -t json.
Edited -- I fixed the XPath code and it now works (it returns only 20 tweets because of the infinite scroll issue).
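A side note on the spider above: as far as I understand Scrapy's documentation, allowed_domains is meant to hold bare domain names such as twitter.com rather than full URLs. Below is a minimal sketch of deriving that list from start_urls. The helper name domain_of is hypothetical (it is not part of Scrapy), and the snippet uses only the standard library, so it runs outside a Scrapy project.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def domain_of(url):
    """Return the bare host name of a URL (hypothetical helper, not a Scrapy API)."""
    return urlparse(url).hostname


start_urls = ["https://twitter.com/mycpl"]

# Derive the domain list from start_urls instead of repeating full URLs.
allowed_domains = sorted({domain_of(url) for url in start_urls})
print(allowed_domains)
```

In a spider, the resulting list could be assigned to the allowed_domains class attribute. Whether this matters for a single-page crawl like this one depends on whether the spider follows any links.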