Usage
Fetch 10 articles from Iltalehti with ILArticle, using its scrape and get methods:
from finscraper.spiders import ILArticle
spider = ILArticle()
spider.scrape(10)
articles = spider.get()
Use save and load to continue scraping later on:
save_dir = spider.save()
spider = ILArticle.load(save_dir)
articles = spider.scrape(10).get() # 20 articles in total
Items are fetched into the spider.jobdir directory, which is destroyed together with the spider object unless spider.save() has been called.
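The save and load cycle above can be wrapped in a small helper. This is a minimal sketch, not part of finscraper: scrape_and_resume is a hypothetical name, and it assumes only the scrape, save, load and get calls already shown in this section.

```python
def scrape_and_resume(spider_cls, first_batch=10, second_batch=10):
    """Scrape, save, reload, and scrape again.

    spider_cls is any class exposing the scrape/save/load/get interface
    shown above, e.g. finscraper.spiders.ILArticle. This helper is
    hypothetical and only illustrates the order of the calls.
    """
    spider = spider_cls()
    spider.scrape(first_batch)

    # Persist the spider so its items outlive this object.
    save_dir = spider.save()

    # Restore a fresh object from disk; this could equally happen
    # in a later session, as long as save_dir is still available.
    resumed = spider_cls.load(save_dir)

    # Items from both batches are returned together.
    items = resumed.scrape(second_batch).get()
    return items, save_dir
```

Calling scrape_and_resume(ILArticle) would perform real requests, so it needs finscraper installed and network access.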
Because Scrapy is used under the hood, any of its settings can be passed into the scrape method.
For example, to limit the number of concurrent requests per domain:
from finscraper.spiders import ILArticle
settings = {'CONCURRENT_REQUESTS_PER_DOMAIN': 1}
spider = ILArticle().scrape(10, settings=settings)
articles = spider.get()
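Several settings can be combined in the same dictionary. The sketch below is one possible "polite" configuration; polite_settings and scrape_politely are hypothetical helper names, while the keys are standard Scrapy settings (CONCURRENT_REQUESTS_PER_DOMAIN, DOWNLOAD_DELAY, AUTOTHROTTLE_ENABLED). The values are illustrative assumptions, not recommendations from finscraper.

```python
def polite_settings(delay_seconds=1.0):
    """Build a Scrapy settings dict that keeps the request rate low.

    Hypothetical helper; the values are illustrative only.
    """
    return {
        # One request at a time per domain, as in the example above.
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        # Seconds to wait between consecutive requests to the same site.
        'DOWNLOAD_DELAY': delay_seconds,
        # Let Scrapy adapt the delay to observed server latency.
        'AUTOTHROTTLE_ENABLED': True,
    }


def scrape_politely(n=10):
    """Fetch n articles with the settings above (performs real requests)."""
    # Imported lazily so that building the settings does not require
    # finscraper to be installed.
    from finscraper.spiders import ILArticle

    spider = ILArticle().scrape(n, settings=polite_settings())
    return spider.get()
```

Keeping the settings in a function makes it easy to reuse the same dictionary across several spiders.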