Usage

Fetch 10 articles from Iltalehti with the ILArticle spider, using its scrape and get methods:

from finscraper.spiders import ILArticle

spider = ILArticle()
spider.scrape(10)
articles = spider.get()

Use save and load to continue scraping later:

save_dir = spider.save()
spider = ILArticle.load(save_dir)
articles = spider.scrape(10).get()  # 20 articles in total

Items are fetched into the spider.jobdir directory, which is destroyed together with the spider object unless spider.save() has been called.
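
The save and load workflow above can be wrapped in a small helper that resumes from a saved directory when one exists and starts fresh otherwise. This is only a sketch: resume_or_start and demo are hypothetical names, not part of finscraper, and the helper assumes nothing beyond the constructor and the load classmethod shown above.

```python
import os


def resume_or_start(spider_cls, save_dir=None):
    """Return a spider loaded from save_dir if it exists, else a fresh one.

    Hypothetical helper, not part of finscraper. It relies only on the
    no-argument constructor and the load classmethod shown above.
    """
    if save_dir and os.path.isdir(save_dir):
        return spider_cls.load(save_dir)
    return spider_cls()


def demo():
    """Illustrative usage; needs finscraper installed and network access."""
    from finscraper.spiders import ILArticle

    spider = resume_or_start(ILArticle)  # first session: fresh spider
    spider.scrape(10)
    save_dir = spider.save()  # keeps the job directory on disk

    spider = resume_or_start(ILArticle, save_dir)  # later session: resume
    return spider.scrape(10).get()
```

Passing the spider class as an argument keeps the helper independent of any particular spider, so it should work for any class that offers the same constructor and load method.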

Because Scrapy is used under the hood, any of its settings can be passed to the scrape method. For example, to limit the number of concurrent requests per domain:

from finscraper.spiders import ILArticle

settings = {'CONCURRENT_REQUESTS_PER_DOMAIN': 1}

spider = ILArticle().scrape(10, settings=settings)
articles = spider.get()
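
Several Scrapy settings can be combined in the same dictionary. The sketch below builds one that keeps the load on the target site low. polite_settings and demo are hypothetical names, not part of finscraper. The setting keys are standard Scrapy settings, but whether finscraper overrides any of them internally is not stated here, so treat the result as a starting point.

```python
def polite_settings(delay_seconds=1.0):
    """Build a Scrapy settings dict that limits load on the target site.

    Hypothetical helper, not part of finscraper.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    return {
        # At most one request at a time per domain.
        "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
        # Seconds to wait between consecutive requests to the same site.
        "DOWNLOAD_DELAY": delay_seconds,
        # Let Scrapy adjust the delay based on observed latency.
        "AUTOTHROTTLE_ENABLED": True,
    }


def demo():
    """Illustrative usage; needs finscraper installed and network access."""
    from finscraper.spiders import ILArticle

    spider = ILArticle().scrape(10, settings=polite_settings(2.0))
    return spider.get()
```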