.. _faq:

FAQ
===

Existing study: Why you shouldn't use **Crawly**?
+++++++++++++++++++++++++++++++++++++++++++++++++

First of all, have you checked Scrapy (http://scrapy.org)? If not, you should:
it's a very powerful framework. Unfortunately, in my case I found some
drawbacks with Scrapy that led me to create **Crawly**:

* Scrapy was too big and too hard to hack on. I had some problems with it,
  especially concerning the consistency of scraped data, which is a huge
  problem when it comes to scrapers, and a lot of spaghetti code made it very
  hard to dig into :).
* Most websites look the same (at least the ones I crawled), but Scrapy
  didn't help keep my code clean because of all the boilerplate.
* Scrapy is huge in terms of architecture (scrapyd, web interface, ...), all
  of which consumed a lot of memory. My little server couldn't handle it, so
  other processes crashed every time Scrapy started crawling.

Define the need: Why did I create **Crawly**?
+++++++++++++++++++++++++++++++++++++++++++++

Because I love micro-frameworks (Flask vs. Django) and because I believe
that::

    Inside every large, complex program is a small, elegant program that
    does the same thing, correctly.

    -- Tony Hoare

And because I wanted to fix all the problems listed above without having to
dig into Scrapy. When I weighed the cost of digging into Scrapy against the
cost of creating a new crawler library, and what I would gain from each,
well, guess what?!

Goals: What should a crawler library do?
++++++++++++++++++++++++++++++++++++++++

IMHO, a crawler library should (**not** in order of importance):

- Simple usage: Make it easy to instruct the library to crawl a given
  website by handling the most common website design patterns, for example
  single page, list->detail, paginate->list->detail, and make it easy to
  extend for special websites.
- Feedback: Log everything to the user.
- Configurable: Something every library should offer.
- Encoding: Handle all HTML encodings (UTF-8, Latin-1, ...).
- Scraping: Give the developer an easy way to extract data from a website,
  using XPath for example.
- Speed: Crawling a website should be fast.
- Play nice: Handle rate limits, so we don't DoS the servers.

Status: How does Crawly compare to Scrapy?
++++++++++++++++++++++++++++++++++++++++++

* Speed: In terms of speed, I can tell you with confidence that Crawly is
  very fast, all thanks to Gevent. In most of my tests Crawly was a
  **little bit faster** than Scrapy, but nothing very noticeable (a few
  seconds of difference), because Scrapy is already very fast.
* Memory: Crawly is small, so in terms of memory it's very light.
* Usage/Simplicity: I may be a little biased on this one, but this was one
  of the main reasons I created Crawly.
* Features: For the time being, Scrapy has a lot of features that have no
  match in Crawly.
* Maturity: Crawly is still new at this stage, while Scrapy is a very
  mature open source project, so there's nothing to compare here :)
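To make the scraping and "play nice" goals above concrete, here is a minimal,
generic sketch, **not** Crawly's actual API: the names (``extract_links``,
``RateLimiter``), the sample page, and the request rate are all hypothetical,
and it uses only the standard library's limited XPath subset rather than a
full XPath engine::

    import time
    import xml.etree.ElementTree as ET

    # A toy "page" standing in for downloaded HTML; a real crawler
    # would fetch this over HTTP.
    LISTING_PAGE = """
    <html>
      <body>
        <ul class="products">
          <li><a href="/item/1">Widget</a></li>
          <li><a href="/item/2">Gadget</a></li>
        </ul>
      </body>
    </html>
    """

    def extract_links(page_source):
        """Pull every product link out of a listing page using an
        XPath-style expression (ElementTree supports a subset)."""
        tree = ET.fromstring(page_source)
        return [a.get("href") for a in tree.findall(".//li/a")]

    class RateLimiter:
        """Sleep between requests so the crawler never exceeds
        max_per_second fetches -- the "play nice" goal."""

        def __init__(self, max_per_second):
            self.min_interval = 1.0 / max_per_second
            self.last = 0.0

        def wait(self):
            elapsed = time.monotonic() - self.last
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last = time.monotonic()

    limiter = RateLimiter(max_per_second=5)
    links = extract_links(LISTING_PAGE)
    for link in links:
        limiter.wait()  # throttle before each (hypothetical) fetch
        # fetch(link) would go here in a real crawler

    print(links)  # ['/item/1', '/item/2']

The same list->detail shape repeats for paginate->list->detail: extract the
"next page" link with another expression and loop, throttling each request
through the one shared limiter.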