.. _faq:

FAQ
===

Existing study: Why you shouldn't use **Crawly**?
+++++++++++++++++++++++++++++++++++++++++++++++++

First of all, have you checked Scrapy (http://scrapy.org)? If not, you should:
it's a very powerful framework. Unfortunately, in my case I found some
drawbacks with Scrapy that led me to create **Crawly**:

* Scrapy was too big and too hard to hack on. I had some problems with it,
  especially concerning the consistency of scraped data, which is a huge
  problem when it comes to scrapers, and a lot of spaghetti code made it very
  hard to dig into :).
* Most websites look the same (at least the ones I crawled), but Scrapy
  didn't help keep my code clean because of all the boilerplate.
* Scrapy is huge in terms of architecture (scrapyd, web interface, ...), all
  of which consumed a lot of memory. My little server couldn't handle it, so
  other processes crashed every time Scrapy started crawling.

Define the need: Why did I create **Crawly**?
+++++++++++++++++++++++++++++++++++++++++++++

Because I love micro-frameworks (Flask vs. Django) and because I believe
that::

    Inside every large, complex program is a small, elegant program that
    does the same thing, correctly.

    -- Tony Hoare

And because I wanted to fix all the problems listed above without having to
dig into Scrapy. When I weighed the cost of digging into Scrapy against the
cost of creating a new crawler library, and what I would gain from each,
well, guess what?!

Goals: What should a crawler library do?
++++++++++++++++++++++++++++++++++++++++

IMHO, a crawler library should (**not** in order of importance):

- Simple usage: Make it easy to instruct the library to crawl a given
  website by handling the most common website design patterns, for example
  single page, list->detail, paginate->list->detail, and make it easy to
  extend for special websites.
- Feedback: Log everything to the user.
- Configurable: Something every library should offer.
- Encoding: Handle all HTML encodings (UTF-8, Latin-1, ...).
- Scraping: Give the developer an easy way to extract data from a website,
  using XPath for example.
- Speed: Crawling a website should be fast.
- Play nice: Handle rate limits, so we don't DoS the servers.

Status: How does Crawly compare to Scrapy?
++++++++++++++++++++++++++++++++++++++++++

* Speed: In terms of speed, I can tell you with confidence that Crawly is
  very fast, all thanks to Gevent. In most of my tests Crawly was a
  **little bit faster** than Scrapy, but nothing very noticeable (a few
  seconds of difference), because Scrapy is already very fast.
* Memory: Crawly is small, so in terms of memory it's very light.
* Usage/Simplicity: I may be a little biased on this one, but this was one
  of the main reasons I created Crawly.
* Features: For the time being, Scrapy has a lot of features that have no
  match in Crawly.
* Maturity: Crawly is still new at this stage, while Scrapy is a very
  mature open source project, so there's nothing to compare here :)
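To make the scraping and "play nice" goals above concrete, here is a minimal,
generic sketch, **not** Crawly's actual API: the names (``extract_links``,
``RateLimiter``), the sample page, and the request rate are all hypothetical,
and it uses only the standard library's limited XPath subset rather than a
full XPath engine::

    import time
    import xml.etree.ElementTree as ET

    # A toy "page" standing in for downloaded HTML; a real crawler
    # would fetch this over HTTP.
    LISTING_PAGE = """
    <html>
      <body>
        <ul class="products">
          <li><a href="/item/1">Widget</a></li>
          <li><a href="/item/2">Gadget</a></li>
        </ul>
      </body>
    </html>
    """

    def extract_links(page_source):
        """Pull every product link out of a listing page using an
        XPath-style expression (ElementTree supports a subset)."""
        tree = ET.fromstring(page_source)
        return [a.get("href") for a in tree.findall(".//li/a")]

    class RateLimiter:
        """Sleep between requests so the crawler never exceeds
        max_per_second fetches -- the "play nice" goal."""

        def __init__(self, max_per_second):
            self.min_interval = 1.0 / max_per_second
            self.last = 0.0

        def wait(self):
            elapsed = time.monotonic() - self.last
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last = time.monotonic()

    limiter = RateLimiter(max_per_second=5)
    links = extract_links(LISTING_PAGE)
    for link in links:
        limiter.wait()  # throttle before each (hypothetical) fetch
        # fetch(link) would go here in a real crawler

    print(links)  # ['/item/1', '/item/2']

The same list->detail shape repeats for paginate->list->detail: extract the
"next page" link with another expression and loop, throttling each request
through the one shared limiter.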