.. crawly documentation master file, created by
   sphinx-quickstart on Tue Oct 23 19:13:38 2012.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

==================================
Crawly: Micro crawler for Python
==================================

**Crawly** is a Python library that lets you crawl a website and extract data from it using a simple API.

**Crawly** works by combining a few well-known tools into a small library (~350 lines of code) that fetches a website's HTML, crawls it (follows links), and extracts data from each page (see the quick example below).

Libraries used:

* `requests <http://docs.python-requests.org/>`_
  A Python HTTP library, used by **crawly** to fetch website HTML. It takes care of maintaining the connection pool, is easily configurable, and supports many features, including SSL, cookies, persistent sessions, and HTML decoding.
* `gevent <http://www.gevent.org/>`_
  The engine responsible for crawly's speed: with gevent you can run concurrent code using green threads.
* `lxml <http://lxml.de/>`_
  A fast, easy-to-use Python library, used to parse the fetched HTML and make data extraction easy.
* `logging <http://docs.python.org/library/logging.html>`_
  The Python standard library module used to log information; also easily configurable.

User Guide:
-----------

.. toctree::
   :maxdepth: 2

   install
   api
   examples
   faq
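
Quick example:
--------------

Crawly's own API is covered in the :doc:`api` and :doc:`examples` pages. The sketch below is not crawly's actual API; it is a minimal illustration of how the libraries listed above combine: it fetches a few pages concurrently with gevent green threads, parses each one with lxml, and logs the links it finds. The seed URLs are placeholders.

.. code-block:: python

    # Minimal sketch of the technique crawly builds on; NOT crawly's API.
    from gevent import monkey
    monkey.patch_all()  # make blocking socket calls cooperative with gevent

    import logging

    import gevent
    import requests
    from lxml import html

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("mini-crawler")

    SEED_URLS = [  # placeholder URLs, replace with a real site
        "http://example.com/",
        "http://example.org/",
    ]

    def fetch_and_extract(url):
        """Fetch one page, parse it, and log the links it contains."""
        response = requests.get(url, timeout=10)
        tree = html.fromstring(response.content)
        tree.make_links_absolute(url)  # resolve relative hrefs against the page URL
        links = tree.xpath("//a/@href")
        log.info("%s -> %d links", url, len(links))
        return links

    # One green thread per URL; gevent switches between them on network I/O.
    jobs = [gevent.spawn(fetch_and_extract, url) for url in SEED_URLS]
    gevent.joinall(jobs)

A real crawler would feed the extracted links back into a queue of pages to visit; that is the "crawl it (follow links)" step crawly handles for you.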