.. crawly documentation master file, created by
   sphinx-quickstart on Tue Oct 23 19:13:38 2012.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

==================================
Crawly: Micro crawler for Python
==================================
**Crawly** is a Python library that lets you crawl websites and extract data
from them using a simple API.
**Crawly** works by combining different tools into a small library (~350 lines
of code) that fetches website HTML, crawls it (follows links) and extracts data
from each page, as sketched after the list below.
Libraries used:
* `requests <https://requests.readthedocs.io/>`_ A Python HTTP library used by
  **crawly** to fetch website HTML. It takes care of maintaining the connection
  pool, is easily configurable, and supports many features including SSL,
  cookies, persistent connections, HTML decoding, and more.

* `gevent <http://www.gevent.org/>`_ The engine responsible for **crawly**'s
  speed; gevent lets you run concurrent code using green threads.

* `lxml <https://lxml.de/>`_ A fast, easy-to-use Python library used to parse
  the fetched HTML and make data extraction easy.

* `logging <https://docs.python.org/3/library/logging.html>`_ The Python
  standard library module used to log information; it is also easily
  configurable.
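
The sketch below shows how these pieces can fit together. It is not
**crawly**'s own API, only an illustration of the same approach under stated
assumptions: the URLs and the ``fetch`` helper are placeholders, gevent green
threads drive concurrent downloads, requests fetches each page over a pooled
connection, lxml parses the HTML, and logging reports progress.

.. code-block:: python

   # Illustrative sketch only -- not crawly's API.  Combines the libraries
   # listed above: gevent (concurrency), requests (HTTP), lxml (parsing) and
   # logging (progress output).  URLs and the fetch() helper are placeholders.
   from gevent import monkey
   monkey.patch_all()  # make blocking sockets cooperative for green threads

   import logging

   import gevent
   import requests
   from lxml import html

   logging.basicConfig(level=logging.INFO)
   log = logging.getLogger("mini-crawler")

   session = requests.Session()  # reuses a connection pool between requests


   def fetch(url):
       """Fetch a page, log its title and return the links it contains."""
       response = session.get(url, timeout=10)
       tree = html.fromstring(response.content)
       title = (tree.findtext(".//title") or "").strip()
       links = tree.xpath("//a/@href")
       log.info("%s -> %r (%d links)", url, title, len(links))
       return links


   if __name__ == "__main__":
       seeds = ["https://example.com/", "https://www.python.org/"]
       # Each fetch runs in its own green thread; all share one OS thread.
       jobs = [gevent.spawn(fetch, url) for url in seeds]
       gevent.joinall(jobs, timeout=30)

Because ``monkey.patch_all()`` makes the standard socket module cooperative,
the blocking network calls inside requests yield to other green threads while
waiting on I/O, which is where the concurrency comes from.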
User Guide:
-----------
.. toctree::
   :maxdepth: 2

   install
   api
   examples
   faq