.. crawly documentation master file, created by
   sphinx-quickstart on Tue Oct 23 19:13:38 2012.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

==================================
Crawly: Micro crawler for Python
==================================

**Crawly** is a Python library that lets you crawl a website and extract data from it using a simple API.

**Crawly** works by combining a few well-known tools into a small library (~350 lines of code) that fetches a website's HTML, crawls it (follows links), and extracts data from each page (see the quick example below).

Libraries used:

* `requests <http://docs.python-requests.org/>`_
  A Python HTTP library, used by **crawly** to fetch website HTML. It takes care of maintaining the connection pool, is easily configurable, and supports many features, including SSL, cookies, persistent sessions, and HTML decoding.
* `gevent <http://www.gevent.org/>`_
  The engine responsible for crawly's speed: with gevent you can run concurrent code using green threads.
* `lxml <http://lxml.de/>`_
  A fast, easy-to-use Python library, used to parse the fetched HTML and make data extraction easy.
* `logging <http://docs.python.org/library/logging.html>`_
  The Python standard library module used to log information; also easily configurable.

User Guide:
-----------

.. toctree::
   :maxdepth: 2

   install
   api
   examples
   faq
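
Quick example:
--------------

Crawly's own API is covered in the :doc:`api` and :doc:`examples` pages. The sketch below is not crawly's actual API; it is a minimal illustration of how the libraries listed above combine: it fetches a few pages concurrently with gevent green threads, parses each one with lxml, and logs the links it finds. The seed URLs are placeholders.

.. code-block:: python

    # Minimal sketch of the technique crawly builds on; NOT crawly's API.
    from gevent import monkey
    monkey.patch_all()  # make blocking socket calls cooperative with gevent

    import logging

    import gevent
    import requests
    from lxml import html

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("mini-crawler")

    SEED_URLS = [  # placeholder URLs, replace with a real site
        "http://example.com/",
        "http://example.org/",
    ]

    def fetch_and_extract(url):
        """Fetch one page, parse it, and log the links it contains."""
        response = requests.get(url, timeout=10)
        tree = html.fromstring(response.content)
        tree.make_links_absolute(url)  # resolve relative hrefs against the page URL
        links = tree.xpath("//a/@href")
        log.info("%s -> %d links", url, len(links))
        return links

    # One green thread per URL; gevent switches between them on network I/O.
    jobs = [gevent.spawn(fetch_and_extract, url) for url in SEED_URLS]
    gevent.joinall(jobs)

A real crawler would feed the extracted links back into a queue of pages to visit; that is the "crawl it (follow links)" step crawly handles for you.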