.. _api:

API
===

.. module:: crawly

This part of the documentation covers all the interfaces of crawly.

Runner:
-------

This class is not offered as a public interface; instead, users should use the
``runner`` module attribute, which is an instance of ``_Runner``.

.. autoclass:: crawly._Runner
    :inherited-members:

Website structures:
~~~~~~~~~~~~~~~~~~~

.. autoclass:: crawly.WebSite
    :inherited-members:
    :members:

.. autoclass:: crawly.Pagination
    :inherited-members:
    :members:

.. autoclass:: crawly.WebPage
    :inherited-members:
    :members:

.. autoclass:: crawly.HTML
    :inherited-members:

Extraction tools:
~~~~~~~~~~~~~~~~~

.. autoclass:: crawly.XPath
    :inherited-members:
    :members:

Exceptions:
~~~~~~~~~~~

.. autoexception:: crawly.ExtractionError

Configuration:
~~~~~~~~~~~~~~

Crawly can be configured by passing a JSON-formatted file to the ``--config``
command-line option. The file overrides the default configuration, which is a
combination of `requests configuration <http://tinyurl.com/dyvdj57>`_ and
`logging configuration <http://tinyurl.com/crt6rkw>`_.

.. global-configuration:

.. code-block:: python

    {
        'timeout': 15,
        # Requests configuration: http://tinyurl.com/dyvdj57
        'requests': {
            'base_headers': {
                'Accept': '*/*',
                'Accept-Encoding': 'identity, deflate, compress, gzip',
                'User-Agent': 'crawly/' + __version__
            },
            'danger_mode': False,
            'encode_uri': True,
            'keep_alive': True,
            'max_redirects': 30,
            'max_retries': 3,
            'pool_connections': 10,
            'pool_maxsize': 10,
            'safe_mode': True,  # Defaults to False in requests.
            'strict_mode': False,
            'trust_env': True,
            'verbose': False
        },
        # Logging configuration: http://tinyurl.com/crt6rkw
        'logging': {
            'version': 1,
            'formatters': {
                'standard': {
                    'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
                }
            },
            'handlers': {
                'console': {
                    'formatter': 'standard',
                    'class': 'logging.StreamHandler',
                }
            },
            'loggers': {
                '': {
                    'handlers': ['console'],
                    'level': 'DEBUG',
                    'propagate': False,
                }
            }
        }
    }
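
The default configuration is written as a Python literal, while the file given
to ``--config`` must be valid JSON (double-quoted strings, ``true``/``false``,
no comments). The following is a minimal sketch of how an override file could
be produced with the standard ``json`` module. The chosen values are
illustrative assumptions, and this section does not state whether nested
sections such as ``requests`` are merged with the defaults or replaced
entirely, so that behaviour should be checked before relying on a partial
override.

.. code-block:: python

    import json

    # Hypothetical overrides: the key names come from the default
    # configuration in this section, the values are only examples.
    overrides = {
        "timeout": 30,
        "requests": {
            "max_retries": 5,
            "verbose": True,
        },
    }

    # json.dumps converts Python literals to JSON syntax
    # (True becomes true, single quotes become double quotes).
    config_text = json.dumps(overrides, indent=4, sort_keys=True)
    print(config_text)

The printed text can be saved to a file (the name is up to the user) and its
path passed to the ``--config`` option.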