# Python Diffbot API Client


## Preface
Identify and extract the important parts of any web page in Python!  This client currently supports calls to Diffbot's Automatic APIs and Crawlbot.


## Installation
To install, activate a new virtual environment and run the following command:

    $ pip install -r requirements.txt

## Configuration

To run the example, you must first configure a working API token in config.py:

    $ cp config.py.example config.py; vim config.py;

Then replace the string "SOME_TOKEN" with your API token.  Finally, to run the example:

    $ python example.py
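
Because the example only works once the placeholder has been replaced, a small guard can catch a forgotten edit early. The sketch below is not part of this client: `require_token` and `PLACEHOLDER_TOKEN` are hypothetical names, and the only thing taken from the steps above is the placeholder string "SOME_TOKEN".

```python
# Hypothetical helper, not part of the Diffbot client.
PLACEHOLDER_TOKEN = "SOME_TOKEN"  # the placeholder string mentioned above


def require_token(token):
    """Return the token unchanged, or raise if it is empty or still the placeholder."""
    if not token or token == PLACEHOLDER_TOKEN:
        raise ValueError("Set a real Diffbot API token in config.py first.")
    return token
```

You could wrap whatever value your config.py exposes in `require_token(...)` before passing it to the client; the name of that config value is not specified here, so check your own copy of config.py.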

## Usage

### Article API
An example call to the Article API:

```
diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://shichuan.github.io/javascript-patterns/"
api = "article"
response = diffbot.request(url, token, api, version=version)
```
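
For orientation, a call like the one above presumably ends up as an HTTP GET against Diffbot's REST endpoint. The sketch below shows how such a URL could be assembled. The base URL and the `token`/`url` query parameter names are assumptions based on Diffbot's public API conventions, not something read from this client's source, and `build_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint layout: https://api.diffbot.com/v<version>/<api>
DIFFBOT_BASE = "https://api.diffbot.com"


def build_request_url(api, token, url, version=2):
    """Assemble a request URL for one Automatic API call (illustrative only)."""
    query = urlencode({"token": token, "url": url})  # percent-encodes the target URL
    return "%s/v%d/%s?%s" % (DIFFBOT_BASE, version, api, query)
```

The target URL has to be percent-encoded because it travels as a query parameter, which `urlencode` takes care of.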

### Product API
An example call to the Product API:

```
diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.overstock.com/Home-Garden/iRobot-650-Roomba-Vacuuming-Robot/7886009/product.html"
api = "product"
response = diffbot.request(url, token, api, version=version)
```

### Image API
An example call to the Image API:

```
diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.google.com/"
api = "image"
response = diffbot.request(url, token, api, version=version)
```

### Analyze API
An example call to the Analyze API:

```
diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.twitter.com/"
api = "analyze"
response = diffbot.request(url, token, api, version=version)
```
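
The Article, Product, Image, and Analyze calls differ only in the `api` string and the target URL, so several of them can be driven from one loop. This is a minimal sketch: `extract_all` is a hypothetical helper, and it relies only on the `request(url, token, api, version=...)` call shape shown in the examples above.

```python
def extract_all(client, token, targets, version=2):
    """Call client.request once per (api, url) pair and collect responses by API name.

    `client` is any object with a request(url, token, api, version=...) method,
    such as the DiffbotClient instance used in the examples above.
    """
    results = {}
    for api, url in targets:
        results[api] = client.request(url, token, api, version=version)
    return results
```

Passing the client in as an argument, rather than constructing it inside the helper, keeps the function easy to exercise with a stand-in object when no network access is available.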

### Crawlbot API
To start a new crawl, specify a crawl name, seed URLs, and the API via which URLs should be processed. An example call to the Crawlbot API:

```
token = "SOME_TOKEN"
name = "sampleCrawlName"
seeds = "http://www.twitter.com/"
api = "analyze"
sampleCrawl = DiffbotCrawl(token, name, seeds=seeds, api=api)
```

Omit "seeds" and "api" to load an existing crawl or to create a placeholder crawl.

To check the status of a crawl:

```
sampleCrawl.status()
```
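
Crawls run asynchronously, so a script will usually check the status more than once. The sketch below polls `status()` until a caller-supplied predicate says the crawl is finished. `wait_for_crawl` is a hypothetical helper, and because the shape of the value returned by `status()` is not documented here, the completion check is left to the caller rather than guessed.

```python
import time


def wait_for_crawl(crawl, is_done, interval=30.0, max_checks=20, sleep=time.sleep):
    """Poll crawl.status() until is_done(status) is true and return that status.

    Raises TimeoutError if the crawl is still not done after max_checks polls.
    """
    for attempt in range(max_checks):
        status = crawl.status()
        if is_done(status):
            return status
        if attempt < max_checks - 1:
            sleep(interval)  # wait before the next check
    raise TimeoutError("crawl did not finish after %d status checks" % max_checks)
```

The `sleep` parameter exists so that tests can pass a no-op function instead of actually waiting.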

To update a crawl:

```
maxToCrawl = 100
upp = "diffbot"
sampleCrawl.update(maxToCrawl=maxToCrawl, urlProcessPattern=upp)
```

To delete or restart a crawl:

```
sampleCrawl.delete()   # delete the crawl
sampleCrawl.restart()  # or restart it instead
```

To download crawl data:

```
sampleCrawl.download() # returns JSON by default
sampleCrawl.download(data_format="csv")
```
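
To keep downloaded data around, it can be written to a file. The sketch below is one way to do that. `save_crawl_data` is a hypothetical helper, and since the return type of `download()` is not stated here, it accepts text, bytes, or already-parsed JSON.

```python
import json


def save_crawl_data(crawl, path, data_format=None):
    """Download crawl data and write it to `path` as UTF-8 text; return the path."""
    if data_format is None:
        data = crawl.download()  # default format (JSON, per the example above)
    else:
        data = crawl.download(data_format=data_format)

    if isinstance(data, bytes):
        text = data.decode("utf-8")
    elif isinstance(data, str):
        text = data
    else:
        text = json.dumps(data, indent=2)  # parsed JSON: serialize it again

    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

For example, `save_crawl_data(sampleCrawl, "crawl.csv", data_format="csv")` would store the CSV export next to your script.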

To pass additional arguments to a crawl:

```
sampleCrawl = DiffbotCrawl(token, name, seeds, api, maxToCrawl=100, maxToProcess=50, notifyEmail="support@diffbot.com")
```

## Testing

First install the test requirements with the following command:

    $ pip install -r test_requirements.txt

The project currently has a few simple unit tests that mock the API calls and return data from fixtures on the filesystem.  From the project directory, run:

    $ nosetests
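
To illustrate the mocking approach, here is a minimal, self-contained sketch using the standard library's `unittest.mock`. It is not taken from this project's test suite: `fetch_title` is a hypothetical function under test, and the inline `FIXTURE` string is made-up data standing in for a fixture file.

```python
import json
import unittest
from unittest import mock

# Made-up stand-in for a JSON fixture file on disk.
FIXTURE = '{"type": "article", "title": "Sample Title"}'


def fetch_title(client, url, token):
    """Hypothetical code under test: return the title from an Article API response."""
    response = client.request(url, token, "article", version=2)
    return response["title"]


class FetchTitleTest(unittest.TestCase):
    def test_returns_title_from_fixture(self):
        client = mock.Mock()  # replaces the real client, so no network call is made
        client.request.return_value = json.loads(FIXTURE)

        title = fetch_title(client, "http://example.com/", "SOME_TOKEN")

        self.assertEqual(title, "Sample Title")
        client.request.assert_called_once_with(
            "http://example.com/", "SOME_TOKEN", "article", version=2
        )
```

A test case like this is a plain `unittest.TestCase`, so standard test runners should be able to discover it.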