Scrapoxy


What is Scrapoxy?

http://scrapoxy.io

Scrapoxy hides your scraper behind a cloud.

It starts a pool of proxies and sends your requests through them.

Now, you can crawl without thinking about blacklisting!

It is written in JavaScript (ES6) with Node.js and AngularJS, and it is open source!

How does Scrapoxy work?

  1. When Scrapoxy starts, it creates and manages a pool of proxies.

  2. Your scraper uses Scrapoxy as a normal proxy.

  3. Scrapoxy routes all requests through the pool of proxies.

_images/arch.gif
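
Step 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of Scrapoxy itself: it assumes Scrapoxy is already running and listening on localhost port 8888 (adjust to your configuration), and the helper names `proxy_settings` and `make_opener` are made up for this example. Only plain HTTP traffic is routed here.

```python
import urllib.request

# Assumption: Scrapoxy listens locally on port 8888; change this to match your setup.
SCRAPOXY_PROXY = "http://localhost:8888"


def proxy_settings(proxy_url=SCRAPOXY_PROXY):
    """Point plain HTTP traffic at the single Scrapoxy endpoint."""
    return {"http": proxy_url}


def make_opener(proxy_url=SCRAPOXY_PROXY):
    """Build a urllib opener that sends HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(proxy_url))
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_opener()
    # Needs a running Scrapoxy instance; example.com is a placeholder target.
    with opener.open("http://example.com", timeout=10) as response:
        print(response.status)
```

The scraper only knows one proxy address. Which instance of the pool actually carries each request is decided by Scrapoxy.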

What does Scrapoxy do?

  • Create your own proxies

  • Use multiple cloud providers (AWS, DigitalOcean, OVH, Vscale)

  • Rotate IP addresses

  • Impersonate known browsers

  • Exclude blacklisted instances

  • Monitor the requests

  • Detect bottlenecks

  • Optimize the scraping

Why doesn’t Scrapoxy support anti-blacklisting?

Anti-blacklisting is a job for the scraper.

When the scraper detects blacklisting, it asks Scrapoxy (through a REST API) to remove the offending proxy from the pool.
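
As a rough sketch of that flow in Python: the scraper decides that a response looks blacklisted, then builds a REST call asking Scrapoxy to drop the instance. Everything specific below is an assumption made for illustration and should be checked against the API documentation of your Scrapoxy version: the URL and port, the JSON body with a `name` field, the base64-encoded password in the `Authorization` header, and the status codes treated as blacklisting. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumptions for illustration only: endpoint, port and password are placeholders.
COMMANDER_URL = "http://localhost:8889/api/instances/stop"
PASSWORD = "CHANGE_ME"

# Hypothetical heuristic: each scraper defines what "blacklisted" means for its target.
BLACKLIST_STATUS_CODES = {403, 429}


def is_blacklisted(status_code):
    """Return True when the target site's answer looks like a ban."""
    return status_code in BLACKLIST_STATUS_CODES


def build_stop_request(instance_name, password=PASSWORD, url=COMMANDER_URL):
    """Build (but do not send) the request asking Scrapoxy to remove an instance."""
    body = json.dumps({"name": instance_name}).encode("utf-8")
    # Assumed auth scheme: the password, base64-encoded, in the Authorization header.
    token = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": token},
    )


if __name__ == "__main__":
    # "proxy-1" is a placeholder for the name of the instance that served the request.
    if is_blacklisted(403):
        request = build_stop_request("proxy-1")
        # Needs a running Scrapoxy instance to succeed.
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.status)
```

Keeping detection in the scraper means the rule can be as site-specific as needed (status codes, captcha pages, empty results), while Scrapoxy only has to act on the removal request.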

What is the best scraper framework to use with Scrapoxy?

You could use the open source Scrapy framework (Python).

Does Scrapoxy have a SaaS mode or a support plan?

Scrapoxy is an open source tool. The source code is actively maintained. You are very welcome to open an issue for feature requests or bugs.

If you are looking for a commercial product in SaaS mode or with a support plan, we recommend checking the ScrapingHub products (ScrapingHub is the company that maintains the Scrapy framework).

Documentation

You can begin with the Quick Start or look at the Changelog.

Then continue with Standard, and become an expert with Advanced.

Finish with the Tutorials.

Prerequisite

Contribute

You can open an issue on this repository for any feedback (bug, question, request, pull request, etc.).

License

See the License.

And don’t forget to be POLITE when you write your scrapers!