Scrapes the web for working http/https proxies and tests them for speed and reliability. Can be used as both a Python module and a command line utility. When run as a command line utility, proxies are sent to stdout. When run as a module, it returns a generator.
Check out the project on PyPI at https://pypi.python.org/pypi/grey_harvest/0.1.3.5
- Quickly and easily generate a list of reliable http/https proxies
- Usable as a command line utility or a Python module
- Can filter for proxies that support SSL
- Can filter for proxies located within specific countries
- Can exclude proxies located within specific countries
First, install the following dependencies:
# On CentOS/RHEL/Fedora:
sudo yum install python-devel libxslt-devel libxml2-devel

# On Debian/Ubuntu:
sudo apt-get install python-dev libxml2-dev libxslt1-dev
Then install grey_harvest using pip as follows:
pip install grey_harvest
We can generate a list of 10 viable proxies with the following command:
# use the -n flag to specify number of proxies to generate
grey_harvest -n 10
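Because proxies are sent to stdout, the output can be redirected or piped like any other command line tool. For example, to save the list to a file for later use (the filename here is just an illustration):

# save 10 proxies to a file
grey_harvest -n 10 > proxies.txt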
To select only proxies with SSL enabled, we do this:
# use the -H flag to select only https proxies
grey_harvest -n 10 -H
We can use the -a flag to filter for proxies located within a list of specific countries. For example, to choose proxies located within Ukraine, Hong Kong, and the United States, we'd use this:
# use the -a flag to filter by country
grey_harvest -a "United States" "Hong Kong" Ukraine -n 10
We can use the -p flag to filter for proxies running on specific ports:
# use the -p flag to only use proxies that run on port 80
grey_harvest -p 80 -n 10
We can deny proxies located within specific countries by using the -d flag. Proxies located within China are blocked by default, as they often sit behind the Great Firewall and as such tend to be unreliable. This default can be changed in grey_harvest.py's internal configs:
# use the -d flag to deny proxies located within France and Germany
grey_harvest -d France Germany -n 10 -H
Before diving into the documentation for the grey_harvest library, check out how easily we can generate a list of 20 proxies:
import grey_harvest

''' spawn a harvester '''
harvester = grey_harvest.GreyHarvester()

''' harvest some proxies from the interwebz '''
count = 0
for proxy in harvester.run():
    print(proxy)
    count += 1
    if count >= 20:
        break
That's it. We now have 20 http/https proxies ready to go.
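As a rough sketch of how a harvested proxy might be plugged into other code, here is one way to route a request through the first proxy using the third-party requests library. This example assumes the proxy object's string form is a plain "host:port" pair; check str(proxy) in your own environment before relying on it.

import grey_harvest
import requests  # third-party HTTP library, installed separately

''' spawn a harvester and grab a single proxy from the generator '''
harvester = grey_harvest.GreyHarvester()
proxy = next(harvester.run())

''' assumption: str(proxy) yields a "host:port" string '''
proxy_url = 'http://%s' % proxy
response = requests.get('https://example.com',
                        proxies={'http': proxy_url, 'https': proxy_url},
                        timeout=10)
print(response.status_code)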