Using CloudScraper with Proxies


This guide covers setting up a CloudScraper proxy integration, rotating IPs, and using authenticated proxies for seamless scraping.

About CloudScraper

CloudScraper is a Python module designed to bypass Cloudflare's anti-bot page (commonly known as "I'm Under Attack Mode" or IUAM). Under the hood, it is implemented using Requests, one of the most popular Python HTTP clients.

Why Use Proxies with CloudScraper?

Cloudflare may block your IP if you make too many requests or trigger its more sophisticated defenses. Combining proxies with CloudScraper when scraping sites protected by Cloudflare offers two key benefits:

  • Enhanced security and anonymity: routing requests through a proxy hides your real IP address, reducing the risk of detection.
  • Avoiding blocks and interruptions: proxies let you rotate IP addresses dynamically, helping you bypass IP bans and rate limiters.

Setting Up a Proxy With CloudScraper

Step #1: Install CloudScraper

Install the cloudscraper pip package:

pip install -U cloudscraper

The -U option ensures that you are getting the latest version of the package with the latest workarounds for Cloudflare's anti-bot engine.

Step #2: Initialize CloudScraper

Import CloudScraper:

import cloudscraper

Create a CloudScraper instance using the create_scraper() method:

scraper = cloudscraper.create_scraper()

The scraper object works similarly to the Session object from the requests library. In particular, it enables you to make HTTP requests while bypassing Cloudflare's anti-bot measures.

Step #3: Integrate a Proxy

Define a proxies dictionary and pass it to the get() method as shown below:

proxies = {
    "http": "<YOUR_HTTP_PROXY_URL>",
    "https": "<YOUR_HTTPS_PROXY_URL>"
}

# Perform a request through the specified proxy
response = scraper.get("<YOUR_TARGET_URL>", proxies=proxies)

The proxies parameter in the get() method is passed down to Requests. This allows the HTTP client to route your request through the specified HTTP or HTTPS proxy server, depending on the protocol of your target URL.
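To make the selection rule concrete: Requests picks the entry whose key matches the target URL's scheme. The stdlib sketch below mimics that lookup — proxy_for is a hypothetical helper and the proxy addresses are placeholders, shown only to illustrate the scheme-to-key matching:

```python
from urllib.parse import urlparse

proxies = {
    "http": "http://203.0.113.10:8080",   # placeholder HTTP proxy
    "https": "http://203.0.113.11:8080",  # placeholder HTTPS proxy
}

def proxy_for(url, proxies):
    """Mimic how Requests chooses a proxy: match the URL scheme to a dict key."""
    scheme = urlparse(url).scheme
    return proxies.get(scheme)

print(proxy_for("https://example.com/page", proxies))  # the "https" entry
print(proxy_for("http://example.com/page", proxies))   # the "http" entry
```

This is why both keys usually point at the same proxy URL: the target site's scheme, not the proxy's, decides which entry is used.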

Step #4: Test the CloudScraper Proxy Integration Setup

For demonstration purposes, let's target the /ip endpoint of the HTTPBin project. This endpoint returns the caller's IP address. If everything works as expected, the response should display the IP address of the proxy server.

Assuming that the URL for the proxy server is http://202.159.35.121:443, this will be the script code:

import cloudscraper

# Create a CloudScraper instance
scraper = cloudscraper.create_scraper()

# Specify your proxy
proxies = {
    "http": "http://202.159.35.121:443",
    "https": "http://202.159.35.121:443"
}

# Make a request through the proxy
response = scraper.get("https://httpbin.org/ip", proxies=proxies)

# Print the response from the "/ip" endpoint
print(response.text)

You should see a response like this:

{
    "origin": "202.159.35.121"
}

The IP in the response matches the IP of the proxy server, as expected.
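Rather than eyeballing the output, you can check the match programmatically by comparing the reported origin against the proxy's host. A minimal sketch, using a sample /ip payload like the one above instead of a live request:

```python
import json
from urllib.parse import urlparse

# Sample /ip response body, as returned by httpbin.org through the proxy
response_text = '{\n    "origin": "202.159.35.121"\n}'

proxy_url = "http://202.159.35.121:443"

# httpbin reports the caller's IP; it should equal the proxy's host
origin = json.loads(response_text)["origin"]
proxy_host = urlparse(proxy_url).hostname

print(origin == proxy_host)  # True when traffic went through the proxy
```

In a real test you would set response_text = response.text after the scraper.get() call.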

Note:
Free proxy servers are often short-lived, so the IP above will likely be dead by the time you read this. Obtain a fresh proxy IP before testing the script.

Implementing Proxy Rotation

Retrieve a list of proxies from a reliable provider and store them in an array:

proxy_list = [
    {"http": "<YOUR_PROXY_URL_1>", "https": "<YOUR_PROXY_URL_1>"},
    # ...
    {"http": "<YOUR_PROXY_URL_n>", "https": "<YOUR_PROXY_URL_n>"},
]

Next, use the random.choice() method to randomly select a proxy from the list:

import random

random_proxy = random.choice(proxy_list)

Set the randomly selected proxy in the get() request:

response = scraper.get("<YOUR_TARGET_URL>", proxies=random_proxy)

If everything is set up correctly, the request will use a different proxy from the list at each run. Here is the complete code:

import cloudscraper
import random

# Create a Cloudscraper instance
scraper = cloudscraper.create_scraper()

# List of proxy URLs (replace with actual proxy URLs)
proxy_list = [
    {"http": "<YOUR_PROXY_URL_1>", "https": "<YOUR_PROXY_URL_1>"},
    # ...
    {"http": "<YOUR_PROXY_URL_n>", "https": "<YOUR_PROXY_URL_n>"},
]

# Randomly select a proxy from the list
random_proxy = random.choice(proxy_list)

# Make a request using the randomly selected proxy
# (replace with the actual target URL)
response = scraper.get("<YOUR_TARGET_URL>", proxies=random_proxy)
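The code above picks one proxy per run; in practice you usually also want to fall back to another proxy when the chosen one fails. The sketch below shows that pattern — fetch_with_rotation is a hypothetical helper, and the demo uses a fake fetcher so it runs offline; with CloudScraper you would pass scraper.get as the fetch argument:

```python
import random

def fetch_with_rotation(url, proxy_list, fetch, max_attempts=3):
    """Try up to max_attempts randomly ordered proxies; return the first success."""
    candidates = random.sample(proxy_list, k=min(max_attempts, len(proxy_list)))
    last_error = None
    for proxy in candidates:
        try:
            return fetch(url, proxies=proxy)
        except Exception as exc:  # a dead proxy raises a connection error
            last_error = exc
    raise last_error

# Offline demo with a fake fetcher: "proxy-a" is simulated as dead
proxy_list = [
    {"http": "http://proxy-a:8080", "https": "http://proxy-a:8080"},
    {"http": "http://proxy-b:8080", "https": "http://proxy-b:8080"},
]

def fake_fetch(url, proxies):
    if "proxy-a" in proxies["http"]:
        raise ConnectionError("proxy-a is down")
    return "ok"

result = fetch_with_rotation("https://example.com", proxy_list, fake_fetch, max_attempts=2)
print(result)  # "ok" -- the helper skipped the dead proxy
```

In a real script, replace fake_fetch with scraper.get, e.g. fetch_with_rotation("<YOUR_TARGET_URL>", proxy_list, scraper.get).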

Using Authenticated Proxies in CloudScraper

To authenticate a proxy in CloudScraper, include the required credentials directly in the proxy URL. The format for username and password authentication is as follows:

<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>

With that format, the CloudScraper proxy configuration would look like this:

import cloudscraper

# Create a Cloudscraper instance
scraper = cloudscraper.create_scraper()  

# Define your authenticated proxy
proxies = {
   "http": "<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>",
   "https": "<PROXY_PROTOCOL>://<YOUR_USERNAME>:<YOUR_PASSWORD>@<PROXY_IP_ADDRESS>:<PROXY_PORT>"
}

# Perform a request through the specified authenticated proxy
response = scraper.get("<YOUR_TARGET_URL>", proxies=proxies)
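One common pitfall: if the username or password contains reserved URL characters such as "@" or ":", they must be percent-encoded before being embedded in the proxy URL, or the URL parser will misread the credentials. A sketch using the standard library — the credentials and proxy address here are made up:

```python
from urllib.parse import quote

username = "user@example.com"   # hypothetical username containing "@"
password = "p@ss:word"          # hypothetical password with reserved characters

# Percent-encode both parts so the "@" and ":" separators stay unambiguous
proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@203.0.113.10:8080"

print(proxy_url)  # http://user%40example.com:p%40ss%3Aword@203.0.113.10:8080
```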

Integrating Premium Proxies in CloudScraper

For reliable results in production scraping environments, use proxies from top-tier providers like Bright Data. To integrate Bright Data’s proxies in CloudScraper:

  1. Create an account or log in.

  2. In the dashboard, click the “Residential” zone in the table:

Bright Data's proxies and scraping infrastructure control panel

  3. Activate the proxies by clicking the toggle:

Turning on the residential zone

This is what you should now be seeing:

The residential zone turned on

Note:
Bright Data’s residential proxies rotate automatically.

  4. In the “Access Details” section, copy the proxy host, username, and password:

The access details for your residential proxies zone

Your Bright Data proxy URL will look like this:

http://<PROXY_USERNAME>:<PROXY_PASSWORD>@brd.superproxy.io:33335

  5. Integrate the proxy into CloudScraper as follows:

import cloudscraper

# Create CloudScraper instance
scraper = cloudscraper.create_scraper()

# Define the Bright Data proxy
proxies = {
   "http": "http://<PROXY_USERNAME>:<PROXY_PASSWORD>@brd.superproxy.io:33335",
   "https": "http://<PROXY_USERNAME>:<PROXY_PASSWORD>@brd.superproxy.io:33335"
}

# Perform a request using the proxy
response = scraper.get("https://httpbin.io/ip", proxies=proxies)

# Print the response
print(response.text)

The CloudScraper proxy integration is done.

Conclusion

Bright Data controls the best proxy servers in the world, serving Fortune 500 companies and over 20,000 customers.

Create a free Bright Data account today to try our proxy servers.