How to Use Proxies to Rotate IP Addresses in Python

Learn how to perform web scraping at scale and keep websites from banning your IP address by using different proxy methods in Python.
7 min read · Updated Jun 2024 · Web Scraping


A proxy is a server application that acts as an intermediary for requests between a client and the server from which the client is requesting a certain service (HTTP, SSL, etc.).

 

When using a proxy server, instead of connecting directly to the target server and making your request, you send the request to the proxy server, which evaluates it, performs it on your behalf, and returns the response. Wikipedia has a simple diagram of proxy servers that illustrates this flow.

Web scraping experts often use more than one proxy to prevent websites from banning their IP addresses. Proxies have several other benefits, including bypassing filters and censorship, hiding your real IP address, etc.

In this tutorial, you will learn how to use proxies in Python with the requests library. We will also use stem, a Python controller library for Tor. Let's install them (the socks extra pulls in PySocks, which requests needs for the SOCKS proxy used in the Tor section):

pip3 install bs4 "requests[socks]" stem

If you're just beginning your Python programming journey, this in-depth Python web scraping tutorial is the perfect starting point. The guide will walk you through the most popular Python libraries for web scraping, including requests, BeautifulSoup4, and Selenium, showing you how to extract and save data to CSV and Excel files. It's a go-to resource for building a Python web scraper from scratch, understanding library differences, and discovering best practices for your project.

Related: How to Make a Subdomain Scanner in Python.

Using Free Available Proxies

First, some websites offer free proxy lists to use; I have built a function to grab this list automatically:

import requests
import random
from bs4 import BeautifulSoup as bs

def get_free_proxies():
    url = "https://free-proxy-list.net/"
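    # get the HTTP response and construct soup object
    soup = bs(requests.get(url, timeout=10).content, "html.parser")
    # this table id matched the site's markup when the tutorial was written; it may have changed
    table = soup.find("table", attrs={"id": "proxylisttable"})
    if table is None:
        raise RuntimeError("proxy table not found; the page layout may have changed")
    proxies = []
    # skip the header row
    for row in table.find_all("tr")[1:]: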
        tds = row.find_all("td")
        try:
            ip = tds[0].text.strip()
            port = tds[1].text.strip()
            host = f"{ip}:{port}"
            proxies.append(host)
        except IndexError:
            continue
    return proxies

However, when I tried to use them, most of them timed out. I kept some working ones:

proxies = [
    '167.172.248.53:3128',
    '194.226.34.132:5555',
    '203.202.245.62:80',
    '141.0.70.211:8080',
    '118.69.50.155:80',
    '201.55.164.177:3128',
    '51.15.166.107:3128',
    '91.205.218.64:80',
    '128.199.237.57:8080',
]

This list will not stay viable forever; in fact, most of these proxies will likely have stopped working by the time you read this tutorial (so you should execute the above function each time you want fresh proxy servers).
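Since fresh lists contain many dead entries, it can help to filter them automatically instead of by hand. The following is a minimal sketch, not the method used to build the list above: check_proxy and filter_working are hypothetical helper names, and the test URL, timeout, and worker count are assumptions you should tune.

```python
from concurrent.futures import ThreadPoolExecutor

def check_proxy(proxy, test_url="http://icanhazip.com", timeout=1.5):
    """Return True if a request through `proxy` ("ip:port") succeeds in time."""
    import requests  # imported lazily so filter_working() works with any checker
    proxy_url = f"http://{proxy}"
    try:
        r = requests.get(test_url,
                         proxies={"http": proxy_url, "https": proxy_url},
                         timeout=timeout)
        return r.ok
    except requests.RequestException:
        return False

def filter_working(proxies, check=check_proxy, max_workers=20):
    """Keep only the proxies for which `check` returns True, preserving order."""
    if not proxies:
        return []
    # test the proxies concurrently, since most of the time is spent waiting
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, proxies))
    return [p for p, ok in zip(proxies, results) if ok]
```

You could then call filter_working(get_free_proxies()) to get a list that worked at the moment of the check; proxies can still die afterwards.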

The function below accepts a list of proxies and creates a requests session that randomly selects one of the proxies passed:

def get_session(proxies):
    # construct an HTTP session
    session = requests.Session()
    # choose one random proxy
    proxy = random.choice(proxies)
    # requests expects proxy URLs to include the scheme
    session.proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    return session

Let's test this by requesting a website that returns our IP address:

for i in range(5):
    s = get_session(proxies)
    try:
        print("Request page with IP:", s.get("http://icanhazip.com", timeout=1.5).text.strip())
    except Exception as e:
        continue

Here is my output:

Request page with IP: 45.64.134.198
Request page with IP: 141.0.70.211
Request page with IP: 94.250.248.230
Request page with IP: 46.173.219.2
Request page with IP: 201.55.164.177

As you can see, these are some IP addresses of the working proxy servers and not our real IP address (try to visit this website in your browser and you'll see your real IP address).

Free proxies tend to die very quickly, usually within days or even hours, and often before a scraping project ends. To avoid that, use premium proxies for large-scale data extraction projects; many providers rotate IP addresses for you. One of the well-known solutions is Zyte. We will talk more about it in the last section of this tutorial.
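If you stay with free proxies for a small job, one way to cope with them dying mid-run is to retry a failed request through a different proxy and forget the dead one. Here is a minimal sketch of that idea; ProxyPool and fetch_with_rotation are hypothetical names, and fetch is any callable you supply that performs the request through the given proxy.

```python
import random

class ProxyPool:
    """Hand out proxies at random and forget the ones reported as dead."""

    def __init__(self, proxies):
        # de-duplicate while keeping the original order
        self.alive = list(dict.fromkeys(proxies))

    def pick(self):
        if not self.alive:
            raise RuntimeError("no working proxies left; fetch a fresh list")
        return random.choice(self.alive)

    def mark_dead(self, proxy):
        if proxy in self.alive:
            self.alive.remove(proxy)

def fetch_with_rotation(url, pool, fetch, max_attempts=5):
    """Try fetch(url, proxy) with different proxies until one succeeds."""
    last_error = None
    for _ in range(max_attempts):
        proxy = pool.pick()
        try:
            return fetch(url, proxy)
        except Exception as exc:
            # treat any failure as a dead proxy and move on to another one
            last_error = exc
            pool.mark_dead(proxy)
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

With the get_session function from above, fetch could be something like lambda url, proxy: get_session([proxy]).get(url, timeout=1.5).text.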

Using Tor as a Proxy

You can also use the Tor network to rotate IP addresses:

import requests
from stem.control import Controller
from stem import Signal

def get_tor_session():
    # initialize a requests Session
    session = requests.Session()
    # route both http & https through the Tor SOCKS proxy at localhost:9050
    # this requires a Tor service running on your machine and listening on port 9050 (the default)
    session.proxies = {"http": "socks5://localhost:9050", "https": "socks5://localhost:9050"}
    return session

def renew_connection():
    with Controller.from_port(port=9051) as c:
        c.authenticate()
        # send NEWNYM signal to establish a new clean connection through the Tor network
        c.signal(Signal.NEWNYM)

if __name__ == "__main__":
    s = get_tor_session()
    ip = s.get("http://icanhazip.com").text
    print("IP:", ip)
    renew_connection()
    s = get_tor_session()
    ip = s.get("http://icanhazip.com").text
    print("IP:", ip)

Note: The above code works only if Tor is installed on your machine (head to this link to install it properly) and configured correctly (ControlPort 9051 is enabled; check this Stack Overflow answer for further details).

This creates a session with a Tor IP address and makes an HTTP request. It then renews the connection by sending the NEWNYM signal (which tells Tor to establish a new clean connection) to change the IP address and makes another request. Here is the output:

IP: 185.220.101.49

IP: 109.70.100.21

Great! However, once you start scraping through the Tor network, you'll soon realize it's pretty slow most of the time. That is why a managed proxy service, covered in the next section, is the recommended approach.
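Keep in mind that the NEWNYM signal asks Tor for fresh circuits but does not promise a different exit IP, and Tor may rate-limit the signal. A minimal sketch of a loop that renews until the visible IP actually changes is shown here; rotate_until_changed is a hypothetical helper, and the 10-second wait and retry count are assumptions.

```python
import time

def rotate_until_changed(get_ip, renew, max_tries=5, wait=10.0, sleep=time.sleep):
    """Call renew() until get_ip() differs from the starting IP.

    Returns the new IP, or None if it never changed within max_tries.
    """
    old_ip = get_ip()
    for _ in range(max_tries):
        renew()
        sleep(wait)  # give Tor time to build a new circuit
        new_ip = get_ip()
        if new_ip != old_ip:
            return new_ip
    return None
```

With the functions above, you could call it as rotate_until_changed(lambda: get_tor_session().get("http://icanhazip.com").text.strip(), renew_connection).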

Using Smart Proxy Manager

Zyte's Smart Proxy Manager allows you to crawl quickly and reliably. It manages and rotates proxies internally, so if you're banned, it automatically detects that and rotates the IP address for you.

It is specifically designed for web scraping and crawling. Its job is clear: making your life easier as a web scraper. It helps you get successful requests and extract data at scale from any website using any web scraping tool.

With its simple API, the requests you make when scraping will be routed through a pool of high-quality proxies. When necessary, it automatically introduces delays between requests and removes/adds IP addresses to overcome different crawling challenges.

Here is how you can use Zyte with the requests library in Python:

import requests

url = "http://icanhazip.com"
proxy_host = "proxy.crawlera.com"
proxy_port = "8010"
proxy_auth = "<APIKEY>:"
proxies = {
       "https": f"https://{proxy_auth}@{proxy_host}:{proxy_port}/",
       "http": f"http://{proxy_auth}@{proxy_host}:{proxy_port}/"
}

r = requests.get(url, proxies=proxies, verify=False)

Once you register for a plan, you'll be provided with an API key, which replaces the <APIKEY> placeholder in proxy_auth.
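To avoid hardcoding the key in your script, you could read it from an environment variable and build the proxies dictionary from it. This is only a sketch: build_proxies is a hypothetical helper, ZYTE_API_KEY is an assumed variable name, and the host and port are simply the ones from the snippet above.

```python
import os

def build_proxies(api_key, host="proxy.crawlera.com", port="8010"):
    """Build a requests-style proxies dict in the same format as the snippet above."""
    auth = f"{api_key}:"  # the API key is the username; the password is empty
    return {
        "https": f"https://{auth}@{host}:{port}/",
        "http": f"http://{auth}@{host}:{port}/",
    }

# ZYTE_API_KEY is an assumed name; fall back to the placeholder if it is unset
api_key = os.environ.get("ZYTE_API_KEY", "<APIKEY>")
proxies = build_proxies(api_key)
```

The resulting proxies dictionary can be passed to requests.get exactly as before.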

So, here is what Zyte does for you:

  • You send the HTTP request using its single endpoint API.
  • It automatically selects, rotates, throttles, and blacklists IPs to retrieve the target data.
  • It handles request headers and maintains sessions.
  • You receive a successful response.

Conclusion

There are several proxy types, including transparent proxies, anonymous proxies, and elite proxies. If your goal of using proxies is to prevent websites from banning your scrapers, then elite proxies are your optimal choice; they make you seem like a regular internet user who is not using a proxy at all.

Furthermore, an extra measure against anti-scraping systems is rotating user agents: you send a different spoofed User-Agent header with each request, claiming to be a regular browser.
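A minimal sketch of user agent rotation could look like the following. The strings in USER_AGENTS are illustrative examples only (assumptions, not a maintained list), and random_headers is a hypothetical helper name.

```python
import random

# illustrative user agent strings; replace them with an up-to-date list of your own
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def random_headers(user_agents=USER_AGENTS):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(user_agents)}
```

You can pass the result to any request, for example s.get(url, headers=random_headers()), alongside the proxy rotation shown earlier.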

Learn also: How to Extract All Website Links in Python.

Happy Coding ♥
