[Discussion] Avoiding blacklists: rotating proxies or varying scraping params #11

simonfromla · 2019-10-05T04:56:22Z

Since getting blacklisted is likely a concern for many people using a module like this, I'd like to start a discussion on ways to avoid it.

What are your thoughts on implementing features that reduce the chance of getting blacklisted, such as rotating proxies or adjustable scraping params like the delay between requests?

Has anybody implemented something similar in their own usage? What's worked and what hasn't?
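
As a starting point for the second idea above (varying the time between requests), here is a minimal sketch of a randomized delay loop. It is not part of this module: `jittered_delay`, `fetch_all`, the `fetch` callable, and the default timings are all hypothetical names and values chosen for illustration.

```python
import random
import time


def jittered_delay(base_seconds, jitter_seconds, rng=random):
    """Return base_seconds plus a random extra in [0, jitter_seconds]."""
    return base_seconds + rng.uniform(0, jitter_seconds)


def fetch_all(urls, fetch, base_seconds=5.0, jitter_seconds=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a randomized interval between calls.

    fetch is any callable supplied by the caller (hypothetical here);
    sleep is injectable so the loop can be tested without waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, a jittered pause before the rest.
            sleep(jittered_delay(base_seconds, jitter_seconds))
        results.append(fetch(url))
    return results
```

Randomizing the pause avoids a perfectly regular request rhythm; whether that is enough to avoid a block depends on the site and is not something this sketch can guarantee.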

kevinzg · 2019-10-18T04:30:19Z

Maintainer

I don't think those features will be implemented directly in this project.

Eventually I would like to provide a Spider class interface to be used with Scrapy.

Scrapy already has support for auto throttle and has some recommended practices on how to distribute scrapers and avoid getting banned.
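
For reference, a minimal sketch of what enabling Scrapy's AutoThrottle extension might look like in a project's `settings.py`. The setting names are Scrapy's own; the values are illustrative assumptions, not recommendations from this project.

```python
# settings.py of a hypothetical Scrapy project (all values are illustrative)

# Turn on the AutoThrottle extension, which adapts the delay to server latency.
AUTOTHROTTLE_ENABLED = True
# Initial download delay, in seconds.
AUTOTHROTTLE_START_DELAY = 5.0
# Upper bound on the delay when latencies are high, in seconds.
AUTOTHROTTLE_MAX_DELAY = 60.0
# Average number of requests to send in parallel to each remote site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# AutoThrottle never goes below this delay, in seconds.
DOWNLOAD_DELAY = 3.0
# Wait a random 0.5x to 1.5x of DOWNLOAD_DELAY instead of a fixed interval.
RANDOMIZE_DOWNLOAD_DELAY = True
# Hard cap on simultaneous requests to a single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```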

If there is another scraping framework with those features we might also provide an interface for it.

Retro64XYZ · 2019-10-24T13:37:14Z

I can implement rotating proxies if there is interest, but I don't want to do it if you won't accept it. If I added the ability to rotate proxies, would you accept the pull request?

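Using requests (https://pypi.org/project/requests/):

import requests

# Send one request through an HTTP proxy.
r = requests.get("http://www.google.com",
                 proxies={"http": "http://61.233.25.166:80"})
print(r.text)

Or, with a session:

import requests

# Set the proxy once on the session; every request made through it uses the proxy.
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}

r = s.get("http://www.google.com")
print(r.text)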

You can easily update the code yourself to accept a list of proxies as an argument, but I am willing to do it.
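
A minimal sketch of what "a list of proxies as an argument" could look like, assuming a simple round-robin rotation. `proxy_rotation` and `get_with_rotation` are hypothetical helpers, not part of this module or of requests; the `session` argument is assumed to behave like `requests.Session`, whose `get` accepts `proxies=` and `timeout=`.

```python
import itertools


def proxy_rotation(proxy_urls):
    """Return an endless iterator of requests-style proxies dicts.

    Cycles through proxy_urls in order (round-robin).
    """
    urls = list(proxy_urls)
    if not urls:
        raise ValueError("proxy_urls must not be empty")
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return ({"http": url, "https": url} for url in itertools.cycle(urls))


def get_with_rotation(session, url, rotation, timeout=10):
    """Fetch url through the next proxy in the rotation.

    session is assumed to be a requests.Session or an object with the same
    get(url, proxies=..., timeout=...) interface.
    """
    return session.get(url, proxies=next(rotation), timeout=timeout)
```

With requests installed, this could be used as `rotation = proxy_rotation(my_proxies)` followed by `get_with_rotation(requests.Session(), url, rotation)` for each URL, where `my_proxies` is your own list. Note the warning later in this thread: rotating IPs while logged in to an account may get that account locked, so this sketch is aimed at anonymous requests.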

nubpro · 2020-01-22T20:52:57Z

How likely is it that I'd get banned from Facebook if I were to scrape 3 Facebook pages, with a minimum of 2 pages each, every 15 minutes? All from a single machine with a single IP address (without logging in to Facebook).

I should really get started looking into Scrapy and all the stuff you all have mentioned above.
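
To put the schedule above in numbers, here is a rough calculation. It assumes one HTTP request per scraped page, which is a simplification (the real request count may be higher), and it says nothing about what Facebook actually tolerates.

```python
facebook_pages = 3        # pages being scraped
result_pages_each = 2     # "a minimum of 2 pages each"
interval_minutes = 15     # one run every 15 minutes

# Assumption: each scraped page costs exactly one request.
requests_per_run = facebook_pages * result_pages_each
runs_per_day = 24 * 60 // interval_minutes
requests_per_day = requests_per_run * runs_per_day
# Average spacing if the requests were spread evenly across the interval.
average_gap_seconds = interval_minutes * 60 / requests_per_run

print(requests_per_run)     # 6
print(runs_per_day)         # 96
print(requests_per_day)     # 576
print(average_gap_seconds)  # 150.0
```

So the plan amounts to at least 576 requests a day from one IP, or roughly one every two and a half minutes on average.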

evildrome · 2021-02-10T14:46:29Z

I have been scraping FB groups for 6 years using my own C# app (until it broke recently).

It seems (from my experience) that scraping from a mature FB account with lots of "normal" activity will not get you banned.

I have downloaded groups with > 1 million posts and comments, which entailed more than 8 weeks of continuous downloading.

If you create a new FB account and start scraping from it, you will be insta-banned.

Also, DO NOT rotate proxies. Your account will be locked immediately, because logging in from two places in close succession usually means some hacker has your login details, so FB locks your account until you go through verification.

I have been downloading social media since 1999.
