scraper #123079

Dexterr-net · 2024-05-09T08:17:54Z

I want to make a scraper for my website.

Dhruv-net · 2024-05-09T08:18:25Z

You can use Python libraries for this.

goldieowner17 · 2024-05-09T15:23:19Z

Hey there! 👋

Thanks for posting in the GitHub Community, @Dexterr-net. We're happy you're here. You are more likely to get a useful response if you post your question in the applicable category. The Discover category is a place for GitHubbers where we post blogs, articles, best practices, and tips & tricks from GitHub employees and users.

I've gone ahead and moved this to the correct category for you. Good luck!

ebndev · 2024-05-09T22:06:13Z · Maintainer

Hi @Dexterr-net, we're happy you're here! You are more likely to get a useful response if you are explicit about what your project entails; giving a few more details might help someone nudge you in the right direction.

CopperEagle · 2024-05-09T23:39:19Z

Hey @Dexterr-net,

Looking at your repositories, you seem to be familiar with Python, so you may want to look at Scrapy.

IvanBF9 · 2024-05-14T10:13:55Z

Everything depends on the site you want to scrape and the data you want to retrieve. If the site loads its data through API requests made by your browser, the simplest approach is to capture the routes you're interested in with your browser's network inspector. Once you've found these routes, you just need to make the same requests using the language and library of your choice, such as Axios in JavaScript or Requests in Python. This approach is usually much faster and gives you the data in a more digestible format, often JSON.
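A minimal sketch of replaying a captured route in Python, assuming the endpoint answers a plain GET with JSON. It uses the standard library's urllib.request instead of Requests only so it runs without extra installs; the function names are hypothetical helpers, not part of any library.

```python
import json
import urllib.request


def build_api_request(url, extra_headers=None):
    """Build a GET request that mirrors one captured in the browser's network inspector."""
    headers = {"Accept": "application/json"}
    # Copy any headers the site appears to require (e.g. an auth token) from the inspector.
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, extra_headers=None, timeout=10.0):
    """Send the request and decode the JSON response body."""
    request = build_api_request(url, extra_headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage would look like `fetch_json("https://example.com/api/items?page=1")`, where the URL is a placeholder for the route you found. With Requests, the equivalent is roughly `requests.get(url, headers=headers, timeout=10).json()`.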

For sites that have the data directly in HTML, you can request the page using a library of your choice and then parse the HTML with BeautifulSoup in Python or Cheerio in JavaScript to use the data.
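As a dependency-free sketch of the HTML case, the standard library's html.parser can pull out links; BeautifulSoup is the more convenient choice in practice (roughly `[(a["href"], a.get_text(strip=True)) for a in soup.find_all("a", href=True)]`). The class and function names below are hypothetical, and the example assumes the data you want lives in `<a>` tags.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href attribute."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

You would feed it the page body you downloaded, e.g. `extract_links(page_html)`.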

If you need to interact with the website or if other methods don't work for your case, you can use libraries that simulate web browsers, such as Puppeteer in JavaScript or Selenium in Python.

However, be aware that some sites implement anti-bot security measures. To get past these, you might need to send specific headers (such as a browser-like User-Agent) or other parameters so that your requests present a browser identity.
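A minimal sketch of giving plain HTTP requests a browser-like identity with the standard library: an opener that sends fixed headers on every request and keeps cookies between requests. The header values are placeholders (copy the ones your own browser sends), and this only helps with header or cookie checks; sites that require JavaScript still need a browser tool such as Puppeteer or Selenium.

```python
import http.cookiejar
import urllib.request

# Placeholder header values; replace them with what your browser actually sends.
BROWSER_HEADERS = (
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"),
    ("Accept", "text/html,application/xhtml+xml,application/json;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.8"),
)


def make_browser_like_opener(headers=BROWSER_HEADERS):
    """Return an opener that sends browser-like headers and stores cookies, plus its cookie jar."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
    # addheaders replaces urllib's default "Python-urllib/x.y" User-Agent.
    opener.addheaders = list(headers)
    return opener, cookie_jar
```

Requests made with `opener.open(url, timeout=10)` then carry these headers and reuse any cookies the site sets.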
