scraper #123079
Replies: 4 comments
-
You can use Python libraries for this.
-
Hey there! 👋 Thanks for posting in the GitHub Community, @Dexterr-net. We're happy you're here. You are more likely to get a useful response if you are posting your question in the applicable category. The Discover category is a place for GitHubbers where we post blogs, articles, best practices, and tips & tricks from GitHub employees and users. I've gone ahead and moved this to the correct category for you. Good luck!
-
Hi @Dexterr-net, we're happy you're here! You are more likely to get a useful response if you are explicit about what your project entails. Giving a few more details might help someone give you a nudge in the right direction.
-
Hey @Dexterr-net, looking at your repositories, you seem to be familiar with Python, so you may want to look at Scrapy.
-
Everything depends on the site you want to scrape and the data you want to retrieve.

If the site loads its data through API requests made from your browser, the simplest approach is to find the routes you're interested in with your browser's network inspector. Once you've found these routes, make the same requests with the language and library of your choice, such as Axios in JavaScript or Requests in Python. This is usually much faster and gives you the data in a more digestible format, often JSON.

If the data is directly in the HTML, request the page with a library of your choice and then parse the HTML with BeautifulSoup in Python or Cheerio in JavaScript to extract what you need.

If you need to interact with the website, or if the other methods don't work for your case, you can use libraries that automate a web browser, such as Puppeteer in JavaScript or Selenium in Python.

Be aware, however, that some sites implement measures against scraping. To get past these, you might need to set request headers (such as `User-Agent`) or other parameters so that your requests look like they come from a browser.
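Here is a minimal sketch of the first two approaches. So that it runs without extra installs, it uses only Python's standard library (`urllib.request`, `json`, `html.parser`) in place of Requests and BeautifulSoup. The header values, the helper names (`fetch`, `parse_api_response`, `extract_links`), and the sample HTML/JSON are hypothetical. The browser-automation route (Selenium/Puppeteer) is not shown.

```python
from __future__ import annotations

import json
import urllib.request
from html.parser import HTMLParser

# Hypothetical browser-like headers: copy the real values your browser sends
# (visible in the network inspector) if the target site checks them.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch(url: str, headers: dict[str, str] | None = None, timeout: float = 10.0) -> str:
    """Download a URL (an API route or an HTML page) and return the body as text."""
    request = urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_api_response(body: str) -> object:
    """Approach 1: the route found in the network inspector returns JSON."""
    return json.loads(body)


class LinkExtractor(HTMLParser):
    """Approach 2: the data is in the HTML; collect (href, text) for each <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href="..."> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> list[tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Offline demo on literal strings; for a real site, pass fetch("https://...") instead.
    sample_html = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second <b>item</b></a></li></ul>'
    print(extract_links(sample_html))
    print(parse_api_response('{"items": [{"id": 1}, {"id": 2}]}'))
```

With BeautifulSoup installed, the `LinkExtractor` class would typically shrink to a loop over `soup.find_all("a")`. The standard-library parser is used here only to keep the example dependency-free.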
-
I want to make a scraper for my website.