Scraping the Web

Bradley Aaron Kohler edited this page Apr 30, 2020 · 11 revisions

Scraping the Web: the Naive Method

The Naive Method of scraping the web locates an element by its static tag and static attributes (key and value pairs).

Using BeautifulSoup4, we can scrape the text enclosed by the following HTML tag, where s is the parsed BeautifulSoup object:

<div class="location"> Some text in here... </div>
s.find('div', attrs={'class': 'location'}).text.strip()

Here the static tag is 'div', and the static attribute key and value pair is 'class': 'location'.
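
The lookup above can be run end to end as a minimal sketch. It assumes the beautifulsoup4 package is installed and uses Python's built-in html.parser; the sample HTML string and the variable names html and location are illustrative and do not come from a real page.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for a downloaded page (illustrative only).
html = '<html><body><div class="location"> Some text in here... </div></body></html>'

# Parse with the standard-library parser so no extra parser package is needed.
s = BeautifulSoup(html, 'html.parser')

# Static tag 'div' plus the static attribute pair 'class': 'location'.
location = s.find('div', attrs={'class': 'location'}).text.strip()
print(location)
```

The call to strip() removes the leading and trailing whitespace that surrounds the text inside the tag.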

The advantage of the Naive Method is that it is precise: the lookup matches exactly the tag and attributes it names. The disadvantage is that it must be maintained continually, because the web page's HTML format may change over time, and a changed tag or attribute breaks the lookup.
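
The maintenance cost shows up as soon as the page changes: find returns None when nothing matches, so chaining .text onto the result raises an AttributeError. One possible way to surface this is sketched below, assuming beautifulsoup4 is installed. The helper name extract_location and both sample pages are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

def extract_location(html):
    """Return the stripped text of <div class="location">, or None if it is absent."""
    s = BeautifulSoup(html, 'html.parser')
    tag = s.find('div', attrs={'class': 'location'})
    if tag is None:
        # The static tag or attribute no longer exists on the page.
        return None
    return tag.text.strip()

# Hypothetical "before" and "after" versions of the same page.
old_page = '<div class="location"> Some text in here... </div>'
new_page = '<span class="place"> Some text in here... </span>'

print(extract_location(old_page))
print(extract_location(new_page))
```

Returning None instead of raising lets the caller decide whether to log the miss, skip the record, or update the selector.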
