Scraping the Web
Bradley Aaron Kohler edited this page Apr 30, 2020 · 11 revisions
The Naive Method of scraping the web locates content by a static tag and its static attributes (key and value pairs).
Using BeautifulSoup4, we can scrape the text enclosed by the following HTML tag:
<div class="location"> Some text in here... </div>
s.find('div', attrs={'class': 'location'}).text.strip()
Here the static tag is 'div', and the static attribute key and value pair is 'class': 'location'.
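For reference, the one-line call above can be run end to end as a minimal sketch. It assumes that s is a BeautifulSoup object built from the page source and that the standard-library "html.parser" backend is acceptable; the html string simply repeats the example tag and is not taken from a real page.

```python
from bs4 import BeautifulSoup

# Example markup repeated from the text above (not a real page).
html = '<div class="location"> Some text in here... </div>'

# Assumption: s is the parsed document; "html.parser" needs no extra install.
s = BeautifulSoup(html, "html.parser")

# Static tag 'div', static attribute pair 'class': 'location'.
location = s.find("div", attrs={"class": "location"}).text.strip()
print(location)  # Some text in here...
```

The call to strip() removes the leading and trailing whitespace that sits inside the tag.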
The advantage of the Naive Method is that it is highly accurate: it matches exactly the tag and attributes you specify. The disadvantage is that it must be maintained continually, because the web page's HTML format may change over time and the static tag or attributes will then no longer match.
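One way the maintenance cost shows up in practice: find returns None when nothing matches, so a format change makes the chained .text call raise an AttributeError. The sketch below guards against that. The helper name extract_location and the changed markup in new_page are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

def extract_location(html):
    """Hypothetical helper: return the location text, or None if the tag is gone."""
    s = BeautifulSoup(html, "html.parser")
    tag = s.find("div", attrs={"class": "location"})
    if tag is None:
        # The page format no longer matches the static tag and attributes.
        return None
    return tag.text.strip()

# Markup the scraper was written against.
old_page = '<div class="location"> Some text in here... </div>'
# Hypothetical redesign: different tag and class, same visible text.
new_page = '<span class="place"> Some text in here... </span>'

print(extract_location(old_page))  # Some text in here...
print(extract_location(new_page))  # None
```

Returning None instead of raising lets the caller log the miss and flag the scraper for an update.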