Scrapemark

NOTE: This project is no longer maintained!

Scrapemark is a super-convenient way to scrape webpages in Python.

It uses an HTML-like markup language to describe the data you need, and you get your results back as plain old Python lists and dictionaries. Internally, Scrapemark relies on regular expressions and is super-fast.
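As a rough illustration of the general idea (an HTML-like pattern backed by regular expressions, returning plain Python data), the sketch below turns a template with `{{ name }}` placeholders into a regex with one named group per placeholder and returns the captured values as a dict. This is a hypothetical toy, not Scrapemark's implementation or API: `compile_template`, `scrape_one` and the sample HTML are made up for illustration, and only simple placeholders are supported.

```python
import re

# Matches a '{{ name }}' placeholder and captures the name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_template(template):
    """Compile a toy template into a regex with one named group per placeholder."""
    pieces = []
    for i, part in enumerate(PLACEHOLDER.split(template)):
        if i % 2 == 1:
            # re.split puts captured placeholder names at odd indices.
            pieces.append("(?P<%s>.*?)" % part)
        else:
            # Literal markup: escape each whitespace-separated token.
            pieces.extend(re.escape(token) for token in part.split())
    # Allow any amount of whitespace (including none) between pieces, and let
    # '.' cross newlines so a value may span several lines of HTML.
    return re.compile(r"\s*".join(pieces), re.DOTALL)


def scrape_one(template, html):
    """Return the first match as a plain dict, or None if nothing matches."""
    match = compile_template(template).search(html)
    return match.groupdict() if match else None


html = """
<div class="story">
  <a href='/news/1'>
     First headline
  </a>
</div>
"""
print(scrape_one("<a href='{{ url }}'>{{ title }}</a>", html))
# {'url': '/news/1', 'title': 'First headline'}
```

Matching whitespace loosely and using non-greedy groups is what lets a tidy pattern line up with messily indented HTML and return trimmed values; a fuller tool would need to handle much more than this toy does.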

As an example, here is a way you could scrape all the links on the Digg homepage in one fell swoop:

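```python
import scrapemark

# {* ... *} marks a block that may repeat down the page; each
# {{ [links].field }} collects a value into the 'links' list, and
# |int converts the digg count to an integer.
print(scrapemark.scrape("""
  {*
    <div class='news-summary'>
      <h3><a href='{{ [links].url }}'>{{ [links].title }}</a></h3>
      <p>{{ [links].description }}</p>
      <li class='digg-count'>
        <strong>{{ [links].diggs|int }}</strong>
      </li>
    </div>
  *}
  """,
  url='http://digg.com/'))
```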
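To make the repeating-block syntax in the example more concrete, here is a toy model of what such a pattern does, using only the standard `re` module. It is not Scrapemark's code or API: the function `scrape_repeated`, the sample HTML and the `FILTERS` table are illustrative assumptions. The block is passed without its `{* ... *}` markers, the repetition those markers express is modelled with `re.finditer` (one dict per match), only an `int` filter is supported, and all fields are assumed to share a single list name.

```python
import re

# Matches '{{ [list].field }}' or '{{ [list].field|filter }}'.
FIELD = re.compile(r"\{\{\s*\[(\w+)\]\.(\w+)(?:\|(\w+))?\s*\}\}")
FILTERS = {"int": int}  # only an 'int' filter is modelled here


def scrape_repeated(block, html):
    """Apply a repeating block to html and collect one dict per match."""
    pieces, list_name, filters = [], None, {}
    parts = FIELD.split(block)
    # With three groups, re.split yields: literal, list, field, filter, literal, ...
    for i in range(0, len(parts), 4):
        pieces.extend(re.escape(token) for token in parts[i].split())
        if i + 1 < len(parts):
            list_name, field, filter_name = parts[i + 1 : i + 4]
            pieces.append("(?P<%s>.*?)" % field)
            if filter_name:  # None when the placeholder has no '|filter'
                filters[field] = FILTERS[filter_name]
    pattern = re.compile(r"\s*".join(pieces), re.DOTALL)
    items = []
    for match in pattern.finditer(html):  # stands in for the {* ... *} repetition
        item = match.groupdict()
        for field, convert in filters.items():
            item[field] = convert(item[field])
        items.append(item)
    return {list_name: items}


html = """
<ul>
  <li><a href='/a'>Alpha</a> <b>12</b></li>
  <li><a href='/b'>Beta</a> <b>7</b></li>
</ul>
"""
block = ("<li><a href='{{ [links].url }}'>{{ [links].title }}</a> "
         "<b>{{ [links].diggs|int }}</b></li>")
print(scrape_repeated(block, html))
# {'links': [{'url': '/a', 'title': 'Alpha', 'diggs': 12}, {'url': '/b', 'title': 'Beta', 'diggs': 7}]}
```

The result is the same kind of structure the README promises: a plain dict holding a list of plain dicts, with the filtered field already converted to `int`.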

Repository: arshaw/scrapemark