Iterators
If you need to iterate over a list of nodes that match the same selector, you can use an :iterator property. It takes a selector and a block, and yields once for each element found on the page for that selector. In effect, this construct narrows the context to each element returned by the selector given to :iterator and scrapes the nested properties within that element. An example makes this clearer:
Wombat.crawl do
  base_url "http://www.github.com"
  path "/explore"

  repositories "css=ol.ranked-repositories>li", :iterator do
    repo 'css=h3'
    description 'css=p.description'
  end
end
Outputs:
{
  "repositories"=>
  [
    {
      "repo"=>"EightMedia / hammer.js",
      "description"=>"A javascript library for multi-touch gestures :// You can touch this"
    },
    {
      "repo"=>"gummikana / email_mom.php",
      "description"=>"A small script that emails my mom when I'm abroad telling her that I'm alive."
    },
    {
      "repo"=>"hagino3000 / Struct.js",
      "description"=>"C Struct like object for JavaScript"
    },
    {
      "repo"=>"tumblr / policy",
      "description"=>""
    },
    {
      "repo"=>"NaturalNode / natural",
      "description"=>"general natural language facilities for node"
    }
  ]
}
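To make the narrowing behaviour more concrete, here is a minimal plain-Ruby sketch of what an :iterator property does conceptually. It does not use Wombat and is not how Wombat is implemented; the PAGE structure, the iterate helper, and the selector keys are hypothetical stand-ins for a parsed page. Returning an empty string when nothing matches is also an assumption, chosen to mirror the empty description in the output above.

```ruby
# Hypothetical stand-in for a parsed page: each entry models one <li>
# matched by the iterator selector, mapping a nested selector to the
# texts of the nodes it would match inside that <li>.
PAGE = [
  { "h3" => ["EightMedia / hammer.js"],
    "p.description" => ["A javascript library for multi-touch gestures"] },
  { "h3" => ["tumblr / policy"],
    "p.description" => [] }
].freeze

# Visits each matched element once and builds one result hash per element.
# Every nested property is resolved against the current element only,
# and only its first match is kept (an empty string if nothing matched).
def iterate(elements, properties)
  elements.map do |element|
    properties.to_h do |name, selector|
      [name, element.fetch(selector, []).first.to_s]
    end
  end
end

RESULT = iterate(PAGE, { "repo" => "h3", "description" => "p.description" })
```

RESULT holds one hash per modelled li, in page order, which is the same shape as the "repositories" array in the output above.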
Remember that, by default, a property returns only the first element that matches the given selector. So, in the example above, even if there are several h3 elements inside each li, only the first match is returned. To retrieve all the matching elements, use the :list option instead of the default :text.
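The difference between the default :text and :list can be pictured with a small plain-Ruby model. This is only an illustration under stated assumptions, not Wombat's implementation: the extract helper and the sample headings are hypothetical.

```ruby
# Hypothetical texts of every <h3> found inside a single <li>.
H3_TEXTS = ["EightMedia / hammer.js", "A second heading"].freeze

# Models the two formats discussed above:
#   :text keeps only the first match, :list keeps all of them.
def extract(matches, format = :text)
  case format
  when :text then matches.first.to_s
  when :list then matches.dup
  else raise ArgumentError, "unsupported format: #{format.inspect}"
  end
end

FIRST_ONLY  = extract(H3_TEXTS)         # default format, first match only
ALL_MATCHES = extract(H3_TEXTS, :list)  # every match, in document order
```

In the Wombat example above, the equivalent change would presumably be to pass :list as the property's second argument, in the same position where :iterator appears, for instance repo 'css=h3', :list. Treat that placement as an assumption and check it against the version of Wombat you are using.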