Scrapman is a Python script that fetches the page at a given URL and extracts the list of elements matching a given selector.
Clone this repository:
git clone https://github.com/anned20/scrapman.git
Install the dependencies:
pip install -r requirements.txt
You are now ready to use Scrapman:
python scrapman.py --help
You should see something like:
    Usage: scrapman.py [OPTIONS]

    Options:
      --debug / --no-debug            Debug mode
      --url TEXT                      URL to crawl
      --selector TEXT                 Selector for the elements
      --output-type [dict|json|csv]
      --output-file TEXT              File to output the result into. Use "-" for
                                      stdout
      --help                          Show this message and exit.
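To give a feel for the three --output-type choices, here is a minimal sketch of how a list of extracted elements could be serialized as dict, json, or csv. It is an illustration only, not Scrapman's actual code: the `render` function, the `rows` sample data, and the single `text` column are all hypothetical, and the real output may be shaped differently.

```python
import csv
import io
import json


def render(rows, output_type):
    """Serialize extracted rows; the row shape and exact formats are assumptions."""
    if output_type == "dict":
        # Assumed: the plain Python representation of the data.
        return repr(rows)
    if output_type == "json":
        return json.dumps(rows)
    if output_type == "csv":
        # Write a header row followed by one line per extracted element.
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=["text"], lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported output type: {output_type!r}")


# Hypothetical sample data standing in for scraped elements.
rows = [{"text": "First item"}, {"text": "Second item"}]
for output_type in ("dict", "json", "csv"):
    print(render(rows, output_type))
```

Keeping the serialization in one function means the same extracted data can be written to a file or to stdout without the caller caring about the format.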
The tests use pytest. Run them from the project directory:
pytest
Scrapman is built with:
- requests - Fetching the web page
- click - Parsing command line options
- BeautifulSoup - Parsing the HTML of the web page
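The way these libraries fit together can be sketched roughly as follows. This is not Scrapman's source: `fetch_html` and `select_texts` are hypothetical helper names, and the sketch assumes the selector is a CSS selector, which the option description does not state explicitly. Command line parsing with click is left out to keep the example short.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with requests and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Fail loudly on 4xx/5xx responses.
    return response.text


def select_texts(html, selector):
    """Parse HTML with BeautifulSoup and return the text of each matching element."""
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector (an assumption about what Scrapman expects).
    return [element.get_text(strip=True) for element in soup.select(selector)]


if __name__ == "__main__":
    # Inline HTML keeps the demo offline; fetch_html(url) would supply real pages.
    page = "<ul><li class='item'>Apples</li><li class='item'>Pears</li></ul>"
    print(select_texts(page, "li.item"))
```

Separating the download from the parsing keeps the selection logic testable against a fixed HTML string, without any network access.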
This project is licensed under the MIT License. See the LICENSE.md file for details.