Skip to content
/ minet Public
forked from medialab/minet

A webmining CLI tool & library for python.

Notifications You must be signed in to change notification settings

d3scmps/minet

 
 

Repository files navigation

Build Status

Minet

minet is a webmining CLI tool & library for python. It adopts a lo-fi approach to various webmining problems by letting you perform a variety of actions from the comfort of your command line. No database needed: raw data files will get you going.

In addition, minet also exposes its high-level programmatic interface as a library so you can tweak its behavior at will.

Features

  • Multithreaded, memory-efficient fetching from the web.
  • Multithreaded, scalable crawling using a comfy DSL.
  • Multiprocessed raw text content extraction from HTML pages.
  • Multiprocessed scraping from HTML pages using a comfy DSL.
  • URL-related heuristics utilities such as extraction, normalization and matching.
  • Data collection from various APIs such as CrowdTangle.

Installation

minet can be installed using pip:

pip install minet

Cookbook

To learn how to use minet and understand how it may fit your use cases, you should definitely check out our Cookbook.

Usage

About

A webmining CLI tool & library for python.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 99.3%
  • Other 0.7%