diff --git a/README.md b/README.md
index 65d6f69a73..94ceac886f 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,7 @@ In addition, **minet** also exposes its high-level programmatic interface as a p
 * Collecting data from [Facebook](https://www.facebook.com/) (comments, likes etc.)
 * Parsing [Facebook](https://www.facebook.com/) urls in a CSV file.
 * Collecting data from [Twitter](https://twitter.com) (users, followers, followees etc.)
+* Scraping data (tweets etc.) from [Twitter](https://twitter.com)'s website public facing search API.
 * Collecting data from [YouTube](https://www.youtube.com/) (captions, comments, video metadata etc.)
 * Parsing [YouTube](https://www.youtube.com/) urls in a CSV file.
 * Dumping a [Hyphe](https://hyphe.medialab.sciences-po.fr/) corpus.
diff --git a/docs/cli.md b/docs/cli.md
index 4db1ba85a6..a498462cbf 100644
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -41,6 +41,7 @@
 * [twitter](#twitter)
   * [followers](#followers)
   * [friends](#friends)
+  * [scrape](#twitter-scrape)
   * [users](#users)
 * [youtube (yt)](#youtube)
   * [captions](#captions)
@@ -549,7 +550,7 @@ usage: minet crowdtangle search [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                                 [--language LANGUAGE] [-l LIMIT]
                                 [--not-in-title] [--offset OFFSET]
                                 [-p PLATFORMS]
-                                [--search-field {include_query_strings,image_text_only,text_fields_only,account_name_only,text_fields_and_image_text}]
+                                [--search-field {text_fields_only,account_name_only,image_text_only,include_query_strings,text_fields_and_image_text}]
                                 [--sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}]
                                 [--start-date START_DATE] [--types TYPES]
                                 terms
@@ -579,7 +580,7 @@ optional arguments:
   --not-in-title  Whether to search terms in account titles also.
   --offset OFFSET  Count offset.
   -p PLATFORMS, --platforms PLATFORMS  The platforms from which to retrieve links (facebook, instagram, or reddit). This value can be comma-separated.
-  --search-field {include_query_strings,image_text_only,text_fields_only,account_name_only,text_fields_and_image_text}
+  --search-field {text_fields_only,account_name_only,image_text_only,include_query_strings,text_fields_and_image_text}
                   In what to search the query. Defaults to `text_fields_and_image_text`.
   --sort-by {date,interaction_rate,overperforming,total_interactions,underperforming}
                   The order in which to retrieve posts. Defaults to `date`.
@@ -599,7 +600,7 @@ examples:
 usage: minet crowdtangle summary [-h] [--rate-limit RATE_LIMIT] [-o OUTPUT]
                                  [-t TOKEN] [-p PLATFORMS] [--posts POSTS]
                                  [-s SELECT]
-                                 [--sort-by {subscriber_count,total_interactions,date}]
+                                 [--sort-by {subscriber_count,date,total_interactions}]
                                  [--start-date START_DATE] [--total TOTAL]
                                  column [file]
@@ -621,7 +622,7 @@ optional arguments:
   -p PLATFORMS, --platforms PLATFORMS  The platforms from which to retrieve links (facebook, instagram, or reddit). This value can be comma-separated.
   --posts POSTS  Path to a file containing the retrieved posts.
   -s SELECT, --select SELECT  Columns to include in report (separated by `,`).
-  --sort-by {subscriber_count,total_interactions,date}
+  --sort-by {subscriber_count,date,total_interactions}
                   How to sort retrieved posts. Defaults to `date`.
   --start-date START_DATE  The earliest date at which a post could be posted (UTC!). You can pass just a year or a year-month for convenience.
   --total TOTAL  Total number of HTML documents. Necessary if you want to display a finite progress indicator.
@@ -908,6 +909,32 @@ examples:
 ```
+