This repo contains various analyses of different datasets. The current analysis focuses on time series forecasting and anomaly detection.
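import urllib3
import matplotlib.pyplot as plt
import networkx as nx
from wikipedia.parser import get_graph
# One shared connection pool for all HTTP requests
pool = urllib3.PoolManager()
# Build the link graph around the Data_mining article (deep=1)
G = get_graph(pool, url="https://en.wikipedia.org/wiki/Data_mining", deep=1)
# Draw the nodes on a circle, labelled with their names
nx.draw(G, pos=nx.circular_layout(G), with_labels=True)
plt.show()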
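`get_graph` comes from this repo's `wikipedia.parser` module, whose internals are not shown here. As a rough illustration of the link-extraction step such a function presumably needs, here is a standard-library sketch. `WikiLinkParser` and `extract_links` are hypothetical names, and treating every `/wiki/` href without a namespace colon as an article link is an assumption, not necessarily what the repo does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WikiLinkParser(HTMLParser):
    """Hypothetical helper: collect /wiki/ article links in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: article links start with /wiki/ and have no namespace
        # prefix such as File: or Help:
        if href and href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
            # Drop the #fragment so section links map to the article itself
            url = urljoin(self.base_url, href.split("#")[0])
            if url not in self.links:
                self.links.append(url)


def extract_links(html, base_url="https://en.wikipedia.org"):
    """Return the unique article URLs found in `html`, in order of appearance."""
    parser = WikiLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A one-level graph could then be built by adding an edge from the page to each URL in `extract_links(html)`; the first element of that list is a naive "first link".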
In this experiment, I'll test the hypothesis that:
By repeatedly following the first link in any Wikipedia article, you will eventually end up on the Philosophy article.
For more info on the subject, see my dev.to article.
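The hypothesis amounts to following a chain of "first link" pointers until the target is reached, a dead end is hit, or the chain loops. The sketch below shows that logic on a toy dictionary of article titles instead of live pages. It is not the repo's `crawl`; `follow_first_links`, its parameters and the toy data are hypothetical, and `max_steps` only loosely mirrors the role `deep=30` appears to play. The usual formulation of this game also skips links in parentheses or italics, which the sketch leaves to whatever builds the mapping.

```python
def follow_first_links(first_link, start, target="Philosophy", max_steps=30):
    """Follow first links from `start` until `target`, a dead end, a loop, or max_steps.

    `first_link` is a hypothetical mapping: article title -> title of its first link.
    Returns (path, reached_target).
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_steps):
        if current == target:
            return path, True
        nxt = first_link.get(current)
        if nxt is None or nxt in seen:
            # Dead end (no outgoing link) or a loop that never reaches the target
            return path, False
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    # Step budget exhausted; report whether we happen to be on the target
    return path, current == target


if __name__ == "__main__":
    toy = {"A": "B", "B": "C", "C": "Philosophy"}  # made-up titles
    print(follow_first_links(toy, "A"))
```

Tracking `seen` matters because first-link chains can cycle; without it a loop would only stop when the step budget runs out.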
crawl(pool, "https://en.wikipedia.org/wiki/Data_mining", phrase="Philosophy", deep=30, n=1, verbose=True)
30 Entering Data_mining
29 Entering Data_set
...
[('https://en.wikipedia.org/wiki/Thought',
[('https://en.wikipedia.org/wiki/Ideas',
['https://en.wikipedia.org/wiki/Philosophy'])])])])])])])])])])])])])])])])])])])])])])])])])])
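Judging only from the printed result, `crawl` seems to return nested `(url, children)` tuples that end in a plain list of URLs. Assuming that shape and a single entry per level (as with `n=1`), a small hypothetical helper can turn it into a readable path:

```python
def flatten_chain(result):
    """Flatten [(url, [(url, [... [url]])])] into a list of URLs.

    Assumes one entry per level and a plain list of URL strings innermost,
    which is inferred from the printed output, not from crawl's source.
    """
    chain = []
    level = result
    while level:
        entry = level[0]
        if isinstance(entry, tuple):
            url, level = entry  # descend into the children of this page
            chain.append(url)
        else:
            chain.append(entry)  # innermost level holds bare URLs
            break
    return chain


if __name__ == "__main__":
    nested = [("https://en.wikipedia.org/wiki/Thought",
               [("https://en.wikipedia.org/wiki/Ideas",
                 ["https://en.wikipedia.org/wiki/Philosophy"])])]
    print([url.rsplit("/", 1)[-1] for url in flatten_chain(nested)])
```

Run on the tail of the output above, this gives the titles Thought, Ideas, Philosophy.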
Experiment and code
Dataset - sampled daily
Prophet prediction of order volume with confidence intervals
Notebooks:
- Predictions with Prophet
- Preparing data
- Exploring data as time series
- Frequent pattern mining with FP-growth
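FP-growth returns every itemset whose support (the fraction of transactions containing it) is at least a threshold. The sketch below is deliberately *not* FP-growth: it counts supports by brute force so the output that FP-growth computes efficiently is easy to see. What the transactions are in this repo is not stated, so the data and the name `frequent_itemsets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for itemsets of up to `max_size` items.

    Returns {frozenset(itemset): support}. FP-growth yields the same itemsets
    (for the same size limit) without enumerating every combination.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items in one transaction
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {itemset: c / n for itemset, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]  # made-up transactions
    for itemset, support in frequent_itemsets(toy, min_support=0.5).items():
        print(sorted(itemset), support)
```

Enumerating all combinations grows exponentially with transaction size, which is exactly the cost FP-growth avoids by compressing transactions into a prefix tree.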
Smoothed with a 3-day moving average, yearly seasonality
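A 3-day moving average replaces each daily value with the mean of a 3-day window. Whether the repo used a trailing or a centred window is not stated; the sketch assumes a trailing window, the same convention as pandas `series.rolling(window=3).mean()`, and `moving_average` is a hypothetical name.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 positions get None."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5]))
```

Keeping a running sum makes each step O(1) instead of re-summing the window; the leading `None`s match the `NaN`s pandas produces before the window is full.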
This dataset contains data from a news website. Each CSV file contains information about sessions, clicks on articles, interaction times, etc.
The notebook frequency_analysis.ipynb analyses the distribution of the page on which a given article appears. For each session, I added the order of the articles within that session. Then I filtered for that one article and created a histogram for each hour.
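The steps in frequency_analysis.ipynb can be sketched with the standard library. This assumes each click is a `(session_id, article_id, timestamp)` tuple and reads "page" as the article's position in the session's click order. Both the schema and that reading are assumptions, since the CSV columns are not listed here, and `position_histograms` is a hypothetical name.

```python
from collections import Counter, defaultdict
from datetime import datetime


def position_histograms(clicks, article_id):
    """Per-hour histograms of the position at which `article_id` is clicked.

    clicks: iterable of (session_id, article_id, timestamp) tuples (assumed schema).
    Returns {hour_of_day: Counter({position: count})}, positions starting at 1.
    """
    sessions = defaultdict(list)
    for session_id, article, ts in clicks:
        sessions[session_id].append((ts, article))

    histograms = defaultdict(Counter)
    for events in sessions.values():
        events.sort()  # order the clicks within a session by time
        for position, (ts, article) in enumerate(events, start=1):
            if article == article_id:  # keep only the article of interest
                histograms[ts.hour][position] += 1
    return dict(histograms)


if __name__ == "__main__":
    toy = [  # made-up clicks
        ("s1", 10, datetime(2024, 1, 1, 9, 0)),
        ("s1", 42, datetime(2024, 1, 1, 9, 5)),
        ("s2", 42, datetime(2024, 1, 1, 9, 1)),
    ]
    print(position_histograms(toy, article_id=42))
```

Ordering is assigned before filtering, so the position still counts the other articles clicked earlier in the same session.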