About

This repo contains analyses of several datasets. Current work focuses on time series forecasting and anomaly detection.

Datasets

Wikipedia

Drawing graph of page links

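import urllib3
import networkx as nx
from wikipedia.parser import get_graph

# Reuse one connection pool for all HTTP requests
pool = urllib3.PoolManager()

# Build the link graph starting from the Data mining article
G = get_graph(pool, url="https://en.wikipedia.org/wiki/Data_mining", deep=1)
# Draw the graph with its nodes arranged on a circle
nx.draw(G, nx.circular_layout(G), with_labels=True)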
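To illustrate what a depth-limited link graph looks like, here is a minimal sketch that does not use the network. It assumes that `deep` limits how many hops from the start page are expanded, which is a guess about `get_graph`. The `LINKS` map and the `link_edges` helper are hypothetical and not part of this repo.

```python
from collections import deque

# Hypothetical toy link map standing in for real Wikipedia pages.
LINKS = {
    "Data_mining": ["Data_set", "Statistics"],
    "Data_set": ["Data", "Statistics"],
    "Statistics": ["Data"],
    "Data": [],
}


def link_edges(links, start, depth):
    """Collect (source, target) edges found within `depth` hops of `start`."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if dist == depth:
            continue  # pages at the depth limit are not expanded
        for target in links.get(page, []):
            edges.append((page, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, dist + 1))
    return edges


edges_depth_1 = link_edges(LINKS, "Data_mining", depth=1)
edges_depth_2 = link_edges(LINKS, "Data_mining", depth=2)
```

An edge list like this can be passed straight to `nx.Graph(edges_depth_1)` and drawn the same way as `G` above.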

Finding philosophy page

This experiment tests the hypothesis that repeatedly following the first link in any Wikipedia article eventually leads to the
Philosophy article.

For more information on the subject, see my dev.to article.

crawl(pool, "https://en.wikipedia.org/wiki/Data_mining", phrase="Philosophy", deep=30, n=1, verbose=True)

30 Entering Data_mining
29 Entering Data_set

...

   [('https://en.wikipedia.org/wiki/Thought',
     [('https://en.wikipedia.org/wiki/Ideas',
       ['https://en.wikipedia.org/wiki/Philosophy'])])])])])])])])])])])])])])])])])])])])])])])])])])
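The first-link walk itself can be sketched without any HTTP requests. This is not the `crawl` implementation. The `FIRST_LINK` map, the page names, and the `follow_first_links` helper are made up for illustration, and the step limit of 30 only mirrors the `deep=30` argument above.

```python
# Hypothetical map from each page to the first link in its body.
FIRST_LINK = {
    "Page_A": "Page_B",
    "Page_B": "Philosophy",
    "Loop_A": "Loop_B",
    "Loop_B": "Loop_A",
}


def follow_first_links(first_link, start, target="Philosophy", max_steps=30):
    """Follow first links from `start`; return (path, reached_target)."""
    path = [start]
    visited = {start}
    page = start
    for _ in range(max_steps):
        if page == target:
            return path, True
        next_page = first_link.get(page)
        if next_page is None or next_page in visited:
            return path, False  # dead end or loop: the target is unreachable
        path.append(next_page)
        visited.add(next_page)
        page = next_page
    return path, page == target


path_ok, reached_ok = follow_first_links(FIRST_LINK, "Page_A")
path_loop, reached_loop = follow_first_links(FIRST_LINK, "Loop_A")
```

Tracking visited pages matters because a first-link chain can enter a cycle, and without that check the walk would spin until the step limit runs out.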

Experiment and code

E-commerce dataset from a Brazilian retail store

Dataset - sampled daily Prophet prediction of order volume with confidence intervals

Notebooks:

Animations:

Smoothed with 3-day moving average, yearly seasonality
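For reference, a 3-day moving average can be computed as in the sketch below. It assumes a trailing window (each value averages the current day and the two days before it). The notebooks may use a centred window or a library call instead, and the order counts here are made up.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 entries are None."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


daily_orders = [10, 20, 30, 40, 50]  # made-up daily order counts
smoothed_orders = moving_average(daily_orders)
```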

User interactions database

This dataset contains data from a news website. Each CSV file contains information about sessions, article clicks, interaction times, etc.

The notebook frequency_analysis.ipynb analyses the distribution of the page on which a given article appears. For each session I added the order of the articles, then filtered for that one article and created a histogram for each hour.
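The steps described above can be sketched roughly as follows. This is not the notebook's code. It assumes that "order" means the 1-based position of a click within its session, and the click log, its field layout, and the `position_histograms` helper are all hypothetical.

```python
from collections import Counter, defaultdict

# Made-up click log: (session_id, hour_of_day, article_id), already in click order.
clicks = [
    ("s1", 9, "a"), ("s1", 9, "b"), ("s1", 9, "c"),
    ("s2", 9, "b"), ("s2", 9, "a"),
    ("s3", 14, "c"), ("s3", 14, "b"),
]


def position_histograms(clicks, article):
    """For each hour, count the in-session positions at which `article` was clicked."""
    position_in_session = defaultdict(int)
    histograms = defaultdict(Counter)
    for session_id, hour, article_id in clicks:
        position_in_session[session_id] += 1  # 1-based order within the session
        if article_id == article:
            histograms[hour][position_in_session[session_id]] += 1
    return {hour: dict(counts) for hour, counts in histograms.items()}


hist_b = position_histograms(clicks, "b")
```

Each inner dictionary maps a position to a count, so it can be plotted directly as one bar chart per hour.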

Image processing

Animation of how Sobel edge detection works:
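As a companion to the animation, here is a minimal pure-Python sketch of the Sobel operator. It is not the code used to produce the animation. It slides the two standard 3x3 Sobel kernels over the interior pixels of a tiny made-up grayscale image and combines the horizontal and vertical responses into a gradient magnitude; border pixels are simply left at zero, which is one of several possible conventions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # responds to horizontal edges


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels of a 2D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Weighted sum of the 3x3 neighbourhood around (y, x)
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            result[y][x] = math.hypot(gx, gy)
    return result


# Made-up 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_map = sobel_magnitude(image)
```

In `edge_map` the interior pixels next to the vertical edge get a large magnitude, while a flat image produces zeros everywhere.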

finloop/data-science-notebooks

Folders and files

Latest commit

History

Repository files navigation

About

Datasets

Wikipedia

Drawing graph of page links

Finding philosophy page

E-commerce dataset from brazilian retail store

Animations:

User interactions database

Image processing

About

Topics

Resources

License

Stars

Watchers

Forks

Languages