thanasisn/training_location_analysis


Analysis of training and location data.

Ingest data from different sources and formats into a DuckDB database. Provide methods to find duplicate records and to perform quality checks and corrections. Analyse the training data with a focus on the fitness aspects, and analyse and aggregate the location data for presence statistics and GIS applications.

This will probably always be a work in progress.
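As an illustration of the ingestion step, a minimal sketch using DuckDB's Python API could look like the following. The file, table, and column names are only examples, not the project's actual code.

```python
# Minimal ingestion sketch (illustrative names): load a smartphone CSV log
# into a DuckDB table and tag every row with its source file, so records can
# later be traced back to the file they came from.
import duckdb

con = duckdb.connect("training.duckdb")

# read_csv_auto infers the schema; the source_file column records provenance.
con.execute("""
    CREATE OR REPLACE TABLE phone_logs AS
    SELECT *, 'logs/phone_2020.csv' AS source_file
    FROM read_csv_auto('logs/phone_2020.csv')
""")

print(con.execute("SELECT count(*) FROM phone_logs").fetchone()[0])
con.close()
```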

Create a database

  • Include .csv files from smartphone logs.
  • Include .hrv files from Polar.
  • Include .json files from Garmin.
  • Include .tcx files from Polar.
  • Include SQLite data from Gadgetbridge.
  • Include SQLite data from Amazfit Bip.
  • Include SQLite data from GarminDB.
  • Include data from the Google location service.
  • Include .fit files from Garmin.
  • Include .gpx files from other sources.
  • Include .json files from GoldenCheetah.
  • Database maintenance.
    • Check for duplicated records (see the sketch after this list).
    • Check variable names for similarity.
    • Create new variables automatically.
    • Remove database records originating from deleted files.
    • Remove database records originating from modified files.
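The maintenance steps could be implemented with queries along these lines. This is a rough sketch that reuses the hypothetical phone_logs table and source_file column from the previous example; the grouping columns are placeholders.

```python
# Hypothetical maintenance pass (table and column names are assumptions):
# flag exact duplicates and drop rows whose source file no longer exists.
import os
import duckdb

con = duckdb.connect("training.duckdb")

# Crude duplicate check: count rows sharing the same timestamp and device.
dupes = con.execute("""
    SELECT time, device, count(*) AS n
    FROM phone_logs
    GROUP BY time, device
    HAVING count(*) > 1
""").fetchall()
print(f"{len(dupes)} duplicated (time, device) groups")

# Remove records whose source file was deleted from disk.
files = [r[0] for r in con.execute(
    "SELECT DISTINCT source_file FROM phone_logs").fetchall()]
for f in files:
    if not os.path.exists(f):
        con.execute("DELETE FROM phone_logs WHERE source_file = ?", [f])

con.close()
```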

Quality check of location data

  • Deduplicate points.
  • Remove erroneous values from records.
  • Combine overlapping columns/variables (see the sketch below).
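A quality-check pass over the location points might look roughly like this. The table and column names (locations, altitude_baro, altitude_gps) are assumptions made for the example.

```python
# Hypothetical quality-check pass over location points: keep one row per
# timestamp/position, drop invalid coordinates, and merge two overlapping
# altitude columns into one.
import duckdb

con = duckdb.connect("training.duckdb")

con.execute("""
    CREATE OR REPLACE TABLE locations_clean AS
    SELECT
        time,
        latitude,
        longitude,
        -- prefer barometric altitude when present, fall back to GPS altitude
        coalesce(altitude_baro, altitude_gps) AS altitude
    FROM locations
    -- drop obviously invalid coordinates
    WHERE latitude BETWEEN -90 AND 90
      AND longitude BETWEEN -180 AND 180
    -- keep a single row per time and position
    QUALIFY row_number() OVER (PARTITION BY time, latitude, longitude ORDER BY time) = 1
""")

con.close()
```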

Merge analysis from my other projects

Description

The main database collects all available data from the source files. The intent is to aggregate as much data as possible and then analyze the raw data in order to find source files that can be deleted or excluded from the main database. Reading all the files also lets us detect file and formatting problems. The source files were produced by different devices and have been processed by different software. We want to collect all the information gathered over a period of more than 10 years, so we expect more than 100 variables/columns and more than 30 million records/rows. The processing scheme should work on modest hardware (8 GB of RAM or even less).
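To stay within such a memory budget, DuckDB can be capped explicitly and allowed to spill to disk. A sketch with example settings (the limits shown are illustrative, not the project's configuration):

```python
# Keep processing within a small memory budget; DuckDB spills to disk when a
# query exceeds the configured memory limit. Values below are examples only.
import duckdb

con = duckdb.connect("training.duckdb")
con.execute("SET memory_limit = '4GB'")           # stay well below 8 GB of RAM
con.execute("SET threads = 2")                    # fewer threads, less memory pressure
con.execute("SET temp_directory = 'duckdb_tmp'")  # spill location for large sorts/joins

# A scan over tens of millions of rows can now run out of core.
print(con.execute("SELECT count(*) FROM phone_logs").fetchone()[0])
con.close()
```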

With further analysis we can merge some of the variables and check the data quality.

Once we are confident about the quality of the data and the information they contain, we can use them to create the other datasets we need.

Helpful and similar projects

My database stats

fit    gpx    json   Rds
1277   4474   2314   1

Table: File types

fit   gpx    gz    json   Rds   zip
82    4470   492   2314   1     707

Table: File extensions

Total rows: 48629335

Total files: 8066

Total days: 4018

Total vars: 147

DB Size: 2.4 GiB

Source Size: 6.5 GiB