feat (lab): food intake advanced analysis
Showing 1 changed file with 317 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,317 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Analysis with Advanced SQL\n", | ||
"## U.S. EPA Food Commodity Intake Database (FCID)\n", | ||
"### [https://fcid.foodrisk.org/](https://fcid.foodrisk.org/)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Opening an in-memory database connection using the H2 DBMS:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%defaultDatasource jdbc:h2:mem:db" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Importing the FCID Tables" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"DROP TABLE IF EXISTS Crop_Group;\n", | ||
"DROP TABLE IF EXISTS FCID_Description;\n", | ||
"DROP TABLE IF EXISTS Recipes;\n", | ||
"DROP TABLE IF EXISTS Intake;\n", | ||
"\n", | ||
"CREATE TABLE Crop_Group (\n", | ||
" CGN VARCHAR(2),\n", | ||
" CGL VARCHAR(6),\n", | ||
" Crop_Group_Description VARCHAR(80),\n", | ||
" PRIMARY KEY (CGL)\n", | ||
") AS SELECT\n", | ||
" CGN, CGL, Crop_Group_Description\n", | ||
"FROM CSVREAD('../../data/food-intake/basics/FCID_Cropgroup_Description.csv');\n", | ||
"\n", | ||
"CREATE TABLE FCID_Description (\n", | ||
" CGN VARCHAR(2),\n", | ||
" CG_Subgroup VARCHAR(6),\n", | ||
" FCID_Code VARCHAR(10),\n", | ||
" FCID_Desc VARCHAR(55),\n", | ||
" PRIMARY KEY (FCID_Code)\n", | ||
") AS SELECT\n", | ||
" CGN, CG_Subgroup, FCID_Code, FCID_Desc\n", | ||
"FROM CSVREAD('../../data/food-intake/basics/FCID_Code_Description.csv');\n", | ||
"\n", | ||
"CREATE TABLE Recipes (\n", | ||
" Food_Code VARCHAR(8),\n", | ||
" Mod_Code VARCHAR(8),\n", | ||
" Ingredient_Num TINYINT,\n", | ||
" FCID_Code VARCHAR(10),\n", | ||
" Cooked_Status TINYINT,\n", | ||
" Food_Form TINYINT,\n", | ||
" Cooking_Method TINYINT,\n", | ||
" Commodity_Weight DECIMAL(5, 2),\n", | ||
" CSFII_9498_IND TINYINT,\n", | ||
" WWEIA_9904_IND TINYINT,\n", | ||
" WWEIA_0510_IND TINYINT,\n", | ||
" PRIMARY KEY(Food_Code, Mod_Code, Ingredient_Num),\n", | ||
" FOREIGN KEY(FCID_Code)\n", | ||
" REFERENCES FCID_Description(FCID_Code)\n", | ||
" ON DELETE NO ACTION\n", | ||
" ON UPDATE NO ACTION\n", | ||
") AS SELECT\n", | ||
" Food_Code, Mod_Code, Ingredient_Num, FCID_Code, Cooked_Status, Food_Form, Cooking_Method,\n", | ||
" Commodity_Weight, CSFII_9498_IND, WWEIA_9904_IND, WWEIA_0510_IND\n", | ||
"FROM CSVREAD('../../data/food-intake/recipes/Recipes_WWEIA_FCID_0510.csv');\n", | ||
"\n", | ||
"CREATE TABLE Intake (\n", | ||
" SeqN INTEGER NOT NULL,\n", | ||
" DayCode TINYINT NOT NULL,\n", | ||
" DraBF TINYINT,\n", | ||
" FCID_Code VARCHAR(10),\n", | ||
" Cooked_Status TINYINT,\n", | ||
" Food_Form TINYINT,\n", | ||
" Cooking_Method TINYINT,\n", | ||
" Intake DECIMAL(13,7),\n", | ||
" Intake_BW DECIMAL(13,10),\n", | ||
" PRIMARY KEY(SeqN, DayCode, FCID_Code, Cooked_Status, Food_Form, Cooking_Method),\n", | ||
" FOREIGN KEY(FCID_Code)\n", | ||
" REFERENCES FCID_Description(FCID_Code)\n", | ||
" ON DELETE NO ACTION\n", | ||
" ON UPDATE NO ACTION\n", | ||
") AS SELECT\n", | ||
" SEQN, DAYCODE, DRABF, FCID_Code, Cooked_Status, Food_Form, Cooking_Method, Intake,Intake_BW\n", | ||
"FROM CSVREAD('../../data/food-intake/consumption/Commodity_CSFFM_Intake_0510-cropped.csv');" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Viewing the Tables" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "b89a3f80-02b3-4acb-bb7c-5d3d4f855e42", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"method": "display_data" | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"SELECT * FROM Crop_Group LIMIT 10;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "9b197073-9158-4939-8e60-adfcfb546c1e", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"method": "display_data" | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"SELECT * FROM FCID_Description LIMIT 10;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "10c6feb9-2454-4656-bba0-ece47f008442", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"method": "display_data" | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"SELECT * FROM Recipes LIMIT 10;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": { | ||
"scrolled": true | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"application/vnd.jupyter.widget-view+json": { | ||
"model_id": "7840e179-1311-409f-9ecf-6689a574ee1d", | ||
"version_major": 2, | ||
"version_minor": 0 | ||
}, | ||
"method": "display_data" | ||
}, | ||
"metadata": {}, | ||
"output_type": "display_data" | ||
} | ||
], | ||
"source": [ | ||
"SELECT * FROM Intake LIMIT 10;" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Metrics\n", | ||
"\n", | ||
"Assume that the Intake table records the foods consumed by 1,489 people. Consider the following metrics for a food:\n", | ||
"\n", | ||
"| Metric | Description |\n", | ||
"| --- | --- |\n", | ||
"| Popularity | number of people (out of the 1,489) who consumed the food |\n", | ||
"| Intake_Sum | total amount of the food consumed by the 1,489 people, in grams |\n", | ||
"| Intake_AVG | average intake of the food, in grams |\n", | ||
"| Intake_AVG_BW | average intake of the food relative to the person's body weight |\n", | ||
"| Recipes | number of recipes (out of the 7,154 recipes) that include the product as an ingredient |" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 1) Build a View that presents these metrics per product\n", | ||
"\n", | ||
"* See an example in: `/data/food-intake/computed/commodity-profile.csv`\n", | ||
"* Important: that table was built from a larger number of records, so its values will not match yours" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 2) How would you analyze the correlation between the metrics?\n", | ||
"\n", | ||
"* For example, are the most popular products also the most consumed (by number of people or by quantity)?\n", | ||
"* Propose one or more queries to carry out this analysis" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 3) Can we create groups of consumers according to a profile?\n", | ||
"* for example, can consumers be grouped by the foods they predominantly eat?\n", | ||
"* how would you associate groups with classes?" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## 4) Which metrics can be analyzed to compare profiles?\n", | ||
"* write a SQL query that computes at least one comparative metric" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "SQL", | ||
"language": "SQL", | ||
"name": "sql" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": "sql", | ||
"file_extension": ".sql", | ||
"mimetype": "", | ||
"name": "SQL", | ||
"nbconverter_exporter": "", | ||
"version": "" | ||
}, | ||
"toc": { | ||
"base_numbering": 1, | ||
"nav_menu": {}, | ||
"number_sections": false, | ||
"sideBar": false, | ||
"skip_h1_title": false, | ||
"title_cell": "Table of Contents", | ||
"title_sidebar": "Contents", | ||
"toc_cell": false, | ||
"toc_position": {}, | ||
"toc_section_display": false, | ||
"toc_window_display": false | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |