Skip to content

Commit

Permalink
feat (lab): food intake advanced analysis
Browse files Browse the repository at this point in the history
  • Loading branch information
santanche committed Sep 22, 2023
1 parent c318a07 commit e37c56d
Showing 1 changed file with 317 additions and 0 deletions.
317 changes: 317 additions & 0 deletions sql/food-intake/food-intake-analysis-advanced.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,317 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Análise com SQL Avançado\n",
"## U.S. EPA Food Commodity Intake Database (FCID)\n",
"### [https://fcid.foodrisk.org/](https://fcid.foodrisk.org/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ativando uma conexão de banco de dados em memória usando o SGBD H2:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%defaultDatasource jdbc:h2:mem:db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Importando Tabelas do FCID"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"DROP TABLE IF EXISTS Crop_Group;\n",
"DROP TABLE IF EXISTS FCID_Description;\n",
"DROP TABLE IF EXISTS Recipes;\n",
"DROP TABLE IF EXISTS Intake;\n",
"\n",
"CREATE TABLE Crop_Group (\n",
" CGN VARCHAR(2),\n",
" CGL VARCHAR(6),\n",
" Crop_Group_Description VARCHAR(80),\n",
" PRIMARY KEY (CGL)\n",
") AS SELECT\n",
" CGN, CGL, Crop_Group_Description\n",
"FROM CSVREAD('../../data/food-intake/basics/FCID_Cropgroup_Description.csv');\n",
"\n",
"CREATE TABLE FCID_Description (\n",
" CGN VARCHAR(2),\n",
" CG_Subgroup VARCHAR(6),\n",
" FCID_Code VARCHAR(10),\n",
" FCID_Desc VARCHAR(55),\n",
" PRIMARY KEY (FCID_Code),\n",
") AS SELECT\n",
" cgn, CG_Subgroup, FCID_Code, FCID_Desc\n",
"FROM CSVREAD('../../data/food-intake/basics/FCID_Code_Description.csv');\n",
"\n",
"CREATE TABLE Recipes (\n",
" Food_Code VARCHAR(8),\n",
" Mod_Code VARCHAR(8),\n",
" Ingredient_Num TINYINT,\n",
" FCID_Code VARCHAR(10),\n",
" Cooked_Status TINYINT,\n",
" Food_Form TINYINT,\n",
" Cooking_Method TINYINT,\n",
" Commodity_Weight DECIMAL(5, 2),\n",
" CSFII_9498_IND TINYINT,\n",
" WWEIA_9904_IND TINYINT,\n",
" WWEIA_0510_IND TINYINT,\n",
" PRIMARY KEY(Food_Code, Mod_Code, Ingredient_Num),\n",
" FOREIGN KEY(FCID_Code)\n",
" REFERENCES FCID_Description(FCID_Code)\n",
" ON DELETE NO ACTION\n",
" ON UPDATE NO ACTION\n",
") AS SELECT\n",
" Food_Code, Mod_Code, Ingredient_Num, FCID_Code, Cooked_Status, Food_Form, Cooking_Method,\n",
" Commodity_Weight, CSFII_9498_IND, WWEIA_9904_IND, WWEIA_0510_IND\n",
"FROM CSVREAD('../../data/food-intake/recipes/Recipes_WWEIA_FCID_0510.csv');\n",
"\n",
"CREATE TABLE Intake (\n",
" SeqN INTEGER NOT NULL,\n",
" DayCode TINYINT NOT NULL,\n",
" DraBF TINYINT,\n",
" FCID_Code VARCHAR(10),\n",
" Cooked_Status TINYINT,\n",
" Food_Form TINYINT,\n",
" Cooking_Method TINYINT,\n",
" Intake DECIMAL(13,7),\n",
" Intake_BW DECIMAL(13,10),\n",
" PRIMARY KEY(SeqN, DayCode, FCID_Code, Cooked_Status, Food_Form, Cooking_Method),\n",
" FOREIGN KEY(FCID_Code)\n",
" REFERENCES FCID_Description(FCID_Code)\n",
" ON DELETE NO ACTION\n",
" ON UPDATE NO ACTION\n",
") AS SELECT\n",
" SEQN, DAYCODE, DRABF, FCID_Code, Cooked_Status, Food_Form, Cooking_Method, Intake,Intake_BW\n",
"FROM CSVREAD('../../data/food-intake/consumption/Commodity_CSFFM_Intake_0510-cropped.csv');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualizando as Tabelas"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b89a3f80-02b3-4acb-bb7c-5d3d4f855e42",
"version_major": 2,
"version_minor": 0
},
"method": "display_data"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"SELECT * FROM Crop_Group LIMIT 10;"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9b197073-9158-4939-8e60-adfcfb546c1e",
"version_major": 2,
"version_minor": 0
},
"method": "display_data"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"SELECT * FROM FCID_Description LIMIT 10;"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "10c6feb9-2454-4656-bba0-ece47f008442",
"version_major": 2,
"version_minor": 0
},
"method": "display_data"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"SELECT * FROM Recipes LIMIT 10;"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7840e179-1311-409f-9ecf-6689a574ee1d",
"version_major": 2,
"version_minor": 0
},
"method": "display_data"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"SELECT * FROM Intake LIMIT 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Métricas\n",
"\n",
"Considere que a tabela Intake registra alimentos consumidos por 1.489 pessoas. Considere as seguintes métricas para um alimento:\n",
"\n",
"| Métrica | Descrição |\n",
"| --- | --- |\n",
"| Popularidade | número de pessoas (dentre as 1.489) que consumiram o alimento |\n",
"| Intake_Sum | total consumido do alimento pelas 1.489 pessoas em gramas |\n",
"| Intake_AVG | média de consumo do alimento em gramas |\n",
"| Intake_AVG_BW | média de consumo do alimento x peso da pessoa |\n",
"| Recipes | número de receitas (dentre as 7.154 receitas) que têm o produto como ingrediente |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Construa uma View que apresente essas métricas por produto\n",
"\n",
"* Veja exemplo em: `/data/food-intake/computed/commodity-profile.csv`\n",
"* Importante: esta tabela foi feita com um número maior de registros, portanto os valores não serão iguais aos seus"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Como você analisaria a correlação entre as métricas?\n",
"\n",
"* Por exemplo, produtos mais populares são mais consumidos (em número de pessoas ou em quantidade)?\n",
"* Proponha uma ou mais queries para fazer esta análise"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Podemos criar grupos de consumidores conforme um perfil?\n",
"* por exemplo, consumidores podem ser agrupados por alimentos que comem predominantemente?\n",
"* como você associaria grupos a classes?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4) Que métricas podem ser analisadas para a comparação de perfis?\n",
"* escreva uma query SQL que calcule pelo menos uma métrica comparativa"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "SQL",
"language": "SQL",
"name": "sql"
},
"language_info": {
"codemirror_mode": "sql",
"file_extension": ".sql",
"mimetype": "",
"name": "SQL",
"nbconverter_exporter": "",
"version": ""
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": false,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

0 comments on commit e37c56d

Please sign in to comment.