diff --git a/sql/food-intake/food-intake-analysis-advanced.ipynb b/sql/food-intake/food-intake-analysis-advanced.ipynb new file mode 100644 index 0000000..601e6aa --- /dev/null +++ b/sql/food-intake/food-intake-analysis-advanced.ipynb @@ -0,0 +1,317 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Análise com SQL Avançado\n", + "## U.S. EPA Food Commodity Intake Database (FCID)\n", + "### [https://fcid.foodrisk.org/](https://fcid.foodrisk.org/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ativando uma conexão de banco de dados em memória usando o SGBD H2:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "%defaultDatasource jdbc:h2:mem:db" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Importando Tabelas do FCID" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "DROP TABLE IF EXISTS Crop_Group;\n", + "DROP TABLE IF EXISTS FCID_Description;\n", + "DROP TABLE IF EXISTS Recipes;\n", + "DROP TABLE IF EXISTS Intake;\n", + "\n", + "CREATE TABLE Crop_Group (\n", + " CGN VARCHAR(2),\n", + " CGL VARCHAR(6),\n", + " Crop_Group_Description VARCHAR(80),\n", + " PRIMARY KEY (CGL)\n", + ") AS SELECT\n", + " CGN, CGL, Crop_Group_Description\n", + "FROM CSVREAD('../../data/food-intake/basics/FCID_Cropgroup_Description.csv');\n", + "\n", + "CREATE TABLE FCID_Description (\n", + " CGN VARCHAR(2),\n", + " CG_Subgroup VARCHAR(6),\n", + " FCID_Code VARCHAR(10),\n", + " FCID_Desc VARCHAR(55),\n", + " PRIMARY KEY (FCID_Code),\n", + ") AS SELECT\n", + " cgn, CG_Subgroup, FCID_Code, FCID_Desc\n", + "FROM CSVREAD('../../data/food-intake/basics/FCID_Code_Description.csv');\n", + "\n", + "CREATE TABLE Recipes (\n", + " Food_Code VARCHAR(8),\n", + " Mod_Code VARCHAR(8),\n", + " Ingredient_Num TINYINT,\n", + " FCID_Code VARCHAR(10),\n", + " Cooked_Status TINYINT,\n", + " Food_Form TINYINT,\n", + " Cooking_Method TINYINT,\n", + " Commodity_Weight DECIMAL(5, 2),\n", + " CSFII_9498_IND TINYINT,\n", + " WWEIA_9904_IND TINYINT,\n", + " WWEIA_0510_IND TINYINT,\n", + " PRIMARY KEY(Food_Code, Mod_Code, Ingredient_Num),\n", + " FOREIGN KEY(FCID_Code)\n", + " REFERENCES FCID_Description(FCID_Code)\n", + " ON DELETE NO ACTION\n", + " ON UPDATE NO ACTION\n", + ") AS SELECT\n", + " Food_Code, Mod_Code, Ingredient_Num, FCID_Code, Cooked_Status, Food_Form, Cooking_Method,\n", + " Commodity_Weight, CSFII_9498_IND, WWEIA_9904_IND, WWEIA_0510_IND\n", + "FROM CSVREAD('../../data/food-intake/recipes/Recipes_WWEIA_FCID_0510.csv');\n", + "\n", + "CREATE TABLE Intake (\n", + " SeqN INTEGER NOT NULL,\n", + " DayCode TINYINT NOT NULL,\n", + " DraBF TINYINT,\n", + " FCID_Code VARCHAR(10),\n", + " Cooked_Status TINYINT,\n", + " Food_Form TINYINT,\n", + " Cooking_Method TINYINT,\n", + " Intake DECIMAL(13,7),\n", + " Intake_BW DECIMAL(13,10),\n", + " PRIMARY KEY(SeqN, DayCode, FCID_Code, Cooked_Status, Food_Form, Cooking_Method),\n", + " FOREIGN KEY(FCID_Code)\n", + " REFERENCES FCID_Description(FCID_Code)\n", + " ON DELETE NO ACTION\n", + " ON UPDATE NO ACTION\n", + ") AS SELECT\n", + " SEQN, DAYCODE, DRABF, FCID_Code, Cooked_Status, Food_Form, Cooking_Method, Intake,Intake_BW\n", + "FROM CSVREAD('../../data/food-intake/consumption/Commodity_CSFFM_Intake_0510-cropped.csv');" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Visualizando as Tabelas" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b89a3f80-02b3-4acb-bb7c-5d3d4f855e42", + "version_major": 2, + "version_minor": 0 + }, + "method": "display_data" + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "SELECT * FROM Crop_Group LIMIT 10;" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "9b197073-9158-4939-8e60-adfcfb546c1e", + "version_major": 2, + "version_minor": 0 + }, + "method": "display_data" + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "SELECT * FROM FCID_Description LIMIT 10;" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "10c6feb9-2454-4656-bba0-ece47f008442", + "version_major": 2, + "version_minor": 0 + }, + "method": "display_data" + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "SELECT * FROM Recipes LIMIT 10;" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7840e179-1311-409f-9ecf-6689a574ee1d", + "version_major": 2, + "version_minor": 0 + }, + "method": "display_data" + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "SELECT * FROM Intake LIMIT 10;" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Métricas\n", + "\n", + "Considere que a tabela Intake registra alimentos consumidos por 1.489 pessoas. Considere as seguintes métricas para um alimento:\n", + "\n", + "| Métrica | Descrição |\n", + "| --- | --- |\n", + "| Popularidade | número de pessoas (dentre as 1.489) que consumiram o alimento |\n", + "| Intake_Sum | total consumido do alimento pelas 1.489 pessoas em gramas |\n", + "| Intake_AVG | média de consumo do alimento em gramas |\n", + "| Intake_AVG_BW | média de consumo do alimento x peso da pessoa |\n", + "| Recipes | número de receitas (dentre as 7.154 receitas) que têm o produto como ingrediente |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1) Construa uma View que apresente essas métricas por produto\n", + "\n", + "* Veja exemplo em: `/data/food-intake/computed/commodity-profile.csv`\n", + "* Importante: esta tabela foi feita com um número maior de registros, portanto os valores não serão iguais aos seus" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2) Como você analisaria a correlação entre as métricas?\n", + "\n", + "* Por exemplo, produtos mais populares são mais consumidos (em número de pessoas ou em quantidade)?\n", + "* Proponha uma ou mais queries para fazer esta análise" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3) Podemos criar grupos de consumidores conforme um perfil?\n", + "* por exemplo, consumidores podem ser agrupados por alimentos que comem predominantemente?\n", + "* como você associaria grupos a classes?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4) Que métricas podem ser analisadas para a comparação de perfis?\n", + "* escreva uma query SQL que calcule pelo menos uma métrica comparativa" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "SQL", + "language": "SQL", + "name": "sql" + }, + "language_info": { + "codemirror_mode": "sql", + "file_extension": ".sql", + "mimetype": "", + "name": "SQL", + "nbconverter_exporter": "", + "version": "" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": false, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": false, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}