The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is a system of ongoing health-related telephone surveys designed to collect data on health-related risk behaviors, chronic health conditions, and the use of preventive services from the non-institutionalized adult population (≥ 18 years) residing in the United States. The BRFSS is administered and supported by CDC's Population Health Surveillance Branch, under the Division of Population Health at CDC's National Center for Chronic Disease Prevention and Health Promotion.
The dataset originally comes from the CDC (1) and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. As the CDC describes: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset (as of February 15, 2022) includes data from 2020. It consists of 401,958 rows and 279 columns. The vast majority of columns are questions put to respondents about their health status, such as "Do you have serious difficulty walking or climbing stairs?" or "Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes]". In this dataset, we noticed many different factors (questions) that directly or indirectly influence heart disease, so we selected the most relevant variables and cleaned them so that the data would be usable for machine learning projects (2).
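The selection and cleaning step described above can be sketched roughly as follows. This is a minimal illustration and not the procedure actually used to build the dataset: the column names (`HEART_DISEASE`, `DIFF_WALKING`, `SMOKED_100`), the yes/no response codes, and the rule of dropping respondents with any unusable answer are all assumptions made for the example.

```python
# Illustrative sketch only. Column names and response codes below are
# assumptions for this example, not the actual survey variable names.
YES, NO = 1, 2  # assumed coding; any other value is treated as unusable

# Hypothetical subset of columns judged relevant to heart disease.
KEEP = ["HEART_DISEASE", "DIFF_WALKING", "SMOKED_100"]


def clean_rows(rows, keep=KEEP):
    """Keep the selected columns, map yes/no codes to booleans,
    and drop any respondent with a missing or unusable answer."""
    cleaned = []
    for row in rows:
        record = {}
        for col in keep:
            code = row.get(col)
            if code == YES:
                record[col] = True
            elif code == NO:
                record[col] = False
            else:
                break  # blank, "don't know", or refused: skip respondent
        else:
            # Reached only when every selected column had a usable answer.
            cleaned.append(record)
    return cleaned


# Tiny made-up sample standing in for the raw survey rows.
sample = [
    {"HEART_DISEASE": 2, "DIFF_WALKING": 1, "SMOKED_100": 1, "OTHER": 5},
    {"HEART_DISEASE": 1, "DIFF_WALKING": 7, "SMOKED_100": 2, "OTHER": 3},
    {"HEART_DISEASE": 1, "DIFF_WALKING": 2, "SMOKED_100": None, "OTHER": 1},
    {"HEART_DISEASE": 1, "DIFF_WALKING": 2, "SMOKED_100": 1, "OTHER": 1},
]

cleaned = clean_rows(sample)
print(len(cleaned), "of", len(sample), "rows kept")
```

Dropping incomplete respondents is the simplest choice for a sketch; a real pipeline might instead impute missing answers or keep "don't know" as its own category, depending on the modeling goal.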