Graded Assignment -4 (May Term 2024):- Redesigning The Hindu Data Point Stories #31

Open
Jimmi-Kr opened this issue Jul 21, 2024 · 38 comments


@Jimmi-Kr
Collaborator

For this assignment, we'll use data stories from The Hindu Data Point. Use what you have learned in Week 4 & Week 5 for doing this assignment.

Select a story that you like, study it carefully, and redesign it. Specifically, we want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:

  • What is the story the author is trying to tell?
  • What data is he/she using to tell the story? Describe its details: type of data, extent of the data, dimensions of the data, gaps in the data, and what data is essential and what is irrelevant.
  • How is it encoded, what problems are with it, and how have you attempted to improve it?

You may choose to expand or curtail the scope of the data used in the story or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.

While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce coherent documentation.

For reference, take a look at what the previous batches (2019, 2020, 2021, 2022) did with this assignment.

@sahilrajpal121

sahilrajpal121 commented Jul 24, 2024

---WIP---
Name - Sahil Rajpal
Roll - 21f1006804

Original Article: Share of Women across Employment Sectors (link)

Summary:
The recently released Annual Survey of Unincorporated Sector 2022-23 reveals that the share of women owners and workers in unincorporated enterprises is relatively high in the southern states of India. The unincorporated sector encompasses a variety of jobs, from street vending to tailoring and car repair, which require different levels of capital and skill. This sector includes individual-operated or self-employed enterprises involving unpaid family members or paid workers. It excludes agricultural establishments, registered companies, and public sector/government companies.

Key Insights:

  • The share of women in the unincorporated sector is notably high in southern states and some eastern states but remains low in western, northern, and central states.
  • Telangana leads the country with women constituting 41% of the workforce in unincorporated enterprises, followed by other southern states, West Bengal, and Odisha.
  • Women in southern states form a higher share of the overall workforce and a significant share among worker-owners.
  • Across India, women are predominantly present among unpaid family workers, often taking no payment and having little say in the enterprise operations.

Visual Representation:
Screenshot (735)

The provided chart is a scatter plot with circles representing states, differentiated by regions through colors. It categorizes women workers into unpaid family members, informal/formal hired workers, and working owners in various sectors of unincorporated enterprises. The southern states are positioned towards the right, indicating a higher share of women in the workforce.

Visual Critique:

  1. Current Chart Analysis:

    • The chart is a scatter plot with different sections indicating the share of women across various job types in unincorporated enterprises.
    • Each circle represents a state, with color coding to differentiate regions.
  2. Issues with the Current Chart:

    • It can be challenging to identify and compare specific states effectively due to overlapping circles.
    • The absence of labeling can make the chart less intuitive.

Redesign Iterations

PS: All charts below are interactive. Tooltips provide further information about the dataset

Iteration-1

In this attempt, I tried a strip plot on a smaller mock dataset.
image

  • The plot will become cluttered when tested on the original data.
  • Labels will be unreadable when 21 states are included in the chart.
  • Points are small and will be difficult to tell apart once the original data points are plotted.

Iteration-2

Here, I created Treemaps, one for each employment type, sorted by Percentage share of women.

image

  • The goal of the visualization is to show the states having a high share of women workers, with southern states dominating in this aspect.
  • Utilizing multiple treemaps to convey the story helps prevent it from becoming cluttered.

Iteration-3

Furthermore, I visualized the story in the form of a Heatmap.
image

  • Heatmap allows us to express the story in greater detail, providing a clear distinction of shares of women among all states.
  • Enhances readability across all major aspects, such as Employment Type and States.
  • However, it may not be visually appealing to everyone, as it primarily consists of numbers.

Final Iteration

Finally, I improved upon the last iteration and presented the story using a Split Bar Chart

image

  • Visually appealing, with clear distinctions in how each state ranks within each employment category.
  • Easier to read numbers at a glance using Split Bars.
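Both the heatmap and the split bar chart need the data arranged as a state-by-employment-type grid, with states ordered by one category. The following is a minimal sketch of that reshaping step in plain Python. The state names and all numbers are placeholders rather than survey values, and `pivot` and `rank_states` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical long-format rows: (state, employment type, women's share in %).
# State names and numbers are placeholders, not values from the survey.
rows = [
    ("State A", "Working owner", 30.0),
    ("State A", "Hired worker", 20.0),
    ("State A", "Unpaid family worker", 50.0),
    ("State B", "Working owner", 10.0),
    ("State B", "Hired worker", 15.0),
    ("State B", "Unpaid family worker", 40.0),
    ("State C", "Working owner", 25.0),
    ("State C", "Hired worker", 5.0),
    ("State C", "Unpaid family worker", 60.0),
]


def pivot(rows):
    """Turn long-format rows into {state: {employment_type: share}}."""
    grid = defaultdict(dict)
    for state, emp_type, share in rows:
        grid[state][emp_type] = share
    return dict(grid)


def rank_states(grid, emp_type):
    """Return states sorted by share in one employment type, highest first."""
    return sorted(grid, key=lambda s: grid[s][emp_type], reverse=True)


grid = pivot(rows)
order = rank_states(grid, "Working owner")
print(order)
```

Each row of the grid maps to one heatmap row or one group of split bars, and `order` fixes the vertical ordering for a chosen category.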

@Ashutosh-tec

Ashutosh-tec commented Jul 25, 2024

Name: Ashutosh Kumar Barmwal
Roll: 21f1001709

Documentation
Original Story
Title: Economic Confidence Slips: RBI Survey Reveals Growing Pessimism Amid Inflation Concerns

Redesign Documentation

Story the Author is Trying to Tell

The author highlights a recent decline in economic confidence among urban households in India. The focus is on four key areas:

  1. General Economic Climate: Tracking changes in public perception over time.
  2. Employment Situation: Assessing optimism or pessimism about job prospects.
  3. Price Levels: Understanding public perception of inflation.
  4. Income Levels: Evaluating changes in perceived income levels.

The narrative emphasizes that after a steady recovery post-COVID, confidence in the economy has recently dipped.

Data Used to Tell the Story

Data Details

  • Type of Data: Survey responses from urban households across 19 cities.
  • Extent of Data: Collected bi-monthly over several years.
  • Dimensions of Data:
    • Perception of the general economic climate
    • Perception of employment situation
    • Perception of price levels
    • Perception of income levels
  • Gaps in the Data: The data may not fully represent rural areas or smaller cities.

Essential vs. Irrelevant Data

  • Essential: Data on the general economic climate, employment situation, price levels, and income levels.
  • Irrelevant: Data not directly related to economic perception, such as demographic details of respondents not used in the analysis.

Visual Encoding and Problems

Current Encoding

  • Charts: Line charts depicting changes in perceptions over time.

  1. image
  2. image
  3. image
  4. image

  • Problems:
    • The visuals may appear cluttered.
    • Trends might not be immediately clear.
    • Text explanations might not be well integrated with the visuals.

Improvements Attempted

  1. Clearer Segmentation: Use distinct colors for each key aspect (economic climate, employment, price levels, income levels) to reduce clutter.
  2. Consistent Color Coding: Implement a consistent color scheme to differentiate between positive and negative perceptions.
  3. Annotated Highlights: Add annotations to highlight key turning points or significant changes.

Redesign Process

  1. General Economic Climate & Employment Situation

    • Original: Single line chart.
    • Redesign:
      • Use two lines: one for "Improved" and one for "Worsened."
      • Annotate key events (e.g., significant economic policies, global events).
      • Highlight the recent decline in confidence.
  2. Price Levels & Income Levels

    • Original: Combined line chart.
    • Redesign:
      • Use distinct lines for "Increased" and "Decreased" perceptions.
      • Consistent color scheme (e.g., blue for increased, red for decreased).
      • Annotations to explain the persistent high perception of increased prices and income changes.

Redesigning Charts: Trying to get the data.

Thank You,

@neeraj-iit

neeraj-iit commented Jul 25, 2024

Name: Neeraj Yadav
Roll No: 21f1005729

Main Story:
The article provides insights into the cities that have the highest share of students scoring 650 or above in the NEET UG 2024 exams. It highlights the significance of these scores for securing admissions in government medical colleges and identifies top cities and centers contributing to these high scores.

Data Used:

Type of Data: Quantitative data on student scores from the NEET UG 2024 exams.
Extent of Data: The dataset includes scores from all candidates who appeared for NEET UG 2024, focusing on those scoring 650 and above.
Dimensions of the Data: The data includes variables such as candidate scores, cities, states, and specific educational centers.
Gaps in the Data: The article does not provide detailed demographic information or historical comparison data.
Relevance: Essential data includes candidate scores and their respective cities and centers. Irrelevant data might include unrelated demographic details not covered in the story.
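The central metric, the share of candidates in each city scoring 650 or above, can be derived from per-candidate records. The sketch below assumes a simple list of (city, score) pairs. The records are placeholders rather than real NEET results, and `high_score_share` is a hypothetical helper name.

```python
from collections import Counter

THRESHOLD = 650  # cutoff named in the article ("650 or above")

# Hypothetical (city, score) records; not real NEET results.
records = [
    ("City X", 700), ("City X", 640), ("City X", 655), ("City X", 500),
    ("City Y", 660), ("City Y", 300), ("City Y", 200), ("City Y", 100),
    ("City Z", 400), ("City Z", 450),
]


def high_score_share(records, threshold=THRESHOLD):
    """Return {city: percentage of candidates scoring >= threshold}."""
    total = Counter()
    high = Counter()
    for city, score in records:
        total[city] += 1
        if score >= threshold:
            high[city] += 1
    return {city: 100.0 * high[city] / total[city] for city in total}


shares = high_score_share(records)
# Sort cities by share, highest first, as a sorted bar chart would show them.
ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Sorting before plotting is what lets a bar chart of the top cities be read as a ranking.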

Current Visual Encoding:

Chart 1: A scatter chart displaying the percentage of students scoring above 650 marks across different cities.
Fig 1

Table 2: A table listing the top centers with the highest share of candidates scoring above 650 marks.
Fig 2

Problems with Current Encoding:

Scatter Chart:
Cluttered Data Points: The scatter chart is densely populated, making it difficult to distinguish individual data points.
Color Gradient: The color gradient from 0 to 7.48% might not be intuitive for quick interpretation.
Table:
Limited Information: The table lists the top ten centers but does not provide additional context or comparisons.
Lack of Visual Appeal: The table is plain and could benefit from visual enhancements for better readability.

Redesigning the Visualization

Improvement Plan:

Simplify and Clarify: Create clearer, more intuitive charts that highlight key insights without overwhelming the viewer.
Use Effective Visual Elements: Utilize bar charts, heat maps, and annotated visualizations to emphasize important data points.
Enhance Readability: Ensure all visualizations have clear labels, legends, and titles.

Redesigned Visualizations:

Bar Chart: Displaying the top cities with the highest share of students scoring above 650 marks.
Heat Map: Showing the concentration of high scores across different states.
Annotated Visuals: Highlighting the top-performing centers and cities.
Redesigned Bar Chart:

Figure_1

Redesigned Heat Map:

Figure_2

Documentation

Original Story:

Link to the original story: NEET UG 2024: Data reveals top cities for high-scoring candidates

Redesign Documentation:

Bar Chart: The bar chart simplifies the data by focusing on the top cities, making it easier to compare their performance.
Heat Map: The heat map provides a clear visual representation of high score concentrations across states.
Annotations and Highlights: Annotations emphasize key data points, such as the highest-performing city, to draw the viewer's attention.

These redesigned visualizations aim to improve the clarity and storytelling of the data, making it more accessible and easier to interpret for the audience.

@45sajal

45sajal commented Jul 26, 2024

Name: Sajal Dhingra
Roll: 21f2001213

STORY TAKEN FOR REVIEW

Title: A green wealth tax in Budget 2024

Story the publisher is trying to convey

The new government in India is set to present its Budget 2024, addressing critical issues of unemployment and inequality. A proposed solution is a wealth tax-financed Indian Green Deal (IGD) aimed at tackling climate change, inequality, and joblessness. Rising inequality and the carbon footprint of the wealthiest 10% in India have contributed to increased emissions, driven by their consumption of carbon-intensive goods.

The IGD would focus on green energy, infrastructure, and the care economy (health and education), modeled after the 2020 Atmanirbhar package. It proposes spending 10% of GDP over ten years: 5% on infrastructure, 3% on the care economy, and 2% on green energy. This investment could create 38.6 million jobs, representing 8.2% of the labor force.

Funding the IGD would require a wealth tax of approximately 1.7%, which could decrease to 1.3% by 2032 due to the projected rise in wealth of the Indian elite. This approach aims to showcase India as a leader in climate action while addressing socioeconomic disparities.
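The figures quoted in this summary can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated above. The implied labour force is a derived value, not a number given in the article.

```python
# Figures quoted in the summary above.
infrastructure_pct = 5.0     # % of GDP
care_economy_pct = 3.0       # % of GDP
green_energy_pct = 2.0       # % of GDP
jobs_created_million = 38.6
share_of_labour_force_pct = 8.2

# The three components should add up to the stated 10% of GDP.
total_pct = infrastructure_pct + care_economy_pct + green_energy_pct

# If 38.6 million jobs are 8.2% of the labour force, the labour force
# implied by those two figures is jobs / share.
implied_labour_force_million = jobs_created_million / (share_of_labour_force_pct / 100)

print(f"Total spending: {total_pct:.0f}% of GDP")
print(f"Implied labour force: {implied_labour_force_million:.1f} million")
```

The components sum to 10% of GDP, and the job figures imply a labour force of roughly 471 million.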

Data used in the article

Expenditure Data

Type: Quantitative data on spending patterns across different commodities by the Indian elite and average Indians.
Extent: Current spending patterns.
Dimensions: Ratios of expenses on various commodities, differentiated between the Indian elite and average Indians.
Essential: Yes, to show the consumption patterns driving carbon emissions and justify the wealth tax.

Carbon Emission Data:

Type: Quantitative data on per capita carbon footprints.
Extent: Comparative analysis of the top 10% of Indian population vs. an average Indian and a first-world citizen.
Dimensions: Carbon emissions by consumption categories such as housing, industrial goods, transport, and clothing.
Essential: Yes, to link wealth inequality with environmental impact and justify the green aspect of the IGD.

Projected Wealth and Tax Rate Data:

Type: Quantitative projections of wealth growth among the Indian elite and the corresponding declining wealth tax rate.
Extent: Projections up to 2032.
Dimensions: Wealth in million crores and tax rate percentages.
Essential: Yes, to support the feasibility and sustainability of financing the IGD through a wealth tax.

Current encoding visuals

This chart shows the Carbon Emission of the top 10% of Indian population vs. an average Indian and a first-world citizen.
Light Blue colour -> first-world citizen
Dark Blue colour -> Indian elite (top 10%)
Red Colour -> An Average Indian.
img_01_dvd_04

This chart shows the ratio of the expenditure by an Indian elite to an average Indian
img_02_dvd_04

This chart shows projected rise in wealth of Indian elite.
img_03_dvd_04

This chart shows the projected decline in Tax Rate
img_04_dvd_04

Problems with visual encoding

1) The colors used in the chart are similar and not easily distinguishable for all viewers, especially those with color vision deficiencies.

2) The axes are not labeled beyond the general title, which can make interpretation more difficult.

3) The axis labels are missing, which can make it difficult to interpret the chart.

4) The bar chart could benefit from a more visually appealing design with varying colors or patterns.

Redesigning the visuals

Informative Title and Axis Labels: Providing a clear and descriptive title along with appropriately labeled axes ("Expenditure (Million Crores)" for the y-axis and "Year" for the x-axis) enhances the viewer's understanding of what the chart represents.

Gridlines and Data Points: Adding gridlines and marking data points helps in better visualizing the trends and specific values, making the data easier to interpret and analyze.
img_05_dvd_04

Enhanced Clarity and Distinction: The use of distinct colors (light blue for first world citizens, dark blue for elite Indians, and red for average Indians) clearly differentiates the data series, making it easier to interpret the trends and comparisons.

Detailed Annotations and Gridlines: Adding gridlines improves readability and precision, while annotations on data points provide immediate reference values, making the data more comprehensible at a glance.
img_06_dvd_04

@bhumikaxyz

bhumikaxyz commented Jul 26, 2024

About Me

Name: Bhumika Taneja
Roll Number: 21f1006329

Original Article : Diseases with higher burden in Asia and Africa lack research funding

What is the author trying to convey with this story?

The author highlights the significant disparity in research funding and attention between neglected tropical diseases (NTDs) and more prominent diseases like COVID-19, HIV/AIDS, tuberculosis, and malaria. Despite the massive burden these diseases place on impoverished populations in tropical and subtropical regions, they receive substantially less funding and resources. This underfunding perpetuates a cycle of poverty and disease, causing long-term disabilities, social stigma, and economic burdens that hinder development and deter investment in treatments. The article underscores the urgent need for increased funding and attention to NTDs to break this cycle and alleviate the suffering of millions.

Key Points:

  • NTDs affect millions in poor, tropical, and subtropical regions but receive far less funding.
  • COVID-19 received $4.22 billion in research funding in 2022, while NTDs like dengue and chikungunya received only $10-$80 million each.
  • NTDs contribute to poverty through long-term disabilities, social stigma, and economic burdens.
  • Success stories like Guinea worm eradication show what can be achieved with focused efforts and funding.
  • There's a critical need for more resources, research, and development to combat NTDs effectively.

What data is he/she using to tell the story?

The following charts are included in the original Hindu article.

Plot 1: Research Funding by Disease in 2022
Screenshot 2024-07-26 224734

Type of Data:
Categorical Data: Different diseases.
Quantitative Data: Research funding amounts for each disease in 2022.

Extent of the Data:
Temporal Extent: Data for the year 2022.
Financial Extent: Funding amounts ranging from a few million dollars to over $4 billion.

Dimensions of the Data
Diseases: List of diseases receiving research funding.
Funding Amount: The specific amount of research funding allocated to each disease.

Gaps in the Data:
The data only shows funding for 2022 without historical comparison.

Essential Data:
Funding Amounts: Crucial for understanding the level of research investment in each disease.
Disease List: Important to identify which diseases are prioritized in funding.

Plot 2: Research funding for different health technologies from 2007 to 2022.
Screenshot 2024-07-26 224823

Type of Data:
Funding Data: Research funding for different health technologies (vaccines, drugs, biologics, diagnostics & diagnostic platforms, basic research) over the period from 2007 to 2022.
Statistical Data: The amount of funding allocated to various technologies related to disease research.

Extent of the Data:
Temporal Extent: Covers research funding data from 2007 to 2022.
Financial Extent: Funding amounts ranging from $0 to $5 billion.

Dimensions of the Data:
Temporal Dimension: Yearly data points from 2007 to 2022.
Financial Dimension: Funding amounts in billions of dollars.
Technological Dimension: Different categories of research: vaccines, drugs, biologics, diagnostics & diagnostic platforms, basic research.

Gaps in the Data:
Lack of Disease-Specific Funding: The graph does not break down the funding data by specific diseases, making it difficult to see how much is allocated to NTDs versus other diseases.
Lack of Geographical Data: The graph does not show how the funding is distributed geographically, which could be relevant to understanding global research priorities.

Essential Data:
Funding Trends: The trend in funding over time for different technologies is crucial for understanding research priorities and shifts.
Comparison Across Technologies: Showing funding amounts for different research technologies helps highlight disparities and areas of focus.

I will be redesigning the second plot for the purpose of this assignment.

Data Encoding and Potential Improvements

Current Encoding

  • Line Graph: Each technology's funding over time is represented by a separate line.
  • Color Coding: Different colors are used for each type of research (vaccines, drugs, biologics, diagnostics, basic research).

Problems with the Current Encoding

  • Overlapping Lines: The lines overlap in some places, making it difficult to distinguish between them.
  • Color Distinction: Some colors are too similar and might not be easily distinguishable, especially for those with color vision deficiencies.
  • Lack of Annotations: Important events or changes in funding (like the spike in vaccine funding in 2020) are not annotated for better understanding.
  • Scale Issues: The large spike in vaccine funding in 2020 might overshadow smaller but still significant changes in funding for other technologies.

Suggested Improvements

  • Color Choice: Use a color palette that is distinguishable and colorblind-friendly.
  • Annotations: Add annotations for significant events or changes in funding trends.
  • Log Scale: Consider using a log scale for the y-axis if there are large disparities in funding amounts, to better visualize smaller trends.
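To see why a log scale helps when values differ by orders of magnitude, the sketch below compares how far apart two amounts sit on a linear axis and on a log axis. For illustration only, it borrows the per-disease figures quoted in the key points ($4.22 billion for COVID-19, up to $80 million for an individual NTD); the per-technology values behind the second plot are not reproduced here.

```python
import math

covid_funding = 4.22e9  # USD, 2022, quoted in the key points
ntd_high = 80e6         # USD, upper end of the range quoted for one NTD

# On a linear axis, the smaller value reaches only this fraction of the
# height of the larger one.
linear_fraction = ntd_high / covid_funding

# On a log10 axis, the two values are separated by this many decades.
log_gap = math.log10(covid_funding) - math.log10(ntd_high)

print(f"Linear: smaller value is {linear_fraction:.1%} of the larger")
print(f"Log10: values are {log_gap:.2f} decades apart")
```

One caveat: a log axis cannot display zero, since `math.log10(0)` raises a `ValueError`, so any years with no recorded funding would need separate handling.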

Redesigned Plot

image

Plot Details

  1. Single Line Chart: All research funding data (Vaccines, Drugs, Biologics, Diagnostics, Basic Research) are plotted on a single line chart.
  2. Color Coding: Each type of research is represented by a distinct color.
  3. Legends: A legend is provided in the top-left corner to identify the lines.
  4. Annotations: An annotation highlights the funding surge for vaccines in response to COVID-19 in 2020.

Encoding Improvements

  1. Distinct Colors: The plot uses distinguishable colors for each research type, making it easier to differentiate between the lines.
  2. Annotations: Key events, such as the funding surge for vaccines in 2020, are annotated for better context.
  3. Clear Legends: Legends help in identifying which line corresponds to which type of research.

@SURAJARS

Name: Suraj ARS
Roll Number: 21f1005229

Original Article : On unemployment in Indian States

The story from the author's point of view

The article provides an analysis of unemployment in major Indian states, excluding Union Territories, using data from the Periodic Labour Force Survey (PLFS) of 2022-23. It focuses on individuals aged 15 and above and highlights the disparities in unemployment rates across different states. Goa has the highest unemployment rate at almost 10%, followed by other relatively wealthy states like Kerala, Haryana, and Punjab. The analysis reveals that states with a higher proportion of self-employment have lower unemployment rates, and more urbanized states tend to have higher unemployment rates due to fewer informal job opportunities. The link between education and unemployment is also explored, showing that states with a higher percentage of educated individuals, such as graduates, tend to have higher unemployment rates, possibly due to a mismatch between skills and job requirements or because graduates aspire to high-wage jobs that are not available in sufficient numbers.

Key Findings

  1. Goa has the highest unemployment rate at almost 10%.
  2. Other relatively wealthy states like Kerala, Haryana, and Punjab also have high unemployment rates.
  3. States with a higher proportion of self-employment have lower unemployment rates.
  4. More urbanized states tend to have higher unemployment rates due to fewer informal job opportunities.
  5. A link between education and unemployment is observed: states with a higher percentage of educated individuals, such as graduates, tend to have higher unemployment rates.

Charts present in The Hindu article

Chart 1: Unemployment across Indian States
Unemployment across Indian States

Type of Data: Quantitative data on unemployment rates across Indian states in 2022-23.
Extent of Data: The dataset compares unemployment rates across Indian states in 2022-23.
Dimensions of the Data: Indian states and their unemployment rates, in percent.
Gaps in the Data: The article shows only unemployment rates for 2022-23, without historical comparison data.
Essential data: Education funding, number of universities, and policy schemes across Indian states would also need to be known.

Chart 2: Self-Employment vs Unemployment
Self Employment vs Unemployment

Type of Data: Scatter plot data of self-employment vs unemployment.
Extent of Data: The dataset compares self-employment and unemployment in 2022-23.
Dimensions of the Data: The unemployment rate and the share of self-employment.
Gaps in the Data: The article compares self-employment and unemployment in 2022-23 only, without historical comparison data.
Essential data: The breakdown of self-employment into its various categories across Indian states would also need to be known.

Current Encoding

Horizontal Bar Graph: Each state is represented by a bar showing its unemployment rate.
Scatter Plot: Compares self-employment and unemployment, with each dot representing a state.

Problems with the Current Encoding

Horizontal Representation: The states are laid out horizontally, which makes them harder to recognize.
Color Variation: A single base color is used, darker for higher rates and lighter for lower rates, so differences may not be quickly recognizable.
Different Colors: Each state should be given a distinct color in the scatter plot.

Suggested Improvements

Vertical Representation: Laying the states out vertically makes them easier to recognize.
Variety of Gradient Colors: Use a color range that is distinguishable and friendly to viewers with visual impairments.

Redesigned Plot

Major Indian States

Plot Details

Pie Chart: This representation shows the unemployment rates of the top five states, with the remaining states grouped under "Others".
Color Coding: Each Indian state is represented by a distinct color.
Legends: A legend is provided in the top-right corner to identify each section.

@trxpti

trxpti commented Jul 27, 2024

Assignment 4

Name - Tripti Arya
Roll Number: 21f1005935

Link to the Original Article: Nepal’s treacherous skies : With 741 plane crash deaths, country ranks 11 of 207 nations

The Story Behind the chosen article
Authors Vignesh Radhakrishnan and Jasmin Nihalani highlight Nepal's high plane crash fatalities despite low air traffic over the country's airspace. Nepal's mountainous terrain and rapidly changing weather make its airports notoriously dangerous. The recent Saurya Airlines crash, which killed 18 people, brings the total fatalities to 741, ranking Nepal 11th out of 207 nations in fatalities per departure. Since 1996, Nepal has experienced 54 crashes, ranking 33rd globally. Despite being 78th in departures, Nepal's high fatality rate aligns it with countries like Nigeria and Pakistan. The article urges authorities to address these safety concerns to prevent further loss of life.

Key Findings

  1. Nepal has relatively low air traffic but a high number of fatalities from plane crashes.
  2. A recent Saurya Airlines crash resulted in 18 deaths, underscoring the ongoing issue.
  3. Nepal has had a total of 741 plane crash fatalities, ranking 11th out of 207 nations in fatalities per total departures.
  4. The country has experienced 54 plane crashes since 1996, ranking 33rd globally.
  5. The high fatality rate is due to Nepal's mountainous terrain and rapidly changing weather conditions.
  6. Nepal is similar to countries like Nigeria and Pakistan, which have low air traffic but high fatality rates.

Charts provided in the article
Chart 1: Nepal and other countries in terms of Plane crashes

image

Chart 2: Nepal and other countries in terms of Total fatalities due to Plane crashes.

image

Description for Chart 1 and 2:

  1. Each chart is a treemap representing hierarchical data, where each rectangle corresponds to a country.
  2. The data points show the number of plane crashes, or resulting fatalities, recorded so far in each country.
  3. The size of each rectangle indicates the relative number of plane crashes or fatalities. Nepal is shown as a red rectangle to highlight its high fatality count.
  4. In conclusion, the charts give a visual comparison of plane crashes and fatalities across different countries.

Chart 3: Fatalities in crashes against departures by air carriers in different countries

image

Description for the above chart:
This chart is a scatter plot of continuous quantitative data, showing the relationship between the number of fatalities and the number of departures in various countries.

  1. The x-axis represents the number of departures, while the y-axis represents the number of fatalities.
  2. Each dot represents a country, with Nepal highlighted in red (741 fatalities and 637,307 departures).
  3. Both axes use a logarithmic scale to accommodate a wide range of values.
  4. The chart shows that despite lower numbers of departures, some countries, including Nepal, have high fatalities, indicating a disproportionate rate of plane crash deaths relative to air traffic.
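The disproportion described above can be made concrete by turning the two figures quoted for Nepal into a rate. This is a minimal sketch using only those two numbers; whether the article's ranking uses exactly this definition of "fatalities per departure" is an assumption.

```python
# Figures quoted for Nepal in the scatter plot description.
fatalities = 741
departures = 637_307

# Fatalities per 100,000 departures.
rate_per_100k = fatalities / departures * 100_000

# Equivalent view: how many departures there are per fatality.
departures_per_fatality = departures / fatalities

print(f"{rate_per_100k:.1f} fatalities per 100,000 departures")
print(f"about one fatality per {departures_per_fatality:.0f} departures")
```

Plotting a rate like this directly, rather than two raw counts on logarithmic axes, is one way to make the comparison across countries easier to read.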

Chart 4: Number of air crashes and fatalities of each airline in Nepal

WhatsApp Image 2024-07-27 at 4 20 43 PM

Description for the above chart:
This chart is a bar graph of quantitative and categorical data, showing the number of plane crashes and fatalities for various airlines in Nepal.

  1. The y-axis lists different Nepalese airlines.
  2. Each bar is divided into two segments: red representing the number of crashes and blue representing the number of fatalities.
  3. The chart allows for a visual comparison of crashes and fatalities among different airlines.
  4. Yeti Airlines and Tara Air have notably high numbers of crashes and fatalities compared to other airlines, indicating significant safety issues.

Problems and Improvements

  1. Treemap Issues:
    Clutter: Small rectangles make it hard to compare lower values.
    Detail Loss: Small countries or those with fewer crashes/fatalities might be invisible due to color encoding.

  2. Scatter Plot Issues:
    Color Encoding: All points other than the highlighted one share similar colors, which makes them hard to tell apart.
    Overlapping Dots: Similar values lead to indistinguishable data points.
    Logarithmic Scale: Confusing at first glance for viewers unfamiliar with it.

  3. Bar Graph Issues:
    Segment Confusion: Red and blue segments may lack sufficient color contrast.
    Comparison Difficulty: Segmented bars complicate value comparison.

Improvements:

  1. Treemap: Add a color gradient and tooltips for better distinction and visibility.
  2. Scatter Plot: Apply transparency and interactive elements to reduce overlap and enhance exploration.
  3. Bar Graph: Enhance the color contrast between segments, sort airlines by fatalities, and add labels and annotations to make the chart clearer.

Redesign for given charts

1. Replacing Treemap(showing Fatalities data due to plane crashes) with Ordered Bar chart
image
Instead of using a treemap, we can use an ordered bar chart to present the underlying data more effectively. This chart displays only the top 25 countries with the highest number of fatalities in plane crashes over the past few years. The ordered bar chart clearly shows the rankings and removes less significant data points from the visual, making it easier to interpret. The enhanced color gradient of the bars also makes the visual more eye-catching.

2. Bar graph showing Fatalities over plane crashes
image
In my view, this bar graph is a better choice than the one in the article because it clearly illustrates the ratio of fatalities to airline crashes that occurred in Nepal's airspace over the past few years. The graph provides a clear and concise message, making it easier to understand the extent of the fatalities in relation to the crashes.

Conclusion:
The above article is crucial for raising awareness among authorities and stakeholders about Nepal's air accidents. Equally important, however, is presenting data visuals that stand out, create a more significant impact, and help drive improvements in aviation safety. With the help of effective data visualization, we can highlight critical issues and trends, encouraging action to enhance safety measures and prevent future tragedies.

@Kirupa-Krishan

Name: Kirupa Krishan G

Roll_No: 21f1006352

Story Overview:

The article discusses the significant increase in the cost of a home-cooked vegetarian meal (thali) in Maharashtra over the last five years, compared with the relatively modest rise in the earnings of salaried workers and daily-wage labourers. The key point is the growing disparity between food costs and income, highlighting the strain on households, especially those that depend on daily wages.

Source : Link

Data Used:

Type of Data:

  • Commodity Prices: Prices of ingredients for a vegetarian thali (e.g., rice, dal, vegetables).
  • Income Data: Average daily and monthly wages and salaries in Maharashtra.

Extent and Dimensions:

  • Temporal Extent: 2019, 2023, and 2024 data.
  • Geographical Extent: Maharashtra, India.
  • Metrics: Prices of individual ingredients, total cost of two veg thalis, daily and monthly wages.

Gaps in the Data:

  • Regional Specificity: Only Maharashtra data is used, which might not represent other regions.
  • Non-Vegetarian Meals: Data is missing for non-vegetarian meals.
  • Inflation Adjustment: No explicit adjustment for inflation.
  • Missing Values: The data for the years 2020, 2021, 2022, 2023 could have been added.

Data Details:

  • Essential Data: Prices of thali ingredients, average wages and salaries, cost comparison over time.
  • Irrelevant Data: Any non-food-related economic indicators not directly tied to the analysis.

Data Encoding:

  • Tables and Charts: Data is presented in tables showing the cost of ingredients, total meal costs, wages, and the percentage of income spent on food.
  • Narrative: Explains the methodology and findings.

Problems and Improvements:

Problems:

  • Visualization: The data is presented solely in tables, which does not effectively illustrate the extent of price increases and wage differences over the past five years.

Improvements:

  • Visualization: Incorporate charts or graphs to visually represent the changes in commodity prices and wages, highlighting the percentage increase and the relative differences over time. This can provide a clearer and more impactful understanding of the data trends.

Chart 1:

table1

Chart 2:

table2

Chart 3:

table3

Chart 4:

table4

Redesigned Charts:

Visualization of Price and Wage Changes(Interactive chart)

View the interactive chart on Flourish

Redesign Chart 1:

table_re1

Redesign Chart 2:

table_re2

Redesign Chart 3:

table_re3

Description of the Improved Visualizations:

  1. Scatter Plot: Cost of Commodities for 2 Thalis

    • Title: The table lists the commodities required to prepare two thalis and their retail prices in ₹.
    • Y-Axis Title: Item
    • X-Axis Title: Cost/Kg in Rupees
    • Legends:
      • Blue: 5 years ago
      • Purple: 1 year ago
      • Pink: March 2024
  2. Scatter Plot: Percentage Increase in Commodity Prices

    • Title: The table lists the commodities required to prepare two thalis and their retail prices in ₹.
    • Y-Axis Title: Item
    • X-Axis Title: Cost/Kg in Rupees
    • Legends:
      • Yellow: Increase from 5 years to 1 year
      • Green: Increase from 1 year to March 2024
  3. Line Graph: Average Monthly Salary/Wage vs. Cost of 2 Thali Every Day Per Month

    • Title: Average Monthly Salary/Wage vs Cost of 2 Thali Every Day Per Month
    • Y-Axis Title: Earnings/Cost in Rupees
    • X-Axis Title: Time Period (5 years ago, 1 year ago, As on March 2024)
    • Legends:
      • Blue Line: Average salary earnings of a person during the preceding calendar month from regular wage/salaried employment
      • Red Line: Average wage earnings of a person per month from casual labor
      • Purple Line: Cost of making two thalis every day for a month
    • Annotations: Highlighted points showing the percentage of income allocated for food costs.

Redesign Process:

To redesign the visualization, I first identified the key data points: the changes in the cost of a vegetarian thali and average wages over five years. I then created line graphs to depict these trends, making the changes more visually accessible. To highlight the percentage increases in commodity prices, I used bar graphs, which facilitated easier comparison of relative changes. Annotations and labels were added for clarity, providing immediate context. Consistent formatting, including units, legends, and labels, was maintained across all visual elements to ensure readability and enhance overall comprehension.
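The percentage increases mentioned above can be computed with a short script before charting. The following is only a minimal sketch: the item names, the prices, and the helper names `pct_increase` and `price_changes` are hypothetical placeholders, not figures or code from the article.

```python
# Hypothetical prices in rupees per kg (placeholders, NOT the article's figures).
prices = {
    "rice": {"5_years_ago": 30.0, "1_year_ago": 36.0, "march_2024": 45.0},
    "tur_dal": {"5_years_ago": 80.0, "1_year_ago": 100.0, "march_2024": 150.0},
}


def pct_increase(old: float, new: float) -> float:
    """Return the percentage change from `old` to `new`."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


def price_changes(table: dict) -> dict:
    """Compute the two period-over-period increases for every item."""
    changes = {}
    for item, p in table.items():
        changes[item] = {
            "5y_to_1y": round(pct_increase(p["5_years_ago"], p["1_year_ago"]), 1),
            "1y_to_2024": round(pct_increase(p["1_year_ago"], p["march_2024"]), 1),
        }
    return changes


changes = price_changes(prices)
for item, c in changes.items():
    print(item, c)
```

Keeping the two periods as separate series matches the two legend entries of the percentage-increase chart (5 years to 1 year, and 1 year to March 2024).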

@prashantjnvu

prashantjnvu commented Jul 28, 2024

Name: Prashant Sharma
Roll Number: 21F1004586

Title: Which topics are India’s researchers publishing papers on?

Data Source: https://www.thehindu.com/data/which-topics-are-indias-researchers-publishing-papers-on/article68410121.ece

1: Story the Author is Trying to Tell:

The author analyzes and compares the research focus of scientists from different countries, particularly India, the US and China, over the last 20 years and the last five years. The story highlights the dominant research focus areas, such as Health, AI, Clean/Green energy, Astronomy, Network and Communication, Social wellbeing and Nanotechnology, and discusses how these focus areas reflect the scientific and technological priorities of the countries involved. The analysis aims to show trends and provide insights into how different nations allocate their research resources and how these decisions align with global scientific challenges and opportunities.

2: Data Used to Tell the Story:
Type of Data:

  • The data consists of research publication counts from the Web of Science database.

  • It includes quantitative data on the number of papers published on various research topics. Numerical (Interval) data.

      Extent of the Data:
    
  • The data covers a span of 20 years and also focuses specifically on the last five years. Numerical (Interval) data.

     Dimensions of the Data:
    
  • Research topics (e.g., coronavirus, deep learning, photocatalysis, nanotechnology). Nominal Data

  • Time periods (last 20 years, last five years). Numerical (Interval) data.

  • Geographic distribution (India, U.S., China). Nominal Data

      Gaps in the Data:
    
  • From the point of view of the intent of the story, the data lacks geographical extent (a larger number of countries) and segmentation of years (multiple 5-year segments to understand long-term and short-term research commitments).
    Essential Data:
  • Number of publications on key research topics

  • Trends over time

  • Comparative data across different countries

      Irrelevant Data:
    
  • Specific journal names or publication venues unless they indicate trends in quality or focus areas.

3: Encoding and Problems:
Encoding:

  • The data is encoded in textual descriptions and charts ranking the research topics by publication counts for respective countries.

  • The charts visualize the top five research topics in selected countries over specific time periods – 5 years and last 2 decades.

     Problems (Data Visualization):
    
  • The charts might not include enough context or explanations for the observed trends.

  • No comparative data visualization to understand the –

  • Variations in research focus of a country from research area perspective and research (broad) category perspective.

  • Variations or transitions in research focus of each country.

  • Comparative standing of research output in common research categories between the countries.

     Improvements (Data Visualization):
     Adding a graph to –
    
  • Visualize the comparisons of overall research output of countries in the last 5 years and last 2 decades per category.

  • Visualize country-wise contributions in research output per research category of top 5 research areas in last 5 years and last 2 decades.

  • Visualize the comparisons of overall research output per top 5 research area in last 5 years and last 2 decades.

  • Visualize the qualitative comparisons of overall research publication per category per country in the last 5 years and last 2 decades as heatmap.

  • Visualize the publication contributions as % contribution in the research categories of top 5 research areas in the last 2 decades and in the last 5 years.
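The percentage-contribution view proposed in the last item can be derived from raw publication counts before plotting. This is a rough sketch under stated assumptions: the counts below are made-up placeholders (not Web of Science figures), and `pct_share` is a hypothetical helper name.

```python
# Hypothetical publication counts per research category (placeholders only).
counts = {
    "India": {"Health": 200, "AI": 300, "Clean/Green energy": 500},
    "China": {"Health": 1000, "AI": 1500, "Clean/Green energy": 2500},
}


def pct_share(by_category: dict) -> dict:
    """Convert raw counts into each category's percentage of the total."""
    total = sum(by_category.values())
    if total == 0:
        raise ValueError("total count must be non-zero")
    return {cat: round(n / total * 100.0, 1) for cat, n in by_category.items()}


shares = {country: pct_share(cats) for country, cats in counts.items()}
for country, s in shares.items():
    print(country, s)
```

Percentages make the countries comparable despite very different absolute output, which is why the same table can feed both the heatmap and the % contribution chart.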

Existing Graphs:
Chart 1 | The chart ranks the five topics under which the highest number of papers were published (2019- 2023) in select nations.

Last_5

Chart 2 | The chart ranks the five topics under which the highest number of papers were published (2004- 2023) in select nations.

Last_10

Graph Additions:

To make the intended story more impactful and easier to understand from a comparison point of view:

(1) Visualize the comparisons of overall research output of countries in the last 5 years and last 2 decades per category.
This visualization will help critically compare –

  • Which country is contributing more, historically and recently? (India contributes the least among the US, China and India.)
  • How does the overall research paper contribution of the last 5 years compare with the contribution of the last 2 decades? (China is producing far more research papers, and faster than its earlier rate.)

1

(2) Visualize country-wise contributions in research output per research category of top 5 research areas in last 5 years and last 2 decades.
This visualization will help critically compare –

  • Is a particular country’s focus narrow or diversified? (In the last 5 years, it is diversified for India.)
  • Comparative (country-wise) research output per country and category.

2

(3) Visualize the comparisons of overall research output per top 5 research area in last 5 years and last 2 decades.
This visualization will help in comparing critically –

  • Which research topics survived from the years before 2019, and how do they compare with the last 5 years of publications, to gauge any change in focus? AI, Clean/Green Energy and Health (overall focus areas have diversified in the last 5 years) and Nanotechnology (overall focus on nanotechnology has narrowed in the last 5 years).
  • Which research topics lost focus? – Astronomy, Networks and Communication, and Social wellbeing.

3

(4) Visualize the qualitative comparisons of overall research publication per category per country in the last 5 years and last 2 decades as heatmap.
This visualization will help in comparing critically –

  • Which country is publishing at what scale per research category? (Clean/Green energy was a research focus for China over the last two decades but has received less focus recently; Health has always been a focus area for the US.)
  • Which research areas have fallen out of favor recently or have come into focus recently? (In the last 2 decades AI was not among the top 5 research focus areas, but in the last 5 years it is for the US and India.)
  • How are the countries stacked in terms of publishing count (qualitatively) for the common top 5 research areas? (India’s publishing count is the lowest, below the US and China, in Health, AI, Clean/Green energy and Nanotechnology.)

4

(5) Visualize the publication contributions as % contribution in the research categories of top 5 research areas in the last 2 decades and in the last 5 years.
This visualization will help in comparing critically –

  • Which research area is favored overall? (AI publications in the last 5 years have outshone the 2-decade count; research papers in Health have been published at a higher rate; Nanotechnology publications have been rather low.)
  • Which research areas have fallen out of favor recently? (In the last 5 years, Astronomy, Networks and Communication, and Social Wellbeing have not been favored in publication count.)

5

@irshad747

Name : Irshad Sareshwala
Roll Number : 21f1004835

Original Story :2024 polls: How people in high and low income areas voted in Chennai’s Mylapore, T Nagar and other areas(https://www.thehindu.com/data/2024-polls-how-people-in-high-and-low-income-areas-voted-in-chennais-mylapore-t-nagar-and-other-areas/article68427083.ece)

  1. What is the story the author is trying to tell?

The original story aims to analyze the voting patterns in Chennai's Mylapore, T Nagar, and other areas based on income levels. It shows that the DMK has a stronghold among urban poor voters, while the BJP has better support among wealthier voters.

  2. What data is being used to tell the story?

The data used in the story includes:

Polling station data listing areas and polling stations.
"Form-20" data from the Election Commission showing party-wise votes polled in each polling station.
Guideline values of streets/areas as a proxy for wealth/income.
Details of the data:

Type of data: Quantitative (votes, guideline values)
Extent of the data: Data from the 2024 Lok Sabha elections for Chennai's three Lok Sabha seats.
Dimensions of the data: Vote shares by party, guideline values by street/area.
Gaps in the data: Potential inaccuracies in using guideline values as a sole indicator of income.
Essential data: Vote shares, polling station details, guideline values.
Irrelevant data: Additional demographic details not directly related to voting patterns.

  3. How is it encoded, what problems are with it, and how have you attempted to improve it?

Original Encoding:

The original visualization uses scatter plots with red and blue dots representing DMK and BJP vote shares, respectively, across different streets/areas.
Streets/areas are arranged based on their guideline values from high to low.
Problems:

The scatter plot may not clearly show the relationship between income and voting patterns.
The use of only two colors may not be sufficient to differentiate between multiple data points in a small area.
Lack of interactivity to explore specific data points in detail.
Improvements:

Use a more intuitive visualization method, such as a bar chart or heatmap, to show the correlation between income levels and vote shares.
Add interactivity to the visualization to allow users to hover over data points for more details.
Incorporate additional datasets, such as demographic information, to provide more context to the voting patterns.

Redesigning the visualization:
Screenshot (455)
Final Visualization:
The redesigned visualization uses a dual-axis bar chart to show vote shares by area and guideline values. It includes interactive elements to provide detailed information about each data point.

Conclusion:
The redesigned visualization effectively conveys the relationship between income levels and voting patterns in Chennai, enhancing clarity and interactivity compared to the original scatter plot.

@21f1006304ds

21f1006304ds commented Jul 28, 2024

Name : Rajesh Saha
Roll No. 21F1006304

Subject : A green wealth tax in Budget 2024

Article Link :

A green wealth tax in Budget 2024

Author's story:

The author proposes a green wealth tax on the Indian elite (the top 10% of the Indian population in terms of CO2 emission). The author also shows that the tax rate could decline over time while still meeting the target. To support this proposal, the author presents 3 cases: 1) CO2 emission by the Indian elite compared with developed countries and the average Indian population, 2) how much money India needs for the IGD (Indian Green Deal), 3) how this can be achieved.

Data used by the author:

The author has used numerical data (in USD), categorical data (Elite Indians, Average Indians), Ratio (CO2 emission). The source of the data was not mentioned in the article.

Data Collection for redesign:

Data was collected from the article linked above and its visualizations, manually, by hovering the mouse over individual data points.

Redesign attempt

For the redesign, each of the visualizations above is revisited and then modified using the same data.

1. CO2 emission of Indian elites compared with developed countries and the average Indian population

The author has divided this into 2 parts: A) comparison of CO2 emission among developed countries, the Indian elite and the average Indian, B) the sector-wise comparison of CO2 emission between the Indian elite and the average Indian population.

1A. Comparison of CO2 emission among developed countries, the Indian elite and the average Indian
The author has used a line chart, as below.
image

In the above chart, the story comes through clearly: the Indian elite is catching up with emissions in developed countries, and its emissions are higher than those of the average Indian. So, I have redesigned it along the same lines.
I have used a line chart with more grid lines and properly placed legends, and added a selectable search box so that a viewer can hide one or more lines and focus on only a few. I have also chosen the colors deliberately: red for the Indian elite's CO2 emissions, which we need to tackle; green for the average Indian, whose emissions are not rising as fast; and blue for developed countries, whose emissions are declining (e.g., blue is the standard color for Electric Vehicle logos).
image

1B. The sector-wise comparison of CO2 emission between Indian elite and average Indian population
Here, the author has shown the ratio of CO2 emission by average Indian and Indian elites.
image

There are 2 problems here: 1) the author has chosen a different denominator for the ratio in different sectors, 2) the relative comparison is not clearly shown. As a result, the sector-wise comparison does not come out clearly (e.g., it looks like Housing has the maximum contribution), and neither does the comparison between the average Indian and the elite.

I have redesigned this as a column chart after normalizing the base to 1, i.e., every ratio is shown as 1:x, where 1 is the emission by the average Indian and x is the emission by the elite Indian.
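Rescaling every sector's ratio to a common base of 1 is a one-line division per sector. The sketch below is illustrative only: the (average, elite) pairs are invented placeholders rather than the article's values, "Transport" is a hypothetical sector name, and `normalise_ratio` is a helper introduced here.

```python
# Hypothetical (average Indian, elite Indian) emission pairs with mixed bases.
# These numbers are placeholders, NOT values from the article.
ratios = {
    "Housing": (2.0, 10.0),
    "Transport": (1.0, 8.0),
    "Health and Education": (5.0, 60.0),
}


def normalise_ratio(average: float, elite: float) -> float:
    """Rescale average:elite to 1:x and return x."""
    if average <= 0:
        raise ValueError("average emission must be positive")
    return elite / average


normalised = {sector: normalise_ratio(a, e) for sector, (a, e) in ratios.items()}

# Sort sectors by x so the column chart can be drawn in descending order.
ranked = sorted(normalised, key=normalised.get, reverse=True)
for sector in ranked:
    print(f"{sector}: 1:{normalised[sector]:g}")
```

With a shared denominator, the column heights become directly comparable across sectors, which the mixed-base ratios did not allow.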

image

This redesigned chart shows that the "Health and Education" sector has the highest ratio of elite Indian emissions to average Indian emissions.

2. How much money India needs for IGD

The author has presented this in 2 different donut charts showing the expected investment and the employment created. The problems with this design are: 1) one has to refer to 2 different charts to correlate the two, that is, no correlation is apparent in the charts themselves, 2) the colors could have been chosen better.

image

image

To redesign this, I have chosen a bubble chart, where the investment is shown on the X-axis and the jobs/employment created on the Y-axis. More relevant colors have also been chosen, and the size of each circle gives an idea of the employment created.

image

3. How can this be achieved

The author has projected the expenditure by Indian elites between 2023 and 2032 and shown it in a line chart. In another line chart, the author has shown a declining rate of the proposed wealth tax for the same period.
Here also, the author has used 2 different charts, and to correlate them one needs to compare the charts manually. Also, they do not show the relation with the money collected as tax.

image

image

To redesign this, I have created a bubble chart with dots of varying size and color.
image

This single chart shows: 1) the declining tax rates from 2023 to 2032, 2) the increasing expenditure by Indian elites, through the increasing bubble sizes, 3) the increasing money collected, through the color of the dots (red to green).

The popup at each data point shows the year, tax rate, expenditure, and money collected.

So, with this proposed concluding chart we can show that the wealth tax is justified and can be reduced over the years. Also, despite the reduced tax rate, the money collected will increase, as the expenditure by the Indian elite will also increase.

Note:

The visualizations were created using Flourish.
They are available at the following links.

https://public.flourish.studio/visualisation/18879865/
https://public.flourish.studio/visualisation/18883275/
https://public.flourish.studio/visualisation/18880429/
https://public.flourish.studio/visualisation/18882520/

@Arvind-Gunasekaran

Arvind-Gunasekaran commented Jul 28, 2024

Name: Arvind Gunasekaran
Roll No: 21f1001014
Email: 21f1001014@ds.study.iitm.ac.in

Assignment 4 - REDESIGNING DATA STORY

ARTICLE: India no longer has more losses than wins in Test cricket

(https://www.thehindu.com/data/india-no-longer-has-more-losses-than-wins-in-test-cricket-data/article67945758.ece)

I. Story the Author is Trying to Tell
The author illustrates the evolution and improvement of the Indian cricket team's performance in Test matches over time. The main points include:

  1. Overall Performance Improvement: India's Test match performance has significantly improved, especially in the last few decades, making them a competitive team both at home and away.

  2. Home vs. Away Performance: India has historically been stronger at home, and dramatically improved its away performance in the 2000s.

  3. Key Milestones: Specific periods marked significant changes in India's performance, such as the dominance at home in the 1990s and the improved away performance in the 2000s and beyond.

  4. Current Standing: India’s current win-loss ratio is 1.00, marking a balance between wins and losses in Test cricket.

II. Data Used to Tell the Story

  1. Win-Loss Ratios of Different Teams
    The data on win-loss ratios of different teams is fundamentally comparative, offering a statistical look at how various international teams have fared in Test match cricket.
    image
  • Extent: It encompasses the entirety of these teams' Test match histories, providing an extensive overview of their performance records. Key dimensions of this data include the number of matches played, wins, losses, ties, draws, and the resulting win/loss ratios.

  • Gaps: While this data is rich in historical breadth, it lacks depth in terms of explaining the underlying factors contributing to these performance metrics. It does not, for instance, delve into the contextual elements such as home ground advantages, player statistics, or changes in team composition that might explain why some teams have better records than others.

  • How Data is Encoded and Problems with It: The win-loss ratios of different teams are presented in a table format, which, while informative, poses significant challenges in visual comprehension. Tables are excellent for detailed numerical data but can be cumbersome when it comes to visualizing trends and making quick comparisons.

  • Improvements: Here is a redesigned representation of the win-loss ratios of Test cricket teams using a horizontal bar chart. This visualization makes it easier to compare the win/loss ratios of different teams at a glance.
    win-loss ratios

  2. Cumulative Wins, Losses, Draws for India
    This dataset tracks India’s cumulative wins, losses, and draws over a substantial timeline, from the nation's first Test match in 1932 up to 2024.
    image
  • Extent: As a time series dataset, it charts the evolution of India's cricket performance across nearly a century. The primary dimensions are the cumulative counts of wins, losses, and draws, offering a longitudinal view of performance trends.

  • Gaps: A notable gap in this data presentation is its lack of periodic segmentation, which could provide insights into specific eras or phases of performance improvement or decline. Without breaking down the timeline into distinct periods, it is challenging to pinpoint the exact phases of transformation in India’s cricketing history.

  • How Data is Encoded and Problems with It: India’s cumulative wins, losses, and draws are encoded in a line chart, which is generally effective for showing trends over time. However, the overlapping lines representing wins, losses, and draws can make it difficult to distinguish between them, especially as the numbers grow and the lines converge or diverge.

  • Improvements: Enhancing this chart with distinct line styles—such as using different colors, dashed or dotted lines—could help in differentiating the categories more clearly, thereby improving readability and comprehension.
    image

  3. Rolling Averages of Win/Loss/Draw Percentages
    The rolling averages approach uses sets of 83 Tests to smooth out short-term fluctuations and reveal longer-term trends in win, loss, and draw percentages.
    image
  • Extent: This method covers the entire span of India's Test cricket history, from the first Test in 1932 to the 581st Test in 2024. By calculating the average outcomes for these rolling sets, the analysis aims to provide a more stable view of performance trends.

  • Gaps: However, this smoothing technique can obscure short-term variations and fluctuations, which might be critical for understanding specific transitions or pivotal moments in the team’s performance. The granularity of the rolling average might mask significant events or periods where rapid changes occurred.

  • How Data is Encoded and Problems with It: The rolling averages are depicted through a line chart with multiple lines representing different performance metrics. While this method provides a comprehensive view of the trends, the chart can become cluttered and difficult to interpret, especially with multiple overlapping lines.

  • Improvements: Using different line styles or markers can enhance clarity, making it easier to follow each performance metric separately. Additionally, segmenting the chart into smaller periods or using interactive features could help in isolating specific trends and making the data more digestible.
    image

  4. Home vs. Away Performance
    This comparative dataset presents a detailed look at India's Test cricket performance both at home and away over different decades, spanning from the pre-1990s era to the 2020s.
  • Extent: It captures the number of wins, losses, and win/loss ratios for both home and away matches, providing a dual perspective on how the team has performed in different environments.

  • Gaps: One of the gaps in this data is its lack of context regarding the reasons behind the variations in performance. For instance, changes in team strategy, player development, coaching, and conditions of play are not explored, which are essential for a comprehensive understanding of the shifts in win/loss ratios across different periods.

  • How Data is Encoded and Problems with It: The home versus away performance data is currently presented in a table format. While tables are straightforward for presenting exact numbers, they lack visual appeal and can be difficult to interpret quickly.

  • Improvements: The data is now represented using bar charts for wins and losses and a line chart for win/loss ratios, providing a clearer comparison between home and away performances across different decades.
    home and away winslosses
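The rolling win/loss/draw percentages described under "Rolling Averages of Win/Loss/Draw Percentages" can be reproduced with a sliding window over the match results. This is a minimal sketch, assuming the windows overlap and move one Test at a time (the article's exact windowing is not confirmed here); the result sequence is made up, and `rolling_percentages` is a hypothetical helper. The article uses a window of 83 Tests; the demo uses 3 to stay readable.

```python
from collections import Counter, deque


def rolling_percentages(results, window):
    """Return win/loss/draw percentages for every full sliding window.

    `results` is a sequence of 'W', 'L' or 'D' in chronological order.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # the last `window` results
    counts = Counter()
    out = []
    for r in results:
        if len(recent) == window:
            counts[recent[0]] -= 1  # the oldest result is about to drop out
        recent.append(r)
        counts[r] += 1
        if len(recent) == window:
            out.append({k: 100.0 * counts[k] / window for k in "WLD"})
    return out


# Made-up sequence of results, for illustration only.
demo = rolling_percentages(list("WWLDLW"), window=3)
for i, row in enumerate(demo, start=1):
    print(i, {k: round(v, 1) for k, v in row.items()})
```

For the real series, the same function would be called with `window=83` on the chronological list of India's Test results.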

Conclusion: These in-depth observations, analyses and improvements contribute to a comprehensive redesign of the data story on the performance of The Indian Cricket Team in Test Cricket through history.

Thanks,
Arvind Gunasekaran
21f1001014

@srinivesh

srinivesh commented Jul 28, 2024

Name: S R Srinivasan
Roll No: 21f1002966
Email: 21f1002966@ds.study.iitm.ac.in

By-polls: an indication of a new anti-incumbency

Original article: https://www.thehindu.com/data/by-polls-an-indication-of-a-new-anti-incumbency/article68413953.ece

What is the story the author is trying to tell?

• Soon after the results of the Lok Sabha (LS) elections were announced, bypolls to 13 Assembly Seats (AS) were held across seven states
• In 12 of the 13 seats, the contest was primarily between the NDA and I.N.D.I.A blocs; in the seat in Bihar, an independent candidate won
• The author takes the position that the change in vote share between the two elections is an indication of a new anti-incumbency
• He uses a simple analysis to derive this data
It is said that in politics and economics, it is possible to write the conclusions first and then use data to support them. My analysis provides a critique of the data story from this context.

What is the need for the analysis?

• The diversity of India poses a challenge in conducting quantitative and qualitative analysis on the people’s perception of the policies of the government
• There are frequent elections, almost one every 6 months. The results of the elections are often used as proxy data

Understanding the data that powers the story

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

Type of Data:

Constituency-wise data published by ECI

Extent of Data:

The data spans a few months in 2024, covering the LS elections and the AS bye-elections. The geographical extent is poor, since only 13 AS seats are considered.

Dimensions of the Data:

Each basic row represents the performance of an alliance in the constituency. Most of the features are categorical, with the vote share features being numeric

Encoding of the Data:

Due to the limited number of features, the data is presented in the tabular form, without any visual encoding.

Essential Data:

The essential data includes the State (for easier reading), Name of AS constituency, Party (to distinguish between alliance constituents), Vote share in the two elections being considered. The derived data is the change in vote share percentage between the two elections. With the table itself being small, no data is irrelevant.
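The derived feature, the change in vote share between the two elections, is a simple difference expressed in percentage points. The sketch below is illustrative only: the constituency names and vote shares are invented placeholders, not ECI data, and `with_change` is a hypothetical helper.

```python
# Hypothetical rows (placeholders, NOT the ECI figures used in the article).
rows = [
    {"state": "State X", "constituency": "Seat A", "bloc": "NDA",
     "ls_2024": 48.0, "bypoll": 41.5},
    {"state": "State X", "constituency": "Seat A", "bloc": "I.N.D.I.A",
     "ls_2024": 44.0, "bypoll": 52.5},
]


def with_change(table):
    """Add the bypoll-minus-LS vote share change, in percentage points."""
    return [
        {**row, "change_pp": round(row["bypoll"] - row["ls_2024"], 1)}
        for row in table
    ]


derived = with_change(rows)
for row in derived:
    print(row["constituency"], row["bloc"], f'{row["change_pp"]:+.1f} pp')
```

Reporting the difference in percentage points (rather than as a percentage of the earlier share) keeps the figure directly comparable across constituencies with very different starting shares.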

How is it encoded, what problems are with it, and how have you attempted to improve it?

Gaps in the Data:

This is discussed further in later sections. The main gap is the limited extent of the data. Extending it to the previous LS and AS elections would provide a more representative analysis.

Table 1 | The table shows the NDA parties’ vote share in the 2024 Lok Sabha elections and the Assembly elections.

image

Source of image: https://www.thehindu.com/data/68414003-Chart-1-bypolls.svg

Table 2 | The table shows the INDIA parties’ vote share in the 2024 Lok Sabha elections and the Assembly elections.

image

Source of image: https://www.thehindu.com/data/68414006-Chart-2-bypolls.svg

Conclusions by the Author

Based on the tables, the author concludes that “the overall trend that emerges from the bypoll results in 13 Assembly seats indicates a sharp decline in the BJP/NDA’s vote shares from the Lok Sabha polls held less than two months ago, as well as improvements for the INDIA parties.”

He further provides the following subjective conclusions (as quoted).


• Prime Minister Narendra Modi needs to introspect whether his government’s hubristic response to the 2024 Lok Sabha verdict, which resulted in a net loss of 63 seats for the BJP, is leading to the alienation of a significant section of the electorate.
• Political arrogance reflected in the continued persecution and vilification of the parliamentary opposition and wanton violation of human rights
• Pompous exaggerations of vacuous economic achievements and a deceptive denial of growing economic hardship and disparities; and
• Manipulative social re-engineering which undermines social justice and solidarity have become the hallmarks of the Modi regime.

“The electorate seems to be losing its patience.”

Potential improvements using external data source

It is well known that the Indian electorate makes different choices between LS and AS elections. As an example, the state of Odisha holds simultaneous elections and the vote share of the parties has been different. To put this in a graphic way, the same elector pushes one button with her left hand (for the LS election) and a different button with her right hand (for the AS election). It is beyond the scope of this assignment to list the reasons for this. It suffices to say that the author himself acknowledges this fact in a previous article analysing the AS elections in 2023.

For data analysis, it is easy to see that extending the series to more LS and AS elections would provide a more complete picture. I did this by adding the vote share from the previous Assembly elections, which were held in different years. Due to lack of time, I could not add AS-wise results from the 2019 LS elections; this data is often found on individual state Election Commission pages.

I encoded the data as a grouped bar chart, with AS constituencies on the Y-axis. It is easy to see that the trend is quite unclear over the 3 elections. The data is small enough to be encoded as a simple table; I chose a more visual encoding.

3-election vote share data for NDA Bloc

https://public.flourish.studio/visualisation/18884689/
Vote Share across election for NDA Bloc

3-election vote share data for I.N.D.I.A Bloc

https://public.flourish.studio/visualisation/18884884/

3-election vote share for I N D I A Bloc

Partial conclusions from extended data

  • Comparing previous AS vote share with bypoll vote share does not give a clear trend
  • Many of the bypolls were necessitated by defections and resignations, so even a comparison of previous AS and bypoll vote share is inadequate
  • Some states had a change in alliances; in Himachal, the previous AS elections for the seats were won by independents
  • And there are many more reasons

No clear conclusion is possible even from the extended data. Even if the improvements in the next section are addressed, the limited geographical extent makes it impossible to extrapolate a country-wide, or even a state-wide, trend from the data. In particular, there is no data to support a 'new anti-incumbency' as claimed in the title of the Data Point analysis.

Further improvements in the analysis

  • Add the capability to use the state as a group
  • Add popup panels/text to explain the analysis for each AS constituency
  • Possibly visualize the analysis
  • Include the data from 2019 LS and possibly previous AS elections

@Souvikx2

Name : Souvik Bhattacharjee
Roll number : 21f1003742
Link to Story: https://www.thehindu.com/data/neet-ug-2024-data-reveals-top-cities-for-high-scoring-candidates-crucial-for-government-medical-college-admissions/article68441411.ece

Main story:
The story examines how candidates who scored 650 and above in NEET 2024, the common medical entrance examination, are distributed among Indian cities. The charts show that a few cities account for the majority of the candidates who excelled in the examination. Students and parents have also used this distribution to argue that unfair means may have been used in the exam.

Data used:
Type of Data: Quantitative data.
Extent of data: Scores of all NEET 2024 candidates across India.
Dimension of data: Candidate cities, scores, test centers and states.
Gaps in data: No details about retest takers, candidates who did not attend coaching, gender or income brackets.
Relevance: The data includes cities and scores, which can be used to highlight the concentration of 650+ achievers.

Current Visual Representation:

image

image

Critique:
The colour coding used for the graph fades into transparency, which makes it harder to read. It is also not clear, until the reader hovers over it, that each dot in front of a state is a city. The chosen visual also fails to convey the importance of the story.
The visual also contains a lot of redundant information. If the author's purpose was to highlight the cities with the most candidates scoring 650+, it should have been limited to the top cities only.

Redesign:
A donut chart containing only the top cities is used. This reduces the redundant information in the visual and highlights the cities that contribute most directly to the number of toppers.
image
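A minimal sketch of the filtering step behind such a donut chart is given below: rank the cities by the number of 650+ scorers, keep the top few, and report what share of all high scorers they cover. The city names and counts are hypothetical placeholders, not figures from the story, and `top_cities` is a helper name chosen only for this example.

```python
def top_cities(counts: dict, n: int) -> list:
    """Return the n cities with the most 650+ scorers, largest first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Hypothetical counts of 650+ scorers per city (placeholders only)
counts = {"City A": 120, "City B": 450, "City C": 80, "City D": 300, "City E": 50}

top = top_cities(counts, 3)
total = sum(counts.values())

# Share of all 650+ scorers that the selected cities account for
top_share = round(100 * sum(c for _, c in top) / total, 1)
```

Stating `top_share` next to the chart tells the reader how much of the data the reduced view still represents.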

@harshymehta14

harshymehta14 commented Jul 28, 2024

Name - Harsh Y Mehta
Roll no - 21f1001295

Assignment-04

Title - Personal loans disbursed via digital apps have the highest share of overdue accounts Source Link
Published on - 04 July 2024
Author - Vignesh Radhakrishnan

Overview
Until the mid-2010s, banks lent massive loans to big industries. When these businesses failed, the bad loans went unnoticed until a 2015 RBI review brought them to light; by 2017, the share of bad loans had reached 10%. Various recovery methods, including the Insolvency and Bankruptcy Code, 2016, were used to recover these loans, and the issue was widely publicized.

As a result, banks reduced loans to industries and recovered more bad loans, reaching a healthy state in 2024 with a decadal-low Gross Non-Performing Assets (GNPA). However, they shifted focus to retail loans, such as personal loans and credit cards, which grew significantly. Despite regulatory measures, the GNPA ratio for personal loans fell to 1.2% in March 2024.

The RBI's Financial Stability Report highlights concerns about rising slippages and high delinquency levels among small borrowers with personal loans below Rs. 50,000, especially from NBFC-Fintech lenders. These issues indicate potential future problems, with the RBI now worried about individuals rather than industries.

Glossary

  1. NPA (Non Performing Assets) - is the share of total loans that are overdue for more than 90 days.
  2. GNPA (Gross Non-Performing Assets) - represents the total amount of loans that are classified as non-performing without any deductions.
  3. NNPA (Net Non-Performing Assets) - derived by subtracting provisions (reserves set aside for potential losses) and recoveries from the GNPA.
  4. Slippages - are fresh additions of bad loans in a year.
  5. Delinquencies - loan repayments that are overdue, i.e. the borrower has missed the due date.
  6. NBFC (Non Banking Finance Company) - is a financial institution that offers banking services such as loans and credit facilities but does not hold a banking license and cannot accept demand deposits from the public.
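To make the glossary terms concrete, here is a small sketch that applies the definitions above: the GNPA ratio as GNPA divided by total loans, and NNPA as GNPA minus provisions and recoveries. All figures are hypothetical and in arbitrary units (say, Rs. crore); the function names are my own and do not come from the RBI report.

```python
def gnpa_ratio(gnpa: float, total_loans: float) -> float:
    """GNPA expressed as a percentage of total loans."""
    return round(100 * gnpa / total_loans, 2)


def net_npa(gnpa: float, provisions: float, recoveries: float) -> float:
    """NNPA as defined in the glossary: GNPA minus provisions and recoveries."""
    return gnpa - provisions - recoveries


# Hypothetical figures, not taken from the Financial Stability Report
total_loans = 1000.0
gnpa = 28.0
provisions = 18.0
recoveries = 4.0

ratio = gnpa_ratio(gnpa, total_loans)
nnpa = net_npa(gnpa, provisions, recoveries)
```

With these placeholder numbers, 28 out of every 1000 units lent are non-performing, so the GNPA ratio works out to 2.8%.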

Full Forms

  1. PVBs (Private Sector Banks)- Example: HDFC Bank
  2. PSBs (Public Sector Banks) - Example: State Bank of India (SBI)
  3. SFBs (Small Finance Banks) - Example: AU Small Finance Bank

Key Takeaways

  1. In 2015, the Reserve Bank of India (RBI) carried out a review, following which skeletons tumbled out of the closet. The share of bad loans reached as high as 10% in 2017, which meant that nearly one in every 10 loans had turned bad.
  2. The latest Financial Stability Report (FSR) of the RBI shows that Gross Non-Performing Assets (GNPA) was at a decadal-low in March this year (2024).
  3. The GNPA ratio of personal loans has been reducing consistently reaching 1.2% in March 2024 — the lowest across sectors and within the segment (Agriculture, Industry, Services, Personal Loan).
  4. In FY24, slippages from retail loans (excluding home loans) formed 40% of fresh additions of NPAs.
  5. Delinquency levels among small borrowers with personal loans below Rs. 50,000 remain high.

Charts
Chart 1 | The chart shows the Gross non-performing Assets (GNPA) and NPA across years.
image

Chart 2 | The chart shows the GNPA (in %) across sectors.
image

Chart 3 | The chart shows the bank-type wise split of the share of slippages from retail loans in the overall new additions of NPAs. The chart excludes slippages in home loans. Slippages are fresh additions of bad loans in a year.
image

Chart 4 | The chart shows the bank type-wise delinquency levels for personal loans below Rs. 50,000.
image

Redesign Chart

Slippage & Delinquency by Bank Type
image
Disclaimer - The data in the above chart is an approximation and may not be accurate.

The combined charts allow for easier comparison of related data in a single view. They save space and make the information clearer and more readable. This improves efficiency, making data analysis quicker and more straightforward.

@trishulam

trishulam commented Jul 28, 2024

Name: N K Vamsi Krishna

Roll_No: 21f1003596

Story Overview:

The article discusses the significant shifts in voter support during the 2024 Assembly bypolls across 13 constituencies, highlighting a potential trend of anti-incumbency. The data indicates a notable decline in vote shares for the BJP-led NDA coalition and gains for the opposition INDIA bloc, suggesting a shift in voter sentiment that could influence future elections.

Source: The Hindu

Data Used:

Type of Data:

  1. Vote Share Data: Percentages of votes obtained by various parties (BJP, NDA, INDIA bloc) in both the 2024 Lok Sabha elections and the subsequent Assembly bypolls.
  2. Election Results: Specific seat outcomes for each party in the bypolls.
  3. Comparative Analysis: Changes in vote shares between the Lok Sabha and Assembly bypolls, focusing on the decline or increase in support for each party.

Extent and Dimensions:

  1. Temporal Extent: 2024 data from both the Lok Sabha elections and the Assembly bypolls.
  2. Geographical Extent: 13 Assembly constituencies across seven states in India.
  3. Metrics: Vote shares by party, changes in vote shares, election results by constituency.

Gaps in the Data:

  1. Regional Specificity: Data is limited to 13 constituencies, which may not represent broader regional or national trends.
  2. Detailed Local Context: Lack of detailed local issues or events that may have influenced voter behavior.
  3. Temporal Gaps: Only 2024 data is used, without historical comparison beyond the immediate election cycle.

Data Details:

  1. Essential Data: Vote shares by party, average gains/losses, seat outcomes.
  2. Irrelevant Data: Non-election related data or broad economic indicators not directly tied to the analysis.

Data Encoding:

  1. Tables and Charts: Original data presented in tables showing vote shares and seat outcomes.
  2. Narrative: Explains the methodology and findings.

Problems and Improvements:

Problems:

  1. Visualization: Original data is presented solely in tables, which does not effectively illustrate the trends in voter support changes.
  2. Engagement: Static tables lack the interactivity and visual impact needed to engage readers and highlight significant trends.

Improvements:

  1. Visualization: Incorporate charts and graphs to visually represent vote share changes and seat distribution, highlighting the percentage changes and relative differences over time.
  2. Interactivity: Use interactive visualizations to enhance reader engagement and understanding.

Original Visualizations:

Visualization 1:

image

Visualization 2:

image

Redesigned Visualizations:

Interactive Visualizations - Flourish

Visualization 1: Vote Share Gains and Losses in 2024 Assembly Bypolls

Visualization:
chart3

Explanation:
This bar chart compares the changes in vote shares for the NDA and INDIA alliances across 13 constituencies during the 2024 Assembly bypolls. Each bar represents the gain or loss in vote share percentage points for a constituency.

  • Left Side (NDA): The chart on the left shows the vote share gains and losses for the NDA alliance. Negative values indicate a loss in vote share compared to the 2024 Lok Sabha elections, while positive values indicate a gain.
  • Right Side (INDIA): The chart on the right shows the vote share gains and losses for the INDIA alliance. Positive values indicate a gain in vote share compared to the 2024 Lok Sabha elections, while negative values (though fewer in this case) indicate a loss.

The chart clearly illustrates the trend of declining support for the NDA and rising support for the INDIA alliance across most constituencies, highlighting significant shifts in voter sentiment.

Visualization 2: Seat Distribution in the 2024 Assembly Bypolls

Visualization:
Chart2

Explanation:
This donut chart shows the distribution of seats won by each party in the 2024 Assembly bypolls. Each segment of the donut represents the number of seats won by a party, providing a clear and immediate visual summary of the election results.

  • Congress: The Congress party won 4 seats, represented by one of the two largest segments.
  • Trinamool Congress: The Trinamool Congress also won 4 seats, so its segment is the same size as that of the Congress.
  • Dravida Munnetra Kazhagam (DMK): The DMK won 1 seat.
  • Aam Aadmi Party (AAP): The AAP won 1 seat.
  • Bharatiya Janata Party (BJP): The BJP won 2 seats.
  • Independent: An independent candidate won 1 seat.

The chart effectively highlights the competitive nature of the bypolls, with seats distributed across multiple parties, indicating a diverse political landscape.
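The segment sizes in the donut follow directly from the seat counts listed above. The sketch below converts each party's seats into a percentage of the 13 seats and the corresponding sweep angle in degrees; the function name `donut_segments` is only illustrative, and charting tools such as Flourish do this calculation internally.

```python
# Seat counts as listed in the explanation above
seats = {
    "Congress": 4,
    "Trinamool Congress": 4,
    "BJP": 2,
    "DMK": 1,
    "AAP": 1,
    "Independent": 1,
}


def donut_segments(seat_counts: dict) -> dict:
    """Map each party to (share of seats in %, sweep angle in degrees)."""
    total = sum(seat_counts.values())
    return {
        party: (round(100 * n / total, 1), round(360 * n / total, 1))
        for party, n in seat_counts.items()
    }


segments = donut_segments(seats)
total_seats = sum(seats.values())
```

Because Congress and Trinamool Congress both hold 4 of 13 seats, they get identical segments of roughly 30.8% each.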

Conclusion

The redesigned visualizations provide a clearer and more engaging way to understand the data story of the 2024 Assembly bypolls. The vote share comparison bar chart highlights the significant shifts in voter support, while the seat distribution donut chart gives a quick overview of the outcomes. These visualizations help to communicate the broader narrative of potential anti-incumbency sentiment and its implications for future elections.

@abirChakrabortyIITM

Name: Abir Subroto Chakraborty
Roll No: 21f2000280


Title: By-polls: an indication of a new anti-incumbency (Link)

Author: Prasenjit Bose


Based on the data provided in the images and the brief overview from the article, here’s the analysis:

1. What is the story the author is trying to tell?

The author is highlighting a significant shift in voter preferences, indicating a decline in the vote share for the BJP-led NDA in recent bypolls, while the opposition INDIA bloc has gained considerable ground. This shift is seen as a possible indication of growing anti-incumbency sentiment against the BJP.

2. What data is used to tell the story?

The data consists of vote share percentages for both the NDA and INDIA bloc parties in various constituencies during the 2024 Lok Sabha elections and the subsequent 2024 Assembly bypolls. The key elements include:

  • Type of data: Quantitative data on vote shares.
  • Extent of data: Covering 13 Assembly constituencies across seven states.
  • Dimensions of data: Vote share percentages for NDA and INDIA bloc, the gain or loss in vote share from the Lok Sabha elections to the bypolls.
  • Gaps in the data: The data does not provide specific reasons for the change in vote share, only the numerical differences.
  • Essential vs. irrelevant data: Essential data includes the vote share percentages and the gain/loss. Any anecdotal reasons for the shifts (e.g., local issues or electoral malpractices) are supplementary but not quantified in this data.

Details of the Data Tables Provided:

Table 1: NDA Performance

This table details the vote share of the NDA (BJP and JDU) in various Assembly Constituencies (ACs) across multiple states in the 2024 Lok Sabha (LS) elections and the subsequent 2024 Assembly (AS) bypolls.

  • Columns:

    • State: The name of the state.
    • AC Name: The specific Assembly Constituency name.
    • Party: The political party (BJP or JDU).
    • 2024 LS Election Vote Share (%): The percentage of votes received by the NDA in the 2024 LS elections.
    • 2024 AS Bypoll Vote Share (%): The percentage of votes received by the NDA in the 2024 AS bypolls.
    • Gain/Loss (% points): The change in vote share percentage points between the 2024 LS elections and the 2024 AS bypolls.
  • Key Observations:

    • Significant declines in vote share for the BJP in all listed constituencies.
    • The highest drop is in Maniktala, West Bengal, with a loss of 26 percentage points.
    • The smallest drop is in Amarwara, Madhya Pradesh, with a loss of 3.1 percentage points.

image

Table 2: INDIA Performance

This table presents the vote share of the INDIA bloc (including AITC, INC, AAP, VCK/DMK, and RJD) in the same constituencies and elections as Table 1.

  • Columns:

    • State: The name of the state.
    • AC Name: The specific Assembly Constituency name.
    • Party: The political party (AITC, INC, AAP, VCK/DMK, RJD).
    • 2024 LS Election Vote Share (%): The percentage of votes received by the INDIA bloc in the 2024 LS elections.
    • 2024 AS Bypoll Vote Share (%): The percentage of votes received by the INDIA bloc in the 2024 AS bypolls.
    • Gain/Loss (% points): The change in vote share percentage points between the 2024 LS elections and the 2024 AS bypolls.
  • Key Observations:

    • Significant gains for the INDIA bloc in almost all listed constituencies.
    • The highest gain is in Jalandhar West, Punjab, with an increase of 44.2 percentage points for AAP.
    • The lowest gain is in Amarwara, Madhya Pradesh, with an increase of 2.5 percentage points for INC.

The tables illustrate a notable shift in voter preference from the BJP-led NDA to the opposition INDIA bloc in recent bypolls compared to the Lok Sabha elections. The data indicates significant gains for the INDIA bloc across multiple states and constituencies, suggesting a growing anti-incumbency sentiment against the BJP.

image


3. How is it encoded, what problems are with it, and how have you attempted to improve it?

  • Encoding: The data is presented in tables with clear columns for vote shares in the 2024 LS election, 2024 AS bypoll, and the gain/loss.
  • Problems:
    • Potential bias in interpreting the reasons for vote share changes.
    • Lack of context on local factors influencing the vote changes.
    • Missing information on independent candidates or minor parties' impact.
  • Improvements:
    • Providing additional context or qualitative data to explain the vote share changes.
    • Including a more comprehensive analysis of local issues affecting each constituency.
    • Visual aids like charts or graphs could help illustrate the data trends more effectively.

In summary, the data underscores a notable shift in voter sentiment against the BJP, with the INDIA bloc gaining traction, pointing to possible national implications for future elections.


Conclusion by the Author

The recent bypoll outcomes signify a notable decline in support for the BJP-led NDA, with significant vote share losses across multiple states, contrasting the gains made by the opposition INDIA bloc. This trend suggests a potential shift in the national political landscape, possibly reflecting growing dissatisfaction with the BJP. The author emphasizes that while local factors may play a role, the overall decline in the BJP’s vote share across various constituencies points to a broader anti-incumbency sentiment. The opposition's gains indicate a possible change in voter mood, favoring the INDIA bloc in upcoming elections.

@Ashrey30

Name: Ashrey
Roll No: 21f2000448

Article: A Green Wealth Tax in Budget 2024

Story the Author is trying to tell

The author is presenting the idea of a wealth tax-financed Indian Green Deal (IGD) that aims to address climate change, inequality, and unemployment. The story argues that the wealth tax on the Indian elite can fund a comprehensive green energy, infrastructure, and care economy program, ultimately generating millions of jobs and reducing carbon emissions.

Data Analysis

1. Per Capita Carbon Footprint (Chart 1):

  • Type: Quantitative
  • Extent: Top 10% of the Indian population vs. average Indian vs. first-world citizens
  • Dimensions: Per capita carbon emission

Screenshot 2024-07-28 172722

2. Expenses and Carbon Intensity of Commodities (Chart 2):

  • Type: Quantitative and Qualitative
  • Extent: Expenses incurred by the Indian elite and average Indian across different commodities
  • Dimensions: Expense ratio, Carbon intensity of commodities

Screenshot 2024-07-28 172733

3. Expenditure and Employment Generation (Chart 3a & 3b):

  • Type: Quantitative
  • Extent: Proposed spending on green energy, infrastructure, and care economy; employment generated
  • Dimensions: Percentage of GDP, number of jobs

Screenshot 2024-07-28 172751
Screenshot 2024-07-28 172802

4. Projected Wealth and Tax Rate (Chart 4a & 4b):

  • Type: Quantitative
  • Extent: Projected rise in wealth of the Indian elite and corresponding tax rate
  • Dimensions: Wealth (in Million crores), Tax rate (%)

Screenshot 2024-07-28 172815
Screenshot 2024-07-28 172824

Gaps in the Data

  • Lack of detailed breakdown of how specific programs within green energy, infrastructure, and care economy would be implemented.
  • No data on potential resistance or economic impact on the Indian elite due to the wealth tax.
  • Limited information on the current state of employment and carbon emissions to contextualize the impact.

Essential vs. Irrelevant Data

  • Essential: Carbon footprint data, expense ratios, proposed expenditure and employment generation data, projected wealth, and tax rate.
  • Irrelevant: Detailed breakdown of individual consumption habits not tied to carbon-intensive commodities.

Redesigned Visualizations

1. Per Capita Carbon Footprint

  • Original: Grouped bar chart comparing emissions.
  • Redesign: This chart compares the carbon footprints of the top 10% of the Indian population, the average Indian, and first-world citizens. The visual shows that the top 10% of Indians have a carbon footprint comparable to first-world citizens, highlighting the significant disparity in emissions within the country.

2. Expenses and Carbon Intensity

  • Original: Horizontal bar graph with expenses and carbon intensity.
  • Redesign: This bar chart illustrates the expense ratios of the Indian elite versus the average Indian across various commodities and the carbon intensity of those commodities. The chart clearly indicates that the higher expenses of the elite are concentrated in carbon-intensive commodities.

3. Projected Expenditure and Wealth Tax Rate

  • Original: Separate line charts for wealth and tax rate.
  • Redesign: This combined line and bar chart depicts the projected rise in wealth of the Indian elite and the corresponding decline in the tax rate over time. The chart suggests that the proposed wealth tax rate could gradually decrease as the wealth base increases, making the tax sustainable in the long run.

4. Expenditure and Employment Generation

  • Original: Separate charts for expenses and employment.
  • Redesign: This irregular pie chart shows the proposed expenditures on green energy, infrastructure, and the care economy as percentages of GDP, while the line chart represents the total employment generated by these expenditures. The chart demonstrates the substantial job creation potential of the Indian Green Deal.

Final Redesign

DVD_GA5_1

@muskansindhu

Redesigning the NEET-UG 2024 Data Story from The Hindu Data Point

Name: Muskan Sindhu
Roll No: 21f1003710

Original Article: Select “coaching hubs” are host to many high scoring NEET-UG-2024 candidates

Original Story

The original article highlights the cities and centers with the highest share of students scoring 650 or more in the NEET-UG 2024 exam. The main focus is on the exceptional performance in specific cities, particularly Sikar in Rajasthan, and the implications of these high scores for securing admissions in government medical colleges.

Story Analysis

The author aims to highlight the top-performing cities and centers in the NEET-UG 2024 exam and discuss the implications of these scores for medical college admissions. The data used includes quantitative data on NEET-UG scores, segmented by city and center, with percentages of students scoring above specific thresholds (650 and 700 marks). However, the article lacks historical comparison data and does not delve deeply into the reasons behind the high scores in specific centers. Essential data include scores by city and center, percentages of high scorers, and absolute numbers of high scorers.

Screenshot 2024-07-28 at 7 42 56 PM

fig 1.1. Original scatter plot showing the percentage share of students who scored over 650 marks by state.

Screenshot 2024-07-28 at 7 34 19 PM

fig 1.2. Table with centers that have the highest share of students scoring above 650 marks.

Visual Encoding and Improvements

The current visual encoding includes a table listing the top centers with the highest share of students scoring above 650 and a scatter plot showing the percentage share of students who scored over 650 marks by state. The scatter plot may be overwhelming due to the large number of data points, and the table lacks visual appeal and could be enhanced with better design elements. Additionally, the context and significance of the data points could be explained better. To improve this, I propose simplifying the scatter plot to focus on the top states and adding annotations for clarity. The table should be enhanced with visual elements like bar graphs to show comparisons more clearly. Providing historical comparison data would add context to the current year's results.

Screenshot 2024-07-28 at 8 04 06 PM

fig 2.1. Simplified scatter plot focusing on the top-performing states with the highest share of students scoring above 650 marks.

Screenshot 2024-07-28 at 8 02 47 PM

fig 2.2. Simplified bar chart showing the top centers with the highest share of students scoring 650 and above in NEET-UG 2024.

Redesign Process

To enhance the visualizations, I have focused on improving the scatter plot and the table visualization. For the scatter plot, I focused only on the top-performing states, using color coding to differentiate them and adding annotations to highlight significant data points. A clear legend was used to explain the color coding. For the table visualization, I converted it into a bar chart where each bar represents the percentage of students scoring above 650, with a secondary axis showing the absolute number of students scoring above 650. Color coding was used to differentiate between cities and centers.

Final Notes

The redesigned visualizations provide clearer insights into the distribution of high NEET-UG scores across various centers and states. By using annotations, clear legends, and contextual data, the visualizations aim to make the story more compelling and informative. This assignment helped me understand the importance of clear and effective data visualization, aiming to make the data more accessible and engaging for the audience, providing them with a better understanding of the NEET-UG 2024 results.

@Indu16910

Indu16910 commented Jul 28, 2024

Name: Indumathi Kalla
Roll No: ce22b062
Design Process Documentation
What is the story the author is trying to tell?
The author of the original article aims to provide a comprehensive and visually engaging breakdown of the Indian Budget 2024-2025. The story seeks to highlight the allocation of funds across various sectors and track the growth and changes in these allocations compared to previous years.

What data is used to tell the story?
Type of Data:

  • Quantitative data representing the budget allocations in monetary terms.
  • Percentage data showing the share of each sector in the total budget.
  • Comparative data from previous budgets to illustrate growth or decline.

Extent of the Data:

  • The entire budget for the fiscal year 2024-2025.
  • Historical budget data for trend analysis.

Dimensions of the Data:

  • Sector names.
  • Allocation amounts.
  • Percentage shares.
  • Year-on-year growth.

Gaps in the Data:

  • Specific details about sub-sector allocations might be missing.
  • Potential lack of granularity in showing the impact of allocations on outcomes.

Essential vs. Irrelevant Data:

  • Essential: Sector names, allocation amounts, percentage shares, historical data for comparison.
  • Irrelevant: Overly detailed sub-sector data that doesn’t contribute to the main narrative.

How is it encoded, what problems are with it, and how have you attempted to improve it?

Original Encoding:

  • The original article used heat maps to visually represent the data.

Problems with Heat Maps:

  • Clarity: Heat maps use color intensity to represent values, which can make it difficult to identify the exact allocation amount for each sector.
  • Comparison: It's challenging to compare sectors directly using heat maps, as similar color shades can be hard to differentiate, especially for viewers with color vision deficiencies.
  • Labeling: Heat maps often lack clear labeling, making it necessary for viewers to refer to legends, which disrupts the flow of understanding the data.

My Improvements:

Pie Chart for Industry Percentage:

  • Clarity: Redesigned the budget allocation visualization using a pie chart to show the percentage share of each industry in the total budget. This provides a clear and immediate understanding of the distribution.
  • Direct Comparison: Each sector's share is represented as a slice of the pie, making it easy to compare the sizes of different sectors directly.
  • Enhanced Labeling: Added data labels directly on the pie chart, ensuring that viewers can quickly grasp the percentages without needing to refer to a legend.

Stacked Growth Bars for Trend Visualization:

  • Trend Analysis: Used stacked growth bars to represent the changes in budget allocations over time. This helps in understanding both the individual and cumulative growth of different sectors.
  • Precise Information: Added data labels to each segment of the stacked bars, providing precise information on the amount and percentage of growth or decline.

Scope Expansion:

  • Included additional data from previous years to provide a more comprehensive view of trends and changes in budget allocations.

Scope Curtailment:

  • Focused on the most significant sectors to avoid overwhelming the viewer with too much information at once.
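
The percentage shares shown on the pie chart and the growth figures on the stacked bars can be derived from the raw allocations. The following is only a sketch under assumed inputs: the sector names and amounts are placeholders rather than actual budget figures, and the helper names are my own.

```python
def shares(allocations: dict) -> dict:
    """Each sector's allocation as a percentage of the total."""
    total = sum(allocations.values())
    return {k: round(100 * v / total, 1) for k, v in allocations.items()}


def share_change(prev: dict, curr: dict) -> dict:
    """Change in each sector's share of the total, in percentage points."""
    p, c = shares(prev), shares(curr)
    return {k: round(c[k] - p[k], 1) for k in curr}


def yoy_growth(prev: dict, curr: dict) -> dict:
    """Year-on-year growth of each sector's allocation, in percent."""
    return {k: round(100 * (curr[k] - prev[k]) / prev[k], 1) for k in curr}


# Hypothetical allocations (placeholder units and values)
prev_year = {"Sector A": 200, "Sector B": 300, "Sector C": 500}
this_year = {"Sector A": 300, "Sector B": 300, "Sector C": 600}

current_shares = shares(this_year)
delta_points = share_change(prev_year, this_year)
growth = yoy_growth(prev_year, this_year)
```

In this made-up example, Sector C grows by 20% yet its share of the total stays at 50%, which is why the share and the growth are worth showing separately.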
Visualizations
Original Visualization Samples:
https://www.thehindu.com/data/analysis-of-union-budget-2024-sector-wise-impact/article68446110.ece
My Redesign Iterations:
Budget expenditure in FY25 (in crores)

Change in sectors share in total expenditure from FY24RE (% points)

Social welfare, MGNREGA and Samagra Shiksha 2017-24 (% of budget)

POSHAN, Old age pension, Widow pension scheme and Ayushman Bharat 2017-24 (% of budget)

Swasthya suraksha, MORTH, Telecom department and Power 2017-24 (% of budget)

AMRUT, smart cities and NHAI 2017-24 (% of budget)

Railway Ministry, Signalling & Telecom and Aviation Ministry

UDAAN and shipping ministry

FAME, PMAY-U and PMAY-R

Agriculture, PMFBY, PMKISAN 2017-24 (% of budget)

Space, Science and Tech, Space Technology, Space applications 2017-24 (% of budget)

Health, Rural development, Higher education, School education 2017-24 (% of budget)

Defence 2017-24 (% of budget)

Pie Chart
Stacked Growth Bar

@Afringowhar

Name: Syed Afrin Gowhar
Roll No. : 21f2001140

Assignment: 2024 polls: How people in high and low income areas voted in Chennai’s Mylapore, T Nagar and other areas
Link: https://www.thehindu.com/data/2024-polls-how-people-in-high-and-low-income-areas-voted-in-chennais-mylapore-t-nagar-and-other-areas/article68427083.ece

1. Story and Data Understanding

Story Objective: Highlight how voting patterns in Chennai's Lok Sabha elections vary by income levels, with a specific focus on the DMK and BJP's vote shares.

Data Details: The dataset includes vote shares for the DMK and BJP across various areas in Chennai, categorized by income levels. This data is critical for analyzing the correlation between income levels and voting preferences.

2. Analysis and Visualization Plan

Original: From The Hindu data
image

A. Overall Trends and Area-Wise Breakdown
Objective: To illustrate the overall trends in vote shares of the DMK and BJP across different areas, with an emphasis on income levels.

Visualization 1:
image
This bar chart visualizes the vote shares of the DMK and BJP across various areas in Chennai, sorted by income levels (from high to low). The data illustrates the voting patterns, showing higher support for the BJP in wealthier areas and stronger support for the DMK in lower-income areas.

Visualization 2:
image
This line graph shows the overall vote shares of DMK and BJP across different areas, categorized by income levels. The graph illustrates that DMK's vote share tends to be higher in lower-income areas, while BJP's vote share is higher in wealthier areas.

B. Detailed Area-wise Analysis
Objective: To provide a detailed comparison of vote shares within specific areas, highlighting the socio-economic differences.

Visualization 1:
image
This heatmap illustrates the vote shares of the DMK and BJP across different areas in Chennai, sorted by income levels. The color gradient effectively highlights the areas of strong and weak support for each party, with darker shades indicating higher vote shares.

Visualization 2:
image
The scatter plot provided shows the distribution of vote shares for the DMK and BJP across various areas. The x-axis represents the DMK vote share percentage, while the y-axis represents the BJP vote share percentage. The data points are color-coded based on the income level of the areas, with blue dots representing high-income areas and orange crosses representing low-income areas.
This scatter plot visually demonstrates the correlation between income levels and party support, highlighting socio-economic divisions in voting patterns. The data suggests that the DMK's support base is stronger in lower-income areas, whereas the BJP has a stronger presence in higher-income areas. This insight could be useful for understanding voter demographics and tailoring campaign strategies accordingly.
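
The correlation described above can be checked numerically with a Pearson coefficient. The following is a minimal sketch, not the method used in the article or in this redesign: the numeric income proxy and all vote share values are made-up placeholders, and the helper name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five areas, ordered from low to high income.
# None of these numbers come from the article.
income_proxy = [1100, 3000, 5000, 8000, 10000]
dmk_share = [60.0, 55.0, 48.0, 40.0, 35.0]
bjp_share = [8.0, 12.0, 18.0, 27.0, 33.0]

print("income vs DMK share:", round(pearson(income_proxy, dmk_share), 2))
print("income vs BJP share:", round(pearson(income_proxy, bjp_share), 2))
```

With real data, a clearly negative coefficient for the DMK and a clearly positive one for the BJP would support the pattern the scatter plot shows. A coefficient near zero would suggest the visual impression is weaker than it looks.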

Summary of the Redesign Process:

  • The original story aimed to show the correlation between income levels and voting patterns in Chennai.
  • Identified the data types, dimensions, and issues in the original visualization.
  • Created a bar chart, a heatmap, scatter plot and line graph to provide clearer and more intuitive visualizations, making it easier to compare the vote shares and understand the data.

@Harsehraab

Harsehraab commented Jul 28, 2024

Name: Harsehraab Singh Sarao
Roll Number: 21f1000507

Original Article : On unemployment in Indian States
Link : https://www.thehindu.com/news/national/on-unemployment-in-indian-states/article68051708.ece

Story by the author:

  1. The author analyses the extent of unemployment in major states.
  2. Highlights the disparities in unemployment rates across different states.
  3. Goa's unemployment rate is three times the national average.
  4. Wealthy states like Kerala, Haryana, and Punjab seem to have higher unemployment rates.
  5. Rich western states like Gujarat and Maharashtra seem to have lower rates of unemployment.
  6. States with a higher proportion of self-employment have lower unemployment rates.
  7. Urbanisation seems to increase the rate of unemployment due to fewer informal job opportunities.
  8. Education also seems to play a role in increasing unemployment because well-educated graduates want to work in adequately paying roles. Such job opportunities are scarce compared with the number of candidates.

What data he/she is using to tell the story?

  1. The article is utilising data that considers individuals aged 15 and above.
  2. The data seems to be sourced from the Periodic Labour Force Survey (PLFS) of 2022-23.
Screenshot 2024-07-28 at 5 07 22 PM Screenshot 2024-07-28 at 5 08 01 PM Screenshot 2024-07-28 at 5 08 38 PM

Type of data:

  1. Quantitative data for unemployment rates across states of India.
  2. Quantitative data for self employment across the states of India
  3. Quantitative data for the well educated individuals across the states of India

Extent of the data:

  1. Data for the duration 2022-23.
  2. Data from all states across India.

Gaps in the data:

  1. No data for Union Territories.
  2. No data on the industries present in each state; states with more industries would be able to provide more job opportunities.

What data is essential:

  1. Unemployment rates
  2. Self-employment rates
  3. Educational levels.

Key Findings

  1. States with well-educated individuals seem to have a higher rate of unemployment.
  2. Unemployment is less in areas where self employment is common practice.
  3. The highest rate of unemployment in the country is 10% for the state of Goa.
  4. Northern states seem to have a higher rate of unemployment; even wealthy states like Haryana and Punjab have high unemployment rates.
  5. Informal job opportunities seem to be diminishing as urbanisation increases.

Original Encoding:

  1. Bar charts and line graphs to show unemployment rates and trends.
  2. Scatter plots to depict the relationship between self-employment and unemployment.

Problems:

  1. Lack of visual clarity in comparing states.
  2. Limited use of colour to differentiate data points.
  3. No interactive elements to explore data in depth.

Redesigning the Visualisation
Improvement Goals:

  1. Enhance visual clarity and comparison.
  2. Use colour effectively to highlight key data points.
Screenshot 2024-07-28 at 8 45 42 PM

@SOORYAKIRAN-B

Name: SOORYAKIRAN B
Roll No.: 21f1003835

Unemployment remains a concern in India post-pandemic

Story the Author is Trying to Tell

The author aims to highlight the persistent issue of unemployment in India, especially in the aftermath of the COVID-19 pandemic, by illustrating how various individuals are struggling with joblessness and how the Labour Force Participation Rate (LFPR) and Unemployment Rate (UR) have changed over time.
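
For readers unfamiliar with the two metrics, here is a minimal sketch of how LFPR and UR are conventionally defined. The function names and the population figures are hypothetical and are not taken from the article; the article's source may apply its own age cut-offs and survey definitions.

```python
def lfpr(employed, unemployed, working_age_population):
    """Labour Force Participation Rate in percent.

    The labour force is everyone who is working or actively seeking work.
    """
    labour_force = employed + unemployed
    return 100 * labour_force / working_age_population


def unemployment_rate(employed, unemployed):
    """Unemployment Rate in percent: unemployed as a share of the labour force."""
    labour_force = employed + unemployed
    return 100 * unemployed / labour_force


# Hypothetical figures in millions, for illustration only.
employed, unemployed, population = 400.0, 30.0, 1000.0

print("LFPR:", round(lfpr(employed, unemployed, population), 1))
print("UR:", round(unemployment_rate(employed, unemployed), 1))
```

Because the UR denominator is the labour force and not the whole population, a falling LFPR can hide joblessness: people who stop looking for work leave the labour force and no longer count as unemployed.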

Data utilised

Table 1: Labour Force Participation Rate (LFPR)

Table1

The LFPR in India has shown a significant decline post-pandemic, indicating that fewer people are either working or seeking employment.

  • Type: Annual LFPR percentages
  • Extent: From 2016-17 to 2022-23
  • Dimensions: Year, Total (T), Male (M), Female (F), Urban (U), Rural (R)
  • Gaps: No specific gaps; consistent yearly data.
  • Essential Data: LFPR percentages for each category.

Encoding and Problems:

  • Current Encoding: Tabular format with colour coding.
  • Problems: Hard to compare trends over time visually.
  • Improvement: Using a Bar chart to show trends more clearly.

Redesign of Table 1

Table 1_ Labour Force Participation Rate (LFPR)

Insights

Table 1: Labour Force Participation Rate (LFPR)

Insights:

  • Overall Decline: Consistent decline from 2016-17 to 2022-23 for all demographics.
  • Gender Disparity: Male rates significantly higher than female rates, with increasing gaps.
  • Urban vs. Rural: Higher LFPRs in rural areas, especially for men; consistently low female LFPR across both.
  • Pandemic Impact: Notable decline during 2020-21, less recovery afterward.

Table 2: Unemployment Rate (UR)

Table2

The unemployment rate in India has increased post-pandemic, with a noticeable difference between urban and rural areas and between genders.

  • Type: Annual UR percentages
  • Extent: From 2016-17 to 2022-23
  • Dimensions: Year, Total (T), Male (M), Female (F), Urban (U), Rural (R)
  • Gaps: No specific gaps; consistent yearly data.
  • Essential Data: UR percentages for each category.

Encoding and Problems:

  • Current Encoding: Tabular format with colour coding.
  • Problems: Difficult to see trends and compare different categories.
  • Improvement: Using a line chart to compare different categories and see trends.

Redesign of Table 2

Unemployment Rate (UR)

Insights

Table 2: Unemployment Rate (UR)

Insights:

  • Fluctuating Rates: Notable increases in certain years, highest female UR in 2016-17.
  • Gender Disparity: Females have consistently higher URs.
  • Urban vs. Rural: Higher URs in urban areas, especially for females.
  • Pandemic Impact: Increased during 2020-21, remains higher post-pandemic.

Table 3: LFPR and UR by Quarter

Table3

Both LFPR and UR show quarterly trends over the years, with noticeable fluctuations around the pandemic period.

  • Type: Quarterly LFPR and UR percentages
  • Extent: From September 2016 to September 2023
  • Dimensions: Quarter, LFPR (Total, Urban, Rural), UR (Total, Urban, Rural)
  • Gaps: No specific gaps; consistent quarterly data.
  • Essential Data: LFPR and UR percentages for each quarter.

Encoding and Problems:

  • Current Encoding: Tabular format with colour coding.
  • Problems: Hard to visualise quarterly trends and compare across years.
  • Improvement: Using a line chart to show LFPR and UR trends over quarters.

Redesign of Table 3

Tab3

Insights

Table 3: LFPR and UR for Quarters Ending in September

Insights:

  • Declining LFPR: Downward trend from September 2016 to September 2023.
  • UR Trends: Peaks in 2019 and 2023, slight recovery post-pandemic.
  • Urban vs. Rural: More uncertainty in urban areas.

Table 4: LFPR and UR by Month

Table4

Monthly trends in LFPR and UR, showing how these metrics change within a year.

  • Type: Monthly LFPR and UR percentages
  • Extent: From November 2019 to November 2023
  • Dimensions: Month, LFPR (Total, Urban, Rural), UR (Total, Urban, Rural)
  • Gaps: No specific gaps; consistent monthly data.
  • Essential Data: LFPR and UR percentages for each month.

Encoding and Problems:

  • Current Encoding: Tabular format with colour coding.
  • Problems: Difficult to track monthly changes and compare across years.
  • Improvement: Usage of a line chart to show monthly trends over the years.

Redesign of Table 4

Tab4

Insights from the story

Table 4: LFPR and UR for November Months

Insights:

  • Steady Decline in LFPR: From November 2019 to November 2023.
  • Increasing UR: Significant increase in November 2023.
  • Urban vs. Rural: Higher UR in urban areas in November 2023.

Insights from Story

The data reveals a concerning trend of decreasing labour force participation and persistently high unemployment rates, especially among females and in urban areas. The COVID-19 pandemic worsened these issues, and the recovery appears to be slow, indicating a need for targeted employment initiatives.

@Arshi81099

Arshi81099 commented Jul 28, 2024

Maoist Setbacks in Chhattisgarh, 2024

Story the Author is Trying to Tell
The author is detailing the severe setbacks faced by Maoists in Chhattisgarh in 2024, highlighting that the insurgency is most intense in districts with poor development indicators. The report emphasizes the correlation between Maoist activity and underdeveloped regions, suggesting that socio-economic factors play a significant role in the insurgency.

Data Analysis

  1. Year-wise Deaths of Maoists in Chhattisgarh (Chart 1)
    Type: Quantitative
    Extent: Yearly data on Maoist deaths in Chhattisgarh
    Dimensions: Number of deaths
Screenshot 2024-07-28 at 8 07 32 PM
  2. Deaths of Civilians, Security Forces, and Maoists Over the Years (Chart 2)
    Type: Quantitative
    Extent: Yearly data on deaths of civilians, security forces, and Maoists
    Dimensions: Number of deaths
Screenshot 2024-07-28 at 8 12 06 PM
  3. District-wise Average of Maoist Deaths (Table 3)
    Type: Quantitative
    Extent: District-wise average of Maoist deaths every four years from 2001 to 2024
    Dimensions: Number of deaths
Screenshot 2024-07-28 at 8 09 08 PM
  4. District-wise Development and Welfare Indicators in Chhattisgarh (Table 4)
    Type: Quantitative and Qualitative
    Extent: District-wise development and welfare indicators
    Dimensions: Sanitation, Literacy, and other socio-economic indicators
Screenshot 2024-07-28 at 8 09 14 PM

Gaps in the Data

  1. Lack of detailed analysis on the effectiveness of security operations and strategies used.
  2. No data on the socio-economic impact on the local population due to the insurgency.
  3. Limited information on the long-term trends in insurgency and development indicators.

Essential vs. Irrelevant Data
Essential: Year-wise data on deaths of Maoists, civilians, and security forces; district-wise development indicators; district-wise average of Maoist deaths.
Irrelevant: Detailed operational tactics used by security forces without context to broader trends.
Redesigned Visualizations

1. Year-wise Deaths of Maoists in Chhattisgarh
Original: Line chart showing yearly Maoist deaths.
Redesign: This line chart displays the number of Maoist deaths in Chhattisgarh from 2001 to 2024. The visual highlights significant years such as 2009 and 2024, showing peaks in casualties.
Screenshot 2024-07-28 at 10 00 52 PM

2. Deaths of Civilians, Security Forces, and Maoists Over the Years
Original: Bar chart comparing deaths of civilians, security forces, and Maoists over the years.
Redesign: This stacked bar chart compares the deaths of civilians, security forces, and Maoists from 2001 to 2024. It shows the decreasing trend in security force casualties and the varying trends in Maoist and civilian deaths.
Screenshot 2024-07-28 at 10 01 18 PM

3. District-wise Average of Maoist Deaths
Original: Table listing district-wise average of Maoist deaths.
Redesign: This heatmap shows the average number of Maoist deaths in Chhattisgarh districts every four years from 2001 to 2024. Districts with higher averages are highlighted to indicate hotspots of insurgency activity.
Screenshot 2024-07-28 at 10 02 29 PM
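
The four-year averaging behind a heatmap like this one can be sketched as follows. This is an illustration under stated assumptions, not the code used for the redesign: the function name `window_averages` is hypothetical, missing years are treated as zero deaths, and the sample counts are invented.

```python
def window_averages(yearly_counts, start, end, width=4):
    """Average yearly counts over consecutive windows of `width` years.

    Years missing from `yearly_counts` are assumed to have a count of zero.
    Returns a dict keyed by (first_year, last_year) of each window.
    """
    averages = {}
    for window_start in range(start, end + 1, width):
        window_end = min(window_start + width - 1, end)
        values = [yearly_counts.get(year, 0)
                  for year in range(window_start, window_end + 1)]
        averages[(window_start, window_end)] = sum(values) / len(values)
    return averages


# Invented counts for a single district, for illustration only.
district_counts = {2001: 2, 2002: 4, 2003: 6, 2004: 8,
                   2005: 10, 2006: 10, 2007: 0, 2008: 0}

for window, average in window_averages(district_counts, 2001, 2008).items():
    print(window, average)
```

Running the same function once per district with `start=2001` and `end=2024` yields six windows per district, which map directly onto the heatmap's columns.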

4. District-wise Development and Welfare Indicators
Original: Table showing district-wise development indicators.
Redesign: This combined bar and line chart visualizes district-wise development and welfare indicators such as sanitation and literacy rates. The chart compares these indicators with the average number of Maoist deaths to highlight the correlation between poor development and higher insurgency activity.
Screenshot 2024-07-28 at 10 03 21 PM

Summary of the Article:
In 2024, Maoist insurgents in Chhattisgarh faced severe setbacks, with 141 of the 162 Maoist deaths in India occurring in the state. This marks the highest casualties since 2009's 'Operation Green Hunt.' The return of the Bharatiya Janata Party (BJP) to power in December 2023 coincides with this spike. Bijapur district saw the most clashes, resulting in 74 Maoist deaths. Despite these setbacks, the insurgency persists, particularly in poorly developed, forested areas. The data shows a correlation between intense insurgency and poor development indicators such as sanitation and literacy.
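
The proportions implied by these figures can be worked out with a short calculation. This is a sketch only: the helper name `share` is hypothetical, and it assumes the 74 deaths in Bijapur are counted within Chhattisgarh's 141.

```python
def share(part, total):
    """Return `part` as a percentage of `total`."""
    return 100 * part / total


india_deaths = 162         # Maoist deaths in India in 2024, as quoted above
chhattisgarh_deaths = 141  # of which in Chhattisgarh
bijapur_deaths = 74        # assumed to be a subset of the Chhattisgarh total

print(round(share(chhattisgarh_deaths, india_deaths), 1))    # 87.0
print(round(share(bijapur_deaths, chhattisgarh_deaths), 1))  # 52.5
```

Stating the shares alongside the raw counts makes the concentration explicit: roughly seven in eight deaths occurred in one state, and about half of those in one district.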

Final Redesign
The redesigned visualizations aim to provide a clearer and more comprehensive understanding of the data, emphasizing the correlation between poor development indicators and higher Maoist activity. The use of line charts, stacked bar charts, heatmaps, and combined bar-line charts enhances the clarity and impact of the data, making it more accessible and insightful for the audience.
By following these steps and documenting the design process, the redesign effectively communicates the key insights and trends related to the Maoist insurgency in Chhattisgarh, emphasizing the importance of development in mitigating insurgency.

Regards,
Name: Arshi Khan
Roll No: 21f3002806

@SriNandhiniThiyagarajan

Analysis of Union Budget 2024: Sector-Wise Impact

Original Story Analysis:

The article aims to provide a detailed analysis of the impact of the Union Budget 2024 on various sectors of the economy. The story is centered on how different sectors are affected by the budgetary allocations and reforms proposed by the government. The author seeks to highlight which sectors have received increased funding, which have seen cuts, and the overall implications of these changes for the economic landscape.

Key Points of the Story:

  • Sector Allocation: How funds are distributed across different sectors such as healthcare, education, infrastructure, etc.
  • Impact Assessment: The effect of these allocations on the respective sectors and on the broader economy.
  • Comparative Analysis: Comparing the budgetary allocations of the current year with previous years to show changes and trends.

image

Data Details:

Type of Data:

  • Quantitative Data: Budget allocations for different sectors in numerical form (e.g., amount in crores or billions).
  • Comparative Data: Previous years’ allocations for the same sectors for trend analysis.

Extent of the Data:

  • Current Year’s Budget: Specific numbers for the 2024 budget.
  • Historical Data: Allocations from previous years for comparative purposes.

Dimensions of the Data:

  • Sectors: Healthcare, education, infrastructure, defense, etc.
  • Time Dimension: Annual allocations over a period (e.g., 3-5 years).
  • Geographical Dimension: If data includes state-wise or regional breakdowns.

Gaps in the Data:

  • Detailed Breakdown: Lack of granularity in how funds are distributed within each sector.
  • Impact Assessment: Absence of direct measures of the impact of these allocations on sectoral outcomes.

Essential Data:

  • Budget Figures: Exact allocation amounts for each sector.
  • Historical Comparisons: Previous budget allocations for trend analysis.

Original Encoding Analysis:

Visualization: The original story likely uses bar charts or stacked bar charts to show the allocation across sectors, and perhaps line graphs to show changes over time.

Problems with Original Encoding:

  • Clarity: If the visualizations are cluttered or too complex, they can be difficult to interpret at a glance.
  • Comparative Analysis: If there are no side-by-side comparisons, it can be hard to see the differences and trends over time.
  • Detail: There is no detailed breakdown within sectors, and the overall impact is not shown.

image
image
image

Improvements:

Enhanced Visualizations:

  • Bar Charts: Use grouped bar charts to compare sectoral allocations for the current year vs. previous years side-by-side. This provides a clear visual comparison.
  • Stacked Bar Charts: For a more detailed view, stacked bar charts can show how different categories within each sector are funded.

Trend Analysis:

  • Line Graphs: Use line graphs to show trends over time for each sector. This helps in visualizing the increase or decrease in allocations over the years.

Impact Visualization:

  • Heatmaps: Use heatmaps to show the intensity of funding changes across sectors. This can highlight which sectors have received the most significant changes.
  • Pie Charts: For a snapshot of the current budget distribution, pie charts can show the proportion of total budget allocated to each sector.

Grouped Bar Chart
The grouped bar chart will show the budget allocations for each sector side-by-side for the years 2022, 2023, and 2024.
image

Stacked Bar Chart
The stacked bar chart will show the budget allocations for each sector stacked on top of each other for the years 2022, 2023, and 2024.
image

Heat Map:
To show the intensity of funding changes across sectors.
image
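
One way to compute the "funding change" that such a heatmap encodes is the change in each sector's share of total expenditure, in percentage points. The sketch below is illustrative only: the function names are hypothetical and the allocation figures are invented, not taken from the budget.

```python
def shares(allocations):
    """Convert absolute allocations into percentage shares of the total."""
    total = sum(allocations.values())
    return {sector: 100 * amount / total
            for sector, amount in allocations.items()}


def share_change(previous, current):
    """Percentage-point change in share for sectors present in both years."""
    prev_shares, curr_shares = shares(previous), shares(current)
    return {sector: curr_shares[sector] - prev_shares[sector]
            for sector in curr_shares if sector in prev_shares}


# Invented allocations (arbitrary units), for illustration only.
previous_year = {"Health": 100, "Education": 100, "Defence": 200}
current_year = {"Health": 150, "Education": 100, "Defence": 250}

for sector, change in share_change(previous_year, current_year).items():
    print(f"{sector}: {change:+.1f} percentage points")
```

In this example Education's allocation is unchanged and Defence's grows in absolute terms, yet Education loses share and Defence's share is flat. Share-based changes can therefore tell a different story from absolute amounts, so the chart should state which one it shows.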

Pie Chart:
To show the proportion of the total budget allocated to each sector for 2024.
image

Conclusion:

The redesign of the Union Budget 2024 analysis enhances clarity and interpretability through improved visualizations such as grouped bar charts, stacked bar charts, heatmaps, and pie charts. These new visualizations provide clearer comparisons and trends across sectors, making the data more accessible and informative.

Regards,
Name: SriNandhini T
Roll No: 21f2001390

@NikitaSharma1

Name: Nikita Sharma
Roll Number: 21f1000637

2024 polls: How people in high and low income areas voted in Chennai’s Mylapore, T Nagar and other areas

Objective:

The article explores the voting patterns in Chennai, focusing on how wealth/income levels affect party preferences, specifically for DMK and BJP. It analyzes voting data across different streets/areas with varying income levels.

Data Used

  1. Polling Stations and Areas:

    • Lists all polling stations and the corresponding areas that voted in those stations for the 2024 LS polls.
    • Example: Voters from Choolaimedu’s Rajeevgandhi Nagar voted in the primary school in Namachivayapuram.
  2. Form-20 Data:

    • Party-wise votes polled in each polling station.
    • Example: In a particular station, DMK secured 57% of 260 valid votes, BJP secured 12.7%.
  3. Guideline Value Data:

    • Reflects the minimum value at which a property can be registered, serving as a proxy for wealth/income.
    • Example: Guideline values range from ₹1,100/sq ft (low-income) to over ₹10,000/sq ft (high-income).
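
The vote share figures derived from Form-20 data can be sketched as below. The raw vote counts are assumptions chosen to be consistent with the percentages quoted above (57% and 12.7% of 260 valid votes); the article's actual counts are not given here, and the function name `vote_share` is hypothetical.

```python
def vote_share(party_votes, valid_votes):
    """Return a party's votes as a percentage of valid votes polled."""
    if valid_votes <= 0:
        raise ValueError("valid_votes must be positive")
    return 100 * party_votes / valid_votes


valid_votes = 260
dmk_votes = 148  # assumed count, consistent with roughly 57%
bjp_votes = 33   # assumed count, consistent with 12.7%

print("DMK:", round(vote_share(dmk_votes, valid_votes), 1))
print("BJP:", round(vote_share(bjp_votes, valid_votes), 1))
```

With only 260 valid votes in a station, one vote moves the share by about 0.4 percentage points, so small differences between stations should be read with caution.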

Visual Encoding in Original Visualization

image

  • Horizontal Bar Chart:
    • X-axis: Vote share percentage.
    • Y-axis: Streets/areas ordered from high to low guideline values.
    • Red Dots: DMK’s vote share.
    • Blue Dots: BJP’s vote share.
  • Key Observations:
    • DMK’s vote share increases in lower-income areas.
    • BJP’s vote share increases in higher-income areas.

Design Process and Improvement

  1. Understanding the Story:

    • The story highlights the correlation between income levels and voting patterns for DMK and BJP in Chennai.
    • The key message is the differing vote shares in areas of varying wealth.
  2. Data Description:

    • Type of Data: Quantitative (vote shares) and categorical (streets/areas with income levels).
    • Extent of Data: Covers multiple streets/areas within Chennai.
    • Dimensions: Income level (high to low), vote share for DMK and BJP.
    • Gaps: No explicit mention of change in the income of the areas within the visualisation.
    • Essential Data: Vote shares, guideline values of streets/areas.
    • Irrelevant Data: The range of the vote share for an area, since it might confuse the person looking at the visualisation.
  3. Encoding Issues and Improvements:

    • Current Encoding:
      • Effective in showing vote shares but lacks clarity on income levels.
      • Legend is missing in visualization.
    • Improvements:
      • Use a line chart to incorporate the pattern properly.
      • Adding guideline value as the axis value instead of street names.
      • Include hover functionality for precise vote share values.

Redesign Proposal

  1. Visualization Type:
    • Dual-Axis Line Chart:
      • X-axis: Guideline values (income levels).
      • Y-axis: Vote share percentage.
      • Lines with different patterns (e.g., solid for DMK, dashed for BJP).
  2. Additional Features:
  • Hover Tooltips: Show exact vote shares and guideline values with the area name.
  • Voter Turnout: Include turnout percentage for more insight.
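
Preparing the data for the proposed chart mainly means ordering areas by guideline value so that the x-axis is numeric and monotonic. The following is a minimal sketch with invented rows; the field names and the helper `build_series` are hypothetical and do not reflect the tool actually used for the redesign.

```python
def build_series(rows):
    """Order rows by guideline value and split them into plot-ready series."""
    ordered = sorted(rows, key=lambda row: row["guideline_value"])
    return {
        "x": [row["guideline_value"] for row in ordered],
        "dmk": [row["dmk"] for row in ordered],
        "bjp": [row["bjp"] for row in ordered],
        "labels": [row["area"] for row in ordered],  # used for hover tooltips
    }


# Invented rows, for illustration only (guideline value in rupees per sq ft).
rows = [
    {"area": "Area C", "guideline_value": 10000, "dmk": 35.0, "bjp": 33.0},
    {"area": "Area A", "guideline_value": 1100, "dmk": 60.0, "bjp": 8.0},
    {"area": "Area B", "guideline_value": 5000, "dmk": 48.0, "bjp": 18.0},
]

series = build_series(rows)
print(series["labels"])
print(series["x"])
```

Keeping the area names in a parallel `labels` list lets the chart drop street names from the axis while still showing them on hover.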

Iterations:

Step 1: Creating the base line chart with the vote share percentage.

image

Step 2: Adding gridlines and helpful insights

image

Step 3: Including hover tooltips and improving the layout

image

Conclusion

The redesign aims to provide a clearer and more accessible visualization of voting patterns based on income levels, enhancing the reader’s ability to understand the correlation and key insights.

@21f1005544

Name: John Joshi Alapatt
Roll No: 21f1005544

Story Objective

The author aims to demonstrate how Lionel Messi's performance in the 2022 FIFA World Cup was exceptional, both compared to other players and his own previous performances in the 2014 and 2018 World Cups. The data shows his superiority in several key metrics, particularly highlighting his dual role as a playmaker and a striker.

Data Analysis

Chart 1
Shots Attempted vs. Shots on Target Percentage
Data: Number of shots attempted and percentage of shots on target.
Key Insight: Messi and Mbappé attempted the most shots, with Messi being more accurate.

chart1

Chart 2
Key Passes vs. Passes Completed into the 18-yard Box
Data: Number of key passes and passes completed into the 18-yard box.
Key Insight: Messi leads in both metrics, showcasing his playmaking abilities.

chart2

Chart 3
Touches in Attacking Third and Penalty Area vs. Successful Dribbles
Data: Number of touches in the attacking third/penalty area and successful dribbles.
Key Insight: Messi and Mbappé are significantly ahead in both metrics.

chart3

Chart 4
Radar Chart Comparison of Messi and Mbappé
Data: Goals and assists, shots on target, key passes, passes into the 18-yard box.
Key Insight: Messi’s balanced performance as both a midfielder and a forward compared to Mbappé.

chart4

Chart 5
Messi’s Performance Over Three World Cups
Data: Goals per game, assists per game, shots on target per game, key passes per game.
Key Insight: Messi's performance in 2022 is superior compared to 2014 and 2018.

chart5

Improvements

Chart 1
Goal: Highlight Messi and Mbappé, showing their shots attempted vs. shots on target percentage.
Changes: Use distinct colors or markers for Messi and Mbappé, add data labels.
Impr_chart1

Chart 2
Goal: Clearly show Messi's superiority in key passes and passes completed into the 18-yard box.
Changes: Add data labels for key players, use color to differentiate players.
Impr_chart2

Chart 4
Goal: Compare Messi and Mbappé's performances in multiple dimensions.
Changes: The bar chart will allow a straightforward comparison of the individual metrics for each player side by side.
alternative_chart4

Chart 5
Goal: Show Messi’s performance improvement over the three World Cups.
Changes: Using a grouped bar chart and highlighting each metric in a different color allows easier comparison.
Impr_chart5

Conclusion

The redesigned charts help convey the story clearly. The data underscores Messi's exceptional precision and scoring ability, reinforcing his status as one of the most proficient and reliable goal scorers in the sport.

@vpleaides8

Name: Kruttika Milind Soni
Roll no.: 21f1001029

In 2024, Maoists suffer severe setbacks in Chhattisgarh

The aim of this article https://www.thehindu.com/data/in-2024-maoists-suffer-severe-setbacks-in-chhattisgarh/article68395649.ece
was to show how the Maoist movement in Chhattisgarh was stalled in 2024 compared with previous years. Additionally, some impacts of the Maoist insurgency were discussed by comparing more affected districts with less affected ones.

Chart 1

Type of data: Number of Maoist deaths
Extent of data: yearly data from 2000 to 2024
Dimension of data: years on the x-axis and number of deaths on the y-axis
Gaps in data: Doesn’t show data about which operations led to the deaths.

Encoding

Number of deaths are represented clearly with a line with points signifying the values. Continuity is met.
Problems: There are too few gridlines, which makes it hard to read off the values.
Improvements: Maoist deaths can be represented in red as a form of semantic encoding.
Screenshot 2024-07-28 at 3 55 18 PM

Chart 2

Type of data: Number of deaths of civilians, security forces, insurgents
Extent of data: yearly data from 2000 to 2024
Dimension of data: years on the x-axis and number of deaths on the y-axis.
Gaps in data: Doesn’t show operations and attacks data
Essential and irrelevant data: Number of deaths is essential data. Civilian deaths are not referenced again, so they might be irrelevant overall, as the article focuses on Maoist activity.

Encoding

The red graph for Maoist deaths is good semantic encoding. Three separate graphs show different magnitudes.
Problems: Multiple bar charts make it hard to compare the three categories.
Improvements: A multi-series line graph can show the trend of increasing Maoist deaths and decreasing security force casualties. The data for Maoist deaths is represented here for the second time, after Chart 1.
Screenshot 2024-07-28 at 3 55 29 PM

Table 3

Type of data: average number of Maoist deaths
Extent of data: geographical extent of Chhattisgarh and 4 yearly data from 2001 to 2024
Dimension of data: district wise data with four yearly measurements of average deaths.
Gaps in data: Some districts had their names changed.
Essential and irrelevant data: districts with 0 Maoist deaths overall are irrelevant.

Encoding

Red for important districts pulled attention to high deaths in some districts. The year wise data showed change in numbers over the years.
Problems: It was hard to parse the difference between districts and deadliest years in the table format.
Improvements: Heatmap to represent worst districts and years. Uses red semantic encoding for maoist deaths.
Screenshot 2024-07-28 at 4 35 36 PM

Table 4

Type of data: percentage of people of a certain group falling in the particular developmental parameters.
Extent of data: geographical extent of Chhattisgarh and its population.
Dimension of data: District wise data for following parameters : Population using improved sanitation facility, women with over 9 years of schooling, stunted children under 5 years of age and if districts are impacted by maoists.
Gaps in data: It is not clear what year this data is collected for, gaps arising due to changes in district names and boundaries.
Essential and irrelevant data: Essential developmental parameters are checked like sanitation, education and nutrition.

Encoding

Not much encoding done here except red coloured cells showing worst affected districts. Red as a colour calls attention quickly on light backgrounds.
Problems: The worst affected districts are at the top, but the difference between Maoist-affected and limited-impact districts is not visible. It is also unclear for which developmental parameters a higher percentage signifies more development.
Improvements: Bifurcation of maoist affected and limited impact districts. Visualisation of the developmental parameters for comparison between districts.

Screenshot 2024-07-28 at 4 35 44 PM

Redesign

Iteration 1

Decided Chart 1 was a good introduction to the article. Planned to convert Table 4 into a choropleth map visualisation with different maps for each parameter.

Iteration 2

Developed the heatmap for Table 3. Annotated and changed colour of chart 1.

Iteration 3

Scatter plot for Table 4 with districts represented by dots and the sanitation and women's education factors used as axes. The size of each point is determined by child nutrition (stunting).

Untitled29_20240728230150

The final redesigned visualisations for the tables are:

Table 3

Districtwise avg Maoist Deaths

Table 4

DW development in Chhattisgarh
Interactive visualisation: https://public.flourish.studio/visualisation/18885572/
This scatter plot compares affected and limited-impact districts; it shows affected districts performing worse than those with a limited impact.

Other data that can be used in this article includes:

  1. Data about maoist attacks and periods of high conflict
  2. Government interventions and operations to combat maoists in the area that led to maoist killings
  3. Maoist surrenders and other related politics which may lead to drop in maoist influence.
  4. Forested districts where maoist influence is higher.

Software used: Flourish, Excel, matplotlib, folium

@Jigyasa2408

Jigyasa2408 commented Jul 28, 2024

Name - Jigyasa
Roll No - 21f1001644

The Story - https://www.thehindu.com/data/diseases-with-higher-burden-in-asia-and-africa-lack-research-funding-data/article68319946.ece

  1. What is the story the author is trying to tell?
    The author wants to draw attention to the stark difference in research funding between neglected tropical diseases (NTDs) and illnesses like HIV/AIDS, TB, and malaria. The poorest populations are particularly affected by NTDs, which are highly prevalent in tropical and subtropical countries and receive relatively less money for research and development.

Key points :

  1. Funding Disparities: There is a significant funding gap between high-profile diseases (e.g., HIV/AIDS, tuberculosis, malaria) and neglected tropical diseases (NTDs), with NTDs receiving much less funding.
  2. Global Burden of NTDs: NTDs affect millions of people worldwide, primarily in poorer regions such as Asia and Africa, with India having the highest number of affected individuals.
  3. Underfunded Research: Despite the large number of people affected by NTDs, global research and development funding for these diseases is minimal compared to other diseases.
  4. Historical Funding Trends: Research funding has fluctuated over the years, with significant increases during events like the COVID-19 pandemic, but overall, NTDs remain underfunded.
  5. Call to Action: There is a need for increased funding and research to address the significant health burden imposed by NTDs and improve the lives of those affected.

The Data :

Type of data :
Funding Data: Annual research & development funding for neglected tropical diseases, 2022. This data is expressed in US
dollars, adjusted for inflation.
Burden Data: Estimated number of people requiring treatment against neglected tropical diseases, 2021
Technological Focus: Distribution of global research funding across different technologies (vaccines, drugs, diagnostics,
basic research) over time.

Extent of the data:
Funding data for multiple diseases.
Population data for countries requiring treatment.
Historical funding trends from 2007 to 2022.

Dimensions of the data:
Disease type.
Amount of funding.
Geographic distribution.
Time series for technological funding.

Gaps in the data:
No specific information on individual NTDs' funding distribution.
Lack of detailed burden metrics beyond population numbers (e.g., mortality rates, disability-adjusted life years).
Funding sources and allocations by different countries or organizations are not specified.

What data is essential?
Disease type.
Amount of funding.

Analyzing and Improving the Visual Encodings :

Original Visualizations
Bar Chart: Showing annual research funding for various diseases in 2022.
Map: Displaying the number of people requiring treatment for NTDs by country in 2021.
Line Chart: Illustrating funding trends for different technological focuses from 2007 to 2022.

Problems and Improvements
Bar Chart:
Problem: The wide range of funding amounts creates a visual disparity that makes it hard to see funding for lesser-funded
diseases.
image
Improvement: Use a logarithmic scale or annotations so that smaller funding amounts can be compared, as shown below.
Improved chart
image
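
To illustrate why a logarithmic scale helps here, the sketch below compares bar lengths on a linear and a log10 scale. The funding figures and disease names are hypothetical placeholders, not values from the article, and `bar_lengths` is a helper made up for this example.

```python
import math

# Hypothetical funding figures in US$ millions (illustrative only, not from the article).
funding = {"Disease A": 1500.0, "Disease B": 600.0, "Disease C": 12.0, "Disease D": 3.0}

def bar_lengths(values, width=40, log=False):
    """Return bar lengths in characters, on a linear or a log10 scale."""
    if log:
        scaled = {k: math.log10(v) for k, v in values.items()}
        lo = min(min(scaled.values()), 0.0)  # anchor bars at 10^0 = 1 or lower
    else:
        scaled, lo = dict(values), 0.0
    hi = max(scaled.values())
    return {k: round(width * (s - lo) / (hi - lo)) for k, s in scaled.items()}

linear = bar_lengths(funding)
logged = bar_lengths(funding, log=True)
for name in funding:
    print(f"{name:10s} linear {'#' * linear[name]:40s} log {'#' * logged[name]}")
```

On the linear scale the smallest values round to zero-length bars, while on the log scale they stay visible. The trade-off is that bar lengths on a log scale no longer compare proportionally, so the axis should be labelled clearly.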

Map:
Problem: The map has no markers or labels for the countries with the highest burdens, which makes it harder for viewers to engage with it.
Improvement: The interactive map at https://ourworldindata.org/grapher/interventions-ntds-sdgs shows the name of each country when hovering over it.
Improved map
image

@pranaydeep139

Name: Sakiley Pranay Deep
Roll Number: 21F1005603

Article: Which topics are India's researchers publishing papers on?

Story the author is trying to tell:

The author describes the prevailing research trends in science and technology, based on publications in the Web of Science database, and compares India's most researched topics with those of other countries such as the USA and China through visualizations. The article showcases the global scientific community's focus on topics such as coronavirus, artificial intelligence, clean energy, and nanotechnology, and how different countries prioritize and contribute to research in these areas. The author also seems to aim to show how research trends can guide policy decisions and resource allocation, highlighting the importance of this research in addressing global challenges and advancing technological progress.

Screenshot 2024-07-28 190123

Screenshot 2024-07-28 190216

Key insights from the data:

Research focus shift in India: There has been some shift in research focus in India in recent years (2019-2023) compared to the long-term (2004-2023). In recent years, there has been a focus on coronavirus research and nanotechnology (nano fluids and silver nanoparticles), whereas the long-term focus has been on nanotechnology again and wireless sensor networks.

Focus on coronavirus research: Coronavirus research has been a global focus in recent years; it is the most published topic in both the USA and India according to the data for 2019-2023. Notably, coronavirus does not appear among China's five most researched topics.

Deep Learning prominence: Deep learning is a rising area of research prominence in all three countries, with China having the highest number of publications in this field.

China's strength in material science: China has a consistent focus on material science, as evidenced by their high publication rates in photocatalysis and supercapacitors in both recent and long-term datasets.

USA's focus shift in recent years: The USA has also shown a shift in research focus in recent years, moving away from long-term focus areas like HIV, parenting, and galaxies to focus on coronavirus research and deep learning.

Data used to tell the story:

Type of data: This is quantitative data (ratio data), focusing on the volume of research paper outputs. It consists of the number of published research papers categorized by topics in India, the USA, and China.

Extent of the data:
Temporal coverage: The data covers two time periods: 2019-2023 and 2004-2023.
Geographic coverage: The data is specific to three countries: India, the USA, and China.

Dimensions of the data:
Country: India, USA, China.
Time Periods: 2019-2023, 2004-2023.
Topics: Specific research topics under which the highest number of papers were published.

Gaps in the data:
Limited scope of topics: Only the top five topics are listed for each country, which may not fully represent the diversity of research areas.

Essential data:
Number of papers: The exact count of research papers published on each topic.
Topic names: Specific areas of research focus.

Irrelevant data: The data used in this story is concise and specific. Hence there is no irrelevant data.

Analysis of the original encoding:

In each visualization, three bar charts have been placed next to each other for comparison (one representing each country) that show the count of research papers published in the top five research fields.

Problems with this encoding: The major problem with this visualization is inconsistent scaling across the bar charts of different countries. For example, India's coronavirus bar (12,629 publications) is shorter than the USA's gut microbiota bar (12,435 publications), which can be misleading.

Improvement proposal for the redesign: This encoding can be redesigned in a better way by using a single bar graph with precise scaling for all three countries for each timeline, with different countries represented in different colours.
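
A minimal sketch of the scaling issue: when each country's panel is scaled to its own maximum, bar lengths are not comparable across panels, whereas one shared maximum keeps them consistent. Only the two counts marked as coming from the post are real; "Topic X", "Topic Y", and their counts are hypothetical values chosen just to reproduce the effect.

```python
# Only the two counts marked "from the post" come from the text above;
# Topic X / Topic Y and their counts are hypothetical.
papers = {
    "India": {"Coronavirus": 12629, "Topic X": 25000},   # 12629 from the post
    "USA": {"Gut microbiota": 12435, "Topic Y": 13000},  # 12435 from the post
}

def bar_lengths(panels, width=100, shared=True):
    """Scale bars against one global maximum (shared) or each panel's own maximum."""
    global_max = max(v for panel in panels.values() for v in panel.values())
    out = {}
    for country, panel in panels.items():
        denom = global_max if shared else max(panel.values())
        out[country] = {t: round(width * v / denom) for t, v in panel.items()}
    return out

per_panel = bar_lengths(papers, shared=False)
shared = bar_lengths(papers, shared=True)

print("per-panel:", per_panel["India"]["Coronavirus"], per_panel["USA"]["Gut microbiota"])
print("shared:   ", shared["India"]["Coronavirus"], shared["USA"]["Gut microbiota"])
```

With per-panel scaling the larger count gets the shorter bar; with a shared scale the ordering of bar lengths matches the ordering of the counts.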

Screenshot 2024-07-28 232152

Screenshot 2024-07-28 232217

Thank you!

@Sa-N98

Sa-N98 commented Jul 28, 2024

Name: Saranya Nayak
Roll Number: 21f1005767

Which topics are India’s researchers publishing papers on?

Source: https://www.thehindu.com/data/which-topics-are-indias-researchers-publishing-papers-on/article68410121.ece

image
image

What is the story the author is trying to tell?

The author highlights the research focus trends in India and globally over the last two decades, with a specific emphasis on the last five years. The story reveals that while coronavirus remains a predominant research topic worldwide, India's researchers are also significantly contributing to deep learning, photocatalysis, and nanotechnology. The article contrasts India's concentrated efforts in nanotechnology, partly driven by the Nano Mission, with China's focus on high-impact technological fields and the U.S.'s diverse research interests, particularly in health and social well-being.

What data he/she is using to tell the story?

The author uses data from the Web of Science, a scholarly publication database, to analyze research trends over the last 20 years and the last five years. This data includes the number of published papers on various topics by researchers from different countries, allowing for a comparative study of the most researched topics globally and within specific nations such as India, the U.S., and China. The article also references specific research outputs and projects, like India's Nano Mission, to illustrate the focus areas and the volume of research in these fields.

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

Type of Data:
The data used in the article primarily consists of bibliometric information from the Web of Science database. This includes:

  1. Publication Counts: Number of research papers published on specific topics.
  2. Research Topics: Specific subjects that are the focus of these papers, such as coronavirus, deep learning, photocatalysis, nanotechnology, etc.
  3. Geographical Information: Country-specific data indicating the research output from India, the U.S., China, and other selected nations.

Extent of the Data:
The extent of the data covers:

  1. Time Span: Research trends over the last 20 years and a focused look at the last five years.
  2. Countries: Comparative analysis of research output from multiple countries.
  3. Research Areas: Different scientific and technological fields, from health and AI to energy and materials science.

Dimensions of the Data:

  1. Temporal Dimension: Distribution of research publications over time.
  2. Geographical Dimension: Distribution across different countries.
  3. Topical Dimension: Focus areas and specific research topics.

Gaps in the Data:

  1. Detail on Methods: Lack of detailed methodological explanation on how the data was collected and analyzed.
  2. Incomplete Charts: Mention of potentially incomplete charts, affecting the clarity of the visual data representation.

Essential vs. Irrelevant Data:

Essential Data:
• Number of publications on key research topics.
• Country-specific research focus and output.
• Trends over different time periods (last 20 years vs. last five years).
Irrelevant Data:
• Specific names of researchers and their affiliations (unless discussing the impact of individual contributions).
• Photos or unrelated visuals that do not add to the understanding of research trends.

How is it encoded, what problems are with it, and how have you attempted to improve it?

Encoding:
The data is primarily encoded in textual descriptions and charts. The textual data includes numerical counts and comparisons between different countries and topics.
Dividing the data into two time frames was a good decision by the author as the last 5 year data are biased towards coronavirus.

Problems with the Data:
While the original graph contains all the information, it fails to give a clear comparison of how much research is being done in India versus other countries, or of how popular each top topic is in the other countries.

For example, in the case of deep learning, one has to look at different parts of the graph to reach a conclusion, because deep learning does not appear in the same position for each country.
The total amount of research being done in each country cannot be determined.

Solution: Group the data by topic and country and visualize it with a suitable chart.

Design Iterations :

image

Iteration 1: Used a treemap to group the data first by country and then by topic.
Benefits: Clearly indicates the proportion of research each country devotes to different areas.
Drawbacks: Hard to compare research topics across countries.

image

Iteration 2: A stacked bar chart helps in visualizing and comparing the total number of research papers and the total amount of research done by topic.
Drawbacks: Hard to navigate the topic colour coding.

image

Iteration 3: A clustered bar chart groups the data by country and topic. One can easily compare the effort given to different research areas in different countries.
Drawbacks: This graph loses the total amount of research being done in a country.
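
As a rough sketch of the data reshaping behind a clustered bar chart, the snippet below pivots records into one cluster per topic and also keeps per-country totals, which is the information the clustered view alone loses. The records and counts are hypothetical placeholders, not the article's figures.

```python
from collections import defaultdict

# Hypothetical (country, topic, papers) records; the counts are placeholders.
records = [
    ("India", "Deep learning", 100), ("India", "Coronavirus", 120),
    ("USA", "Deep learning", 200), ("USA", "Coronavirus", 300),
    ("China", "Deep learning", 400), ("China", "Photocatalysis", 250),
]

by_topic = defaultdict(dict)       # topic -> {country: papers}, one cluster per topic
country_totals = defaultdict(int)  # country -> papers summed over the listed topics

for country, topic, count in records:
    by_topic[topic][country] = count
    country_totals[country] += count

countries = ["India", "USA", "China"]
for topic, counts in sorted(by_topic.items()):
    row = "  ".join(f"{c}: {counts.get(c, 0):>4}" for c in countries)
    print(f"{topic:15s} {row}")
print(dict(country_totals))
```

Showing the totals as an annotation or a small companion chart would be one way to offset the drawback noted for the clustered bar chart.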

Final:

image

image

@varunbalaji1303

Name: Varun Balaji
Roll No: 21f1005027

Title: "2024 polls: How people in high and low income areas voted in Chennai’s Mylapore, T Nagar and other areas"
Link to story: https://www.thehindu.com/data/2024-polls-how-people-in-high-and-low-income-areas-voted-in-chennais-mylapore-t-nagar-and-other-areas/article68427083.ece

Objective:

The article examines voting patterns in Chennai's 2024 Lok Sabha elections, comparing high and low-income areas to understand if income levels influenced voting behaviour.

Main Points:

  • DMK holds a strong vote share among urban poor.
  • BJP garners more votes in wealthier areas.

Analyzing the Data:

Types of Data:

  • Polling Booth Data: Specific polling stations and associated streets.
  • Vote Share Data: Party-wise vote percentages at each polling station.
  • Guideline Values: Property guideline values as proxies for income levels.

Extent of Data:

  • Covers all polling stations in Chennai’s North, Central, and South constituencies.
  • Includes streets/areas with varying property values from ₹1,100/sq ft to over ₹10,000/sq ft.

Data Dimensions:

  • Geographical (streets and areas)
  • Economical (guideline property values)
  • Political (vote shares of DMK and BJP)

Gaps and Relevance:

  • Essential Data: Guideline values, vote shares
  • Irrelevant Data: Detailed personal information of voters (not provided)

Visual Encoding:

Current Encoding: Scatter plots showing vote shares (DMK in red, BJP in blue) across streets with varying guideline values.

Problems Identified:

  • Cluttered labels may be hard to read.
  • Color choices might not be color-blind friendly.
  • Limited interaction to explore data points.

Original Graph 1:
Screenshot 2024-07-28 at 10 48 56 PM

Original Graph 2:
Screenshot 2024-07-28 at 10 48 45 PM

Proposed Improvements:

Redesign Strategy:

  • Simplify Labels: Use abbreviated names or hover tooltips to reduce clutter.
  • Color-Blind Friendly Palette: Use colors distinguishable by those with color vision deficiencies.
  • Interactive Elements: Implement interactive visualizations where users can hover over data points for detailed information.
  • Additional Datasets: Integrate demographic data to enrich the analysis.

Design Process Documentation:

- Step 1: Initial Analysis

  1. Review the original scatter plots.
  2. Note the distribution and spread of vote shares.

- Step 2: Simplifying Visualization

  1. Create a cleaner layout with concise labeling.
  2. Ensure the plot is easy to interpret at a glance.

- Step 3: Enhancing Accessibility

  1. Select a color palette accessible to color-blind individuals.
  2. Test different color schemes for effectiveness.

- Step 4: Adding Interactivity

  1. Utilize tools like Plotly or D3.js to add hover effects.
  2. Allow users to click on data points to reveal more details.

- Step 5: Including Additional Data

  1. Collect relevant demographic data.
  2. Use it to create multi-layered visualizations that provide deeper insights.

By following these steps, the redesigned visualization will maintain the original story's integrity while enhancing clarity and accessibility.
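
A small sketch of Steps 3 and 4, assuming the Okabe-Ito palette as one possible colour-blind friendly choice and a plain-text hover label. The `tooltip` helper, the street name, and the vote share are hypothetical; a charting library such as Plotly would receive these strings as hover text.

```python
# Two colours from the Okabe-Ito palette, which is widely recommended for
# colour-vision deficiencies. Assigning them to parties is an assumption.
PARTY_COLOURS = {"DMK": "#D55E00", "BJP": "#0072B2"}  # vermillion, blue

def tooltip(street, party, vote_share, guideline_value):
    """Build the text shown when hovering over one data point."""
    return f"{street} | {party}: {vote_share:.1f}% | guideline value Rs {guideline_value:,}/sq ft"

# Hypothetical data point, for illustration only.
point = {"street": "Example Street", "party": "DMK", "vote_share": 52.34, "guideline_value": 1100}
label = tooltip(**point)
print(PARTY_COLOURS[point["party"]], label)
```

Moving the street names into hover text like this would also reduce the label clutter identified in the original plots.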

Redesigned Graph 1:
Screenshot 2024-07-28 at 11 38 13 PM

Redesigned Graph 2:
Screenshot 2024-07-28 at 11 38 42 PM

Here are the redesigned scatter plots for the voting patterns in Nungambakkam and Kodambakkam, and Adyar and Guindy:

  1. Nungambakkam and Kodambakkam Voting Patterns
  2. Adyar and Guindy Voting Patterns

Key Takeaways:

  • Simplified Presentation: The original scatter plots were cluttered with overlapping labels, making them difficult to read. The redesigned plots use concise labels and annotations, enhancing clarity without losing essential information.
  • Color Accessibility: The use of red and blue in the redesigned plots ensures that the visualizations are accessible to those with color vision deficiencies, broadening the audience that can accurately interpret the data.
  • Enhanced Readability: A cleaner layout with clear grid lines and improved spacing allows for quick and easy interpretation of the data. This is crucial for making informed insights about voting behavior across different income areas.

Insights:

  • The visualizations reaffirm the original story's narrative: DMK's stronghold in lower-income areas and BJP's relatively higher performance in wealthier regions.
  • By presenting the data in a more accessible format, these redesigned charts can help policymakers, analysts, and the public better understand the socio-economic dimensions of voting patterns.

Future Improvements:

To further enhance the data story, integrating interactive elements and additional datasets, such as demographic information, can provide deeper insights. Interactive visualizations using tools like Plotly or D3.js can offer users the ability to explore data points in more detail, fostering a more engaging and informative experience.

Overall, the redesign maintains the original story's intent while significantly improving accessibility and readability, making the data more meaningful and actionable for a wider audience.

@miqbal07

miqbal07 commented Jul 28, 2024

Name - Iqbal Hossain
Roll no - 21f2000965

Title - Redesigning the Budget Allocation Story for Andhra Pradesh

Source link - https://www.thehindu.com/news/national/andhra-pradesh/centre-allocated-50475-crore-which-is-about-4-of-the-national-budget-to-andhra-pradesh-murugan/article68453074.ece

Key Points of the Original Story

- Significant Budget Allocation:

The Central Government has allocated approximately ₹50,475 crore to Andhra Pradesh for the fiscal year 2024-25, which is about 4% of the total national budget.

- Focus on Development Projects:

Major projects funded include the construction of the capital city Amaravati (₹15,000 crore) and the Polavaram project, indicating strategic priorities for the state's development.

- Support for Andhra Pradesh Post-Bifurcation:

The allocation is portrayed as essential support for Andhra Pradesh, which requires financial assistance due to challenges following the bifurcation from Telangana.

- Comparison with Other States:

Implicitly, the story emphasizes that Andhra Pradesh's allocation is significant compared to other states, highlighting the Central Government's focus on the state’s development needs.
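
As a quick worked calculation using only the two figures quoted above, the sketch below shows how large the Amaravati allocation is within the state's total. The variable names are made up for this example.

```python
allocation_crore = 50_475  # total allocation to Andhra Pradesh (from the key points above)
amaravati_crore = 15_000   # Amaravati capital city (from the key points above)

amaravati_share = amaravati_crore / allocation_crore * 100
remaining_crore = allocation_crore - amaravati_crore

print(f"Amaravati: {amaravati_share:.1f}% of the allocation")
print(f"All other items: Rs {remaining_crore:,} crore")
```

This kind of part-to-whole figure is what a project-wise chart would encode visually.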

Analysis of Original Data and Encoding

Details of the Data:

Type of Data:

Quantitative Data:
Budget allocations in crores for Andhra Pradesh and other states.
Specific allocations for key projects within Andhra Pradesh.
Categorical Data:
Names of states and projects.
Percentage Data:
Share of Andhra Pradesh’s allocation as a percentage of the national budget.

Extent of the Data:

The data includes budget figures for Andhra Pradesh and selected states, along with project-specific allocations within Andhra Pradesh.

Dimensions of the Data:

State Allocations: Budget amounts allocated to Andhra Pradesh, Karnataka, Tamil Nadu, Kerala, and Telangana.
Project Allocations: Breakdown of Andhra Pradesh’s allocation into key projects.

Gaps in the Data:

The story lacks a detailed breakdown of other states' allocations, limiting broader comparison.
Historical data or trends over previous years are not included, which would provide additional context.
Limited information on how these funds fit into the overall budgets of Andhra Pradesh or the other states.

Original Encoding Analysis:

Original Encoding:

Textual Presentation:
The data is primarily presented through descriptive text, with numerical values interspersed within paragraphs.
Numeric Values:
Figures for allocations and percentages are provided in text form, requiring readers to interpret and compare them manually.

Identifying Problems with Original Encoding

Problems with Original Encoding:

Lack of Visual Clarity:

The story does not use any visual aids or charts, which makes it difficult for readers to quickly understand and compare budget allocations.

Inefficient Comparison:

Text-based comparisons require readers to process multiple numbers mentally, making it hard to assess the relative importance of Andhra Pradesh’s allocation.

Information Overload:

The text-heavy format may overwhelm readers, making it challenging to extract key insights and priorities from the data.

No Visual Storytelling:

The absence of visual elements leads to a lack of narrative flow, which could guide readers through the data more effectively.

Proposed Improvements

Improvements:

Incorporation of Visual Aids:

Introduce visualizations such as bar charts and pie charts to represent the data, enhancing readability and comprehension.

Emphasis on Key Insights:

Use color coding and visual emphasis to highlight Andhra Pradesh’s budget allocation, making it stand out for easier comparison.

Comparative Analysis:

Include comparative data for other states, allowing readers to see Andhra Pradesh’s allocation in a broader context.

Focus on Project Allocation:

Visualize the distribution of funds among key projects within Andhra Pradesh, providing a clearer understanding of the state’s priorities.

Bar Chart: State Budget Allocations
Purpose: Visualize the allocation amounts to different states, emphasizing Andhra Pradesh's significant share.
image

Pie Chart: Project-wise Allocation in Andhra Pradesh
Purpose: Display the breakdown of budget allocation among major projects within Andhra Pradesh.

image

Bar Chart: Percentage of National Budget
Purpose: Highlight Andhra Pradesh's share of the national budget compared to other states.

image

Conclusion:
By incorporating these visualizations, the redesigned story enhances comprehension and engagement, making the budget allocations more accessible and meaningful to the audience. The visual elements address the original story's shortcomings by providing clarity, facilitating comparisons, and highlighting key insights, thereby improving the overall narrative.

@sujashaaa

Name: Sujasha S
Roll: 21f3001115
Hindu IGD Datapoint for review
Title: Wealth Tax-Financed Green Deal in Indian Budget 2024
Publisher: SHOUVIK CHAKRABORTY, ROHIT AZAD

Publisher’s Data Summary
India's new NDA government is preparing to present its Budget 2024 (now done), focusing on critical issues like unemployment, climate change and inequality. A key proposal is an Indian Green Deal (IGD), financed entirely by the introduction of a wealth tax and designed to address climate change, inequality, and joblessness. The wealthiest 10% of Indians, through their consumption of carbon-intensive goods, have significantly contributed to rising emissions and inequality.
The IGD aims to prioritize green energy, infrastructure, and the care economy (education and health), inspired by the Atmanirbhar package from 2020. It proposes an investment of 10% of GDP over ten years: 5% on infrastructure, 3% on the care economy, and 2% on green energy. This initiative could generate 38.6 million jobs, accounting for 8.2% of the labor force.
Funding the IGD would necessitate a wealth tax of around 1.7%, potentially decreasing to 1.3% by 2032 due to the anticipated increase in wealth among the Indian elite. India’s aim to be a bold leader in climate action will be visible with this strategy.
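
The figures in this summary can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the implied labour force is derived from them and is not stated in the article, and the variable names are made up for this example.

```python
# Proposed IGD investment as a percentage of GDP (figures from the summary above).
igd_share_of_gdp = {"infrastructure": 5.0, "care economy": 3.0, "green energy": 2.0}
total_share = sum(igd_share_of_gdp.values())

# Each component as a fraction of the whole package.
within_package = {k: v / total_share * 100 for k, v in igd_share_of_gdp.items()}

# 38.6 million jobs is said to be 8.2% of the labour force, which implies its size.
jobs_million = 38.6
share_of_labour_force_pct = 8.2
implied_labour_force_million = jobs_million / (share_of_labour_force_pct / 100)

print(f"Total: {total_share:.0f}% of GDP; split within package: {within_package}")
print(f"Implied labour force: about {implied_labour_force_million:.0f} million")
```

The three components add up to the stated 10% of GDP, and the jobs figure implies a labour force of roughly 471 million.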
Data & Charts
Carbon Emission Data (Chart 1):
Type: Per capita carbon emissions.
Extent: 3 decades (1990 to 2020). Comparative analysis of the top 10% of the Indian population versus an average Indian and a first-world citizen.
Dimensions: Demography (Indian elite (top 10%) vs. average Indian citizen vs. developed-country citizen)
Essential: Yes, to connect how wealth inequality boosts per capita emissions and harms the environment, which justifies the wealth tax that funds the IGD.

image

Expenditure Data (Chart 2 and 3 above):
Type: Expense ratio of the Indian elite vs. ordinary citizens across commodities and overall budget categories
Extent: Unclear, but presumably 3 decades (1990 to 2020)
Dimensions: Commodities, budget categories
Essential: Yes, to illustrate the higher expenses leading to higher consumption, thus driving more carbon emissions, to justify the wealth tax.
Irrelevant: It is not clear how elites consume compared with ordinary citizens in these categories.
Employment Creation Data (Chart 4 below changed to chart 5)
To present the expenditure and employment data (charts 3 and 4), the author uses two separate charts, which makes them harder to compare. The charts also lack units of measurement, which makes it difficult to follow what they show.
To address this, I propose a bubble chart that combines charts 3 and 4 into one, with the size of each bubble indicating the size of each category: the larger the bubble, the larger the expenditure and employment opportunity.

image
Projected Wealth and Tax Rate Data (Chart 6 and 7 below):
Type: Projected wealth increase forecast for Indian elite which becomes a target category to introduce declining wealth tax rate to fund IGD.
Extent: One decade (2023 to 2032)
Dimensions: None; the wealth and tax rate metrics are not broken out by any dimension
Essential: Yes, to introduce wealth tax from increasing wealth and financing green deal and projecting how much could be attained

image

@mnatasha1402

mnatasha1402 commented Jul 28, 2024

Name: Natasha Mittal
Roll no. 21f1005823
Analysis and Redesign of the Story: "Which topics are India’s researchers publishing papers on?"

Data source: https://www.thehindu.com/data/which-topics-are-indias-researchers-publishing-papers-on/article68410121.ece

Story Intent
The author aims to inform readers about the research topics that Indian researchers are focusing on, comparing these trends with those of other countries like the U.S. and China. The story highlights the predominant research areas, especially in the context of global trends over the last 20 years and specifically in the last five years.

Data Description

  1. Type of Data:

    • Publication counts by research topics
    • Comparative data between different countries (India, U.S., China)
    • Time periods (last 20 years, last 5 years)
  2. Extent of the Data:

    • Covers two time periods: last 20 years and last 5 years
    • Includes multiple research topics
    • Compares three countries
  3. Dimensions of the Data:

    • Time (years)
    • Topics (e.g., Coronavirus, deep learning, photocatalysis)
    • Publication counts (number of papers published)
    • Geographic (countries: India, U.S., China)
  4. Gaps in the Data:

    • Detailed sources of the data (e.g., specific databases or institutions)
    • Breakdown of publication counts by specific sub-topics within broader categories
    • Qualitative insights on research impact or citations

Visual Encoding

  1. Current Encoding:
    • Bar charts and line graphs are used to display publication counts by topics and comparisons between countries.

image

  1. Problems with Current Encoding:

    • Limited interactivity
    • Potential clutter with multiple topics and countries in a single chart
    • Lack of detailed explanations or annotations for the visual elements
  2. Areas for Improvement:

    • Enhance interactivity so that users can explore specific data points.
    • Simplify visualizations to reduce clutter and improve clarity.
    • Add detailed explanations and annotations to guide the reader through the visualizations.

2. Improved Visual Encoding

Simplified Charts:

  • Line Graphs for Trends: Use separate line graphs for different topics to prevent clutter.

image

  • Stacked Bar Charts: Show cumulative publication counts for better comparative analysis.

image
image

Final Thoughts

The redesign aims to improve user engagement, clarity, and depth of the original story by leveraging interactive visualizations and additional contextual data. This approach ensures that the main intent of showcasing research publication trends in India is maintained while providing a richer and more insightful user experience.

@pranam-pagi

pranam-pagi commented Jul 28, 2024

Name: Pranam Premanand Pagi
Roll No: 21f3002964

Original Article: MPs 27 times wealthier than an average urban household

Authors: Vignesh Radhakrishnan, Sambavi Parthasarathy

Story of article in view of authors

The article highlights the wealth disparity between Members of Parliament (MPs) in India and the average urban household. It points out that MPs are significantly wealthier, with the majority possessing assets far above the typical urban or rural household. This wealth concentration suggests that election candidacies are often limited to affluent individuals.

Data Used:

  • Type of Data: Quantitative data on asset values.
  • Extent: The data spans the asset values declared by MPs and candidates in the 2024 and 2019 general elections.
  • Dimensions:
    • Median asset values of winning and runner-up candidates in 2019 and 2024.
    • Asset values of candidates from different political parties.
    • Comparison of MP wealth with average urban and rural household assets.
    • Data from the All India Debt & Investment Survey (2019) regarding household wealth.

Gaps in the Data:

The data focuses primarily on wealth, without contextual information about income sources, liabilities, or the potential impact of these wealth disparities on electoral outcomes.

Chart 1 | The chart shows the median assets of winners and runners-up in 2019 and 2024.

Chart 1

Chart 2 | The chart shows the median assets of candidates of the major political parties in 2024.

Chart 2

Chart 3 | The chart shows the average value of household assets for different decile classes for rural and urban areas in 2019 (in ₹ 1000s).

Chart 3

Analysis of the Original Visualization and Design Considerations

Original Visualization:

The original story uses multiple charts to illustrate the wealth of MPs compared to urban households. These include:

  • Median asset values of winning and runner-up candidates in 2019 and 2024.
  • Asset values of candidates by political party.
  • Household asset values for different decile classes.

Problems Identified:

  • The charts may lack sufficient labeling or explanations, making them harder for readers to interpret quickly.
  • The use of median values, while useful for mitigating outliers, might not fully communicate the range of wealth among MPs.
  • The color scheme and design choices might not be optimized for clarity or accessibility.

Improvement Suggestions:

  • Enhance labeling and add more descriptive legends to make charts easier to understand.
  • Include a wider range of data representations, such as box plots or histograms, to show the distribution of assets more clearly.
  • Use a color scheme that is accessible to colorblind readers and provides better contrast.
  • Include narrative elements or annotations on the charts to emphasize key points or anomalies.

Redesigning the Story

Objectives:

  1. Clarify the wealth disparity between MPs and average households.
  2. Highlight the distribution of wealth among MPs and across different political parties.
  3. Provide context by comparing with household wealth in different deciles.

Steps in Redesign:

  1. Data Collection and Cleaning: Gather detailed data on MP assets, including minimum, maximum, and median values, and the distribution of wealth within parties.

  2. Visualization Selection:

    • Use a box plot to display the range and distribution of assets among MPs.
    • Utilize a bar chart to compare median assets across political parties.
    • Implement a line or area chart to show the trend in wealth disparity over time.
  3. Design and Layout:

    • Ensure charts are clearly labeled and include explanations for median vs. average values.
    • Use consistent and accessible color schemes.
    • Include annotations to highlight significant data points or exceptions.
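
As a rough sketch of the box plot and "median vs. average" steps above, the snippet below computes the five values a box plot draws and compares the mean with the median. The asset figures are made-up placeholders (in crore rupees), not data from the article, and `five_number_summary` is a hypothetical helper name.

```python
from statistics import mean, median, quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Hypothetical declared assets in crore rupees; NOT figures from the article.
sample_assets = [0.5, 1.2, 2.0, 3.5, 4.1, 6.8, 9.0, 15.0, 250.0]

summary = five_number_summary(sample_assets)
print("min, Q1, median, Q3, max:", summary)
print("mean:", round(mean(sample_assets), 2))
print("median:", median(sample_assets))
```

With one very wealthy candidate in the sample, the mean lands above the third quartile while the median stays near the middle of the group. This is the effect an annotation explaining median vs. average would need to describe.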

@Fashmina123

NAME: FASHMINA MOHAMED
ROLL NO.: 21f3003099

Link: https://www.thehindu.com/data/2024-polls-how-people-in-high-and-low-income-areas-voted-in-chennais-mylapore-t-nagar-and-other-areas/article68427083.ece

The story:
The article examines voting patterns in Chennai in the 2024 Lok Sabha election, focusing on how wealthier and lower-income areas voted. It suggests that the DMK has stronger support in lower-income areas, while the BJP tends to perform better in wealthier neighbourhoods.

Data Used:
Type: Vote percentages from polling stations across different areas in Chennai, categorised by the guideline value of properties in those areas. The guideline value acts as a proxy for the wealth of the residents.
Data Extent: It covers three Lok Sabha constituencies in Chennai: North, Central and South.
Dimensions: The geographical areas and the vote shares of the DMK and BJP in each area.
Data Gaps: The data covers only these two parties and omits all other political parties.

Original Encoding:
The visualisation includes a dot plot to represent the vote shares of the parties across different areas.
vertical axis - streets ordered by wealth (descending)
horizontal axis - vote share percentage
red dots - DMK
blue dots - BJP

image:
image

Problems in the original encoding:
The colour choices are standard but could be more distinct, especially for colour-blind viewers.
The plot could separate the income groups more clearly.
The annotations could be more comprehensive to aid understanding.

Redesign Proposal:
Use distinct shapes or party logos to mark each party's data points.
Introduce background shading to visually distinguish high-, medium- and low-income areas.
Add more annotations and a legend to explain the guideline values and income levels.
Create an interactive version that shows detailed vote share percentages and other relevant data.
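
The background shading idea could start from a simple grouping of areas into bands by guideline value. This is a minimal sketch that assumes equal-sized groups by rank. The area names and guideline values are placeholders, not data from the article, and `assign_income_bands` is a hypothetical helper.

```python
def assign_income_bands(guideline_values, labels=("low", "medium", "high")):
    """Map each area to a band by the rank of its guideline value.

    Areas are sorted from lowest to highest value and split into
    roughly equal-sized groups, one per label.
    """
    ranked = sorted(guideline_values, key=guideline_values.get)
    n = len(ranked)
    bands = {}
    for position, area in enumerate(ranked):
        band_index = min(position * len(labels) // n, len(labels) - 1)
        bands[area] = labels[band_index]
    return bands


# Hypothetical guideline values; placeholder area names.
example_values = {
    "Area A": 3000,
    "Area B": 4500,
    "Area C": 7000,
    "Area D": 9500,
    "Area E": 14000,
    "Area F": 21000,
}

bands = assign_income_bands(example_values)
for area, band in bands.items():
    print(area, band)
```

Each band can then be drawn as one shaded strip behind the dots. Fixed cut-offs on the guideline value would work equally well if the article's own thresholds are known.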

The redesign aims to make the data more accessible and easier to understand.

@DHIBIN-VIKASH

Name: Dhibin Vikash K P
Roll No: 21f3001664

Article Title: Sikar, Namakkal, Kota: Select “coaching hubs” are host to many high scoring NEET-UG-2024 candidates
Main Story:
The article highlights the cities with the highest share of candidates scoring 650+ and 700+ in NEET-UG 2024. Key cities include Sikar, Namakkal, Kottayam, Tanuku, Jhunjhunu, and Kurukshetra. Sikar leads with the highest percentages, while Namakkal stands out in Tamil Nadu due to its coaching institutes. Specific centers in these cities have high averages, and some centers show anomalies.

Data Used:

Type of Data: Quantitative data on student scores from the NEET UG 2024 exams.
Extent of Data: The dataset includes scores from all candidates who appeared for NEET UG 2024, focusing on those scoring 650 and above.
Dimensions of the Data: The data includes candidate scores, cities, states, and specific educational centers.
Gaps in the Data: The article does not provide detailed demographic information or historical comparison data.
Relevance: Essential data includes candidate scores and their respective cities and centers. Irrelevant data might include unrelated demographic details not covered in the story.

Current Visual Encoding:

Chart 1: A scatter chart displaying the percentage of students scoring above 650 marks across different cities.
image

Table 2: A table listing the top centers with the highest share of candidates scoring above 650 marks.
image

Problems with Current Encoding:

Scatter Chart:
Cluttered Data Points: The scatter chart is disorganized, making it difficult to interpret what it is meant to convey.
Hovering over a point shows the percentage value, which seems redundant.
Color Gradient: The color gradient from 0 to 7.48% might not be intuitive for quick interpretation.
Table:
Limited Information: The table lists the top ten centers but does not provide additional context or comparisons.
Lack of Visual Appeal: The table could be clearer and would benefit from visual enhancements for better readability.

Redesigning the Visualization

Improvement Plan:

Create clearer, more intuitive charts that highlight key insights without overwhelming the viewer.
Use Effective Visual Elements: Use bar charts, heat maps, and annotated visualizations to emphasize the difference in marks scored across states and to show the district-wise breakdown.

Improve on the existing tabular format with interactive charts that show each exam center and its percentage of candidates scoring above 650.

Redesigned Visualizations:

Bar Chart: Displaying the top cities with the highest share of students scoring above 650 marks.
Heat Map: Showing the concentration of high scores across different states.
Annotated Visuals: Highlighting the top-performing centers and cities.

Data on candidates' NEET scores was taken from the official websites, and the charts below were prepared from it.

Redesigned Heat Map for state-wise distribution of scores >650:

image

Redesigned bar chart showing the centers along with the percentage of students who scored above 650:

image

Documentation

Original Story:

Link to the original story: https://www.thehindu.com/data/neet-ug-2024-data-reveals-top-cities-for-high-scoring-candidates-crucial-for-government-medical-college-admissions/article68441411.ece

Redesign Documentation:

Map Visualization:
Encoding: Geographical distribution of high-scoring candidates.
Color Gradient: Represents the percentage of candidates scoring above 650.

Bar chart:
Each bar represents the percentage of students who scored above 650 at one of the top 5 centers in India. The length of each bar encodes that percentage, and the exact numbers are shown alongside for reference.
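
The values behind such a bar chart could be derived roughly as follows. This is a sketch under assumptions: the input is taken to be (center, score) pairs, the cut-off is treated as strictly above 650, and the center names and scores are placeholders, not actual NEET data. Both function names are hypothetical.

```python
from collections import defaultdict


def share_above_threshold(records, threshold=650):
    """Return {center: percentage of its candidates scoring above threshold}."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for center, score in records:
        totals[center] += 1
        # Assumption: strictly above the cut-off; use >= if 650 itself counts.
        if score > threshold:
            high[center] += 1
    return {center: 100.0 * high[center] / totals[center] for center in totals}


def top_centers(shares, k=5):
    """Return the k centers with the highest share, largest first."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:k]


# Placeholder records; NOT actual NEET-UG 2024 results.
records = [
    ("Center X", 700),
    ("Center X", 640),
    ("Center Y", 655),
    ("Center Y", 660),
    ("Center Z", 500),
]

shares = share_above_threshold(records)
for center, pct in top_centers(shares, k=2):
    print(f"{center}: {pct:.1f}%")
```

Showing the candidate count per center next to each bar would help readers judge whether a high percentage comes from a very small center.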
