Here’s Your Guide to IBM’s “Data Visualization with Python” Final Assignment (or How to Code a Dashboard using Plotly Dash)

This article walks you through the final assignment of Data Visualization with Python, a course created by IBM and offered on Coursera. That said, this tutorial is for anyone, enrolled in the course or not, who wants to learn how to code an interactive dashboard in Python using Plotly’s Dash library. I assume, however, that you already know how to use Pandas for analysis.

I want you to learn

If you just copy the code in this article without trying to understand it, you miss the whole point of learning data visualization (or data science in general). Typing the code yourself makes you more aware of its syntax, logic, and structure. Also, learning by doing nudges you to ask relevant questions and search for their answers. This process helps you learn better.

What is the final assignment?

You will play the role of a data analyst tasked with monitoring and reporting on US domestic airline flights. Your goal is to analyze the performance of reporting airlines to improve flight reliability, thereby enhancing customer reliability. Your dashboard should have a dropdown for the type of report with two options:

  • Yearly Airline Performance Report, and
  • Yearly Airline Delay Report.

It should also have another dropdown that selects which year (from 2005 to 2020) to report. The two pictures below (corresponding to the two report types) show my dashboard for 2009. The kinds of graphs for each report are specified by the assignment.

Where will I code?

I will code the Dash application in my local Jupyter Notebook. The course, however, invites students to use IBM’s Skills Network Labs Cloud IDE (which uses Theia Cloud IDE). Unfortunately, all of your work in this environment is lost when you close your session, get timed out due to inactivity, or log off.

Fortunately, the final assignment is graded only on the screenshots of your dashboard (given a report type and a year), no matter how or where you coded it. Thus, I am going to use Jupyter Notebook, which works nicely with Dash 2.11 and later versions.

Note: My code does not follow every one of the instructor’s code hints. You might also see that some of my graph titles and layout customizations differ from the course’s expectations. What matters is that we can produce the desired dashboard. In the real world, most stakeholders do not care about the behind-the-scenes code. What they expect is the output.

Basic components of a Dash app

  1. Layout = This is the structure of your dashboard. The code for this defines what your dashboard will look like.
  2. Callback Function = This is the function called by Dash whenever an input component’s property changes. This makes the dashboard interactive.

Necessary installations

In case you have not installed them yet:

pip install pandas
pip install numpy
pip install plotly
pip install dash
# dash-core-components and dash-html-components ship inside Dash 2.x (as dash.dcc and dash.html), so they need no separate install

Where does the data come from?

The dataset (in .csv format) is provided by IBM and it contains real data on US Domestic Flights from 1987 to 2020, taken from the US Bureau of Transportation Statistics. You can download the dataset from here. To know what each column means, refer to this link.

Let’s start coding!

Importing libraries

import pandas as pd
import numpy as np

import plotly.express as px

from dash import Dash, html, dcc, Input, Output, callback

Preparing our data

url = 'https://raw.githubusercontent.com/marvin-rubia/US-Airlines-Analytics-Dashboard/main/airline_data.csv'

df = pd.read_csv(url)

# Check our dataframe
df.info()

The output for this is:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27000 entries, 0 to 26999
Columns: 110 entries, Unnamed: 0 to Div5TailNum
dtypes: float64(74), int64(19), object(17)
memory usage: 22.7+ MB

Getting only 2005–2020 data (as per the instruction)

# Get only 2005-2020 data, as per course instruction

condition = (df['Year'] >= 2005) & (df['Year'] <= 2020)

airline_data = df[condition]

airline_data.head()

The output for the previous code shows the first 5 rows of the 2005–2020 dataframe.
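
If the boolean condition looks unfamiliar, the sketch below applies the same filter to a tiny made-up dataframe (the toy values are mine, not from the dataset). It also shows Series.between(), which I did not use above but which gives the same mask because it is inclusive on both ends by default.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are made up)
toy = pd.DataFrame({'Year': [1999, 2005, 2012, 2020],
                    'Flights': [1.0, 1.0, 1.0, 1.0]})

# Same idea as the condition above: combine two boolean Series with &
mask = (toy['Year'] >= 2005) & (toy['Year'] <= 2020)

# Alternative spelling: between() includes both endpoints by default
same_mask = toy['Year'].between(2005, 2020)

subset = toy[mask]
print(subset['Year'].tolist())
```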

The airline_data is now our dataframe moving forward. We are going to plot 5 graphs for each type of report. In other words, we need to have a total of 10 subsets of our airline_data.

Function to get necessary data for performance report (Type 1)

def compute_data_1(airline_data):
    ## Get different sets of data

    # For plot1
    tree_data = airline_data.groupby(['DestState', 'Reporting_Airline'])['Flights'].sum().reset_index()

    # For plot2
    condition = airline_data['DivAirportLandings'] != 0.0
    div_data = airline_data[condition]

    # For plot3
    map_data = airline_data.groupby('OriginState')['Flights'].sum().reset_index()

    # For plot4
    bar_data = airline_data.groupby(['Month','CancellationCode'])['Flights'].sum().reset_index()

    # For plot5
    line_data = airline_data.groupby(['Month','Reporting_Airline'])['AirTime'].mean().reset_index()

    return tree_data, div_data, map_data, bar_data, line_data
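
To see what one of these groupby lines produces, here is a small sketch on made-up rows (the toy dataframe is an assumption for illustration). The sum() step returns a Series indexed by the grouping columns, and reset_index() turns it back into a flat dataframe that Plotly Express can read by column name.

```python
import pandas as pd

# Four made-up flights: three to CA (two by AA, one by DL) and one to TX
toy = pd.DataFrame({
    'DestState': ['CA', 'CA', 'CA', 'TX'],
    'Reporting_Airline': ['AA', 'AA', 'DL', 'AA'],
    'Flights': [1.0, 1.0, 1.0, 1.0],
})

# Total flights per (state, airline) pair, indexed by both columns
grouped = toy.groupby(['DestState', 'Reporting_Airline'])['Flights'].sum()

# Move the group keys back into ordinary columns
tree_data = grouped.reset_index()
print(tree_data)
```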

Function to get necessary data for delay report (Type 2)

def compute_data_2(airline_data):
    ## Compute delay averages

    # For plot1
    avg_car = airline_data.groupby(['Month','Reporting_Airline'])['CarrierDelay'].mean().reset_index()

    # For plot2
    avg_weather = airline_data.groupby(['Month','Reporting_Airline'])['WeatherDelay'].mean().reset_index()

    # For plot3
    avg_NAS = airline_data.groupby(['Month','Reporting_Airline'])['NASDelay'].mean().reset_index()

    # For plot4
    avg_sec = airline_data.groupby(['Month','Reporting_Airline'])['SecurityDelay'].mean().reset_index()

    # For plot5
    avg_late = airline_data.groupby(['Month','Reporting_Airline'])['LateAircraftDelay'].mean().reset_index()

    return avg_car, avg_weather, avg_NAS, avg_sec, avg_late
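
Since the five lines in compute_data_2 differ only in the column name, the same result could also be built in a loop. This is just a sketch of that alternative, not what the course asks for; average_delays and DELAY_COLUMNS are hypothetical names, and the toy data is made up.

```python
import pandas as pd

DELAY_COLUMNS = ('CarrierDelay', 'WeatherDelay', 'NASDelay',
                 'SecurityDelay', 'LateAircraftDelay')

def average_delays(data, delay_columns=DELAY_COLUMNS):
    """Return one monthly-average dataframe per delay column, in the given order."""
    keys = ['Month', 'Reporting_Airline']
    return [data.groupby(keys)[col].mean().reset_index() for col in delay_columns]

# Made-up rows with only two of the delay columns
toy = pd.DataFrame({
    'Month': [1, 1, 2],
    'Reporting_Airline': ['AA', 'AA', 'AA'],
    'CarrierDelay': [10.0, 20.0, 5.0],
    'WeatherDelay': [0.0, 4.0, None],
})

avg_car, avg_weather = average_delays(toy, ['CarrierDelay', 'WeatherDelay'])
print(avg_car)
print(avg_weather)
```

Note that mean() skips missing values, so a month in which every row is missing a delay value ends up as NaN rather than 0.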

As I said in the introduction, I assume you already know how to manipulate dataframes using Pandas. Still, you can get a sense of each sub-dataframe by running these lines one by one.

Coding the dashboard’s layout

# Create the Dash app
app = Dash(__name__)

# Create the app layout
app.layout = html.Div(style={'backgroundColor': 'lightblue', 'border': 'ridge', 'padding': '50px'}, children=[
                html.H1('US Domestic Airline Flights Performance [2005-2020]',
                       style={'textAlign': 'center', 'color': '#00008B',
                             'font-size': 36}),

                # Dropdown creation for report_type
                html.Div(style={'display': 'flex'},
                         children=[
                            html.H2('Choose Type of Report:', style={'white-space':'nowrap', 'font-size': 20}),
                            dcc.Dropdown(
                                 options=[{'label': 'Yearly Airline Performance Report', 'value':'Type1'},
                                          {'label': 'Yearly Airline Delay Report', 'value': 'Type2'}],
                                 id='report_type_input',
                                 value='Type1',
                                 placeholder='Choose a report type.',
                                 style={'textAlign': 'center', 'width':'100%', 'padding':'2px', 'font-size': 26})
                                  ]
                         ),

                # Dropdown creation for year of interest
                html.Div(style={'display': 'flex'},
                         children=[
                            html.H2('Choose Year:', style={'white-space':'nowrap', 'font-size': 20}),
                            dcc.Dropdown(options=list(range(2005,2021)),
                                 id='year_input',
                                 value=2020,  # an int, matching the integer options above
                                 placeholder='Choose a year.',
                                 style={'textAlign': 'center', 'width':'100%', 'padding':'2px', 'font-size': 26})
                                 ]
                        ),

                # Area for the graphs
                html.Div(dcc.Graph(id='plot1'), style={'border': 'ridge', 'padding': '10px',
                                                       'backgroundColor': '#00008B'}),

                html.Div(style={'display': 'flex', 'justify-content': 'center'},
                         children=[
                            html.Div(dcc.Graph(id='plot2'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'}),
                            html.Div(dcc.Graph(id='plot3'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'})
                            ]
                         ),

                html.Div(style={'display': 'flex', 'justify-content': 'center'},
                         children=[
                            html.Div(dcc.Graph(id='plot4'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'},),
                            html.Div(dcc.Graph(id='plot5'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'})
                            ]
                         )
]
)

'''
You are going to put the callback function code here later!
'''

# Run the app and open it in a new tab
if __name__ == '__main__':
    app.run(jupyter_mode='tab')

Read the comments to understand the purpose of each chunk of code above. The html module (formerly the separate dash-html-components package) lets us write HTML-like code in Python. html.Div() is a division in our dashboard. It tells Dash to allot a space for whatever we put inside that division. Below the heading html.H1 are 5 layers of divisions. When you run the previous code, you’ll get something like this:

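If you count the html.Div() calls in the layout code, there are actually 10 in total: the outer container, the 5 layers, and 4 nested graph divisions. Counting the heading as the 1st layer, the 2nd, 3rd, 5th, and 6th layers use style={'display': 'flex'} to place their two children side by side: a label and a dropdown in the 2nd and 3rd, and two graph divisions in the 5th and 6th.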

In our layout, the inputs (report types and year) come from two dcc.Dropdown() components represented by their id parameters. Meanwhile, the id inside each dcc.Graph() component represents each graph, which is our output. The logic is this: the output graphs should respond to the change in input values as instructed by the callback function.

To learn more about Dash Dropdowns, refer to this documentation.
To learn more about Dash Graphs, refer to this documentation.

Coding the dashboard’s callback decorator and function

We are going to add the callback function after our app.layout codes and before running the app. The following shows the callback decorator and the callback function.

# Create callback decorator
@app.callback([Output(component_id='plot1', component_property='figure'),
           Output(component_id='plot2', component_property='figure'),
           Output(component_id='plot3', component_property='figure'),
           Output(component_id='plot4', component_property='figure'),
           Output(component_id='plot5', component_property='figure')],
           [Input(component_id='report_type_input', component_property='value'),
           Input(component_id='year_input', component_property='value')]
           )

# Create callback function
def get_graphs(report_type, year):
    condition = airline_data['Year'] == int(year)
    data = airline_data[condition]

    # If report type 1 is chosen:
    if report_type == 'Type1':
        # Get plotting data
        tree_data, div_data, map_data, bar_data, line_data = compute_data_1(data)

        # Tree map
        tree_fig = px.treemap(tree_data, path=[px.Constant('United States'), 'DestState', 'Reporting_Airline'], values='Flights',
                              color='Flights', color_continuous_scale='RdBu', title='Flights by Airlines Going to a Destination State')
        tree_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Pie graph
        pie_fig = px.pie(div_data, values='Flights', names='Reporting_Airline', title='% of Diverted Landings by Airline')
        pie_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Choropleth map
        map_fig = px.choropleth(map_data,
                                locations='OriginState',
                                color='Flights',
                                hover_data=['OriginState', 'Flights'],
                                locationmode = 'USA-states', # Set to plot as US States
                                color_continuous_scale='GnBu',
                                range_color=[0, map_data['Flights'].max()])

        map_fig.update_layout(title_text = 'Number of Flights from Origin State', geo_scope='usa', # Plot only the USA instead of the globe
                             title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Bar graph
        bar_fig = px.bar(bar_data, x='Month', y='Flights', color='CancellationCode', title='Monthly Flight Cancellation')
        bar_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Line graph
        line_fig = px.line(line_data, x='Month', y='AirTime', color='Reporting_Airline', title='Monthly Airtime Per Airline')
        line_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        return [tree_fig, pie_fig, map_fig, bar_fig, line_fig]

    # If report type 2 is chosen:
    elif report_type == 'Type2':
        avg_car, avg_weather, avg_NAS, avg_sec, avg_late = compute_data_2(data)

        # Create line graphs
        carrier_fig = px.line(avg_car, x='Month', y='CarrierDelay', color='Reporting_Airline', title='Average Carrier Delay Time (minutes) by Airline')
        carrier_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        weather_fig = px.line(avg_weather, x='Month', y='WeatherDelay', color='Reporting_Airline', title='Average Weather Delay Time (minutes) by Airline')
        weather_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        nas_fig = px.line(avg_NAS, x='Month', y='NASDelay', color='Reporting_Airline', title='Average NAS Delay Time (minutes) by Airline')
        nas_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        sec_fig = px.line(avg_sec, x='Month', y='SecurityDelay', color='Reporting_Airline', title='Average Security Delay Time (minutes) by Airline')
        sec_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        late_fig = px.line(avg_late, x='Month', y='LateAircraftDelay', color='Reporting_Airline', title='Average Late Aircraft Delay Time (minutes) by Airline')
        late_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        return [carrier_fig, weather_fig, nas_fig, sec_fig, late_fig]

The callback decorator tells Dash that we have 5 outputs and 2 inputs, each identified by the id we assigned in app.layout.

Contrary to the assignment’s instructions, I did not use State() objects in the callback decorator. State() is similar to Input(), except that changing a State() value does not fire the callback function; the value is only read when some Input() triggers the callback. (To read more about it, click here.) I believe it is not necessary for our dashboard. If I update the report type dropdown, I am okay with seeing the graphs change immediately even if I have not yet updated the year dropdown.

The first two lines of our callback function get_graphs(report_type, year) select the data from our airline_data dataframe with the given year input.

Our callback function returns either the list [tree_fig, pie_fig, map_fig, bar_fig, line_fig] if Yearly Airline Performance Report is chosen or the list [carrier_fig, weather_fig, nas_fig, sec_fig, late_fig] if Yearly Airline Delay Report is chosen. The order within either list matters. The order corresponds to the order of the Output(component_id) in the callback decorator.

The rest of the code is Plotly Express syntax for creating graphs, which I assume you have learned from the course modules (or from the Plotly documentation) prior to the final assignment.

Full Code (from creating the Dash app to running it)

# Create the Dash app
app = Dash(__name__)

# Create the app layout
app.layout = html.Div(style={'backgroundColor': 'lightblue', 'border': 'ridge', 'padding': '50px'}, children=[
                html.H1('US Domestic Airline Flights Performance [2005-2020]',
                       style={'textAlign': 'center', 'color': '#00008B',
                             'font-size': 36}),

                # Dropdown creation for report_type
                html.Div(style={'display': 'flex'},
                         children=[
                            html.H2('Choose Type of Report:', style={'white-space':'nowrap', 'font-size': 20}),
                            dcc.Dropdown(
                                 options=[{'label': 'Yearly Airline Performance Report', 'value':'Type1'},
                                          {'label': 'Yearly Airline Delay Report', 'value': 'Type2'}],
                                 id='report_type_input',
                                 value='Type1',
                                 placeholder='Choose a report type.',
                                 style={'textAlign': 'center', 'width':'100%', 'padding':'2px', 'font-size': 26})
                                  ]
                         ),

                # Dropdown creation for year of interest
                html.Div(style={'display': 'flex'},
                         children=[
                            html.H2('Choose Year:', style={'white-space':'nowrap', 'font-size': 20}),
                            dcc.Dropdown(options=list(range(2005,2021)),
                                 id='year_input',
                                 value=2020,  # an int, matching the integer options above
                                 placeholder='Choose a year.',
                                 style={'textAlign': 'center', 'width':'100%', 'padding':'2px', 'font-size': 26})
                                 ]
                        ),

                # Area for the graphs
                html.Div(dcc.Graph(id='plot1'), style={'border': 'ridge', 'padding': '10px',
                                                       'backgroundColor': '#00008B'}),

                html.Div(style={'display': 'flex', 'justify-content': 'center'},
                         children=[
                            html.Div(dcc.Graph(id='plot2'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'}),
                            html.Div(dcc.Graph(id='plot3'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'})
                            ]
                         ),

                html.Div(style={'display': 'flex', 'justify-content': 'center'},
                         children=[
                            html.Div(dcc.Graph(id='plot4'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'},),
                            html.Div(dcc.Graph(id='plot5'), style={'border': 'ridge', 'padding': '10px', 'backgroundColor': '#00008B'})
                            ]
                         )
]
)

# Create callback decorator
@app.callback([Output(component_id='plot1', component_property='figure'),
           Output(component_id='plot2', component_property='figure'),
           Output(component_id='plot3', component_property='figure'),
           Output(component_id='plot4', component_property='figure'),
           Output(component_id='plot5', component_property='figure')],
           [Input(component_id='report_type_input', component_property='value'),
           Input(component_id='year_input', component_property='value')]
           )

# Create callback function
def get_graphs(report_type, year):
    condition = airline_data['Year'] == int(year)
    data = airline_data[condition]

    # If report type 1 is chosen:
    if report_type == 'Type1':
        # Get plotting data
        tree_data, div_data, map_data, bar_data, line_data = compute_data_1(data)

        # Tree map
        tree_fig = px.treemap(tree_data, path=[px.Constant('United States'), 'DestState', 'Reporting_Airline'], values='Flights',
                              color='Flights', color_continuous_scale='RdBu', title='Flights by Airlines Going to a Destination State')
        tree_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Pie graph
        pie_fig = px.pie(div_data, values='Flights', names='Reporting_Airline', title='% of Diverted Landings by Airline')
        pie_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Choropleth map
        map_fig = px.choropleth(map_data,
                                locations='OriginState',
                                color='Flights',
                                hover_data=['OriginState', 'Flights'],
                                locationmode = 'USA-states', # Set to plot as US States
                                color_continuous_scale='GnBu',
                                range_color=[0, map_data['Flights'].max()])

        map_fig.update_layout(title_text = 'Number of Flights from Origin State', geo_scope='usa', # Plot only the USA instead of the globe
                             title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Bar graph
        bar_fig = px.bar(bar_data, x='Month', y='Flights', color='CancellationCode', title='Monthly Flight Cancellation')
        bar_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        # Line graph
        line_fig = px.line(line_data, x='Month', y='AirTime', color='Reporting_Airline', title='Monthly Airtime Per Airline')
        line_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        return [tree_fig, pie_fig, map_fig, bar_fig, line_fig]

    # If report type 2 is chosen:
    elif report_type == 'Type2':
        avg_car, avg_weather, avg_NAS, avg_sec, avg_late = compute_data_2(data)

        # Create line graphs
        carrier_fig = px.line(avg_car, x='Month', y='CarrierDelay', color='Reporting_Airline', title='Average Carrier Delay Time (minutes) by Airline')
        carrier_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        weather_fig = px.line(avg_weather, x='Month', y='WeatherDelay', color='Reporting_Airline', title='Average Weather Delay Time (minutes) by Airline')
        weather_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        nas_fig = px.line(avg_NAS, x='Month', y='NASDelay', color='Reporting_Airline', title='Average NAS Delay Time (minutes) by Airline')
        nas_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        sec_fig = px.line(avg_sec, x='Month', y='SecurityDelay', color='Reporting_Airline', title='Average Security Delay Time (minutes) by Airline')
        sec_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        late_fig = px.line(avg_late, x='Month', y='LateAircraftDelay', color='Reporting_Airline', title='Average Late Aircraft Delay Time (minutes) by Airline')
        late_fig.update_layout(title_x=0.5, font=dict(size=18), title_font_family='Bookman Old Style')

        return [carrier_fig, weather_fig, nas_fig, sec_fig, late_fig]

# Run the app and open it in a new tab
if __name__ == '__main__':
    app.run(jupyter_mode='tab')
    # The jupyter_mode parameter only applies within a Jupyter environment

You might notice that my last line above differs from the syntax the instructor teaches for running a Dash app. That’s because I coded it in Jupyter Notebook, and app.run(jupyter_mode='tab') automatically opens our dashboard in a new tab (instead of inline with the Notebook cell). This allows us to have a fullscreen view of our dashboard.

How will you be graded?

If you are enrolled in this course, you will be instructed to submit a screenshot of your dashboard for a given report type and year.

The primary purpose of this article is to help you learn how to code a dashboard in Python from its backend. The course certificate is just secondary. That said, thank you for reading this guide, and congratulations in advance on claiming your IBM certificate!

Update: I have now modified my dashboard by customizing the transparency of Plotly charts and adding a background image. It looks better. Check it out!

If you found value in this post, kindly click the clap button (many times?) and share this with your friends. You can also follow me on Medium.com for more articles on critical thinking, data analytics, machine learning, science and society, and more. Want to connect with me on LinkedIn? Find me here.
