Exploring data visually is a powerful way to generate ideas. Instead of creating an endless number of charts to discover what is in the data, we can create interactive visual outputs by combining ipywidgets with seaborn.
For this article, the code will be run in Jupyter Notebook and I will use the IBM HR Analytics Employee Attrition & Performance dataset from Kaggle.
First, we have to install ipywidgets using pip.
To create interactive visualizations we need something that connects an input control to a function that draws the chart, for example interact: it generates UI controls and creates a bridge between a function and those controls. Through this bridge, whenever the control's value changes, the function is called again with the new input. Here is a basic example:
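A minimal sketch of that basic example, assuming ipywidgets is installed and the code runs in a notebook cell (the trailing semicolon only keeps the notebook from echoing the returned function):

```python
from ipywidgets import interact


def f(x):
    # Return the square of the number currently selected on the slider
    return x * x


# An integer default (10) tells interact to build an IntSlider for x;
# every time the slider moves, f is called again and its result is shown.
interact(f, x=10);
```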
The function f returns the square of its input, and because the value passed for x is numeric, interact creates a slider for it. Each time the slider moves, f is called and its return value appears on the screen.
If we give x a boolean or a string instead, the input widget becomes a checkbox or a text box, respectively.
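The same idea with a boolean and a string default; echo is a hypothetical helper that simply returns its input, so whatever the control holds is displayed:

```python
from ipywidgets import interact


def echo(x):
    # Hand back the current widget value unchanged
    return x


interact(echo, x=True);   # a boolean default renders a Checkbox
interact(echo, x="Hi!");  # a string default renders a Text box
```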
We can have multiple input variables as well.
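A sketch with two inputs (describe is a hypothetical function for illustration). interact builds one control per keyword argument, here a checkbox for the boolean and a float slider for the float:

```python
from ipywidgets import interact


def describe(flag, number):
    # Each parameter gets its own control, inferred from the value passed to interact
    return f"flag={flag}, number={number}"


interact(describe, flag=True, number=1.0);
```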
Of course, the widgets can also be configured explicitly, which gives us more control over the inputs (range, step size, labels and so on).
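One way to configure the input, sketched below, is to pass a widget object instead of a plain default value. The range, step and label are arbitrary choices for illustration; a tuple such as (0, 50, 5) is the shorter abbreviation for min, max and step:

```python
import ipywidgets as widgets
from ipywidgets import interact


def f(x):
    return x * x


# An explicit widget gives full control over range, step and label.
slider = widgets.IntSlider(value=5, min=0, max=50, step=5, description="Number")
interact(f, x=slider);
# Shorthand with the same range and step: interact(f, x=(0, 50, 5))
```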
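One more tool is really useful for our purposes: interactive_output lets us arrange multiple input widgets ourselves, which makes for a more user-friendly interface. It returns an Output widget holding the function's output, so the controls and the result can be laid out independently.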
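A minimal sketch of interactive_output, assuming three integer sliders and a hypothetical summarize function. The sliders sit side by side in an HBox of our choosing, and the Output widget returned by interactive_output is displayed separately:

```python
import ipywidgets as widgets
from IPython.display import display


def summarize(a, b, c):
    # interactive_output shows whatever the function prints into its Output widget
    print(f"a + b + c = {a + b + c}")


a = widgets.IntSlider(value=1, description="a")
b = widgets.IntSlider(value=2, description="b")
c = widgets.IntSlider(value=3, description="c")

# Horizontal layout instead of interact's default vertical stack
ui = widgets.HBox([a, b, c])
# Map each parameter name of summarize to the widget that feeds it
out = widgets.interactive_output(summarize, {"a": a, "b": b, "c": c})

display(ui, out)
```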
ipywidgets has many more tools to work with; take a look at the documentation to find out more.
Let's see how we can create some visualizations of our data. Seaborn makes it easy to create charts from DataFrames, so I will use it for this demo; keep in mind, however, that any visualization library can be used.
Counting and examining the distribution of categorical data can be done with countplot: we only have to pass one categorical variable (the x-axis), and seaborn does the counting and puts the counts on the y-axis.
Note the “hue” parameter: with it you can split the counted categories further along another dimension, in our case showing the distribution of female and male employees by attrition.
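A sketch of that countplot. Because the Kaggle file is not bundled here, the DataFrame is a hypothetical random stand-in that only borrows the IBM dataset's column names; with the real data you would load the downloaded CSV with pd.read_csv instead:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the IBM HR data: random values, real column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
})

# x: the categorical column to count; seaborn computes the bar heights itself.
# hue: splits every bar by a second categorical column (here Gender).
ax = sns.countplot(data=df, x="Attrition", hue="Gender")
plt.show()
```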
One way of looking at relationships between numeric values is relplot: by default it draws a scatter plot with one point for each value pair, which, with enough data, may reveal relationships between the two columns.
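A sketch of such a relplot on a hypothetical random stand-in for the Age and MonthlyIncome columns (swap in the real DataFrame loaded from the Kaggle CSV):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data; only the column names come from the IBM dataset
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 61, n),
    "MonthlyIncome": rng.integers(1_000, 20_000, n),
})

# relplot defaults to kind="scatter": one point per (Age, MonthlyIncome) pair.
# It returns a FacetGrid; with a single facet, g.ax is the only Axes.
g = sns.relplot(data=df, x="Age", y="MonthlyIncome")
plt.show()
```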
A boxplot can be used to visualize some of the statistical features of numeric values, such as the median, the quartiles, the spread covered by the whiskers, and outliers, as in the boxplot below showing monthly income per educational field.
Note the “figsize” parameter set on the first line: making the plot a bit wider often helps readability.
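A sketch of that boxplot with a wider figure. The stand-in data is random and hypothetical, and (14, 5) is an arbitrary size chosen for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "EducationField": rng.choice(
        ["Life Sciences", "Medical", "Marketing",
         "Technical Degree", "Human Resources", "Other"], n),
    "MonthlyIncome": rng.integers(1_000, 20_000, n),
})

# A wider-than-default figure leaves room for the six category labels
plt.figure(figsize=(14, 5))
# Box: quartiles; middle line: median; points beyond the whiskers: outliers
ax = sns.boxplot(data=df, x="EducationField", y="MonthlyIncome")
plt.show()
```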
Seaborn has a ton of exciting features, and the examples above do not do it justice by any stretch of the imagination; however, this is not a visualization tutorial. Do take some time to read the official documentation if you are interested in the details.
Now that we have the two main elements, it is time to combine them into the interactive visualization we are looking for. Let's build a countplot that can visualize a selected categorical dimension!
For simplicity, I assume that categorical data is stored in columns of the “object” data type. This is not necessarily the case, and certainly not in the IBM dataset: the “PerformanceRating” dimension, for instance, is clearly categorical but holds numbers. However, to avoid building the lists by hand, I will run with this blunt overgeneralization.
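A sketch of building that list on a hypothetical stand-in DataFrame. Instead of select_dtypes(include="object"), it uses exclude="number", an assumption on my part: recent pandas versions may store text in a dedicated string dtype rather than object, and excluding numeric columns catches both. As the caveat above predicts, the integer PerformanceRating column is left out:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 61, n),
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
    # Categorical in meaning, but stored as integers, so the dtype rule misses it
    "PerformanceRating": rng.choice([3, 4], n),
})

# Treat every non-numeric column as categorical (the blunt overgeneralization)
cat_cols = df.select_dtypes(exclude="number").columns.tolist()
print(cat_cols)  # ['Attrition', 'Gender']
```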
This list is going to be the basis of a dropdown selection widget.
Now we need a function that accepts our selection from this dropdown and draws a countplot of the chosen dimension. Nothing complicated as a starter, but I have thrown in a check that rotates the x-axis labels for readability when a column has many unique values.
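A sketch of such a function on hypothetical stand-in data. MAX_UNROTATED = 5 is an assumed threshold, not a value from the article, and each call opens a fresh figure so consecutive plots never draw on top of each other:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Attrition": rng.choice(["Yes", "No"], n),
    "EducationField": rng.choice(
        ["Life Sciences", "Medical", "Marketing",
         "Technical Degree", "Human Resources", "Other"], n),
})

MAX_UNROTATED = 5  # hypothetical: above this many categories, tilt the labels


def plot_count(column):
    """Draw a countplot of the selected categorical column."""
    _, ax = plt.subplots()  # fresh figure for every call
    sns.countplot(data=df, x=column, ax=ax)
    if df[column].nunique() > MAX_UNROTATED:
        ax.tick_params(axis="x", labelrotation=45)
    plt.show()


plot_count("EducationField")
```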
All we need now is to patch these elements together.
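A self-contained sketch of the wiring, repeating a plot_count function like the one described above on hypothetical stand-in data. Passing a list of strings to interact produces a dropdown, and every new selection calls plot_count again:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from ipywidgets import interact

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 61, n),
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
    "EducationField": rng.choice(
        ["Life Sciences", "Medical", "Marketing",
         "Technical Degree", "Human Resources", "Other"], n),
})

cat_cols = df.select_dtypes(exclude="number").columns.tolist()
MAX_UNROTATED = 5  # hypothetical threshold for rotating labels


def plot_count(column):
    _, ax = plt.subplots()
    sns.countplot(data=df, x=column, ax=ax)
    if df[column].nunique() > MAX_UNROTATED:
        ax.tick_params(axis="x", labelrotation=45)
    plt.show()


# A list of strings becomes a Dropdown whose options are the column names
interact(plot_count, column=cat_cols);
```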
And here we are: whenever we select a new column, the chart is updated.
The “hue” parameter is something I really like: can we set it dynamically too? Of course we can, with just a few tweaks to our code.
And we can have the hue value selected in a second dropdown.
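A sketch of the two-dropdown version on hypothetical stand-in data; plot_count_hue is an illustrative name, and both dropdowns draw their options from the same list of categorical columns:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from ipywidgets import interact

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
    "EducationField": rng.choice(
        ["Life Sciences", "Medical", "Marketing",
         "Technical Degree", "Human Resources", "Other"], n),
})

cat_cols = df.select_dtypes(exclude="number").columns.tolist()
MAX_UNROTATED = 5  # hypothetical threshold for rotating labels


def plot_count_hue(column, hue):
    """Countplot of `column`, with bars split by the `hue` column."""
    _, ax = plt.subplots()
    sns.countplot(data=df, x=column, hue=hue, ax=ax)
    if df[column].nunique() > MAX_UNROTATED:
        ax.tick_params(axis="x", labelrotation=45)
    plt.show()


# Two lists -> two dropdowns; changing either one redraws the chart
interact(plot_count_hue, column=cat_cols, hue=cat_cols);
```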
To examine relationships between numeric values, we switch the columns in scope to the numeric dimensions (keeping the hue categorical, just to be on the safe side) and the plot type to relplot.
Three inputs, ready to draw:
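A sketch of the three-input version on hypothetical stand-in data. The y dropdown is created explicitly with a different starting value so the first chart is not Age plotted against itself; that default is my choice, not something the article specifies:

```python
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from ipywidgets import interact

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 61, n),
    "MonthlyIncome": rng.integers(1_000, 20_000, n),
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
})

num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()


def plot_rel(x, y, hue):
    # relplot creates its own figure (a FacetGrid), one point per row
    sns.relplot(data=df, x=x, y=y, hue=hue)
    plt.show()


interact(
    plot_rel,
    x=num_cols,
    y=widgets.Dropdown(options=num_cols, value=num_cols[1]),
    hue=cat_cols,
);
```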
Playing around with the dimension selection is a real time-saver when trying to find meaningful connections in the data.
With a numeric slider we can also narrow down the underlying dataset, for example by capping the age of the employees we want to look at.
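A sketch of the age cap on hypothetical stand-in data. cap_age is an illustrative helper, and the slider's bounds come from the data itself, so even the lowest setting keeps at least one employee:

```python
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from ipywidgets import interact

# Hypothetical stand-in data with the IBM dataset's column names
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 61, n),
    "MonthlyIncome": rng.integers(1_000, 20_000, n),
    "Attrition": rng.choice(["Yes", "No"], n),
    "Gender": rng.choice(["Female", "Male"], n),
})

num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(exclude="number").columns.tolist()


def cap_age(frame, max_age):
    """Hypothetical helper: keep only employees aged max_age or younger."""
    return frame[frame["Age"] <= max_age]


def plot_rel_capped(x, y, hue, max_age):
    sns.relplot(data=cap_age(df, max_age), x=x, y=y, hue=hue)
    plt.show()


# Slider bounds taken from the data, starting at "no cap"
age_slider = widgets.IntSlider(
    value=int(df["Age"].max()),
    min=int(df["Age"].min()),
    max=int(df["Age"].max()),
    description="Max age",
)

interact(
    plot_rel_capped,
    x=num_cols,
    y=widgets.Dropdown(options=num_cols, value=num_cols[1]),
    hue=cat_cols,
    max_age=age_slider,
);
```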
Thank you for reading this post; I hope you have found value in it. My purpose was not to give a comprehensive explanation of how ipywidgets or seaborn works; the aim was to draw your attention to combining the two, which can be very helpful on your data exploration journey.