
Why Do We Plot Data?

Accompanying text for the “Designing for interactive exploratory data analysis requires theory of graphical inference” Explainer Zine
Published on Sep 10, 2021

Introduction

Before you read our explainer zine, it may be helpful to have some context about what a zine actually is. A zine, short for “magazine”, is traditionally a self-published collection of text and images that is not-for-profit. Think of it as a passion project, shared hand-to-hand. Historically, zines have been a way for subcultures to find their voice, be heard, and cultivate communities of shared interest.

This zine approach was inspired by a workshop given by Sarah Mirk at the Data Science by Design Creator Conf, which Stoudt co-organized and attended. Mirk provided instructions for how to create a zine, which can also be found online.

We like to think of zines as an opportunity for researchers to create and distribute accessible summaries of their work, like a business card, but more fun. Consider passing out a cartoon summary of your work to people at conferences: job seekers could concisely tell folks what they are working on while those hiring could inspire prospective candidates to come work with them. Since Hullman and Gelman’s paper is a call to action for a new line of research, we hope our explainer version will broaden the pool of candidates who might consider working in this field. 

Our zine was designed to have a stand-alone pull-out, for those who want more details, nestled in a high-level overview of the paper. The Bayes-ics zine-within-a-zine can be pulled out and repurposed in other related explainers. Want a tangible version? Coming soon, you’ll be able to print these drawings on three pages of standard paper, and we will show you, via a video, how to fold and combine them into your very own zine. Our goal is to make the original research paper accessible to a broader audience. To make that audience as wide as possible, we have also provided an alt-text version so that those with visual impairments can enjoy the story.

You can print and fold this zine!

Download a printout in US letter format and in A4 format here.

For optimal folding, opt for borderless printing whenever possible or cut out the borders otherwise, and watch our folding instructions video here.


Front Cover

The title text is “Why do we plot data”, written in large block letters.

There is a book with the title “Designing for interactive exploratory data analysis requires theory of graphical inference” on it.

There are cartoon versions of authors Jessica Hullman and Andrew Gelman, complete with pencils in hand.

Page 1 and 2

The title text is across the top of both pages and reads “Exploratory Data Analysis … a search for structure.”

The subtitle text is across the top of both pages and reads “...if EDA is the discovery of the unexpected … then this is defined relative to the expected.”

Page 1 has “EDA” written in the clouds and birds flying in a capital-W-shaped pattern. An observer from below spots this phenomenon with their binoculars/telescope and exclaims, “What a finding!”. The “W” in “What” is underlined.

Page 2 shows a hill where the flowers on the top are tulips and the flowers on the bottom are daisies. An observer sees this difference with their magnifying glass and wonders aloud, “Hmm… that’s curious!”

Page 3 and 4

The title text is across the top of both pages and says “Confirmatory Data Analysis … testing hypotheses.”

Page 3 has “CDA” written on a computer screen. The investigator with the magnifying glass is looking back at a photo of the odd flower hill from Page 2, and wonders aloud “Could this have happened just by chance?”

Page 4 shows the same investigator in a lab with a beaker and test tubes, wearing safety goggles. They state “Let’s put it to the test!” On the computer screen is a t-statistic of 2.6 and a p-value of 0.02.

Page 5

The text at the top of the page says “The concept of a model check in a Bayesian statistical framework unites EDA and CDA...” The words “model check” are written larger and in block letters, as on the front cover.

The text at the bottom of the page continues on “...as you make observations in nature or in graphs.”

In the middle there is a balanced scale with a telescope and magnifying glass on one side and the beaker and test tubes on the other.

Front Cover - Bayes

The title text is “The Bayes-ics [pronounced “basics”] of Bayesian Statistics”.

A person holds out a coin, and the accompanying text prompts “Imagine you found a mythical coin.”

Page 1

The person daydreams: “Is this a fair coin? What is the probability, h, of getting a head on a given spin of the coin?”

Page 2

Text states: “Our assumption that the coin is fair can be encoded as our “prior,” a distribution centered on 0.5.”

There is an arrow between the word “prior” and the notation P(h)

Text continues: “How would we check this assumption? Is the coin fair, i.e. h=0.5? The “likelihood” allows us to compute the probability of observed data given a value for h.”

There is an arrow between the word “likelihood” and the notation P(x|h).

Page 3

Title text says “It’s time for some observed data! A one (1) denotes a head.”

The person looks on as he spins a coin on a plate with a large, tilted spinning coin in the background. There is a data table with the following entries: 0, 1, 0, 0, 1, 0, 1, 0, 1, 0.

Page 4

Text states: “We conduct an experiment by spinning the coin 10 times and observe 4 heads out of 10 spins. Bayes Rule allows us to combine our observed data and model, updating our belief about the coin with the results of our experiment.”

The label “Posterior (or “updated knowledge”):” is followed by an equation where P(h|x) is set equal to a ratio of P(x|h) * P(h) to P(x)

Text states: “It has been theorized that this is how humans make and revise their beliefs.” 
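For readers who want to see the Page 4 update as arithmetic, here is a minimal sketch that applies Bayes' Rule on a grid of candidate values for h, using the ten spins from the data table. The zine only says the prior is centered on 0.5; the specific Beta(5, 5) shape used below is our assumption for illustration, as are all variable names.

```python
from math import comb

# Observed spins from the zine's data table (1 = head).
spins = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
heads, n = sum(spins), len(spins)

# Candidate values of h on a grid: 0.01, 0.02, ..., 0.99.
grid = [i / 100 for i in range(1, 100)]

# ASSUMPTION: the prior P(h) has a Beta(5, 5) shape, proportional to
# h^4 * (1 - h)^4, which is centered on 0.5. The zine does not specify this.
prior_raw = [h**4 * (1 - h) ** 4 for h in grid]
prior_total = sum(prior_raw)
prior = [p / prior_total for p in prior_raw]

# Likelihood P(x | h): binomial probability of the observed number of heads.
likelihood = [comb(n, heads) * h**heads * (1 - h) ** (n - heads) for h in grid]

# Evidence P(x): the likelihood averaged over the prior.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior P(h | x) = P(x | h) * P(h) / P(x), evaluated at every grid point.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

posterior_mean = sum(h * p for h, p in zip(grid, posterior))
print(f"{heads} heads in {n} spins, posterior mean of h: {posterior_mean:.2f}")
```

Under this assumed prior, the exact posterior has a Beta(9, 11) shape with mean 9/20 = 0.45, so the belief moves from 0.5 toward the observed proportion of 0.4 without jumping all the way there.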

Page 5 and 6

Text at the top states: “It’s possible to get one head on ten spins of a fair coin! Though the experimenter might say ‘Hey, this is biased!’.”

Text continues at the top of the opposite page: “But… there is always a chance of a false positive -- concluding the coin is biased when it isn’t.”

Below, three figures are seen spinning coins.

Text at the bottom states: “Repeating the experiment can tell you whether this conclusion itself was just due to chance.”

Back Cover - Bayes

There is a train with the person in the window, and the train is spouting out smoke. In the smoke are the words: “New data.”

Text at the bottom of the panel says: “The observed data and model can also be combined to tell us about what new data would look like from our fitted model.” 

A rectangle surrounds the following text: “The posterior predictive distribution becomes a data generating process.”

Page 7

The smoke from the train on the Back Cover of the Bayes-ics zine expands to this page and becomes a histogram of data generated from the posterior predictive distribution, labeled “Replicated data under model.” This is compared to a histogram of observed data, shown below. A person asks: “Is the model in question a good fit for my data?” A curved arrow points toward both histograms, labeled “replicated data under model” and “observed data”, with the word “compare” next to the arrow.
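The comparison on this page can be sketched in code: draw a value of h from the posterior, simulate a fresh set of spins with it, and repeat to build up the "replicated data under model". This sketch assumes a Beta(5, 5) prior (our choice, not the zine's), which combined with 4 heads in 10 spins gives a Beta(9, 11) posterior; the function and variable names are hypothetical.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is reproducible

observed_heads, n_spins = 4, 10

# ASSUMPTION: Beta(5, 5) prior. Adding 4 heads and 6 tails gives Beta(9, 11).
post_a = 5 + observed_heads
post_b = 5 + (n_spins - observed_heads)

def replicate():
    """One replicated experiment: draw h from the posterior, then spin."""
    h = rng.betavariate(post_a, post_b)
    return sum(rng.random() < h for _ in range(n_spins))

# Many replicated head counts approximate the posterior predictive distribution.
replicated = [replicate() for _ in range(5000)]
counts = Counter(replicated)

# Text histogram of replicated head counts, with the observed count marked.
for k in range(n_spins + 1):
    marker = "  <- observed" if k == observed_heads else ""
    print(f"{k:2d} heads | {'#' * (counts[k] // 50)}{marker}")
```

If the observed count sits in the bulk of the replicated histogram, the model is not obviously at odds with the data; if it sits far out in a tail, that is a prompt to rethink the model.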

Page 8 

The title text states: “Some viz softwares can generate line-ups.” The word “line-ups” is underlined.

A stick figure with curly hair and glasses faces a lineup of stick figures holding plots. The first is labelled “observed data”, while the other two are labeled “data from a reference distr.” The lineup is labelled: “Can you guess which plot is of the observed data?” The stick-person points to the observed data and declares “This is my data!”

Two bullet points follow. The first: “If you’re right, this is evidence that the data is not consistent with the ref.” And below it, the second: “If you’re wrong, then we can’t reject its consistency.” 
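The line-up idea can be sketched as follows: the observed data is placed at a random position among decoys drawn from a reference model, and the viewer tries to pick it out. The fair-coin reference and the helper names (make_lineup, fair_coin) are assumptions made for illustration; actual viz software would show plots rather than printed lists.

```python
import random

def make_lineup(observed, draw_reference, n_decoys, rng):
    """Hide the observed data among decoys drawn from a reference model."""
    panels = [draw_reference(len(observed), rng) for _ in range(n_decoys)]
    position = rng.randrange(n_decoys + 1)  # secret slot for the real data
    panels.insert(position, list(observed))
    return panels, position

def fair_coin(n, rng):
    # ASSUMPTION: the reference model is a fair coin (h = 0.5).
    return [int(rng.random() < 0.5) for _ in range(n)]

rng = random.Random(42)
observed = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]  # spins from the Bayes-ics zine
panels, position = make_lineup(observed, fair_coin, n_decoys=2, rng=rng)

for i, panel in enumerate(panels):
    print(f"panel {i}: {panel}")

# If the observed data looked just like the decoys, a viewer would pick it
# out only by luck, with probability 1 / (number of panels).
chance_of_lucky_guess = 1 / len(panels)
```

With the three panels drawn in the zine, a lucky guess happens one time in three, so line-ups with more decoys give stronger evidence when the viewer does identify the observed data.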

Page 9

The title text states: “Traditionally, the reference is noise.” Below this are four plots, labeled “observed,” “noise,” “linear,” and “posterior predictive,” from left to right, respectively. Each plot shows a corresponding set of points, except the “posterior predictive” plot, which displays the front end of a locomotive.

The word “noise” from the title is underlined and points to the plot labeled “noise.”

The text continues: “But, we could use any ref. model and let people compare them. Therefore, we could perform a graphical model check!” The words “any ref. model” are underlined and arrows point to the final two plots in the array above. This is followed by a single bullet point that reads: “Plotting your data against samples from the reference points to discrepancies, inspiring more thinking about the data generating process, and more plots.”

Page 10 

Text at the top reads: “People may be doing Bayesian model checks in their head”

A stick figure with glasses is next to a thought bubble that says: “A linear relationship would look like:”, followed by a plot of data points rising linearly from the origin in the same thought bubble. Next to this are two copies of the stick-person’s head. The one on top thinks: “...and then my data looks like”, with a plot of data points rising roughly linearly from the origin in the same thought bubble. The second copy of the stick-person’s head is below and says “So, it looks linear!”

Text at the bottom reads: “We can specify models of human graphical inference and compare their predictions to what humans get out of EDA via model checks.”

Page 11

Text at the top reads: “The more we know about how a person would respond to a plot, the better we can design viz software.”

A dashed line separates this text from an example below.

Text reads: “E.g. Galaxy data #42, two different choices of color bar.” This is followed by a long, skinny rectangle -- the color bar -- labeled “a” at the left end and “b” at the right end. It is shaded from white to black from a to b.

Below this are two versions of a galaxy image, made distinct by their own color bar. In the lefthand image are two galaxies that do not seem to be interacting, and the color bar is labeled from 0 to 1. A stick figure below points to the image and says, “Nothing to see here, folks!” In the righthand image, the color bar is now labeled from 0.2 to 0.5, so we can see many more features including tidal debris between and around the galaxies. A stick figure below points to the image and says, “Woah! Check out that debris field!”

A dashed line separates the example from the rest of the text.

Text continues, “For example, viz software can learn that the color bar is highly influential to the human’s perception and can offer the user to try different settings. But that’s not the only thing we are interested in…”

Page 12 and 13

Title text goes across both pages, and reads: “Viz software can help us see uncertainty”

A stick-person wearing glasses is shown with three sequential speech bubbles, each with a graph below it: 

the first says “A line seems reasonable” and is above a plot labeled “Observed data”, with data points decreasing as the x axis increases and a line through the apparent middle that’s labeled “line of best fit”;

the second says “Is it still good if I resample?” and is above a plot labeled “Resampled data”, with new data points behaving in a similar way as before; both the line from the last plot (labeled “old line”) and a new line of best fit (labeled “new line”) are shown;

and the final speech bubble says “Great! The slope looks stable!” and is above a plot labeled “Observed data”, which shows the original data points and line of best fit, in addition to a shaded-in region between two curves representing the uncertainty in the best fit line.
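One common way to carry out the “resample” step in these three panels is the bootstrap: refit the line to many resampled versions of the data and look at how much the slope moves. The zine does not say which resampling scheme is meant, so treat this as a sketch under that assumption; the data below is made up for illustration (a decreasing trend plus noise), and fit_line is a hypothetical helper.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(1)

# ASSUMPTION: synthetic "observed data" with a decreasing trend plus noise.
xs = [float(i) for i in range(20)]
ys = [10.0 - 0.5 * x + rng.gauss(0, 1) for x in xs]

slope, intercept = fit_line(xs, ys)  # the "line of best fit"

# Bootstrap: resample (x, y) pairs with replacement and refit each time.
boot_slopes = []
for _ in range(1000):
    idx = [rng.randrange(len(xs)) for _ in xs]
    boot_slopes.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx])[0])

# Middle 95% of the resampled slopes as a rough uncertainty band.
boot_slopes.sort()
low, high = boot_slopes[25], boot_slopes[974]
print(f"slope {slope:.2f}, bootstrap 95% interval ({low:.2f}, {high:.2f})")
```

A narrow interval that stays on one side of zero corresponds to the “The slope looks stable!” reaction; a wide interval, or one that straddles zero, would suggest the line is less trustworthy than a single plot implies.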

Below, a pop-up window labeled “viz software message” is shown with the message: “Looks like you are checking on X and Y. Do you want to check how it is affected by Z?” Next to this window is a graph labeled “Observed data distinguished by group (Z)”, with half of the data from before drawn in circles near the top left-hand part of the graph, and the other half drawn as x’s toward the bottom right-hand part of the graph. Each group has its own line, drawn pointing in different directions from each other and the original line of best fit. The stick figure person clasps their head in joy, and says “Thanks viz software, you get me!”

Page 14

The title text asks: “What do we need out of viz software and how can we get there?”

Four figures sit around a table, labelled the “interdisciplinary round table”, one with curly hair, one with hair combed to the side, one with messy hair, and one with no hair.

The first says “We need to be able to easily plot data drawn from a variety of reference distributions.”

The second says “So, we’ll need to be able to easily generate this data.”

The third says “Then we’ll need to be able to easily specify the distributions.”

The final person says “We need a grammar mapping data generating processes to visualized data!”

Page 15

The title text asks: “Are you the one to take on this viz design challenge?” The word “you” is underlined twice, and the word “challenge” is written larger in bubble letters.

There is a checklist with a pencil beside it. Items on the checklist include “represent uncertainty,” “specify reference model,” “see predictions from” with an arrow pointing to the previous entry, and “graphical customization.” 

Back Cover

The title text is: “This zine is brought to you by HDSR (Harvard Data Science Review)” The word “zine” is written larger and in block letters as on the front cover.

Below, there are four stick figures with the faces of the four authors of this zine: Kelly Blumenthal, Aleksandrina Goeva, Sara Stoudt, and Ana Trisovic.