Motivation

This course enables you to transform data into persuasive and evidence-based visualizations. Visualizations are persuasive if they motivate actions in an intended audience. Visualizations are evidence-based if they are reproducible, functional, and truthful.

This course introduces and discusses the fundamental design principles and visualization technologies that allow you to design, implement, and critique persuasive and evidence-based visualizations. In a data-rich environment, where decision-makers often drown in data but thirst for insight [1], mastering this course equips you with a moderate level of data literacy.

Data literacy is the ability to interpret, construct, and convey arguments through the functional and truthful visual presentation of data. Data literacy is a vital skill in our data-driven world. The chances are high that you will be interpreting and designing data visualizations throughout your career. The level of data literacy offered through this course allows you to establish a competitive advantage in Silicon Valley and the global marketplace.

Learning objectives

You will learn to design and implement visualizations and critique the persuasiveness and evidence of visualizations. Upon successful completion of this course you will:

  • Understand the conceptual and technological fundamentals of data visualization.
  • Analyze and critique the persuasiveness and evidence of data visualizations.
  • Implement persuasive and evidence-based visualizations.

Course Logistics

Textbooks

There is no required textbook for this class. Data visualization is still a fluid topic that is covered in arts, design, and technology. You will find a lot of “conventional wisdom” out there (including in the books below). Please consume information with a critical mind.

I consider the following books my ‘common core’ of contemporary data visualization:

The philosophy of data visualization:

  • Tufte (2001): The Visual Display of Quantitative Information, Graphics Press.
    • This book is the classic introduction to visual data representations.
    • This book is available as hard copy in the SCU library.
  • Tufte (2006): Beautiful Evidence, Graphics Press.
    • This book contains showcases that illustrate the thinking behind high-quality data visualizations.
    • This book is available as hard copy in the SCU library.

The concepts of data visualization:

  • Cairo (2012): The Functional Art: An Introduction to Information Graphics and Visualization, New Riders.
    • This book provides you with the conceptual background of data visualization.
    • This book is available online in the SCU library.
  • Cairo (2016): The Truthful Art: Data, Charts, and Maps for Communication, New Riders.
    • This book is the sequel to the first one and focuses on the ‘truthful’ part.
    • This book is available online in the SCU library.

The technology of data visualization:

  • Munzner (2014): Visualization Analysis and Design, CRC Press.
    • This book offers a technical introduction to the elements of effective data visualization.
    • This book is available online in the SCU library.
  • Grolemund & Wickham (2017): R for Data Science, O’Reilly. (in particular chapter 3 on data visualization)
    • Everything you need to know for working with R.
    • This book is available online.
  • McKinney, W (2012): Python for Data Analysis, O’Reilly. (in particular chapter 9 on plotting and visualization)
    • Everything you need to know for working with the Pandas library and Python.
    • This book is available online in the SCU library.
  • VanderPlas (2016): Python Data Science Handbook, O’Reilly. (in particular chapter 4 on visualization with Matplotlib)
    • A complete walkthrough of data analysis in Python.
    • This book is available online.

The practice of data visualization:

  • Nussbaumer Knaflic (2015): Storytelling with Data, Wiley.
    • This book focuses on the use of data visualization in the professional environment.
    • This book is available online in the SCU library.
  • Wexler, Shaffer, Cotgrave (2017): The Big Book of Dashboards, Wiley.
    • This is an excellent collection of dashboards and the reasoning behind them.

Technology

The hands-on elements in this course use Tableau Desktop, Jupyter/Python, R, and D3.js. I believe that experiencing data visualization through a variety of technologies helps you to identify commonalities and important differences.

  • Tableau Desktop is an analytics platform for enterprise data.
  • The Jupyter/Python environment helps you to gather, clean, wrangle, and visualize data in a reproducible fashion.
  • The R environment also helps you to gather, clean, wrangle, and visualize data in a reproducible fashion.
  • D3.js is a JavaScript library for creating data products for the web.

In the class meetings, we will work with all of these technologies. In the assignments, you may choose your favorite set of technologies, including technologies not listed here. Please discuss any plans to use other technologies with me beforehand.

Learn how to install these technologies in the FAQ document of this course (accessible only with SCU ID). If you run into any issues, please also post your issue or question in that FAQ document.


PLEASE NOTE: I expect you to have Tableau Desktop, Jupyter/Python, and RStudio installed on your laptop.


It is my goal to spend the classroom time on conceptual and hands-on issues of data visualization. Therefore, we will only spend a minimal amount of time explaining how to use Tableau, Python, R, and JavaScript. I will introduce and explain all the code we need for the class meetings. If you want to use a technology in your assignments but do not know it yet, the following resources will help you to get up to speed.

Tableau:

  • https://www.tableau.com/learn/training - This is a great resource to get answers on “how-to” questions.
  • Stirrup et al. (2016): Tableau: Creating Interactive Data Visualizations, Packt. - This book is available online in the SCU library.

Python, Jupyter, R, and D3.js:

  • https://jakevdp.github.io/WhirlwindTourOfPython - A great introduction to Python.
  • Rossant, C (2015): Learning IPython for Interactive Computing and Data Visualization, Packt. - This book is available online in the SCU library.
  • Lander (2017): R for Everyone: Advanced Analytics and Graphics. - A great introduction to R.
  • Meeks (2015): D3.js in Action, Manning.

Communication

I am committed to your learning success. Please feel free to contact me with any questions regarding this course. If I am not able to help you myself, I will forward your request to someone who can.

  1. If you have general questions about course material, assignments, etc. please write them into this FAQ document (accessible only with SCU ID).
  2. Before you write an email, please read and comment in the FAQ document (accessible only with SCU ID).
  3. If you send me an email that contains questions of interest to the whole class, I will answer them in the FAQ document (accessible only with SCU ID).
  4. My office hours are Mondays and Wednesdays from 5:00pm to 6:00pm. Please make an appointment here. I am also available after each class.
  5. Please make an appointment whether you want to meet during or outside of my office hours. A meeting request must have a specific agenda. I am available via phone, Zoom, or face-to-face.
  6. I post all course material, course information, announcements, and updates on Camino. On Camino, you will also find the class recordings. Please make sure that your correct email address is listed in Camino so that you do not miss important information.
  7. I maintain a class log (accessible only with SCU ID) that contains all the links, resources, and whiteboard drawings that I use or create during the class meetings.

Class Meetings

Class meetings are Saturdays, 8:30 AM to 11:15 AM in Lucas Hall 210.

This course is centered around a practical and reflective approach to data visualization. Each class meeting will have the same structure:

  • 5 min Housekeeping & Problem Setting
  • 30 min Design principle (What are effective means to visualize data and why?)
  • 30 min Technology (How do we visualize data?)
  • 60 min Lab Session (Can you do it?)
  • 30 min Presentation (Can you explain and justify your results?)
  • 10 min Reflection & Wrap up

Each class meeting will focus on one specific design principle in data visualizations and one aspect of visualization technology. During the lab session, you have the opportunity to work on a data visualization case study that allows you to practice the application of the design principle and technology. At the end of each lab session, you may be asked to present your results.

Assignments

“What it boils down to is one per cent inspiration and ninety-nine per cent perspiration.” (Thomas Edison)

Your mastery of the learning objectives will be examined through contributions to a class reader, an individual project, and a team project. There will be no exams.

The following table links the learning objectives of this class with the assignments and shows the maximum number of points that you can achieve with each assignment towards the final grade.

| Learning Objective | Assignment | Max. Points |
|---|---|---|
| Understand the conceptual and technical fundamentals of data visualization. | Class Reader | 30 |
| Analyze and critique the persuasiveness and evidence of existing data visualizations. | Individual Project | 40 |
| Implement persuasive and evidence-based visualizations. | Team Project | 30 |
| Total | | 100 |

The final grade distribution is as follows.

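| Points | Letter Grade |
|---|---|
| 94 to 100 | A |
| 90 to below 94 | A- |
| 87 to below 90 | B+ |
| 84 to below 87 | B |
| 80 to below 84 | B- |
| 77 to below 80 | C+ |
| 74 to below 77 | C |
| 70 to below 74 | C- |
| below 70 | F |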
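As an illustration only, the grade table can be read as a simple lookup from points to letter grade. The sketch below is not an official grading tool. It assumes that each band includes its lower bound (for example, exactly 90 points is an A-), which is one reading of the table, and the names `GRADE_BANDS` and `letter_grade` are hypothetical.

```python
# Assumed reading of the grade table: each band includes its lower bound
# and ends just below the lower bound of the band above it.
GRADE_BANDS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
]


def letter_grade(points):
    """Map a point total (0 to 100) to a letter grade."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    # Bands are ordered from highest to lowest, so the first match wins.
    for lower_bound, letter in GRADE_BANDS:
        if points >= lower_bound:
            return letter
    return "F"
```

For example, under this assumption `letter_grade(86.5)` falls into the B band because it is at least 84 but below 87.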

My grading criteria are as follows:

  • A grades (4.0) reflect work that meets all assignment objectives at the highest possible level and sometimes goes beyond that. The submitted work is of superior quality and could be presented to the target audience with no or minimal revisions. Typically, no more than 40% of participants in a course receive an A grade.
  • B grades (3.0) reflect work that meets all assignment objectives at a level that is above average but not exceptional. The submitted work shows high levels of competency and could be presented to the target audience with some editing.
  • C grades (2.0) reflect work that meets all course objectives at an average level but is not exceeding expected standards. The submitted work lacks a clear in-depth understanding of the subject and could be presented to the target audience only with extensive editing. Typically, at least 5% of participants in a course receive a C grade.
  • F grades (0.0) reflect work that does not meet course objectives and is below minimum standards. Submissions are late without prior consultation with the instructor, miss the assignment objectives, or show a clear lack of learning progress. Also, repeated violations of the academic integrity standards result in an overall F grade.

I reserve the right to change the grading to accommodate special circumstances and opportunities. Any changes, however, will be discussed and announced in class and on Camino.

Class Reader

The class reader is the virtual extension of the classroom. You use the class reader to collaboratively develop a deeper understanding of the conceptual and technical fundamentals of data visualization.

Your objective is to contribute in a meaningful way to the class reader on a weekly basis. Meaningful contribution is defined as the following set of activities:

  • Add and annotate a unique reference to the class reader. The annotation should consist of a short summary of the key points, a critical analysis, and a personal reflection.
  • Write a component of the class reader that combines several annotated references in a useful manner.
  • Evaluate and critique an existing component of the class reader.
  • Substantially improve an existing component of the class reader.
  • Organize components into coherent and consistent structures (chapters, sub-chapters, tables, lists).

When contributing to the class reader, make sure that you understand the requirements of academic integrity as outlined below.

The tentative structure of the class reader is as follows:

  1. Fundamentals
    • Theoretical background of data visualization
    • Contemporary research results
  2. Case Studies
    • Description and replication of great examples of data visualization
  3. Patterns
    • Reusable solutions to everyday data visualization questions
    • Applied by multiple members of the course
  4. Ethics
    • Implications of (good and bad) data visualization
    • The role of data visualization in politics, society, and business

In the spirit of great examples of collaborative writing, we use GitHub to organize the writing process. You will use branches, projects, issues, pull requests, and wikis to manage your work efficiently.

I will evaluate your weekly contribution to the class reader based on the following criteria.

| Criteria | Metrics | Max. Points |
|---|---|---|
| Quantitative Activity | Commits, Additions, Deletions, Issue Handling, Wiki Contributions | 1 |
| Qualitative Activity | Quality of Content, Arguments, and Reflection | 2 |
| Total | | 3 |

Additionally, twice in the quarter (on May 12 and June 9), I will also evaluate you based on the following criteria.

| Criteria | Metrics | Max. Points |
|---|---|---|
| Contributions to Organization | Structure, Composition, and Logical Flow | 2 |
| Contributions to Professionality | Style, Form | 1 |
| Total | | 3 |

I will grade you based on the state of the class reader GitHub repository each Saturday at 12:00 PM.
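To see how these evaluations add up to the 30 points available for the class reader, here is a small arithmetic sketch. It assumes eight weekly evaluations (R1 to R8 in the course schedule) and the two additional evaluations on May 12 and June 9. The function and parameter names are hypothetical.

```python
def reader_max_points(weekly_evaluations=8, weekly_max=3,
                      additional_evaluations=2, additional_max=3):
    """Return the maximum number of class reader points.

    Defaults are assumptions taken from the syllabus: eight weekly
    evaluations worth up to 3 points each, plus two additional
    evaluations worth up to 3 points each.
    """
    weekly_total = weekly_evaluations * weekly_max
    additional_total = additional_evaluations * additional_max
    return weekly_total + additional_total


print(reader_max_points())
```

With the default values, the weekly evaluations contribute 24 points and the two additional evaluations contribute 6 points.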

Individual Project

You pursue two objectives with the individual project:

  1. Replicate and redesign an existing data product. This will allow you to learn from others, sharpen your critical perspective on data visualizations, reason about design decisions, and attempt to improve data visualizations.
  2. Develop a deceptive data product. This will allow you to sharpen your skills in detecting intentional and unintentional distortions to the data. (You may not alter the data when developing your deceptive data product.)

The topic for the individual project is Gun Violence in the United States. We use the following data product: https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts (A PDF version is on Camino in case the website becomes unavailable or changes significantly).

The following table provides an overview of the deliverables for the individual project.

| Project Phase | Due | Max. Points |
|---|---|---|
| Problem Statement | April 21, 2018 (1:00 PM) | 5 |
| First Version “Redesign” | May 5, 2018 (1:00 PM) | 10 |
| First Version “Deception” | May 19, 2018 (1:00 PM) | 10 |
| Revised Versions | June 2, 2018 (1:00 PM) | 15 |
| Total | | 40 |

PLEASE NOTE: It is vital for you to start early and discuss intermediate results with me. I will not accept late submission without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy but please notify me in advance and we will work it out.


Problem Statement

The problem statement should outline your initial (and perhaps intuitive) analysis of the data product. Think about the following questions:

  • What does the literature tell you about the topic?
  • Where and why is the data product (in)effective?
  • What is your proposal to redesign the data product?
  • What are potential starting points for a deceptive version and why are they deceptive?
  • Have you obtained access to the mentioned data sources or identified replacements/potentially important additions?

I will evaluate your problem statement based on the following criteria.

| Criteria | Metrics | Max. Points |
|---|---|---|
| Content | Understandability (0.5), Completeness (0.5) | 1 |
| Persuasiveness | Clarity (1), Argumentation (1) | 2 |
| Evidence | Sources (1) | 1 |
| Style | Professionality (0.5), Originality (0.5) | 1 |
| Total | | 5 |

First Versions

The first versions of your redesigned data product and your deceptive data product document your prototypes. Both versions should achieve the following:

  • Visualize three aspects of the original data in an interesting, non-trivial, and somewhat unexpected fashion.
  • Document the “Making-of” (Details of your development process, data wrangling steps, your reasoning, detours, literature, etc.)
  • Provide a road map of future features/enhancements.

I will evaluate both versions based on the following criteria:

| Criteria | Metrics | Max. Points |
|---|---|---|
| Finding 1 | Persuasiveness (1), Evidence (1) | 2 |
| Finding 2 | Persuasiveness (1), Evidence (1) | 2 |
| Finding 3 | Persuasiveness (1), Evidence (1) | 2 |
| Making-of | Structure (1), Reproducibility (1) | 2 |
| Style | Creativity (1), Professionality (1) | 2 |
| Total | | 10 |

Revised Versions

The revised versions of the redesigned data product and the deceptive data product document your individual mastery of the course. Both revised versions should substantially improve on the first versions and include:

  • The final version of your data product.
  • A documentation of your data product.
  • A “Making-of” including a critique of your first versions that results in a revision.

The final version must be one reproducible and self-contained online deliverable (a Jupyter notebook, an R notebook, an Observable notebook, a webpage, etc.).

I will evaluate the result of this phase based on the following criteria:

| Criteria | Metrics | Max. Points |
|---|---|---|
| Improvement Redesign | Persuasiveness (1), Evidence (1), Structure (1), Reproducibility (1) | 4 |
| Polishing Redesign | Application of Class Concepts (1), Professionality (1) | 2 |
| Improvement Deception | Persuasiveness (1), Evidence (1), Structure (1), Reproducibility (1) | 4 |
| Polishing Deception | Application of Class Concepts (1), Professionality (1) | 2 |
| Integration | Creativity (1), Effort (1), Organization (1) | 3 |
| Total | | 15 |

Team Project

The objective of the team project is to collaboratively develop a data product. A data product tells a complex story using several data visualizations. You will work in teams of up to five students. Each team is free to choose a topic of interest. The challenge of a team project is to organize your team, hold one another accountable, and complement one another's skills and interests. At the end of the team project, your teammates will evaluate your contributions to the project. This may influence your grade for the team project.

The following table provides an overview of the deliverables for the team project.

| Project Phase | Due | Max. Points |
|---|---|---|
| Problem Statement | April 28, 2018 (1:00 PM) | 5 |
| Exploratory Data Analysis | May 12, 2018 (1:00 PM) | 5 |
| First Version | May 26, 2018 (1:00 PM) | 5 |
| Presentation | June 9, 2018 (8:30 AM) | 5 |
| Revised Version | June 14, 2018 (11:59 PM) | 10 |
| Total | | 30 |

PLEASE NOTE: It is vital for you to start early and discuss intermediate results with me. I will not accept late submission without prior notice or without a doctor’s note. I am aware that sometimes life goes crazy but please notify me in advance and we will work it out.


Problem Statement

The problem statement should outline your team project and your project plan. Think about the following questions:

  • What is your area of interest?
  • Why is this an important area?
  • What do you want to explore in this area?
  • What are important sources for background information on your area of interest?
  • Do you have a suitable number of data sources to explore your area of interest?
  • What is the general outline for your project?

I will evaluate your problem statement based on the following criteria.

| Criteria | Metrics | Max. Points |
|---|---|---|
| Content | Understandability (1), Completeness (1) | 2 |
| Persuasiveness | Clarity (0.5), Argumentation (0.5) | 1 |
| Evidence | Sources (1) | 1 |
| Style | Professionality (0.5), Originality (0.5) | 1 |
| Total | | 5 |

Exploratory Data Analysis

During the exploratory data analysis your objective is twofold. First, you should collect, clean, and integrate data. Second, you establish a thorough understanding of the content and the limitations of your data.

The exploratory data analysis must:

  • be completely reproducible.
  • be documented.
  • be free of errors and warnings.
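As a hedged illustration of what these three requirements can look like in a Python notebook, the sketch below fixes a random seed so that repeated runs give the same result, documents the step in a docstring, and promotes warnings to errors so they cannot pass unnoticed. It is one possible approach, not a required template. The function name `summarize_sample`, the seed value, and the sample size are hypothetical.

```python
import random
import statistics
import warnings


def summarize_sample(values, sample_size=5, seed=2018):
    """Summarize a random sample of `values`.

    A fixed seed makes the sample (and therefore the summary) identical
    on every run. Warnings raised inside the block are turned into
    errors, so the analysis either runs cleanly or fails visibly.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # treat any warning as an error
        rng = random.Random(seed)       # local generator with a fixed seed
        sample = rng.sample(values, sample_size)
        return {
            "n": len(sample),
            "min": min(sample),
            "max": max(sample),
            "mean": statistics.mean(sample),
        }
```

Calling `summarize_sample(list(range(100)))` twice returns the same dictionary both times, which is the property a reader needs in order to reproduce your analysis.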

I will evaluate your exploratory data analysis based on the following criteria.

| Criteria | Metrics | Max. Points |
|---|---|---|
| Data description | Understandability (1), Completeness (1) | 2 |
| Data preparation & use | Clarity (1), Explanations (1) | 2 |
| Style | Professionality (1) | 1 |
| Total | | 5 |

First Version

In this phase you will develop the first version of your data product. You should achieve the following:

  • Develop a narrative that connects at least three interesting, non-trivial, and somewhat unexpected aspects of your area of interest.
  • Document the “Making-of” (Details of your development process, data wrangling steps, your reasoning, detours, literature, etc.)
  • Provide a road map of future features/enhancements.

I will evaluate the first version based on the following criteria:

| Criteria | Metrics | Max. Points |
|---|---|---|
| Narrative | Persuasiveness (1), Evidence (2) | 3 |
| Style | Creativity (1), Professionality (1) | 2 |
| Total | | 5 |

Presentation

The in-class presentation should achieve the following:

  • Showcase your improved narrative.
  • Align your narrative with an intended audience.
  • Road-map with future features/enhancements.

I will evaluate the presentation based on the following criteria:

| Criteria | Metrics | Max. Points |
|---|---|---|
| Progress | Persuasiveness (2), Evidence (1) | 3 |
| Style | Creativity (1), Professionality (1) | 2 |
| Total | | 5 |

Revised Version

The final data product must be online (e.g., on GitHub) by the deadline. The final data product should consist of two items:

  • One reproducible and self-contained deliverable (a Jupyter notebook, an R notebook, an Observable notebook, a webpage, etc.).
  • A short video (< 90 seconds) that summarizes the key points of your data product.

I will evaluate the revised version based on the following criteria:

| Criteria | Metrics | Max. Points |
|---|---|---|
| Progress | Improvements in Problem Statement (1), Improvements in Data Analysis (1), Improvements in Narrative (1), Improvements in Evidence (1) | 4 |
| Video | Content (1), Effectiveness (1), Originality (1) | 3 |
| Professionality | Style (1), Structure (1), Polishing (1) | 3 |
| Total | | 10 |

How to get an A in this course

I firmly believe that mastery of data visualization requires constant practice. You will ace this course if you:

  • Adhere to the academic integrity standards outlined below.
  • Are ready for class meetings, which means you have read the material and prepared for the case study.
  • Participate in the class discussions, ask questions, and share experiences.
  • Support your teammates.
  • Show intermediate results early and often.
  • Start early on the assignments, seek continuous feedback from me and other sources.
  • Continuously think about why you are doing something in your assignments. This is far more important than what you are doing.
  • Answer the ‘boss question’ before submitting any deliverable: Would you be comfortable sending your submission as is to your boss or a recruiter? If your answer is yes, please submit. If your answer is no, revise before you submit.

Course Schedule

The Design Principle column contains the introductory reading for the class meetings.

| Week | Class Meeting | Foundations | Design Principle | Lab Session | Reader | Individual Project | Team Project |
|---|---|---|---|---|---|---|---|
| 1 | April 14 | Introduction | Overview | Hello, viz world! | - | - | - |
| 2 | April 21 | Analytic Design | The Visualization Zoo | Tableau | R1 (3) | IP1 (5) | - |
| 3 | April 28 | Toulmin’s Argumentation Model | Grammar of Graphics | Tableau | R2 (3) | - | TP1 (5) |
| 4 | May 5 | Goal, Question, Metrics (GQM) | Data Semantics | P4DA | R3 (3) | IP2 (10) | - |
| 5 | May 12 | The Audience Model | Data Actions | Altair | R4 (6) | - | TP2 (5) |
| 6 | May 19 | Situational Awareness | Marks, Channels & Color | R and Tidyverse | R5 (3) | IP3 (10) | - |
| 7 | May 26 | The Data Pixel Ratio | Rules of Thumb | ggplot2 & Shiny | R6 (3) | - | TP3 (5) |
| 8 | June 2 | The Truth Continuum | Deception Techniques | D3.js | R7 (3) | IP4 (15) | - |
| 9 | June 9 | Review | Review | D3.js | R8 (6) | - | TP4 (5) |
| 10 | June 14 | - | - | - | - | - | TP5 (10) |
| Total = 100 points | | | | | 30 | 40 | 30 |

Academic Integrity

The Academic Integrity pledge is an expression of the University’s commitment to fostering an understanding of and commitment to a culture of integrity at Santa Clara University. The Academic Integrity pledge, which applies to all students, states:

“I am committed to being a person of integrity. I pledge, as a member of the Santa Clara University community, to abide by and uphold the standards of academic integrity contained in the Student Conduct Code.”

You are expected to uphold the principles of this pledge for all work in this class. For more information about Santa Clara University’s academic integrity pledge and resources about ensuring academic integrity in your work, see www.scu.edu/academic-integrity.

In particular, I expect that you give credit to any material (including but not limited to journal articles, web article, blog posts, images, data sets, and any media) that you have used for completing any assignment in this class. Being able to give credit by referencing sources consistently and correctly is evidence of mastery of a topic. It shows that you are able to construct original arguments that are backed with verifiable evidence. Failing to give credit is a sign of an inadequate learning progress. It shows that you have not understood the topic well enough to formulate your own arguments in relation to already existing ideas.

During your work in this class, you will use, modify, or extend digital content that you have found online. You will also use libraries, APIs, code snippets, and data sets that have been created by others. In every piece of work (presentations, assignments, etc.), you must acknowledge work, source code, data sets, and any other content that was not produced by you. Acknowledgements must be easily identifiable, inseparable from your content, and must not violate licenses.

Failure to provide appropriate acknowledgements will result in an F grade for that assignment. Repeated failure to provide appropriate acknowledgements will result in an F grade for the entire course.

During the first class, we will discuss this digital content policy. After this class, I will strictly enforce this policy. If you have doubts, contact me.

Course Conduct

My responsibility

I will support you in your learning in this class and beyond to the best of my abilities. If I am not able to help you myself, I will identify someone who can. I will evaluate your contribution solely based on the standards set by this syllabus. Changes to the syllabus will be highlighted, discussed during class sessions, and will be published on Camino.

Your responsibility

By enrolling in this class, you agree to the requirements stated in this syllabus. You will operate with integrity in your dealings with me and your fellow students. You will engage the learning materials with appropriate attention and dedication and maintain your engagement when challenged by difficult learning activities. You will contribute to the learning of others and you will perform to the standards set by this syllabus.

Mutual respect is the foundation of this course. No one will be criticized for being wrong. Appropriate conduct includes honesty, self-respect, respect for others, and compliance with university policies and standards. Computers in the classroom should be used only for completing course-related work and for taking notes; cell phones must be turned off or muted.

Attendance Policy

Please let me know via email during the first two weeks of the course if you have any conflicts between a course element (class meeting, assignment) and another vital commitment (another course, work, university-related extracurricular activities, religious commitments). At my discretion, I will provide you with alternative means to complete the course element.

I am aware that many of you have multiple commitments. You should attend at least 80 percent of all scheduled class meetings. If you miss more than 20 percent of scheduled classes, your final grade will be reduced by one letter grade.

University Policies

Disability Resources

If you have a disability for which accommodations may be required in this class, please contact Disabilities Resources (Benson Hall 216, 408-554-4109) as soon as possible to discuss your needs and register for accommodations with the University. If you have medical needs related to pregnancy, you may also be eligible for accommodations. If you have already arranged accommodations through Disabilities Resources, please discuss them with me during my office hours as soon as possible.

While I am happy to assist you, I am unable to provide accommodations until I have received verification from Disabilities Resources. If you are in doubt of whether you are eligible for accommodations, I encourage you to contact Disabilities Resources (Benson Hall 216, 408-554-4109). The Disabilities Resources office would be grateful for advance notice of at least two weeks.

Accommodations for Pregnancy and Parenting

In alignment with Title IX of the Education Amendments of 1972, and with the California Education Code, Section 66281.7, Santa Clara University provides reasonable accommodations to students who are pregnant, have recently experienced childbirth, and/or have medical needs related to childbirth. Pregnant and parenting students can often arrange accommodations by working directly with their instructors, supervisors, or departments. Alternatively, a pregnant or parenting student experiencing related medical conditions may request accommodations through Disabilities Resources (Benson Hall 216, 408-554-4109).

Discrimination and Sexual Misconduct (Title IX)

Santa Clara University upholds a zero-tolerance policy for discrimination, harassment and sexual misconduct. If you (or someone you know) have experienced discrimination or harassment, including sexual assault, domestic/dating violence, or stalking, I encourage you to tell someone promptly. For more information, please consult the University’s Gender-Based Discrimination and Sexual Misconduct Policy at http://bit.ly/2ce1hBb or contact the University’s EEO and Title IX Coordinator, Belinda Guthrie, at 408-554-3043, bguthrie@scu.edu. Reports may be submitted online through https://www.scu.edu/osl/report/ or anonymously through Ethicspoint https://www.scu.edu/hr/quick-links/ethicspoint/

Acknowledgement

This syllabus was inspired by Aleszu Bajak’s syllabus, Jeffrey Shaffer’s data visualization with Tableau course, and earlier versions of CS171 at Harvard.


  1. Loosely based on Naisbitt, J. 1982: Megatrends, Warner Books