
Saint Louis University

Computer Science 1050
Introduction to Computer Science: Multimedia

Michael Goldwasser

Spring 2016

Dept. of Math & Computer Science

Programming Project 4

Data Visualization

Due: 11:59pm Monday, April 11, 2016



Overview

For this assignment, you are to create an interactive visualization of a publicly available data set. You have great flexibility in the design of your project; however, we outline some formal requirements below and offer advice on procuring a data set.

Your visualization could be in the form of a typical graph (e.g., bar chart, pie chart, line graph, scatter plot), or a combination of such representations, but it may also be more abstract, as with our word clouds, or more geometric or geographic in nature.


Collaboration Policy

For this assignment, you are allowed to work with one other student in developing a single piece of software to submit. If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

It is vital that both students contribute to the development of the project. Please make sure you adhere to the policies on academic integrity in this regard.


Technical Requirements

  1. Your project must be markedly different in content and style from examples we have provided in class. Thus your data set should involve something other than stock prices, ethnicity, or word frequency. In terms of visualization methodology, do not simply make a static pie chart of a data set or use our existing word cloud approach. However, you are welcome to use those styles as aspects of a more complex visualization if that is what best suits your illustration.

  2. Your sketch must include some form of user interaction. Examples of such interactions might include display of pop-up information when mousing over a portion of the graphics or using controls to alter the displayed slice of a data set (e.g., moving back and forth through a time series). However, you are free to develop your own form of user interaction with your visualization.

  3. You must use publicly available data and provide a citation to the source(s) as part of your submitted readme. Furthermore, your program must retrieve the raw data automatically, either by downloading it directly from an accessible URL or by loading it from a locally stored file. (This means that the raw data must not be transcribed directly within your source code.)

    In an ideal world, your program should be able to retrieve the data from the original source, without you having to manually customize the format of the data set. However, you are permitted to apply general transformations to the entire data set so that it suits the common Processing functions for loading data (e.g., loadStrings, loadTable, loadXML, loadJSONArray). For example, if you find a data set that is originally an Excel file, you are welcome to export that information to a comma-separated format (csv), which can be more easily parsed by loadStrings or loadTable. Also, you are welcome to use an external tool to take a relevant slice of a larger data set, such as pulling Missouri data from a larger national data set (although it would still be better to automate such filtering rules from within your Processing script).

  4. Your graphics must be designed with some sense of proper scalability. This pertains both to properly using the available width and height of the canvas, and to making sure that the layout and size of graphical features are appropriate to the values in a particular data set (e.g., scaling a graph's axes relative to the range of values that occur in the data set).

  5. Avoid embedding "magic" numbers directly in your source code if the relevant knowledge can be reasonably inferred in automated fashion.

    For example, if graphing prices for a certain stock, you should not hardcode the vertical scale based on your own knowledge that all prices were between 20 and 30. Instead, your program should determine the relevant minimum and maximum prices by examining the data and then map that range onto the window. Similarly, do not hardwire your code based on knowledge that your data set covers a range of 23 years. Instead, determine the number of years when parsing the input and use that information as a variable for the rest of the program.

    One exception to this rule: you are allowed to make direct use of the general format of your data set. For example, you may know that you need to pull specific data from column 4 and column 7 of the raw data set. You are welcome to rely on that type of domain knowledge, as it may be impossible to infer otherwise. I still strongly recommend defining such magic constants by name in your program, such as

    int ZIP_CODE_COL = 4;
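As a rough illustration of requirements 4 and 5, the following sketch derives a vertical scale from the data instead of hardcoding it. It is written as plain Java so that it runs outside Processing; the class name, the method names, the PRICE_COL constant, and the year,price row layout are all hypothetical, and the three sample rows merely stand in for lines that a real project would load from a file or URL. The mapValue helper performs the same kind of linear interpolation as Processing's built-in map function.

```java
// Plain-Java sketch of data-driven scaling. Every name below is hypothetical.
class ScaleSketch {
    // Named column index, in the spirit of ZIP_CODE_COL (assumed layout: year,price).
    static final int PRICE_COL = 1;

    // Linear interpolation from the range [inLo, inHi] onto [outLo, outHi].
    static float mapValue(float v, float inLo, float inHi, float outLo, float outHi) {
        return outLo + (outHi - outLo) * (v - inLo) / (inHi - inLo);
    }

    // Extract one numeric column from comma-separated rows (quoted fields are not handled).
    static float[] column(String[] rows, int col) {
        float[] out = new float[rows.length];
        for (int i = 0; i < rows.length; i++) {
            out[i] = Float.parseFloat(rows[i].split(",")[col].trim());
        }
        return out;
    }

    static float min(float[] vals) {
        float m = vals[0];
        for (float v : vals) m = Math.min(m, v);
        return m;
    }

    static float max(float[] vals) {
        float m = vals[0];
        for (float v : vals) m = Math.max(m, v);
        return m;
    }

    // Convert values to y pixels: the smallest value sits at the bottom margin,
    // the largest at the top margin. The range is discovered from the data.
    static float[] toPixelY(float[] vals, float height, float margin) {
        float lo = min(vals);
        float hi = max(vals);
        float[] ys = new float[vals.length];
        for (int i = 0; i < vals.length; i++) {
            // If every value is identical, center the points to avoid dividing by zero.
            ys[i] = (hi == lo) ? height / 2 : mapValue(vals[i], lo, hi, height - margin, margin);
        }
        return ys;
    }

    public static void main(String[] args) {
        // Stand-in for lines loaded from a file; a real project must not embed its data.
        String[] rows = { "2014,20.0", "2015,30.0", "2016,25.0" };
        float[] ys = toPixelY(column(rows, PRICE_COL), 400, 50);
        for (float y : ys) System.out.println(y);
    }
}
```

With a canvas height of 400 and a margin of 50, the three sample prices land at y = 350, 50, and 200. Nothing in the code assumes the 20 to 30 range, so a different data set would rescale itself.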

Beyond these more technical requirements, feel free to experiment. Many of our examples use a rather "clinical" style. Feel free to be more artistic and creative. For example, if using data about urban development, rather than a bar chart that uses rectangles for bars, feel free to use an image of a skyscraper, where the height of the skyscraper varies to reflect the height of a "bar".


Procuring a Data Set

If you have a particular idea in mind, you can probably search directly for data sets that pertain to that domain. You may benefit from finding structured data sets that will be easy to load, such as looking for 'csv' files (comma-separated values), or 'tsv' (tab-separated values).
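To show why such structured formats are easy to work with, here is a minimal plain-Java sketch that splits one line of a csv or tsv file into fields. The class and method names are hypothetical, and choosing the delimiter from the file extension is an assumption made for illustration only.

```java
// Plain-Java sketch of splitting delimited text. Every name below is hypothetical.
class DelimitedSketch {
    // Assumption: files ending in .tsv are tab-separated; anything else is comma-separated.
    static char delimiterFor(String filename) {
        return filename.toLowerCase().endsWith(".tsv") ? '\t' : ',';
    }

    // Split a line on the delimiter. The -1 limit keeps empty trailing fields,
    // so every row of a well-formed file yields the same number of columns.
    static String[] splitFields(String line, char delimiter) {
        String pattern = java.util.regex.Pattern.quote(String.valueOf(delimiter));
        return line.split(pattern, -1);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("MO\t63103\t42", delimiterFor("data.tsv"));
        for (String f : fields) System.out.println(f);
    }
}
```

This simple split does not handle quoted fields that contain the delimiter; for data sets like that, a library loader such as loadTable is likely the better choice.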

If you want to browse a wide variety of data sets, here are some valuable resources:

Depending on the format of your data set, you may need to do some extra reading on Processing's data-loading functions (e.g., loadStrings, loadTable, loadXML, loadJSONArray).

If you end up with a data set and wish to visualize it relative to a U.S. map, please see the Processing documentation on loadShape and getChild. You may find existing SVG files for various maps, such as a blank US Map at commons.wikimedia.org/wiki/File:Blank_US_Map.svg


Grading Standards

The assignment is worth 70 points, which will be assessed as follows:

In addition to the above points, we will happily award extra credit points to those who go significantly beyond the minimal project requirements.


Submitting Your Assignment

Please see the details of the submission process on the general programming web page, as well as the discussion of the late policy.

For this project, we ask that you submit the following artifacts. If you prefer, you are welcome to create a single zip file of all materials and submit that zip file.


Michael Goldwasser
Last modified: Sunday, 10 January 2016