Saint Louis University

Computer Science 144
Introduction to Computer Science: Multimedia

Michael Goldwasser

Spring 2015

Dept. of Math & Computer Science

Case Study: Word Clouds

For this case study, we use the Word Cloud application from chapter 7 of our textbook; however, we have chosen to redesign the codebase in a more direct, procedural style.

Our source code is available as wordCloud.pde (or in printable form as wordCloud.pdf). A zip that includes our input files and stop words is available as wordCloud.zip. Some sample images produced by the software are included below.

Sample images: Peter Pan, Gettysburg Address, Obama speech.

Future Improvements (in-class activity)

Starting with our original codebase, make any or all of the following changes:

  1. Alter the random placement rule so that the entire word appears on the screen (not just its chosen anchor point).

  2. For any of the codebase's intersection-avoidance rules, require a fixed amount of clearance (measured in pixels) to ensure nonzero separation between the placements of two words.

  3. Repeat the previous challenge, but instead of requiring a fixed amount of separation between any two words, require that the separation be at least some percentage of the size of the smaller of the two bounding boxes. (This rule can help separate some of the bigger words from each other, while allowing smaller words to fill the gaps later.)

  4. To introduce some variance to the spiral layout, start by picking random non-intersecting locations for the first handful of words, and then use the spiral rule to lay out the rest.

  5. With probability 0.5, rotate a word 90 degrees before rendering. (Note that with a 90-degree rotation, we can still use a bounding box to avoid intersections.)

  6. Introduce a more meaningful use of colors, based either on word frequencies or on the chosen geometry. For example, you could use another image as a model, choosing each word's color by sampling that image at the word's (x,y) location.

  7. Remove our dependence on the kludgy "scale" global for picking font sizes. Instead, compute an entire spiral layout at some trial scale, determine the bounding box of the overall diagram, and then rerender the spiral with an automatically computed scale so that it fits the actual sketch dimensions.
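The geometry behind exercises 1, 2, 3, and 7 can be sketched as follows. It is written as plain Java (Processing sketches are compiled to Java) so it runs outside a sketch, and it assumes each word is described by an axis-aligned bounding box given by its top-left corner, width, and height. The names Box, randomPlacement, separated, clearOfFixed, clearOfRelative, union, and autoScale are hypothetical and do not come from wordCloud.pde. For exercise 3 we assume that a box's "size" is its height, since height tracks font size.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helpers for exercises 1, 2, 3, and 7; none of these names
// come from wordCloud.pde. Boxes are axis-aligned; (x, y) is the top-left corner.
public class PlacementSketch {

    public record Box(double x, double y, double w, double h) {
        public double right()  { return x + w; }
        public double bottom() { return y + h; }
    }

    // Exercise 1: pick a top-left corner so the whole w-by-h box lies on the canvas.
    // If the word is wider (or taller) than the canvas, it is pinned at 0 on that axis.
    public static Box randomPlacement(Random rng, double w, double h,
                                      double canvasW, double canvasH) {
        double x = rng.nextDouble() * Math.max(0, canvasW - w);
        double y = rng.nextDouble() * Math.max(0, canvasH - h);
        return new Box(x, y, w, h);
    }

    // Two boxes are separated by at least 'gap' pixels if there is a horizontal
    // or vertical gap of that width between them (gap = 0 allows touching).
    public static boolean separated(Box a, Box b, double gap) {
        return a.right() + gap <= b.x() || b.right() + gap <= a.x()
            || a.bottom() + gap <= b.y() || b.bottom() + gap <= a.y();
    }

    // Exercise 2: a fixed clearance, in pixels, from every placed word.
    public static boolean clearOfFixed(Box cand, List<Box> placed, double gap) {
        for (Box p : placed) {
            if (!separated(cand, p, gap)) return false;
        }
        return true;
    }

    // Exercise 3: the clearance is a fraction of the smaller box's size,
    // measured here (an assumption) by height.
    public static boolean clearOfRelative(Box cand, List<Box> placed, double fraction) {
        for (Box p : placed) {
            double gap = fraction * Math.min(cand.h(), p.h());
            if (!separated(cand, p, gap)) return false;
        }
        return true;
    }

    // Bounding box of a non-empty layout.
    public static Box union(List<Box> boxes) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (Box b : boxes) {
            minX = Math.min(minX, b.x());
            minY = Math.min(minY, b.y());
            maxX = Math.max(maxX, b.right());
            maxY = Math.max(maxY, b.bottom());
        }
        return new Box(minX, minY, maxX - minX, maxY - minY);
    }

    // Exercise 7: factor by which to multiply the trial scale so that the
    // whole trial layout fits the canvas, preserving its aspect ratio.
    public static double autoScale(List<Box> trialLayout, double canvasW, double canvasH) {
        Box all = union(trialLayout);
        return Math.min(canvasW / all.w(), canvasH / all.h());
    }

    public static void main(String[] args) {
        Random rng = new Random(144);
        Box first = randomPlacement(rng, 120, 40, 800, 600);
        Box second = randomPlacement(rng, 60, 20, 800, 600);
        System.out.println(first + " / " + second
            + " clear by 5px: " + clearOfFixed(second, List.of(first), 5));
    }
}
```

Using the smaller of the two heights in clearOfRelative means two large words keep a wide gap, while a small word may sit close to a large one, which matches the intent of exercise 3. Since spiral positions depend on the word sizes, the layout presumably has to be recomputed (not merely stretched) after applying the factor from autoScale.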

For further improvements, we wish to do more pixel-level examination of how text is rendered in the given font (rather than relying on broad bounding-box approximations of rendered text). For that reason, we offer a new codebase (version 2.0) that builds a separate PGraphics object for each word, which can then be examined.

Our source code is available as wordCloud2.pde, or as wordCloud2.zip, which includes our input files and stop words.

  1. Examine the PGraphics object to compute a tighter bounding box for the actual rendering of each word (rather than relying on the generic ascent and descent from the font metrics).

  2. Rather than using bounding boxes to avoid intersections, do a more refined pixel-by-pixel examination of whether any pixels in a candidate placement for a word intersect pixels that have previously been used by other words.

  3. The overall scale factor in the current implementation is something of a kludge. The problem is that until we complete a layout such as the spiral, we don't yet know precisely how big the overall image will be. To have the figure scale automatically to the canvas size, consider doing one preliminary layout on an offscreen PGraphics object, computing the bounding box of the used area, and then computing a scale factor for the font sizes that can be used to redo the final layout.

  4. Experiment with other algorithms (beyond random and spiral) for laying out the text.

  5. Introduce arbitrary angles of rotation.

  6. Adapt the random placement so that the overall diagram approximates an arbitrary (polygonal) shape. For example, you could get a pyramid shape by choosing random locations but rejecting those that fall outside the shape.

  7. Think of something else to try...
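For exercises 1 and 2 of version 2.0, one possible approach is sketched below. It treats each rendered word as a boolean mask (true where the word has ink) and the canvas as an occupancy grid of the same kind. In Processing, such a mask could plausibly be filled by calling loadPixels() on the word's PGraphics and comparing each entry of its pixels array with the background color; that step is omitted here. The class and method names are hypothetical, not taken from wordCloud2.pde.

```java
// Hypothetical pixel-level helpers for version 2.0 exercises 1 and 2.
// A mask is indexed mask[y][x]; true means "this pixel holds ink".
public class PixelSketch {

    // Exercise 1: tight bounds {minX, minY, maxX, maxY} of the inked pixels,
    // or null if the mask is blank.
    public static int[] tightBounds(boolean[][] mask) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < mask.length; y++) {
            for (int x = 0; x < mask[y].length; x++) {
                if (mask[y][x]) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        return maxX < 0 ? null : new int[] {minX, minY, maxX, maxY};
    }

    // Exercise 2: would drawing 'word' with its top-left at (ox, oy) hit a pixel
    // already used by another word? Ink falling off the canvas also counts as a
    // collision, so accepted words stay fully visible.
    public static boolean collides(boolean[][] used, boolean[][] word, int ox, int oy) {
        for (int y = 0; y < word.length; y++) {
            for (int x = 0; x < word[y].length; x++) {
                if (!word[y][x]) continue;
                int cx = ox + x, cy = oy + y;
                if (cy < 0 || cy >= used.length || cx < 0 || cx >= used[cy].length) return true;
                if (used[cy][cx]) return true;
            }
        }
        return false;
    }

    // Record an accepted placement in the occupancy grid (assumes collides() was false).
    public static void stamp(boolean[][] used, boolean[][] word, int ox, int oy) {
        for (int y = 0; y < word.length; y++) {
            for (int x = 0; x < word[y].length; x++) {
                if (word[y][x]) used[oy + y][ox + x] = true;
            }
        }
    }
}
```

A placement loop (random or spiral) would propose candidate offsets, call collides, and stamp the first candidate that passes. Because the test visits every inked pixel, it is slower than a bounding-box check; a cheap box test could be run first to reject obvious overlaps.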


Michael Goldwasser
CSCI 144, Spring 2015
Last modified: Monday, 30 March 2015