Steganography

Computer Science 1050
Introduction to Computer Science: Multimedia

Human Perception

Because we use 8 bits for each color channel in images, allowing the red, green, and blue components each to be expressed as a number from 0 to 255, there are more than 16 million distinct colors that can be expressed (256 × 256 × 256 = 16,777,216). In reality, human perception is not likely to be strong enough to differentiate between all of them. In particular, the effective difference between, say, a red value of 131 and a red value of 132 is negligible.

Lossy Compression

Some common image file formats, such as JPEG, rely on what is known as lossy compression, employing techniques that shrink the overall size of files by carefully trading away some of the accuracy of individual pixels' color values. The standards allow the level of tradeoff between compression and quality to be tuned. Given the limitations of human perception, this is often a reasonable tradeoff to make in the interest of reducing the size of stored files.

Steganography

Another way we can take advantage of the gap between the precision of a saved image and the inherent imprecision of human perception is to encode secret information within the original image. This practice is known as steganography (Greek for "covered writing"), and is described further on Wikipedia and in an article by fellow SLU faculty member Bryan Clair. It is closely related to the concept of digital watermarking, in which media files are altered to add a distinctive "signature" that can later be used to recognize or authenticate a copy or derivative work.

Implementation

To use steganography with image files, we must begin by using an image format, such as PNG, that is guaranteed to maintain a lossless representation of the information. That is, if we set a particular pixel's red component to 157, we need to be sure that when the file is saved and reloaded, the pixel value will still be recorded as 157. We can then make subtle changes to the precise color choices to encode additional information. As a simple rule, we can let each color component of each pixel represent one bit of information by adjusting the component to be an even number to represent bit value 0 or an odd number to represent bit value 1, which changes each component by at most 1. In this way, a 1024x768 picture can already encode more than 2 million bits of other information (1024 × 768 pixels × 3 components = 2,359,296 bits), and we could get even more if we were willing to allow greater degradation of quality (not to mention use of the alpha channel).
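
To make the even/odd rule concrete, here is a minimal sketch in plain Java (Processing is built on Java, so the method bodies work unchanged inside a sketch; the class wrapper is only needed outside Processing). The helper names embedBit and extractBit are our own, not part of Processing. Subtracting component % 2 rounds the component down to an even number, and adding the bit makes it odd exactly when the bit is 1, so the component changes by at most 1 and stays within 0 to 255.

```java
// Even/odd convention: an even component hides bit 0, an odd one hides bit 1.
class EmbedBit {

    // Returns the component (0-255) adjusted so that result % 2 == bit.
    static int embedBit(int component, int bit) {
        return component - component % 2 + bit; // round down to even, then add the bit
    }

    // Reads the hidden bit back out: even -> 0, odd -> 1.
    static int extractBit(int component) {
        return component % 2;
    }

    public static void main(String[] args) {
        System.out.println(embedBit(157, 0)); // 156
        System.out.println(embedBit(157, 1)); // 157 (already odd, unchanged)
        System.out.println(embedBit(132, 1)); // 133
        // Capacity of a 1024x768 image using one bit per r, g, b component.
        System.out.println(1024 * 768 * 3);   // 2359296
    }
}
```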

Once you have a mechanism for encoding bits of information "hidden" within an image, those bits can be used to encode any form of digital information. They might represent another image, an ASCII message, numbers, audio, or really any type of file that can be stored digitally (so long as you have enough hidden bits available). Of course, someone who wishes to decode the hidden information must understand precisely what conventions were used to encode it.

In the remainder of this page, we provide a series of challenges that ask you to pull hidden information out of images using a variety of conventions. In order to do so, you will want to know about a few convenient programming techniques in Processing.

Reading low-order bits using the remainder (modulus) operator
Numbers stored on a computer are already stored as a collection of bits. In particular, nonnegative integers are stored using a binary (base-two) representation. While we are used to decimal (base-ten) representation, in which there is a "one's digit", a "ten's digit", a "hundred's digit", and so on, a binary system uses powers of two as the place values (a "one's bit", a "two's bit", a "four's bit", an "eight's bit", and so on). The values from 0 to 255 used to represent individual color components in a typical RGB representation are each 8-bit values.
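
As a worked example, 173 = 128 + 32 + 8 + 4 + 1, so its 8-bit binary form is 10101101. The plain-Java sketch below (toBinary is a hypothetical helper name) produces those digits by repeatedly taking the remainder mod 2 with the % operator discussed below, which gives the current one's bit, and then dividing by 2, which shifts the remaining bits one place to the right. Java's built-in Integer.toBinaryString gives the same digits, minus any leading zeros.

```java
// Builds the binary digits of a nonnegative integer, one bit at a time.
class BinaryDigits {

    // Returns exactly `width` binary digits, most significant first
    // (width 8 suits color components in the range 0-255).
    static String toBinary(int value, int width) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < width; i++) {
            digits.insert(0, value % 2); // prepend the current one's bit
            value = value / 2;           // move the next bit into the one's place
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary(173, 8)); // 10101101 (128+32+8+4+1)
        System.out.println(toBinary(172, 8)); // 10101100
        System.out.println(toBinary(255, 8)); // 11111111
    }
}
```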

The lowest-order bit is the "one's bit". For example, the number 173 has a binary representation that ends with 1, while the number 172 has a binary representation that ends with 0. In fact, every odd number ends with a 1 and every even number ends with a 0. So a simple way to test the last bit is to determine whether an integer is odd or even. In Processing, this is commonly done with the % operator, which computes the remainder of a division. The expression k % 2 divides k by 2 and returns the remainder (which, for nonnegative k, will be either 0 or 1).

Several low-order bits can be retrieved at once by taking the remainder after dividing by a larger power of two. For example, the expression k % 4 results in a remainder of 0, 1, 2, or 3, and those values correspond to the lowest-order pair of bits in the original number (respectively, 00, 01, 10, or 11).
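
The following plain-Java sketch checks both claims: k % 2 gives the lowest bit, and k % 4 gives the lowest pair of bits, which can be split further into the two's bit (pair / 2) and the one's bit (pair % 2). The helper lowBits, which generalizes this to k % 2^n, is a hypothetical name of ours; it assumes k is nonnegative, as color values are.

```java
// Reads low-order bits with the remainder operator.
class LowBits {

    // The lowest n bits of k as a number from 0 to 2^n - 1 (requires k >= 0).
    static int lowBits(int k, int n) {
        int powerOfTwo = 1;
        for (int i = 0; i < n; i++) {
            powerOfTwo *= 2; // 2, 4, 8, ...
        }
        return k % powerOfTwo;
    }

    public static void main(String[] args) {
        System.out.println(173 % 2);         // 1: 173 is odd
        System.out.println(172 % 2);         // 0: 172 is even
        System.out.println(lowBits(173, 2)); // 1: 173 ends in binary 01
        System.out.println(lowBits(174, 2)); // 2: 174 ends in binary 10
        int pair = lowBits(174, 2);
        // The two's bit, then the one's bit, of that pair.
        System.out.println(pair / 2 + " " + pair % 2); // 1 0
    }
}
```
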
Creating numbers from individual bits
When you retrieve individual bits of information, it is often necessary to recombine them into larger chunks of information. A common approach is to use an integer to represent a sequence of bits. For example, if you have a process that produces the eight bits {0, 1, 1, 0, 0, 0, 1, 0}, you may wish to recombine those into an 8-bit integer (in this example, the value 98, since binary 01100010 is 64 + 32 + 2).

There are several ways to accomplish this, but one is to build the integer value one bit at a time, as follows. If you keep an integer variable tally, initialized to 0, then you can append one additional bit b to the right side of the value using the assignment
```
tally = 2 * tally + b;
```
For example, if you previously had the number represented by bits 1101, which is binary for the decimal value 13, then applying the assignment above with b = 1 produces the bits 11011, which is the value 27, because 2*13 + 1 = 27 (doubling shifts the existing bits one place to the left, and the new bit fills the rightmost place).
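
Repeating that assignment once per bit rebuilds a whole number. This plain-Java sketch (compose is a hypothetical helper name) feeds in the eight bits from the earlier example, most significant first, and recovers 98; feeding in 1, 1, 0, 1, 1 retraces the step from 13 to 27.

```java
// Recombines a sequence of bits (most significant first) into an integer.
class ComposeBits {

    static int compose(int[] bits) {
        int tally = 0;
        for (int b : bits) {
            tally = 2 * tally + b; // shift earlier bits left, append b on the right
        }
        return tally;
    }

    public static void main(String[] args) {
        System.out.println(compose(new int[] {0, 1, 1, 0, 0, 0, 1, 0})); // 98
        System.out.println(compose(new int[] {1, 1, 0, 1, 1}));          // 27
    }
}
```
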
Converting between numbers and text characters
The char data type in Processing represents a single character of text, and for ordinary English text its numeric value follows the ASCII encoding scheme. ASCII is an acronym for the American Standard Code for Information Interchange, and it was developed in the 1960s so that different computers could use a common binary representation for characters. ASCII itself assigns codes 0 through 127 (7 bits), but each character is typically stored in an 8-bit byte, and we can think of those same 8 bits as a number from 0 to 255 (as we do with color values). (Strictly speaking, Processing, like Java, stores a char as a 16-bit Unicode value, but Unicode assigns the same numbers as ASCII to those characters.) What ASCII defines is simply which character is assigned to which bit pattern. For example, the character "Z" is associated with the 8-bit pattern 01011010, which we also associate with the decimal value 90. Lowercase "z" is assigned decimal value 122.

Conversion between 8-bit integers and ASCII characters is easy in Processing (as it already performs these conversions internally). In particular, if you declare a variable as a char, you can assign it the character associated with an integer value, using syntax such as:
```
char c = char(122);    // will be 'z'
```
Similarly, if you have a character and store that character in an integer variable, its ASCII code value is what gets stored.
```
int j = 'z';    // will store 122
```
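
Outside Processing, plain Java has no char(...) function; the equivalent is a cast, (char) 122. This sketch (decode is a hypothetical helper name) shows both directions and then turns a short list of codes back into text, which is the last step of decoding a hidden message.

```java
// Converting between character codes and characters in plain Java.
class CharCodes {

    // Turns a sequence of character codes back into text.
    static String decode(int[] codes) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            text.append((char) code); // same effect as char(code) in Processing
        }
        return text.toString();
    }

    public static void main(String[] args) {
        char c = (char) 122; // 'z'
        int j = 'z';         // the code of 'z' is stored: 122
        int z = 'Z';         // 90, binary 01011010
        System.out.println(c + " " + j + " " + z);          // z 122 90
        System.out.println(decode(new int[] {83, 76, 85})); // SLU
    }
}
```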

In-class Challenges

Hidden Bitmaps
In our first challenge, we have hidden 3 secret bitmaps within the following 1024x768 pixel image. One image is hidden in the least-significant bit of the red channel, another in the least-significant bit of the green channel, and a third in the least-significant bit of the blue channel.
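
One possible way to reveal a single hidden bitmap is sketched below in plain Java. It works on an int array of packed ARGB colors, which is how Processing stores an image's pixels[] array after loadPixels(): red sits in bits 16 to 23, green in bits 8 to 15, and blue in bits 0 to 7. Drawing a 1 bit as white and a 0 bit as black is our assumption (the challenge does not specify it; swapping the two only inverts the picture). The names reveal and describe are hypothetical. In a sketch, you could copy the returned array into the pixels[] of an image made with createImage(1024, 768, RGB) and call updatePixels() to display it.

```java
// Shows a hidden bitmap stored in the least-significant bit of one channel.
class HiddenBitmap {

    static final int BLACK = 0xFF000000; // opaque black, packed ARGB
    static final int WHITE = 0xFFFFFFFF; // opaque white, packed ARGB

    // shift picks the channel inside a packed ARGB pixel:
    // 16 = red, 8 = green, 0 = blue.
    static int[] reveal(int[] pixels, int shift) {
        int[] bitmap = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int component = (pixels[i] >> shift) & 0xFF; // value 0-255
            // ASSUMPTION: a hidden bit of 1 (odd component) is drawn white.
            bitmap[i] = (component % 2 == 1) ? WHITE : BLACK;
        }
        return bitmap;
    }

    // Text summary for checking small examples: W = white, B = black.
    static String describe(int[] bitmap) {
        StringBuilder s = new StringBuilder();
        for (int p : bitmap) {
            s.append(p == WHITE ? 'W' : 'B');
        }
        return s.toString();
    }

    public static void main(String[] args) {
        // Two made-up pixels: (r, g, b) = (157, 200, 10) and (156, 201, 11).
        int[] pixels = {0xFF9DC80A, 0xFF9CC90B};
        System.out.println(describe(reveal(pixels, 16))); // WB (red)
        System.out.println(describe(reveal(pixels, 8)));  // BW (green)
        System.out.println(describe(reveal(pixels, 0)));  // BW (blue)
    }
}
```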

Hidden Color Image
In our second challenge, we have hidden a single color picture of the same size. Our mechanism was to use the two least-significant bits of each color channel of each pixel. A single pair of such bits represents a number 0, 1, 2, or 3. For each color channel, we map those values to actual color intensities of 0, 85, 170, and 255, respectively.
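
Because the four values 0, 85, 170, 255 are just 0, 1, 2, 3 multiplied by 85, decoding a component reduces to (component % 4) * 85. The plain-Java sketch below applies that to all three channels of one packed ARGB pixel (as stored in Processing's pixels[]); decodeComponent and decodePixel are hypothetical helper names, and making the result fully opaque is our choice.

```java
// Decodes a color picture hidden in the two low bits of each channel.
class HiddenColor {

    // Maps the two low bits of a component (0-3) to 0, 85, 170 or 255.
    static int decodeComponent(int component) {
        return (component % 4) * 85;
    }

    // Decodes one packed ARGB pixel into the hidden packed color.
    static int decodePixel(int pixel) {
        int r = decodeComponent((pixel >> 16) & 0xFF);
        int g = decodeComponent((pixel >> 8) & 0xFF);
        int b = decodeComponent(pixel & 0xFF);
        return 0xFF000000 | (r << 16) | (g << 8) | b; // opaque hidden color
    }

    public static void main(String[] args) {
        System.out.println(decodeComponent(156)); // 156 % 4 = 0 -> 0
        System.out.println(decodeComponent(157)); // 1 -> 85
        System.out.println(decodeComponent(158)); // 2 -> 170
        System.out.println(decodeComponent(159)); // 3 -> 255
    }
}
```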

Hidden Text
In our third challenge, we have hidden the full text of a Shakespearean play. Our mechanism is as follows. We consider the pixels of the image in row-major order, that is, going row by row starting at the top, and within each row from left to right. For each pixel, we consider the color components in the order (r, g, b). We have hidden our text using the least-significant bit of each color component (we did not use the alpha channel). If you take the single bit from the red, followed by green, followed by blue, and then continue in this fashion for the next pixel, you get a very long stream of bits. If you then break that stream of bits into 8-bit chunks, each 8-bit chunk represents an ASCII character. Finally, note that the play ends after 131,854 characters (we filled the rest of the image in a similar way with space characters).
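
The decoding procedure just described can be sketched as follows in plain Java. Processing's pixels[] array is already in row-major order, so in a sketch you could call loadPixels() on the loaded image and pass its pixels[] with maxChars = 131854. The bit order inside each 8-bit chunk (first bit most significant, matching tally = 2 * tally + b) is our assumption, since the description above does not state it. The method hide is a hypothetical helper that exists only to produce test input.

```java
// Recovers hidden text from packed ARGB pixels listed in row-major order.
class HiddenText {

    // Reads one bit from each of r, g, b of pixel 0, then pixel 1, and so on,
    // packing every 8 bits into a character. ASSUMPTION: the first bit of each
    // 8-bit chunk is its most significant bit. Stops after maxChars characters.
    static String reveal(int[] pixels, int maxChars) {
        StringBuilder text = new StringBuilder();
        int tally = 0;    // the character currently being assembled
        int bitCount = 0; // how many bits tally holds so far
        for (int p : pixels) {
            int[] components = {(p >> 16) & 0xFF, (p >> 8) & 0xFF, p & 0xFF};
            for (int c : components) {
                tally = 2 * tally + c % 2; // append this component's low bit
                bitCount++;
                if (bitCount == 8) {
                    text.append((char) tally);
                    if (text.length() >= maxChars) {
                        return text.toString();
                    }
                    tally = 0;
                    bitCount = 0;
                }
            }
        }
        return text.toString(); // ran out of pixels before maxChars
    }

    // HYPOTHETICAL inverse, used only to build test data: hides message in a
    // gray image (all components 128 or 129), padding with spaces.
    static int[] hide(String message, int pixelCount) {
        int[] pixels = new int[pixelCount];
        int bitIndex = 0;
        for (int i = 0; i < pixelCount; i++) {
            int[] components = new int[3];
            for (int k = 0; k < 3; k++) {
                int charIndex = bitIndex / 8;
                char ch = charIndex < message.length() ? message.charAt(charIndex) : ' ';
                int bit = (ch >> (7 - bitIndex % 8)) & 1; // most significant bit first
                components[k] = 128 + bit;
                bitIndex++;
            }
            pixels[i] = 0xFF000000 | (components[0] << 16) | (components[1] << 8) | components[2];
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] pixels = hide("Hi!", 10);
        System.out.println(reveal(pixels, 3)); // Hi!
    }
}
```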

Note: printing large amounts of text to the Processing console might be too taxing for the software. It is better to write the output to a file (although please also make sure that you don't have an infinite loop that writes to a file). In Processing, there is a PrintWriter class that you may declare and initialize as

```
PrintWriter result = createWriter("sample.txt");
```

Then you can send output to the file sample.txt one character at a time using a syntax

```
result.print(ch);
```

where ch is a char variable. When you are done, call result.close(); to close the file, which also ensures that everything printed is actually written to it.
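
For reference outside Processing, here is the same pattern in plain Java. Processing's createWriter returns a java.io.PrintWriter, so the print and close calls are the same; this standalone version opens the file with the PrintWriter(String) constructor and uses try-with-resources so the file is closed even if an error occurs. writeChars is a hypothetical helper name.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

// Writes text to a file one character at a time.
class WriteText {

    static void writeChars(String text, String fileName) throws FileNotFoundException {
        try (PrintWriter result = new PrintWriter(fileName)) {
            for (int i = 0; i < text.length(); i++) {
                char ch = text.charAt(i);
                result.print(ch); // same call as in the Processing version
            }
        } // result.close() runs automatically here, flushing buffered output
    }

    public static void main(String[] args) throws FileNotFoundException {
        writeChars("To be, or not to be", "sample.txt");
    }
}
```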

Michael Goldwasser

CSCI 1050, Spring 2016
Last modified: Wednesday, 20 April 2016

Saint Louis University

Dept. of Math & Computer Science
