Computer Science 180
Data Structures

Programming Assignment 04

Due: Monday, 5 March 2007, 8pm

Please make sure you adhere to the policies on academic integrity.

Please see the general programming webpage for details about the programming environment for this course, guidelines for programming style, and details on electronic submission of assignments.

The files you may need for this assignment can be downloaded here.

Collaboration Policy

For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

Please make sure you adhere to the policies on academic integrity in this regard.

Overview

Our goal for this assignment is to gain experience with the concept of a linked list while working outside of the more complete list abstraction provided by STL. Chapter 4.5 of our text introduces the concept of a singly-linked and doubly-linked list, portrayed based on the concept of a node structure. Typically, each node is implemented as its own structure and stored independently somewhere in memory (where the system handles the memory management).

We will take a very low-level approach to linked lists, based upon the presentation in which they are embedded directly into the contents of an underlying array in memory. For singly-linked lists, this method was demonstrated by an Applet in which each "node" of the list was represented by two cells of the array, the "element" and the "next" reference. The same idea can be used to represent a doubly-linked list, but using three consecutive array cells for each "node" of the list (previous pointer, element, next pointer).

The precise convention for ordering those three pieces of information is up to the designer, but must remain consistent. In the following example, we chose to order the three as (prevPtr, element, nextPtr), and our "pointers" are specifically the index of the corresponding element cell. With a singly-linked list, it was imperative that we have external knowledge of where the list starts ("the head"). For a doubly-linked list, we typically want to have that information as well as the corresponding "tail" of the list. In the following example, the head element is at cell 7 and the tail at cell 10. That said, here is a dump of memory contents for a sample list.

Array Index	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
Array Contents	13	S	10	18	X	18	18	E	13	1	Y	18	7	A	1	18	X	18

The contents of the list, when displayed naturally as a sequence, appear as:

E <--> A <--> S <--> Y

Notice that there are some portions of the memory which are not currently being used within this list. Also, you will notice that for this example we used the value 18 as a "NULL" pointer, namely because that was clearly identifiable as an invalid index for an array of this size.
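To make the convention concrete, here is a minimal sketch that walks the sample memory in both directions. The names NIL, walkForward, walkBackward and sampleMemory are hypothetical helpers introduced only for this illustration; they are not part of the editor class. The sketch assumes the (prevPtr, element, nextPtr) layout described above, so the "next" cell sits at element index + 1 and the "prev" cell at element index - 1.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumption for this sketch: 18 plays the role of the "NULL" index,
// exactly as in the sample dump (it is one past the last valid cell).
const std::size_t NIL = 18;

// The sample memory from the dump, with characters stored as size_t values.
std::vector<std::size_t> sampleMemory() {
    return {13, 'S', 10, 18, 'X', 18, 18, 'E', 13,
            1, 'Y', 18, 7, 'A', 1, 18, 'X', 18};
}

// Follow "next" cells (element index + 1) starting from the head element.
std::string walkForward(const std::vector<std::size_t>& memory, std::size_t head) {
    std::string text;
    for (std::size_t cur = head; cur != NIL; cur = memory.at(cur + 1)) {
        text += char(memory.at(cur));  // the element cell holds the character
    }
    return text;
}

// Follow "prev" cells (element index - 1) starting from the tail element.
std::string walkBackward(const std::vector<std::size_t>& memory, std::size_t tail) {
    std::string text;
    for (std::size_t cur = tail; cur != NIL; cur = memory.at(cur - 1)) {
        text += char(memory.at(cur));
    }
    return text;
}
```

Starting from head cell 7, walkForward visits cells 7, 13, 1 and 10 in that order; starting from tail cell 10, walkBackward visits the same cells in reverse.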

This assignment provides us with an opportunity to learn a larger lesson about memory management within programming languages. One of the benefits of using a linked list data structure, versus an array-based data structure, is that there is no a priori capacity constraint placed on the data structure. Of course, there really is some eventual limit on available memory for any true computer system (though perhaps it is quite large).

Memory for a system can be viewed as one large, consecutive array of memory cells. When programming in C++, or most other high-level languages, the system takes care of most of the details of managing this memory. For example, when a program issues a command such as new Node(), the system determines the available memory cells which can be used for storing that node. For this programming assignment, you will be doing the low-level memory management. We will have you allocate an (admittedly small) array of memory at the outset, and pretend that this is the full extent of the memory which may be used. You will write code for maintaining a doubly-linked list directly within an array of memory cells.

A Text Editor

As a sample application of linked lists, you will be implementing an editor class which serves as the internal data structure that might support a very basic text editor. At a high level, the state of the editor consists of a sequence of characters and a particular position within that sequence which we denote as a "cursor". Most of the public behaviors either move the cursor or edit the text at or near the cursor (or both). To be consistent with editors (and with C++'s conventions), there is a position for each character of the actual text as well as one symbolic position, denoted as "end", which represents the position after the final actual character.

Brief documentation on the editor class design follows below (more detailed documentation is on a separate page).

Public Member Functions
	editor (size_t maxText=10)
	constructs a new editor instance.
void	forward ()
	Advances the cursor, if not already at the end.
void	backward ()
	Moves the cursor backward, if not already at the beginning.
void	begin ()
	Moves cursor to the beginning.
void	end ()
	Moves cursor to the end.
void	assign (char c)
	Reassigns character at current cursor position.
void	insert (char c)
	Inserts a new character immediately BEFORE the current cursor position.
void	erase ()
	Erases the character at the current cursor position (if any).
std::string	toString () const
	Generates a string based on the implicit contents of the editor.
void	rawDump (std::ostream &out) const
	Create an appropriate vertical dump of your entire memory.
Private Attributes
std::vector< size_t >	memory
	We will use a vector of size_t objects as our model of memory.
size_t	head
	The head of the linked list.
size_t	tail
	The tail of the linked list.
size_t	cursor
	The cursor position within the text.

Our Advice

Using a vector to represent memory.

Though we could use a raw array to represent memory for this assignment, we strongly suggest the use of a vector. We still want to have a fixed capacity, which should be initialized at the time the editor is constructed.

The advantage of using a vector is that it performs error-checking on the index when you rely upon the memory.at(i) method for access rather than indexing as memory[i]. This will be quite helpful for this assignment, as there is great risk of accidentally using an index which is not actually part of the dedicated memory.
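As a small illustration of that error-checking, the sketch below reports whether a given index is rejected by at(). The helper name isRejected is hypothetical. The only library behavior relied upon is that std::vector::at throws std::out_of_range for an invalid index, whereas memory[i] performs no such check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading index i through at() is rejected as out of range.
bool isRejected(const std::vector<std::size_t>& memory, std::size_t i) {
    try {
        (void)memory.at(i);  // checked access: throws for an invalid index
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With an 18-cell memory, index 17 is accepted and index 18 (the "NULL" value from the earlier example) is rejected, which is exactly the kind of mistake you want reported rather than silently ignored.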

Also, you will notice that we are declaring a vector with size_t as the data type for elements of the vector. In our earlier example, we used indices into memory to represent the "pointers" from node to node, yet we used characters to represent the data. In our implementation, we are taking advantage of the flexibility in converting between those underlying types. If you are using a memory location which you intend to treat as an index, you may simply use statements such as

memory.at(i) = j;

If you want to store a character into a given memory cell, you may simply use the syntax

memory.at(i) = c;

However when you retrieve a value from memory, the presumption is that it is a size_t value. So if you were to write:

cout << memory.at(i);

it would be displayed as a number. If you really want to use it as a character, for example when printing it or when adding it to a string, you must explicitly cast it to a char using a syntax such as

cout << char(memory.at(i));

Be careful near the extremes.

Section 4.5 of the book gives initial examples of the general principle when inserting into or deleting from a doubly-linked list. Those examples show what we think of as the "typical" case, when the change is being made to the interior part of a presumably long list.

It turns out that the trickiest cases often arise when manipulations take place very near the head or tail of the list. The authors later revisit those cases in Section 4.7, but even then they somewhat sidestep the issue by relying upon separate methods such as push_back and pop_back when recognizing such a case.

When writing your own code, think very carefully about cases that cause the head or tail of the list to change.
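One way to see where the head and tail can change is to look at the unlinking step in isolation. The following is only a sketch under stated assumptions: the struct ListState and the function unlinkNode are hypothetical names, the layout is (prevPtr, element, nextPtr), and the value stored in nil is whatever "NULL" index you have chosen. It is not the required design for your editor class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bundle of the state needed for this illustration.
struct ListState {
    std::vector<std::size_t> memory;
    std::size_t head;  // element index of the first node, or nil
    std::size_t tail;  // element index of the last node, or nil
    std::size_t nil;   // the chosen "NULL" index
};

// Unlink the node whose element cell is at index `node`.
void unlinkNode(ListState& s, std::size_t node) {
    std::size_t prev = s.memory.at(node - 1);
    std::size_t next = s.memory.at(node + 1);

    if (prev == s.nil) {
        s.head = next;                 // removing the first node moves the head
    } else {
        s.memory.at(prev + 1) = next;  // typical case: bypass node going forward
    }

    if (next == s.nil) {
        s.tail = prev;                 // removing the last node moves the tail
    } else {
        s.memory.at(next - 1) = prev;  // typical case: bypass node going backward
    }

    // Clear the node's own links so it no longer refers into the list.
    s.memory.at(node - 1) = s.nil;
    s.memory.at(node + 1) = s.nil;
}
```

Note that removing the only node of a list takes both special branches, leaving head and tail equal to nil. A cursor that refers to the removed node would also need attention; that is deliberately left out of this sketch.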

Memory management.

For this assignment, a vector of the appropriate size can be requested as the editor is initialized and the contents of that vector can be set as you wish. From that point on, all of the memory management should be done by you, implicitly. That is, you should not call new or delete when you are adding or removing nodes from the list.

When the insert method is called, it is up to you to find a block of three available cells in the underlying memory. The key will be for you to differentiate between those cells which are in use and those cells which are available. One approach is to mark unused nodes with recognizable "prev" and "next" values. Then you could scan through all possible blocks looking for such a configuration (of course you must also make sure that it is not the head of a list with one node). Since our editor has a fixed capacity, you should explicitly throw a bad_alloc exception if the user attempts to add a character when the editor is already at full capacity. Note: this exception is thrown with the syntax throw bad_alloc(); unlike many other exceptions, its constructor does not accept a string message.

When deleting a node from a list, you will also want to make sure to "deallocate" it by this convention so that it will later be recognized as a vacancy.
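The scanning approach described above might look roughly like the following sketch. The function name findFreeNode is hypothetical, and the sketch assumes that a vacant node is marked by having both its "prev" and "next" cells equal to the chosen "NULL" index (passed in as nil), and that nodes occupy consecutive three-cell blocks starting at index 0.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Returns the element index of a vacant node, or throws std::bad_alloc
// if every three-cell block is in use.
std::size_t findFreeNode(const std::vector<std::size_t>& memory,
                         std::size_t head, std::size_t nil) {
    // Element cells sit at indices 1, 4, 7, ... under the assumed layout.
    for (std::size_t elem = 1; elem + 1 < memory.size(); elem += 3) {
        bool looksFree = memory.at(elem - 1) == nil &&
                         memory.at(elem + 1) == nil;
        // The sole node of a one-node list also has nil on both sides,
        // so it must be excluded explicitly.
        if (looksFree && elem != head) {
            return elem;
        }
    }
    throw std::bad_alloc();
}
```

For the sample memory shown earlier (head at cell 7, "NULL" value 18), the first vacant block found by this scan is the one whose element cell is at index 4.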

Raw dump

You will notice that we included a method called rawDump in the editor design. When something goes wrong with your code (which is quite likely during development), it may be very helpful to "look into" memory and see precisely what is there. You might then compare that view to what you expected.

The format is up to you, though we have in mind something like the following textual portrayal of the memory configuration shown earlier in this assignment description.

0: 13
1: S
2: 10
3: 18
4: X
5: 18
6: 18
7: E
8: 13
et cetera

This kind of a method is really a debugging tool. It would never be a public method in a production-quality class, but it may prove worthwhile for you and so we are requiring it (it also will help us when grading).
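Since the format is up to you, the following is just one possible sketch that produces the layout shown above. It is written as a free function taking the memory as a parameter purely so that it stands alone; in your class, rawDump is a const member function. It assumes the (prevPtr, element, nextPtr) layout, so cells whose index leaves remainder 1 when divided by 3 are printed as characters. The helper dumpToString is hypothetical and exists only to make the output easy to inspect.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Print one "index: value" line per memory cell.
void rawDump(const std::vector<std::size_t>& memory, std::ostream& out) {
    for (std::size_t i = 0; i < memory.size(); ++i) {
        out << i << ": ";
        if (i % 3 == 1) {
            out << char(memory.at(i));  // element cell: show as a character
        } else {
            out << memory.at(i);        // prev/next cell: show as an index
        }
        out << '\n';
    }
}

// Convenience wrapper that captures the dump in a string.
std::string dumpToString(const std::vector<std::size_t>& memory) {
    std::ostringstream out;
    rawDump(memory, out);
    return out.str();
}
```

One caveat with this sketch: an element cell that has never held a printable character will not display usefully, so you may prefer to initialize vacant element cells to a visible placeholder.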

Use of the Driver

test_editor is a text-based, menu-driven program for testing your editor implementation. The menu allows you to call any combination of supported behaviors.

The menu options are as follows:

(f)   forward
(b)   backward
(<)   begin
(>)   end
(a)   assign value
(i)   insert
(e)   erase
(t)   toString

(d)   raw dump

(n)   reinitialize new editor
(q)   quit

The driver will prompt for any additional information and automatically report the return value it receives from your code or any exceptions that were raised, when appropriate. By default, the program takes input from the keyboard. However the driver has an optional feature which allows it to read input from a file rather than from a keyboard. The file should have the identical characters which you would use if you were typing them yourself on the keyboard. This feature simply allows you to test and retest your program on a particular input without having to retype the input each time you want to run the program. It is also convenient at times when running with a debugger to have the input read from a file rather than from the keyboard. If the program reaches the end of the file without exiting, then it will revert back to reading input from the keyboard for the remainder of the program.

To use this feature, you must specify the exact filename (including any suffix), as a single argument at runtime. Details on how to provide such arguments are given in the general programming webpage.

Here is a sample input file, appropriately formatted (by the way, if your program is implemented correctly, the three strings printed by this sample should be "lid", "lied", "linked").

Files We Are Providing

All such files can be downloaded here, or can be copied on turing by typing

cp -Rp ~goldwasser/public/csci180/Program/editor .

Files to Submit

Source Code
Please submit your version of editor.h and editor.cpp. These are the only files which you should be modifying.
Test Input
Please submit a single file, inputfile, which can be used with the driver discussed above for testing everyone's code.

Your file may use at most 100 commands. If your input file contains more than the prescribed number of commands, we will simply truncate the test.

Though our driver supports rawDump, you may not use this as part of your official tests, because there is no requirement that everyone's internal configuration be the same; just the outward behaviors.
Readme File
For each assignment, you are to submit a separate file "readme" as specified in the general programming webpage.

Grading Standards

The assignment is worth 10 points. Eight points will be awarded based on our own evaluation of your assignment and the readme file. One additional point will be awarded fractionally based on how well your program performs on other students' test inputs. The final point will be awarded fractionally based on how well your test input fools other students' flawed programs.

Extra Credit (2 points possible)

Preface

We do not want your extra credit attempt to possibly interfere with the correctness of the required program. For this reason, the files editor.h and editor.cpp should be used only for the required portion.

If you wish to attempt either part of the extra credit, please submit separate files named editorExtra.h and editorExtra.cpp. You should copy your originals to serve as a model, and you should still name the class editor within those files. (You must, however, make sure to change your editorExtra.cpp file so that it explicitly includes editorExtra.h rather than editor.h.)

Once you have done this, you may build the extra-credit version of the driver by typing make extra, which produces a separate executable, test_extra.

First Challenge (1 point)

We want you to revisit the issue of memory management, and specifically how you locate "available" space when needed for an insertion. In the standard version of this assignment, we allowed you to scan through the memory looking for such availability. That technique is unnecessarily inefficient, requiring linear time in the worst case.

A better approach is to keep track of all the currently available nodes in what is known as a free list. Essentially, we intentionally link all available locations together into a secondary linked list (using the same conventions). Then by keeping track of one end of the free list, we can always find an available cell in O(1) time (presuming that capacity is not entirely exhausted).
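A minimal sketch of the free-list idea follows. The names Pool, makePool, allocate and release are hypothetical, and several details are assumptions made only for illustration: each character uses three cells, the "NULL" index is the size of the memory, and the free list is threaded through the "next" cells only (a singly-linked free list is enough, since we only ever take from or add to its front).

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical bundle of the state needed for this illustration.
struct Pool {
    std::vector<std::size_t> memory;
    std::size_t freeHead;  // element index of the first vacant node, or nil
    std::size_t nil;       // the chosen "NULL" index
};

// Build a pool for maxText characters with every node on the free list.
Pool makePool(std::size_t maxText) {
    Pool p;
    p.nil = 3 * maxText;
    p.memory.assign(3 * maxText, p.nil);
    p.freeHead = p.nil;
    // Push nodes from last to first so the lowest index is handed out first.
    for (std::size_t k = maxText; k-- > 0; ) {
        std::size_t elem = 3 * k + 1;
        p.memory.at(elem + 1) = p.freeHead;
        p.freeHead = elem;
    }
    return p;
}

// Take a node from the front of the free list in O(1) time.
std::size_t allocate(Pool& p) {
    if (p.freeHead == p.nil) {
        throw std::bad_alloc();  // capacity exhausted
    }
    std::size_t node = p.freeHead;
    p.freeHead = p.memory.at(node + 1);
    return node;
}

// Return a node to the front of the free list in O(1) time.
void release(Pool& p, std::size_t node) {
    p.memory.at(node + 1) = p.freeHead;
    p.freeHead = node;
}
```

With this arrangement neither allocate nor release ever scans memory; the caller is still responsible for setting the node's "prev", element and "next" cells after allocating it.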

Second Challenge (1 point)

Provide support for cutting and pasting portions of the text. In particular, the original outline we provided for the editor class includes three additional functions, setMark, cut and paste. If done well, each of these methods can be executed in O(1) time.

Setting the mark records the current cursor position and remembers that mark until a subsequent cut takes place.

When cut is called (assuming that the mark has been previously set), the portion of text from the mark to the current cursor should be removed. More specifically, there are two cases. If the current cursor is to the right of the initial mark, then the operation should remove a slice of the text, starting with the mark and going up to but not including the cursor position. Alternatively, if the current cursor is to the left of the initial mark, the removed slice should start at the cursor and go up to but not including the mark. Also in this second case, the resulting cursor should be set to the location of the mark (it cannot remain unchanged because the character that had been at the cursor was just cut).

When a portion is cut, it should be preserved internally in what we will call the "cut buffer" (you should leave the nodes linked to each other and remember the beginning and end of that deleted slice). If the user later calls "paste", the most recently deleted slice should be re-inserted immediately BEFORE the current cursor position (and the cut buffer set back to empty; you will not support pasting multiple copies).

When a deleted slice is in the cut buffer, this inherently reduces your maximum capacity. That is okay. For the sake of this assignment, you may throw a bad_alloc whenever an insert is attempted while all nodes of memory are being used either for the true text or for the cut buffer.

However, if the user performs a cut at a time when the existing cut buffer is non-empty, you should explicitly place all nodes of that previous cut buffer back onto the free list to avoid a "memory leak". Notice that because those nodes are already linked to each other, you should be able to perform this step in O(1) time as well! Then the newly cut piece can be preserved within the cut buffer.
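The constant-time recycling step can be sketched as a single splice. The names EditorLists and recycleCutBuffer are hypothetical, and the sketch assumes that the free list is threaded through the "next" cells and tracked by its front, and that the cut buffer's nodes remain chained through their "next" cells from cutBegin to cutEnd.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bundle of the state needed for this illustration.
struct EditorLists {
    std::vector<std::size_t> memory;
    std::size_t freeHead;  // front of the free list, or nil
    std::size_t cutBegin;  // first node of the cut buffer, or nil if empty
    std::size_t cutEnd;    // last node of the cut buffer, or nil if empty
    std::size_t nil;       // the chosen "NULL" index
};

// Move the whole cut buffer onto the front of the free list in O(1) time.
void recycleCutBuffer(EditorLists& s) {
    if (s.cutBegin == s.nil) {
        return;  // nothing to recycle
    }
    // The slice is already chained via "next", so only its last link changes.
    s.memory.at(s.cutEnd + 1) = s.freeHead;
    s.freeHead = s.cutBegin;
    s.cutBegin = s.nil;
    s.cutEnd = s.nil;
}
```

No loop over the slice is needed, regardless of how many nodes it contains, because the nodes were left linked to each other when they were cut.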

Further documentation on the expected semantics is embedded directly in the source code. If you have any questions, just ask.

Michael Goldwasser

CSCI 180, Spring 2007
Last modified: Monday, 05 March 2007

Saint Louis University
