Introduction to Trees

A "tree" is a discrete structure that serves as an important model for a variety of hierarchical data sets that need to be represented and processed in computer science (and in bioinformatics). Here are just a handful of common uses of trees in modeling data:

File System

Organizational Chart

Parse Tree

Family Tree

Phylogeny

Terminology

CS term	Description	Biologist's term
node	A representation of a single entity within the tree
edge	A connection representing a relationship between two nodes
root	The topmost node, from which the entire tree emanates
parent	The immediate ancestor of a node in a tree
external node (leaf)	A node without any children	tip
internal node	A node with one or more children	node
child	An immediate descendant of a node in a tree
ancestor	Any of the nodes "above" a given node (i.e., toward the root)
descendant	Any of the nodes "below" a given node (i.e., away from the root)
branch	A path between an ancestor and one of its descendants
subtree	The portion of a tree including a node and all of its descendants	clade

Representation and Computation

Trees are inherently recursive: each child of a node is itself the root of a subtree. Our representation of trees, and our functions for processing trees, will therefore also be recursive.

To represent trees, we will begin by considering a special class of trees known as binary trees, in which each internal node has precisely two children. The techniques we use can typically be extended to more general trees with arbitrary branching factors.

Our textbook recommends a relatively simple representation using Python's tuples (this is a built-in structure that is similar to a list, but immutable). The basic format used is a triple,

(label, leftsubtree, rightsubtree)

By convention, a node that doesn't have any children uses empty tuples for both of its subtrees, such as

('C', (), ())
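
As a minimal sketch of this convention, a childless node can be built and recognized as follows. The helper names leaf and is_leaf are our own, chosen for illustration; they are not taken from the textbook.

```python
# Hypothetical helpers (names are assumptions, not from the textbook).

def leaf(label):
    """Build a node with no children: both subtrees are empty tuples."""
    return (label, (), ())

def is_leaf(tree):
    """Return True if a nonempty tree has two empty subtrees."""
    label, left, right = tree   # unpack the (label, left, right) triple
    return left == () and right == ()

c = leaf('C')
print(c)            # ('C', (), ())
print(is_leaf(c))   # True
```

Because tuples are immutable, a node built this way cannot be changed in place; a modified tree has to be built as a new tuple.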

By this convention, a tree that might be represented graphically as

    A
   / \
  B   C

would be represented by the recursive structure

('A', ('B', (), ()), ('C', (), ()))
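
To illustrate how functions on this representation mirror its recursive shape, here is a short sketch that counts the nodes of a tree and collects the labels of its leaves. The function names node_count and leaf_labels are our own assumptions for illustration, and the sketch treats the empty tuple as an empty tree.

```python
# Hypothetical functions (names are assumptions, not from the textbook).

def node_count(tree):
    """Return the number of nodes in a tree; the empty tuple counts as zero."""
    if tree == ():
        return 0                        # base case: empty tree
    label, left, right = tree
    return 1 + node_count(left) + node_count(right)

def leaf_labels(tree):
    """Return the labels of all leaves, from left to right."""
    if tree == ():
        return []                       # base case: empty tree has no leaves
    label, left, right = tree
    if left == () and right == ():
        return [label]                  # a node with no children is a leaf
    return leaf_labels(left) + leaf_labels(right)

tree = ('A', ('B', (), ()), ('C', (), ()))
print(node_count(tree))    # 3
print(leaf_labels(tree))   # ['B', 'C']
```

Each function handles the empty tuple as its base case and otherwise combines the results of calling itself on the left and right subtrees.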

Michael Goldwasser

Last modified: Monday, 19 March 2018