A "tree" is a discrete structure that serve as important model for a variety of hierarchical data sets that need to be represented and processed in computer science (and in bioinformatics). Here are just a handful of common uses of trees in modeling data:
CS term | Description | Biologists term |
---|---|---|
node | A representation of a single entity within the tree | |
edge | A connection representing a relationship between two nodes | |
root | The (topmost) node from which an entire tree eminates | |
parent | The immediate ancestor of a node in a tree | |
external node (leaf) | a node without any subsequent children | tip |
internal node | a node without any subsequent children | node |
child | The immediate descendants of a node in a tree | |
ancestor | Any of the nodes "above" a given node (i.e., toward the root) | |
descendant | Any of the nodes "below" a given node (i.e., away from the root) | |
branch | a path between an ancestor and one of its descendants | |
subtree | the portion of a tree including a node and all of its descendants | clade |
Trees are inherently recursive and so our representation of trees, and our functions for processing trees, will also be recursive.
To represent trees, we will begin by considering a special class of trees known as binary trees in which each internal node of a tree has precisely two children. (Although the techniques we use can typically be extended to more general trees with arbitrary branching factors.)
Our textbook recommends a relatively simple representation using Python's tuples (this is a built-in structure that is similar to a list, but immutable). The basic format used is a triple,
(label, leftsubtree, rightsubtree)By convention, we will use a representation where if a node doesn't have any children, we will use empty tuples, such as
('C', (), ())
By this convention, a tree that might be represented graphically as
B / A \ Cwould be represented by the recursive structure
('A', ('B', (), ()), ('C', (), ()) )