
Introduction to Trees

A "tree" is a discrete structure that serves as an important model for a variety of hierarchical data sets that need to be represented and processed in computer science (and in bioinformatics). Here are just a handful of common uses of trees in modeling data:

File System

Organizational Chart

Parse Tree

Family Tree

Phylogeny


Terminology

CS term | Description | Biologist's term
node | A representation of a single entity within the tree |
edge | A connection representing a relationship between two nodes |
root | The (topmost) node from which an entire tree emanates |
parent | The immediate ancestor of a node in a tree |
external node (leaf) | A node without any children | tip
internal node | A node with one or more children | node
child | An immediate descendant of a node in a tree |
ancestor | Any of the nodes "above" a given node (i.e., toward the root) |
descendant | Any of the nodes "below" a given node (i.e., away from the root) |
branch | A path between an ancestor and one of its descendants |
subtree | The portion of a tree including a node and all of its descendants | clade


Tree Representation

Trees are inherently recursive and so our representation of trees, and our functions for processing trees, will also be recursive.

To represent trees, we will begin by considering a special class of trees known as binary trees, in which each internal node has precisely two children. The techniques we use can typically be extended to more general trees with arbitrary branching factors.

We choose a relatively simple representation using Python's tuples (this is a built-in structure that is similar to a list, but immutable). The basic format used is a triple,

(label, firstsubtree,  secondsubtree)
By convention, if a node doesn't have any children, we use empty tuples for both of its subtrees, such as
('C', (), ())
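
As a minimal sketch of how this convention might be checked in code, the helper below tests whether a node is a leaf. The name `is_leaf` is hypothetical (it is not defined elsewhere in these notes), and the sketch assumes its argument is a well-formed triple.

```python
def is_leaf(tree):
    """Return True if the node has no children (both subtrees are empty tuples)."""
    label, first, second = tree      # unpack the (label, firstsubtree, secondsubtree) triple
    return first == () and second == ()

print(is_leaf(('C', (), ())))                        # True
print(is_leaf(('A', ('B', (), ()), ('C', (), ()))))  # False
```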

By this convention, a phylogenetic tree that might be represented graphically as


would be represented by the recursive structure

('A',
     ('B', (), ()),
     ('C', (), ())
)
although to Python, the whitespace and indentation are not significant in this context, so this could be written more compactly as:
('A', ('B', (), ()), ('C', (), ()) )
or even without the spaces as
('A',('B',(),()),('C',(),()))
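
One way to confirm that these three layouts describe the same tree is to compare them directly, as in the short sketch below. The variable names (`spread_out`, `streamlined`, `compact`) are chosen here only for illustration.

```python
# Three layouts of the same tree; only the whitespace differs.
spread_out = ('A',
     ('B', (), ()),
     ('C', (), ())
)
streamlined = ('A', ('B', (), ()), ('C', (), ()) )
compact = ('A',('B',(),()),('C',(),()))

print(spread_out == streamlined == compact)   # True

# Unpacking the triple gives the root label and its two subtrees.
label, first, second = compact
print(label)    # A
print(first)    # ('B', (), ())
```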
As a more complex example, the following tree

would be represented by the recursive structure
('A',
     ('B',(),()),
     ('C',
          ('D',
               ('E',(),()),
               ('F',(),())
          ),
          ('G', (), ())
     )
)
which is equally valid in Python as
('A', ('B',(),()), ('C', ('D', ('E',(),()), ('F',(),()) ), ('G', (), ()) ) )
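
Because the representation is recursive, functions that process it can follow the same shape: handle the empty tuple as the base case, then recurse on the two subtrees. The sketch below illustrates this on the tree above. The function names `count_nodes` and `leaf_labels` are hypothetical, chosen for this example rather than taken from the course materials.

```python
def count_nodes(tree):
    """Return the number of nodes in the tree (0 for the empty tuple)."""
    if tree == ():
        return 0
    label, first, second = tree
    return 1 + count_nodes(first) + count_nodes(second)

def leaf_labels(tree):
    """Return a list of the labels of all leaves, first subtree before second."""
    if tree == ():
        return []
    label, first, second = tree
    if first == () and second == ():
        return [label]               # a node with no children is a leaf
    return leaf_labels(first) + leaf_labels(second)

example = ('A', ('B',(),()), ('C', ('D', ('E',(),()), ('F',(),()) ), ('G', (), ()) ) )
print(count_nodes(example))   # 7
print(leaf_labels(example))   # ['B', 'E', 'F', 'G']
```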

Michael Goldwasser
Last modified: Sunday, 03 March 2019