Introduction to Database Design


Major Steps in Database Design

  1. Requirements Analysis: Talk to the potential users! Understand what data is to be stored, and what operations and requirements are desired.

  2. Conceptual Database Design: Develop a high-level description of the data and constraints (we will use the ER data model)

  3. Logical Database Design: Convert the conceptual model to a schema in the chosen data model of the DBMS. For a relational database, this means converting the conceptual to a relational schema (logical schema).

  4. Schema Refinement: Look for potential problems in the original choice of schema and try to redesign.

  5. Physical Database Design: Direct the DBMS into choice of underlying data layout (e.g., indexes and clustering) in hopes of optimizing the performance.

  6. Applications and Security Design: How will the underlying database interact with surrounding applications.


Entity-Relationship Data Model (ER)

  • entity: An entity is a real-world object or concept which is distinguishable from other objects. It may be something tangible, such as a particular student or building. It may also be somewhat more conceptual, such as CS A-341, or an email address.

  • attributes: These are used to describe a particular entity (e.g. name, SS#, height).

  • domain: Each attribute comes from a specified domain (e.g., name may be a 20 character string; SS# is a nine-digit integer)

  • entity set: a collection of similar entities (i.e., those which are distinguished using the same set of attributes. As an example, I may be an entity, whereas Faculty might be an entity set to which I belong. Note that entity sets need not be disjoint. I may also be a member of Staff or of Softball Players.

  • key: a minimal set of attributes for an entity set, such that each entity in the set can be uniquely identified. In some cases, there may be a single attribute (such as SS#) which serves as a key, but in some models you might need multiple attributes as a key ("Bob from Accounting"). There may be several possible candidate keys. We will generally designate one such key as the primary key.

  • ER diagrams

    It is often helpful to visualize an ER model via a diagram. There are many variant conventions for such diagrams; we will adapt the one used in the text.

    diagram conventions

  • An entity set is drawn as a rectangle.

  • Attributes are drawn as ovals.

  • Attributes which belong to the primary key are underlined.
  • diagram example (Figure 2.1 of text)


    Relationships

    A relationship is an association among two or more entities. The relationship must be uniquely identified by the participating entities.

    A relationship can also have descriptive attributes, to record additional information about the relationship (as opposed to about any one participating entity).

    For example, I am an entity, as is the Department of Math/CS. A relationship exists in that I work in that department.

    Similarly, a relationship set is a set of "similar" relationships (the similarity is based on the type of underlying entities involved in each such relationship) For example, if you have an entity set Employees and another entity set Departments, you might define a relationship set Works_In which associates members of those two entity sets.

    In the ER diagrams, we will draw a relationship set is drawn as a shaded diamond:

    Figure 2.2 of text:

    Ternary Relationship Set

    A relationship set need not be an association of precisely two entities; it can involve three or more when applicable. Here is another example from the text, in which a store has multiple locations.

    Using several entities from same entity set

    A relationship might associate several entities from the same underlying entity set, such as in the following example, Reports_To. In this case, an additional role indicator (e.g., "supervisor") is used in the diagram to further distinguish the two similar entities.


    Specifying additional constraints

    If you took a 'snapshot' of the relationship set at some instant in time, we will call this an instance. It can be diagramed separately, as in Figure 2.3 of the text.

    A (binary) relationship set can further be classified as either

  • many-to-many
  • one-to-many
  • one-to-one
  • based on whether an individual entity from one of the underlying sets is allowed to be in more than one such relationship at a time. The above figure contains a many-to-many relationship, as departments may employ more than one person at a time, and an individual person may be employed by more than one department.

    Sometimes, an additional constraint exists for a given relationship set, that any entity from one of the associated sets appears in at most one such relationship. For example, consider a relationship set "Manages" which associates departments with employees. If a department cannot have more than one manager, this is an example of a one-to-many relationship set (it may be that an individual manages multiple departments).

    This type of constraint is called a key constraint. It is represented in the ER diagrams by drawing an arrow from an entity set E to a relationship set R when each entity in an instance of E appears in at most one relationship in (a corresponding instance of) R.

    An instance of this relationship is given in Figure 2.7.

    If both entity sets of a relationship set have key constraints, we would call this a "one-to-one" relationship set. In general, note that key constraints can apply to relationships between more than two entities, as in the following example.

    An instance of this relationship:


    Participation Constraints

    Recall that a key constraint requires that each entity of a set be required to participate in at most one relationship. Dual to this, we may ask whether each entity of a set be required to participate in at least one relationship.

    If this is required, we call this a total participation constraint; otherwise the participation is partial. In our ER diagrams, we will represent a total participation constraint by using a thick line.


    Weak Entities

    There are times you might wish to define an entity set even though its attributes do not formally contain a key (recall the definition for a key).

    Usually, this is the case only because the information represented in such an entity set is only interesting when combined through an identifying relationship set with another entity set we call the identifying owner.

    We will call such a set a weak entity set, and insist on the following:

  • The weak entity set must exhibit a key constraint with respect to the identifying relationship set.
  • The weak entity set must have total participation in the identifying relationship set.
  • Together, this assures us that we can uniquely identify each entity from the weak set by considering the primary key of its identifying owner together with a partial key from the weak entity.

    In our ER diagrams, we will represent a weak entity set by outlining the entity and the identifying relationship set with dark lines. The required key constraint and total participation are diagrammed with our existing conventions. We underline the partial key with a dotted line.


    Class Hierarchies

    As with object-oriented programming, it is often convenient to classify an entity sets as a subclass of another. In this case, the child entity set inherits the attributes of the parent entity set. We will denote this scenario using an "ISA" triangle, as in the following ER diagram:

    Furthermore, we can impose additional constraints on such subclassing. By default, we will assume that two subclasses of an entity set are disjoint. However, if we wish to allow an entity to lie in more than one such subclass, we will specify an overlap constraint. (e.g. "Contract_Emps OVERLAPS Senior_Emps")

    Dually, we can ask whether every entity in a superclass be required to lie in (at least) one subclass. By default we will not assume not, but we can specify a covering constraint if desired. (e.g. "Motorboats AND Cards COVER Motor_Vehicles")


    Aggregation

    Thus far, we have defined relationships to be associations between two or more entities. However, it sometimes seems desirable to define a new relationship which associates some entity with some other existing relationship. To do this, we will introduce a new feature to our model called aggregation. We identifying an existing relationship set by enclosing it in a larger dashed box, and then we will allow it to participate in another relationship set.

    A motivating example follows:


    Conceptual Design with the ER Model

    It is most important to recognize that there is more than one way to model a given situation. Our next goal is to start to compare the pros and cons of common choices.
  • Should a concept be modeled as an entity or an attribute?

  • Consider the scenario, if we want to add address information to the Employees entity set? We might choose to add a single attribute address to the entity set. Alternatively, we could introduce a new entity set, Addresses and then a relationship associating employees with addresses. What are the pros and cons?

    Adding a new entity set is more complex model. It should only be done when there is need for the complexity. For example, if some employees have multiple address to be associated, then the more complex model is needed. Also, representing addresses as a separate entity would allow a further breakdown, for example by zip code or city.

  • What if we wanted to modify the Works_In relationship to have both a start and end date, rather than just a start date. We could add one new attribute for the end date; alternatively, we could create a new entity set Duration which represents intervals, and then the Works_In relationship can be made ternary (associating an employee, a department and an interval). What are the pros and cons?

    If the duration is described through descriptive attributes, only a single such duration can be modeled. That is, we could not express an employment history involving someone who left the department yet later returned.


  • Should a concept be modeled as an entity or a relationship?

    Consider a situation in which a manager controls several departments. Let's presume that a company budgets a certain amount (budget) for each department. Yet it also wants managers to have access to some discretionary budget (dbudget). There are two corporate models. A discretionary budget may be created for each individual department; alternatively, there may be a discretionary budget for each manager, to be used as she desires.

    Which scenario is represented by the following ER diagram? If you want the alternate interpretation, how would you adjust the model?


  • Should we use binary or ternary relationships?

    Consider the following ER diagram, representing insurance policies owned by employees at a company. Each employee can own several polices, each policy can be owned by several employees, and each dependent can be covered by several policies.

    What if we wish to model the following additional requirements:

  • A policy cannot be owned jointly by two or more employees.
  • Every policy must be owned by some employee.
  • Dependents is a weak entity set, and each dependent entity is uniquely identified by taking pname in conjunction with the policyid of a policy entity (which, intuitively, covers the given dependent).
  • The best way to model this is to switch away from the ternary relationship set, and instead use two distinct binary relationship sets.


  • Should we use aggregation?

    Consider again the following ER diagram:

    If we did not need the until or since attributes. In tihs case, we could model the identical setting using the following ternary relationship:

    Let's compare these two models. What if we wanted to add an additional constraint to each, that each sponsorship (of a project by a department) be monitored by at most one employee. Can you add this constraint to either of the above models?


  • Michael Goldwasser