Lecture notes chapter 6: Logical database design

Chapter 6: Logical database design and the relational model

Logical database design is process of transforming the conceptual data model (as in chapter 3) into logical data model. The relational data model is most commonly used.

The objective of logical database design is to translate the conceptual design (reminder -- which represents an organization's requirements for data) into logical database design that can be implemented on a chosen database.

The relational data model

First introduced in 1970 by E.F. Codd.

System R was the prototype

Ingress was the first academically oriented RDBMS (consuming and painfully slow)

Today there are many RDBMS: Oracle, Sybase, Microsoft sequel, FoxPro, etc. -- illicit examples

The relational model is based on mathematical theory, so it is well understood.

The relational data model consists of the following three components:

1. Data structure -- data are organized in the form of tables with rows and columns.

2. Data manipulation -- operations are used to manipulate the data stored in the relations (SQL).

3. Data integrity -- allows for data integrity to be maintained as specified in the business rules.

Relational data structure -- page 209

A relation (or table) is a named, two-dimensional table of data. Each relation consists of a set of named columns and arbitrary number of unnamed rows. An attribute of (as defined in chapter 3) is a named column of a relation. Figure 6 -- 1 is an instance of the employee one relation. It defines structure rather than data. EMPLOYEE1( Emp_ID, Name, Dept_name, Salary)

Relational keys. The primary key is an attribute (or composite) that uniquely identifies each row in a relation. In chapter 3 this is termed an identifier.

Foreign key is an attribute in the relation that serves as a primary key of another relation in the same database.

Primary and foreign keys are powerful relational tools and a database that both define business rules and maintain integrity. The better defined the better your database and your data will become.

**** Try to constrain everything ***

Properties of relations -- page 210

Relations have several properties, which distinguish them from nonrelational tables:

Each relation or table has a unique name in the database. Within a user's schema, Oracle enforces this rule.
An entry at the intersection of each row and column is atomic. There are no multivalued attributes in a relation. See figure 6-2.
Each row is unique; note to rows and a relation are identical.
Each attribute or column within a table has a unique name. Oracle enforces this rule in a table.
The sequence of columns is insignificant.
The sequence of rows is insignificant.

Go over page 211 figure 6-2. Discuss why it might not be the best solution.

A relational database consists of any number of relations. The structure is as described through the use of a conceptual schema, and there are two common methods for expressing one: short text statements, or graphical representation. Go over figure 6 -- 3.

Integrity constraints -- page 213

The relational data model has several types of constraints whose purpose is to maintain the accuracy and integrity of the data in the database. They are scratch that the types of constraints are: domain constraints, entity integrity constraints, referential integrity constraints, and operational constraints.

Domain constraints

A domain is the set of values that may be assigned to an attribute. The domain definition usually consists of the following: name, meaning, datatype, size, and allowable values or range (if applicable). This translates into the column name, datatype, size, and check constraint (if applicable). Go over examples of check constraints in a table.

Entity integrity constraints, or primary keys

The entity integrity rule is designed to ensure that every relation has primary key, and that the data values for the primary key are valid -- in particular that every primary key attribute is not NULL.

Entity integrity also means that when a particular attribute or column cannot be assigned a data value when the row is created, that it be assigned a null value which indicates the absence of a value. Two situations why this might occur:

1. The column data value is not known when the row is created.

2. The column data value may be computed or filled in at a later point in time.

Illustrate why using null values instead of blanks are important.

In fact go over the example of characters to nulls.

The entity integrity rule states that no primary key attribute may be NULL.

Referential integrity constraints, or foreign keys -- page 214

In the relational data model associations between tables are defined through the use of foreign keys. Note that primary keys must exist before the foreign key may be entered. Illustrate through use of two tables.

A referential integrity constraint is a role that maintains consistency among the rows of two relations or tables. The foreign key in one relation must match the primary key in another relation, else it must be NULL.

Operational constraints -- page 215

In chapter 4 but worth mentioning. Operational constraints are business rules that cannot be enforced using the above relational or check constraints. There are other ways to enforce these rules -- and Oracle they are called triggers.

Normalization -- Page 232

Normalization is a process of composing relations with anomalies to produce smaller, well-structured relations. It is a formal process for deciding which attributes should be grouped together in a relation or table. Previously we learned how to map entity relationship diagram to relations. Before the physical design of the database, normalization will allow us to validate a logical design, such that it satisfies certain constraints, which avoid unnecessary duplication of data.

Normalization is performed in stages, each which corresponds to a normal form (figure 622 Page 234). A normal form is a state of a relation that results from application of simple rules regarding functional dependencies to the relation. These are the normal forms that are currently used in practice:

1. First normal form -- any multivalued attributes have been removed.

2. Second normal form -- any partial functional dependencies have been removed.

Third normal form -- any transitive dependencies have been removed.

Functional dependencies and keys -- Page 235

Normalization is based on the analysis of functional dependencies. A functional dependency is a constraint between two attributes or two sets of attributes. Go over the four examples on Page 235; Social Security number is one I think is bad because the address is not functional dependent.

The attribute on the left-hand side of the arrow in a functional dependency is called a determinant. If

A candidate key is an attribute, or combination of attributes that uniquely identifies a row in a relation. A candidate key must satisfy the following properties:

Unique identification -- for every row value of the key must uniquely identify that row.
No attribute in the key may be deleted without destroying uniqueness of the key.

Go over example at the bottom paragraph for employee 1 and employee to on Page 236.

Note that employee two has a composite key for the primary key.

Note that a candidate key is always a determinant while a determinant may not be a candidate key.

Basic normal forms -- Page 237

First normal form

The relation is in first normal form if it contains no multivalued attributes. Example figure 6 --2A.

Second normal form

A relation is in second normal form if it is in first normal form and every non- key attribute is fully functionally dependent on the primary key. A relation that is in first normal form will be in second normal form if any of the following are true:

The primary key consists of only one attribute.
No non-key attributes exist in the relation (all primary key).
Every non-key attribute is functionally dependent on the full set of primary key attributes; there are no partial functional dependencies.

A sample Page 238 decomposes a relation, employee two, to satisfy second normal form.

Third normal form

If a relation isn't third normal form if it is in second normal form and no transitive dependencies exist.

The transitive dependency is in a relation is a functional dependency between two or more non-key attributes. Easy sample on figure 6 -- 24 -- Page 238, 6 -- 25 on Page 239.

Go over the insertion deletions and modification anomalies.

See how nice a third normal form figure 6 -- 25 makes things.

Merging relations -- Page 241

There may be a need that some point in a project to merge relations -- a typical need is to create a view of two or more relations. You must understand the meaning of the data in each relation.

There are four problems, which might arise in view integration.

Synonyms are two or more attributes having different names but the same meaning.

Example is two tables with primary keys to same the named differently due to differing department views.

Homonym is when an attribute might have more than one meaning.

Transitive dependencies are combining to third normal form relations and a dependency arises from the combination.

Supertypes we are not discussing.