Notes on Chapter 2 of Elmasri & Navathe (5th ed.)

Still Under Construction

2.1 Data Models, Schemas, and Instances

One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are irrelevant to database users.

A data model —a collection of concepts that can be used to describe the conceptual/logical structure of a database— provides the necessary means to achieve this abstraction.

By structure is meant the data types, relationships, and constraints that should hold for the data.

Most data models also include a set of basic operations for specifying retrievals/updates.

Object-oriented data models include the idea of objects having behavior (i.e., applicable methods) being stored in the database (as opposed to purely "passive" data).

According to C.J. Date (one of the leading database experts), a data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data; the operators allow us to model its behavior.

In the relational data model, data is viewed as being organized in two-dimensional tables comprised of tuples of attribute values. This model has operations such as Project, Select, and Join.

A data model is not to be confused with its implementation, which is a physical realization on a real machine of the components of the abstract machine that together constitute that model.

Logical vs. physical!!

There are other well-known data models that have been the basis for database systems. The best-known models pre-dating the relational model are the hierarchical (in which the entity types form a tree) and the network (in which the entity types and relationships between them form a graph).

Categories of Data Models (based on degree of abstractness):

2.1.2: Schemas, Instances, and Database State

One must distinguish between the description of a database and the database itself. The former is called the database schema, which is specified during design and is not expected to change often. (See Figure 2.1, p. 32, for schema diagram for relational UNIVERSITY database.)

In a typical database, frequently changes are made to data items stored therein. The data in the database at a particular time is called the state of the database, or a snapshot.

Application requirements change occasionally, which is one of the reasons why software maintenance is important. On such occasions, a change to a database's schema may be called for. An example would be to add a Date_of_Birth field/attribute to the STUDENT table. Making changes to a database schema is known as schema evolution. Most modern DBMS's support schema evolution operations that can be applied while a database is operational.


2.2 DBMS Architecture and Data Independence

2.2.1: Three-Schema Architecture: (See Figure 2.2, page 34.) This idea was first described by the ANSI/SPARC committee in late 1970's. The goal is to separate (i.e., insert layers of "insulation" between) user applications and the physical database. C.J. Date points out that it is an ideal that few, if any, real-life DBMS's achieve fully.

Users (including application programs) submit queries that are expressed with respect to the external level. It is the responsibility of the DBMS to transform such a query into one that is expressed with respect to the internal level (and to transform the result, which is at the internal level, into its equivalent at the external level).

Example: Select students with GPA > 3.5.

Q: How is this accomplished?
A: By virtue of mappings between the levels:

Data independence is the capacity to change the schema at one level of the architecture without having to change the schema at the next higher level. We distinguish between logical and physical data independence according to which two adjacent levels are involved. The former refers to the ability to change the conceptual schema without changing the external schema. The latter refers to the ability to change the internal schema without having to change the conceptual.

For an example of physical data independence, suppose that the internal schema is modified (because we decide to add a new index, or change the encoding scheme used in representing some field's value, or stipulate that some previously unordered file must be ordered by a particular field). Then we can change the mapping between the conceptual and internal schemas in order to avoid changing the conceptual schema itself.

Not surprisingly, the process of transforming data via mappings can be costly (performance-wise), which is probably one reason that real-life DBMS's don't fully implement this 3-schema architecture.


2.3 Database Languages and Interfaces

A DBMS supports a variety of users and must provide appropriate languages and interfaces for each category of users.

DBMS Languages

The above description represents some kind of ideal. In real-life, at least so far, the de facto standard DBMS language is SQL (Standard Query Language), which has constructs to support the functions needed by DDL, VDL, and DML languages. (Early versions of SQL had features in support of SDL functions, but no more.)

2.3.1 DBMS Languages

menu-based, forms-based, gui-based, natural language, special purpose for parametric users, for DBA.

2.3.2 DBMS Interfaces


2.4 Database System Environment

See Figure 2.3, page 41.


2.5 Centralized and Client/Server Architectures for DBMS's
2.6 Classification of DBMS's

Based upon