Notes on Chapter 2 of Elmasri & Navathe (5th ed.)

Still Under Construction

2.1 Data Models, Schemas, and Instances

One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are irrelevant to database users.

A data model, a collection of concepts that can be used to describe the conceptual/logical structure of a database, provides the necessary means to achieve this abstraction.

By "structure" we mean the data types, relationships, and constraints that should hold for the data.

Most data models also include a set of basic operations for specifying retrievals/updates.

Object-oriented data models go further: the objects stored in the database also have behavior (i.e., applicable methods), as opposed to being purely "passive" data.

According to C.J. Date (one of the leading database experts), a data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data; the operators allow us to model its behavior.

In the relational data model, data is viewed as being organized in two-dimensional tables (relations) composed of tuples of attribute values. The model provides operations such as Project, Select, and Join.
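To make Select, Project, and Join concrete, here is a minimal Python sketch (not how any particular DBMS implements them) in which a relation is a list of dicts, one dict per tuple. The STUDENT and ENROLLED relations, their attribute names, and their values are hypothetical illustrations.

```python
# Minimal sketch: a relation is a list of dicts (attribute name -> value).
# STUDENT and ENROLLED below are hypothetical example relations.

def select(relation, predicate):
    """Select: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Project: keep only the named attributes, dropping duplicate tuples
    (a relation is a set, so duplicates are not allowed)."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

def join(r, s, attr):
    """Equijoin of r and s on one attribute name that both relations share."""
    return [{**t, **u} for t in r for u in s if t[attr] == u[attr]]

STUDENT = [
    {"Student_number": 17, "Name": "Smith", "Major": "CS"},
    {"Student_number": 8, "Name": "Brown", "Major": "CS"},
    {"Student_number": 25, "Name": "Jones", "Major": "MATH"},
]
ENROLLED = [
    {"Student_number": 17, "Course_number": "CS1310"},
    {"Student_number": 8, "Course_number": "CS3320"},
]

cs_majors = select(STUDENT, lambda t: t["Major"] == "CS")
print(project(cs_majors, ["Name"]))
# [{'Name': 'Smith'}, {'Name': 'Brown'}]
print(project(join(STUDENT, ENROLLED, "Student_number"), ["Name", "Course_number"]))
# [{'Name': 'Smith', 'Course_number': 'CS1310'}, {'Name': 'Brown', 'Course_number': 'CS3320'}]
```

Note that each operation takes relations and returns a relation, which is what lets them be composed as in the last line.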

A data model is not to be confused with its implementation, which is a physical realization on a real machine of the components of the abstract machine that together constitute that model.

Logical vs. physical!!

There are other well-known data models that have been the basis for database systems. The best-known models pre-dating the relational model are the hierarchical (in which the entity types form a tree) and the network (in which the entity types and relationships between them form a graph).
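The tree-versus-graph distinction can be checked mechanically. In this hypothetical Python sketch, a schema is a list of (parent/owner, child/member) links between record types; the record type names are illustrative and not taken from any real hierarchical or network DBMS.

```python
# Hypothetical sketch: a schema is a list of (owner, member) links between
# record types. A hierarchical schema must form a single rooted tree; a
# network schema may be an arbitrary graph (e.g., a type with two owners).

def is_tree(links):
    parents, children = {}, {}
    for owner, member in links:
        parents.setdefault(member, set()).add(owner)
        children.setdefault(owner, []).append(member)
    types = {t for link in links for t in link}
    roots = [t for t in types if t not in parents]
    # Exactly one root, and every other type has exactly one parent type.
    if len(roots) != 1 or any(len(p) != 1 for p in parents.values()):
        return False
    # Every type must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(children.get(t, []))
    return seen == types

HIERARCHICAL = [("DEPARTMENT", "EMPLOYEE"), ("DEPARTMENT", "PROJECT")]
NETWORK = HIERARCHICAL + [("EMPLOYEE", "WORKS_ON"), ("PROJECT", "WORKS_ON")]

print(is_tree(HIERARCHICAL))  # True
print(is_tree(NETWORK))       # False: WORKS_ON has two owner types
```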

Categories of Data Models (based on degree of abstractness):

high-level/conceptual: (e.g., the ER model of Chapter 7, which is Chapter 3 in older editions of the text) provides a view close to the way users perceive data; it uses concepts such as
- entity: real-world object or concept (e.g., student, employee, course, department, event)
- attribute: some property of interest describing an entity (e.g., height, age, color)
- relationship: an interaction among entities (e.g., a works-on relationship between an employee and a project; an enrolled-in relationship between a student and a course section)
representational/implementational: intermediate level of abstractness; examples are the relational data model and the network model mentioned earlier. Also called record-based models.
low-level/physical: gives details of how data is stored in the computer system, such as record formats, orderings of records, and access paths (e.g., indexes). (See Chapters 17-18.)
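As a rough illustration of the first two categories, the Python sketch below records the same facts twice: once as ER-style entities and a relationship (conceptual), and once as flat records linked by key values (representational). The EMPLOYEE/PROJECT/WORKS_ON names, attributes, and values are assumptions for illustration; the physical level (record formats, indexes) is not modeled at all.

```python
from dataclasses import dataclass

# Conceptual (ER-style) view: entities with attributes, plus a relationship.
@dataclass
class Employee:
    ssn: str
    name: str

@dataclass
class Project:
    number: int
    name: str

@dataclass
class WorksOn:            # relationship between an Employee and a Project
    employee: Employee
    project: Project
    hours: float          # an attribute of the relationship itself

# Representational (record-based) view of the same facts: flat records in
# tables, where the relationship is expressed through key values rather
# than object references.
def to_tables(facts):
    employee, project, works_on = {}, {}, []
    for w in facts:
        employee[w.employee.ssn] = (w.employee.ssn, w.employee.name)
        project[w.project.number] = (w.project.number, w.project.name)
        works_on.append((w.employee.ssn, w.project.number, w.hours))
    return sorted(employee.values()), sorted(project.values()), works_on

smith = Employee("123456789", "Smith")
wong = Employee("333445555", "Wong")
product_x = Project(1, "ProductX")
facts = [WorksOn(smith, product_x, 32.5), WorksOn(wong, product_x, 10.0)]

employees, projects, works_on = to_tables(facts)
print(works_on)  # [('123456789', 1, 32.5), ('333445555', 1, 10.0)]
```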

2.1.2 Schemas, Instances, and Database State

One must distinguish between the description of a database and the database itself. The former is called the database schema, which is specified during design and is not expected to change often. (See Figure 2.1, p. 32, for a schema diagram of the relational UNIVERSITY database.)

In a typical database, the data items stored therein change frequently. The data in the database at a particular moment in time is called the state of the database, or a snapshot.
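The schema/state distinction can be sketched in Python: the schema stays fixed, while successive states (snapshots) hold different rows. The attribute names loosely follow the STUDENT and COURSE tables of the UNIVERSITY example, but the rows and the `conforms` helper are hypothetical, and a real DBMS checks far more than the number of values per row (types, keys, other constraints).

```python
# The schema: table name -> attribute names (specified once, at design time).
SCHEMA = {
    "STUDENT": ("Name", "Student_number", "Class", "Major"),
    "COURSE": ("Course_name", "Course_number", "Credit_hours", "Department"),
}

def conforms(state, schema):
    """Hypothetical check: every table in the state is declared in the schema,
    and every row supplies exactly one value per declared attribute."""
    return all(
        table in schema and all(len(row) == len(schema[table]) for row in rows)
        for table, rows in state.items()
    )

# Two database states (snapshots) over the same, unchanged schema.
state_t1 = {"STUDENT": [("Smith", 17, 1, "CS")], "COURSE": []}
state_t2 = {
    "STUDENT": [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")],
    "COURSE": [("Intro to Computer Science", "CS1310", 4, "CS")],
}

print(conforms(state_t1, SCHEMA), conforms(state_t2, SCHEMA))  # True True
```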

Application requirements change occasionally, which is one of the reasons why software maintenance is important. On such occasions, a change to a database's schema may be called for. An example would be to add a Date_of_Birth field/attribute to the STUDENT table. Making changes to a database schema is known as schema evolution. Most modern DBMS's support schema evolution operations that can be applied while a database is operational.
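A minimal sketch of the Date_of_Birth example, using SQLite through Python's standard `sqlite3` module purely as a convenient DBMS (other systems use similar `ALTER TABLE` syntax, but the details vary). The columns other than Date_of_Birth are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER PRIMARY KEY, "
             "Class INTEGER, Major TEXT)")
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")

# Schema evolution: add a new attribute while the table already holds data.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Date_of_Birth TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
print(columns)  # ['Name', 'Student_number', 'Class', 'Major', 'Date_of_Birth']
print(conn.execute("SELECT Name, Date_of_Birth FROM STUDENT").fetchall())
# [('Smith', None)]
```

The existing row simply gets NULL (Python `None`) for the new attribute; the data did not have to be unloaded and reloaded.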

2.2 DBMS Architecture and Data Independence

2.2.1: Three-Schema Architecture: (See Figure 2.2, page 34.) This idea was first described by the ANSI/SPARC committee in the late 1970s. The goal is to separate (i.e., insert layers of "insulation" between) user applications and the physical database. C.J. Date points out that it is an ideal that few, if any, real-life DBMS's achieve fully.

internal level: has an internal/physical schema that describes the physical storage structure of the database (using a low-level data model).

conceptual level: has a conceptual schema describing the (logical) structure of the whole database for a community of users. It hides physical storage details, concentrating on describing entities, data types, relationships, user operations, and constraints. Can be described using either a high-level or an implementational data model.

external/view level: includes a number of external schemas (or user views), each of which describes the part of the database that a particular category of users is interested in, hiding the rest of the database. Can be described using either a high-level or an implementational data model. (In practice, usually described using the same model as the conceptual schema.)

Users (including application programs) submit queries that are expressed with respect to the external level. It is the responsibility of the DBMS to transform such a query into one that is expressed with respect to the internal level (and to transform the result, which is at the internal level, into its equivalent at the external level).

Example: Select students with GPA > 3.5.

Q: How is this accomplished?
A: By virtue of mappings between the levels:

external/conceptual mapping (providing logical data independence)
conceptual/internal mapping (providing physical data independence)
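The two mappings can be sketched in Python as functions composed from the internal level upward. Everything here is hypothetical: the delimited storage format, the choice to store GPA as an integer number of hundredths, and the Name/GPA user view. A real DBMS would typically translate the query downward (e.g., into a scan or index lookup on the stored records) rather than decode every record the way this naive version does.

```python
# Internal level (hypothetical storage format): one delimited string per
# stored record, with GPA kept as an integer number of hundredths.
STORED_RECORDS = ["17|Smith|CS|385", "8|Brown|CS|310", "25|Jones|MATH|362"]

def conceptual_from_internal(record):
    """Conceptual/internal mapping: decode a stored record into a logical
    STUDENT tuple."""
    number, name, major, gpa_hundredths = record.split("|")
    return {"Student_number": int(number), "Name": name, "Major": major,
            "GPA": int(gpa_hundredths) / 100}

def external_from_conceptual(student):
    """External/conceptual mapping: a user view exposing only Name and GPA."""
    return {"Name": student["Name"], "GPA": student["GPA"]}

def query_view(predicate):
    """Answer a query posed against the external view by composing the
    two mappings over the stored records."""
    view_rows = (external_from_conceptual(conceptual_from_internal(r))
                 for r in STORED_RECORDS)
    return [row for row in view_rows if predicate(row)]

# "Select students with GPA > 3.5", expressed against the external view.
print(query_view(lambda row: row["GPA"] > 3.5))
# [{'Name': 'Smith', 'GPA': 3.85}, {'Name': 'Jones', 'GPA': 3.62}]
```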

Data independence is the capacity to change the schema at one level of the architecture without having to change the schema at the next higher level. We distinguish between logical and physical data independence according to which two adjacent levels are involved. The former refers to the ability to change the conceptual schema without changing the external schemas. The latter refers to the ability to change the internal schema without having to change the conceptual schema.
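A small illustration of logical data independence, using SQLite via Python's `sqlite3` as an assumed stand-in: the base table plays the role of the conceptual schema and a view plays the role of an external schema. The table, view, and query are hypothetical. Adding an attribute to the base table leaves the view, and the query written against it, unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual schema: the base table.
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, "
             "Name TEXT, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                 [(17, "Smith", 3.85), (8, "Brown", 3.10)])
# External schema: a view for users who only need names and GPAs.
conn.execute("CREATE VIEW GPA_VIEW AS SELECT Name, GPA FROM STUDENT")

QUERY = "SELECT Name FROM GPA_VIEW WHERE GPA > 3.5"
before = conn.execute(QUERY).fetchall()

# Change the conceptual schema: add an attribute to the base table.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Major TEXT")

after = conn.execute(QUERY).fetchall()
print(before, after)  # [('Smith',)] [('Smith',)]
```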

For an example of physical data independence, suppose that the internal schema is modified (because we decide to add a new index, or change the encoding scheme used in representing some field's value, or stipulate that some previously unordered file must be ordered by a particular field). Then we can change the mapping between the conceptual and internal schemas in order to avoid changing the conceptual schema itself.
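Physical data independence in the same assumed SQLite setting: adding an index changes only the access paths at the internal level, while the query text and its result stay the same. The table, index name, and data values are hypothetical; whether the optimizer actually uses the index can be inspected with SQLite's `EXPLAIN QUERY PLAN`, but that choice does not affect the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, "
             "Name TEXT, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                 [(17, "Smith", 3.85), (8, "Brown", 3.10), (25, "Jones", 3.62)])

QUERY = "SELECT Name FROM STUDENT WHERE GPA > 3.5 ORDER BY Name"
before = conn.execute(QUERY).fetchall()

# Change the internal schema only: add a new access path (index) on GPA.
conn.execute("CREATE INDEX STUDENT_GPA_IDX ON STUDENT (GPA)")

after = conn.execute(QUERY).fetchall()
print(before, after)  # [('Jones',), ('Smith',)] [('Jones',), ('Smith',)]
```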

Not surprisingly, the process of transforming data via mappings can be costly (performance-wise), which is probably one reason that real-life DBMS's don't fully implement this 3-schema architecture.

2.3 Database Languages and Interfaces

A DBMS supports a variety of users and must provide appropriate languages and interfaces for each category of users.

DBMS Languages

DDL (Data Definition Language): used (by the DBA and/or database designers) to specify the conceptual schema.
SDL (Storage Definition Language): used for specifying the internal schema
VDL (View Definition Language): used for specifying the external schemas (i.e., user views)
DML (Data Manipulation Language): used for performing operations such as retrieval and update upon the populated database

The above description represents some kind of ideal. In real life, at least so far, the de facto standard DBMS language is SQL (Structured Query Language), which has constructs to support the functions of DDL, VDL, and DML languages. (Early versions of SQL included features supporting SDL functions, but these have since been removed.)
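A sketch of one SQL dialect (SQLite, via Python's `sqlite3`, chosen only for convenience) playing the DDL, VDL, and DML roles in turn. The COURSE table, the FOUR_CREDIT view, and the row values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL role: define the conceptual schema.
conn.execute("CREATE TABLE COURSE (Course_number TEXT PRIMARY KEY, "
             "Course_name TEXT, Credit_hours INTEGER)")

# VDL role: define an external schema (user view).
conn.execute("CREATE VIEW FOUR_CREDIT AS "
             "SELECT Course_number FROM COURSE WHERE Credit_hours = 4")

# DML role: insert, update, and retrieve.
conn.execute("INSERT INTO COURSE VALUES ('CS1310', 'Intro to Computer Science', 4)")
conn.execute("INSERT INTO COURSE VALUES ('MATH2410', 'Discrete Mathematics', 3)")
conn.execute("UPDATE COURSE SET Credit_hours = 4 WHERE Course_number = 'MATH2410'")

print(conn.execute("SELECT * FROM FOUR_CREDIT ORDER BY Course_number").fetchall())
# [('CS1310',), ('MATH2410',)]
```

To our knowledge, the closest thing most SQL products still offer to SDL is `CREATE INDEX`, which is a widely supported vendor extension rather than part of the SQL standard itself.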


2.3.2 DBMS Interfaces

Menu-based interfaces for web clients or browsing
Forms-based interfaces
GUIs
Natural Language Interfaces
Speech Input and Output
Interfaces for parametric users
Interfaces for the DBA

2.4 Database System Environment

See Figure 2.3, page 41.

2.5 Centralized and Client/Server Architectures for DBMS's

2.6 Classification of DBMS's

DBMS's can be classified based upon:

underlying data model (e.g., relational, object, object-relational, network)
multi-user vs. single-user
centralized vs. distributed
cost
general-purpose vs. special-purpose
types of access path options