The focus on coding and debugging in CS1 risks giving students the false impression that these are the main activities involved in developing software.
Actually, coding and debugging typically occupy only about 20% of the time required to develop "production" software.
Why the difference between the classroom and the real world? Size! (And, as a corollary, complexity.)
Inevitably, the programs that you worked on in CS1 were very small and simple. This is natural, for a beginner cannot be expected to tackle a large, complex program! But in the real world, programs tend to be much bigger (because the problems they solve are much more complex).
See the special interest box on page 4, which gives the sizes, in source lines of code (SLOC), of several popular software products: Mozilla Web Browser, 300K; Boeing flight control, 5M; Windows XP, 40M; Red Hat Linux, 30M; Debian GNU/Linux, 231M!
Figure 1-1 (page 3) categorizes programs into several sizes (trivial, small, medium, ...) by SLOC and time to develop. (Even 10K is small??!)
Consider the difference between building a birdhouse and building a skyscraper, or between setting up a lemonade stand and establishing a multi-national corporation. These are analogous to the difference between building a trivial program and building a large one. In building a trivial program, you can keep all the implementation details in (the front of) your mind: you can remember what you've finished, what is in progress, and what still needs to be done. You don't need a blueprint or a management plan; you can just start coding (just as you can start hammering to make a birdhouse), and you can make corrections and adjustments "on the fly" when you run into problems.
Unfortunately, developing a large software system effectively requires a much more disciplined approach. So, even though it will not be practical for you to develop a large software system in this course (or even one that is small by the authors' standards), we are going to try to approach the task as though we were developing one.
For anything larger than a very small program, a great deal of preparation is required before coding begins. The analogy is made to building a house, where it would be foolish to begin simply by picking up hammer, nails, and lumber. Rather, you should have a blueprint, and a plan in mind for the order in which various tasks should be carried out.
Also, much work occurs after coding is over, including testing, documentation, and maintenance.
Various experts have different opinions on exactly how to describe the life cycle of software, and in what order things ought to be done, but the general consensus is that the following are the essential activities. (Recall the idealistic waterfall model.)
The goal of this activity is to produce a problem specification (or software requirements) document that describes the intended behavior of the software.
To achieve this, it is necessary to do requirements analysis, which is the task of determining, with some clarity, the problem that the software is intended to solve. (If the eventually-developed software solves the "wrong" problem, it will be of less use than if it solves the "right" problem.)
In a sense, the specification document serves as a contract between the stakeholders (those for whom the software is being built) and the developers.
For various reasons, including the ones listed next, it is often difficult to produce a good specification document.
The stakeholders may be unable to describe their desires accurately (e.g., by being too vague or omitting important details), may disagree with one another, or may change their minds as time goes on.
One technique used to address this problem is rapid prototyping, which refers to quickly building a (typically only partially-functional) model of a potential solution and letting stakeholders observe it or use it. (The most important aspect of the prototype is usually the user interface.) The idea is to get reactions and feedback from the stakeholders, who, through the experience, may come to a better understanding of what they really want.
Visual Basic is a language that's good for building prototypes quickly.
The document must describe how the software behaves even in unexpected/unusual circumstances, which may be greater in number than expected circumstances. (Hence, a significant portion of the document may be devoted to describing what happens in situations that rarely, if ever, arise.)
Natural language (e.g., English) is a poor notation for writing clear, unambiguous specifications.
The alternative is to use a formal specification language. Many such languages have been designed and studied, mostly by academic researchers, but they have not come into widespread use among practitioners. People argue about the reasons for this. Some advocates of formal languages think that practitioners (and their managers) are at fault for being unwilling to invest the time and effort needed to learn how to use a formal notation. Some detractors of formal languages think that they are over-hyped by the academics. Another problem is that very few stakeholders will be in a position to understand a formal language, and hence the (formal) specification document could not serve as a contract between stakeholders and developers.
As an example to illustrate why it is easy, in a natural language, to give an insufficiently precise specification, consider this as a specification for a method that does a search in an array for a given value (adapted from page 8):
Given an array A (containing elements of type T) and a value X of type T, it returns the position, within A, at which X occurs, or, if no such position exists, it returns the position of the element in A that is closest in value to X.
There are several uncertainties here! First, it is not clear which position should be returned if there are multiple occurrences of X in A. Second, depending upon exactly what T is, it may not be clear how to measure the "closeness" of two values, which leaves open the question of which position to return in the case that X does not occur in A. For that matter, even if "closeness" has a precise meaning, there could be two or more elements of A that are equally close to X. In that case, which position is to be returned? Finally, what should be returned in the case that the length of A is zero?
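One way to see what a precise specification would have to pin down is to make each decision explicit in code. The following is only a sketch of one possible set of choices, none of which comes from the textbook: T is taken to be int, closeness is the absolute difference, duplicates and ties resolve to the lowest index, and an empty array yields -1. The names ClosestSearch and indexOfClosest are hypothetical.

```java
// A sketch of one unambiguous reading of the search specification.
// Assumptions (not from the text): T is int, "closeness" is absolute
// difference, ties and duplicates resolve to the lowest index, and an
// empty array yields -1. All names here are hypothetical.
final class ClosestSearch {

    /**
     * precondition:  a != null
     * postcondition: returns -1 if a.length == 0; otherwise returns the
     *   smallest index i such that |a[i] - x| is minimal over all of a.
     *   (If x occurs in a, that minimum is 0, so the result is the
     *   position of the first occurrence of x.)
     */
    static int indexOfClosest(int[] a, int x) {
        int best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            // Widen to long so that the subtraction cannot overflow.
            long distance = Math.abs((long) a[i] - (long) x);
            if (distance < bestDistance) {   // strict '<' keeps the lowest index on ties
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }
}
```

Every one of these choices could reasonably have been made differently (for instance, throwing an exception on an empty array), which is exactly why the natural-language version is insufficient as a contract.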
Here, we specify the behavior of a collection of software components that will suffice to solve the problem, culminating in a software design document. By a component is meant a self-contained and semi-independent unit of software. In Java, this would typically be a class.
The purpose of this is to help manage and organize the upcoming tasks of detailed design and coding. In effect, the design is a high-level solution whose details must be supplied later.
Inevitably, one uses some strategy by which to divide and conquer the problem; that is, you decompose it into subproblems and assign one (or more) module(s) or components of the software system to deal with each subproblem. A primary goal is to make the components as independent from one another as possible, as this will simplify the remaining work.
But how to do the decomposition into separate components? Traditionally, this was done via top-down design (TDD), in which one decomposes the problem along functional lines. That is, one focuses upon the various actions/activities/functions that the program must carry out and tries to make each component responsible for carrying out one of them (or a set of related ones).
Often a component will be judged to be worthy of further decomposition, and so on and on, so that we end up with a tree of components. (See Figure 1-6, page 11.)
In recent years, there has been a shift to a different approach, referred to as object-oriented design. Here, in figuring out how to decompose the problem, you think not so much in terms of the necessary actions, but rather in terms of the kinds of entities that are involved in the problem and that perform the actions (or have actions/operations applied to them).
In an OO design, each different kind of entity gives rise to a distinct software component called a class. A class acts as a template, or blueprint, for entities of the kind that the class describes. In particular, a class describes the operations that are applicable to entities of that kind (and what other kinds of data must be supplied in order to invoke such an operation). In describing an operation, we should indicate what conditions must be satisfied in order for it to be "legally" invoked (its pre-condition) and what conditions it is guaranteed to produce, assuming that it was legally invoked (its post-condition). (The pre- and post-conditions serve as a contract between the class and its clients. For a more thorough discussion, see Wikipedia's entry on Design by Contract, also known as programming by contract.)
Advocates of OO claim that their way of decomposing a problem tends to lead to components that are much more independent of one another. Each one provides services (via its methods), but the internal details of how those services work (or of what data items comprise the state of an object) are not needed by its clients.
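As a small illustration of such a contract, here is a sketch of a class with one operation whose pre- and post-conditions are stated in its comment. The Account class, its method names, and the decision to throw IllegalArgumentException when the pre-condition is violated are all assumptions made for this example; they do not come from the textbook.

```java
// Hypothetical class illustrating pre- and post-conditions as a contract
// between a class and its clients. Nothing here is taken from the textbook.
final class Account {
    private long balanceInCents;

    /** precondition: openingBalanceInCents >= 0 */
    Account(long openingBalanceInCents) {
        if (openingBalanceInCents < 0) {
            throw new IllegalArgumentException("opening balance must be non-negative");
        }
        balanceInCents = openingBalanceInCents;
    }

    /** precondition: none (true); postcondition: returns the current balance */
    long balance() {
        return balanceInCents;
    }

    /**
     * precondition:  0 < amount <= balance()
     * postcondition: balance() == (old balance()) - amount
     */
    void withdraw(long amount) {
        // Under the contract, satisfying the precondition is the client's job.
        // Checking it here anyway is a defensive design choice, not a requirement.
        if (amount <= 0 || amount > balanceInCents) {
            throw new IllegalArgumentException("precondition of withdraw violated");
        }
        balanceInCents -= amount;
    }
}
```

A client that calls withdraw(30) on an account holding 100 is entitled to find a balance of 70 afterwards; a client that calls withdraw(71) next has broken its side of the contract, and the class owes it nothing in particular.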
On the other hand, the TDD approach often leads to software in which "knowledge" of implementation details needs to be spread throughout the components, thereby making it difficult to make changes to the software (due to there being many interdependencies among the components).
See the banking simulation example (Figures 1-5 and 1-6 for the TDD approach and Figure 1-7 for the OO design). (In the OO approach, you have classes for Customer, WaitingLine, etc., whereas in the TDD approach, you have components based upon the various activities that occur: arrival, departure, transaction.)
More generally than the section title suggests, this is where we decide how to implement each class. That means choosing data structures that suffice to represent the state of an object of that class and algorithms to carry out the operations (in Java, methods) that are applicable to it.
Of course, we endeavor to organize the data so as to promote efficiency in terms of running time and memory usage (although often these two concerns act against one another). In general, how data is structured has a much larger effect upon performance than do details of coding.
As an example, consider the WaitingLine class identified in the OO design of the banking simulation. One of its operations is to add an item to the end (modeling the event of a customer queueing up at the rear of the line).
In Java, this would be expressed by a method, perhaps with the following specification (adapted from page 14):
/**
 * precondition: none (true)
 * postcondition:
 *   if the line is full, a LineFullException has been thrown;
 *   otherwise, the specified customer has been put onto
 *   the end of the waiting line.
 */
public void putAtEnd( Customer c ) throws LineFullException;
The example in the book compares using an array to implement a WaitingLine object versus using a linked list. The linked list will take about twice as much space (well, that depends upon what kind of objects are in the line).
Another point of comparison: If the array is fixed in size, it means that the waiting line is bounded, whereas with a linked list, it is not.
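To make the bounded case concrete, here is a minimal sketch of a fixed-size array implementation of putAtEnd that follows the specification given earlier. It is not the book's code: ArrayWaitingLine is a hypothetical name, and Customer and LineFullException are bare stand-ins defined only so that the example compiles on its own.

```java
// Sketch of a bounded, array-based waiting line (hypothetical names).
// Customer and LineFullException are minimal stand-ins for illustration.
class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}

class LineFullException extends Exception {
    LineFullException(String message) { super(message); }
}

class ArrayWaitingLine {
    private final Customer[] items;   // fixed capacity, so the line is bounded
    private int size = 0;             // number of customers currently in line

    ArrayWaitingLine(int capacity) {
        items = new Customer[capacity];
    }

    int size() {
        return size;
    }

    /**
     * precondition: none (true)
     * postcondition:
     *   if the line is full, a LineFullException has been thrown;
     *   otherwise, the specified customer has been put onto
     *   the end of the waiting line.
     */
    public void putAtEnd(Customer c) throws LineFullException {
        if (size == items.length) {
            throw new LineFullException("waiting line is full");
        }
        items[size] = c;   // the end of the line is the first unused slot
        size++;
    }
}
```

With a linked list, the "line is full" branch would never be taken (short of exhausting memory), which is the sense in which that representation is unbounded.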
Another one: Suppose the array is extended each time it "overflows". Should we extend it by one element each time? By 10? By doubling the length of the array? (Or multiplying its length by 1.5?)
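The cost of each policy can be estimated by counting how many existing elements must be copied into the new array over a sequence of insertions. The sketch below does just that; it is an illustration with hypothetical names (GrowthCost and its methods), and it assumes that every extension copies all current elements into a freshly allocated array.

```java
// Counts the element copies caused by n appends under two growth policies.
// Assumption: each extension copies every existing element to a new array.
// All names are hypothetical.
final class GrowthCost {

    /** Copies performed when the array grows by a fixed increment on each overflow. */
    static long copiesWithIncrement(int n, int increment) {
        long copies = 0;
        long capacity = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;        // every existing element moves to the new array
                capacity += increment;
            }
        }
        return copies;
    }

    /** Copies performed when the capacity doubles on each overflow (starting at 1). */
    static long copiesWithDoubling(int n) {
        long copies = 0;
        long capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

For 1000 insertions, copiesWithIncrement(1000, 1) is 499500 and copiesWithIncrement(1000, 10) is 49500, whereas copiesWithDoubling(1000) is only 1023. Any fixed increment gives a total that grows quadratically with the number of insertions; multiplying the length by a constant factor keeps the total proportional to it, at the price of some unused space.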
How about the removeFromFront() method? Should we slide all the elements at positions 1 through n "to the left"? Or use a wraparound idea?
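The wraparound idea can be sketched as follows: rather than sliding elements to the left, keep an index marking the front of the line and let positions wrap from the last array slot back to slot 0. This is an illustrative sketch, not the book's implementation. The class name WrapAroundLine is hypothetical, the element type is generic only so that the example stands alone, and unchecked exceptions are used for the full and empty cases purely for brevity.

```java
import java.util.NoSuchElementException;

// Sketch of the "wraparound" (circular array) idea; hypothetical names.
final class WrapAroundLine<T> {
    private final T[] items;
    private int front = 0;   // index of the element at the front of the line
    private int size = 0;    // number of elements currently in the line

    @SuppressWarnings("unchecked")
    WrapAroundLine(int capacity) {
        items = (T[]) new Object[capacity];
    }

    int size() {
        return size;
    }

    void putAtEnd(T item) {
        if (size == items.length) {
            throw new IllegalStateException("line is full");
        }
        // The rear position wraps around to index 0 past the last slot.
        items[(front + size) % items.length] = item;
        size++;
    }

    T removeFromFront() {
        if (size == 0) {
            throw new NoSuchElementException("line is empty");
        }
        T item = items[front];
        items[front] = null;                  // release the reference
        front = (front + 1) % items.length;   // advance instead of shifting elements
        size--;
        return item;
    }
}
```

Here removeFromFront does a constant amount of work regardless of how many elements are in the line, whereas the sliding approach moves every remaining element on each removal.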
As noted earlier, these are the activities typically emphasized in CS 1 courses.
In this book, the language used is Java. Much of this section is devoted to describing the history of and motivation for Java and to touting it as a "true" and well-crafted OO language, unlike, say, C++, which is the result of grafting OO features onto an imperative language, C.
Among the strengths of Java is its comprehensive library of standard classes (Java 1.5's standard library has 166 packages including a total of 3279 classes), which promotes software reuse: the practice of using (or adapting) already-existing software rather than developing it from scratch. Not surprisingly, this leads to higher software productivity (a measure of how much software a programmer produces in a given time period).
Software reuse is also encouraged by the very nature of the module that OO programming is built around, the class, which is the embodiment of an abstract data type (ADT). Bertrand Meyer argues that one of the most significant advances in software engineering is the merging of the concepts of module and ADT. (That is, each module should correspond to an ADT, which embodies a universe of values and a collection of operations that can be applied to each member of that universe.)
Among the features embodied in Java (and its standard library) are ones that support the following ideas:
Humans are error-prone. This is especially true regarding delicate tasks, such as programming, that require precision and impeccable logic! Hence, when a piece of software has been written, it is not realistic to assume that it is correct (i.e., works as intended, meets its specification). Faulty software causes problems, ranging from the annoying (such as when a machine has to be rebooted) to the costly (such as the billion dollar Ariane 5 bug described on page 17) to the deadly (such as the Therac-25 machine, which overdosed people undergoing radiation therapy for cancer).
It follows that measures should be taken to determine if a piece of software is correct, and, if it is not, to correct it.
One way of doing this is via empirical testing, in which the software is run against a number of carefully crafted input test cases. If the program produces correct results, we have some evidence suggesting that it works correctly, at least in the vast majority of cases. If not, we have found error(s). (Note that exhaustive testing (i.e., testing the program against all possible inputs) is impossible, or at least impractical, in almost all circumstances, simply because the number of possible inputs is huge, or possibly even infinite.)
Dijkstra: "Program testing can be used to show the presence of bugs, but never to show their absence!"
Empirical testing typically occurs at three levels. One is unit testing, which refers to the testing of a single program unit (e.g., a method).
The next is integration testing, in which we test a group of interacting program units (such as a class, or several classes whose objects send messages to each other). Separate groups are then typically put into a larger group and tested. Eventually, all the components are tested as a single group.
The third level of testing is beta testing or acceptance testing, where the software is put into a "realistic" environment (similar to where it is intended to run when finally put "into production").
An orthogonal characterization of testing has to do with whether the test data is chosen according to the specifications only (black box testing), or with knowledge of the code (clear box testing).
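As an illustration of the lowest level, here is a framework-free sketch of unit testing a single method, with black box test cases chosen from the method's specification alone. The method ArrayUtil.max and the test cases are hypothetical examples, not taken from the textbook; in practice one would normally use a testing framework such as JUnit rather than hand-written checks.

```java
// A framework-free sketch of unit testing one method (hypothetical names).
final class ArrayUtil {
    /**
     * precondition:  a.length > 0
     * postcondition: returns the largest value occurring in a
     */
    static int max(int[] a) {
        int result = a[0];
        for (int i = 1; i < a.length; i++) {
            if (a[i] > result) {
                result = a[i];
            }
        }
        return result;
    }
}

final class ArrayUtilTest {
    /** Runs every test case and returns the number of failures. */
    static int runAll() {
        int failures = 0;
        // Cases are derived from the specification only (black box testing).
        failures += check("single element", ArrayUtil.max(new int[] {7}), 7);
        failures += check("max at front", ArrayUtil.max(new int[] {9, 2, 5}), 9);
        failures += check("max at end", ArrayUtil.max(new int[] {2, 5, 9}), 9);
        failures += check("all negative", ArrayUtil.max(new int[] {-3, -1, -8}), -1);
        failures += check("duplicates", ArrayUtil.max(new int[] {4, 4, 4}), 4);
        return failures;
    }

    private static int check(String name, int actual, int expected) {
        if (actual == expected) {
            return 0;
        }
        System.out.println("FAILED " + name + ": expected " + expected + " but got " + actual);
        return 1;
    }
}
```

A clear box tester, having seen that the loop starts at index 1, might add cases aimed specifically at that boundary (such as a two-element array whose maximum is at index 1).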
A much different approach for attempting to ensure the correctness of software is program verification, which refers to doing mathematical proofs of correctness. Many who advocate this approach (including Dijkstra) recommend that, rather than attempting to do such proofs after the fact, one should use a calculational programming approach in which a program's proof of correctness and its code are developed "hand-in-hand". In fact, the proof often guides the production of code, or helps us to calculate the code.
We can illustrate this by developing a program to compute the sum of the elements in an array. (This requires the concept of a loop invariant: a condition that holds before the loop begins and after every iteration.)
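A sketch of where such a development might end up is shown below, with the invariant recorded in comments. It is one plausible result, not necessarily the derivation intended here; the names ArraySum, sum, total, and k are assumptions. The invariant says that total always holds the sum of the first k elements, and the loop guard is chosen so that the invariant together with the guard's negation yields the post-condition.

```java
// Summing an array, annotated with the loop invariant that justifies it.
// A sketch under stated assumptions; all names are hypothetical.
final class ArraySum {
    /**
     * precondition:  a != null
     * postcondition: returns a[0] + a[1] + ... + a[a.length - 1]
     *                (which is 0 when a is empty)
     */
    static long sum(int[] a) {
        long total = 0;
        int k = 0;
        // invariant: 0 <= k <= a.length  and  total == a[0] + ... + a[k-1]
        // (It holds initially: k == 0 and the empty sum is 0.)
        while (k != a.length) {
            total = total + a[k];   // now total == a[0] + ... + a[k]
            k = k + 1;              // invariant restored for the new k
        }
        // The invariant and k == a.length together give the postcondition.
        return total;
    }
}
```

Termination follows because a.length - k is a non-negative integer that decreases by one on every iteration. The result type is long so that the sum of int values cannot overflow for any array that fits in memory.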
In an academic setting, once a program has been completed and graded, it is typically never used again. In the real world, however, programs are used for years, sometimes decades.
It also includes technical documentation, comprising the specification, the design document, a description of the data structures and algorithms used in the implementation, the source code, and a description of the testing and acceptance procedures to which the software was subjected. Much of this material is important to maintenance programmers (see below), who, in order to make changes to the existing source code, must first understand it.
The Javadoc facility (similar tools exist for several other languages) makes it easy to produce API documentation that looks exactly like that for Java's standard library.
During the (possibly long) lifetime of a software system, it is likely that changes will be made, either to correct errors that were not found during testing (corrective maintenance) or to change or enhance the behavior of the system in order to meet the changing desires of its users or the marketplace (adaptive maintenance). Another category is perfective maintenance, which refers to changes made simply to improve the software (e.g., make it run faster, use less memory, or simplify its structure).
The main point of the chapter is that coding is just one aspect of software development, which also includes specification, design, planning, testing, documentation, and support. Figure 1-15 (page 30) lists approximate percentages of time spent on these various activities.