It may help to first review how quantities of memory/storage are measured. A bit (a contraction of "binary digit") is the smallest unit of data, a single 0 or 1. Recall that computers represent data of all kinds (including numbers, characters, images, and audio) using 0's and 1's. A byte is a unit of eight bits. A kilobyte (KB) is 2^10 (approximately one thousand, or 10^3) bytes, a megabyte (MB) is 2^20 (approximately one million, or 10^6) bytes, a gigabyte (GB) is 2^30 (approximately one billion, or 10^9) bytes, and a terabyte (TB) is 2^40 (approximately one trillion, or 10^12) bytes.
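These definitions can be expressed as a few lines of code. The following is just an illustrative sketch of the arithmetic above (the unit names and sizes come from the text; everything else is incidental):

```python
# Binary units of storage, as defined above: 2^10, 2^20, 2^30, 2^40 bytes.
units = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

for name, size in units.items():
    print(f"1 {name} = {size:,} bytes")

# A byte is eight bits, so a kilobyte holds 8 * 1024 = 8192 bits.
bits_per_kb = 8 * units["KB"]
```

Note that each unit is 2^10 = 1024 times the previous one, which is why "approximately" appears in each definition: 2^20 is 1,048,576, not exactly one million.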
One could reasonably ask the question
The premise of the question is that the programs that a computer executes, and the data that those programs manipulate, must have some physical manifestation on some kind of storage medium (or, plural, media) that is part of the computer system (or accessible by it, at least). This gives rise to the next question:
The answer may be a bit more involved than you would expect, because there are a perhaps surprisingly large number of different kinds of storage devices. The outline below seeks to identify these and to provide a logical way of organizing them.
In order to keep the electronic circuitry of the processor at a reasonable level of complexity, the number of registers is quite small, typically no more than a few dozen.
Some processors have multiple levels of cache (usually referred to as L1, L2, etc.), with L1 being faster (access within a few clock cycles) but having lower capacity (e.g., tens of KB) than L2 (access to which requires tens of cycles).
The term "random" is meant to suggest that the time required to access any particular memory location in RAM is independent of which memory location was accessed most recently. (This is in contrast to accessing the data on a VHS or audio cassette tape, which are "sequential" rather than "random" storage devices.) Suppose, for example, that a VHS tape is fully rewound; then to get to the fifth hour of video stored on that tape, you must fast forward past the first four hours. On the other hand, if the tape were already at the beginning of the fourth hour, you could get to the fifth hour by fast forwarding past only one hour of video.
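The tape example can be captured in a toy cost model. This is purely illustrative (the "costs" are arbitrary units representing hours of tape, not real timings):

```python
# Toy model contrasting random-access and sequential-access media.

def random_access_cost(current_pos, target_pos):
    # RAM-like: the cost is the same no matter where we were last.
    return 1

def sequential_access_cost(current_pos, target_pos):
    # Tape-like: we must wind past everything between here and there.
    return abs(target_pos - current_pos)

# From a fully rewound tape (position 0), reaching the start of the
# fifth hour (hour-mark 4) means fast forwarding past four hours:
print(sequential_access_cost(0, 4))   # 4 hours of winding
# But from the beginning of the fourth hour, only one hour intervenes:
print(sequential_access_cost(3, 4))   # 1 hour of winding
# RAM-style access costs the same either way:
print(random_access_cost(0, 4), random_access_cost(3, 4))
```

The point is that for the sequential device, cost depends on the previous position; for the random-access device, it does not.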
Regarding the interplay between cache and RAM: roughly speaking, whenever the CPU needs to fetch the data occupying some particular memory cell in RAM, it first looks in cache to see if a copy is already there. If so, it accesses that copy in a fraction of the time that would have been required to access the corresponding cell in RAM. If not, it accesses the desired cell of RAM; also, anticipating that the same cell will need to be accessed again in the near future, the CPU copies that cell's contents (as well as those of a block of neighboring cells) into cache, replacing some block of data items that hasn't been accessed recently.
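The check-cache-first policy just described can be sketched in a few lines. This is a minimal toy model, not how any real cache is implemented; the block size, capacity, and least-recently-used eviction rule are illustrative assumptions:

```python
from collections import OrderedDict

BLOCK_SIZE = 4      # cells copied together on a miss
CACHE_BLOCKS = 2    # cache capacity, in blocks

ram = list(range(100))          # pretend RAM: cell i holds value i
cache = OrderedDict()           # block number -> list of cell contents

def read(address):
    block = address // BLOCK_SIZE
    if block in cache:                       # hit: the fast path
        cache.move_to_end(block)             # mark as recently used
        return cache[block][address % BLOCK_SIZE], "hit"
    # Miss: fetch from RAM, then cache the whole neighboring block,
    # evicting the least-recently-used block if the cache is full.
    if len(cache) >= CACHE_BLOCKS:
        cache.popitem(last=False)
    start = block * BLOCK_SIZE
    cache[block] = ram[start:start + BLOCK_SIZE]
    return ram[address], "miss"

print(read(5))   # first touch of block 1: a miss
print(read(6))   # neighboring cell, same block, now cached: a hit
```

Notice that the miss on cell 5 turns the subsequent access to cell 6 into a hit, because the whole block of neighbors was copied in: this is exactly the anticipation described above.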
The introduction of cache is a relatively new development, motivated by the fact that (as processor and memory technology has advanced over the years) the ratio between the time needed to transfer data between RAM and a register and the time needed to perform an operation on data (that is necessarily already in a register) has been steadily growing, to the point where, without cache, the CPU would be spending the vast majority of its time waiting for data to be transferred between RAM and registers. This phenomenon is sometimes referred to as the processor-memory bottleneck.
The term transitory can be used to describe the kinds of main memory listed so far. This term is apt because their intended purpose is not to store anything for long, but rather to provide fast access to data (and instructions) currently being used (i.e., related to applications currently running and whatever data they are using). Because there is no need to store data in main memory permanently (see exception below), and because it is cheaper to do so, registers, cache, and RAM are designed to be volatile, meaning that, absent a constant application of electrical power, they lose their contents relatively quickly.
Among the types of secondary storage media are these:
As for why there are so many varieties of storage devices, it boils down mostly to considerations of cost, mobility (removability), and advances in technology.
As a general rule of thumb (and not surprisingly), the cost of memory/storage (in dollars per unit of storage) varies with the "speed" of the storage device: the faster the device, the higher the cost (per MB). For example, main memory costs much more than an equal quantity of space in secondary storage, by a factor in the hundreds. (In early 2008, space on a hard disk cost about $0.25 per GB, but RAM was $25 per GB, making RAM about 100 times as expensive.) Thus, even if RAM were designed to be non-volatile (and hence suitable for storing data on a long-term basis), it would be prohibitively expensive to replace hundreds of gigabytes of hard disk storage space with an equal quantity of RAM. (Do the calculation: 100 GB of RAM would cost about $2500, compared to $25 for 100 GB of hard disk space. Hence, substituting non-volatile RAM for hard disk would increase the cost of a PC substantially!)
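The calculation invited above works out as follows (using the early-2008 prices from the text):

```python
# Cost comparison from the text: early-2008 prices per GB.
disk_per_gb = 0.25   # dollars per GB of hard disk space
ram_per_gb = 25.00   # dollars per GB of RAM

capacity_gb = 100
disk_cost = capacity_gb * disk_per_gb   # 100 GB of hard disk
ram_cost = capacity_gb * ram_per_gb     # the same capacity as RAM

print(f"disk: ${disk_cost:.2f}, RAM: ${ram_cost:.2f}, "
      f"ratio: {ram_cost / disk_cost:.0f}x")
```

That is, $25 versus $2500 for the same 100 GB, a factor of 100.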
On the other hand, if, in an effort to minimize storage costs, we reduced main memory to a bare-bones level, we would find that performance would suffer terribly, because it would be necessary to store/retrieve data onto/from secondary storage more often, and the ratio between access times to secondary storage and main memory is on the order of hundreds of thousands to one.
What is virtual memory?
The purpose of RAM is to store the programs that are currently running and the data that those programs are processing, so that instructions and data needed by the CPU can be copied into registers (where the CPU can actually make use of them) quickly.
However, it often happens that RAM is not large enough to hold all running programs and their data. To alleviate this situation, many modern operating systems implement virtual memory, which basically means that some secondary storage (typically some segment of a hard disk) is used for holding portions of programs/data that, logically speaking, are considered to be in RAM. Whenever the CPU attempts to fetch an instruction or data item from virtual memory, it must first be confirmed that the instruction or data item is actually in RAM. If it is, access proceeds normally. If not, whatever "page" of virtual memory holds the desired item must first be "swapped" into RAM (from the disk), replacing some page of data that was already there, which itself gets written to the disk. The effect of virtual memory, then, is in a sense to make RAM seem much larger than it actually is.
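The swap-on-demand behavior just described can be sketched as a toy simulation. The structures and sizes below are illustrative assumptions, not any real operating system's design:

```python
# Toy virtual memory: pages live on "disk" until accessed, at which point
# they are swapped into a small set of "RAM" frames, evicting (and writing
# back) the oldest resident page when RAM is full.

RAM_FRAMES = 2                      # how many pages fit in pretend RAM

resident = []                       # pages currently in RAM, oldest first
disk = set(range(10))               # pages 0-9 start out on disk

def access(page):
    if page in resident:            # already in RAM: normal access
        return "in RAM"
    # Page fault: make room if needed, then swap the page in from disk.
    if len(resident) >= RAM_FRAMES:
        evicted = resident.pop(0)   # evict the oldest resident page...
        disk.add(evicted)           # ...writing it back to disk
    disk.discard(page)
    resident.append(page)
    return "page fault"

print(access(3))   # not resident yet: a page fault (slow)
print(access(3))   # now resident: proceeds normally (fast)
```

The first access to page 3 triggers a swap; the second finds it already in RAM, mirroring the two cases described above.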
The relationship between RAM and virtual memory is analogous to that between cache and RAM.
Among the benefits of virtual memory is that it gives programmers the freedom to develop programs without worrying too much about how much memory a program (or its data) will occupy. From the user's point of view, it makes it possible to run lots of programs simultaneously without having to worry about the machine "crashing" due to a lack of space in RAM.
Sometimes, when virtual memory is being stretched to its limit, page swapping becomes so frequent that system performance degrades very badly. (Note that swapping one page for another is a very time-consuming operation; in the context of an electronic digital computer, even a hundredth or a thousandth of a second counts as a long time.) This is referred to as thrashing. Of course, the larger RAM's capacity, the less need there will be for swapping, and hence the less likely it is that thrashing will occur. If bouts of thrashing are adversely affecting you, don't buy a faster CPU. Rather, install more RAM!!
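The "install more RAM" advice can be illustrated with a toy fault counter. The FIFO replacement rule and the access pattern below are arbitrary choices for illustration; the point is only that the same workload generates far fewer swaps when more frames are available:

```python
# Count page faults (swaps) for a given access pattern and RAM size.

def count_faults(references, frames):
    resident = []
    faults = 0
    for page in references:
        if page not in resident:
            faults += 1
            if len(resident) >= frames:
                resident.pop(0)          # evict the oldest page (FIFO)
            resident.append(page)
    return faults

# A program cycling repeatedly through 4 pages:
pattern = [0, 1, 2, 3] * 5

print(count_faults(pattern, frames=2))   # too little RAM: every access faults
print(count_faults(pattern, frames=4))   # enough RAM: only the first 4 fault
```

With 2 frames, every one of the 20 accesses is a fault (thrashing); with 4 frames, only the initial 4 accesses fault and the rest proceed at RAM speed.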