1. Preliminaries
This chapter introduces the basic elements
of a C++ program. We will use simple examples to show the structure of C++
programs and the way they are compiled. Elementary concepts such as constants,
variables, and their storage in memory will also be discussed.
The following is a cursory description of the concept of programming for the
benefit of those who are new to the subject.
Programming
A digital computer is a useful tool for
solving a great variety of problems.
A solution to a problem is called an algorithm;
it describes the sequence of steps to be performed for the problem to be
solved. A simple example of a problem and an algorithm for it would be:
Problem: Sort a list of names in ascending lexicographic order.
Algorithm: Call the given list list1; create an empty list, list2, to hold the sorted list. Repeatedly find the ‘smallest’ name in list1, remove it from list1, and make it the next entry of list2, until list1 is empty.
An algorithm is expressed in abstract
terms. To be intelligible to a computer, it needs to be expressed in a language
understood by it. The only language really understood by a computer is its own machine language. Programs expressed in
the machine language are said to be executable.
A program written in any other language needs to be first translated to the
machine language before it can be executed.
A machine language is far too cryptic to be suitable for the direct use of
programmers. A further abstraction of this language is the assembly language, which provides mnemonic names for the
instructions and a more intelligible notation for the data. An assembly
language program is translated to machine language by a translator called an assembler.
Even assembly languages are difficult to work with. High-level languages such as C++
provide a much more convenient notation for implementing algorithms. They
liberate programmers from having to think in very low-level terms, and help
them to focus on the algorithm instead. A program written in a high-level
language is translated to assembly language by a translator called a compiler. The assembly code produced by
the compiler is then assembled to produce an executable program.
A Simple C++ Program
Listing 1.1 shows our first C++ program, which, when run, simply outputs the
message Hello World.
1    #include <iostream.h>
2    int main (void)
3    {
4        cout << "Hello World\n";
5    }
Annotation
1 This line uses the preprocessor directive #include to include the contents of the header file iostream.h in the program. iostream.h is a standard C++ header file and contains definitions for input and output.
2 This line defines a function called main. A function may have zero or more parameters; these always appear after the function name, between a pair of brackets. The word void appearing between the brackets indicates that main has no parameters. A function may also have a return type; this always appears before the function name. The return type for main is int (i.e., an integer number). All C++ programs must have exactly one main function. Program execution always begins from main.
3 This brace marks the beginning of the body of main.
4 This line is a statement. A statement is a computation step which may produce a value. The end of a statement is always marked with a semicolon (;). This statement causes the string "Hello World\n" to be sent to the cout output stream. A string is any sequence of characters enclosed in double-quotes. The last character in this string (\n) is a newline character, which is similar to a carriage return on a typewriter. A stream is an object which performs input or output. cout is the standard output stream in C++ (standard output usually means your computer monitor screen). The symbol << is an output operator which takes an output stream as its left operand and an expression as its right operand, and causes the value of the latter to be sent to the former. In this case, the effect is that the string "Hello World\n" is sent to cout, causing it to be printed on the computer monitor screen.
5 This brace marks the end of the body of main.
Compiling a Simple C++ Program
Dialog 1.1 shows how the program in Listing 1.1 is compiled and run in a typical UNIX environment. User input
appears in bold and system response in plain. The UNIX command line
prompt appears as a dollar symbol ($).
Dialog 1.1
1    $ CC hello.cc
2    $ a.out
3    Hello World
4    $
Annotation
1 The command for invoking the AT&T C++ translator in a UNIX environment is CC. The argument to this command (hello.cc) is the name of the file which contains the program. As a convention, the file name should end in .c, .C, or .cc. (This ending may be different in other systems.)
2 The result of compilation is an executable file which is by default named a.out. To run the program, we just use a.out as a command.
3 This is the output produced by the program.
4 The return of the system prompt indicates that the program has completed its execution.
The CC command accepts a variety of useful options. An option appears as -name, where name is the name of the option
(usually a single letter). Some options take arguments. For example, the output
option (-o) allows you to specify a name for the executable file produced by
the compiler instead of a.out. Dialog 1.2 illustrates the use of this option by specifying hello as the name of the
executable file.
Dialog 1.2
$ CC hello.cc -o hello
$ hello
Hello World
$
Although the actual command may be different depending on the make of the compiler, a
similar compilation procedure is used under MS-DOS. Windows-based C++ compilers
offer a user-friendly environment where compilation is as simple as choosing a
menu command. The naming convention under MS-DOS and Windows is that C++ source
file names should end in .cpp.
How C++ Compilation Works
Compiling a C++ program involves a number
of steps (most of which are transparent to the user):
First, the C++ preprocessor goes over the program text
and carries out the instructions specified by the preprocessor directives
(e.g., #include). The result is a modified program text which no longer contains
any directives. (Chapter 12 describes the preprocessor in detail.)
Then, the C++ compiler translates the program code.
The compiler may be a true C++ compiler which generates native (assembly or
machine) code, or just a translator which translates the code into C. In the
latter case, the resulting C code is then passed through a C compiler to
produce native object code. In either case, the outcome may be incomplete due
to the program referring to library routines which are not defined as a part of
the program. For example, Listing 1.1 refers to the << operator which is actually defined in a separate IO library.
Finally, the linker completes the object code by
linking it with the object code of any library modules that the program may
have referred to. The final result is an executable file.
Figure 1.1 illustrates the above steps for both a C++ translator and a C++
native compiler. In practice all these steps are usually invoked by a single
command (e.g., CC) and the user will not even see the intermediate files generated.
Variables
A variable is a symbolic name for a memory
location in which data can be stored and subsequently recalled. Variables are
used for holding data values so that they can be utilized in various
computations in a program. All variables have two important attributes:
A type, which is established when the variable is defined (e.g.,
integer, real, character). Once defined, the type of a C++ variable cannot be
changed.
A value, which can be changed by assigning a new value to the
variable. The kind of values a variable can assume depends on its type. For
example, an integer variable can only take integer values (e.g., 2, 100, -12).
Listing 1.2 illustrates the use of some simple variables.
1     #include <iostream.h>
2     int main (void)
3     {
4         int workDays;
5         float workHours, payRate, weeklyPay;
6         workDays = 5;
7         workHours = 7.5;
8         payRate = 38.55;
9         weeklyPay = workDays * workHours * payRate;
10        cout << "Weekly Pay = ";
11        cout << weeklyPay;
12        cout << '\n';
13    }
Annotation
4 This line defines an int (integer) variable called workDays, which will represent the number of working days in a week. As a general rule, a variable is defined by specifying its type first, followed by the variable name, followed by a semicolon.
5 This line defines three float (real) variables which, respectively, represent the work hours per day, the hourly pay rate, and the weekly pay. As illustrated by this line, multiple variables of the same type can be defined at once by separating them with commas.
6 This line is an assignment statement. It assigns the value 5 to the variable workDays. Therefore, after this statement is executed, workDays denotes the value 5.
7 This line assigns the value 7.5 to the variable workHours.
8 This line assigns the value 38.55 to the variable payRate.
9 This line calculates the weekly pay as the product of workDays, workHours, and payRate (* is the multiplication operator). The resulting value is stored in weeklyPay.
10-12 These lines output three items in sequence: the string "Weekly Pay = ", the value of the variable weeklyPay, and a newline character.
When run, the program will produce the
following output:
Weekly Pay = 1445.625
When a variable is defined, its value is undefined
until it is actually assigned one. For example, weeklyPay has an undefined value
(i.e., whatever happens to be in the memory location which the variable denotes
at the time) until line 9 is executed. The assigning of a value to a variable
for the first time is called initialization.
It is important to ensure that a variable is initialized before it is used in
any computation.
It is possible to define a variable and initialize it at the same time. This is
considered a good programming practice, because it pre-empts the possibility of
using the variable prior to it being initialized. Listing 1.3 is a revised version of Listing 1.2 which uses this technique. For all intents and purposes, the two
programs are equivalent.
#include <iostream.h>
int main (void)
{
    int workDays = 5;
    float workHours = 7.5;
    float payRate = 38.55;
    float weeklyPay = workDays * workHours * payRate;
    cout << "Weekly Pay = ";
    cout << weeklyPay;
    cout << '\n';
}
Simple Input/Output
The most common way in which a program
communicates with the outside world is through simple, character-oriented
Input/Output (IO) operations. C++ provides two useful operators for this
purpose: >> for input and << for output. We have already seen examples of output using <<. Listing 1.4 also illustrates the use of >> for input.
1     #include <iostream.h>
2     int main (void)
3     {
4         int workDays = 5;
5         float workHours = 7.5;
6         float payRate, weeklyPay;
7         cout << "What is the hourly pay rate? ";
8         cin >> payRate;
9         weeklyPay = workDays * workHours * payRate;
10        cout << "Weekly Pay = ";
11        cout << weeklyPay;
12        cout << '\n';
13    }
Annotation
7 This line outputs the prompt What is the hourly pay rate? to seek user input.
8 This line reads the input value typed by the user and copies it to payRate. The input operator >> takes an input stream as its left operand (cin is the standard C++ input stream which corresponds to data entered via the keyboard) and a variable (to which the input data is copied) as its right operand.
9-13 The rest of the program is as before.
When run, the program will produce the
following output (user input appears in bold):
What is the hourly pay rate? 33.55
Weekly Pay = 1258.125
Both << and >> return their left operand as their result, enabling multiple input
or multiple output operations to be combined into one statement. This is
illustrated by Listing 1.5 which now allows the input of both the daily work hours and the
hourly pay rate.
1     #include <iostream.h>
2     int main (void)
3     {
4         int workDays = 5;
5         float workHours, payRate, weeklyPay;
6         cout << "What are the work hours and the hourly pay rate? ";
7         cin >> workHours >> payRate;
8         weeklyPay = workDays * workHours * payRate;
9         cout << "Weekly Pay = " << weeklyPay << '\n';
10    }
Annotation
7 This line reads two input values typed by the user and copies them to workHours and payRate, respectively. The two values should be separated by white space (i.e., one or more space, tab, or newline characters). This statement is equivalent to:
(cin >> workHours) >> payRate;
Because the result of >> is its left operand, (cin >> workHours) evaluates to cin which is then used as the left operand of the next >> operator.
9 This line is the result of combining lines 10-12 from Listing 1.4. It outputs "Weekly Pay = ", followed by the value of weeklyPay, followed by a newline character. This statement is equivalent to:
((cout << "Weekly Pay = ") << weeklyPay) << '\n';
Because the result of << is its left operand, (cout << "Weekly Pay = ") evaluates to cout which is then used as the left operand of the next << operator, etc.
When run, the program will produce the
following output:
What are the work hours and the hourly pay rate? 7.5 33.55
Weekly Pay = 1258.125
Comments
A comment is a piece of descriptive text
which explains some aspect of a program. Program comments are totally ignored
by the compiler and are only intended for human readers. C++ provides two types
of comment delimiters:
Anything after // (until the end of the line on which it appears) is considered a comment.
Anything enclosed by the pair /* and */ is considered a comment.
Listing 1.6 illustrates the use of both forms.
#include <iostream.h>

/* This program calculates the weekly gross pay for a worker,
   based on the total number of hours worked and the hourly pay rate. */

int main (void)
{
    int workDays = 5;          // Number of work days per week
    float workHours = 7.5;     // Number of work hours per day
    float payRate = 33.50;     // Hourly pay rate
    float weeklyPay;           // Gross weekly pay

    weeklyPay = workDays * workHours * payRate;
    cout << "Weekly Pay = " << weeklyPay << '\n';
}
Comments should be used to enhance (not to hinder) the readability of a program. The
following points, in particular, should be noted:
A comment should be easier to read and understand than the code which it tries to explain. A confusing or
unnecessarily complex comment is worse than no comment at all.
Over-use of comments can lead to even less readability. A program which contains so much comment that
you can hardly see the code can by no means be considered readable.
Use of descriptive names for variables and other entities in a program, and proper indentation of the
code, can reduce the need for using comments.
The best guideline for how to use comments
is to simply apply common sense.
Memory
A computer provides a Random Access Memory
(RAM) for storing executable program code as well as the data the program
manipulates. This memory can be thought of as a contiguous sequence of bits, each of which is capable of
storing a binary digit (0 or 1).
Typically, the memory is also divided into groups of 8 consecutive bits (called
bytes). The bytes are sequentially
addressed. Therefore each byte can be uniquely identified by its address (see Figure 1.2).
The C++ compiler generates executable code which maps data entities to memory
locations. For example, the variable definition
int salary = 65000;
causes the compiler to allocate a few bytes
to represent salary. The exact number of bytes allocated and the method used for the
binary representation of the integer depend on the specific C++
implementation, but let us say two bytes encoded as a 2’s complement integer.
The compiler uses the address of the
first byte at which salary is
allocated to refer to it. The above definition causes the value 65000 to be
stored in the bytes allocated (see Figure 1.3). Note, however, that a signed two-byte 2’s complement
integer can only represent values from -32768 through 32767, so 65000 can be stored exactly
only if the two bytes are treated as unsigned or if more bytes are allocated.
While the exact binary representation of a data item is rarely of interest to a
programmer, the general organization of memory and use of addresses for
referring to data items (as we will see later) is very important.
Integer Numbers
An integer variable may be defined to be of type short, int, or long. The only difference is
that an int uses at least as many bytes as a short, and a long uses at least as many
bytes as an int. For example, on the
author’s PC, a short uses 2 bytes, an int also 2 bytes, and a long 4 bytes.
short age = 20;
int salary = 65000;
long price = 4500000;
By default, an integer variable is assumed to be signed (i.e., have a signed
representation so that it can assume positive as well as negative values).
However, an integer can be defined to be unsigned by using the keyword unsigned in its definition. The
keyword signed is also allowed but is redundant.
unsigned short age = 20;
unsigned int salary = 65000;
unsigned long price = 4500000;
A literal integer (e.g., 1984) is assumed to be of type int (provided its value fits in an int),
unless it has an L or l suffix, in which case it is treated as a long. Also, a literal integer
can be specified to be unsigned using the suffix U or u. For example:
1984L 1984l 1984U 1984u 1984LU 1984ul
Literal integers can be expressed in decimal, octal, and hexadecimal notations. The
decimal notation is the one we have been using so far. An integer is taken to
be octal if it is preceded by a zero (0), and hexadecimal if it is preceded by a 0x or 0X. For example:
92      // decimal
0134    // equivalent octal
0x5C    // equivalent hexadecimal
Octal numbers use the base 8, and can
therefore only use the digits 0-7. Hexadecimal numbers use the base 16, and therefore use the letters A-F (or a-f) to represent, respectively,
10-15. Octal and hexadecimal numbers are calculated as follows:
$\texttt{0134} = 1 \times 8^2 + 3 \times 8^1 + 4 \times 8^0 = 64 + 24 + 4 = 92$
$\texttt{0x5C} = 5 \times 16^1 + 12 \times 16^0 = 80 + 12 = 92$
Real Numbers
A real variable may be defined to be of type float or double. The latter uses more
bytes and therefore offers a greater range and accuracy for representing real
numbers. For example, on the author’s PC, a float uses 4 and a double uses 8 bytes.
float interestRate = 0.06;
double pi = 3.141592654;
A literal real (e.g., 0.06) is always assumed to be of type double, unless it has an F or f suffix, in which case it is
treated as a float, or an L or l suffix, in which case it is treated as a long double. The latter uses
more bytes than a double for better accuracy (e.g., 10 bytes on the author’s PC). For
example:
0.06F 0.06f 3.141592654L 3.141592654l
In addition to the decimal notation used so far, literal reals may also be
expressed in scientific notation. For
example, 0.002164 may be written in the scientific notation as:
2.164E-3 or 2.164e-3
The letter E (or e) stands for exponent. The scientific notation is
interpreted as follows:
$\texttt{2.164E-3} = 2.164 \times 10^{-3}$
Characters
A character variable is defined to be of type char. A character variable occupies a single
byte which contains the code for the
character. This code is a numeric value and depends on the character coding system being used (i.e., is machine-dependent).
The most common system is ASCII (American Standard Code for Information
Interchange). For example, the character A
has the ASCII code 65, and the character a
has the ASCII code 97.
char ch = 'A';
Like integers, a character variable may be specified to be signed or unsigned. By
default (on most systems) char means signed char. However, on some systems it may mean unsigned char. A signed
character variable can hold numeric values in the range -128 through 127. An
unsigned character variable can hold numeric values in the range 0 through 255.
As a result, both are often used to represent small integers in programs (and
can be assigned numeric values like integers):
signed char offset = -88;
unsigned char row = 2, column = 26;
A literal character is written by
enclosing the character between a pair of single quotes (e.g., 'A'). Nonprintable characters
are represented using escape sequences. For example:
'\n'    // new line
'\r'    // carriage return
'\t'    // horizontal tab
'\v'    // vertical tab
'\b'    // backspace
'\f'    // formfeed
Single and double quotes and the backslash
character can also use the escape notation:
'\''    // single quote (')
'\"'    // double quote (")
'\\'    // backslash (\)
Literal characters may also be specified using their numeric code value. The general
escape sequence \ooo (i.e., a backslash followed by up to three octal digits) is used
for this purpose. For example (assuming ASCII):
'\12'     // newline (decimal code = 10)
'\11'     // horizontal tab (decimal code = 9)
'\101'    // 'A' (decimal code = 65)
'\0'      // null (decimal code = 0)
Strings
A string is a consecutive sequence (i.e., array) of characters which is
terminated by a null character. A string
variable is defined to be of type char* (i.e., a pointer to character). A pointer is simply the address of a memory
location. (Pointers will be discussed in Chapter 5.) A string variable,
therefore, simply contains the address of where the first character of a string
appears. For example, consider the definition:
char *str = "HELLO";
Figure 1.4 illustrates how the string variable str and the string "HELLO" might appear
in memory.
A literal string is written by
enclosing its characters between a pair of double quotes (e.g., "HELLO"). The compiler
always appends a null character to a literal string to mark its end. The
characters of a string may be specified using any of the notations for
specifying literal characters. For example:
"Name\tAddress\tTelephone"     // tab-separated words
"ASCII character 65: \101"     // 'A' specified as '\101'
A long string may extend beyond a single line, in which case each of the
preceding lines should be terminated by a backslash. For example:
"Example to show \
the use of backslash for \
writing a long string"
The backslash in this context means that
the rest of the string is continued on the next line. The above string is
equivalent to the single line string:
"Example to show the use of backslash for writing a long string"
A common programming error results from confusing a single-character string
(e.g., "A") with a single character (e.g., 'A'). These two are not equivalent. The former consists of
two bytes (the character 'A' followed by the character '\0'), whereas the latter consists of a single
byte.
The shortest possible string is the null string ("") which simply
consists of the null character.
Names
Programming languages use names to refer to
the various entities that make up a program. We have already seen examples of
an important category of such names (i.e., variable names). Other categories
include: function names, type names, and macro names, which will be described
later in this book.
Names are a programming convenience, which allow the programmer to organize what
would otherwise be quantities of plain data into a meaningful and
human-readable collection. As a result, no trace of a name is left in the final
executable code generated by a compiler. For example, a temperature variable eventually
becomes a few bytes of memory which is referred to by the executable code by
its address, not its name.
C++ imposes the following rules for creating valid names (also called identifiers). A name should consist of
one or more characters, each of which may be a letter (i.e., 'A'-'Z' and
'a'-'z'), a digit (i.e., '0'-'9'), or an underscore character ('_'), except
that the first character may not be a digit. Upper and lower case letters are
distinct. For example:
salary // valid identifier
salary2 // valid identifier
2salary // invalid identifier (begins with a digit)
_salary // valid identifier
Salary // valid but distinct from salary
C++ imposes no limit on the number of characters in an identifier. However, most
implementations do, although the limit is usually so large that it should not cause
concern (e.g., 255 characters).
Certain words are reserved by C++ for specific purposes and may not be used as
identifiers. These are called reserved words or keywords and are
summarized in Table 1.1:
asm      continue  float   new        signed    try
auto     default   for     operator   sizeof    typedef
break    delete    friend  private    static    union
case     do        goto    protected  struct    unsigned
catch    double    if      public     switch    virtual
char     else      inline  register   template  void
class    enum      int     return     this      volatile
const    extern    long    short      throw     while
Exercises
1.1 Write a program which inputs a temperature reading expressed in Fahrenheit and outputs its
equivalent in Celsius, using the formula:
$^\circ C = \frac{5}{9}\,(^\circ F - 32)$
Compile and run the program. Its behavior
should resemble this:
Temperature in Fahrenheit: 41
41 degrees Fahrenheit = 5 degrees Celsius
1.2 Which of the following represent valid variable definitions?
int n = -100;
unsigned int i = -100;
signed int = 2.9;
long m = 2, p = 4;
int 2k;
double x = 2 * m;
float y = y * 2;
unsigned double z = 0.0;
double d = 0.67F;
float f = 0.52L;
signed char = -1786;
char c = '$' + 2;
sign char h = '\111';
char *name = "Peter Pan";
unsigned char *num = "276811";
1.3 Which of the following represent valid identifiers?
identifier
seven_11
_unique_
gross-income
gross$income
2by2
default
average_weight_of_a_large_pizza
variable
object.oriented
1.4 Define variables to represent the following entities:
Age of a person.
Income of an employee.
Number of words in a dictionary.
A letter of the alphabet.
A greeting message.