C++ Programming by Sharam Hekmat (chapter 1) Preliminaries



 1.       Preliminaries
 
This chapter introduces the basic elements of a C++ program. We will use simple examples to show the structure of C++ programs and the way they are compiled. Elementary concepts such as constants, variables, and their storage in memory will also be discussed.
       The following is a cursory description of the concept of programming for the benefit of those who are new to the subject.

Programming

A digital computer is a useful tool for solving a great variety of problems. A solution to a problem is called an algorithm; it describes the sequence of steps to be performed for the problem to be solved. A simple example of a problem and an algorithm for it would be:

Problem:           Sort a list of names in ascending lexicographic order.
Algorithm:         Call the given list list1; create an empty list, list2, to hold the sorted list. Repeatedly find the ‘smallest’ name in list1, remove it from list1, and make it the next entry of list2, until list1 is empty.

An algorithm is expressed in abstract terms. To be intelligible to a computer, it needs to be expressed in a language understood by it. The only language really understood by a computer is its own machine language. Programs expressed in the machine language are said to be executable. A program written in any other language needs to be first translated to the machine language before it can be executed.
       A machine language is far too cryptic to be suitable for the direct use of programmers. A further abstraction of this language is the assembly language which provides mnemonic names for the instructions and a more intelligible notation for the data. An assembly language program is translated to machine language by a translator called an assembler.
       Even assembly languages are difficult to work with. High-level languages such as C++ provide a much more convenient notation for implementing algorithms. They liberate programmers from having to think in very low-level terms, and help them to focus on the algorithm instead. A program written in a high-level language is translated to assembly language by a translator called a compiler. The assembly code produced by the compiler is then assembled to produce an executable program.
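       To make this concrete, the sorting algorithm given earlier might be written in C++ along the following lines. This is only a sketch: the function name sortNames and the use of the standard library's vector and string types are choices made for this illustration, and none of these features has been introduced yet.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sorts names in ascending lexicographic order using the algorithm above.
// list1 is taken by value, so the caller's list is left unchanged.
std::vector<std::string> sortNames(std::vector<std::string> list1)
{
    std::vector<std::string> list2;     // starts empty; will hold the sorted list

    while (!list1.empty()) {
        // find the 'smallest' name remaining in list1
        std::vector<std::string>::iterator smallest =
            std::min_element(list1.begin(), list1.end());
        list2.push_back(*smallest);     // make it the next entry of list2
        list1.erase(smallest);          // remove it from list1
    }
    return list2;
}
```

Given a list holding Peter, Alice, and Mary, sortNames would return Alice, Mary, Peter. Strings are compared by character code, so under ASCII all upper-case letters sort before lower-case ones.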

A Simple C++ Program

Listing 1.1 shows our first C++ program, which, when run, simply outputs the message Hello World.

Listing 1.1
1     #include <iostream.h>

2     int main (void)
3     {
4         cout << "Hello World\n";
5     }

Annotation
1     This line uses the preprocessor directive #include to include the contents of the header file iostream.h in the program. The file iostream.h is a standard C++ header file and contains definitions for input and output.
2     This line defines a function called main. A function may have zero or more parameters; these always appear after the function name, between a pair of brackets. The word void appearing between the brackets indicates that main has no parameters. A function may also have a return type; this always appears before the function name. The return type for main is int (i.e., an integer number). All C++ programs must have exactly one main function. Program execution always begins from main.
3     This brace marks the beginning of the body of main.
4     This line is a statement. A statement is a computation step which may produce a value. The end of a statement is always marked with a semicolon (;). This statement causes the string "Hello World\n" to be sent to the cout output stream. A string is any sequence of characters enclosed in double-quotes. The last character in this string (\n) is a newline character which is similar to a carriage return on a typewriter. A stream is an object which performs input or output. The stream cout is the standard output stream in C++ (standard output usually means your computer monitor screen). The symbol << is an output operator which takes an output stream as its left operand and an expression as its right operand, and causes the value of the latter to be sent to the former. In this case, the effect is that the string "Hello World\n" is sent to cout, causing it to be printed on the computer monitor screen.
5     This brace marks the end of the body of main.

Compiling a Simple C++ Program

Dialog 1.1 shows how the program in Listing 1.1 is compiled and run in a typical UNIX environment. User input appears in bold and system response in plain. The UNIX command line prompt appears as a dollar symbol ($).

Dialog 1.1
1     $  CC hello.cc
2     $  a.out
3     Hello World
4     $

Annotation
1     The command for invoking the AT&T C++ translator in a UNIX environment is CC. The argument to this command (hello.cc) is the name of the file which contains the program. As a convention, the file name should end in .c, .C, or .cc. (This ending may be different in other systems.)
2     The result of compilation is an executable file which is by default named a.out. To run the program, we just use a.out as a command.
3     This is the output produced by the program.
4     The return of the system prompt indicates that the program has completed its execution.
       The CC command accepts a variety of useful options. An option appears as -name, where name is the name of the option (usually a single letter). Some options take arguments. For example, the output option (-o) allows you to specify a name for the executable file produced by the compiler instead of a.out. Dialog 1.2 illustrates the use of this option by specifying hello as the name of the executable file.

Dialog 1.2
$  CC hello.cc -o hello
$  hello
Hello World
$

       Although the actual command may be different depending on the make of the compiler, a similar compilation procedure is used under MS-DOS. Windows-based C++ compilers offer a user-friendly environment where compilation is as simple as choosing a menu command. The naming convention under MS-DOS and Windows is that C++ source file names should end in .cpp.

How C++ Compilation Works

Compiling a C++ program involves a number of steps (most of which are transparent to the user):
    First, the C++ preprocessor goes over the program text and carries out the instructions specified by the preprocessor directives (e.g., #include). The result is a modified program text which no longer contains any directives. (Chapter 12 describes the preprocessor in detail.)
    Then, the C++ compiler translates the program code. The compiler may be a true C++ compiler which generates native (assembly or machine) code, or just a translator which translates the code into C. In the latter case, the resulting C code is then passed through a C compiler to produce native object code. In either case, the outcome may be incomplete due to the program referring to library routines which are not defined as a part of the program. For example, Listing 1.1 refers to the << operator which is actually defined in a separate IO library.
    Finally, the linker completes the object code by linking it with the object code of any library modules that the program may have referred to. The final result is an executable file.
Figure 1.1 illustrates the above steps for both a C++ translator and a C++ native compiler. In practice all these steps are usually invoked by a single command (e.g., CC) and the user will not even see the intermediate files generated.

Variables

A variable is a symbolic name for a memory location in which data can be stored and subsequently recalled. Variables are used for holding data values so that they can be utilized in various computations in a program. All variables have two important attributes:
    A type which is established when the variable is defined (e.g., integer, real, character). Once defined, the type of a C++ variable cannot be changed.
    A value which can be changed by assigning a new value to the variable. The kind of values a variable can assume depends on its type. For example, an integer variable can only take integer values (e.g., 2, 100, -12).
Listing 1.2 illustrates the use of some simple variables.

Listing 1.2
1     #include <iostream.h>

2     int main (void)
3     {
4         int     workDays;
5         float   workHours, payRate, weeklyPay;

6         workDays = 5;
7         workHours = 7.5;
8         payRate = 38.55;
9         weeklyPay = workDays * workHours * payRate;
10        cout << "Weekly Pay = ";
11        cout << weeklyPay;
12        cout << '\n';
13    }

Annotation
4     This line defines an int (integer) variable called workDays, which will represent the number of working days in a week. As a general rule, a variable is defined by specifying its type first, followed by the variable name, followed by a semicolon.
5     This line defines three float (real) variables which, respectively, represent the work hours per day, the hourly pay rate, and the weekly pay. As illustrated by this line, multiple variables of the same type can be defined at once by separating them with commas.
6     This line is an assignment statement. It assigns the value 5 to the variable workDays. Therefore, after this statement is executed, workDays denotes the value 5.
7     This line assigns the value 7.5 to the variable workHours.
8     This line assigns the value 38.55 to the variable payRate.
9     This line calculates the weekly pay as the product of workDays, workHours, and payRate (* is the multiplication operator). The resulting value is stored in weeklyPay.
10-12     These lines output three items in sequence: the string "Weekly Pay = ", the value of the variable weeklyPay, and a newline character.
When run, the program will produce the following output:

Weekly Pay = 1445.625

       When a variable is defined, its value is undefined until it is actually assigned one. For example, weeklyPay has an undefined value (i.e., whatever happens to be in the memory location which the variable denotes at the time) until line 9 is executed. The assigning of a value to a variable for the first time is called initialization. It is important to ensure that a variable is initialized before it is used in any computation.
       It is possible to define a variable and initialize it at the same time. This is considered a good programming practice, because it pre-empts the possibility of using the variable prior to it being initialized. Listing 1.3 is a revised version of Listing 1.2 which uses this technique. For all intents and purposes, the two programs are equivalent.

Listing 1.3
#include <iostream.h>

int main (void)
{
    int     workDays = 5;
    float   workHours = 7.5;
    float   payRate = 38.55;
    float   weeklyPay = workDays * workHours * payRate;

    cout << "Weekly Pay = ";
    cout << weeklyPay;
    cout << '\n';
}


Simple Input/Output

The most common way in which a program communicates with the outside world is through simple, character-oriented Input/Output (IO) operations. C++ provides two useful operators for this purpose: >> for input and << for output. We have already seen examples of output using <<. Listing 1.4 also illustrates the use of >> for input.

Listing 1.4
1     #include <iostream.h>

2     int main (void)
3     {
4         int     workDays = 5;
5         float   workHours = 7.5;
6         float   payRate, weeklyPay;

7         cout << "What is the hourly pay rate? ";
8         cin >> payRate;

9         weeklyPay = workDays * workHours * payRate;
10        cout << "Weekly Pay = ";
11        cout << weeklyPay;
12        cout << '\n';
13    }

Annotation
7     This line outputs the prompt What is the hourly pay rate? to seek user input.
8     This line reads the input value typed by the user and copies it to payRate. The input operator >> takes an input stream as its left operand (cin is the standard C++ input stream which corresponds to data entered via the keyboard) and a variable (to which the input data is copied) as its right operand.
9-13       The rest of the program is as before.
When run, the program will produce the following output (user input appears in bold):

What is the hourly pay rate? 33.55
Weekly Pay = 1258.125

       Both << and >> return their left operand as their result, enabling multiple input or multiple output operations to be combined into one statement. This is illustrated by Listing 1.5 which now allows the input of both the daily work hours and the hourly pay rate.


Listing 1.5
1     #include <iostream.h>

2     int main (void)
3     {
4         int     workDays = 5;
5         float   workHours, payRate, weeklyPay;

6         cout << "What are the work hours and the hourly pay rate? ";
7         cin >> workHours >> payRate;

8         weeklyPay = workDays * workHours * payRate;
9         cout << "Weekly Pay = " << weeklyPay << '\n';
10    }

Annotation
7     This line reads two input values typed by the user and copies them to workHours and payRate, respectively. The two values should be separated by white space (i.e., one or more space, tab, or newline characters). This statement is equivalent to:
              (cin >> workHours) >> payRate;
       Because the result of >> is its left operand, (cin >> workHours) evaluates to cin which is then used as the left operand of the next >> operator.
9     This line is the result of combining lines 10-12 from Listing 1.4. It outputs "Weekly Pay = ", followed by the value of weeklyPay, followed by a newline character. This statement is equivalent to:
              ((cout << "Weekly Pay = ") << weeklyPay) << '\n';
       Because the result of << is its left operand, (cout << "Weekly Pay = ") evaluates to cout which is then used as the left operand of the next << operator, etc.
When run, the program will produce the following output:

What are the work hours and the hourly pay rate? 7.5  33.55
Weekly Pay = 1258.125


Comments

A comment is a piece of descriptive text which explains some aspect of a program. Program comments are totally ignored by the compiler and are only intended for human readers. C++ provides two types of comment delimiters:
    Anything after // (until the end of the line on which it appears) is considered a comment.
    Anything enclosed by the pair /* and */ is considered a comment.
Listing 1.6 illustrates the use of both forms.

Listing 1.6
#include <iostream.h>

/* This program calculates the weekly gross pay for a worker,
   based on the total number of hours worked and the hourly pay
   rate. */

int main (void)
{
    int     workDays = 5;       // Number of work days per week
    float   workHours = 7.5;    // Number of work hours per day
    float   payRate = 33.50;    // Hourly pay rate
    float   weeklyPay;          // Gross weekly pay

    weeklyPay = workDays * workHours * payRate;
    cout << "Weekly Pay = " << weeklyPay << '\n';
}

       Comments should be used to enhance (not to hinder) the readability of a program. The following points, in particular, should be noted:
    A comment should be easier to read and understand than the code which it tries to explain. A confusing or unnecessarily complex comment is worse than no comment at all.
    Over-use of comments can lead to even less readability. A program which contains so many comments that you can hardly see the code can by no means be considered readable.
    Use of descriptive names for variables and other entities in a program, and proper indentation of the code can reduce the need for using comments.
The best guideline for how to use comments is to simply apply common sense.

Memory

A computer provides a Random Access Memory (RAM) for storing executable program code as well as the data the program manipulates. This memory can be thought of as a contiguous sequence of bits, each of which is capable of storing a binary digit (0 or 1). Typically, the memory is also divided into groups of 8 consecutive bits (called bytes). The bytes are sequentially addressed. Therefore each byte can be uniquely identified by its address (see Figure 1.2).



       The C++ compiler generates executable code which maps data entities to memory locations. For example, the variable definition

int salary = 65000;

causes the compiler to allocate a few bytes to represent salary. The exact number of bytes allocated and the method used for the binary representation of the integer depend on the specific C++ implementation, but let us say two bytes encoded as a 2’s complement integer. The compiler uses the address of the first byte at which salary is allocated to refer to it. The above definition causes the value 65000 to be stored as a 2’s complement integer in the two bytes allocated (see Figure 1.3). (Strictly speaking, a two-byte 2’s complement integer can only represent values from -32768 through 32767, so a value as large as 65000 would in practice require an unsigned representation or more than two bytes.)




       While the exact binary representation of a data item is rarely of interest to a programmer, the general organization of memory and use of addresses for referring to data items (as we will see later) is very important.
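
       The sizes involved can be inspected directly. The following sketch is illustrative only (the function names are hypothetical). It uses the sizeof operator, which yields the number of bytes occupied by a type or variable, and the constant CHAR_BIT from the standard header climits, which gives the number of bits in a byte. It also checks whether a value lies within the range of a two-byte 2’s complement integer, which is -32768 through 32767.

```cpp
#include <climits>
#include <cstddef>

// Number of bits the implementation allocates for an int.
std::size_t bitsInInt()
{
    return sizeof(int) * CHAR_BIT;
}

// True if value can be represented as a two-byte 2's complement integer.
bool fitsInTwoBytes(long value)
{
    return value >= -32768L && value <= 32767L;
}
```

For instance, fitsInTwoBytes(65000) is false, so an implementation with two-byte signed integers could not hold 65000 in such a variable without overflow.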


Integer Numbers

An integer variable may be defined to be of type short, int, or long. The only difference is that an int uses at least as many bytes as a short, and a long uses at least as many bytes as an int. For example, on the author’s PC, a short uses 2 bytes, an int also 2 bytes, and a long 4 bytes.

short   age = 20;
int     salary = 65000;
long    price = 4500000;

       By default, an integer variable is assumed to be signed (i.e., have a signed representation so that it can assume positive as well as negative values). However, an integer can be defined to be unsigned by using the keyword unsigned in its definition. The keyword signed is also allowed but is redundant.

unsigned short  age = 20;
unsigned int    salary = 65000;
unsigned long   price = 4500000;
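
       The size relationships between the integer types can be checked on any particular implementation. In the sketch below (the function names are assumptions made for this example), the sizeof operator reports the number of bytes used by each type. The actual numbers vary between machines, but the ordering must always hold.

```cpp
// True if the sizes of the integer types are ordered as the language requires.
bool integerSizesAreOrdered()
{
    return sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long);
}

// Signed and unsigned versions of a type occupy the same number of bytes;
// only the interpretation of the bits differs.
bool signednessKeepsSize()
{
    return sizeof(unsigned short) == sizeof(short)
        && sizeof(unsigned int)   == sizeof(int)
        && sizeof(unsigned long)  == sizeof(long);
}
```

Both functions should return true on every conforming implementation, whatever the individual sizes happen to be.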

       A literal integer (e.g., 1984) is assumed to be of type int (provided its value fits in an int), unless it has an L or l suffix, in which case it is treated as a long. Also, a literal integer can be specified to be unsigned using the suffix U or u. For example:
   
1984L   1984l   1984U   1984u   1984LU  1984ul

       Literal integers can be expressed in decimal, octal, and hexadecimal notations. The decimal notation is the one we have been using so far. An integer is taken to be octal if it is preceded by a zero (0), and hexadecimal if it is preceded by a 0x or 0X. For example:

92      // decimal
0134    // equivalent octal
0x5C    // equivalent hexadecimal

Octal numbers use the base 8, and can therefore only use the digits 0-7. Hexadecimal numbers use the base 16, and therefore use the letters A-F (or a-f) to represent, respectively, 10-15. Octal and hexadecimal numbers are calculated as follows:

     0134 = 1 × 8^2 + 3 × 8^1 + 4 × 8^0 = 64 + 24 + 4 = 92
     0x5C = 5 × 16^1 + 12 × 16^0 = 80 + 12 = 92
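
       The same calculation can be carried out by a program. The following is a minimal sketch, not a library routine: the function name valueInBase is hypothetical, and it assumes that its argument contains only digits that are valid for the given base (2 through 16) and that the letters A-F have consecutive codes, as they do in ASCII.

```cpp
#include <cctype>
#include <string>

// Computes the value of a digit string in the given base by repeatedly
// multiplying the running total by the base and adding the next digit.
int valueInBase(const std::string &digits, int base)
{
    int value = 0;
    for (std::string::size_type i = 0; i < digits.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(digits[i]);
        int digit;
        if (std::isdigit(c))
            digit = c - '0';                        // '0'-'9' denote 0-9
        else
            digit = std::toupper(c) - 'A' + 10;     // 'A'-'F' denote 10-15
        value = value * base + digit;
    }
    return value;
}
```

Both valueInBase("134", 8) and valueInBase("5C", 16) yield 92, matching the literals 0134 and 0x5C.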

Real Numbers

A real variable may be defined to be of type float or double. The latter uses more bytes and therefore offers a greater range and accuracy for representing real numbers. For example, on the author’s PC, a float uses 4 and a double uses 8 bytes.

float   interestRate = 0.06;
double  pi = 3.141592654;

A literal real (e.g., 0.06) is always assumed to be of type double, unless it has an F or f suffix, in which case it is treated as a float, or an L or l suffix, in which case it is treated as a long double. The latter uses more bytes than a double for better accuracy (e.g., 10 bytes on the author’s PC). For example:
   
0.06F   0.06f   3.141592654L        3.141592654l

       In addition to the decimal notation used so far, literal reals may also be expressed in scientific notation. For example, 0.002164 may be written in the scientific notation as:

2.164E-3        or      2.164e-3

The letter E (or e) stands for exponent. The scientific notation is interpreted as follows:

2.164E-3 = 2.164 × 10^{-3}
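
       As a small illustration (the function name fromScientific is an assumption made for this example), the interpretation above can be written using the pow function from the standard header cmath. Because real numbers are stored only approximately, the result should be compared with the expected value using a small tolerance rather than for exact equality.

```cpp
#include <cmath>

// Returns mantissa × 10^exponent, the value denoted by scientific notation.
double fromScientific(double mantissa, int exponent)
{
    return mantissa * std::pow(10.0, exponent);
}
```

For example, fromScientific(2.164, -3) differs from 0.002164 by far less than any amount that matters in ordinary calculations.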


Characters

A character variable is defined to be of type char. A character variable occupies a single byte which contains the code for the character. This code is a numeric value and depends on the character coding system being used (i.e., is machine-dependent). The most common system is ASCII (American Standard Code for Information Interchange). For example, the character A has the ASCII code 65, and the character a has the ASCII code 97.

char    ch = 'A';

       Like integers, a character variable may be specified to be signed or unsigned. By default (on most systems) char means signed char. However, on some systems it may mean unsigned char. A signed character variable can hold numeric values in the range -128 through 127. An unsigned character variable can hold numeric values in the range 0 through 255. As a result, both are often used to represent small integers in programs (and can be assigned numeric values like integers):

signed char     offset = -88;
unsigned char   row = 2, column = 26;

       A literal character is written by enclosing the character between a pair of single quotes (e.g., 'A'). Nonprintable characters are represented using escape sequences. For example:

'\n'    // new line
'\r'    // carriage return
'\t'    // horizontal tab
'\v'    // vertical tab
'\b'    // backspace
'\f'    // formfeed

Single and double quotes and the backslash character can also use the escape notation:

'\''    // single quote (')
'\"'    // double quote (")
'\\'    // backslash (\)

       Literal characters may also be specified using their numeric code value. The general escape sequence \ooo (i.e., a backslash followed by up to three octal digits) is used for this purpose. For example (assuming ASCII):

'\12'   // newline (decimal code = 10)
'\11'   // horizontal tab (decimal code = 9)
'\101'  // 'A' (decimal code = 65)
'\0'    // null (decimal code = 0)
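
       The correspondence between characters and their codes can be observed by converting a char to an int. The sketch below assumes the ASCII coding system, as the examples above do; the function name codeOf is hypothetical.

```cpp
// Returns the numeric code stored in a character.
// Converting through unsigned char gives a value in the range 0 through 255
// regardless of whether plain char is signed on this system.
int codeOf(char ch)
{
    return static_cast<unsigned char>(ch);
}
```

Under ASCII, codeOf('A') yields 65, and codeOf('\101') yields the same value, since both literals denote the same character.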

Strings

A string is a consecutive sequence (i.e., array) of characters which is terminated by a null character. A string variable is defined to be of type char* (i.e., a pointer to character). A pointer is simply the address of a memory location. (Pointers will be discussed in Chapter 5.) A string variable, therefore, simply contains the address of where the first character of a string appears. For example, consider the definition:

char    *str = "HELLO";

Figure 1.4 illustrates how the string variable str and the string "HELLO" might appear in memory.




       A literal string is written by enclosing its characters between a pair of double quotes (e.g., "HELLO"). The compiler always appends a null character to a literal string to mark its end. The characters of a string may be specified using any of the notations for specifying literal characters. For example:

"Name\tAddress\tTelephone"      // tab-separated words
"ASCII character 65: \101"      // 'A' specified as '\101'

       A long string may extend beyond a single line, in which case each of the preceding lines should be terminated by a backslash. For example:

"Example to show \
the use of backslash for \
writing a long string"

The backslash in this context means that the rest of the string is continued on the next line. The above string is equivalent to the single line string:

"Example to show the use of backslash for writing a long string"

       A common programming error results from confusing a single-character string (e.g., "A") with a single character (e.g., 'A'). These two are not equivalent. The former consists of two bytes (the character 'A' followed by the character '\0'), whereas the latter consists of a single byte.
       The shortest possible string is the null string ("") which simply consists of the null character.
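
       The difference in size can be confirmed with the sizeof operator, which yields the number of bytes occupied by its operand, and with the library function strlen, which counts the characters before the terminating null. The constant names below are chosen for this sketch only.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t charBytes   = sizeof('A');            // 1: a single character
const std::size_t stringBytes = sizeof("A");            // 2: 'A' followed by '\0'
const std::size_t nullBytes   = sizeof("");             // 1: just the null character
const std::size_t helloLength = std::strlen("HELLO");   // 5: the null is not counted
```

Note that sizeof("HELLO") would be 6, one more than the length reported by strlen, because of the appended null character.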

Names

Programming languages use names to refer to the various entities that make up a program. We have already seen examples of an important category of such names (i.e., variable names). Other categories include: function names, type names, and macro names, which will be described later in this book.
       Names are a programming convenience, which allow the programmer to organize what would otherwise be quantities of plain data into a meaningful and human-readable collection. As a result, no trace of a name is left in the final executable code generated by a compiler. For example, a temperature variable eventually becomes a few bytes of memory which is referred to by the executable code by its address, not its name.
       C++ imposes the following rules for creating valid names (also called identifiers). A name should consist of one or more characters, each of which may be a letter (i.e., 'A'-'Z' and 'a'-'z'), a digit (i.e., '0'-'9'), or an underscore character ('_'), except that the first character may not be a digit. Upper and lower case letters are distinct. For example:

salary      // valid identifier
salary2     // valid identifier
2salary     // invalid identifier (begins with a digit)
_salary     // valid identifier
Salary      // valid but distinct from salary
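
       These rules are simple enough to be checked mechanically. The following sketch is one possible way of doing so and is not part of C++ itself: the function name isValidIdentifier is hypothetical, it assumes the default "C" locale (so that only 'A'-'Z' and 'a'-'z' count as letters), and it does not check whether the name is a reserved word.

```cpp
#include <cctype>
#include <string>

// Checks a name against the character rules for identifiers.
// It does not check whether the name is a reserved word.
bool isValidIdentifier(const std::string &name)
{
    if (name.empty())
        return false;                       // at least one character is required
    if (std::isdigit(static_cast<unsigned char>(name[0])))
        return false;                       // first character may not be a digit

    for (std::string::size_type i = 0; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        if (!std::isalnum(c) && c != '_')
            return false;                   // only letters, digits, and underscores
    }
    return true;
}
```

With this function, isValidIdentifier("_salary") is true and isValidIdentifier("2salary") is false, in agreement with the examples above.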

       C++ imposes no limit on the number of characters in an identifier. However, most implementations do, although the limit is usually so large (e.g., 255 characters) that it should not cause concern.
       Certain words are reserved by C++ for specific purposes and may not be used as identifiers. These are called reserved words or keywords and are summarized in Table 1.1:

Table 1.1     C++ keywords.
asm         continue    float       new         signed      try
auto        default     for         operator    sizeof      typedef
break       delete      friend      private     static      union
case        do          goto        protected   struct      unsigned
catch       double      if          public      switch      virtual
char        else        inline      register    template    void
class       enum        int         return      this        volatile
const       extern      long        short       throw       while


Exercises

1.1              Write a program which inputs a temperature reading expressed in Fahrenheit and outputs its equivalent in Celsius, using the formula:

      °C = 5/9 × (°F - 32)

Compile and run the program. Its behavior should resemble this:

Temperature in Fahrenheit: 41
41 degrees Fahrenheit = 5 degrees Celsius

1.2              Which of the following represent valid variable definitions?

int n = -100;
unsigned int i = -100;
signed int = 2.9;
long m = 2, p = 4;
int 2k;
double x = 2 * m;
float y = y * 2;
unsigned double z = 0.0;
double d = 0.67F;
float f = 0.52L;
signed char = -1786;
char c = '$' + 2;
sign char h = '\111';
char *name = "Peter Pan";
unsigned char *num = "276811";

1.3              Which of the following represent valid identifiers?

identifier
seven_11
_unique_
gross-income
gross$income
2by2
default
average_weight_of_a_large_pizza
variable
object.oriented

1.4              Define variables to represent the following entities:
    Age of a person.
    Income of an employee.
    Number of words in a dictionary.
    A letter of the alphabet.
    A greeting message.