This tutorial describes a C++ character string datatype, called a String, that behaves like the built-in datatypes of C. The syntax and semantics are similar to built-in types, and performance is comparable to what would be expected of built-in types.
The conventional representation of a string in C is a null-terminated array of characters. A variable that refers to a string is a character pointer that points to the first element of the array. This arrangement, while simple in concept and nearly ideal for the lowest level of software, has two outstanding disadvantages in practice: (1) equal character pointers point at the same chunk of memory, so strings are shared, and (2) there is no management of string storage. This has led to the existence of a variety of not-very-satisfactory techniques for handling strings in C, and to much annoyance for C programmers.
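The first disadvantage can be seen in a few lines. The following is a minimal sketch, not taken from the tutorial; the function name pointers_share_storage is made up for illustration. It shows that copying a character pointer copies only the address, so a write through one pointer is visible through the other.

```cpp
#include <cstring>

// Hypothetical helper: returns true when a write made through one
// character pointer is visible through another pointer with the same value.
bool pointers_share_storage() {
    char buffer[] = "hello";  // the caller owns this storage; nothing manages it
    char* a = buffer;
    char* b = a;              // copies the pointer, not the characters
    b[0] = 'j';               // write through b ...
    return std::strcmp(a, "jello") == 0;  // ... and a sees the change
}
```

The array also has a fixed size, so appending to it would require the programmer to allocate and copy by hand, which is the second disadvantage.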
A C++ String is simply a sequence of zero or more characters. It is not necessarily null-terminated; therefore, any value that fits into a char (even 0) can appear anywhere in the String. Strings do their own storage management; they do not share memory (this is not strictly true in the implementation, but it is true from a user's point of view), and they are automatically extensible. The implementation of Strings relies on C++'s constructors and destructors, member functions, and overloaded operators to encapsulate storage management and provide a more natural syntax for declaring, manipulating, and using sequences of characters. The syntax and semantics of operations on Strings are modeled after those for fixed-size objects. Thus, assignment is by value (the apparently required copy is avoided, or at least postponed, in the implementation), and Strings can be used as function arguments and result types. As usual in C, changes to the formal argument in the called function do not affect the actual argument in the caller.
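The value semantics described above can be illustrated with the standard library's std::string, used here only as a stand-in because it behaves similarly in these respects; it is not the String class of this tutorial. The function names shout and with_embedded_zero are hypothetical.

```cpp
#include <string>

// Takes its argument by value: changes to the formal argument s
// stay local and do not affect the caller's string.
std::string shout(std::string s) {
    s += '!';
    return s;
}

// Builds a three-character string whose middle character has the value 0,
// showing that the sequence does not rely on a terminating null.
std::string with_embedded_zero() {
    std::string s = "a";
    s += '\0';
    s += 'b';
    return s;
}
```

After `std::string copy = original;`, appending to `copy` leaves `original` untouched, which is the assignment-by-value behavior the paragraph attributes to Strings.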
There are functions and overloaded operators for writing String expressions. Also, there are versions of some of the functions in Sections 2 and 3 of the UNIX® System manual that can be called with String instead of character pointer arguments.
A programmer can declare and use pointers to Strings in the usual way (and with the usual risks of dangling pointers), but most of the performance improvement usually associated with pointer usage is already built into the datatype. For example, when a function is called with a String as an argument, a reference count is incremented, but no String copy occurs. Ordinary arrays of Strings are also available.
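One way such reference counting could work is sketched below. This is an assumption for illustration only, not the actual String implementation: the class CowString and the function owners_seen_by_callee are hypothetical, and the sketch uses std::shared_ptr to hold the count. Copies share one buffer, and the buffer is duplicated only when a shared copy is about to be modified.

```cpp
#include <memory>
#include <string>

// Hypothetical copy-on-write string (single-threaded sketch).
class CowString {
public:
    explicit CowString(const std::string& text = "")
        : data_(std::make_shared<std::string>(text)) {}

    // Appending is a write, so detach from any other owners first.
    CowString& operator+=(char c) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);  // the postponed copy
        data_->push_back(c);
        return *this;
    }

    const std::string& str() const { return *data_; }
    long owners() const { return data_.use_count(); }

private:
    std::shared_ptr<std::string> data_;  // shared buffer plus reference count
};

// Passing by value copies the handle and bumps the count;
// the characters themselves are not copied.
long owners_seen_by_callee(CowString s) { return s.owners(); }
```

Under these assumptions, calling owners_seen_by_callee with a freshly built CowString reports two owners during the call and one again afterwards, while appending to a copy leaves the original's characters unchanged.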
The following example shows how the String datatype is used. It is a function that takes a char, c, and a String, in_String, as arguments, and returns a copy of the String with all instances of c removed.
     1: String
     2: remove(char c, String in_String)
     3: {
     4:     String out_String;
     5:     char temp;
     6:     while ( in_String.getX(temp) )
     7:         if ( temp != c )
     8:             out_String += temp;
     9:     return out_String;
    10: }
In this example, line 1 defines the return type of the function, line 2 defines the name of the function and its arguments and their types, and lines 4 and 5 are automatic variable declarations. In line 6, the getX function removes the first char from in_String, assigns it to temp, and returns 1, as long as in_String is non-empty. The postfix "X" is a lexical convention identifying functions that assign to their first argument. They normally return 1, indicating success, or 0, indicating failure. When in_String is empty, the getX function returns 0, ending the while loop. In line 7, temp is compared to c, and if it is different, the += operator in line 8 adds it to the end of out_String. Thus, in line 9 out_String is the desired result, and the return statement returns it to the caller.
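For readers without the String class at hand, the same loop can be approximated with std::string. This is only a sketch under that substitution: std::string has no getX member, so the helper get_front below is a hypothetical stand-in that mimics the behavior described above, and remove_char is likewise a made-up name.

```cpp
#include <string>

// Hypothetical stand-in for getX: moves the first char of s into out and
// returns true, or returns false (leaving out untouched) when s is empty.
bool get_front(std::string& s, char& out) {
    if (s.empty()) return false;
    out = s.front();
    s.erase(0, 1);  // drop the character just read
    return true;
}

// Same shape as the example function, written against std::string.
// The argument is taken by value, so the caller's string is not consumed.
std::string remove_char(char c, std::string in) {
    std::string out;
    char temp;
    while (get_front(in, temp))  // stops once in is empty
        if (temp != c)
            out += temp;         // keep every character that is not c
    return out;
}
```

Erasing from the front of a std::string shifts the remaining characters each time, so this version is meant to mirror the structure of the example rather than to be efficient.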
The rest of this tutorial describes Strings from the point of view of a user (programmer).