The ANSI C functions declared in the <ctype.h> header file classify or convert character-coded integer values according to type and conversion information in the program's locale. All the classification functions except isdigit and isxdigit can return nonzero (true) for single-byte supplementary code set characters when the LC_CTYPE category of the current locale is other than "C". In a Spanish locale, for example, isalpha('ñ') should be true. Similarly, the case conversion functions toupper and tolower will appropriately convert any single-byte supplementary code set characters identified by the isalpha function.
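As a rough illustration of the locale dependence described above, the following sketch counts the bytes that isalpha accepts under a given LC_CTYPE setting. The helper names count_alpha and count_alpha_in are hypothetical, and restoring the "C" locale afterward (rather than the previous locale) is a simplification made for brevity.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Count the bytes in s that isalpha() accepts under the current
 * LC_CTYPE locale. The cast to unsigned char keeps bytes above 127
 * nonnegative before they are passed to isalpha(). */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}

/* Hypothetical helper: switch LC_CTYPE to the named locale, count,
 * then go back to the "C" locale (a simplification; a real program
 * would save and restore the previous setting).
 * Returns (size_t)-1 if the requested locale is not available. */
size_t count_alpha_in(const char *locale_name, const char *s)
{
    size_t n;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return (size_t)-1;
    n = count_alpha(s);
    setlocale(LC_CTYPE, "C");
    return n;
}
```

In the "C" locale the byte 0xF1 (ñ in ISO 8859-1) is not counted as alphabetic. Under a single-byte Spanish locale, whose name varies from system to system, the same byte would be expected to count.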
The point of these functions is to let you determine a character's type or case without reference to its numeric value in a given code set. Whereas a program written for a US ASCII environment might test whether a character is nonprintable with the code
    if ( c <= 037 || c == 0177 )
a codeset-independent program will use isprint:
    if ( !isprint(c) )
Similarly,
    c = toupper(c);
will do the same thing as
    if ( c >= 'a' && c <= 'z' ) c -= 'a' - 'A';
without relying on the fact that uppercase and lowercase characters are numerically contiguous in the US ASCII code set.
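The codeset-independent style can be sketched as a pair of small string routines. The function names upcase and mask_nonprintable are illustrative, not part of any library; the only library calls used are toupper and isprint.

```c
#include <ctype.h>
#include <stddef.h>

/* Uppercase a string in place with toupper(), making no assumption
 * about how letters are laid out in the code set. */
void upcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Replace every nonprintable byte with '?' and return the number of
 * bytes replaced. isprint() consults the current locale, so no
 * numeric range test is needed. */
size_t mask_nonprintable(char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (!isprint((unsigned char)*s)) {
            *s = '?';
            n++;
        }
    }
    return n;
}
```

Neither routine mentions a numeric character value, so both should continue to work when LC_CTYPE selects a code set other than US ASCII.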
The <ctype.h> functions are almost always macros that are implemented using table lookups indexed by the character argument. Changing the locale resets the table(s) to the new locale's values, so locale-sensitive behavior should carry no performance penalty. The classification functions are described on the ctype(3C) manual page, and the conversion functions on the conv(3C) page. Both single- and multibyte character classification and conversion routines are declared in the <wchar.h> header, and described on the pages wctype(3C) and wconv(3C). Note that the multibyte routines are not part of the ANSI C standard, nor are the single-byte functions isascii and toascii.
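To make the table-lookup idea concrete, here is a minimal sketch of how such macros might be built. It is not the implementation used by any particular C library: the flag names, the table c_table, the pointer current_table, and the my_ prefixed macros are all hypothetical. Unlike the real <ctype.h> macros, this sketch does not handle EOF.

```c
#include <limits.h>

/* Hypothetical classification bits, one flag per character class. */
enum { MY_UPPER = 1, MY_LOWER = 2, MY_DIGIT = 4 };

/* One table per locale; only a "C" locale table is sketched here. */
static unsigned char c_table[UCHAR_MAX + 1];

/* The table currently in effect. A locale change would repoint this
 * (or refill the table); the lookup macros stay the same. */
static const unsigned char *current_table = c_table;

/* Set a flag for every character listed in chars. */
static void mark(unsigned char *table, const char *chars, unsigned char flag)
{
    for (; *chars != '\0'; chars++)
        table[(unsigned char)*chars] |= flag;
}

/* Fill in the "C" locale table. */
void my_init_c_table(void)
{
    mark(c_table, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", MY_UPPER);
    mark(c_table, "abcdefghijklmnopqrstuvwxyz", MY_LOWER);
    mark(c_table, "0123456789", MY_DIGIT);
}

/* Each test is a single indexed load and a mask. */
#define my_isalpha(c) (current_table[(unsigned char)(c)] & (MY_UPPER | MY_LOWER))
#define my_isdigit(c) (current_table[(unsigned char)(c)] & MY_DIGIT)
```

Because each test is one array access regardless of which table is installed, the cost of a classification does not depend on the locale.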
In some C language implementations, character variables that are not explicitly declared signed or unsigned are treated as nonnegative quantities with a range typically from 0 to 255. In other implementations, they are treated as signed quantities with a range typically from -128 to 127. When a signed object of type char is converted to a wider integer, the machine is obliged to propagate the sign, which is encoded in the high-order bit of the new integer object. If the character variable holds an eight-bit character with the high-order bit set, the sign bit will be propagated the full width of an object of type int or long, producing a negative value.
You can avoid this problem (which typically occurs with the ctype functions) by declaring as unsigned any object of type char that is liable to be converted to a wider integer. In the example we showed earlier, for instance, declaring the character pointer as a pointer to unsigned char would guarantee that on any implementation the values pointed at will be nonnegative.
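The effect of sign propagation can be seen in a small sketch. The function names widen_signed and widen_unsigned are hypothetical. The code assumes 8-bit characters and a two's complement machine, where storing the value 0xF1 in a signed char yields -15.

```c
/* Store the low eight bits of byte in a signed char, then widen to
 * int. With the high-order bit set, the sign is propagated and the
 * result is negative (assuming 8-bit chars and two's complement). */
int widen_signed(int byte)
{
    signed char ch = (signed char)byte;

    return ch;
}

/* The same value held in an unsigned char widens without sign
 * propagation, so the result stays in the range 0 to 255. */
int widen_unsigned(int byte)
{
    unsigned char ch = (unsigned char)byte;

    return ch;
}
```

For a seven-bit character such as 'A' the two functions agree. They differ only when the high-order bit is set, which is exactly the case that arises with supplementary code set characters.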
A related problem arises when characters are used as indices into arrays and tables. If a table has been defined to contain only 128 possible characters, the amount of allocated memory will be exceeded if an eight-bit character whose value is greater than 127 is used as an index. Moreover, if the character is signed, the index may be negative.
The solution, at least when dealing with 8-bit code sets, is obviously to increase the size of the table from the 7-bit maximum of 128 to the 8-bit maximum of 256 and, again, to declare the object that will hold the character as type unsigned char.
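Both parts of that advice can be sketched together: a table sized for every possible unsigned char value, indexed through a pointer to unsigned char. The names TABLE_SIZE and count_bytes are illustrative only.

```c
#include <limits.h>
#include <stddef.h>

/* One slot for every value an unsigned char can hold
 * (256 where characters are eight bits wide). */
#define TABLE_SIZE ((size_t)UCHAR_MAX + 1)

/* Count how often each byte value occurs in s. The counts array must
 * have TABLE_SIZE elements. Reading the string through a pointer to
 * unsigned char guarantees that every index is nonnegative and
 * smaller than TABLE_SIZE. */
void count_bytes(const char *s, size_t counts[TABLE_SIZE])
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    for (i = 0; i < TABLE_SIZE; i++)
        counts[i] = 0;
    for (; *p != '\0'; p++)
        counts[*p]++;
}
```

Had the table held only 128 entries, or had the string been read through a plain char pointer on an implementation where char is signed, a byte such as 0xF1 would have indexed outside the array.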