Complying with standard C

Encoding variations

The encoding schemes come in two variations:

  1. Where each multibyte character is self-identifying, so that any multibyte character can simply be inserted between any pair of multibyte characters. (The encoding used by the ANSI C compiler is of this type: each byte of a non-single-byte character has the high-order bit set.)

  2. Where the presence of special "shift bytes" changes the interpretation of subsequent bytes. An example is the method that most character terminals use to enter and leave line-drawing mode. For programs written in multibyte characters with a shift-state-dependent encoding, ANSI C additionally requires that each comment, string literal, character constant, and header name both begin and end in the unshifted state.
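
The two variations can be contrasted with a short C sketch. It is illustrative only, not the compiler's actual implementation. The function names are hypothetical, and the shift bytes are an assumption: the ASCII SO (0x0E) and SI (0x0F) control codes stand in for whatever bytes a real shift-state-dependent encoding would use. In the first variation a byte can be classified on its own; in the second, the scanner must carry state from one byte to the next.

```c
#include <stdbool.h>
#include <stddef.h>

/* Variation 1: every byte of a multibyte character has the high-order
   bit set, so one byte can be classified without looking at its
   neighbors. */
bool is_multibyte_byte(unsigned char b)
{
    return (b & 0x80u) != 0;
}

/* Count the bytes in s[0..n) that belong to multibyte characters. */
size_t count_multibyte_bytes(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_multibyte_byte(s[i]))
            count++;
    }
    return count;
}

/* Variation 2 (hypothetical encoding): SHIFT_OUT enters the shifted
   state and SHIFT_IN returns to the unshifted state. The values are
   the ASCII SO and SI codes, chosen here only for illustration. */
enum { SHIFT_OUT = 0x0E, SHIFT_IN = 0x0F };

/* Scan s[0..n) starting in the unshifted state and report whether the
   sequence also ends in the unshifted state. */
bool ends_unshifted(const unsigned char *s, size_t n)
{
    bool shifted = false;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == SHIFT_OUT)
            shifted = true;
        else if (s[i] == SHIFT_IN)
            shifted = false;
    }
    return !shifted;
}
```

Under these assumptions, a check such as ends_unshifted could be applied to the bytes of a string literal or comment to test the ANSI C requirement: a sequence that shifts out and never shifts back in would be rejected.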


© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004