
Adobe CIDs and glyphs in CJK TrueType font


This article is written by Taiji Yamada <taiji@aihara.co.jp>. He takes full responsibility for the wording and content of this article.

For other information, see the Ghostscript overview.


Overview

This note describes the use of CJK (Chinese, Japanese and Korean) TrueType fonts (TTF) as CIDFontType 2 (Type 11) CID-keyed fonts, with attention to how well this works and where it falls short. To compose a CIDFontType 2 font from a CJK TTF on the fly, Ghostscript uses not only the standard CMaps (mappings from a character encoding to CIDs, called ToCID CMaps below) but also ToUnicode CMaps (mappings from CIDs to Unicode) and ToCode CMaps (mappings from CIDs to non-Unicode character encodings), which are freely distributed with Acrobat Reader by Adobe Systems Incorporated. For details of ToUnicode CMaps, refer to:

Adobe Systems Incorporated, "PDF Reference, Third Edition, Version 1.4", p. 368 http://partners.adobe.com/asn/developer/acrosdk/docs/filefmtspecs/PDFReference.zip
Adobe Systems Incorporated, "ToUnicode Mapping File Tutorial", Technical Note #5411 http://partners.adobe.com/asn/developer/pdfs/tn/5411.ToUnicode.pdf
ToUnicode and ToCode CMaps are designed for text searching in PDF, so you should not expect an Adobe ToUnicode CMap to be a one-to-one map between the Adobe glyph collection and the glyphs or characters of Unicode. Ghostscript's current algorithm for mapping CIDs to TTF glyph IDs depends entirely on the Adobe CMaps and on the TrueType cmap and GSUB tables.
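
As a rough illustration of the chain just described (CID to Unicode through a ToUnicode CMap, then Unicode to glyph ID through the TrueType cmap table, then an optional vertical substitution from GSUB), here is a minimal Python sketch. It is not Ghostscript's implementation: the function name, the dictionary representation, and all numeric values are assumptions chosen only for illustration.

```python
def cid_to_gid(cid, to_unicode, tt_cmap, vertical_subst=None):
    """Map a CID to a TrueType glyph ID; return 0 (.notdef) when no glyph is found.

    to_unicode     : dict CID -> Unicode code point (stands in for a ToUnicode CMap)
    tt_cmap        : dict Unicode code point -> glyph ID (stands in for the TTF cmap)
    vertical_subst : optional dict glyph ID -> vertical glyph ID (stands in for GSUB)
    """
    code = to_unicode.get(cid)
    if code is None:
        return 0                      # the CMap has no Unicode value for this CID
    gid = tt_cmap.get(code, 0)        # the font may lack a glyph for this code point
    if gid and vertical_subst:
        gid = vertical_subst.get(gid, gid)
    return gid


# Toy values, not taken from any real CMap or font.
TO_UNICODE = {100: 0x3001, 200: 0x3042}
TT_CMAP = {0x3001: 7, 0x3042: 9}
VERTICAL = {7: 70}

horizontal_gid = cid_to_gid(100, TO_UNICODE, TT_CMAP)
vertical_gid = cid_to_gid(100, TO_UNICODE, TT_CMAP, VERTICAL)
missing_gid = cid_to_gid(300, TO_UNICODE, TT_CMAP)
```

Because the ToUnicode CMap is not one-to-one, several CIDs may end up on the same glyph ID in such a scheme, which is one source of the "incorrect" entries reported later in this note.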

The current revision of the on-the-fly CIDFontType 2 technology supports CID-keyed fonts with the following Registry-Ordering (RO) values:

	[RO]
	Adobe-CNS1
	Adobe-GB1
	Adobe-Japan1
	Adobe-Japan2
	Adobe-Korea1

and doesn't support the following kinds of CID-keyed fonts:
	[RO]
	Adobe-CNS2
	Adobe-HongKong1
	Adobe-Korea2
	Adobe-Vietnam1

The current revision can handle TrueType fonts with the following encodings in their cmap table as CID-keyed fonts:
	[Encoding]	[RO]
	Unicode		Adobe-*
	ShiftJIS	Adobe-Japan1
	PRC		Adobe-GB1
	Big5		Adobe-CNS1
	Wansung		Adobe-Korea1
	Johab		Adobe-Korea1

and does not yet support TrueType fonts with a UCS-4 encoding. For the Unicode encoding, the RO can be detected by reading the "Code Page Character Range" field of the TTF's OS/2 table, as follows:
	[Encoding]	[Code Page]		[RO]
	Unicode		Japanese		Adobe-Japan1
			Simplified Chinese	Adobe-GB1
			Korean Wansung		Adobe-Korea1
			Traditional Chinese	Adobe-CNS1
			Korean Johab		Adobe-Korea1
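
The detection in the table above can be sketched in Python as follows. The bit positions are those the OpenType specification assigns within the first 32 bits of the Code Page Character Range field (ulCodePageRange1). The function name is hypothetical, and the choice to return the first match in table order when several bits are set is an assumption, not something this note specifies.

```python
# (bit in ulCodePageRange1, code page name, RO), in the order of the table above.
CODE_PAGE_BITS = [
    (17, "Japanese", "Adobe-Japan1"),
    (18, "Simplified Chinese", "Adobe-GB1"),
    (19, "Korean Wansung", "Adobe-Korea1"),
    (20, "Traditional Chinese", "Adobe-CNS1"),
    (21, "Korean Johab", "Adobe-Korea1"),
]


def detect_ro(code_page_range1):
    """Return the RO for the first CJK code page bit that is set, else None."""
    for bit, _name, ro in CODE_PAGE_BITS:
        if code_page_range1 & (1 << bit):
            return ro
    return None


japanese_font = detect_ro((1 << 17) | 1)   # Japanese plus Latin 1 (bit 0)
latin_only_font = detect_ro(1)             # no CJK code page bit set
```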

For each combination of RO and TTF encoding, the following Adobe CMaps are applied:
	[RO-Encoding]		[Supplement limit]
		[used CMap]		- [Comment]

	Adobe-CNS1-Big5		0
		Adobe-CNS1-ETen-B5	- ToCode CMap
		ETen-B5-V		- ToCID CMap
		ETen-B5-H		- ToCID CMap

	Adobe-CNS1-Unicode	3
		Adobe-CNS1-UCS2		- ToUnicode CMap
		UniCNS-UCS2-V		- ToCID CMap
		UniCNS-UCS2-H		- ToCID CMap

	Adobe-GB1-PRC		2
		Adobe-GB1-GBK-EUC	- ToCode CMap
		GBK-EUC-V		- ToCID CMap
		GBK-EUC-H		- ToCID CMap

	Adobe-GB1-Unicode	4
		Adobe-GB1-UCS2		- ToUnicode CMap
		UniGB-UCS2-V		- ToCID CMap
		UniGB-UCS2-H		- ToCID CMap

	Adobe-Japan1-ShiftJIS	2
		Adobe-Japan1-90ms-RKSJ	- ToCode CMap
		90ms-RKSJ-V		- ToCID CMap
		90ms-RKSJ-H		- ToCID CMap

	Adobe-Japan1-Unicode	4
		Adobe-Japan1-UCS2	- ToUnicode CMap
		UniJIS-UCS2-V		- ToCID CMap
		UniJIS-UCS2-H		- ToCID CMap

	Adobe-Japan2-Unicode	0
		UniHojo-UCS2-V		- ToCID CMap
		UniHojo-UCS2-H		- ToCID CMap

	Adobe-Korea1-Johab	1
		KSC-Johab-V		- ToCID CMap
		KSC-Johab-H		- ToCID CMap

	Adobe-Korea1-Unicode	2
		Adobe-Korea1-UCS2	- ToUnicode CMap
		UniKS-UCS2-V		- ToCID CMap
		UniKS-UCS2-H		- ToCID CMap

	Adobe-Korea1-Wansung	1
		Adobe-Korea1-KSCms-UHC	- ToCode CMap
		KSCms-UHC-V		- ToCID CMap
		KSCms-UHC-H		- ToCID CMap

where each Supplement value is the limit determined by the maximum CID in the CMaps used.
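
For readers who want the table above in machine-readable form, it can be transcribed into a lookup keyed by RO and TTF encoding. This is only a transcription sketch: the names CMAP_TABLE and select_cmaps and the returned dictionary layout are hypothetical. A value of None means the table lists no ToUnicode or ToCode CMap for that combination.

```python
# (RO, TTF encoding) -> (supplement limit, ToUnicode/ToCode CMap or None, ToCID base name)
CMAP_TABLE = {
    ("Adobe-CNS1", "Big5"): (0, "Adobe-CNS1-ETen-B5", "ETen-B5"),
    ("Adobe-CNS1", "Unicode"): (3, "Adobe-CNS1-UCS2", "UniCNS-UCS2"),
    ("Adobe-GB1", "PRC"): (2, "Adobe-GB1-GBK-EUC", "GBK-EUC"),
    ("Adobe-GB1", "Unicode"): (4, "Adobe-GB1-UCS2", "UniGB-UCS2"),
    ("Adobe-Japan1", "ShiftJIS"): (2, "Adobe-Japan1-90ms-RKSJ", "90ms-RKSJ"),
    ("Adobe-Japan1", "Unicode"): (4, "Adobe-Japan1-UCS2", "UniJIS-UCS2"),
    ("Adobe-Japan2", "Unicode"): (0, None, "UniHojo-UCS2"),
    ("Adobe-Korea1", "Johab"): (1, None, "KSC-Johab"),
    ("Adobe-Korea1", "Unicode"): (2, "Adobe-Korea1-UCS2", "UniKS-UCS2"),
    ("Adobe-Korea1", "Wansung"): (1, "Adobe-Korea1-KSCms-UHC", "KSCms-UHC"),
}


def select_cmaps(ro, encoding):
    """Return the CMaps listed for an (RO, encoding) pair; raise KeyError if unsupported."""
    limit, reverse, base = CMAP_TABLE[(ro, encoding)]
    return {
        "supplement_limit": limit,
        "reverse_cmap": reverse,      # ToUnicode or ToCode CMap, if any
        "tocid_h": base + "-H",       # horizontal ToCID CMap
        "tocid_v": base + "-V",       # vertical ToCID CMap
    }


japan1_sjis = select_cmaps("Adobe-Japan1", "ShiftJIS")
```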

From the Glyph Substitution table (GSUB) of the TTF, Single Substitution Format 2 is read to obtain the glyphs used in vertical writing for CIDs. The current revision does not handle any other GSUB format, so handling ligatures and variants in CID-keyed fonts remains a task for the future.
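
To make the GSUB step concrete, the following Python sketch decodes one Single Substitution Format 2 subtable, following the layout in the OpenType specification: a format word, an offset to a Coverage table, a glyph count, and an array of substitute glyph IDs, all big-endian 16-bit values. The function names are hypothetical, and locating the subtable through the script, feature (for example the vertical writing feature) and lookup lists is deliberately left out.

```python
import struct


def parse_coverage(data, offset):
    """Return the covered glyph IDs of a Coverage table, in coverage-index order."""
    fmt, count = struct.unpack_from(">HH", data, offset)
    if fmt == 1:                                   # explicit glyph list
        return list(struct.unpack_from(">%dH" % count, data, offset + 4))
    if fmt == 2:                                   # list of glyph ranges
        glyphs = []
        for i in range(count):
            start, end, _start_index = struct.unpack_from(
                ">HHH", data, offset + 4 + 6 * i)
            glyphs.extend(range(start, end + 1))
        return glyphs
    raise ValueError("unknown Coverage format %d" % fmt)


def parse_single_subst_format2(data, offset=0):
    """Return a dict: original glyph ID -> substitute glyph ID."""
    fmt, coverage_offset, glyph_count = struct.unpack_from(">HHH", data, offset)
    if fmt != 2:
        raise ValueError("not Single Substitution Format 2")
    substitutes = struct.unpack_from(">%dH" % glyph_count, data, offset + 6)
    covered = parse_coverage(data, offset + coverage_offset)
    return dict(zip(covered, substitutes))


# Hand-built subtable with made-up glyph IDs: 10 -> 500 and 20 -> 501.
subtable = (struct.pack(">HHH2H", 2, 10, 2, 500, 501)
            + struct.pack(">HH2H", 1, 2, 10, 20))
vertical_map = parse_single_subst_format2(subtable)
```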

Recent CID-keyed fonts define pre-rotated Latin glyphs, but the current revision merely maps them to the normal Latin glyphs; Ghostscript cannot handle pre-rotated glyphs at present.

Adobe CIDs that the current Ghostscript can fill from CJK TTF

The following tables and comments detail what works and what does not for each kind of CID-keyed font composed from widely circulated Unicode TrueType fonts at the current revision. Naturally, which glyphs are lacking depends on the TrueType fonts you use.
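
Since the results depend on the font, it can be useful to produce the same kind of report for a font of your own. The Python sketch below assumes you already have a dictionary holding only those CIDs that were successfully mapped to a glyph ID; how that dictionary is obtained is outside this sketch, and both function names are hypothetical. It lists the lacking CIDs of a range in the compact notation used in the tables below. It cannot detect glyphs that are present but incorrect.

```python
def lacking_cids(first, last, cid_to_gid):
    """Return the CIDs in first..last (inclusive) that have no glyph assigned."""
    return [cid for cid in range(first, last + 1) if cid not in cid_to_gid]


def format_cids(cids):
    """Format CIDs compactly; runs of three or more become 'start-end'."""
    runs = []
    for cid in sorted(set(cids)):
        if runs and cid == runs[-1][1] + 1:
            runs[-1][1] = cid                      # extend the current run
        else:
            runs.append([cid, cid])                # start a new run
    parts = []
    for start, end in runs:
        if end - start >= 2:
            parts.append("%d-%d" % (start, end))
        else:
            parts.extend(str(c) for c in range(start, end + 1))
    return ",".join(parts)


# Toy mapping: CIDs 0..9 exist except 3, 4, 5 and 8 (made-up glyph IDs).
toy_map = {cid: cid + 1 for cid in range(10) if cid not in (3, 4, 5, 8)}
report = format_cids(lacking_cids(0, 9, toy_map))
```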

Adobe-CNS1 CID-keyed font composed from Traditional Chinese Unicode TTF
-----------------------------------------------------------------------
[ROS]		[CID range]	[Comment]
Adobe-CNS1-0	    0-  505	96,97,124-127,228,260 are lacking
		  506-  561	no problem
		  562-  594	all glyphs are lacking
		  595-13645	no problem
		13646-13748	13646,13647 are lacking
		13749-13998	13996-13998 are lacking
		13999-14098	some glyphs are lacking
Adobe-CNS1-1	14099-17407	lots of glyphs are lacking (*1)
Adobe-CNS1-2	17408-17600	17503,17504 are lacking (*2)
Adobe-CNS1-3	17601-17605	17603 is lacking
		17606-18845	lots of glyphs are lacking (*3)
Adobe-CNS1-4	18846-18961	no glyphs can be assigned (*4)
(*1) HK GCCS
(*2) not pre-rotated
(*3) HK SCS
(*4) HK SCS (unused in the UniCNS-UCS2 CMaps, though used in UniCNS-UTF8,
UniCNS-UTF16, UniCNS-UTF32, and also ETHK-B5 and, needless to say, HKscs-B5)


Adobe-GB1 CID-keyed font composed from Simplified Chinese Unicode TTF
---------------------------------------------------------------------
[ROS]		[CID range]	[Comment]
Adobe-GB1-0	    0-  939	99,695,698,737,935,938 are lacking
		  940- 7702	no problem
		 7703- 7716	7705,7708 are incorrect
Adobe-GB1-1	 7717- 9896	no problem
Adobe-GB1-2	 9897-22126	no problem
Adobe-GB1-3	22127-22352	22347,22350,22352 are lacking (*1)
Adobe-GB1-4	22353-22427	no glyphs are available (*2)
		22428-29058	no glyphs are available (*3)
		29059-29063	no glyphs are available (*4)
(*1) not pre-rotated
(*2) additional Hiragana and Katakana, extended Bopomofo glyphs
(*3) the Unified Han Ideographs Extension A
(*4) pre-rotated glyphs


Adobe-Japan1 CID-keyed font composed from Japanese Unicode TTF
--------------------------------------------------------------
[ROS]		[CID range]	[Comment]
Adobe-Japan1-0	    0- 1124	lots of glyphs are lacking or incorrect:
				96-98,127,128,130-133,135-137,226,326,
				390,396,422,424,502,506-509,512,513,515,
				606,607,632
		 1125- 7477	no problem
		 7478- 7632	7478 is lacking and 7608,7609 are incorrect
		 7633- 8004	lots of glyphs are lacking or incorrect (*4)
		 8005- 8283	lots of glyphs are lacking or incorrect:
				8008,8053,8059-8061,8091,8102-8111,8166-8181,
				8189,8190,8227-8229,8260
Adobe-Japan1-1	 8284- 8358	lots of glyphs are lacking or incorrect:
				8295-8297,8300-8302,
				8306,8307,8321,8322,8325,8326
Adobe-Japan1-2	 8359- 8717	no problem
		 8718- 8719	8718 is lacking and 8719 is incorrect
Adobe-Japan1-3	 8720- 9353	some glyphs are lacking or incorrect (*1)
Adobe-Japan1-4	 9354- 9737	some glyphs are lacking or incorrect (*2)
		 9738-13319 	lots of glyphs are lacking or incorrect (*3)
		13320-15443	all glyphs are variants or lacking (*4)
(*1) not pre-rotated
(*2) not italic form
(*3) many ligature, pre-rotated, pre-rotated and italic form glyphs
(*4) lots of variants are assigned substitutes
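
The supplement boundaries in the Adobe-Japan1 table above also show how a Supplement limit follows from a maximum CID: the supplement a CID belongs to is the first one whose last CID is not smaller than it. A minimal Python sketch, with the boundary values copied from the table and the names being hypothetical:

```python
import bisect

# Last CID of Adobe-Japan1-0 through Adobe-Japan1-4, from the table above.
JAPAN1_LAST_CID = [8283, 8358, 8719, 9353, 15443]


def supplement_of(cid, last_cids=JAPAN1_LAST_CID):
    """Return the supplement number that first defines the given CID."""
    if cid < 0 or cid > last_cids[-1]:
        raise ValueError("CID %d is outside the collection" % cid)
    return bisect.bisect_left(last_cids, cid)


first_of_supplement_1 = supplement_of(8284)
last_of_supplement_2 = supplement_of(8719)
```

Under this reading, a CMap whose largest CID is 8719 needs nothing beyond Supplement 2.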


Adobe-Japan2 CID-keyed font composed from Japanese Unicode TTF
--------------------------------------------------------------
[ROS]		[CID range]	[Comment]
Adobe-Japan2-0	    0- 6067	no problem


Adobe-Korea1 CID-keyed font composed from Korean Unicode TTF
------------------------------------------------------------
[ROS]		[CID range]	[Comment]
Adobe-Korea1-0	    0-  357	some glyphs are lacking or incorrect:
				61,97,100,104,111,227
		  358- 3435	no problem
		 3436- 8055	no problem
		 8056- 8190	lots of glyphs are lacking or incorrect:
				8059,8061,8075,8083-8085,8089,8091,8093,8190
		 8191- 9332	no problem
Adobe-Korea1-1	 9333-18154	perhaps no problem, but cannot check (*)
Adobe-Korea1-2	18155-18351	some glyphs are lacking
(*) Technical Note on Adobe-Korea1-1,2 has not been published yet[6].

The current mapping algorithm, based on ToCID CMaps and ToUnicode CMaps, still has problems. The gs-cjk project[8] is considering how to resolve them.

References

  1. Microsoft Corporation, "OpenType specification" http://www.asia.microsoft.com/typography/otspec/
  2. Adobe Systems Incorporated, "Adobe-CNS1-4 Character Collection for CID-Keyed Fonts", Technical Note #5080 http://partners.adobe.com/asn/developer/pdfs/tn/5080.Adobe-CNS1-4.pdf
  3. Adobe Systems Incorporated, "Adobe-GB1-4 Character Collection for CID-Keyed Fonts", Technical Note #5079 http://partners.adobe.com/asn/developer/pdfs/tn/5079.Adobe-GB1-4.pdf
  4. Adobe Systems Incorporated, "Adobe-Japan1-4 Character Collection for CID-Keyed Fonts", Technical Note #5078 http://partners.adobe.com/asn/developer/pdfs/tn/5078.Adobe-Japan1-4.pdf
  5. Adobe Systems Incorporated, "Adobe-Japan2-0 Character Collection for CID-Keyed Fonts", Technical Note #5097 http://partners.adobe.com/asn/developer/pdfs/tn/5097.Adobe-Japan2-0.pdf
  6. Adobe Systems Incorporated, "Adobe-Korea1-0 Character Collection for CID-Keyed Fonts", Technical Note #5093 http://partners.adobe.com/asn/developer/pdfs/tn/5093.Adobe-Korea1-0.pdf
  7. Taiji Yamada, "Tips on PostScript" http://www.aihara.co.jp/~taiji/tops/
  8. "gs-cjk project" http://www.gyve.org/gs-cjk/

Copyright © 2001 Taiji Yamada <taiji@aihara.co.jp> and gs-cjk project.

Copyright © 2002 artofcode LLC. All rights reserved.

This file is part of GNU Ghostscript. See the GNU General Public License (the "License") for full details of the terms of using, copying, modifying, and redistributing GNU Ghostscript.

Ghostscript version 7.07, 17 May 2003