This article is written by Taiji Yamada <taiji@aihara.co.jp>. He takes full responsibility for the wording and content of this article.
For other information, see the Ghostscript overview.
This note describes how CJK (Chinese, Japanese and Korean) TrueType fonts (TTF) can be used as CIDFontType 2 (Type 11) CID-keyed fonts, with attention to what works and what does not. To compose a CIDFontType 2 font from a CJK TTF on the fly, Ghostscript uses not only the standard CMaps (mappings from a character encoding to CIDs, called ToCID CMaps below) but also ToUnicode CMaps (mappings from CIDs to Unicode) and ToCode CMaps (mappings from CIDs to non-Unicode character encodings), which are freely distributed with Acrobat Reader by Adobe Systems Incorporated. For details of ToUnicode CMaps, refer to:
Adobe Systems Incorporated, "PDF Reference, Third Edition, Version 1.4", p. 368
http://partners.adobe.com/asn/developer/acrosdk/docs/filefmtspecs/PDFReference.zip
Adobe Systems Incorporated, "ToUnicode Mapping File Tutorial", Technical Note #5411
http://partners.adobe.com/asn/developer/pdfs/tn/5411.ToUnicode.pdf
ToUnicode and ToCode CMaps are designed for text searching in PDF, so you should not expect an Adobe ToUnicode CMap to be a one-to-one map between an Adobe glyph collection and the glyphs or characters of Unicode. Ghostscript's current algorithm for mapping CIDs to TTF glyph IDs depends entirely on the Adobe CMaps and on the TrueType cmap and GSUB tables.
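The composition of a CID-to-glyph mapping from a ToUnicode CMap and a TrueType cmap table can be pictured with a short Python sketch. This is an illustration only, not Ghostscript's implementation: the function name build_cid_to_gid, the fallback to glyph 0, and the toy values are all assumptions made for the example.

```python
def build_cid_to_gid(cid_to_unicode, unicode_to_gid, missing_gid=0):
    """Compose CID -> Unicode (ToUnicode CMap) with
    Unicode -> glyph ID (TrueType cmap table)."""
    cid_to_gid = {}
    for cid, code_point in cid_to_unicode.items():
        # Assumption of this sketch: a code point that the font's cmap
        # does not cover falls back to glyph 0, i.e. the glyph is lacking.
        cid_to_gid[cid] = unicode_to_gid.get(code_point, missing_gid)
    return cid_to_gid


# Toy data invented for illustration; not real Adobe or font values.
toy_to_unicode = {1: 0x0020, 2: 0x0021, 3: 0x3042}
toy_ttf_cmap = {0x0020: 3, 0x3042: 120}

toy_result = build_cid_to_gid(toy_to_unicode, toy_ttf_cmap)
print(toy_result)
```

Because the ToUnicode CMap is not one-to-one, several CIDs may resolve to the same glyph ID in such a composition, and CIDs with no usable code point end up without a glyph.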
The current revision of the on-the-fly CIDFontType 2 technology
supports CID-keyed fonts of the following Registry-Ordering (RO)
kinds:
[RO]
Adobe-CNS1
Adobe-GB1
Adobe-Japan1
Adobe-Japan2
Adobe-Korea1
and doesn't support the following kinds of CID-keyed fonts:
[RO]
Adobe-CNS2
Adobe-HongKong1
Adobe-Korea2
Adobe-Vietnam1
The current revision can handle TrueType fonts with the following
encodings (as recorded in the cmap table) as CID-keyed fonts:
[Encoding]  [RO]
Unicode     Adobe-*
ShiftJIS    Adobe-Japan1
PRC         Adobe-GB1
Big5        Adobe-CNS1
Wansung     Adobe-Korea1
Johab       Adobe-Korea1
and does not yet support TrueType fonts with UCS-4 encoding. In the
case of Unicode encoding, the RO can be detected by reading the "Code
Page Character Range" field of the OS/2 table of the TTF, as follows:
[Encoding]  [Code Page]          [RO]
Unicode     Japanese             Adobe-Japan1
            Simplified Chinese   Adobe-GB1
            Korean Wansung       Adobe-Korea1
            Traditional Chinese  Adobe-CNS1
            Korean Johab         Adobe-Korea1
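The two tables above can be read as a lookup procedure. The following is a minimal Python sketch of that reading, not Ghostscript's actual code. The Windows-platform cmap encoding IDs and the bit numbers of ulCodePageRange1 in the OS/2 table come from the TrueType/OpenType specification rather than from this note; the function name detect_ro and the choice of the first matching bit when several are set are assumptions made for illustration.

```python
# Windows-platform (platform ID 3) cmap encoding IDs, per the
# TrueType/OpenType specification.
WINDOWS_ENCODING_NAMES = {
    1: "Unicode",
    2: "ShiftJIS",
    3: "PRC",
    4: "Big5",
    5: "Wansung",
    6: "Johab",
}

# RO implied directly by a non-Unicode cmap encoding.
ENCODING_TO_RO = {
    "ShiftJIS": "Adobe-Japan1",
    "PRC": "Adobe-GB1",
    "Big5": "Adobe-CNS1",
    "Wansung": "Adobe-Korea1",
    "Johab": "Adobe-Korea1",
}

# Bits of ulCodePageRange1 (OS/2 table), listed in the order of the
# Code Page table above.
CODE_PAGE_BITS = [
    (17, "Adobe-Japan1"),  # Japanese
    (18, "Adobe-GB1"),     # Simplified Chinese
    (19, "Adobe-Korea1"),  # Korean Wansung
    (20, "Adobe-CNS1"),    # Traditional Chinese
    (21, "Adobe-Korea1"),  # Korean Johab
]


def detect_ro(encoding_id, code_page_range1=0):
    """Return the RO for a font, or None when it cannot be determined."""
    name = WINDOWS_ENCODING_NAMES.get(encoding_id)
    if name is None:
        return None  # for example encoding ID 10 (UCS-4), unsupported
    if name != "Unicode":
        return ENCODING_TO_RO[name]
    # Unicode encoding: consult the code page bits. Taking the first
    # match is an assumption of this sketch.
    for bit, ro in CODE_PAGE_BITS:
        if code_page_range1 & (1 << bit):
            return ro
    return None


print(detect_ro(2))             # ShiftJIS-encoded font
print(detect_ro(1, 1 << 18))    # Unicode font flagged Simplified Chinese
```

A Unicode font that sets none of these bits yields None here; how such a font should be treated is outside the scope of the sketch.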
For each combination of RO and TTF encoding, the following Adobe CMaps
are applied:
[RO-Encoding]               [Supplement limit]
    [used CMap]             - [Comment]
Adobe-CNS1-Big5             0
    Adobe-CNS1-ETen-B5      - ToCode CMap
    ETen-B5-V               - ToCID CMap
    ETen-B5-H               - ToCID CMap
Adobe-CNS1-Unicode          3
    Adobe-CNS1-UCS2         - ToUnicode CMap
    UniCNS-UCS2-V           - ToCID CMap
    UniCNS-UCS2-H           - ToCID CMap
Adobe-GB1-PRC               2
    Adobe-GB1-GBK-EUC       - ToCode CMap
    GBK-EUC-V               - ToCID CMap
    GBK-EUC-H               - ToCID CMap
Adobe-GB1-Unicode           4
    Adobe-GB1-UCS2          - ToUnicode CMap
    UniGB-UCS2-V            - ToCID CMap
    UniGB-UCS2-H            - ToCID CMap
Adobe-Japan1-ShiftJIS       2
    Adobe-Japan1-90ms-RKSJ  - ToCode CMap
    90ms-RKSJ-V             - ToCID CMap
    90ms-RKSJ-H             - ToCID CMap
Adobe-Japan1-Unicode        4
    Adobe-Japan1-UCS2       - ToUnicode CMap
    UniJIS-UCS2-V           - ToCID CMap
    UniJIS-UCS2-H           - ToCID CMap
Adobe-Japan2-Unicode        0
    UniHojo-UCS2-V          - ToCID CMap
    UniHojo-UCS2-H          - ToCID CMap
Adobe-Korea1-Johab          1
    KSC-Johab-V             - ToCID CMap
    KSC-Johab-H             - ToCID CMap
Adobe-Korea1-Unicode        2
    Adobe-Korea1-UCS2       - ToUnicode CMap
    UniKS-UCS2-V            - ToCID CMap
    UniKS-UCS2-H            - ToCID CMap
Adobe-Korea1-Wansung        1
    Adobe-Korea1-KSCms-UHC  - ToCode CMap
    KSCms-UHC-V             - ToCID CMap
    KSCms-UHC-H             - ToCID CMap
where each Supplement limit is the highest Supplement covered, as
determined by the maximum CID that appears in the CMaps used.
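To make the notion of a Supplement limit concrete, here is a small Python sketch that derives a Supplement number from a CID. The boundary CIDs are the last CID of each Adobe-Japan1 Supplement as listed in the Adobe-Japan1 table later in this note; the function name supplement_of is hypothetical, and the sketch is not how Ghostscript computes the limits.

```python
from bisect import bisect_left

# Last CID of Adobe-Japan1 Supplements 0 through 4, taken from the
# Adobe-Japan1 table later in this note.
JAPAN1_LAST_CID = [8283, 8358, 8719, 9353, 15443]


def supplement_of(cid, last_cids=JAPAN1_LAST_CID):
    """Return the Supplement whose CID range contains cid."""
    if cid < 0 or cid > last_cids[-1]:
        raise ValueError("CID outside the known Supplements")
    # The index of the first boundary >= cid is the Supplement number.
    return bisect_left(last_cids, cid)


# The Supplement limit of a set of CMaps is then the Supplement of the
# largest CID they contain. 8700 is a made-up maximum CID.
print(supplement_of(8700))
```

Under this reading, a ToCID CMap whose largest CID lies between 8359 and 8719 gives a Supplement limit of 2 for Adobe-Japan1.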
For the CIDs of glyphs used in vertical writing, the Glyph Substitution table (GSUB) of the TTF is read, specifically Single Substitution Format 2. The current revision does not handle any other GSUB format, so handling ligatures and variants in CID-keyed fonts remains a task for the future.
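For readers unfamiliar with the subtable in question, the following Python sketch decodes a Single Substitution Format 2 subtable into a mapping from horizontal glyph IDs to their substitutes. The byte layout follows the OpenType GSUB specification, not this note. Locating the subtable through the script, feature and lookup lists is omitted, the function names are hypothetical, and the sample bytes are invented for the example. It is not Ghostscript's code.

```python
import struct


def parse_coverage(data, offset):
    """Return the covered glyph IDs in coverage-index order."""
    fmt, count = struct.unpack_from(">HH", data, offset)
    if fmt == 1:
        # Format 1: a plain array of glyph IDs.
        return list(struct.unpack_from(">%dH" % count, data, offset + 4))
    if fmt == 2:
        # Format 2: range records (start glyph, end glyph, start index).
        glyphs = []
        for i in range(count):
            start, end, _start_index = struct.unpack_from(
                ">HHH", data, offset + 4 + 6 * i)
            glyphs.extend(range(start, end + 1))
        return glyphs
    raise ValueError("unknown Coverage format %d" % fmt)


def parse_single_subst_format2(data, offset=0):
    """Return {input glyph ID: substitute glyph ID} for one subtable."""
    subst_format, coverage_offset, glyph_count = struct.unpack_from(
        ">HHH", data, offset)
    if subst_format != 2:
        raise ValueError("not a Single Substitution Format 2 subtable")
    substitutes = struct.unpack_from(
        ">%dH" % glyph_count, data, offset + 6)
    # The coverage offset is relative to the start of the subtable.
    covered = parse_coverage(data, offset + coverage_offset)
    return dict(zip(covered, substitutes))


# Invented sample: glyphs 10 and 11 are replaced by glyphs 500 and 501.
sample = (struct.pack(">HHHHH", 2, 10, 2, 500, 501)
          + struct.pack(">HHHH", 1, 2, 10, 11))
vertical_map = parse_single_subst_format2(sample)
print(vertical_map)
```

With such a mapping in hand, a glyph ID obtained through the horizontal path can be swapped for its vertical form when a vertical (-V) CMap is in use.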
Recent CID-keyed fonts define pre-rotated Latin glyphs, but Ghostscript cannot handle them at present: the current revision merely maps them to the normal Latin glyphs.
The following tables and comments detail the validity and limitations,
at the current revision, of each kind of CID-keyed font composed from
commonly available Unicode TrueType fonts. Naturally, which glyphs are
lacking depends on the TrueType fonts you use.
Adobe-CNS1 CID-keyed font composed from Traditional Chinese Unicode TTF
-----------------------------------------------------------------------
[ROS]           [CID range]   [Comment]
Adobe-CNS1-0        0-  505   96,97,124-127,228,260 are lacking
                  506-  561   no problem
                  562-  594   all glyphs are lacking
                  595-13645   no problem
                13646-13748   13646,13647 are lacking
                13749-13998   13996-13998 are lacking
                13999-14098   some glyphs are lacking
Adobe-CNS1-1    14099-17407   lots of glyphs are lacking (*1)
Adobe-CNS1-2    17408-17600   17503,17504 are lacking (*2)
Adobe-CNS1-3    17601-17605   17603 is lacking
                17606-18845   lots of glyphs are lacking (*3)
Adobe-CNS1-4    18846-18961   no glyph can be assigned (*4)
(*1) HK GCCS
(*2) not pre-rotated
(*3) HK SCS
(*4) HK SCS (unused in the UniCNS-UCS2 CMaps, though used in UniCNS-UTF8,
     UniCNS-UTF16, UniCNS-UTF32, also ETHK-B5 and, needless to say, HKscs-B5)
Adobe-GB1 CID-keyed font composed from Simplified Chinese Unicode TTF
---------------------------------------------------------------------
[ROS]           [CID range]   [Comment]
Adobe-GB1-0         0-  939   99,695,698,737,935,938 are lacking
                  940- 7702   no problem
                 7703- 7716   7705,7708 are incorrect
Adobe-GB1-1      7717- 9896   no problem
Adobe-GB1-2      9897-22126   no problem
Adobe-GB1-3     22127-22352   22347,22350,22352 are lacking (*1)
Adobe-GB1-4     22353-22427   no glyphs are available (*2)
                22428-29058   no glyphs are available (*3)
                29059-29063   no glyphs are available (*4)
(*1) not pre-rotated
(*2) additional Hiragana and Katakana, extended Bopomofo glyphs
(*3) the Unified Han Ideographs Extension A
(*4) pre-rotated glyphs
Adobe-Japan1 CID-keyed font composed from Japanese Unicode TTF
--------------------------------------------------------------
[ROS]           [CID range]   [Comment]
Adobe-Japan1-0      0- 1124   lots of glyphs are lacking or incorrect:
                                96-98,127,128,130-133,135-137,226,326,
                                390,396,422,424,502,506-509,512,513,515,
                                606,607,632
                 1125- 7477   no problem
                 7478- 7632   7478 is lacking and 7608,7609 are incorrect
                 7633- 8004   lots of glyphs are lacking or incorrect (*4)
                 8005- 8283   lots of glyphs are lacking or incorrect:
                                8008,8053,8059-8061,8091,8102-8111,8166-8181,
                                8189,8190,8227-8229,8260
Adobe-Japan1-1   8284- 8358   lots of glyphs are lacking or incorrect:
                                8295-8297,8300-8302,
                                8306,8307,8321,8322,8325,8326
Adobe-Japan1-2   8359- 8717   no problem
                 8718- 8719   8718 is lacking and 8719 is incorrect
Adobe-Japan1-3   8720- 9353   some glyphs are lacking or incorrect (*1)
Adobe-Japan1-4   9354- 9737   some glyphs are lacking or incorrect (*2)
                 9738-13319   lots of glyphs are lacking or incorrect (*3)
                13320-15443   all glyphs are variants or lacking (*4)
(*1) not pre-rotated
(*2) not italic form
(*3) many ligature, pre-rotated, pre-rotated and italic form glyphs
(*4) lots of variants are assigned substitutes
Adobe-Japan2 CID-keyed font composed from Japanese Unicode TTF
--------------------------------------------------------------
[ROS]           [CID range]   [Comment]
Adobe-Japan2-0      0- 6067   no problem
Adobe-Korea1 CID-keyed font composed from Korean Unicode TTF
------------------------------------------------------------
[ROS]           [CID range]   [Comment]
Adobe-Korea1-0      0-  357   some glyphs are lacking or incorrect:
                                61,97,100,104,111,227
                  358- 3435   no problem
                 3436- 8055   no problem
                 8056- 8190   lots of glyphs are lacking or incorrect:
                                8059,8061,8075,8083-8085,8089,8091,8093,8190
                 8191- 9332   no problem
Adobe-Korea1-1   9333-18154   perhaps no problem, but cannot check (*)
Adobe-Korea1-2  18155-18351   some glyphs are lacking
(*) The Technical Note on Adobe-Korea1-1,2 has not been published yet[6].
The current mapping algorithm, based on ToCID CMaps and ToUnicode CMaps, still has problems. The gs-cjk project[7] is considering how to resolve them.
Copyright © 2001 Taiji Yamada <taiji@aihara.co.jp> and gs-cjk project.
Copyright © 2002 artofcode LLC. All rights reserved.
This file is part of GNU Ghostscript. See the GNU General Public License (the "License") for full details of the terms of using, copying, modifying, and redistributing GNU Ghostscript.
Ghostscript version 7.07, 17 May 2003