##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  CHINTRAD.TXT
#
#   Contents:       Map (external version) from Mac OS Chinese
#                   Traditional encoding to Unicode 2.1
#
#   Copyright:      (c) 1996-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b2>, ufrm<b3>, and Text
#                           Encoding Converter version 1.5.
#       n07  1998-Feb-05    Just rewrite initial header comments and
#                           reorder so all one-byte characters are
#                           first; no mapping changes. Matches internal
#                           utom<n7>, ufrm<n8> and Text Encoding
#                           Converter version 1.3.
#       n03  1996-Aug-22    Matches internal ufrm<n1>.
#       n00  1996-Jul-31
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Chinese Traditional code (in hex as 0xNN
#       or 0xNNNN)
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN or 0xNNNN+0xNNNN). Sequences of up to 2
#       Unicode characters are used here.
#     Column #3 is a comment containing the Unicode name.
#       In some cases an additional comment follows the Unicode name.
#
#   The entries are in Mac OS Chinese Traditional code order.
#   All one-byte characters are at the beginning of the first section.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Chinese Traditional encoding uses the standard control
#   characters at 0x00-0x1F and 0x7F.
#
# Notes on Mac OS Chinese Traditional:
# ------------------------------------
#
#   This table covers the Mac OS Chinese Traditional encoding used in
#   Mac OS versions 7.1 and later, including the Chinese Language Kit.
#   The Mac OS Chinese Traditional encoding is based on Big 5, but it
#   changes the high-byte range and adds a few one-byte characters.
#
#   For Mac OS Chinese Traditional, two-byte characters have
#   first/lead/high byte in the range 0xA1-0xFC, and second/trail/low
#   byte in the range 0x40-0x7E or 0xA1-0xFE.
#
#   1. Standard Big 5
#
#   Some of the information below comes from Ken Lunde's document
#   "CJK.INF Version 2.1", available at
#   <ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf>.
#
#   Big 5 is not a formal standard, but rather a de facto industry
#   standard in Taiwan, with a few variants. It includes:
#
#     a) One-byte ASCII characters
#
#     b) Two-byte characters
#
#     In standard Big 5, two-byte characters have first/lead/high
#     byte in the range 0xA1-0xFE, and second/trail/low byte in the
#     range 0x40-0x7E or 0xA1-0xFE.
#     These include:
#
#     - 0xA140-0xA3BF, various punctuation, symbol, number, separator,
#       and letter characters (plus a few ideographs mixed in).
#     - 0xA440-0xC67E, "ideographic" characters (5401 level 1 Hanzi)
#     - 0xC940-0xF9D5, "ideographic" characters (7652 level 2 Hanzi)
#
#   Some versions of Big 5 include a basic extension set:
#
#     - 0xC6A1-0xC7FC, Hiragana, Katakana & Cyrillic letters, and
#       circled & parenthesized digits
#
#   The ETEN version of Big 5 (perhaps the most widely used)
#   includes a different extension set:
#
#     - 0xC6A1-0xC8D3?, Hiragana, Katakana & Cyrillic letters,
#       circled & parenthesized digits, lowercase Roman numerals,
#       classic radicals, fractions, and various symbols
#     - 0xF9D6-0xF9FE?, more box drawing elements, more Hanzi
#
#   2. Mac OS Chinese Traditional changes and additions
#
#   The Apple implementation does not include either of the extension
#   sets described above. In addition, it shortens the high-byte range
#   so the first/lead/high bytes of two-byte characters are limited to
#   the range 0xA1-0xFC. Finally, it adds the following one-byte
#   characters:
#
#     0x80  REVERSE SOLIDUS, alternate
#     0x81  height-metric character (see below)
#     0x82  width-metric character (see below)
#     0xA0  NO-BREAK SPACE
#     0xFD  COPYRIGHT SIGN
#     0xFE  TRADE MARK SIGN
#     0xFF  HORIZONTAL ELLIPSIS
#
#   The two characters at 0x81 and 0x82 are somewhat special. These
#   are one-byte characters whose glyphs have the same metrics as the
#   glyphs for the two-byte characters. This way application developers
#   can use QuickDraw functions such as CharWidth to determine the
#   metrics of the two-byte character glyphs in a particular font.
#     0x81  a character whose glyph has the height of a two-byte
#           character glyph.
#     0x82  a character whose glyph has the advance width of a
#           two-byte character glyph. Note: For old-style (FBIT/FDEF)
#           bitmap fonts, the width of this glyph is *half* the width
#           of the two-byte character glyphs.
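#   The byte ranges above are enough to split a Mac OS Chinese
#   Traditional byte stream into the character codes used in column #1.
#   The Python sketch below is a minimal, hypothetical illustration:
#   split_codes, is_lead_byte, is_trail_byte and APPLE_ONE_BYTE are names
#   invented here, not part of any Apple API. It assumes only the ranges
#   stated in this header and checks structure only; it does not verify
#   that a given two-byte code is actually assigned in the table.

```python
# Hypothetical sketch: segment Mac OS Chinese Traditional bytes into
# character codes (0xNN or 0xNNNN), using only the ranges in this header.
# Structural check only: assignment of two-byte codes is NOT verified.

# One-byte characters Apple adds on top of ASCII/controls (0x00-0x7F).
APPLE_ONE_BYTE = {0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE, 0xFF}


def is_lead_byte(b: int) -> bool:
    """Lead byte of a two-byte character: 0xA1-0xFC (Big 5 uses 0xA1-0xFE)."""
    return 0xA1 <= b <= 0xFC


def is_trail_byte(b: int) -> bool:
    """Trail byte of a two-byte character: 0x40-0x7E or 0xA1-0xFE."""
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def split_codes(data: bytes) -> list:
    """Return the character codes in data, in the form used by column #1."""
    codes = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F or b in APPLE_ONE_BYTE:
            # ASCII, a control character, or one of Apple's added one-byte characters.
            codes.append(b)
            i += 1
        elif is_lead_byte(b):
            # A lead byte must be followed by a valid trail byte.
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"bad or missing trail byte after 0x{b:02X} at offset {i}")
            codes.append((b << 8) | data[i + 1])
            i += 2
        else:
            # 0x83-0x9F are neither one-byte characters nor lead bytes here.
            raise ValueError(f"unassigned byte 0x{b:02X} at offset {i}")
    return codes
```

#   For example, split_codes(b"A\xa4\x40\xfd") yields [0x41, 0xA440, 0xFD].
#   Because the lead-byte range stops at 0xFC, a byte such as 0xFD is read
#   as a one-byte character (COPYRIGHT SIGN) rather than as the start of a
#   Big 5 two-byte code.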
#
# Unicode mapping issues and notes:
# ---------------------------------
#
#   1. Problems with UTC mappings
#
#   The Unicode mappings for the Big 5 characters are based on the
#   Big-5 mapping table provided by the Unicode Consortium (UTC),
#   dated 11 February 1994, which was created by Glenn Adams and
#   John Jenkins. That table is Copyright 1991-1994 by Unicode, Inc.
#
#   However, in that table Glenn Adams and John Jenkins note that it is
#   "currently impossible to provide round-trip compatibility between
#   BIG5 and Unicode." Not all of the characters in Big 5 correspond to
#   distinct, single Unicode characters.
#
#   The UTC table does not provide any mappings for several Big 5
#   characters, because of conflicts with the mappings for other
#   characters. As listed in the comments with the UTC table, these are:
#
#     Big-5 code  Description                    UTC table comments
#
#     0xA15A      SPACING UNDERSCORE             duplicates A1C4
#     0xA1C3      SPACING HEAVY OVERSCORE        not in Unicode
#     0xA1C5      SPACING HEAVY UNDERSCORE       not in Unicode
#     0xA1FE      LT DIAG UP RIGHT TO LOW LEFT   duplicates A2AC
#     0xA240      LT DIAG UP LEFT TO LOW RIGHT   duplicates A2AD
#     0xA2CC      HANGZHOU NUMERAL TEN           conflicts with A451 mapping
#     0xA2CE      HANGZHOU NUMERAL THIRTY        conflicts with A4CA mapping
#
#   In addition, the UTC table maps the following characters to Unicodes
#   which are not completely correct, in order to avoid conflicts with
#   other mappings:
#
#     0xA14D-0xA154  alternate punctuation forms for horizontal text or for
#                    PRC-style vertical text (different period position than
#                    in Taiwan); UTC table maps these to small forms
#     0xA17D-0xA1A4  alternate (centered) forms for paired punctuation; UTC
#                    table maps these to small forms
#     0xA1CB         bolder version of 0xA1CA, WAVY OVERLINE; UTC table maps
#                    this to DOUBLE WAVY OVERLINE
#     0xA279         duplicate of 0xA278, BOX DRAWINGS LIGHT VERTICAL; UTC
#                    table maps this to RIGHT ONE EIGHTH BLOCK, even though
#                    it is centered in the cell (not on the right).
#
#   2. Use of private use characters in Apple mappings
#
#   The Apple mappings address the above problems in a different way. The
#   goals in the Apple mappings provided here are:
#   - Ensure roundtrip mapping from every character in the Mac OS Chinese
#     Traditional encoding to Unicode and back
#   - Use standard Unicode characters as much as possible, to maximize
#     interchangeability of the resulting Unicode text. Whenever possible,
#     avoid having content carried by private-use characters.
#
#   To satisfy both goals, we use private use characters to mark variants
#   that are similar to a sequence of one or more standard Unicode characters.
#
#   Apple has defined a block of 32 corporate characters as "transcoding
#   hints." These are used in combination with standard Unicode characters
#...