##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  CHINTRAD.TXT
#
#   Contents:       Map (external version) from Mac OS Chinese
#                   Traditional encoding to Unicode 2.1
#
#   Copyright:      (c) 1996-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b2>, ufrm<b3>, and Text
#                           Encoding Converter version 1.5.
#       n07  1998-Feb-05    Just rewrite initial header comments and
#                           reorder so all one-byte characters are
#                           first; no mapping changes. Matches internal
#                           utom<n7>, ufrm<n8> and Text Encoding
#                           Converter version 1.3.
#       n03  1996-Aug-22    Matches internal ufrm<n1>.
#       n00  1996-Jul-31
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages 
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Chinese Traditional code (in hex as 0xNN
#       or 0xNNNN)
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN or 0xNNNN+0xNNNN). Sequences of up to 2
#       Unicode characters are used here.
#     Column #3 is a comment containing the Unicode name.
#       In some cases an additional comment follows the Unicode name.
#
#   The entries are in Mac OS Chinese Traditional code order.
#   All one-byte characters are at the beginning of the first section.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Chinese Traditional encoding uses the standard control
#   characters at 0x00-0x1F and 0x7F.
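#
#   The format above can be read with a few lines of code. The Python
#   below is only a minimal sketch, not part of the Apple tables: the
#   function name parse_mapping_line is hypothetical, and it assumes
#   that every data line has at least the first two columns.

```python
def parse_mapping_line(line):
    """Parse one table line into (mac_code, unicode_seq, name).

    Returns None for blank lines and for lines that are entirely a comment.
    """
    line = line.rstrip("\r\n")
    # Lines that start with '#' are header or note comments, not data.
    if not line.strip() or line.lstrip().startswith("#"):
        return None
    # Column 3 is itself a '#' comment, so split on tabs before
    # looking at comment markers.
    fields = line.split("\t")
    if len(fields) < 2:
        return None
    # Column 1: Mac OS code as 0xNN or 0xNNNN.
    mac_code = int(fields[0], 16)
    # Column 2: one Unicode value, or a sequence joined with '+'.
    unicode_seq = tuple(int(part, 16) for part in fields[1].split("+"))
    # Column 3: the Unicode name, with the leading '#' removed.
    name = fields[2].lstrip("#").strip() if len(fields) > 2 else ""
    return mac_code, unicode_seq, name
```

#   For example, a line holding 0xA0, 0x00A0, and "# NO-BREAK SPACE"
#   in its three columns parses to (0xA0, (0x00A0,), "NO-BREAK SPACE").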
#
# Notes on Mac OS Chinese Traditional:
# ------------------------------------
#
#   This table covers the Mac OS Chinese Traditional encoding used in
#   Mac OS versions 7.1 and later, including the Chinese Language Kit.
#   The Mac OS Chinese Traditional encoding is based on Big 5, but it
#   changes the high-byte range and adds a few one-byte characters.
#
#   For Mac OS Chinese Traditional, two-byte characters have
#   first/lead/high byte in the range 0xA1-0xFC, and second/trail/low
#   byte in the range 0x40-0x7E or 0xA1-0xFE.
#
# 1. Standard Big 5
#
#    Some of the information below comes from Ken Lunde's document
#    "CJK.INF Version 2.1", available at
#    <ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf>.
#
#    Big 5 is not a formal standard, but rather a de facto industry
#    standard in Taiwan, with a few variants. It includes:
#
#    a) One-byte ASCII characters
#
#    b) Two-byte characters
#
#       In standard Big 5, two-byte characters have first/lead/high
#       byte in the range 0xA1-0xFE, and second/trail/low byte in the
#       range 0x40-0x7E or 0xA1-0xFE. These include:
#
#       - 0xA140-0xA3BF, various punctuation, symbol, number, separator,
#         and letter characters (plus a few ideographs mixed in).
#       - 0xA440-0xC67E, "ideographic" characters (5401 level 1 Hanzi)
#       - 0xC940-0xF9D5, "ideographic" characters (7652 level 2 Hanzi)
#
#       Some versions of Big 5 include a basic extension set:
#
#       - 0xC6A1-0xC7FC, Hiragana, Katakana & Cyrillic letters, and
#         circled & parenthesized digits
#
#       The ETEN version of Big 5 (perhaps the most widely used)
#       includes a different extension set:
#
#       - 0xC6A1-0xC8D3?, Hiragana, Katakana & Cyrillic letters,
#         circled & parenthesized digits, lowercase Roman numerals,
#         classic radicals, fractions, and various symbols
#       - 0xF9D6-0xF9FE?, more box drawing elements, more Hanzi 
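#
#    The standard ranges listed above can be checked mechanically. The
#    Python below is a sketch under stated assumptions: the names
#    BIG5_REGIONS and big5_region are hypothetical, and because the
#    extension-set boundaries are marked uncertain ("?") above, any
#    well-formed code outside the three standard ranges is only
#    reported as a possible extension rather than identified.

```python
# Standard Big 5 two-byte ranges, taken from the list above.
BIG5_REGIONS = [
    (0xA140, 0xA3BF, "punctuation and symbols"),
    (0xA440, 0xC67E, "level 1 Hanzi"),
    (0xC940, 0xF9D5, "level 2 Hanzi"),
]


def is_big5_trail(byte):
    """True if byte is a valid second/trail/low byte."""
    return 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE


def big5_region(code):
    """Classify a two-byte standard Big 5 code.

    Returns a region label, "possible extension" for well-formed codes
    outside the standard ranges, or None if the byte pair is malformed.
    """
    lead, trail = code >> 8, code & 0xFF
    if not (0xA1 <= lead <= 0xFE and is_big5_trail(trail)):
        return None
    for low, high, label in BIG5_REGIONS:
        if low <= code <= high:
            return label
    return "possible extension"
```

#    With these definitions, big5_region(0xA440) reports level 1 Hanzi,
#    while big5_region(0xC6A1) falls outside the standard ranges and is
#    reported as a possible extension.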
#
# 2. Mac OS Chinese Traditional changes and additions
#
#    The Apple implementation does not include either of the extension
#    sets described above. In addition, it shortens the high-byte range
#    so the first/lead/high bytes of two-byte characters are limited to
#    the range 0xA1-0xFC. Finally, it adds the following one-byte
#    characters:
#
#      0x80  REVERSE SOLIDUS, alternate
#      0x81  height-metric character (see below)
#      0x82  width-metric character (see below)
#      0xA0  NO-BREAK SPACE
#      0xFD  COPYRIGHT SIGN
#      0xFE  TRADE MARK SIGN
#      0xFF  HORIZONTAL ELLIPSIS
#
#    The two characters at 0x81 and 0x82 are somewhat special. These
#    are one-byte characters whose glyphs have the same metrics as the
#    glyphs for the two-byte characters. This way application developers
#    can use QuickDraw functions such as CharWidth to determine the
#    metrics of the two-byte character glyphs in a particular font.
#      0x81  a character whose glyph has the height of a two-byte
#            character glyph.
#      0x82  a character whose glyph has the advance width of a two-
#            byte character glyph. Note: For old-style (FBIT/FDEF)
#            bitmap fonts, the width of this glyph is *half* the width
#            of the two-byte character glyphs.
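#
#    The byte ranges described above are enough to split Mac OS Chinese
#    Traditional text into character codes. The Python below is a
#    sketch only: the function name split_mac_chinese_trad is
#    hypothetical, and treating bytes 0x83-0x9F as errors is an
#    assumption based on their absence from the lists above.

```python
def split_mac_chinese_trad(data):
    """Split bytes into a list of one-byte and two-byte character codes."""
    codes = []
    i = 0
    while i < len(data):
        byte = data[i]
        if 0xA1 <= byte <= 0xFC:
            # Lead byte of a two-byte character: a valid trail byte must follow.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character at offset %d" % i)
            trail = data[i + 1]
            if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
                raise ValueError("invalid trail byte at offset %d" % (i + 1))
            codes.append((byte << 8) | trail)
            i += 2
        elif byte <= 0x82 or byte == 0xA0 or byte >= 0xFD:
            # ASCII and control characters, plus the Apple one-byte additions
            # 0x80-0x82, 0xA0, and 0xFD-0xFF.
            codes.append(byte)
            i += 1
        else:
            # 0x83-0x9F: not listed above; assumed to be undefined.
            raise ValueError("undefined byte 0x%02X at offset %d" % (byte, i))
    return codes
```

#    For example, the bytes 0x41 0xA4 0x40 0xFD split into the codes
#    0x41, 0xA440, and 0xFD (the one-byte COPYRIGHT SIGN).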
#
# Unicode mapping issues and notes:
# ---------------------------------
#
# 1. Problems with UTC mappings
#
#    The Unicode mappings for the Big 5 characters are based on the
#    Big-5 mapping table provided by the Unicode Consortium (UTC),
#    dated 11 February 1994, which was created by Glenn Adams and
#    John Jenkins. That table is Copyright 1991-1994 by Unicode, Inc.
#
#    However, in that table Glenn Adams and John Jenkins note that it is
#    "currently impossible to provide round-trip compatibility between
#    BIG5 and Unicode." Not all of the characters in Big 5 correspond to
#    distinct, single Unicode characters.
#
#    The UTC table does not provide any mappings for several Big 5
#    characters, because of conflicts with the mappings for other
#    characters. As listed in the comments with the UTC table, these are:
#
#       Big-5 code  Description                    UTC table comments
#
#       0xA15A      SPACING UNDERSCORE             duplicates A1C4
#       0xA1C3      SPACING HEAVY OVERSCORE        not in Unicode
#       0xA1C5      SPACING HEAVY UNDERSCORE       not in Unicode
#       0xA1FE      LT DIAG UP RIGHT TO LOW LEFT   duplicates A2AC
#       0xA240      LT DIAG UP LEFT TO LOW RIGHT   duplicates A2AD
#       0xA2CC      HANGZHOU NUMERAL TEN           conflicts with A451 mapping
#       0xA2CE      HANGZHOU NUMERAL THIRTY        conflicts with A4CA mapping
#
#    In addition, to avoid conflicts with other mappings, the UTC table
#    maps the following characters to Unicode characters that are not
#    completely correct:
#
#       0xA14D-0xA154  alternate punctuation forms for horizontal text or for
#                      PRC-style vertical text (different period position than
#                      in Taiwan); UTC table maps these to small forms
#       0xA17D-0xA1A4  alternate (centered) forms for paired punctuation; UTC
#                      table maps these to small forms
#       0xA1CB         bolder version of 0xA1CA, WAVY OVERLINE; UTC table maps
#                      this to DOUBLE WAVY OVERLINE
#       0xA279         duplicate of 0xA278, BOX DRAWINGS LIGHT VERTICAL; UTC
#                      table maps this to RIGHT ONE EIGHTH BLOCK, even though
#                      it is centered in the cell (not on the right).
#
# 2. Use of private use characters in Apple mappings
#
#    The Apple mappings address the above problems in a different way. The
#    goals in the Apple mappings provided here are:
#    - Ensure roundtrip mapping from every character in the Mac OS Chinese
#      Traditional encoding to Unicode and back
#    - Use standard Unicode characters as much as possible, to maximize
#      interchangeability of the resulting Unicode text. Whenever possible,
#      avoid having content carried by private-use characters.
#
#    To satisfy both goals, we use private use characters to mark variants
#    that are similar to a sequence of one or more standard Unicode characters.
#
#    Apple has defined a block of 32 corporate characters as "transcoding
#    hints." These are used in combination with standard Unicode characters
#...