README.TXT

##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  README.TXT
#
#   Contents:       Background information on Unicode mapping tables
#                   for Mac OS text encodings
#
#   Copyright:      (c) 1995-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update information on Cyrillic. Update
#                           contact e-mail address.
#       n07  1998-Feb-05    Rewrite to provide additional information
#                           relevant to using the accompanying mapping
#                           tables, and to delete some extraneous
#                           information. Delete Bulgarian (no special
#                           encoding, uses standard Cyrillic), add
#                           Farsi, Devanagari, Gurmukhi, Gujarati,
#                           Celtic, Gaelic, Inuit, Tibetan.
#       n04  1995-Nov-15    Update info for Hebrew and Thai
#       n03  1995-Apr-15    First version (after fixing some typos).
#
##################

0. Preliminaries
----------------

For maximum interchangeability, this file and the accompanying Mac OS 
mapping tables use only ASCII characters. They are intended to be 
displayed in a monospaced font.

Apple, the Apple logo, Mac, and Macintosh are trademarks of Apple 
Computer, Inc., registered in the United States and other countries. 
QuickDraw and TrueType are trademarks of Apple Computer, Inc. Unicode is 
a trademark of Unicode Inc. PostScript is a trademark of Adobe Systems 
Inc., which may be registered in certain jurisdictions. IBM is a 
registered trademark of International Business Machines Corporation. ITC 
Zapf Dingbats is a registered trademark of the International Typeface 
Corporation. For the sake of brevity, throughout this document and the 
accompanying tables, "Macintosh" can be used to refer to Macintosh
computers and "Unicode" can be used to refer to the Unicode standard.

Apple Computer, Inc. ("Apple") makes no warranty or representation, 
either express or implied, with respect to this document and the 
accompanying tables, their quality, accuracy, or fitness for a 
particular purpose. In no event will Apple be liable for direct, 
indirect, special, incidental, or consequential damages resulting from 
any defect or inaccuracy in this document or the accompanying tables.

1. Introduction
---------------

This document summarizes some Unicode mapping considerations that are
relevant for the accompanying mapping tables. It also provides an
overview of Mac OS encodings.

These mapping tables and character lists are subject to change.
The latest tables should be available from the following:

<ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
<ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>

2. Round-trip fidelity and overview of mapping techniques
---------------------------------------------------------

For a particular set of national and international standards, Unicode
provides round-trip fidelity: Text in one of those encodings can be
mapped to Unicode and back again, yielding the original characters.
Characters which are distinct in one of these source standards have 
a distinct counterpart in Unicode. Note that this counterpart might not
be a single Unicode character; as is pointed out in "The Unicode
Standard, Version 2.0" (page 2-10), "sometimes a single code value in
another standard corresponds to a sequence of code values in the Unicode
Standard, or vice versa."

However, Unicode does not attempt to provide round-trip fidelity for 
most vendor standards. Nevertheless, Apple and other platform vendors 
may need to provide such round-trip fidelity for their current encodings 
(this can be important in file systems, for example). In order to do 
this, Apple makes use of some Unicode characters in the corporate-use
zone (the upper end of the private use area).

Corporate-zone characters must be used with care. Indiscriminate use of
such characters can result in text which is not easily interchanged with
other systems, since these characters have no standard meaning outside a
particular platform. The mappings provided here are intended to minimize
the use of private use characters, or to use them in such a way that
basic text content will not be lost if the corporate zone characters are
dropped when text is transferred to another system.

The tables provided here have three goals, in the following order of
importance:
1. Provide 100% round-trip mapping from a Mac OS encoding to Unicode
and back (even if the mappings here are converted to maximal
decompositions, see below).
2. Map characters in a Mac OS encoding into the Unicode characters
that best represent the interpretation and usage of the Mac OS
characters.
3. When mapping text in a Mac OS encoding to Unicode using the tables,
the resulting Unicode text should be as interchangeable as possible.

To satisfy these goals, the mappings use a variety of techniques. First
we attempt to achieve round-trip mappings using any standard Unicode
feature at our disposal, without resorting to corporate-zone characters.
This can include the following techniques:
- Use of all Unicode characters defined in Unicode 2.1, including
  compatibility characters.
- Mapping a single character in a Mac OS encoding to a sequence of
  standard Unicode characters, or vice versa. This requires grouping
  characters into appropriate chunks for lookup before mapping them
  (this mainly applies to sequences of Unicode characters).
- Using Unicode direction overrides to force direction attributes when
  mapping to Unicode. This requires resolution of Unicode character
  direction, and use of this information, when mapping from Unicode back
  to certain Mac OS encodings.
The requirements imposed on Unicode handling are necessary for other,
non-transcoding operations in a full Unicode implementation anyway, so
requiring them for transcoding should not impose much of a burden.
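
As an illustration of the chunking step, the following sketch maps
Unicode text back to a Mac OS encoding by trying the longest matching
Unicode sequence first. The table REVERSE, its byte values, and the
function name encode are hypothetical; they are not taken from any of
the accompanying tables.

```python
# Hypothetical reverse table: Unicode string (one or more code points)
# -> single Mac OS byte. Entries are illustrative only.
REVERSE = {
    "A": 0x41,
    "a": 0x61,
    "a\u0308": 0x8A,  # "a" + COMBINING DIAERESIS treated as one unit
}
MAX_KEY_LEN = max(len(key) for key in REVERSE)


def encode(text):
    """Map Unicode text to bytes, grouping code points by longest match."""
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the longest candidate chunk first, then shorter ones.
        for n in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in REVERSE:
                out.append(REVERSE[chunk])
                i += n
                break
        else:
            raise ValueError("no mapping for U+%04X" % ord(text[i]))
    return bytes(out)
```

With this table, "a" followed by U+0308 is consumed as one chunk rather
than as "a" plus an unmappable combining mark.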

Next, if round-trip fidelity cannot be achieved using the above
techniques, we attempt to use corporate-zone characters only as
"transcoding hints" (more on this below). These are combined with one or
more standard Unicode characters to mark them as special for
transcoding, but have no other function and can be deleted with no loss
of basic text content (only of round-trip fidelity).
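
The effect of deleting such hints can be sketched as follows. This
example assumes it is acceptable to drop every private use character
(Unicode general category "Co"), not only the hint code points; the
function name drop_private_use and the sample code point U+F860 are
chosen purely for illustration.

```python
import unicodedata


def drop_private_use(text):
    """Remove all private use characters (general category 'Co')."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# A private use character followed by two standard characters. Only the
# standard characters survive; the basic text content is preserved, but
# the round-trip information carried by the hint is lost.
sample = "\uf860AB"
stripped = drop_private_use(sample)
```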

Finally, if a character in a Mac OS encoding is unrelated to any Unicode
character or sequence, we may map it to a single corporate-zone Unicode
code point.
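
One familiar case is the Apple logo at 0xF0 in Mac OS Roman. The sketch
below assumes Python's built-in "mac_roman" codec follows the Mac OS
Roman table; it is an illustration, not part of the accompanying tables.

```python
import unicodedata

# Decode the single Mac OS Roman byte 0xF0 (Apple logo).
apple_logo = b"\xf0".decode("mac_roman")

# The result is one code point at the upper end of the private use area.
code_point = ord(apple_logo)
category = unicodedata.category(apple_logo)  # "Co" means private use

# Mapping back yields the original byte.
round_trip = apple_logo.encode("mac_roman")
```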

These techniques are described in more detail in the following sections.

Some clients of these tables may have a different set of goals. For
example, some clients may prefer to avoid compatibility characters,
perhaps sacrificing round-trip fidelity if necessary. In most cases it
is fairly easy to construct other types of mappings from the mappings
given here. In particular, the mappings here have been designed so that
if they are converted to maximal decomposition mappings (by recursive
application of the canonical decompositions in the Unicode database),
the resulting mappings will still provide 100% round-trip fidelity.
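
A minimal sketch of such a conversion follows. It assumes that Python's
unicodedata.normalize with form "NFD" is an acceptable stand-in for
recursive canonical decomposition (NFD also applies canonical
reordering of combining marks). The table excerpt TABLE and the helper
names are hypothetical.

```python
import unicodedata

# Hypothetical excerpt of a Mac OS -> Unicode table (illustrative only).
TABLE = {
    0x61: "a",
    0x80: "\u00C4",  # LATIN CAPITAL LETTER A WITH DIAERESIS
    0x8A: "\u00E4",  # LATIN SMALL LETTER A WITH DIAERESIS
    0xAC: "\u00A8",  # DIAERESIS (has no canonical decomposition)
}


def decompose_table(table):
    """Replace each Unicode value by its full canonical decomposition."""
    return {b: unicodedata.normalize("NFD", u) for b, u in table.items()}


def is_injective(table):
    """True if no two Mac OS bytes share the same Unicode value."""
    return len(set(table.values())) == len(table)


DECOMPOSED = decompose_table(TABLE)
```

Here 0x8A becomes the two-character sequence "a" + U+0308, and the
decomposed values remain distinct from one another, so each sequence
still identifies a single Mac OS byte.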

There is one more round-trip issue that should be mentioned. If a
Unicode character or sequence can be mapped at all into a particular
Mac encoding, then the reverse mapping back to Unicode should yield
the original Unicode character or sequence (except for possible 
differences in direction overrides or other Unicode characters in the
"Other, Format" category). The tables here also provide this. For a
related issue, see the next section.

3. Mapping tolerance: Strict and loose
--------------------------------------

In many character sets, a single character may have multiple semantics, 
either by explicit definition, ambiguous definition, or established 
usage. For example, the JIS character 0x2142, or 0x8161 in Shift-JIS, 
is specified in the JIS X0208 standard to have two meanings: "double 
vertical line" and "parallel". Each of these meanings corresponds to a 
different Unicode character: 0x2016 DOUBLE VERTICAL LINE and 0x2225 
PARALLEL TO. When mapping from Unicode to Shift-JIS, it is normally 
desirable to map both of these Unicode characters to the single
Shift-JIS character. However, when mapping the Shift-JIS character to
Unicode, we can choose only one of the possible Unicode characters.

For two encodings X and Y, we can define a set of "strict" mappings
from one to the other as follows: If text in X can be mapped to Y using
the strict mappings from X to Y, then the resulting text can be mapped
back using the strict mappings from Y to X to end up with the original
text from X. Similarly, if text in Y can be mapped to X using the strict
mappings from Y to X, then the resulting text can be mapped back using
the strict mappings from X to Y to end up with the original text from Y.

There may be several characters in one encoding that all map to a
single character in another encoding, but only one of these mappings
can be strict; the others are "loose".
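
The Shift-JIS example above can be sketched as follows. Which of the two
Unicode characters receives the strict mapping is chosen arbitrarily
here for illustration; the table and function names are hypothetical.

```python
# Strict mappings: each direction exactly undoes the other.
SJIS_TO_UNICODE = {0x8161: "\u2016"}          # DOUBLE VERTICAL LINE
UNICODE_TO_SJIS_STRICT = {"\u2016": 0x8161}

# Loose mapping: a second Unicode character that lands on the same
# Shift-JIS character but cannot be recovered on the way back.
UNICODE_TO_SJIS_LOOSE = {"\u2225": 0x8161}    # PARALLEL TO


def to_sjis(ch, allow_loose=False):
    """Map one Unicode character to Shift-JIS, optionally using loose mappings."""
    if ch in UNICODE_TO_SJIS_STRICT:
        return UNICODE_TO_SJIS_STRICT[ch]
    if allow_loose and ch in UNICODE_TO_SJIS_LOOSE:
        return UNICODE_TO_SJIS_LOOSE[ch]
    raise KeyError(ch)
```

With strict mappings only, U+2225 is rejected. With loose mappings
enabled it maps to 0x8161, but mapping that back yields U+2016, so the
original character is not recovered.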

The mappings given in the accompanying tables are strict mappings.
However, the Mac OS Text Encoding Converter also supports loose
mappings and fallback mappings. Some of the accompanying tables provide
suggestions about possible loose mappings.

4. Mapping a Mac encoding character to a Unicode sequence or vice versa
-----------------------------------------------------------------------

In some cases, a character in a Mac OS encoding maps to a sequence of
Unicode characters. For example, the Mac OS Japanese encoding includes
a character for the circled CJK ideograph "big". Although Unicode
encodes other circl...