IETF IDN Working Group                 Editors Zita Wenzel, James Seng
Internet Draft
draft-ietf-idn-requirements-05.txt
24 April 2001                                  Expires 24 October 2001

            Requirements of Internationalized Domain Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Intended Scope

The intended scope of this document is to explore requirements for the
internationalization of domain names on the Internet. It is not
intended to document user requirements. It is recommended that
solutions not necessarily be within the DNS itself, but could be a layer
interjected between the application and the DNS. Proposals SHOULD
fulfill most, if not all, of the requirements. This document MAY be
updated based on clinical trials.

Abstract

This document describes the requirement for encoding international
characters into DNS names and records. This document is guidance for
developing protocols for internationalized domain names.

1. Introduction

At present, the encoding of Internet domain names is restricted to a
subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
other text based items on the Internet have already been at least
partially internationalized.
It is important for domain names to be
similarly internationalized or for an equivalent solution to be found.
This document assumes that the most effective solution involves putting
non-ASCII names inside some parts of the overall DNS system.

This document is being discussed on the "idn" mailing list. To join the
list, send a message to <majordomo@ops.ietf.org> with the words
"subscribe idn" in the body of the message. Archives of the mailing
list can also be found at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Definitions and Conventions

A language is a way that humans interact. In computerised form, a text
in a written language can be expressed as a string of characters.
The same set of characters can often be used for many written languages,
and many written languages can be expressed using different scripts.
The same characters are often shown with somewhat different glyphs
(shapes) for display of a text depending on the font used, the
automatic shaping applied, or the automatic formation of ligatures. In
addition, the same characters can be shown with somewhat different
glyphs (shapes) for display of a text depending on the language being
used, even within the same font or through automatic font change.

A character is a member of a set of elements used for organization,
control, or representation of textual data.

A graphic character is a character, other than a control function,
that has a visual representation normally handwritten, printed, or
displayed.

Characters mentioned in this document are identified by their position
in the Unicode [UNICODE] character set. This character set is also
known as the UCS [ISO10646]. The notation U+12AB, for example, indicates
the character at position 12AB (hexadecimal) in the Unicode character
set. Note that the use of this notation is not an indication of a
requirement to use Unicode.

Examples quoted in this document should be considered as a method to
further explain the meanings and principles adopted by the document.
It is not a requirement for the protocol to satisfy the examples.

Unicode Technical Report 17 [UTR17] defines a character encoding
model in several levels (much of the text below is quoted from
Unicode Technical Report 17 [UTR17]):

1. An abstract character repertoire (ACR) is defined as the set of
   abstract characters to be encoded, normally a familiar alphabet or
   symbol set. The word abstract just means that these objects are
   defined by convention (such as the 26 letters of the English
   alphabet, uppercase and lowercase forms).

   Examples: the ASCII repertoire, the Latin-15 repertoire, the
   JIS X 0208 repertoire, the UCS repertoire (of a particular version).

2. A coded character set (CCS) is defined to be a mapping from a set
   of abstract characters to the set of non-negative integers. This
   range of integers need not be contiguous. An abstract character is
   defined to be in a coded character set if the coded character set
   maps from it to an integer. That integer is said to be the code
   point for the abstract character. That abstract character is then
   an encoded character.

   Examples: ASCII, Latin-15, JIS X 0208, the UCS.

3. A character encoding form (CEF) is a mapping from the set of
   integers used in a CCS to the set of sequences of code units. A
   code unit is an integer occupying a specified binary width in a
   computer architecture, such as a septet, an octet, or a 16-bit
   unit. The encoding form enables character representation as actual
   data in a computer. The sequences of code units do not necessarily
   have the same length.

   Examples: ASCII, Latin-15, Shift-JIS, UTF-16, UTF-8.

4. A character encoding scheme (CES) is a mapping of code units into
   serialized octet sequences. Character encoding schemes are
   relevant to the issue of cross-platform persistent data involving
   code units wider than a byte, where byte-swapping may be required
   to put data into the byte polarity canonical for a particular
   platform. The CES may involve two or more CCS's, and may include
   code units (e.g.
   single shifts, SI/SO, or escape sequences) that are not part of the
   CCS per se, but which are defined by the character encoding
   architecture and which may require an external registry of
   particular values (as for the ISO 2022 escape sequences). In such a
   case, the CES is called a compound CES. (A CES that only involves a
   single CCS is called a simple CES.)

   Examples: ASCII, Latin-15, Shift-JIS, UTF-16BE, UTF-16LE, UTF-8.

5. The mapping from an abstract character repertoire (ACR) to a
   serialised sequence of octets is called a Character Map (CM). A
   simple character map thus implicitly includes a CCS, a CEF, and a
   CES, mapping from abstract characters to code units to octets. A
   compound character map includes a compound CES, and thus includes
   more than one CCS and CEF. In that case, the abstract character
   repertoire for the character map is the union of the repertoires
   covered by the coded character sets involved.

   Character Maps are the things that in the IAB architecture get
   IANA charset identifiers. A sequence of encoded characters must be
   unambiguously mapped onto a sequence of octets by the charset. The
   charset must be specified in all instances, as in Internet
   protocols, where textual content is treated as an ordered sequence
   of octets, and where the textual content must be reconstructible
   from that sequence of octets. Charset names are registered by the
   IANA according to procedures documented in [RFC2278]. In many
   cases, the same name is used for both a character map and for a
   character encoding scheme, such as UTF-16BE. Typically this is done
   for simple character maps when such usage is clear from context.

6. A transfer encoding syntax (TES) is a reversible transform of
   encoded data which may (or may not) include textual data
   represented in one or more character encoding schemes.
   Examples: 8bit, Quoted-Printable, BASE64, UTF-7 (defunct), (UTF-5,
   and RACE).

1.2 Description of the Domain Name System

The Domain Name System is defined by [RFC1034] and [RFC1035], with
clarifications, extensions and modifications given in [RFC1123],
[RFC1996], [RFC2181], and others. Of special importance here are the
security extensions described in [RFC2535] and companions.

Over the years, many different words have been used to describe the
components of resource naming on the Internet (e.g., URI, URN); to make
certain that the set of terms used in this document is well-defined and
unambiguous, the definitions are given here.

A master server for a zone holds the main copy of that zone. This copy
is sometimes stored in a zone file. A slave server for a zone holds a
complete copy of the records for that zone. Slave servers MAY be either
authorized by the zone owner (secondary servers) or unauthorized
(so-called "stealth secondaries"). Master and authorized slave servers
are listed in the NS records for the zone, and are termed
"authoritative" servers. In many contexts outside this document, the
term "primary" is used interchangeably with "master" and "secondary" is
used interchangeably with "slave".

A caching server holds temporary copies of DNS records; it uses records
to answer queries about domain names. Further explanation of these terms
can be found in [RFC1034] and [RFC1996].

DNS names can be represented in multiple forms, with different
properties for internationalization. The most important ones are:

- Domain name: The binary representation of a name used internally
  in the DNS protocol.
  This consists of a series of components of 1-63 octets, with an
  overall length limited to 255 octets (including the length fields).

- Master file format domain name: This is a representation of the
  name as a sequence of characters in some character set; the common
  convention (derived from [RFC1035] section 5.1) is to represent the
  octets of the name as ASCII characters where the octet is in the
  set corresponding to the ASCII values for [a-zA-Z0-9-], using an
  escape mechanism (\x or \NNN) where not, and separating the
  components of the name by the dot character (".").

The form specified for most protocols using the DNS is a limited form of
the master file format domain name. This limited form is defined in
[RFC1034] Section 3.5 and [RFC1123]. In most implementations of
applications today, domain names in the Internet have been limited to
the much more restricted forms used, e.g., in email. Those names are
limited to the upper- and lower-case letters a-z (interpreted in a
case-independent fashion), the digits, and the hyphen-minus, all in
ASCII.

1.3 Definition of "hostname" and "Internationalized Domain Name"

In the DNS protocols, a name is referred to as a sequence of octets.
However, when discussing requirements for internationalized domain
names, what we are looking for are ways to represent characters that
are meaningful for humans.

In this document, this is referred to as a "hostname". While this term
has been used for many different purposes over the years, it is used
here in the sense of a sequence of characters (not octets) representing
a domain name conforming to the limited hostname syntax [RFC952].

This document attempts to define the requirements for an
"Internationalized Domain Name" (IDN).
This is defined as a sequence of
characters that can be used in the context of functions where a hostname
is used today, but contains one or more characters that are outside the
set of characters specified as legal characters for host names
[RFC1123].

1.4 A multilayer model of the DNS function

The DNS can be seen as a multilayer function:

- The bottom layer is where the packets are passed across the
  Internet in a DNS query and a DNS response. At this level, what
  matters is the format and meaning of bits and octets in a DNS
  packet.

- Above that is the "DNS service", created by an infrastructure of
  DNS servers and the NS records that point to those DNS servers,
  which is in turn pointed to by the root servers (listed in the
  "root cache file" on each DNS server, often called "named.cache").
  It is at this level that the statement "the DNS has a single root"
  [RFC2826] makes sense, but still, what are being transferred are
  octets, not characters.

- Interfacing to the user is a service layer, often called "the
  resolver library", and often embedded in the operating system or
  system libraries of the client machines. It is at the top of this
  layer that the API calls commonly known as "gethostbyname" and
  "gethostbyaddr" reside. These calls are modified to support IPv6
  [RFC2553]. A conceptually similar layer exists in authoritative DNS
  servers, comprising the parts that generate "meaningful" strings in
  DNS files.
  Due to the popularity of the "master file" format, this layer often
  exists only in the administrative routines of the service
  maintainers.

- The users of this layer (the resolver library) are the application
  programs that use the DNS, such as mailers, mail servers, Web
  clients, Web servers, Web caches, IRC clients, FTP clients,
  distributed file systems, distributed databases, and almost all
  other applications on TCP/IP.

Graphically, one can illustrate it like this:

+---------------+                        +---------------------+
| Application   |                        |   (Base data)       |
+---------------+                        +---------------------+
        |  Application service interface            |
        |  For ex. GethostbyXXXX interface          | (no standard)
+---------------+                        +---------------------+
| Resolver      |                        | Auth DNS server     |
+---------------+                        +---------------------+
        |    <----- DNS service interface ----->    |
+------------------------------------------------------------------+
|  DNS service                                                     |
|  +-----------------------+         +--------------------+        |
|  | Forwarding DNS server |         | Caching DNS server |        |
|  +-----------------------+         +--------------------+        |
|                                                                  |
|                +-------------------------+                       |
|                | Parent-zone DNS servers |                       |
|                +-------------------------+                       |
|                                                                  |
|                +-------------------------+                       |
|                | Root DNS servers        |                       |
|                +-------------------------+                       |
|                                                                  |
+------------------------------------------------------------------+

1.5 Service model of the DNS

The Domain Name Service is used for multiple purposes, each of which is
characterized by what it puts into the system (the query) and what it
expects as a result (the reply).

The most used ones in the current DNS are:

- Hostname-to-address service (A, AAAA, A6): Enter a hostname, and
  get back an IPv4 or IPv6 address.

- Hostname-to-Mail server service (MX): As above, but the expected
  return value is a hostname and a priority for SMTP servers.

- Address-to-hostname service (PTR): Enter an IPv4 or IPv6 address
  (in in-addr.arpa or ip6.int form respectively) and get back a
  hostname.

- Domain delegation service (NS):
  Enter a domain name and get back nameserver records (designated
  hosts that provide authoritative nameservice) for the domain.

New services are being defined, either as entirely new services (IPv6 to
hostname mapping using binary labels) or as embellishments to other
services (DNSSEC returning information about whether a given DNS service
is performed securely or not).

These services exist, conceptually, at the Application/Resolver
interface, NOT at the DNS-service interface. This document attempts to
set requirements for an equivalent of the "used services" given above,
where "hostname" is replaced by "Internationalized Domain Name". This
does not preclude the fact that IDN should work with any kind of DNS
query. IDN is a new service. Since existing protocols like SMTP or
HTTP use the old service, it is a matter of great concern how the new
and old services work together, and how other protocols can take
advantage of the new service.
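
As an illustration of the address-to-hostname service (PTR) listed in
this section, the following minimal Python sketch builds the
in-addr.arpa name that is entered as the query for an IPv4 address.
The function name ptr_query_name and the example address 192.0.2.1 are
assumptions made for this illustration and are not part of any
requirement in this document; the IPv6 form is omitted.

```python
import ipaddress


def ptr_query_name(ipv4_text):
    """Return the in-addr.arpa name queried for the PTR record of an IPv4 address.

    Hypothetical helper for illustration only; not defined by this document.
    """
    # IPv4Address raises ValueError if the text is not a valid IPv4 address.
    address = ipaddress.IPv4Address(ipv4_text)
    octets = str(address).split(".")
    # The octets appear in reverse order, followed by the in-addr.arpa suffix.
    return ".".join(reversed(octets)) + ".in-addr.arpa"


if __name__ == "__main__":
    print(ptr_query_name("192.0.2.1"))  # prints 1.2.0.192.in-addr.arpa
```

The reply to such a query is a hostname. In an equivalent
internationalized service, as discussed above, an Internationalized
Domain Name would take the place of that hostname.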