The specified null string is sent by <command>COPY TO</command> without adding any backslashes; conversely, <command>COPY FROM</command> matches the input against the null string before removing backslashes. Therefore, a null string such as <literal>\N</literal> cannot be confused with the actual data value <literal>\N</literal> (which would be represented as <literal>\\N</literal>). </para> <para> The following special backslash sequences are recognized by <command>COPY FROM</command>: <informaltable> <tgroup cols="2"> <thead> <row> <entry>Sequence</entry> <entry>Represents</entry> </row> </thead> <tbody> <row> <entry><literal>\b</></entry> <entry>Backspace (ASCII 8)</entry> </row> <row> <entry><literal>\f</></entry> <entry>Form feed (ASCII 12)</entry> </row> <row> <entry><literal>\n</></entry> <entry>Newline (ASCII 10)</entry> </row> <row> <entry><literal>\r</></entry> <entry>Carriage return (ASCII 13)</entry> </row> <row> <entry><literal>\t</></entry> <entry>Tab (ASCII 9)</entry> </row> <row> <entry><literal>\v</></entry> <entry>Vertical tab (ASCII 11)</entry> </row> <row> <entry><literal>\</><replaceable>digits</></entry> <entry>Backslash followed by one to three octal digits specifies the character with that numeric code</entry> </row> <row> <entry><literal>\x</><replaceable>digits</></entry> <entry>Backslash <literal>x</> followed by one or two hex digits specifies the character with that numeric code</entry> </row> </tbody> </tgroup> </informaltable> Presently, <command>COPY TO</command> will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters. </para> <para> Any other backslashed character that is not mentioned in the above table will be taken to represent itself. However, beware of adding backslashes unnecessarily, since that might accidentally produce a string matching the end-of-data marker (<literal>\.</>) or the null string (<literal>\N</> by default). 
These strings will be recognized before any other backslash processing is done. </para> <para> It is strongly recommended that applications generating <command>COPY</command> data convert data newlines and carriage returns to the <literal>\n</> and <literal>\r</> sequences respectively. At present it is possible to represent a data carriage return by a backslash and carriage return, and to represent a data newline by a backslash and newline. However, these representations might not be accepted in future releases. They are also highly vulnerable to corruption if the <command>COPY</command> file is transferred across different machines (for example, from Unix to Windows or vice versa). </para> <para> <command>COPY TO</command> will terminate each row with a Unix-style newline (<quote><literal>\n</></>). Servers running on Microsoft Windows instead output carriage return/newline (<quote><literal>\r\n</></>), but only for <command>COPY</> to a server file; for consistency across platforms, <command>COPY TO STDOUT</> always sends <quote><literal>\n</></> regardless of server platform. <command>COPY FROM</command> can handle lines ending with newlines, carriage returns, or carriage return/newlines. To reduce the risk of error due to un-backslashed newlines or carriage returns that were meant as data, <command>COPY FROM</command> will complain if the line endings in the input are not all alike. </para> </refsect2> <refsect2> <title>CSV Format</title> <para> This format is used for importing and exporting the Comma Separated Value (<literal>CSV</>) file format used by many other programs, such as spreadsheets. Instead of the escaping used by <productname>PostgreSQL</productname>'s standard text mode, it produces and recognizes the common CSV escaping mechanism. </para> <para> The values in each record are separated by the <literal>DELIMITER</> character. 
If the value contains the delimiter character, the <literal>QUOTE</> character, the <literal>NULL</> string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the <literal>QUOTE</> character, and any occurrence within the value of a <literal>QUOTE</> character or the <literal>ESCAPE</> character is preceded by the escape character. You can also use <literal>FORCE QUOTE</> to force quotes when outputting non-<literal>NULL</> values in specific columns. </para> <para> The <literal>CSV</> format has no standard way to distinguish a <literal>NULL</> value from an empty string. <productname>PostgreSQL</>'s <command>COPY</> handles this by quoting. A <literal>NULL</> is output as the <literal>NULL</> string and is not quoted, while a data value matching the <literal>NULL</> string is quoted. Therefore, using the default settings, a <literal>NULL</> is written as an unquoted empty string, while an empty string is written with double quotes (<literal>""</>). Reading values follows similar rules. You can use <literal>FORCE NOT NULL</> to prevent <literal>NULL</> input comparisons for specific columns. </para> <para> Because backslash is not a special character in the <literal>CSV</> format, <literal>\.</>, the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a <literal>\.</> data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of <literal>\.</>, you might need to quote that value in the input file. </para> <note> <para> In <literal>CSV</> mode, all characters are significant. A quoted value surrounded by white space, or any characters other than <literal>DELIMITER</>, will include those characters. 
This can cause errors if you import data from a system that pads <literal>CSV</> lines with white space out to some fixed width. If such a situation arises you might need to preprocess the <literal>CSV</> file to remove the trailing white space, before importing the data into <productname>PostgreSQL</>. </para> </note> <note> <para> CSV mode will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-mode files. </para> </note> <note> <para> Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and <command>COPY</> might produce files that other programs cannot process. </para> </note> </refsect2> <refsect2> <title>Binary Format</title> <para> The file format used for <command>COPY BINARY</command> changed in <productname>PostgreSQL</productname> 7.4. The new format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are now in network byte order. </para> <refsect3> <title>File Header</title> <para> The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area. The fixed fields are: <variablelist> <varlistentry> <term>Signature</term> <listitem> <para>11-byte sequence <literal>PGCOPY\n\377\r\n\0</>; note that the zero byte is a required part of the signature. (The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean transfer. This signature will be changed by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.) </para> </listitem> </varlistentry> <varlistentry> <term>Flags field</term> <listitem> <para>32-bit integer bit mask to denote important aspects of the file format. 
Bits are numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>). Note that this field is stored in network byte order (most significant byte first), as are all the integer fields used in the file format. Bits 16-31 are reserved to denote critical file format issues; a reader should abort if it finds an unexpected bit set in this range. Bits 0-15 are reserved to signal backwards-compatible format issues; a reader should simply ignore any unexpected bits set in this range. Currently only one flag bit is defined, and the rest must be zero: <variablelist> <varlistentry> <term>Bit 16</term> <listitem> <para> if 1, OIDs are included in the data; if 0, not </para> </listitem> </varlistentry> </variablelist> </para> </listitem> </varlistentry> <varlistentry> <term>Header extension area length</term> <listitem> <para>32-bit integer, length in bytes of remainder of header, not including self. Currently, this is zero, and the first tuple follows immediately. Future changes to the format might allow additional data to be present in the header. A reader should silently skip over any header extension data it does not know what to do with. </para> </listitem> </varlistentry> </variablelist> </para> <para>The header extension area is envisioned to contain a sequence of self-identifying chunks. The flags field is not intended to tell readers what is in the extension area. Specific design of header extension contents is left for a later release. </para> <para> This design allows for both backwards-compatible header additions (add header extension chunks, or set low-order flag bits) and non-backwards-compatible changes (set high-order flag bits to signal such changes, and add supporting data to the extension area if needed). </para> </refsect3> <refsect3> <title>Tuples</title> <para>Each tuple begins with a 16-bit integer count of the number of fields in the tuple. (Presently, all tuples in a table will have the same count, but that might not always be true.) 
Then, repeated for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data. (The length word does not include itself, and can be zero.) As a special case, -1 indicates a NULL field value. No value bytes follow in the NULL case. </para> <para>There is no alignment padding or any other extra data between fields. </para> <para>Presently, all data values in a <command>COPY BINARY</command> file are assumed to be in binary format (format code one). It is anticipated that a future extension might add a header field that allows per-column format codes to be specified. </para> <para>To determine the appropriate binary format for the actual tuple data you should consult the <productname>PostgreSQL</productname> source, in particular the <function>*send</> and <function>*recv</> functions for each column's data type (typically these functions are found in the <filename>src/backend/utils/adt/</filename> directory of the source distribution). </para> <para>If OIDs are included in the file, the OID field immediately follows the field-count word. It is a normal field except that it's not included in the field-count. In particular it has a length word; this will allow handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow OIDs to be shown as null if that ever proves desirable. </para> </refsect3> <refsect3> <title>File Trailer</title> <para> The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple's field-count word. </para> <para> A reader should report an error if a field-count word is neither -1 nor the expected number of columns. This provides an extra check against somehow getting out of sync with the data. 
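</para> <para> The header, tuple, and trailer layout described above can be illustrated with a minimal Python sketch. It is not part of <productname>PostgreSQL</productname>: the function names <function>write_copy_binary</> and <function>read_copy_binary</> are hypothetical, field values are treated as opaque byte strings (the per-type encodings of the <function>*send</> and <function>*recv</> functions are not modeled), and files with OIDs are not supported.

```python
import io
import struct

# 11-byte signature from the File Header section.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def write_copy_binary(rows):
    """Serialize rows (lists of bytes or None) as a COPY BINARY stream without OIDs."""
    out = io.BytesIO()
    out.write(SIGNATURE)
    # Flags field = 0, header extension area length = 0 (network byte order).
    out.write(struct.pack("!II", 0, 0))
    for row in rows:
        out.write(struct.pack("!h", len(row)))  # 16-bit field count
        for field in row:
            if field is None:
                out.write(struct.pack("!i", -1))  # NULL: length -1, no value bytes
            else:
                out.write(struct.pack("!i", len(field)))  # length excludes itself
                out.write(field)
    out.write(struct.pack("!h", -1))  # file trailer
    return out.getvalue()


def read_copy_binary(data, expected_columns):
    """Parse a stream produced as above; returns a list of rows."""
    buf = io.BytesIO(data)
    if buf.read(len(SIGNATURE)) != SIGNATURE:
        raise ValueError("bad signature")
    flags, ext_len = struct.unpack("!II", buf.read(8))
    # Bits 16-31 are critical; this sketch knows none of them (including
    # bit 16, the OID flag), so it aborts. Bits 0-15 are ignored.
    if flags >> 16:
        raise ValueError("unsupported critical flag bits")
    buf.read(ext_len)  # silently skip unknown header extension data
    rows = []
    while True:
        (count,) = struct.unpack("!h", buf.read(2))
        if count == -1:  # file trailer
            return rows
        if count != expected_columns:
            raise ValueError("unexpected field count")
        row = []
        for _ in range(count):
            (length,) = struct.unpack("!i", buf.read(4))
            row.append(None if length == -1 else buf.read(length))
        rows.append(row)


sample = write_copy_binary([[b"AF", b"AFGHANISTAN", None]])
assert read_copy_binary(sample, 3) == [[b"AF", b"AFGHANISTAN", None]]
```

For the single row used here, the header and first tuple produced by the sketch correspond to the opening bytes of the <command>od -c</command> dump in the Examples section; only the trailer position differs, because that dump contains five rows. 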
</para> </refsect3> </refsect2> </refsect1> <refsect1> <title>Examples</title> <para> The following example copies a table to the client using the vertical bar (<literal>|</literal>) as the field delimiter:
<programlisting>
COPY country TO STDOUT WITH DELIMITER '|';
</programlisting>
 </para> <para> To copy data from a file into the <literal>country</> table:
<programlisting>
COPY country FROM '/usr1/proj/bray/sql/country_data';
</programlisting>
 </para> <para> To copy into a file just the countries whose names start with 'A':
<programlisting>
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
</programlisting>
 </para> <para> Here is a sample of data suitable for copying into a table from <literal>STDIN</literal>:
<programlisting>
AF	AFGHANISTAN
AL	ALBANIA
DZ	ALGERIA
ZM	ZAMBIA
ZW	ZIMBABWE
</programlisting>
 Note that the white space on each line is actually a tab character. </para> <para> The following is the same data, output in binary format. The data is shown after filtering through the Unix utility <command>od -c</command>. The table has three columns; the first has type <type>char(2)</type>, the second has type <type>text</type>, and the third has type <type>integer</type>. All the rows have a null value in the third column.
<programlisting>
0000000   P   G   C   O   P   Y  \n 377  \r  \n  \0  \0  \0  \0  \0  \0
0000020  \0  \0  \0  \0 003  \0  \0  \0 002   A   F  \0  \0  \0 013   A
0000040   F   G   H   A   N   I   S   T   A   N 377 377 377 377  \0 003
0000060  \0  \0  \0 002   A   L  \0  \0  \0 007   A   L   B   A   N   I
0000100   A 377 377 377 377  \0 003  \0  \0  \0 002   D   Z  \0  \0  \0
0000120 007   A   L   G   E   R   I   A 377 377 377 377  \0 003  \0  \0
0000140  \0 002   Z   M  \0  \0  \0 006   Z   A   M   B   I   A 377 377
0000160 377 377  \0 003  \0  \0  \0 002   Z   W  \0  \0  \0  \b   Z   I
0000200   M   B   A   B   W   E 377 377 377 377 377 377
</programlisting>
 </para> </refsect1> <refsect1> <title>Compatibility</title> <para> There is no <command>COPY</command> statement in the SQL standard. 
</para> <para> The following syntax was used before <productname>PostgreSQL</> version 7.3 and is still supported:
<synopsis>
COPY [ BINARY ] <replaceable class="parameter">tablename</replaceable> [ WITH OIDS ]
    FROM { '<replaceable class="parameter">filename</replaceable>' | STDIN }
    [ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
    [ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]

COPY [ BINARY ] <replaceable class="parameter">tablename</replaceable> [ WITH OIDS ]
    TO { '<replaceable class="parameter">filename</replaceable>' | STDOUT }
    [ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
    [ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]
</synopsis>
 </para> </refsect1></refentry>