<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_07) on Sun May 06 17:59:50 GMT 2007 --><TITLE>ARCWriter (Heritrix 1.12.1)</TITLE><META NAME="keywords" CONTENT="org.archive.io.arc.ARCWriter class"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){ parent.document.title="ARCWriter (Heritrix 1.12.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-summary.html"><FONT CLASS="NavBarFont1"><B>Package</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Class</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="class-use/ARCWriter.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A 
HREF="../../../../help-doc.html"><FONT CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../org/archive/io/arc/ARCUtils.html" title="class in org.archive.io.arc"><B>PREV CLASS</B></A> <A HREF="../../../../org/archive/io/arc/ARCWriterPool.html" title="class in org.archive.io.arc"><B>NEXT CLASS</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../index.html?org/archive/io/arc/ARCWriter.html" target="_top"><B>FRAMES</B></A> <A HREF="ARCWriter.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR><TR><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2"> SUMMARY: NESTED | <A HREF="#fields_inherited_from_class_org.archive.io.WriterPoolMember">FIELD</A> | <A HREF="#constructor_summary">CONSTR</A> | <A HREF="#method_summary">METHOD</A></FONT></TD><TD VALIGN="top" CLASS="NavBarCell3"><FONT SIZE="-2">DETAIL: FIELD | <A HREF="#constructor_detail">CONSTR</A> | <A HREF="#method_detail">METHOD</A></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><!-- ======== START OF CLASS DATA ======== --><H2><FONT SIZE="-1">org.archive.io.arc</FONT><BR>Class ARCWriter</H2><PRE>java.lang.Object <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">org.archive.io.WriterPoolMember</A> <IMG SRC="../../../../resources/inherit.gif" ALT="extended by "><B>org.archive.io.arc.ARCWriter</B></PRE><DL><DT><B>All Implemented Interfaces:</B> <DD><A HREF="../../../../org/archive/io/arc/ARCConstants.html" 
title="interface in org.archive.io.arc">ARCConstants</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html" title="interface in org.archive.io">ArchiveFileConstants</A></DD></DL><HR><DL><DT><PRE>public class <B>ARCWriter</B><DT>extends <A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">WriterPoolMember</A><DT>implements <A HREF="../../../../org/archive/io/arc/ARCConstants.html" title="interface in org.archive.io.arc">ARCConstants</A></DL></PRE><P>Writes ARC files. The caller is assumed to manage access to this ARCWriter, ensuring that only one thread of control accesses this ARC file instance at any one time. <p>ARC files are described here: <a href="http://www.archive.org/web/researcher/ArcFileFormat.php">Arc File Format</a>. This class writes version 1 of the ARC file format. It also writes version 1.1, which is version 1 with data stuffed into the body of the first ARC record in the file, the ARC file meta record itself. <p>An ARC file is three lines of meta data followed by an optional 'body', then a couple of '\n', and then: record, '\n', record, '\n', record, etc. If we are writing compressed ARC files, each ARC file record is gzipped individually and the results are concatenated to make up a single ARC file. In GZIP terms, each ARC record is a GZIP <i>member</i> of a total gzip'd file. <p>The GZIPping of the ARC file meta data is exceptional. It is GZIPped with an extra GZIP header, a special Internet Archive (IA) extra header field (i.e. FEXTRA is set in the GZIP header FLG field and an extra field is appended to the GZIP header). The extra field has little in it, but its presence denotes this GZIP as an Internet Archive gzipped ARC. See RFC1952 to learn about the GZIP header structure. <p>This class then does its GZIPping in the following fashion. Each GZIP member is written with a new instance of GZIPOutputStream -- actually ARCWriterGZIPOututStream so we can get access to the underlying stream. 
The underlying stream stays open across GZIPOutputStream instantiations. For the 'special' GZIPping of the ARC file meta data, we cheat by capturing the GZIPOutputStream output in a byte array and adding the IA GZIP header to it before writing to the stream. <p>I tried writing a resettable GZIPOutputStream and could make it work with the SUN JDK, but the IBM JDK threw an NPE inside deflate.reset -- its zlib native call doesn't seem to like the notion of resetting -- so I gave up on it. <p>Because of problems such as the above, and troubles with GZIPInputStream, we should write our own GZIP*Streams, ones that are resettable and conscious of gzip members. <p>This class will write until we hit >= maxSize. The check is done at a record boundary, so records do not span ARC files. We then close the current file, open another, and continue writing. <p><b>TESTING: </b>Here is how to test that produced ARC files are good using the <a href="http://www.archive.org/web/researcher/tool_documentation.php">alexa ARC c-tools</a>: <pre>
 % av_procarc hx20040109230030-0.arc.gz | av_ziparc > \
     /tmp/hx20040109230030-0.dat.gz
 % av_ripdat /tmp/hx20040109230030-0.dat.gz > /tmp/hx20040109230030-0.cdx
</pre> Examine the produced cdx file to make sure it makes sense. Search for 'no-type 0'. If found, then we're opening a gzip record without data to write. This is bad. <p>You can also run <code>gzip -t FILENAME</code> and it will tell you whether the ARC makes sense to GZIP. 
<p>While being written, ARCs have a '.open' suffix appended.<P><P><DL><DT><B>Author:</B></DT> <DD>stack</DD></DL><HR><P><!-- =========== FIELD SUMMARY =========== --><A NAME="field_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Field Summary</B></FONT></TH></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.io.WriterPoolMember"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from class org.archive.io.<A HREF="../../../../org/archive/io/WriterPoolMember.html" title="class in org.archive.io">WriterPoolMember</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/WriterPoolMember.html#DEFAULT_PREFIX">DEFAULT_PREFIX</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#DEFAULT_SUFFIX">DEFAULT_SUFFIX</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#HOSTNAME_VARIABLE">HOSTNAME_VARIABLE</A>, <A HREF="../../../../org/archive/io/WriterPoolMember.html#UTF8">UTF8</A></CODE></TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.io.arc.ARCConstants"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from interface org.archive.io.arc.<A HREF="../../../../org/archive/io/arc/ARCConstants.html" title="interface in org.archive.io.arc">ARCConstants</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_FILE_EXTENSION">ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_GZIP_EXTRA_FIELD">ARC_GZIP_EXTRA_FIELD</A>, <A 
HREF="../../../../org/archive/io/arc/ARCConstants.html#ARC_MAGIC_NUMBER">ARC_MAGIC_NUMBER</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CHECKSUM_FIELD_KEY">CHECKSUM_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CHECKSUM_HEADER_FIELD_KEY">CHECKSUM_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#CODE_HEADER_FIELD_KEY">CODE_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#COMPRESSED_ARC_FILE_EXTENSION">COMPRESSED_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_ENCODING">DEFAULT_ENCODING</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_GZIP_HEADER_LENGTH">DEFAULT_GZIP_HEADER_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DEFAULT_MAX_ARC_FILE_SIZE">DEFAULT_MAX_ARC_FILE_SIZE</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_ARC_FILE_EXTENSION">DOT_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_COMPRESSED_ARC_FILE_EXTENSION">DOT_COMPRESSED_ARC_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#DOT_COMPRESSED_FILE_EXTENSION">DOT_COMPRESSED_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#FILENAME_FIELD_KEY">FILENAME_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#FILENAME_HEADER_FIELD_KEY">FILENAME_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#GZIP_HEADER_BEGIN">GZIP_HEADER_BEGIN</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#HEADER_FIELD_SEPARATOR">HEADER_FIELD_SEPARATOR</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#IP_HEADER_FIELD_KEY">IP_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#LINE_SEPARATOR">LINE_SEPARATOR</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#LOCATION_HEADER_FIELD_KEY">LOCATION_HEADER_FIELD_KEY</A>, <A 
HREF="../../../../org/archive/io/arc/ARCConstants.html#MAX_METADATA_LINE_LENGTH">MAX_METADATA_LINE_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#MINIMUM_RECORD_LENGTH">MINIMUM_RECORD_LENGTH</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#OFFSET_FIELD_KEY">OFFSET_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#OFFSET_HEADER_FIELD_KEY">OFFSET_HEADER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#REQUIRED_VERSION_1_HEADER_FIELDS">REQUIRED_VERSION_1_HEADER_FIELDS</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#STATUSCODE_FIELD_KEY">STATUSCODE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/arc/ARCConstants.html#TOKENIZED_PREFIX">TOKENIZED_PREFIX</A></CODE></TD></TR></TABLE> <A NAME="fields_inherited_from_class_org.archive.io.ArchiveFileConstants"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#EEEEFF" CLASS="TableSubHeadingColor"><TH ALIGN="left"><B>Fields inherited from interface org.archive.io.<A HREF="../../../../org/archive/io/ArchiveFileConstants.html" title="interface in org.archive.io">ArchiveFileConstants</A></B></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><A HREF="../../../../org/archive/io/ArchiveFileConstants.html#ABSOLUTE_OFFSET_KEY">ABSOLUTE_OFFSET_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX">CDX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX_FILE">CDX_FILE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CDX_LINE_BUFFER_SIZE">CDX_LINE_BUFFER_SIZE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#COMPRESSED_FILE_EXTENSION">COMPRESSED_FILE_EXTENSION</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#CRLF">CRLF</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#DATE_FIELD_KEY">DATE_FIELD_KEY</A>, <A 
HREF="../../../../org/archive/io/ArchiveFileConstants.html#DEFAULT_DIGEST_METHOD">DEFAULT_DIGEST_METHOD</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#DUMP">DUMP</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#GZIP_DUMP">GZIP_DUMP</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#HEADER">HEADER</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#INVALID_SUFFIX">INVALID_SUFFIX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#LENGTH_FIELD_KEY">LENGTH_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#MIMETYPE_FIELD_KEY">MIMETYPE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#NOHEAD">NOHEAD</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#OCCUPIED_SUFFIX">OCCUPIED_SUFFIX</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#READER_IDENTIFIER_FIELD_KEY">READER_IDENTIFIER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#RECORD_IDENTIFIER_FIELD_KEY">RECORD_IDENTIFIER_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#SINGLE_SPACE">SINGLE_SPACE</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#TYPE_FIELD_KEY">TYPE_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#URL_FIELD_KEY">URL_FIELD_KEY</A>, <A HREF="../../../../org/archive/io/ArchiveFileConstants.html#VERSION_FIELD_KEY">VERSION_FIELD_KEY</A></CODE></TD></TR></TABLE> <!-- ======== CONSTRUCTOR SUMMARY ======== --><A NAME="constructor_summary"><!-- --></A><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Constructor Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD><CODE><B><A HREF="../../../../org/archive/io/arc/ARCWriter.html#ARCWriter(java.util.concurrent.atomic.AtomicInteger, java.util.List, 
java.lang.String, boolean, long)">ARCWriter</A></B>(java.util.concurrent.atomic.AtomicInteger serialNo, java.util.List<java.io.File> dirs, java.lang.String prefix, boolean cmprs,