<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><!--NewPage--><HTML><HEAD><!-- Generated by javadoc (build 1.5.0_07) on Sun May 06 18:00:00 GMT 2007 --><TITLE>org.archive.io.warc.v10 (Heritrix 1.12.1)</TITLE><META NAME="keywords" CONTENT="org.archive.io.warc.v10 package"><LINK REL ="stylesheet" TYPE="text/css" HREF="../../../../../stylesheet.css" TITLE="Style"><SCRIPT type="text/javascript">function windowTitle(){ parent.document.title="org.archive.io.warc.v10 (Heritrix 1.12.1)";}</SCRIPT><NOSCRIPT></NOSCRIPT></HEAD><BODY BGCOLOR="white" onload="windowTitle();"><!-- ========= START OF TOP NAVBAR ======= --><A NAME="navbar_top"><!-- --></A><A HREF="#skip-navbar_top" title="Skip navigation links"></A><TABLE BORDER="0" WIDTH="100%" CELLPADDING="1" CELLSPACING="0" SUMMARY=""><TR><TD COLSPAN=2 BGCOLOR="#EEEEFF" CLASS="NavBarCell1"><A NAME="navbar_top_firstrow"><!-- --></A><TABLE BORDER="0" CELLPADDING="0" CELLSPACING="3" SUMMARY=""> <TR ALIGN="center" VALIGN="top"> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../overview-summary.html"><FONT CLASS="NavBarFont1"><B>Overview</B></FONT></A> </TD> <TD BGCOLOR="#FFFFFF" CLASS="NavBarCell1Rev"> <FONT CLASS="NavBarFont1Rev"><B>Package</B></FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <FONT CLASS="NavBarFont1">Class</FONT> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-use.html"><FONT CLASS="NavBarFont1"><B>Use</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="package-tree.html"><FONT CLASS="NavBarFont1"><B>Tree</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../deprecated-list.html"><FONT CLASS="NavBarFont1"><B>Deprecated</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../index-all.html"><FONT CLASS="NavBarFont1"><B>Index</B></FONT></A> </TD> <TD BGCOLOR="#EEEEFF" CLASS="NavBarCell1"> <A HREF="../../../../../help-doc.html"><FONT 
CLASS="NavBarFont1"><B>Help</B></FONT></A> </TD> </TR></TABLE></TD><TD ALIGN="right" VALIGN="top" ROWSPAN=3><EM></EM></TD></TR><TR><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../../org/archive/io/warc/package-summary.html"><B>PREV PACKAGE</B></A> <A HREF="../../../../../org/archive/net/package-summary.html"><B>NEXT PACKAGE</B></A></FONT></TD><TD BGCOLOR="white" CLASS="NavBarCell2"><FONT SIZE="-2"> <A HREF="../../../../../index.html?org/archive/io/warc/v10/package-summary.html" target="_top"><B>FRAMES</B></A> <A HREF="package-summary.html" target="_top"><B>NO FRAMES</B></A> <SCRIPT type="text/javascript"> <!-- if(window==top) { document.writeln('<A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A>'); } //--></SCRIPT><NOSCRIPT> <A HREF="../../../../../allclasses-noframe.html"><B>All Classes</B></A></NOSCRIPT></FONT></TD></TR></TABLE><A NAME="skip-navbar_top"></A><!-- ========= END OF TOP NAVBAR ========= --><HR><H2>Package org.archive.io.warc.v10</H2>Experimental WARC Writer and Readers.<P><B>See:</B><BR> <A HREF="#package_description"><B>Description</B></A><P><TABLE BORDER="1" WIDTH="100%" CELLPADDING="3" CELLSPACING="0" SUMMARY=""><TR BGCOLOR="#CCCCFF" CLASS="TableHeadingColor"><TH ALIGN="left" COLSPAN="2"><FONT SIZE="+2"><B>Class Summary</B></FONT></TH></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../../org/archive/io/warc/v10/ExperimentalWARCWriter.html" title="class in org.archive.io.warc.v10">ExperimentalWARCWriter</A></B></TD><TD><b>Experimental</b> WARC implementation.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../../org/archive/io/warc/v10/WARCReader.html" title="class in org.archive.io.warc.v10">WARCReader</A></B></TD><TD>WARCReader.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../../org/archive/io/warc/v10/WARCReaderFactory.html" title="class in 
org.archive.io.warc.v10">WARCReaderFactory</A></B></TD><TD>Factory for WARC Readers.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../../org/archive/io/warc/v10/WARCRecord.html" title="class in org.archive.io.warc.v10">WARCRecord</A></B></TD><TD>A WARC file Record.</TD></TR><TR BGCOLOR="white" CLASS="TableRowColor"><TD WIDTH="15%"><B><A HREF="../../../../../org/archive/io/warc/v10/WARCWriterPool.html" title="class in org.archive.io.warc.v10">WARCWriterPool</A></B></TD><TD>A pool of WARCWriters.</TD></TR></TABLE> <P><A NAME="package_description"><!-- --></A><H2>Package org.archive.io.warc.v10 Description</H2><P>Experimental WARC Writer and Readers. Code and specification are subject to change with no guarantees of backward compatibility: i.e. newer readers may not be able to parse WARCs written with older writers. This code, with noted exceptions, is a loose implementation of parts of the (unreleased and unfinished) <a href="http://archive-access.sourceforge.net/warc/warc_file_format-0.9.html">WARC File Format (Version 0.9)</a>. Deviations from 0.9, outlined below in the section <i>Deviations from Spec.</i>, are to be proposed as amendments to the specification to make a new revision. Since the new spec. revision will likely be named version 0.10, code in this package writes WARCs of version 0.10, not 0.9.<h2>Implementation Notes</h2><h3>Tools</h3><p>Initial implementations of the <code>Arc2Warc</code> and <code>Warc2Arc</code> tools can be found in the package above this one, at <A HREF="../../../../../org/archive/io/Arc2Warc.html" title="class in org.archive.io"><CODE>Arc2Warc</CODE></A> and <A HREF="../../../../../org/archive/io/Warc2Arc.html" title="class in org.archive.io"><CODE>Warc2Arc</CODE></A>, respectively. Pass <code>--help</code> to learn how to use each tool.</p><h3>Unique ID Generator</h3><p>WARC requires a GUID for each record written. 
A configurable unique ID <A HREF="../../../../../org/archive/uid/GeneratorFactory.html" title="class in org.archive.uid"><CODE>GeneratorFactory</CODE></A>, which can be configured to use alternate unique ID generators, was added with a default of <A HREF="../../../../../org/archive/uid/UUIDGenerator.html" title="class in org.archive.uid"><CODE>UUIDGenerator</CODE></A>. The default implementation generates <a href="http://en.wikipedia.org/wiki/UUID">UUIDs</a> (using the Java 5 <code>java.util.UUID</code>) with a <code>urn</code> scheme using the uuid namespace [see <a href="http://www.ietf.org/rfc/rfc4122.txt">RFC4122</a>].</p><h3><A HREF="../../../../../org/archive/util/anvl/package-summary.html"><CODE>ANVL</CODE></A></h3><p>The RFC822-like ANVL format is used for writing <code>Named Fields</code> in WARCs and occasionally for metadata. An implementation was added at <A HREF="../../../../../org/archive/util/anvl/package-summary.html"><CODE>org.archive.util.anvl</CODE></A>.</p><h3>Miscellaneous</h3><p>When writing WARCs, the <code>response</code> record type is chosen as the core record that all others associate to: i.e. all others have a <code>Related-Record-ID</code> that points back to the <code>response</code>.</p><h2><a name="deviations">Deviations from Spec.</a></h2><p>The deviations from spec. 0.9 below have been realized in code and are to be proposed as spec. amendments, with the new revision likely to be 0.10 (vocal assent to the below was given by John, Gordon, and Stack at the <i>La Honda</i> Meeting, August 8th, 2006).</p><h3>mimetype in header line</h3><p>Allow full mimetypes in the header line as per RFC2045 rather than the current, shriveled mimetype that allows only type and subtype. This will mean mimetypes are allowed <i>parameters</i>: e.g. <code>text/plain; charset=UTF-8</code> or <code>application/http; msgtype=request</code>. 
Allowing full mimetypes, we can support the following scenarios without further amendment to the specification and without parsers having to resort to <code>metadata</code> records or to custom <code>Named Fields</code> to figure out how to interpret the payload:<ul><li>Consider the case where an archiving organization would store everything related to a capture as one record with a mimetype of <code>multipart/mixed; boundary=RECORD-ID</code>. An example record might comprise the parts <code>Content-Type: application/http; msgtype=request</code>, <code>Content-Type: application/http; msgtype=response</code>, and <code>Content-Type: text/xml+rdf</code> (for metadata).</li><li>Or, an archiving institution would store a capture with <code>multipart/alternative</code> ranging from most basic (or 'desiccated' in Kunze-speak) -- perhaps a <code>text/plain</code> rendition of a PDF capture -- through to <code>best</code>, the actual PDF binary itself.</li></ul></p><p>To support full mimetypes, we must allow for whitespace between parameters and allow that parameter values themselves might include whitespace ('quoted-string'). The WARC Writer converts any embedded carriage returns and newlines to a single space.</p><h3>Swap position of recordid and mimetype in the header line</h3><p>Because the above amendment allows full mimetypes on the header line, and the mimetype may now include whitespace, we ease the parse by moving the mimetype to the last position on the header line and the recordid to second-from-last.</p><h3>Use application/http instead of message/http</h3><p>The <code>message</code> type has a maximum line length of 1000 characters absent a <code>Content-Transfer-Encoding</code> header set to <code>BINARY</code>. (See the definition of message/http for talk of adherence to MIME <code>message</code> line limits: 19.1 Internet Media Type message/http and application/http in <a href="http://www.faqs.org/rfcs/rfc2616.html">RFC2616</a>.)</p><h2>Suggested Spec. 
Amendments</h2><p>Apart from the <a href="#deviations">deviations</a> listed above, the changes below are also suggested for inclusion in the 0.10 spec. revision.</p><p>These are mostly suggested edits. The changes are not substantive.</p><h3>Allow multiple instances of a single Named Parameter</h3><p>Allow multiple instances of the same Named Parameter in any one Named Parameter block. E.g. multiple <code>Related-Record-ID</code>s could prove of use. The spec. mentions this in the <i>8.1 HTTP and HTTPS</i> section, but it better belongs in the <i>5.2 Named Parameters</i> preamble.</p><p>Relatedly, add a note on bidirectional <code>Related-Record-ID</code> to the <code>Named Field</code> section.</p><h4>Miscellaneous</h4>
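<p>As an illustration of the header-line deviations described earlier (full mimetypes with parameters, mimetype moved to the last position), here is a minimal sketch of why the swap eases the parse. It is not the parser in this package. The class <code>HeaderLineSketch</code>, its methods, the seven-field order and the sample values are all assumptions made for the example; only the placement of recordid second-from-last and mimetype last comes from the text above.</p>

```java
import java.util.UUID;

/** Illustrative only: a hypothetical helper, not part of org.archive.io.warc.v10. */
final class HeaderLineSketch {
    // Assumed order: id, length, type, subject URI, creation date,
    // record id, mimetype. Only the last two positions come from the text.
    static final int FIELD_COUNT = 7;

    /** Splits a header line into fields, keeping the trailing mimetype whole. */
    static String[] parse(String headerLine) {
        // With a limit of 7, splitting stops after the sixth space, so any
        // whitespace between mimetype parameters stays in the last field.
        String[] fields = headerLine.split(" ", FIELD_COUNT);
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "Expected " + FIELD_COUNT + " fields: " + headerLine);
        }
        return fields;
    }

    /** Builds a record id as a urn in the uuid namespace (RFC4122). */
    static String newRecordId() {
        return "urn:uuid:" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        String line = "WARC/0.10 1234 response http://example.com/ 20060808120000 "
            + newRecordId() + " application/http; msgtype=response";
        String[] fields = parse(line);
        System.out.println("record id: " + fields[5]);
        System.out.println("mimetype:  " + fields[6]);
    }
}
```

<p>Had the mimetype stayed ahead of the recordid, a parser could not tell from spaces alone where the mimetype ends. With the mimetype last, everything after the sixth space belongs to it.</p>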