directory, without clobbering (if a name shows up more than once, the
filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}: create a hierarchy of directories, even if
one would not have been created otherwise.  For example,
@samp{wget -x http://fly.srk.fer.hr/robots.txt} will save the downloaded
file to @file{fly.srk.fer.hr/robots.txt}.

@item -nH
@itemx --no-host-directories
Disable generation of host-prefixed directories.  By default, invoking
Wget with @samp{-r http://fly.srk.fer.hr/} will create a structure of
directories beginning with @file{fly.srk.fer.hr/}.  This option disables
such behavior.

@item --protocol-directories
Use the protocol name as a directory component of local file names.  For
example, with this option, @samp{wget -r http://@var{host}} will save to
@samp{http/@var{host}/...} rather than just to @samp{@var{host}/...}.

@cindex cut directories
@item --cut-dirs=@var{number}
Ignore @var{number} directory components.  This is useful for getting
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
@samp{ftp://ftp.xemacs.org/pub/xemacs/}.  If you retrieve it with
@samp{-r}, it will be saved locally under
@file{ftp.xemacs.org/pub/xemacs/}.  While the @samp{-nH} option can
remove the @file{ftp.xemacs.org/} part, you are still stuck with
@file{pub/xemacs}.  This is where @samp{--cut-dirs} comes in handy; it
makes Wget not ``see'' @var{number} remote directory components.  Here
are several examples of how the @samp{--cut-dirs} option works.

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}.
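
The mapping in the table above can be modeled in a few lines of Python.
This is a minimal sketch, not Wget's implementation.  The function
@code{local_dir} and its parameters are hypothetical, and the sketch
ignores @samp{-P}, file names, and clobbering.

@example
# Hypothetical model of where a remote directory is saved locally;
# this is not Wget's own code.
def local_dir(host, remote_dir, cut_dirs=0, no_host_dirs=False):
    # "/pub/xemacs/" -> ["pub", "xemacs"]
    parts = [p for p in remote_dir.split("/") if p]
    # --cut-dirs=N: ignore the first N remote directory components.
    parts = parts[cut_dirs:]
    # Without -nH, the host name becomes the first local component.
    if not no_host_dirs:
        parts.insert(0, host)
    # Nothing left: files land directly in the current directory.
    return "/".join(parts) + "/" if parts else "."

print(local_dir("ftp.xemacs.org", "/pub/xemacs/", cut_dirs=1))
# prints ftp.xemacs.org/xemacs/
print(local_dir("ftp.xemacs.org", "/pub/xemacs/beta/",
                cut_dirs=1, no_host_dirs=True))
# prints xemacs/beta/
@end example

Cutting every component of the starting directory leaves its files
directly in @file{.}, which is where the similarity to @samp{-nd} comes
from.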
However, unlike @samp{-nd}, @samp{--cut-dirs} does not lose
subdirectories.  For instance, with @samp{-nH --cut-dirs=1}, a
@file{beta/} subdirectory will be placed in @file{xemacs/beta}, as one
would expect.

@cindex directory prefix
@item -P @var{prefix}
@itemx --directory-prefix=@var{prefix}
Set directory prefix to @var{prefix}.  The @dfn{directory prefix} is the
directory where all other files and subdirectories will be saved,
i.e.@: the top of the retrieval tree.  The default is @samp{.} (the
current directory).
@end table

@node HTTP Options
@section HTTP Options

@table @samp
@cindex .html extension
@item -E
@itemx --html-extension
If a file of type @samp{application/xhtml+xml} or @samp{text/html} is
downloaded and the URL does not end with the regexp
@samp{\.[Hh][Tt][Mm][Ll]?}, this option will cause the suffix
@samp{.html} to be appended to the local filename.  This is useful, for
instance, when you're mirroring a remote site that uses @samp{.asp}
pages, but you want the mirrored pages to be viewable on your stock
Apache server.  Another good use for this is when you're downloading
CGI-generated materials.  A URL like @samp{http://site.com/article.cgi?25}
will be saved as @file{article.cgi?25.html}.

Note that filenames changed in this way will be re-downloaded every time
you re-mirror a site, because Wget can't tell that the local
@file{@var{X}.html} file corresponds to remote URL @samp{@var{X}} (since
it doesn't yet know that the URL produces output of type
@samp{text/html} or @samp{application/xhtml+xml}).  To prevent this
re-downloading, you must use @samp{-k} and @samp{-K} so that the
original version of the file will be saved as @file{@var{X}.orig}
(@pxref{Recursive Retrieval Options}).

@cindex http user
@cindex http password
@cindex authentication
@item --http-user=@var{user}
@itemx --http-password=@var{password}
Specify the username @var{user} and password @var{password} for an
@sc{http} server.
Depending on the type of the challenge, Wget will encode them using
either the @code{basic} (insecure) or the @code{digest} authentication
scheme.

Another way to specify username and password is in the @sc{url} itself
(@pxref{URL Format}).  Either method reveals your password to anyone who
bothers to run @code{ps}.  To prevent the passwords from being seen,
store them in @file{.wgetrc} or @file{.netrc}, and make sure to protect
those files from other users with @code{chmod}.  If the passwords are
really important, do not leave them lying in those files either: edit
the files and delete them after Wget has started the download.

@iftex
@xref{Security Considerations}, for more information about security
issues with Wget.
@end iftex

@cindex proxy
@cindex cache
@item --no-cache
Disable server-side cache.  In this case, Wget will send the remote
server an appropriate directive (@samp{Pragma: no-cache}) to get the
file from the remote service, rather than receiving a cached version.
This is especially useful for retrieving and flushing out-of-date
documents on proxy servers.

Caching is allowed by default.

@cindex cookies
@item --no-cookies
Disable the use of cookies.  Cookies are a mechanism for maintaining
server-side state.  The server sends the client a cookie using the
@code{Set-Cookie} header, and the client responds with the same cookie
upon further requests.  Since cookies allow server owners to keep track
of visitors and allow sites to exchange this information, some consider
them a breach of privacy.  The default is to use cookies; however,
@emph{storing} cookies is not on by default.

@cindex loading cookies
@cindex cookies, loading
@item --load-cookies @var{file}
Load cookies from @var{file} before the first HTTP retrieval.
@var{file} is a textual file in the format originally used by Netscape's
@file{cookies.txt} file.

You will typically use this option when mirroring sites that require
that you be logged in to access some or all of their content.
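
For illustration, the Netscape @file{cookies.txt} format stores one
cookie per line as seven tab-separated fields:
@itemize @bullet
@item domain
@item a @samp{TRUE}/@samp{FALSE} flag saying whether subdomains match
@item path
@item a @samp{TRUE}/@samp{FALSE} secure flag
@item expiry time in seconds since the epoch
@item name
@item value
@end itemize
Lines starting with @samp{#} are comments.  The Python sketch below is a
simplified reader for that layout, not Wget's own parser.  The names
@code{Cookie} and @code{parse_cookies_txt} are hypothetical, and the
sample cookie line is made up.

@example
from collections import namedtuple

# Hypothetical record for one cookies.txt line, in file field order.
Cookie = namedtuple(
    "Cookie", "domain subdomains path secure expires name value")

def parse_cookies_txt(text):
    # Simplified reader for Netscape-format cookie text; not Wget's parser.
    cookies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # skip lines that do not have all seven fields
        domain, sub, path, secure, expires, name, value = fields
        cookies.append(Cookie(domain, sub == "TRUE", path,
                              secure == "TRUE", int(expires),
                              name, value))
    return cookies

sample = ("# Netscape HTTP Cookie File\n"
          "example.com\tFALSE\t/\tFALSE\t2147483647\tsid\tabc123\n")
for c in parse_cookies_txt(sample):
    print(c.name, c.value, c.expires)  # prints: sid abc123 2147483647
@end example

When mirroring a site that requires logging in, such a file would
normally hold the cookies your browser received at login.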
The login process typically works by the web server issuing an @sc{http}
cookie upon receiving and verifying your credentials.  The cookie is
then resent by the browser when accessing that part of the site, and so
proves your identity.

Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site.  This is achieved by
@samp{--load-cookies}: simply point Wget to the location of the
@file{cookies.txt} file, and it will send the same cookies your browser
would send in the same situation.  Different browsers keep textual
cookie files in different locations:

@table @asis
@item Netscape 4.x.
The cookies are in @file{~/.netscape/cookies.txt}.

@item Mozilla and Netscape 6.x.
Mozilla's cookie file is also named @file{cookies.txt}, located
somewhere under @file{~/.mozilla}, in the directory of your profile.
The full path usually ends up looking somewhat like
@file{~/.mozilla/default/@var{some-weird-string}/cookies.txt}.

@item Internet Explorer.
You can produce a cookie file Wget can use by using the File menu,
Import and Export, Export Cookies.  This has been tested with Internet
Explorer 5; it is not guaranteed to work with earlier versions.

@item Other browsers.
If you are using a different browser to create your cookies,
@samp{--load-cookies} will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
@end table

If you cannot use @samp{--load-cookies}, there might still be an
alternative.  If your browser supports a ``cookie manager'', you can use
it to view the cookies used when accessing the site you're mirroring.
Write down the name and value of the cookie, and manually instruct Wget
to send those cookies, bypassing the ``official'' cookie support:

@example
wget --no-cookies --header "Cookie: @var{name}=@var{value}"
@end example

@cindex saving cookies
@cindex cookies, saving
@item --save-cookies @var{file}
Save cookies to @var{file} before exiting.
This will not save cookies that have expired or that have no expiry time
(so-called ``session cookies''), but also see
@samp{--keep-session-cookies}.

@cindex cookies, session
@cindex session cookies
@item --keep-session-cookies
When specified, causes @samp{--save-cookies} to also save session
cookies.  Session cookies are normally not saved because they are
meant to be kept in memory and forgotten when you exit the browser.
Saving them is useful on sites that require you to log in or to visit
the home page before you can access some pages.  With this option,
multiple Wget runs are considered a single browser session as far as
the site is concerned.

Since the cookie file format does not normally carry session cookies,
Wget marks them with an expiry timestamp of 0.  Wget's
@samp{--load-cookies} recognizes those as session cookies, but it might
confuse other browsers.  Also note that cookies so loaded will be
treated as other session cookies, which means that if you want
@samp{--save-cookies} to preserve them again, you must use
@samp{--keep-session-cookies} again.

@cindex Content-Length, ignore
@cindex ignore length
@item --ignore-length
Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more
precise) send out bogus @code{Content-Length} headers, which makes Wget
go wild, as it thinks not all of the document was retrieved.  You can
spot this syndrome if Wget retries getting the same document again and
again, each time claiming that the (otherwise normal) connection has
closed on the very same byte.

With this option, Wget will ignore the @code{Content-Length} header, as
if it never existed.

@cindex header, add
@item --header=@var{header-line}
Send @var{header-line} along with the rest of the headers in each
@sc{http} request.
The supplied header is sent as-is, which means it must contain name and
value separated by a colon, and must not contain newlines.

You may define more than one additional header by specifying
@samp{--header} more than once.

@example
@group
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr'        \
       http://fly.srk.fer.hr/
@end group
@end example

Specifying an empty string as the argument to @samp{--header} will clear
all previous user-defined headers.

As of Wget 1.10, this option can be used to override headers otherwise
generated automatically.  This example instructs Wget to connect to
localhost, but to specify @samp{foo.bar} in the @code{Host} header:

@example
wget --header="Host: foo.bar" http://localhost/
@end example

In versions of Wget prior to 1.10, such use of @samp{--header} caused
duplicate headers to be sent.

@cindex proxy user
@cindex proxy password
@cindex proxy authentication
@item --proxy-user=@var{user}
@itemx --proxy-password=@var{password}
Specify the username @var{user} and password @var{password} for
authentication on a proxy server.  Wget will encode them using the
@code{basic} authentication scheme.

Security considerations similar to those with @samp{--http-password}
pertain here as well.

@cindex http referer
@cindex referer, http
@item --referer=@var{url}
Include a @samp{Referer: @var{url}} header in the HTTP request.  Useful
for retrieving documents with server-side processing that assume they
are always being retrieved by interactive web browsers and only come out
properly when Referer is set to one of the pages that point to them.

@cindex server response, save
@item --save-headers
Save the headers sent by the @sc{http} server to the file, preceding the
actual contents, with an empty line as the separator.

@cindex user-agent
@item -U @var{agent-string}
@itemx --user-agent=@var{agent-string}
Identify as @var{agent-string} to the @sc{http} server.

The @sc{http} protocol allows the clients to identify themselves using a
@code{User-Agent} header field.
This enables distinguishing the @sc{www} software, usually for
statistical purposes or for tracing of protocol violations.  Wget
normally identifies as @samp{Wget/@var{version}}, @var{version} being
the current version number of Wget.

However, some sites have been known to impose the policy of tailoring
the output according to the @code{User-Agent}-supplied information.
While this is not such a bad idea in theory, it has been abused by
servers denying information to clients other than (historically)
Netscape or, more frequently, Microsoft Internet Explorer.  This
option allows you to change the @code{User-Agent} line issued by Wget.
Use of this option is discouraged, unless you really know what you are
doing.

Specifying an empty user agent with @samp{--user-agent=""} instructs
Wget not to send the @code{User-Agent} header in @sc{http} requests.

@cindex POST
@item --post-data=@var{string}
@itemx --post-file=@var{file}
Use POST as the method for all HTTP requests and send the specified data
in the request body.  @code{--post-data} sends @var{string} as data,
whereas @code{--post-file} sends the contents of @var{file}.  Other than
that, they work in exactly the same way.

Please be aware that Wget needs to know the size of the POST data in
advance.  Therefore the argument to @code{--post-file} must be a regular
file; specifying a FIFO or something like @file{/dev/stdin} won't work.
It's not quite clear how to work around this limitation inherent in
HTTP/1.0.  Although HTTP/1.1 introduces @dfn{chunked} transfer that
doesn't require knowing the request length in advance, a client can't
use chunked unless it knows it's talking to an HTTP/1.1 server.  And it
can't know that until it receives a response, which in turn requires the
request to have been completed: a chicken-and-egg problem.

Note: if Wget is redirected after the POST request is completed, it
will not send the POST data to the redirected URL.  This is because
URLs that process POST often respond with a redirection to a regular