18 July 2023 17:41

Replacing the Wget utility with Wget2 (part 2)

Second part of the article. See part 1.

HTTP options:

--default-page=name
Use name as the default file name when it is unknown (for example, for URLs ending with a slash), instead of index.html.

--default-http-port=port
Set the default port for HTTP URLs (default: 80).

Used mainly for testing purposes.

--default-https-port=port
Set the default port for HTTPS URLs (default: 443).

Used mainly for testing purposes.

-E, --adjust-extension
If a file of type application/xhtml+xml or text/html is downloaded and its URL does not end with the regular expression \.[Hh][Tt][Mm][Ll]?, this option causes the suffix .html to be appended to the local file name. This is useful, for example, when you are mirroring a remote site that uses .asp pages but want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you are downloading CGI-generated content. A URL like https://example.com/article.cgi?25 will be saved as article.cgi?25.html.

Note that file names changed this way will be re-downloaded every time you re-mirror the site, because Wget2 cannot tell that the local file X.html matches the remote URL X (since it does not yet know that the URL produces output like text/html or application/xhtml+xml).

Wget2 also ensures that any downloaded text/css files end with the .css suffix.

At some point in the future, this option may be expanded to include suffixes for other content types, including content types that are not parsed by Wget2.
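
The naming rule behind -E can be modeled in a few lines of Python. This is a simplified sketch, not Wget2's code: the function name adjust_extension is hypothetical, the check is applied to the local file name rather than the full URL, and the case-insensitive test for an existing .css suffix is an assumption.

```python
import re

# Types that get ".html" appended, per the description of -E above.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
# The regular expression from the text, anchored at the end of the name.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")

def adjust_extension(local_name: str, content_type: str) -> str:
    """Return the local file name -E would plausibly produce (simplified model)."""
    # Drop parameters such as "; charset=utf-8" before comparing the type.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in HTML_TYPES and not HTML_SUFFIX.search(local_name):
        return local_name + ".html"
    # Assumption: an existing ".css" suffix is matched case-insensitively.
    if mime == "text/css" and not local_name.lower().endswith(".css"):
        return local_name + ".css"
    return local_name
```

For example, adjust_extension("article.cgi?25", "text/html") gives "article.cgi?25.html", matching the example in the text, while "index.HTM" is left alone because it already matches the pattern.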

--http-user=user, --http-password=password
Specify a username and password for HTTP authentication. Depending on the type of challenge sent by the server, Wget2 will encode them using the "basic" (insecure), "digest", or Windows "NTLM" authentication scheme.
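
Why "basic" is called insecure becomes clear once you see what it puts on the wire. The following Python sketch builds a Basic "Authorization" header value as described in RFC 7617; the helper names are hypothetical. The credentials are only Base64-encoded, so anyone who can read the request can recover them.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic "Authorization" header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token

def decode_basic(header_value: str) -> str:
    """Reverse the encoding: Base64 is an encoding, not encryption."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    return base64.b64decode(token).decode("utf-8")
```

Using the RFC's own example, basic_auth_header("Aladdin", "open sesame") yields "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", which decodes straight back to "Aladdin:open sesame".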

If possible, put your credentials in ~/.netrc (see also the --netrc and --netrc-file options) or in .wget2rc. This is far safer than passing them on the command line, which any other user on the system can see (for example, with ps). If the passwords are really important, do not leave them lying around in those files either: edit the files and delete the credentials after Wget2 has started the download.

In the ~/.netrc file, a password can be enclosed in double quotes so that it may contain spaces. Other characters can be escaped with a backslash if necessary. A backslash in a password must always be escaped, so write \\ instead of a single \.
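
A minimal Python sketch that applies these quoting rules when writing a ~/.netrc entry. The helper names are hypothetical, and escaping a double quote inside a quoted password with a backslash is an assumption drawn from the "escape with a backslash" rule above; check it against your Wget2 version before relying on it.

```python
def netrc_quote(password: str) -> str:
    """Quote a password for ~/.netrc following the rules described above.

    Assumption: inside double quotes a backslash escapes the next character,
    so backslashes are doubled first, then double quotes are escaped.
    """
    escaped = password.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def netrc_line(machine: str, login: str, password: str) -> str:
    """Format one hypothetical ~/.netrc entry on a single line."""
    return f"machine {machine} login {login} password {netrc_quote(password)}"
```

Doubling backslashes before escaping quotes matters: doing it the other way round would double the backslashes that were just added in front of the quotes.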

See also --use-askpass and --ask-password for interactive ways of providing your password.

--http-proxy-user=user, --http-proxy-password=password
Specify the username and password for authentication on an HTTP proxy server. See --http-user for details.

--http-proxy=proxies
Specify a comma-separated list of HTTP proxies. This overrides the http_proxy environment variable. Exceptions can be set with the no_proxy environment variable or with --no-proxy.

--https-proxy=proxies
Specify a comma-separated list of HTTPS proxies. This overrides the https_proxy environment variable. Exceptions can be set with the no_proxy environment variable or with --no-proxy.

--no-http-keep-alive
Disable keep-alive for HTTP(S) downloads. Typically Wget2 asks the server to keep the connection open so that when multiple documents are downloaded from the same server, they are transferred over the same TCP connection. This saves time and at the same time reduces the load on the server.

This option is useful when, for some reason, persistent (keep-alive) connections do not work for you, for example because of a server bug or because server-side scripts cannot cope with such connections.

--no-cache
Disable server-side caching. In this case, Wget2 sends the remote server the appropriate directives (Cache-Control: no-cache and Pragma: no-cache) so that the file is retrieved from the remote service instead of a cached version being returned. This is especially useful for retrieving and flushing out-of-date documents on proxy servers.
Caching is allowed by default.

--no-cookies
Disable the use of cookies. Cookies are a mechanism for maintaining server-side state. The server sends the client a cookie using the "Set-Cookie" header, and the client responds with the same cookie upon further requests. Since cookies allow server owners to keep track of visitors and allow sites to exchange this information, some consider them a breach of privacy. Cookies are used by default; however, saving cookies to a file is not enabled by default.

--load-cookies file
Load cookies from file before the first HTTP(S) request. file is a text file in the format originally used by Netscape's cookies.txt file.
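
The Netscape cookies.txt format is plain text with seven tab-separated fields per cookie: domain, a TRUE/FALSE flag for subdomains, path, a TRUE/FALSE secure flag, expiry as a Unix timestamp, name and value. The Python sketch below reads that layout; parse_cookies_txt and Cookie are hypothetical names, and real files may contain variations (such as curl's "#HttpOnly_" prefix) that this simplified reader just skips as comments. Treating an expiry of 0 as a session cookie follows the Wget2 convention described under --keep-session-cookies.

```python
from dataclasses import dataclass

@dataclass
class Cookie:
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # Unix time; Wget2 writes 0 for session cookies
    name: str
    value: str

    @property
    def is_session(self) -> bool:
        return self.expires == 0

def parse_cookies_txt(text: str) -> list[Cookie]:
    """Parse Netscape-format cookies.txt content (simplified sketch)."""
    cookies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments (including "#HttpOnly_" lines) are skipped
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed line; a real parser might report it
        domain, sub, path, secure, expires, name, value = fields
        cookies.append(Cookie(domain, sub.upper() == "TRUE", path,
                              secure.upper() == "TRUE", int(expires), name, value))
    return cookies
```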

Typically you will use this option when mirroring sites that require you to be logged in to access some or all of their content. The login process typically works by having the web server issue an HTTP cookie after receiving and validating your credentials. The cookie is then resent by the browser when you access that part of the site and thereby confirms your identity.

Mirroring such a site requires Wget2 to send the same cookies your browser sends when communicating with the site. This is achieved with --load-cookies: simply point Wget2 to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep their cookie files in different places:

"Netscape 4.x". Cookies are located in ~/.netscape/cookies.txt.
"Mozilla and Netscape 6.x". Mozilla's cookie file is also named cookies.txt and is located somewhere under ~/.mozilla, in your profile's directory. The full path usually looks something like ~/.mozilla/default/some-weird-string/cookies.txt.
"Internet Explorer". You can produce a cookie file that Wget2 can use via the File > Import and Export > Export Cookies menu. This was tested with Internet Explorer 5; it is not guaranteed to work with earlier versions.
"Other browsers". If you are using a different browser to create your cookies, --load-cookies will only work if you can locate or produce a cookie file in the Netscape format that Wget2 expects.

If you can't use --load-cookies, there may be an alternative. If your browser supports a "cookie manager", you can use it to view the cookies used when accessing the site you are mirroring. Note down the cookie name and value and manually instruct Wget2 to send those cookies, bypassing the "official" cookie support:
wget2 --no-cookies --header "Cookie: <name>=<value>" https://example.com/

--save-cookies file
Save cookies to file before exiting. This will not save cookies that have expired or that have no expiry time (so-called "session cookies"), but see also --keep-session-cookies.

--keep-session-cookies
When specified, causes --save-cookies to also save session cookies. Session cookies are normally not saved because they are meant to be kept in memory and forgotten when you exit the browser. Saving them is useful on sites that require you to log in or to visit the home page before you can access some pages. With this option, multiple Wget2 runs are considered a single browser session as far as the site is concerned.

Since the cookie file format does not normally carry session cookies, Wget2 marks them with an expiry timestamp of 0.
Wget2's --load-cookies recognizes those as session cookies, but they might confuse other browsers. Also note that cookies loaded this way will be treated like other session cookies, which means that if you want --save-cookies to preserve them again, you must use --keep-session-cookies again.

--cookie-suffixes=file
Load public suffixes used for cookie validation from the specified file.

Normally, the underlying libpsl library loads this data from a system file or has the data built in. In some cases it may be necessary to load an updated PSL file, for example from public_suffix_list.dat.

The PSL makes it possible to reject "super cookies" (cookies set for a public suffix such as .com or .co.uk), which would otherwise leak data between unrelated sites. More information can be found at https://publicsuffix.org/.
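
To illustrate what a "super cookie" is, here is a simplified Python check that refuses a cookie whose Domain attribute is a public suffix. The three-entry PUBLIC_SUFFIXES set is a hypothetical stand-in for the real list, and the exact-match lookup ignores the wildcard and exception rules that libpsl implements.

```python
# Hypothetical three-entry stand-in for the Public Suffix List.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def is_public_suffix(domain: str) -> bool:
    """Exact-match lookup only; real PSL rules also have wildcards and exceptions."""
    return domain.lstrip(".").lower() in PUBLIC_SUFFIXES

def cookie_domain_allowed(request_host: str, cookie_domain: str) -> bool:
    """Decide whether a Set-Cookie Domain attribute should be accepted (sketch)."""
    host = request_host.lower()
    domain = cookie_domain.lstrip(".").lower()
    if is_public_suffix(domain):
        # A "super cookie" would be sent to every site under this suffix.
        return False
    # Otherwise the request host must be the domain itself or a subdomain of it.
    return host == domain or host.endswith("." + domain)
```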

--ignore-length
Unfortunately, some HTTP servers (CGI programs, to be more precise) send bogus "Content-Length" headers, which makes Wget2 misbehave because it thinks not all of the document was retrieved. You can spot this syndrome if Wget2 retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.

With this option, Wget2 will ignore the "Content-Length" header as if it never existed.

--header=header-line
Send the header line along with the other headers in every HTTP request. The provided header is sent as is, which means it must contain the name and value separated by a colon, and must not contain newlines.

You can define more than one additional header by specifying --header more than once.
wget2 --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
https://example.com/

Specifying an empty string as the header value will clear the previous user-defined headers.
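
The rules for repeated --header options (each value must be "Name: value" without newlines, and an empty value clears what came before) can be modeled like this. collect_headers is a hypothetical Python helper written for illustration, not part of Wget2.

```python
def collect_headers(options: list[str]) -> list[tuple[str, str]]:
    """Model of repeated --header options (hypothetical helper, not Wget2 code)."""
    headers: list[tuple[str, str]] = []
    for line in options:
        if line == "":
            headers.clear()  # an empty value drops earlier user-defined headers
            continue
        if "\n" in line or "\r" in line:
            raise ValueError("header line must not contain newlines")
        # Split on the first colon only, so values like "host:port" survive.
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Name: value', got {line!r}")
        headers.append((name.strip(), value.strip()))
    return headers
```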

This option can be used to override headers that are otherwise automatically generated. This example tells Wget2 to connect to localhost, but specify example.com in the "Host" header:

wget2 --header="Host: example.com" http://localhost/

--max-redirect=number
Specify the maximum number of redirects for the resource. The default value is 20, which is usually much higher than needed. However, in cases where you want to allow more (or less), you can use this option.

--proxy-user=user, --proxy-password=password (not implemented, use --http-proxy-password)
Specify the username user and password password for authentication on a proxy server. Wget2 will encode them using the "basic" authentication scheme.
Security considerations similar to those for --http-password apply here as well.

--referer=url
Includes the "Referer: url" header in the HTTP request. Useful for retrieving documents with server-side processing assuming they are always retrieved by interactive web browsers and only rendered properly if the Referer is set to one of the pages pointing to them.

--save-headers
Save the headers sent by the HTTP server before the actual content to a file with an empty line as delimiter.

-U agent-string, --user-agent=agent-string
Identify as agent-string to the HTTP server, that is, send agent-string in the "User-Agent" header.

The HTTP protocol allows clients to identify themselves using a "User-Agent" header field. This allows WWW software to be differentiated, usually for statistical purposes or to track protocol violations. Wget is usually identified as Wget/version, where version is the current version number of Wget.

However, some sites have been known to impose the policy of tailoring the output according to the information supplied in "User-Agent". While this is not such a bad idea in theory, it has been abused by servers denying information to clients other than (historically) Netscape or, more frequently, Microsoft Internet Explorer. This option allows you to change the "User-Agent" string sent by Wget2. Use of this option is discouraged unless you really know what you are doing.

--post-data=string, --post-file=file
Use POST as the method for all HTTP requests and send the specified data in the request body. --post-data sends string as data, whereas --post-file sends the contents of file. Other than that, they work in exactly the same way.

Specifically, both options expect content of the form "key1=value1&key2=value2", with percent-encoding for special characters.
The only difference is that one expects its content as a command-line parameter, while the other reads its content from a file. In particular, --post-file is not meant for transmitting files as form attachments: those must appear as key=value data (with appropriate percent-encoding) just like everything else.
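
Building a correctly percent-encoded body by hand is error-prone; Python's standard urllib.parse.urlencode produces the "key1=value1&key2=value2" form expected here. form_body is a hypothetical wrapper. Note that urlencode encodes spaces as "+", which is the usual convention for application/x-www-form-urlencoded data.

```python
from urllib.parse import urlencode

def form_body(fields: dict[str, str]) -> str:
    """Build an application/x-www-form-urlencoded body for --post-data."""
    # urlencode percent-encodes reserved characters such as "&" and "=".
    return urlencode(fields)
```

For instance, form_body({"q": "a&b c"}) returns "q=a%26b+c"; the raw "&" would otherwise be read by the server as a field separator. The result still needs appropriate shell quoting when passed to --post-data.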

Currently, Wget2 does not support "multipart/form-data" for transmitting POST data; only "application/x-www-form-urlencoded". Only one of --post-data and --post-file should be specified.

Note that wget2 does not require the content to be "key1=value1&key2=value2" and does not check for it. Wget2 will simply pass whatever data is provided to it. However, most servers expect POST data to be in the above format when processing HTML forms.

When sending a POST request with --post-file, Wget2 treats the file as a binary file and sends every character in the POST request without stripping trailing newline or form-feed characters. Any other control characters in the text are also sent as-is in the POST request.

Keep in mind that Wget2 needs to know the size of the POST data in advance. Therefore, the argument to --post-file must be a regular file; specifying a FIFO or something like /dev/stdin will not work. It is not quite clear how to work around this limitation inherent in HTTP/1.0. Although HTTP/1.1 introduced chunked transfer encoding, which does not require knowing the request length in advance, a client cannot use it unless it knows it is talking to an HTTP/1.1 server. And it cannot know that until it receives a response, which in turn requires the request to have been completed: a chicken-and-egg problem.

If Wget2 is redirected after the POST request is completed, its behavior depends on the response code returned by the server. In the case of "301 Moved Permanently", "302 Moved Temporarily" or "307 Temporary Redirect", Wget2 will, in accordance with RFC 2616, continue to send a POST request. If the server wants the client to change the request method upon redirection, it should send a "303 See Other" response code.

This example shows how to log in to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server. This only needs to be done once.
wget2 --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://example.com/auth.php
# Now grab the page or pages we care about.
wget2 --load-cookies cookies.txt \
-p http://example.com/interesting/article.php

If the server uses session cookies to track user authentication, the above will not work, because --save-cookies will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case, use --keep-session-cookies together with --save-cookies to force saving of session cookies.

--method=HTTP-Method
For RESTful scripting, Wget2 allows sending other HTTP methods without the need to set them explicitly using --header=Header-Line. Wget2 will use whatever string is passed to it after --method as the HTTP method to the server.

--body-data=Data-String, --body-file=Data-File
This option must be set when additional data needs to be sent to the server along with the method specified using --method. --body-data sends Data-String as data, whereas --body-file sends the contents of Data-File. Other than that, they work in exactly the same way.

Currently, --body-file is not meant for transmitting whole files. Wget2 does not currently support "multipart/form-data" for transmitting data; only "application/x-www-form-urlencoded". In the future, this may change so that Wget2 sends the --body-file as a complete file instead of sending its contents to the server. Keep in mind that Wget2 needs to know the contents of the BODY data in advance, so the argument to --body-file must be a regular file. See --post-file for a more detailed explanation. Only one of --body-data and --body-file should be specified.

If Wget2 is redirected after the request is completed, it suspends the current method and sends a GET request until the redirection is completed. This is true for all redirection response codes except 307 Temporary Redirect, which is used to explicitly specify that the request method should not change. Another exception is when the method is set to "POST", in which case the redirection rules described under --post-data are followed.
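
The redirect rules described under --post-data and --method can be summarized as a small decision function. This Python sketch only encodes the codes this article mentions (301, 302, 303 and 307); method_after_redirect is a hypothetical name, and other codes such as 308 are deliberately left out because the text does not cover them.

```python
# Redirect codes discussed in the text; anything else is out of scope here.
REDIRECT_CODES = {301, 302, 303, 307}

def method_after_redirect(method: str, status: int) -> str:
    """Method used to follow a redirect, per the rules described in this article."""
    method = method.upper()
    if status not in REDIRECT_CODES:
        raise ValueError(f"{status} is not a redirect code covered here")
    if method == "POST":
        # --post-data rules: POST is kept for 301/302/307, 303 switches to GET.
        return "GET" if status == 303 else "POST"
    # --method rules: fall back to GET except for 307 Temporary Redirect.
    return method if status == 307 else "GET"
```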

--content-disposition
If this is set, experimental (not fully functional) support for "Content-Disposition" headers is enabled. This can currently result in extra round-trips to the server for a "HEAD" request and is known to suffer from a few bugs, which is why it is not currently enabled by default.

This option is useful for some file-downloading CGI programs that use "Content-Disposition" headers to describe what the name of a downloaded file should be.

--content-on-error
If this option is enabled, Wget2 will not skip the content when the server responds with an HTTP status code that indicates an error.

--save-content-on
After the equal sign, you need to specify a list of HTTP status codes, separated by commas, at which the content will be saved.

You can use '*' for ANY. An exclamation point (!) before the code means "exception".

Example 1: --save-content-on="*,!404" saves the content for any HTTP status code except 404.

Example 2: --save-content-on=404 saves the content only for HTTP status code 404.

The older option --content-on-error behaves like --save-content-on=*.
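
One way to read the --save-content-on list is sketched below in Python. The function names are hypothetical, and the precedence rule (an exclusion such as !404 wins over *) is an assumption consistent with Example 1, not something taken from Wget2's source.

```python
def parse_status_filter(spec: str) -> tuple[set[str], set[str]]:
    """Split a list like "*,!404" into included and excluded codes."""
    include: set[str] = set()
    exclude: set[str] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("!"):
            exclude.add(item[1:])  # "!code" marks an exception
        else:
            include.add(item)      # "*" or an explicit code
    return include, exclude

def should_save(spec: str, status: int) -> bool:
    include, exclude = parse_status_filter(spec)
    code = str(status)
    if code in exclude:
        return False  # assumption: exclusions take precedence over "*"
    return "*" in include or code in include
```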

--trust-server-names
If this setting is enabled, the last component of the redirect URL will be used as the local file name when redirecting. By default, the last component of the source URL is used.

--auth-no-challenge
If this parameter is specified, Wget2 will send basic HTTP authentication information (unencrypted username and password) for all requests.

Using this option is not recommended and is only intended to support some obscure servers that never send HTTP authentication requests, but accept unsolicited authentication information, say, in addition to forms-based authentication.

--compression=TYPE
If TYPE (identity, gzip, deflate, xz, lzma, br, bzip2, zstd, lzip, or any combination of them) is given, Wget2 sets the "Accept-Encoding" header accordingly. --no-compression means no "Accept-Encoding" header at all. To set a custom "Accept-Encoding" value, use --no-compression in combination with --header="Accept-Encoding: xxx".

Compatibility note: Wget 1.x has no comparable way of specifying the compression type.

--download-attr=[strippath|usepath]
The HTML5 download attribute on "a" and "area" tags may specify (or rather: suggest) the file name to use for the href URL. This option tells Wget2 to use that name when saving the file. There are two possible values: strippath strips the path from the file name; this is the default.

usepath takes the file name including its directory part. This is very dangerous and should not be used with untrusted input or servers! Only use it if you really trust the input or the server.
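
The difference between the two modes, and why usepath is risky, can be shown with Python's posixpath module. download_name and escapes_directory are hypothetical helpers; a suggested name such as "../../.bashrc" keeps its parent-directory components under usepath, while strippath reduces it to the bare file name.

```python
import posixpath

def download_name(attr_value: str, mode: str = "strippath") -> str:
    """Local name derived from a download attribute (illustrative model)."""
    if mode == "strippath":
        return posixpath.basename(attr_value)  # keep only the last component
    if mode == "usepath":
        return attr_value  # keeps directories, including "../" components
    raise ValueError(f"unknown mode: {mode!r}")

def escapes_directory(relative_path: str) -> bool:
    """True if the path would land outside the download directory."""
    norm = posixpath.normpath(relative_path)
    return posixpath.isabs(norm) or norm == ".." or norm.startswith("../")
```

A tool that did accept paths would at least need a check like escapes_directory before writing anything, which is exactly the kind of trust the warning above is about.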

HTTPS (SSL/TLS) options

To support encrypted HTTP (HTTPS) downloads, Wget2 must be compiled with an external SSL library. Currently, GnuTLS is used by default. Wget2 also supports HSTS (HTTP Strict Transport Security). If Wget2 is compiled without SSL support, none of these options are available.

--secure-protocol=protocol
Select the secure protocol to be used (default: auto).

Allowed values are auto, SSLv3, TLSv1, TLSv1_1, TLSv1_2, TLSv1_3 and PFS.

If auto is used, the TLS library's default is applied.

Specifying SSLv3 forces the use of SSL 3.0. This is useful when talking to old and buggy SSL server implementations that make it hard for the underlying TLS library to choose the correct protocol version.

Specifying PFS enforces the use of the so-called Perfect Forward Secrecy cipher suites. In short, PFS adds security by creating a one-time key for each TLS connection. This puts slightly more load on the client and server CPUs. Only ciphers known to be secure (for example, no MD4) and the TLS protocol are used.

TLSv1 enables TLS 1.0 or higher, TLSv1_1 enables TLS 1.1 or higher, TLSv1_2 enables TLS 1.2 or higher, and TLSv1_3 enables TLS 1.3 or higher.

Any other protocol string is passed directly to the TLS library, currently GnuTLS, as a "priority" or "cipher" string.
This option is for users who know what they are doing.

--https-only
In recursive mode, only HTTPS links are followed.

--no-check-certificate
Do not check the server certificate against the available certificate authorities. Also do not require the URL host name to match the common name presented by the certificate.

By default, the server certificate is verified against the recognized certificate authorities, breaking the SSL handshake and aborting the download if verification fails. Although this provides more secure downloads, it does break interoperability with some sites that worked with previous Wget versions, particularly those using self-signed, expired, or otherwise invalid certificates. This option forces an "insecure" mode of operation that turns certificate verification errors into warnings and allows you to proceed.

If you encounter "certificate verification" errors or ones saying that the "common name doesn't match requested host name", you can use this option to bypass the verification and proceed with the download. Only use this option if you are otherwise convinced of the site's authenticity, or if you really don't care about the validity of its certificate. It is almost always a bad idea not to check the certificates when transmitting confidential or important data.

For self-signed or internal certificates, you should download the certificate and verify against it instead of forcing this insecure mode. If you are really sure that you do not want any certificate verification, you can specify --check-certificate=quiet to tell Wget2 not to print any warning about invalid certificates, although in most cases this is the wrong thing to do.

--certificate=file
Use the client certificate stored in the file. This option is required for servers that are configured to require certificates from clients that connect to them. Typically no certificate is required and this switch is optional.

--certificate-type=type
Specify the type of the client certificate. Recognized values are PEM (default) and DER, also known as ASN1.

--private-key=file
Read a private key from a file. This option allows you to provide the private key in a file separate from the certificate.

--private-key-type=type
Specify the type of the private key. Recognized values are PEM (default) and DER.

--ca-certificate=file
Use file as the file with the bundle of certificate authority ("CA") certificates used to verify peers. The certificates must be in PEM format.

Without this option specified, Wget2 looks for CA certificates in the system locations (folders) selected during OpenSSL installation.

--ca-directory=directory
Specifies the directory containing CA certificates in PEM format. Each file contains one CA certificate, and the file name is based on a hash value derived from the certificate. This is achieved by processing a certificate directory with the c_rehash utility supplied with OpenSSL. Using --ca-directory is more efficient than --ca-certificate when many certificates are installed, because it allows Wget2 to fetch certificates on demand.

Without this option, Wget2 looks for CA certificates in the system locations selected during OpenSSL installation.

--crl-file=file
Specifies a CRL (certificate revocation list) file. This is needed for certificates that have been revoked by the certificate authorities (CAs).

--random-file=file
(Only for OpenSSL and LibreSSL). Use the file as a source of random data for pseudo-random number generator seeds on a system without the /dev/urandom device.

On such systems, the SSL library needs an external source of randomness to initialize. Randomness may be provided by EGD (see --egd-file below) or read from an external source specified by the user. If this option is not specified, Wget2 looks for random data in $RANDFILE or, if that is unset, in $HOME/.rnd.

If you get the "Could not seed OpenSSL PRNG; disabling SSL." error, you should provide random data using one of the methods described above.

--egd-file=file
[OpenSSL only] Use file as the EGD socket. EGD stands for Entropy Gathering Daemon, a user-space program that collects data from various unpredictable system sources and makes it available to other programs that might need it. Encryption software, such as the SSL library, needs sources of non-repeating randomness to seed the random number generator used to produce cryptographically strong keys.

OpenSSL allows the user to specify their own source of entropy using the "RANDFILE" environment variable. If this variable is unset, or if the specified file does not produce enough randomness, OpenSSL will read random data from the EGD socket specified using this option.

If this option is not specified (and the equivalent startup command is not used), EGD is never contacted. EGD is not needed on modern Unix systems that support /dev/urandom.

--hsts
Wget2 supports HSTS (HTTP Strict Transport Security, RFC 6797) by default. Use --no-hsts to make Wget2 act as a non-HSTS-compliant user agent. As a consequence, Wget2 will ignore all "Strict-Transport-Security" headers and will not enforce any existing HSTS policy.

--hsts-file=file
By default, Wget2 stores its HSTS data in $XDG_DATA_HOME/wget/.wget-hsts or, if XDG_DATA_HOME is not set, in ~/.local/wget/.wget-hsts. You can use --hsts-file to override this.

Wget2 will use the supplied file as the HSTS database. Such a file must conform to the correct HSTS database format used by Wget. If Wget2 cannot parse the supplied file, the behavior is undefined.

To disable persistent storage, use --no-hsts-file.

The Wget2 HSTS database is a plain text file. Each line contains an HSTS entry (that is, a site that has issued a "Strict-Transport-Security" header and has therefore specified a concrete HSTS policy to be applied). Lines starting with a hash ("#") are ignored by Wget2. Please note that, in spite of this human-readable format, manually editing the HSTS database is generally not a good idea.

The HSTS input line consists of several fields separated by one or more spaces:

hostname SPACE port SPACE include_subdomains SPACE created SPACE max_age

The hostname and port fields indicate the host name and port to which the given HSTS policy applies. The port field may be 0, and it will be in most cases. That means the port number will not be taken into account when deciding whether the HSTS policy should be applied to a given request (only the host name will be evaluated). When port is different from zero, both the target host name and the port will be evaluated, and the HSTS policy will only be applied if both of them match. This feature has been included for testing and development purposes only. The Wget2 test suite (in testenv/) creates HSTS databases with explicit ports in order to ensure Wget2's correct behavior. Applying HSTS policies to ports other than the default ones is discouraged by RFC 6797 (see Appendix B, "Differences between HSTS Policy and Same-Origin Policy"). Thus, this functionality should not be used in production environments, and port will typically be 0.

The last three fields do what they are expected to. The include_subdomains field can be either 1 or 0, and it signals whether the subdomains of the target domain should be part of the given HSTS policy as well. The created and max_age fields hold the timestamp of when the entry was created (first seen by Wget2) and the HSTS-defined value max-age, which states how long the HSTS policy should remain active, measured in seconds elapsed since the timestamp stored in created. Once that time has passed, the HSTS policy will no longer be valid and will eventually be removed from the database.
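
A Python sketch of reading such a database and evaluating an entry, following the field layout and expiry rule described above. HstsEntry and parse_hsts_db are hypothetical names, the host matching is simplified, and a port value of 0 is treated as matching any port (the usual case in Wget's HSTS files).

```python
from dataclasses import dataclass

@dataclass
class HstsEntry:
    host: str
    port: int                 # 0 means "any port" (the usual case)
    include_subdomains: bool
    created: int              # Unix time when the policy was first seen
    max_age: int              # lifetime in seconds, counted from `created`

    def is_active(self, now: int) -> bool:
        return now < self.created + self.max_age

    def applies_to(self, host: str, port: int) -> bool:
        host = host.lower()
        host_ok = host == self.host or (
            self.include_subdomains and host.endswith("." + self.host))
        return host_ok and self.port in (0, port)

def parse_hsts_db(text: str) -> list[HstsEntry]:
    """Parse 'hostname port include_subdomains created max_age' lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        fields = line.split()  # fields are separated by one or more spaces
        if len(fields) != 5:
            continue  # malformed entry; skipped in this sketch
        host, port, sub, created, max_age = fields
        entries.append(HstsEntry(host.lower(), int(port), sub == "1",
                                 int(created), int(max_age)))
    return entries
```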

If you supply your own HSTS database via --hsts-file, be aware that Wget2 may modify the provided file if any change occurs between the HSTS policies requested by the remote servers and those in the file. When Wget2 exits, it effectively updates the HSTS database by rewriting the database file with the new entries.

If the supplied file does not exist, Wget2 will create one. This file will contain the new HSTS entries. If no HSTS entries were generated (no "Strict-Transport-Security" headers were sent by any of the servers), no file will be created, not even an empty one. This behavior applies to the default database file (~/.wget-hsts) as well: it will not be created until some server enforces an HSTS policy.

Care is taken not to override possible changes made to the HSTS database by other Wget2 processes at the same time. Before dumping the updated HSTS entries to the file, Wget2 re-reads it and merges the changes.

Using a custom HSTS database and/or modifying an existing one is discouraged. For more information about the potential security threats arising from such practice, see Section 14, "Security Considerations", of RFC 6797, especially Section 14.9, "Creative Manipulation of HSTS Policy Store".

--hsts-preload
Enable loading of an HSTS preload list as supported by libhsts (default: enabled, if built with libhsts).

--hsts-preload-file=file
If Wget2 has been built with libhsts, it uses the HSTS preload data provided by the distribution. If there is no such data, or if you want to load your own file, use this option.

The data in the file must be in DAFSA format, as generated by libhsts' tool hsts-make-dafsa.

--hpkp
Enable HTTP Public Key Pinning (HPKP) (default: enabled).

This Trust On First Use (TOFU) mechanism adds another layer of security to HTTPS (see RFC 7469).

The certificate key data of a previously established TLS session will be compared with the current data. If they do not match, the connection will be terminated.
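
The comparison can be illustrated with the pin format from RFC 7469, where a pin is the Base64-encoded SHA-256 hash of the certificate's SubjectPublicKeyInfo (SPKI) in DER form. This Python sketch shows the general technique and is an assumption, not Wget2's storage format; spki_pin and pin_matches are hypothetical, and the byte strings in the tests stand in for real DER-encoded keys.

```python
import base64
import hashlib

def spki_pin(spki_der: bytes) -> str:
    """pin-sha256 value as defined in RFC 7469: Base64(SHA-256(SPKI DER))."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")

def pin_matches(stored_pins: set[str], spki_der: bytes) -> bool:
    """Trust on first use: accept if nothing is stored yet, else require a match."""
    if not stored_pins:
        return True  # first contact: nothing to compare against
    return spki_pin(spki_der) in stored_pins
```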

--hpkp-file=file
By default, Wget2 stores its HPKP data in $XDG_DATA_HOME/wget/.wget-hpkp or, if XDG_DATA_HOME is not set, in ~/.local/wget/.wget-hpkp. You can use --hpkp-file to override this.

Wget2 will use the specified file as the HPKP database. Such a file must conform to the correct HPKP database format used by Wget. If Wget2 cannot parse the supplied file, the behavior is undefined.

To disable storage, use --no-hpkp-file.

--tls-resume
Enable TLS session resumption, which is disabled by default.

To resume a TLS session, data from a previously established TLS session is required.

There are several security flaws associated with TLS 1.2 session resumption, which are explained in detail at the address.

--tls-session-file=file
By default, Wget2 stores its TLS session data in $XDG_DATA_HOME/wget/.wget-session or, if XDG_DATA_HOME is not set, in ~/.local/wget/.wget-session. You can use --tls-session-file to override this.

Wget2 will use the specified file as the TLS session database. Such a file must conform to the correct TLS session database format used by Wget. If Wget2 cannot parse the supplied file, the behavior is undefined.

To disable persistent storage, use --no-tls-session-file.

--tls-false-start
Enable TLS False Start (default: enabled).

This option shortens the TLS handshake by one round trip and thus speeds up HTTPS connections.

More information at https://tools.ietf.org/html/rfc7918.

--check-hostname
Enable TLS SNI (Server Name Indication) verification (default: enabled).

--ocsp
Enable querying the OCSP server to check whether the server's HTTPS certificates have been revoked (default: enabled).

This procedure is quite slow (connecting to the OCSP server, HTTP request, response), so Wget2 supports OCSP stapling (the server sends the OCSP response during the TLS handshake) and provides a persistent OCSP cache.

--ocsp-date
Check whether the OCSP response is too old (default: enabled).

--ocsp-nonce
Enable nonce checking when validating an OCSP response (default: enabled).

--ocsp-server
Specify the OCSP server address (default: OCSP server specified in the certificate).

--ocsp-stapling
Enable OCSP stapling support (default: enabled).

--ocsp-file=file
By default, Wget2 stores its OCSP data in $XDG_DATA_HOME/wget/.wget-ocsp or, if XDG_DATA_HOME is not set, in ~/.local/wget/.wget-ocsp. You can use --ocsp-file to override this.

Wget2 will use the specified file as the OCSP database. Such a file must conform to the correct OCSP database format used by Wget. If Wget2 cannot parse the supplied file, the behavior is undefined.

To disable persistent OCSP caching, use --no-ocsp-file.

--dane (experimental feature)
Enable support for DANE certificate verification (default: disabled).

In the event that server verification fails due to missing CA certificates (for example, an empty certification pool), this option allows verification of TLSA DNS records through DANE.

You should have DNSSEC set up to avoid MITM attacks. Additionally, the DNS records of the destination host must be configured for DANE.

Warning: This option or its behavior may change or may be removed without further notice.
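To make "verification of TLSA DNS records" more concrete: a TLSA record (RFC 6698) carries a certificate usage, a selector, a matching type and the association data. The Python sketch below shows only the final comparison step, for selector 0 (the full DER-encoded certificate). It is not how Wget2 implements DANE; the DNSSEC-validated lookup of the record is assumed to happen elsewhere, and the name tlsa_matches is hypothetical.

```python
import hashlib
import hmac

# TLSA matching types from RFC 6698: 0 = exact match, 1 = SHA-256, 2 = SHA-512.
_MATCHERS = {
    0: lambda data: data,
    1: lambda data: hashlib.sha256(data).digest(),
    2: lambda data: hashlib.sha512(data).digest(),
}


def tlsa_matches(cert_der, tlsa_data_hex, matching_type=1):
    """Compare a DER certificate with TLSA association data (selector 0 only).

    Selector 1 (SubjectPublicKeyInfo) would require parsing the certificate,
    which this sketch does not attempt.
    """
    if matching_type not in _MATCHERS:
        raise ValueError(f"unsupported matching type: {matching_type}")
    expected = bytes.fromhex(tlsa_data_hex)
    actual = _MATCHERS[matching_type](cert_der)
    # Constant-time comparison; returns False if the lengths differ.
    return hmac.compare_digest(actual, expected)
```

In Python, the server certificate in DER form can be obtained from an established TLS socket with getpeercert(binary_form=True).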

--http2
Enable HTTP/2 protocol support (default: enabled).

Wget2 requests HTTP/2 via ALPN. If the server supports it, HTTP/2 is preferred over HTTP/1.1. Up to 30 streams are used in parallel within one connection.

--http2-only
Insist on using only HTTP/2 and fail with an error if the server does not accept it. Mainly for testing.
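The ALPN negotiation that --http2 relies on can be observed with Python's standard ssl module: the client offers "h2" and "http/1.1", and the server picks one during the handshake. This is a sketch for illustration, not Wget2's implementation; it does not open any HTTP/2 streams, and the http2_only flag only imitates the spirit of --http2-only. The function names are hypothetical.

```python
import socket
import ssl


def make_tls_context():
    """TLS context that offers HTTP/2 ("h2") first and HTTP/1.1 as fallback."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    return ctx


def negotiated_protocol(host, port=443, http2_only=False, timeout=10.0):
    """Connect to host and report which protocol the server selected via ALPN."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname is sent as SNI and used for hostname verification.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            selected = tls.selected_alpn_protocol()  # None if nothing agreed
    if http2_only and selected != "h2":
        raise ConnectionError(f"{host} did not accept HTTP/2 (got {selected!r})")
    return "HTTP/2" if selected == "h2" else "HTTP/1.1"
```

Calling negotiated_protocol("example.com") requires network access and returns "HTTP/2" or "HTTP/1.1" depending on the server.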

--https-enforce=mode
Set how to handle URLs that are not explicitly HTTPS, i.e. whose scheme is not https:// (default mode: none).

mode=none
Use HTTP for URLs without a scheme. In recursive operation, the scheme of the parent document is used as the default.

mode=soft
Try HTTPS first if the scheme is HTTP or not specified. On failure, fall back to HTTP.

mode=hard
Use HTTPS only, regardless of whether an HTTP scheme is specified. Do not fall back to HTTP.
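The three modes can be summarized as "which URLs are tried, and in what order". The Python sketch below encodes that reading of the descriptions above; it is an illustration, not Wget2's logic, it handles only http://, https:// and scheme-less URLs, and the name candidate_urls is hypothetical.

```python
from urllib.parse import urlsplit


def candidate_urls(url, mode="none", parent_scheme=None):
    """Return the URLs to try, in order, for a given --https-enforce mode."""
    # Prefix "//" so that a scheme-less "host/path" is parsed as a netloc.
    parts = urlsplit(url if "://" in url else "//" + url)
    scheme = parts.scheme.lower()

    def with_scheme(new_scheme):
        return parts._replace(scheme=new_scheme).geturl()

    if scheme == "https":
        return [url]  # already explicitly HTTPS: nothing to enforce
    if scheme not in ("", "http"):
        raise ValueError(f"unsupported scheme: {scheme}")
    if mode == "hard":
        return [with_scheme("https")]  # HTTPS only, no fallback
    if mode == "soft":
        return [with_scheme("https"), with_scheme("http")]  # HTTPS, then HTTP
    if mode == "none":
        if scheme:
            return [url]
        # No scheme: inherit the parent's scheme when recursing, else HTTP.
        return [with_scheme(parent_scheme or "http")]
    raise ValueError(f"unknown mode: {mode}")


print(candidate_urls("example.com/page", "soft"))
# ['https://example.com/page', 'http://example.com/page']
```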


Plugin options

--list-plugins
Print a list of all available plugins and exit.

--local-plugin=file
Load the file as a plugin.

--plugin=name
Load a plugin with the specified name from the plugin directories specified in the configuration.

--plugin-dirs=directories
Set the plugin directories. Directories in the list are separated by commas.

--plugin-help
Print help for all loaded plugins.

--plugin-opt=option
Set a plugin-specific option.

Plugin options are specified in the format <plugin_name>.<option>[=value].
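As an illustration of the <plugin_name>.<option>[=value] format, here is a small Python parser. It is a sketch, not Wget2's parser: how Wget2 treats plugin names containing dots is not stated here, so this version assumes the split happens at the first "." and the first "=". The function name parse_plugin_opt and the plugin name "foo" are hypothetical.

```python
def parse_plugin_opt(spec):
    """Split a --plugin-opt argument of the form <plugin_name>.<option>[=value].

    Returns (plugin_name, option, value); value is None when "=value" is absent.
    """
    key, sep, value = spec.partition("=")  # the value may itself contain "=" or "."
    plugin, dot, option = key.partition(".")
    if not dot or not plugin or not option:
        raise ValueError(f"expected <plugin_name>.<option>[=value], got {spec!r}")
    return plugin, option, (value if sep else None)
```

For a hypothetical plugin "foo", --plugin-opt=foo.level=3 would be split into ("foo", "level", "3").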

Environment: proxy servers

Wget2 supports proxies for both HTTP and HTTPS retrievals. The standard way to specify a proxy location that Wget2 recognizes is through the following environment variables:

  • http_proxy
  • https_proxy

If specified, the http_proxy and https_proxy variables must contain the proxy URLs for HTTP and HTTPS connections, respectively.

no_proxy

This variable should contain a comma-separated list of domains for which proxies should not be used. For example, if the value of no_proxy is .example.com, the proxy will not be used to retrieve documents from *.example.com.
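The way these variables fit together (pick the proxy by URL scheme, unless the host is listed in no_proxy) can be sketched in Python. The matching rules here are assumptions, not a description of Wget2's exact behavior: an entry with a leading dot such as .example.com is taken to match subdomains only, a bare entry matches the host itself and its subdomains, and upper-case variants like HTTP_PROXY are not handled. The name proxy_for is hypothetical.

```python
import os
from urllib.parse import urlsplit


def proxy_for(url, env=None):
    """Return the proxy URL to use for url, or None if no proxy applies."""
    if env is None:
        env = os.environ
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for entry in env.get("no_proxy", "").split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("."):
            # ".example.com" covers *.example.com (assumed: not example.com itself).
            if host.endswith(entry):
                return None
        elif host == entry or host.endswith("." + entry):
            return None
    if parts.scheme == "https":
        return env.get("https_proxy")
    if parts.scheme == "http":
        return env.get("http_proxy")
    return None
```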

Exit codes

Wget2 can return one of several error codes if it encounters problems.

0 No problems occurred.
1 Generic error code.
2 Parse error. For example, when parsing command-line options or the .wget2rc or .netrc files...
3 File I/O error.
4 Network failure.
5 SSL verification failure.
6 Username/password authentication failure.
7 Protocol error.
8 The server issued an error response.
9 Public key missing from the keyring.
10 Signature verification failed.

With the exception of 0 and 1, lower-numbered exit codes take precedence over higher-numbered ones when multiple types of errors are encountered.
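The precedence rule can be expressed as a small Python function. How code 1 combines with the others is not spelled out above, so this sketch assumes code 1 is reported only when no more specific code (2 to 10) was recorded. The names EXIT_CODES and final_exit_code are hypothetical and are not part of Wget2.

```python
EXIT_CODES = {
    0: "No problems occurred",
    1: "Generic error",
    2: "Parse error",
    3: "File I/O error",
    4: "Network failure",
    5: "SSL verification failure",
    6: "Username/password authentication failure",
    7: "Protocol error",
    8: "The server issued an error response",
    9: "Public key missing from the keyring",
    10: "Signature verification failed",
}


def final_exit_code(encountered):
    """Combine the error codes seen during a run into a single exit status.

    Codes 2..10: the lowest number wins. Code 1 is assumed to apply only when
    no more specific code was recorded; 0 means nothing went wrong.
    """
    specific = [code for code in encountered if code >= 2]
    if specific:
        return min(specific)
    return 1 if 1 in encountered else 0
```

A script that runs wget2 through subprocess.run can look up the returned returncode in EXIT_CODES to print a readable message.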

Startup file

You may want to permanently change the default behavior of GNU Wget2. There is a better way to do this than setting a command alias in your shell: GNU Wget2 lets you set all options permanently through its .wget2rc startup file.

While .wget2rc is the main initialization file used by GNU Wget2, it is not a good idea to store passwords in this file, because the startup file may be publicly readable or kept under version control. This is why Wget2 also reads the contents of $HOME/.netrc when needed.

The .wget2rc file follows a syntax very similar to .wgetrc, which is read by GNU Wget. It differs only where the command-line options differ between Wget 1.x and Wget2.

Location of wget2rc

When initializing, Wget2 will attempt to read the "global" startup file, which by default is located at /usr/local/etc/wget2rc (or under some prefix other than /usr/local, if Wget2 was not installed there). The global startup file is useful for system administrators to enforce default policies, such as setting the certificate store path or preloading the HSTS list.

Wget2 will then look for the user initialization file. If the user passed the --config command-line option, Wget2 will try to read the file it points to. If that file does not exist or cannot be read, Wget2 will make no further attempt to read any initialization files.

If the WGET2RC environment variable is set, Wget2 will try to read the file at the specified path. If that file does not exist or cannot be read, Wget2 will make no further attempt to read an initialization file.

If --config is not used and WGET2RC is not set, Wget2 will attempt to load the user initialization file from the location defined by the XDG Base Directory Specification. It will read the first, and only the first, file it finds in the following locations:

1. $XDG_CONFIG_HOME/wget/wget2rc

2. $HOME/.config/wget/wget2rc

3. $HOME/.wget2rc

The initialization file location $HOME/.wget2rc is deprecated. If a file is found there, Wget2 will print a warning about it. Support for reading from this file will be removed in the future.

Because the user settings are loaded after the global ones, the user's wget2rc overrides the global wget2rc in case of a conflict.
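The search order above can be condensed into a Python sketch that decides which files would be read and in which order. It only mirrors the rules as described in this article (parsing and merging of settings are not shown), approximates "can be read" with os.access, and uses the default global path /usr/local/etc/wget2rc; the function names are hypothetical.

```python
import os


def _readable(path):
    return os.path.isfile(path) and os.access(path, os.R_OK)


def user_rc_file(env, config_opt=None, is_readable=_readable):
    """Return the user wget2rc path that would be read, or None.

    Order: --config, then $WGET2RC, then the first readable XDG location.
    If --config or WGET2RC is given but unusable, nothing else is tried.
    """
    if config_opt is not None:
        return config_opt if is_readable(config_opt) else None
    if "WGET2RC" in env:
        path = env["WGET2RC"]
        return path if is_readable(path) else None
    candidates = []
    if env.get("XDG_CONFIG_HOME"):
        candidates.append(os.path.join(env["XDG_CONFIG_HOME"], "wget", "wget2rc"))
    home = env.get("HOME", "")
    if home:
        candidates.append(os.path.join(home, ".config", "wget", "wget2rc"))
        candidates.append(os.path.join(home, ".wget2rc"))  # deprecated location
    for path in candidates:
        if is_readable(path):
            return path
    return None


def rc_files_in_load_order(env, global_rc="/usr/local/etc/wget2rc", **kwargs):
    """Global file first, user file last, so user settings win on conflict."""
    user = user_rc_file(env, **kwargs)
    return [global_rc] + ([user] if user else [])
```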

Bugs

You can submit bug reports via the GNU Wget2 tracker (https://gitlab.com/gnuwget/wget2/issues).

Before actually submitting a bug report, please follow a few simple guidelines.

  1. Please make sure that the behavior you are seeing really is a bug. If Wget2 crashes, it's a bug. If Wget2 does not behave as documented, it's a bug. If things work strangely but you are not sure how they are supposed to work, it might well be a bug, but you might want to double-check the documentation and the mailing lists.

  2. Try to reproduce the bug in the simplest possible circumstances. For example, if Wget2 crashed while downloading with wget2 -rl0 -kKE -t5 --no-proxy https://example.com -o /tmp/log, you should check whether the crash is repeatable and whether it also occurs with a simpler set of options. You might even try to start the download at the page where the crash occurred to see whether that page somehow triggered the crash.

Also, while I will probably be interested in the contents of your .wget2rc file, just dumping it into the bug report is probably a bad idea. Instead, you should first check whether the bug still occurs with .wget2rc moved out of the way. Only if it turns out that .wget2rc settings affect the bug, please email me the relevant parts of the file.

  3. Please run wget2 with the -d option and send us the resulting output (or the relevant parts of it). If Wget2 was compiled without debugging support, recompile it with debugging enabled. It is much easier to trace bugs with debug output.

NOTE: Be sure to remove any potentially sensitive information from the debug log before sending it to the bug address. The -d option will not go out of its way to collect sensitive information, but the log will contain a fairly complete transcript of Wget2's communication with the server, which may include passwords and pieces of downloaded data. Since the bug address is public, you can assume that all bug reports are visible to the public.

  4. If Wget2 has crashed, try running it in a debugger, e.g. gdb `which wget2` core, and type "where" to get the backtrace. This may not work if the system administrator has disabled core files, but it is safe to try.
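Scrubbing a debug log before attaching it can be partly automated. The Python sketch below masks the values of common credential-bearing HTTP headers and user:password pairs embedded in URLs. The header names and the assumption that headers appear as "Name: value" on one line are assumptions about the log format, not something stated here; the function name redact is hypothetical, and the result still needs a manual review, since downloaded data in the log is not touched.

```python
import re

# Headers whose values commonly carry credentials or session data.
_SENSITIVE_HEADER = re.compile(
    r"(?i)\b(proxy-authorization|authorization|set-cookie|cookie)([ \t]*:[ \t]*)[^\r\n]*"
)
# "user:password@" embedded in URLs such as https://user:secret@host/.
_URL_PASSWORD = re.compile(r"(://[^/\s:@]+:)[^@\s/]+@")


def redact(log_text):
    """Mask obvious credentials in a debug log before sharing it."""
    text = _SENSITIVE_HEADER.sub(r"\1\2[REDACTED]", log_text)
    return _URL_PASSWORD.sub(r"\1[REDACTED]@", text)
```

For example, a log written with -o /tmp/log could be passed through redact(open("/tmp/log").read()) before its relevant parts are attached to the report.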

Author

Wget2, written by Tim Ruehsen tim.ruehsen@gmx.de
Wget 1.x, originally written by Hrvoje Nikšić hniksic@xemacs.org

Copyright

Copyright (c) 2012-2015 Tim Rühsen

Copyright (C) 2015-2022 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section called “GNU Free Documentation License”.

GNU Wget2 User's Guide


