2.12 Recursive Accept/Reject Options
====================================
‘-A ACCLIST --accept ACCLIST’
‘-R REJLIST --reject REJLIST’
Specify comma-separated lists of file name suffixes or patterns to
accept or reject (⇒Types of Files). Note that if any of the
wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of
ACCLIST or REJLIST, it will be treated as a pattern, rather than a
suffix. In this case, you have to enclose the pattern in quotes
to prevent your shell from expanding it, as in ‘-A "*.mp3"’ or
‘-A '*.mp3'’.
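As a sketch (the host and path below are hypothetical examples), a recursive retrieval restricted to MP3 files might look like:

```shell
# Recursively download, keeping only files whose names match *.mp3.
# The pattern is quoted so the shell does not expand it before
# Wget sees it; the URL is a hypothetical example.
wget -r -A '*.mp3' https://example.com/music/
```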
‘--accept-regex URLREGEX’
‘--reject-regex URLREGEX’
Specify a regular expression to accept or reject the complete URL.
‘--regex-type REGEXTYPE’
Specify the regular expression type. Possible types are ‘posix’ or
‘pcre’. Note that to use the ‘pcre’ type, Wget must be compiled
with libpcre support.
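For example (the site is a hypothetical one), to skip every URL whose path contains a calendar component while accepting everything else:

```shell
# Reject any URL containing /calendar/ in its path. POSIX is the
# default --regex-type, so no extra flag is needed here; the URL
# is a hypothetical example.
wget -r --reject-regex '/calendar/' https://example.com/
```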
‘-D DOMAIN-LIST’
‘--domains=DOMAIN-LIST’
Set domains to be followed. DOMAIN-LIST is a comma-separated list
of domains. Note that it does _not_ turn on ‘-H’.
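Because ‘-D’ does not imply ‘-H’, it is typically combined with ‘-H’ to span hosts while still keeping the recursion within a known set of domains. A sketch, with hypothetical hostnames:

```shell
# Span hosts, but only follow links whose domain is example.com
# or cdn.example.net; both hostnames are hypothetical examples.
wget -r -H -D example.com,cdn.example.net https://example.com/
```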
‘--exclude-domains DOMAIN-LIST’
Specify the domains that are _not_ to be followed (⇒Spanning
Hosts).
‘--follow-ftp’
Follow FTP links from HTML documents. Without this option, Wget
ignores all FTP links.
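A minimal sketch (hypothetical URL), for a recursive retrieval that also descends into FTP links found in the HTML pages:

```shell
# Follow ftp:// links encountered during the recursive retrieval;
# the URL is a hypothetical example.
wget -r --follow-ftp https://example.com/downloads/
```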
‘--follow-tags=LIST’
Wget has an internal table of HTML tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval. To consider only a subset of those tags, specify them
in a comma-separated LIST with this option.
‘--ignore-tags=LIST’
This is the opposite of the ‘--follow-tags’ option. To skip
certain HTML tags when recursively looking for documents to
download, specify them in a comma-separated LIST.
In the past, this option was the best bet for downloading a single
page and its requisites, using a command-line like:
wget --ignore-tags=a,area -H -k -K -r http://SITE/DOCUMENT
However, the author of this option came across a page with tags
like ‘<LINK REL="home" HREF="/">’ and came to the realization that
specifying tags to ignore was not enough. One can’t just tell Wget
to ignore ‘<LINK>’, because then stylesheets will not be
downloaded. Now the best bet for downloading a single page and its
requisites is the dedicated ‘--page-requisites’ option.
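A sketch of that recommended approach (the URL is a hypothetical example):

```shell
# Download one page together with the images, stylesheets, and
# other resources needed to display it, converting links so the
# copy works locally; the URL is a hypothetical example.
wget --page-requisites --convert-links https://example.com/article.html
```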
‘--ignore-case’
Ignore case when matching files and directories. This influences
the behavior of the ‘-R’, ‘-A’, ‘-I’, and ‘-X’ options, as well as
the globbing used when downloading from FTP sites. For example, with
this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also
‘file2.TXT’, ‘file3.TxT’, and so on. The quotes in the example are
to prevent the shell from expanding the pattern.
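Putting the example above into a full command line (the URL is a hypothetical example):

```shell
# Accept .txt files regardless of case: file1.txt, FILE2.TXT,
# file3.TxT all match; the URL is a hypothetical example.
wget -r --ignore-case -A '*.txt' https://example.com/notes/
```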
‘-H’
‘--span-hosts’
Enable spanning across hosts when doing recursive retrieving (⇒
Spanning Hosts).
‘-L’
‘--relative’
Follow relative links only. Useful for retrieving a specific home
page without any distractions, not even those from the same hosts
(⇒Relative Links).
‘-I LIST’
‘--include-directories=LIST’
Specify a comma-separated list of directories you wish to follow
when downloading (⇒Directory-Based Limits). Elements of
LIST may contain wildcards.
‘-X LIST’
‘--exclude-directories=LIST’
Specify a comma-separated list of directories you wish to exclude
from download (⇒Directory-Based Limits). Elements of LIST
may contain wildcards.
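A sketch combining the two options (the paths and URL are hypothetical examples):

```shell
# Restrict the recursion to /docs and its subdirectories, but
# skip anything under /docs/old; paths and URL are hypothetical.
wget -r -I /docs -X /docs/old https://example.com/
```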
‘-np’
‘--no-parent’
Do not ever ascend to the parent directory when retrieving
recursively. This is a useful option, since it guarantees that
only the files _below_ a certain hierarchy will be downloaded.
⇒Directory-Based Limits, for more details.
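For instance, to mirror only one subtree of a site (the URL is a hypothetical example):

```shell
# Retrieve manual/ and everything below it; --no-parent keeps the
# recursion from climbing up to /docs/ or the site root.
# The URL is a hypothetical example.
wget -r -np https://example.com/docs/manual/
```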