Download Options

block_download_sub_folders

Type: bool
Default: false

When this is set to true, downloads that would be in a folder structure like:

Downloads/folderA/folderB/folderC/image.jpg

will be changed to:

Downloads/folderA/image.jpg
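The flattening above can be sketched in Python (a simplified illustration, not CDL's actual implementation; `flatten_sub_folders` is a hypothetical helper):

```python
from pathlib import PurePosixPath

def flatten_sub_folders(path: str) -> str:
    """Keep only the top-level folder under Downloads and the filename."""
    parts = PurePosixPath(path).parts
    if len(parts) <= 3:  # already at most Downloads/folderA/image.jpg
        return path
    return str(PurePosixPath(parts[0], parts[1], parts[-1]))

print(flatten_sub_folders("Downloads/folderA/folderB/folderC/image.jpg"))
# Downloads/folderA/image.jpg
```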

disable_download_attempts

Type: bool
Default: false

By default the program will retry a download 10 times. You can set this to true to disable that limit and retry indefinitely until the download completes.

However, to make sure the program does not run endlessly, certain failures are never retried, such as a 404 HTTP status, which means the link is dead.

disable_file_timestamps

Type: bool
Default: false

By default the program will do its best to find the upload date of a file. It will then set the file's last modified and last accessed dates to match. On Windows and macOS, it will also try to set the created date.

Setting this to true will disable this behavior, and those metadata entries will instead hold the date the file was downloaded.
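The timestamping behavior can be illustrated with Python's os.utime (a simplified sketch, not CDL's actual code; `apply_upload_date` is a hypothetical helper assuming a UTC upload date):

```python
import os
from datetime import datetime, timezone

def apply_upload_date(file_path: str, upload_date: datetime) -> None:
    """Set the file's last accessed and last modified times to the upload date."""
    ts = upload_date.replace(tzinfo=timezone.utc).timestamp()
    os.utime(file_path, times=(ts, ts))  # (atime, mtime)
```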

include_album_id_in_folder_name

Type: bool
Default: false

Setting this to true will include the album's ID (a random alphanumeric string) in the download folder name.

include_thread_id_in_folder_name

Type: bool
Default: false

Setting this to true will include the forum thread's ID (a random alphanumeric string) in the download folder name.

maximum_number_of_children

Type: list[NonNegativeInt]
Default: []

Limit the number of items to scrape using a list of up to 4 values. Each position defines the maximum number of sub-items (children_limit) a specific type of scrape_item will have:

  1. Max number of children from a FORUM URL

  2. Max number of children from a FORUM POST

  3. Max number of children from a FILE HOST PROFILE

  4. Max number of children from a FILE HOST ALBUM

Using 0 in any position means no limit on the number of children for that type of scrape_item. Any trailing value not supplied is assumed to be 0.

Examples

Limit FORUM scrape to 15 posts max, grab all links and media within those posts, but only scrape a maximum of 10 items from each link in a post:

--maximum-number-of-children 15 0 10
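The padding rule above can be sketched as follows (a simplified illustration; `normalize_children_limits` is a hypothetical helper, and 0 means "no limit"):

```python
def normalize_children_limits(values: list[int]) -> tuple[int, int, int, int]:
    """Pad missing trailing positions with 0 (no limit).

    Positions: (FORUM, FORUM POST, FILE HOST PROFILE, FILE HOST ALBUM).
    """
    padded = (values + [0, 0, 0, 0])[:4]
    return tuple(padded)

print(normalize_children_limits([15, 0, 10]))  # (15, 0, 10, 0)
```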

remove_domains_from_folder_names

Type: bool
Default: false

Setting this to true will remove the "(DOMAIN)" portion of folder names on new downloads.

remove_generated_id_from_filenames

Type: bool
Default: false

Setting this to true will remove the alphanumeric ID added to the end of filenames by some websites.

This option only works for URLs from cyberdrop.me at the moment.

Multipart archive filenames will be corrected to follow the proper naming pattern for their format.

Supported formats: .rar .7z .tar .gz .bz2 .zip

scrape_single_forum_post

Type: bool
Default: false

Setting this to true will prevent Cyberdrop-DL from scraping an entire thread when the input URL points to a specific post.

CDL will only download files within that post.

For most forum sites, the post id is part of the fragment in the URL.

ex: /thread/iphone-16-16e-16-plus-16-pro-16-promax.256047/page-64#post-7512404 has a post id of 7512404

If scrape_single_forum_post is false, CDL will download all posts in the thread, from post 7512404 until the last post. If scrape_single_forum_post is true, CDL will only download files within post 7512404 itself and stop.

separate_posts

Type: bool
Default: false

Setting this to true will place the content of each forum or site post in its own folder.

This option only works with sites that have 'posts':

  • Forums

  • Discourse

  • reddit

  • coomer, kemono and nekohouse.

For some sites, this value is hardcoded to true because each post is always an individual page:

  • Wordpress

  • eFukt

separate_posts_format

Type: NonEmptyStr
Default: {default}

This is the format for the directory created when using --separate-posts.

Unique Path Flags:

  • date: the date of the post. This is a Python datetime object.

  • id: the post ID. This is always a string, even if some sites use numbers.

  • number: deprecated; it currently always has the same value as id.

  • title: the post title. This is a string.

Setting it to {default} will use the default format, which is different for each crawler:

Coomer, Kemono and Nekohouse: {date} - {title}

Forums (Xenforo/vBulletin/Invision): {date} - {id} - {title}

Discourse: {date} - {id} - {title}

Reddit: {title}

WordPress: {date:%Y-%m-%d} - {id} - {title}

eFukt: {date:%Y-%m-%d} {title}

A date without a format_spec defaults to ISO 8601 format

You can use any valid format string supported by Python, with the following restrictions:

  • You cannot have positional arguments in the format string. ex: post {0} from date {1}

  • You cannot have unnamed fields in the format string. ex: post {} from date {}

  • You cannot perform operations within the format string. ex: post {id + 1} from date {date}

  • All the fields named in the format string must be valid fields for that format option. CDL will validate this at startup.
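Since these are standard Python str.format fields, the behavior can be illustrated directly (a minimal sketch; the post values below are made up):

```python
from datetime import datetime

# Hypothetical post fields, matching the flags documented above.
post = {"date": datetime(2024, 5, 1, 12, 30), "id": "7512404", "title": "My Post"}

# A date field accepts any datetime format_spec:
print("{date:%Y-%m-%d} - {id} - {title}".format(**post))
# 2024-05-01 - 7512404 - My Post
```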

skip_download_mark_completed

Type: bool
Default: false

Setting this to true will skip the download process for every file and mark them as downloaded in the database.

skip_referer_seen_before

Type: bool
Default: false

Setting this to true will skip downloading files from any referer that has been scraped before. The file(s) will always be skipped, regardless of whether the referer was successfully scraped or not.

maximum_thread_depth

Type: NonNegativeInt
Default: 0

Restricts how many levels deep the scraper is allowed to go while scraping a thread.

Values:

  • 0: no nesting allowed; only the top-level thread is scraped

  • 1+: limits nesting to the given value

This setting is hardcoded to 0 for Discourse sites.

Example

Consider CDL finds the following sub-threads while scraping an input URL:

└── thread_01
    β”œβ”€β”€ thread_02
    β”œβ”€β”€ thread_03
    β”‚   β”œβ”€β”€ thread_09
    β”‚   β”œβ”€β”€ thread_10
    β”‚   └── thread_11
    β”œβ”€β”€ thread_04
    β”œβ”€β”€ thread_05
    β”œβ”€β”€ thread_06
    β”œβ”€β”€ thread_07
    β”‚   └── thread_12
    └── thread_08

  • With maximum_thread_depth = 0, CDL will only download files in thread_01; all the other threads will be ignored.

  • With maximum_thread_depth = 1, CDL will only download files in thread_01 to thread_08. All threads from thread_09 to thread_12 will be ignored.

  • With maximum_thread_depth >= 2, CDL will download files from all the threads in this case.
