Download Options
block_download_sub_folders
Type: bool | Default: false
When this is set to true, downloads that would be in a folder structure like:
Downloads/folderA/folderB/folderC/image.jpg
will be changed to:
Downloads/folderA/image.jpg
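As a minimal sketch, assuming the settings file groups these options under a Download_Options section with keys matching the option names on this page:
Download_Options:
  block_download_sub_folders: true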
disable_download_attempts
Type: bool | Default: false
By default the program will retry a download up to 10 times. Setting this to true disables that limit, so downloads are retried until they complete.
However, to make sure the program does not run endlessly, a file will never be retried in certain situations, such as when the server returns a 404 HTTP status, meaning the link is dead.
disable_file_timestamps
Type: bool | Default: false
By default the program will do its absolute best to find the upload date of a file. It will then set the last modified and last accessed dates on the file to match. On Windows and macOS, it will also try to set the created date.
Setting this to true will disable this function, and the dates for those metadata entries will be the date the file was downloaded.
include_album_id_in_folder_name
Type: bool | Default: false
Setting this to true will include the album ID (random alphanumeric string) of the album in the download folder name.
include_thread_id_in_folder_name
Type: bool | Default: false
Setting this to true will include the thread ID (random alphanumeric string) of the forum thread in the download folder name.
maximum_number_of_children
Type: list[NonNegativeInt] | Default: []
Limit the number of items to scrape by supplying up to 4 values. Each position defines the maximum number of sub-items (children_limit) a specific type of scrape_item can have:
1. Max number of children from a FORUM URL
2. Max number of children from a FORUM POST
3. Max number of children from a FILE HOST PROFILE
4. Max number of children from a FILE HOST ALBUM
Using 0 in any position means no limit on the number of children for that type of scrape_item. Any trailing value not supplied is assumed to be 0.
Examples
Limit FORUM scrape to 15 posts max, grab all links and media within those posts, but only scrape a maximum of 10 items from each link in a post:
--maximum-number-of-children 15 0 10
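The same limits could also be set in the settings file; a sketch, assuming the same Download_Options section and standard YAML list syntax:
Download_Options:
  maximum_number_of_children: [15, 0, 10]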
remove_domains_from_folder_names
Type: bool | Default: false
Setting this to true will remove the "(DOMAIN)" portion of folder names on new downloads.
remove_generated_id_from_filenames
Type: bool | Default: false
Setting this to true will remove the alphanumeric ID added to the end of filenames by some websites.
This option only works for URLs from cyberdrop.me at the moment.
Multipart archive filenames will be corrected to follow the proper naming pattern for their format.
Supported formats: .rar, .7z, .tar, .gz, .bz2, .zip
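As a purely hypothetical illustration (the actual ID pattern depends on the site), the rename could look like:
image-x7Fk2a.jpg -> image.jpg
archive-x7Fk2a.part1.rar -> archive.part1.rar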
scrape_single_forum_post
Type: bool | Default: false
Setting this to true will prevent Cyberdrop-DL from scraping an entire thread if the input URL points to a specific post in it. CDL will only download files within that post.
For most forum sites, the post ID is part of the fragment in the URL.
ex: /thread/iphone-16-16e-16-plus-16-pro-16-promax.256047/page-64#post-7512404 has a post ID of 7512404
If scrape_single_forum_post is false, CDL will download all posts in the thread, from post 7512404 until the last post.
If scrape_single_forum_post is true, CDL will only download files within post 7512404 itself and stop.
separate_posts
Type: bool | Default: false
Setting this to true will separate content from forum and site posts into separate folders.
This option only works with sites that have 'posts':
Forums
Discourse
reddit
coomer, kemono and nekohouse
For some sites, this value is hardcoded to true because each post is always an individual page:
Wordpress
eFukt
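As a hypothetical illustration of the resulting layout (the post folder name depends on separate_posts_format, described below):
Downloads/Some Thread (forum.example)/2024-01-05 - 123 - Post Title/image.jpg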
separate_posts_format
Type: NonEmptyStr | Default: {default}
This is the format for the directory created when using --separate-posts.
Unique Path Flags:
date: date of the post. This is a Python datetime object
id: the post ID. This is always a string, even if some sites use numbers
number: this no longer means anything. Currently, it always has the same value as id
title: post title. This is a string
Not all sites support all possible flags. Ex: posts from reddit only support the title flag. If you use a format with a field that the site does not support, CDL will replace it with UNKNOWN_<FIELD_NAME>
ex: using the format reddit post #{number} -> reddit post #UNKNOWN_ID
Setting it to {default} will use the default format, which is different for each crawler:
Coomer, Kemono and Nekohouse: {date} - {title}
Forums (Xenforo/vBulletin/Invision): {date} - {id} - {title}
Discourse: {date} - {id} - {title}
Reddit: {title}
WordPress: {date:%Y-%m-%d} - {id} - {title}
eFukt: {date:%Y-%m-%d} {title}
A date without a format_spec defaults to ISO 8601 format.
You can use any valid format string supported by Python, with the following restrictions:
You can not have positional arguments in the format string. ex: post {0} from date {1}
You can not have unnamed fields in the format string. ex: post {} from date {}
You can not perform operations within the format string. ex: post {id + 1} from date {date}
All the fields named in the format string must be valid fields for that format option. CDL will validate this at startup.
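For example, a custom format with an explicit date format_spec could be set like this (a sketch, assuming the same Download_Options section as above):
Download_Options:
  separate_posts_format: "{date:%Y-%m-%d} - {title}"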
skip_download_mark_completed
Type: bool | Default: false
Setting this to true will skip the download process for every file and mark them as downloaded in the database.
skip_referer_seen_before
Type: bool | Default: false
Setting this to true will skip downloading files from any referer that has been scraped before. The file(s) will always be skipped, regardless of whether the referer was successfully scraped or not.
maximum_thread_depth
Type: NonNegativeInt | Default: 0
Restricts how many levels deep the scraper is allowed to go while scraping a thread.
It is not recommended to set this above the default value of 0, as there is a high chance of infinite nesting in certain cases. For example, when dealing with Megathreads: if a Megathread links to another Megathread, you could end up scraping an undesirable amount of data.
Values:
0: no nesting allowed; only the top-level thread is scraped
1+: limits nesting to the given depth
Example
Suppose CDL finds the following sub-threads while scraping an input URL:
thread_01
├── thread_02
├── thread_03
│   ├── thread_09
│   ├── thread_10
│   └── thread_11
├── thread_04
├── thread_05
├── thread_06
├── thread_07
│   └── thread_12
└── thread_08
With maximum_thread_depth = 0, CDL will only download files in thread_01; all the other threads will be ignored.
With maximum_thread_depth = 1, CDL will only download files in thread_01 to thread_08. All threads from thread_09 to thread_12 will be ignored.
With maximum_thread_depth >= 2, CDL will download files from all the threads in this case.
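As a sketch, allowing one level of nesting could be configured in the settings file (assuming the same Download_Options section as above):
Download_Options:
  maximum_thread_depth: 1
or on the command line, assuming the flag follows the same naming pattern as --maximum-number-of-children:
--maximum-thread-depth 1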