Downloading YouTube Videos with Python - 疑問解決記『実体験に基づきながら』

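While using youtube-dl through its Python API, I ran into the following problems.

① I didn't know how to save to a directory other than the current directory

② The file name came out as id+mp4 (no dot), so the file was not a proper .mp4

③ I couldn't make the download continue past errors

I'll describe the solutions to all three.

Below is a script that downloads a video with youtube-dl.
Copy and paste it into your development environment and run it.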

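- - - - -

# Load the library
import youtube_dl

# Initial settings
directory = "download/"  # ... ①
ydl = youtube_dl.YoutubeDL({  # ... ①", ②
    'outtmpl': directory + '%(id)s.%(ext)s',  # save as <directory><id>.<ext>
    'ignoreerrors': True,  # keep going when a video fails
})

# URL of the video to download
url = 'URL you want to download'  # ... ③
# Fetch the video information and download the video
with ydl:
    result = ydl.extract_info(  # ... ④
        url,
        download=True
    )

- - - - -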

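Explanation of each part of the code

First, specifying the directory turned out to be extremely simple.

All it takes is prepending the directory to the name the video is saved under.
(e.g. download/ + id.mp4)

The directory variable at ① holds the file path.

As written, it creates a download folder inside the current directory.

If you want the files on the desktop, set the directory variable to

"C:\Users\<your PC user name>\Desktop\" and they should be placed on the desktop. In Python source, write the backslashes as \\ or use forward slashes, because a backslash starts an escape sequence in an ordinary string literal.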
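Building the path with os.path avoids hand-written separators and escaping altogether. The following is only a sketch: the helper name build_outtmpl is made up for illustration, and it assumes the desktop folder is called Desktop inside the user's home directory.

```python
import os


def build_outtmpl(directory):
    """Join a target directory and the file-name template into one outtmpl string."""
    return os.path.join(directory, '%(id)s.%(ext)s')


# Assumption: the desktop is the "Desktop" folder inside the home directory.
desktop = os.path.join(os.path.expanduser('~'), 'Desktop')

# Same two settings as the script above, only with a different directory.
ydl_opts = {'outtmpl': build_outtmpl(desktop), 'ignoreerrors': True}
print(ydl_opts['outtmpl'])
```

The resulting dictionary can be passed to youtube_dl.YoutubeDL() in place of the one in the script.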

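①"

'outtmpl': directory+'%(id)s.%(ext)s'

%(id)s stands for the video ID that the youtube-dl object holds, and %(ext)s stands for the file extension, so the name is formatted like id.mp4.

They seem to work like placeholders.

Note that without the "." the name becomes idmp4 and the file is not output correctly. Once it is fixed to id.mp4, the file is a playable video.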
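The template uses Python's %-style named placeholders, so the effect of the dot can be checked without downloading anything. A minimal sketch, assuming a hand-written info dictionary (the ID abc123 is made up):

```python
# Hypothetical metadata; youtube-dl fills in the real values itself.
info = {'id': 'abc123', 'ext': 'mp4'}

with_dot = '%(id)s.%(ext)s' % info     # template used in the script
without_dot = '%(id)s%(ext)s' % info   # template with the dot forgotten

print(with_dot)
print(without_dot)
```

The first name ends in a real .mp4 extension, while the second has the letters mp4 glued onto the ID, which is problem ② from the list at the top.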

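Many other variables can be used in the file name as well.

The available variables are listed below (source: https://masayoshi-9a7ee.hatenablog.com/entry/20150905/1441414821)

Some of them are never filled in for YouTube, because they exist to support other sites.

id: video ID
title: video title
url: video URL
ext: file extension
alt_title: secondary title
display_id: unclear how this differs from the video ID
uploader: full name of the video uploader
license: license of the video
creator: creator of the video
release_date: release date of the video as YYYYMMDD
timestamp: timestamp at which the video was published
upload_date: upload date of the video as YYYYMMDD
uploader_id: ID or nickname of the video uploader
location: place where the video was filmed
duration: length of the video (in seconds)
view_count: number of views
like_count: number of positive ratings
dislike_count: number of negative ratings
repost_count: number of reposts, i.e. shares
average_rating: average rating of the video
comment_count: number of comments
age_limit: age restriction of the video
is_live: whether it is a live stream, i.e. a live broadcast
start_time: playback start time specified in the URL; presumably the thing you sometimes see in YouTube links
end_time: playback end time specified in the URL
format: video format
format_id: video format code
format_note: additional information about the video format
width: width of the video resolution
height: height of the video resolution
resolution: video resolution
tbr: average bitrate of audio and video
abr: average audio bitrate
acodec: audio codec
asr: audio sampling rate
vbr: average video bitrate
fps: video frame rate
vcodec: video codec
container: container format of the video
filesize: file size
filesize_approx: approximate file size
protocol: protocol such as http
extractor: roughly, the site the video comes from, such as youtube
extractor_key: similar to the above, but this is the service name and differs slightly
epoch: Unix time at which the file was created
autonumber: sequential number assigned to each download
playlist: playlist name or ID
playlist_index: index of the video within the playlist
playlist_id: playlist ID
playlist_title: playlist title
playlist_uploader: full name of the playlist uploader
playlist_uploader_id: ID of the playlist uploader

Available when the video has chapters or sections
chapter: chapter title
chapter_number: chapter number
chapter_id: chapter ID

Available when the video belongs to a series or is an episode
series: series title
season: season title
season_number: season number
season_id: season ID
episode: episode title
episode_number: episode number
episode_id: episode ID

Available for music albums and the like
track: track title
track_number: track number
track_id: track ID
artist: artist
genre: genre
album: album name
album_type: album type
album_artist: album artist
disc_number: disc number
release_year: release year as YYYY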
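Several of these variables can be combined in one template. The sketch below previews two such templates with plain % formatting on made-up metadata. It only approximates what youtube-dl does: the real tool also sanitizes the values for use in file names and, as far as I know, substitutes a placeholder for missing fields, whereas plain % formatting raises KeyError.

```python
# Hypothetical metadata for one playlist entry (all values are made up).
sample_info = {
    'playlist_title': 'My Playlist',
    'playlist_index': 3,
    'title': 'Sample Video',
    'upload_date': '20200101',
    'ext': 'mp4',
}

# One folder per playlist, files prefixed with their position in it.
playlist_template = '%(playlist_title)s/%(playlist_index)s - %(title)s.%(ext)s'
# Files prefixed with the upload date so they sort chronologically.
dated_template = '%(upload_date)s_%(title)s.%(ext)s'

playlist_name = playlist_template % sample_info
dated_name = dated_template % sample_info
print(playlist_name)
print(dated_name)
```

Either template string can be used as the value of 'outtmpl', optionally with a directory in front as in the script.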

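Next, add the setting that ignores video retrieval errors and continues

youtube_dl.YoutubeDL(params) receives the initial settings.

The argument is a dict; 'outtmpl' and "ignoreerrors" are key names defined by the youtube-dl developers.

Two of those keys are used here: the output template and the error-ignoring flag.

The output template sets the file name. Quality is not part of the template; it is selected with the separate format key that appears in the key list further down.

I'll skip the quality setting (by default youtube-dl already downloads with the best settings).

Setting ignoreerrors to True lets the download carry on despite errors.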
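With ignoreerrors set to True, a failed video does not raise an exception; in my understanding extract_info returns None for it instead, so the caller can check the result and move on. A minimal sketch of such a loop (the function name download_all is made up; ydl is expected to be a youtube_dl.YoutubeDL object created with 'ignoreerrors': True):

```python
def download_all(ydl, urls):
    """Try every URL in turn and return (succeeded, failed) lists of URLs."""
    succeeded, failed = [], []
    for url in urls:
        # Assumption: with ignoreerrors=True a failure yields None, not an exception.
        result = ydl.extract_info(url, download=True)
        if result is None:
            failed.append(url)
        else:
            succeeded.append(url)
    return succeeded, failed
```

Calling download_all(ydl, list_of_urls) then downloads whatever it can and reports which URLs failed, so they can be retried later.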

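There are also keys such as username and password for logging in,

so use them for services that require a login.

To add entries to the dict, write something like

{"outtmpl":"go","username":"nazenani","password":"ziki",...,"required key":value of the matching type}

and add them that way. Every key has to be a quoted string.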
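One way to keep the common settings in one place and add the login keys only where needed is to copy the base dict. This is just a sketch; the credentials are the placeholder values from the example above.

```python
# Base settings from the script above.
base_opts = {
    'outtmpl': 'download/%(id)s.%(ext)s',
    'ignoreerrors': True,
}

# Copy the base settings and add login keys; the credentials are placeholders.
login_opts = dict(base_opts, username='nazenani', password='ziki')

# The base dict is left untouched, so it can be reused for sites without login.
print(sorted(login_opts))
print(sorted(base_opts))
```

login_opts can then be passed to youtube_dl.YoutubeDL() for a service that needs a login, and base_opts for everything else.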

All the keys that can be specified are listed below, as documented in the youtube-dl source. Try them out.
username: Username for authentication purposes.
password: Password for authentication purposes.
videopassword: Password for accessing a video.
ap_mso: Adobe Pass multiple-system operator identifier.
ap_username: Multiple-system operator account username.
ap_password: Multiple-system operator account password.
usenetrc: Use netrc for authentication instead.
verbose: Print additional info to stdout.
quiet: Do not print messages to stdout.
no_warnings: Do not print out anything for warnings.
forceurl: Force printing final URL.
forcetitle: Force printing title.
forceid: Force printing ID.
forcethumbnail: Force printing thumbnail URL.
forcedescription: Force printing description.
forcefilename: Force printing final filename.
forceduration: Force printing duration.
forcejson: Force printing info_dict as JSON.
dump_single_json: Force printing the info_dict of the whole playlist
    (or video) as a single JSON line.
simulate: Do not download the video files.
format: Video format code. See options.py for more information.
outtmpl: Template for output names.
outtmpl_na_placeholder: Placeholder for unavailable meta fields.
restrictfilenames: Do not allow "&" and spaces in file names
ignoreerrors: Do not stop on download errors.
force_generic_extractor: Force downloader to use the generic extractor
nooverwrites: Prevent overwriting files.
playliststart: Playlist item to start at.
playlistend: Playlist item to end at.
playlist_items: Specific indices of playlist to download.
playlistreverse: Download playlist items in reverse order.
playlistrandom: Download playlist items in random order.
matchtitle: Download only matching titles.
rejecttitle: Reject downloads for matching titles.
logger: Log messages to a logging.Logger instance.
logtostderr: Log messages to stderr instead of stdout.
writedescription: Write the video description to a .description file
writeinfojson: Write the video description to a .info.json file
writeannotations: Write the video annotations to a .annotations.xml file
writethumbnail: Write the thumbnail image to a file
write_all_thumbnails: Write all thumbnail formats to files
writesubtitles: Write the video subtitles to a file
writeautomaticsub: Write the automatically generated subtitles to a file
allsubtitles: Downloads all the subtitles of the video
    (requires writesubtitles or writeautomaticsub)
listsubtitles: Lists all available subtitles for the video
subtitlesformat: The format code for subtitles
subtitleslangs: List of languages of the subtitles to download
keepvideo: Keep the video file after post-processing
daterange: A DateRange object, download only if the upload_date is in the range.
skip_download: Skip the actual download of the video file
cachedir: Location of the cache files in the filesystem.
    False to disable filesystem cache.
noplaylist: Download single video instead of a playlist if in doubt.
age_limit: An integer representing the user's age in years.
    Unsuitable videos for the given age are skipped.
min_views: An integer representing the minimum view count the video
    must have in order to not be skipped.
    Videos without view count information are always
    downloaded. None for no limit.
max_views: An integer representing the maximum view count.
    Videos that are more popular than that are not
    downloaded.
    Videos without view count information are always
    downloaded. None for no limit.
download_archive: File name of a file where all downloads are recorded.
    Videos already present in the file are not downloaded
    again.
cookiefile: File name where cookies should be read from and dumped to.
nocheckcertificate: Do not verify SSL certificates
prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
    At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use
geo_verification_proxy: URL of the proxy to use for IP address verification
    on geo-restricted sites.
socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text
    support, using fridibi
debug_printtraffic: Print out sent and received HTTP traffic
include_ads: Download ads as well
default_search: Prepend this string if an input url is not valid.
    'auto' for elaborate guessing
encoding: Use this encoding instead of the system-specified.
extract_flat: Do not resolve URLs, return the immediate result.
    Pass in 'in_playlist' to only show this behavior for
    playlist items.
postprocessors: A list of dictionaries, each with an entry
    * key: The name of the postprocessor. See
      youtube_dl/postprocessor/__init__.py for a list.
    as well as any further keyword arguments for the
    postprocessor.
progress_hooks: A list of functions that get called on download
    progress, with a dictionary with the entries
    * status: One of "downloading", "error", or "finished".
      Check this first and ignore unknown values.

    If status is one of "downloading", or "finished", the
    following properties may also be present:
    * filename: The final filename (always present)
    * tmpfilename: The filename we're currently writing to
    * downloaded_bytes: Bytes on disk
    * total_bytes: Size of the whole file, None if unknown
    * total_bytes_estimate: Guess of the eventual file size,
      None if unavailable.
    * elapsed: The number of seconds since download started.
    * eta: The estimated time in seconds, None if unknown
    * speed: The download speed in bytes/second, None if
      unknown
    * fragment_index: The counter of the currently
      downloaded video fragment.
    * fragment_count: The number of fragments (= individual
      files that will be merged)

    Progress hooks are guaranteed to be called at least once
    (with status "finished") if the download is successful.
merge_output_format: Extension to use when merging formats.
fixup: Automatically correct known faults of the file.
    One of:
    - "never": do nothing
    - "warn": only emit a warning
    - "detect_or_warn": check whether we can do anything
      about it, warn otherwise (default)
source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
    youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download when
    used alone or a lower bound of a range for randomized
    sleep before each download (minimum possible number
    of seconds to sleep) when used along with
    max_sleep_interval.
max_sleep_interval: Upper bound of a range for randomized sleep before each
    download (maximum possible number of seconds to sleep).
    Must only be used along with sleep_interval.
    Actual sleep time will be a random float from range
    [sleep_interval; max_sleep_interval].
listformats: Print an overview of available video formats and exit.
list_thumbnails: Print a table of all thumbnails and exit.
match_filter: A function that gets called with the info_dict of
    every video.
    If it returns a message, the video is ignored.
    If it returns None, the video is downloaded.
    match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
    HTTP header
geo_bypass_country:
    Two-letter ISO 3166-2 country code that will be used for
    explicit geographic restriction bypassing via faking
    X-Forwarded-For HTTP header
geo_bypass_ip_block:
    IP range in CIDR notation that will be used similarly to
    geo_bypass_country

The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
    None or unset for standard (built-in) downloader.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
    if True, otherwise use ffmpeg/avconv if False, otherwise
    use downloader suggested by extractor if None.

The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see youtube_dl/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args, hls_use_mpegts,
http_chunk_size.

The following options are used by the post processors:
prefer_ffmpeg: If False, use avconv instead of ffmpeg if both are available,
    otherwise prefer ffmpeg.
ffmpeg_location: Location of the ffmpeg/avconv binary; either the path
    to the binary or its containing directory.
postprocessor_args: A list of additional command-line arguments for the
    postprocessor.
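As one example of trying a key from this list, the sketch below wires up progress_hooks as the description above explains it: a function that receives a status dictionary. The names on_progress and finished_files are made up for illustration, and the optional entries are read with .get() because the list says they may be missing or None.

```python
finished_files = []


def on_progress(status):
    """Print rough progress and record the file name of each finished download."""
    if status['status'] == 'finished':
        finished_files.append(status['filename'])
    elif status['status'] == 'downloading':
        # total_bytes may be missing or None, so fall back to the estimate.
        total = status.get('total_bytes') or status.get('total_bytes_estimate')
        done = status.get('downloaded_bytes')
        if total and done is not None:
            print('%.1f%%' % (100.0 * done / total))
    # Any other status value (e.g. "error") is ignored here.


ydl_opts = {
    'outtmpl': 'download/%(id)s.%(ext)s',
    'ignoreerrors': True,
    'progress_hooks': [on_progress],
}
```

Passing ydl_opts to youtube_dl.YoutubeDL() should make youtube-dl call on_progress while it downloads; after the run, finished_files holds the names of the files that completed.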

Those were the solutions to the three problems.

The actual download happens at ④: it simply passes the URL as an argument and switches the download mechanism on with download=True.
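The same call can also be made with download=False, in which case youtube-dl only fetches the video information and saves nothing. A small sketch built on that (the helper name get_title is made up; ydl is expected to be a youtube_dl.YoutubeDL object):

```python
def get_title(ydl, url):
    """Return the video title without downloading the video, or None on failure."""
    info = ydl.extract_info(url, download=False)
    if info is None:
        # Can happen when ignoreerrors is True and the lookup failed.
        return None
    # 'title' is one of the fields listed earlier; .get() tolerates its absence.
    return info.get('title')
```

This can be handy for checking what a URL points to before committing to the download at ④.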
