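Update 2015-04-08: I have found a project on GitHub called you-get that handles this kind of task better than my script does, so treat everything below as reference only.
Note: flvcd.com no longer reliably serves pages that contain parse results, so this script is of reference value only. That is beyond my control; thank you for your understanding.
Update 2015-02-17: flvcd.com is back in service, and the original script works again after a small modification. See the code for details. The download link has been updated.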
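Today I came across an article titled 《python3自动下载优酷视频小程序》 ("A small Python 3 program for automatically downloading Youku videos") and tried it right away, only to find that it no longer works. Youku has probably changed its policy, which leads to frequent 403 errors. What's more, the original author merges the video segments by plain concatenation, which amounts to:
    cat 1.flv 2.flv > 3.flv
As everyone knows, merging files this way does not work, or is at least risky. So I tried a different approach to implement the same functionality.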
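To make the concern concrete, here is a minimal sketch of what byte-level concatenation leaves behind. It assumes the standard FLV layout (a 9-byte header that starts with the signature FLV, followed by a 4-byte previous-tag-size field). The helpers make_empty_flv and find_flv_headers are hypothetical names used only for this illustration; they are not part of the script.

```python
# Illustration only: build two empty FLV "files" in memory and join them
# the way `cat 1.flv 2.flv > 3.flv` would.

# Signature "FLV", version 1, flags 0x05 (audio + video), header size 9.
FLV_HEADER = b'FLV' + bytes([1, 0x05]) + (9).to_bytes(4, 'big')
# The header is followed by a 4-byte "previous tag size" of zero.
PREVIOUS_TAG_SIZE_0 = (0).to_bytes(4, 'big')


def make_empty_flv():
    """Return the bytes of an FLV file that has a header but no tags."""
    return FLV_HEADER + PREVIOUS_TAG_SIZE_0


def find_flv_headers(data):
    """Return every offset at which a full FLV header occurs in data."""
    offsets = []
    pos = data.find(FLV_HEADER)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(FLV_HEADER, pos + 1)
    return offsets


joined = make_empty_flv() + make_empty_flv()
print(find_flv_headers(joined))
```

The joined data contains a second file header in the middle of the stream, where a player expects the next tag. With real segments, the timestamps of the second part would typically also start over from zero, which is why a tool that rewrites the container is the safer choice.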
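How it works
This script, youkudownloader.py, is written in Python 3. It uses a rather fragile regular expression to obtain reasonably stable video URLs through flvcd.com, and it uses mencoder to merge the video segments. To keep things simple, it does not support operating systems other than Linux and other Unix-like systems.
Usage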
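In a terminal, enter:
    chmod +x youkudownloader.py    # only needed once, before the first run
    python3 youkudownloader.py video_url
Replace video_url with the URL of the Youku video id page, for example http://v.youku.com/v_show/id_XMzA2NTM4NDQ0.html. You can also leave it out and type it in when the script asks for it.
Note that mencoder must already be installed; otherwise the whole download is wasted effort.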
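Since a missing mencoder only shows up after everything has been downloaded, a check before starting can save time. The following is a minimal sketch, not part of the script: tool_available is a hypothetical helper name, and it relies on shutil.which from the standard library to look the program up on PATH.

```python
import shutil


def tool_available(name):
    """Return True if an executable called name is found on PATH."""
    return shutil.which(name) is not None


# Check for mencoder before spending time on the download.
if not tool_available('mencoder'):
    print('mencoder not found; install it before downloading anything.')
```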
Code
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
'''youkudownloader.py

Author: Tianhu Zhang <zszth@126.com>
License: GPLv3

A python3 script for downloading videos from youku.com.
Using flvcd.com to get source urls.
Using mencoder to merge files.

IMPORTANT!!
1. This script works ONLY under LINUX (or other unix-like systems).
2. Please make sure that you have mencoder installed.
   In Ubuntu (or other similar linux distributions), enter this in your
   shell to install mencoder:
       sudo apt-get install mencoder
   Installations on other systems may vary.

Usage:
In shell enter:
    chmod +x youkudownloader.py #Do this for the first time.
    python3 youkudownloader.py video_url
And video_url should be replaced by the url of Youku video id page,
such as http://v.youku.com/v_show/id_XMzA2NTM4NDQ0.html
Or you can leave it empty and type the url in when asked.
'''
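__author__ = 'Tianhu Zhang'

import urllib.request
import re
import os
import sys

FLVCD_URL = 'http://www.flvcd.com/parse.php?kw='
# video_id_page = 'http://v.youku.com/v_show/id_XMzA2NTM4NDQ0.html'


def get_info(video_id_page):
    '''Return [urls, title] for a Youku video page, as parsed by flvcd.com.'''
    parse_page = urllib.request.urlopen(FLVCD_URL+video_id_page).read().decode('gb2312')
    # print(parse_page)
    # Pattern used before the 20150217 update (double quotes around onclick):
    # urls_unparsed = re.findall(r'<a href=".*?" target="_blank" onclick="_alert\(\);return false;">', parse_page)
    urls_unparsed = re.findall(r'<a href=".*?" target="_blank" onclick=\'_alert\(\);return false;\'>', parse_page)
    urls = []
    for url in urls_unparsed:
        # Drop the leading '<a href="' (9 chars) and the trailing attributes (51 chars).
        urls.append(url[9:-51])
    # print(urls)
    # print(len(urls))
    # Drop the 24-char '<strong>...</strong>' label and the trailing '<strong>' (8 chars).
    title = re.findall(r'<strong>当前解析视频:</strong>.*?<strong>', parse_page)[0][24:-8].strip(' ')
    # Strip characters that are awkward in file names or shell commands.
    # The hyphen is escaped: unescaped, ')-\\' is a range that also removes
    # digits and uppercase letters. ';', '[' and ']' are listed explicitly
    # because that range used to cover them.
    title = re.sub(r'[ /#$&()\-\\\t*?+.,;\[\]\'"_`~|<>{}^]', "", title)
    # print(title)
    return [urls, title]


def download(info):
    '''Download every part in info[0], then merge them into info[1].flv.'''
    para = ''
    for i in range(len(info[0])):
        with open(info[1]+'_{0}.flv'.format(i), mode='wb') as f:
            print('Downloading part {0} of {1}...'.format(i+1, len(info[0])))
            data = urllib.request.urlopen(info[0][i]+'&referer=www.youku.com').read()
            f.write(data)
        para = para + ' ' + info[1]+'_{0}.flv'.format(i)
    # Copy the video stream as-is and re-encode the audio as MP3.
    cmd = 'mencoder -ovc copy -oac mp3lame' + para + ' -o ' +info[1] +'.flv'
    print('Merging...')
    if os.system(cmd) == 0:
        print('Done. File saved to ' + info[1] +'.flv successfully.')
        # Remove the downloaded parts only after a successful merge.
        os.system('rm ' + info[1] + '_*.flv')


if __name__ == '__main__':
    if len(sys.argv) < 2:
        video_id_page = input('Input video id page full url: >')
    else:
        video_id_page = sys.argv[1]
    video = get_info(video_id_page)
    download(video)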
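Two spots in the listing above are easy to break: the fixed slice offsets in url[9:-51], and the mencoder command assembled as a single shell string. The sketch below shows one possible alternative for each, under stated assumptions. The names LINK_PATTERN, extract_urls, build_merge_command and merge are hypothetical, the sample markup is made up to match the pattern the script expects, and the mencoder flags are simply the ones the script already uses.

```python
import re
import subprocess

# Same anchor pattern as the script, with a capture group around the url,
# so no fixed slice offsets are needed.
LINK_PATTERN = re.compile(
    r'<a href="(.*?)" target="_blank" onclick=\'_alert\(\);return false;\'>'
)


def extract_urls(parse_page):
    """Return the href of every download link found in parse_page."""
    return LINK_PATTERN.findall(parse_page)


def build_merge_command(parts, output):
    """Return the mencoder call as an argument list (no shell involved)."""
    return ['mencoder', '-ovc', 'copy', '-oac', 'mp3lame', *parts, '-o', output]


def merge(parts, output):
    """Run mencoder on parts; return True if it exits with status 0."""
    return subprocess.run(build_merge_command(parts, output)).returncode == 0


# Made-up sample of the markup the pattern expects (not real flvcd.com output).
SAMPLE = (
    '<a href="http://example.com/part0.flv" target="_blank" '
    "onclick='_alert();return false;'>"
    '<a href="http://example.com/part1.flv" target="_blank" '
    "onclick='_alert();return false;'>"
)

print(extract_urls(SAMPLE))
print(build_merge_command(['v_0.flv', 'v_1.flv'], 'v.flv'))
```

Passing the arguments as a list means a file name is never interpreted by the shell, so the title would need less aggressive cleaning. The regular expression is still tied to the exact markup of the parse page and will break whenever that markup changes.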