
Scraping in practice | Download 井川里予's HD, watermark-free Douyin dance videos in ten lines of code!

酷头 · 印象Python · 2022-08-01

Text | 酷头

Source: 印象python (ID: python_logic)


Hi everyone, I'm 酷头!
Welcome to your treasure trove for learning Python~~~





~~~~~ Free data analysis books at the end of the article ~~~~~


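Today we'll use a Douyin video ID to download an HD, watermark-free dance video. It takes only about ten lines of code, so give it a try if you're interested.

We'll do it in five steps: analyzing the page, sending the request, fetching the data, decoding the data, and saving the data.

Let's get started.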


First, open the Douyin web version and pick a video you like.

Here is the link:

https://www.douyin.com/video/7004173561606753539




Analyzing the page


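Press F12 to open the browser's developer tools. There's a small trick here that you may not have noticed.


When we scrape images, comments, novels and the like, we usually look for the data source under the XHR tab.

But when the target is an audio or video file, we look for it under the Media tab instead.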



Under the Media tab we find the data source shown below. This is the video's playback address.

You can copy it into the browser to test it:



Next, we need to find where this playback address comes from.

We copy part of it and search for it:





Sending the request


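We send the request the way a browser would. The headers are needed so that the site's anti-scraping measures don't stop us from getting the data.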


import re
import requests
from icecream import ic  # ic() pretty-prints values for debugging

url = 'https://www.douyin.com/video/7004173561606753539'
# 1. Send the request
headers = {
    'Referer': 'https://www.douyin.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4651.0 Safari/537.36',
    # Session-specific: paste the cookie from your own logged-in browser here
    'cookie': '<your cookie here>',
}
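Because requests sends exactly the headers you hand it, it can help to inspect them before anything goes over the network. A minimal sketch, using the same three headers as the article with a placeholder cookie (an assumption: you paste your own): requests.Request(...).prepare() builds the request without sending it. Note that the header entries need a colon between key and value; without it, Python concatenates the two strings and builds a set instead of a dict, and preparing the request fails.

```python
import requests

# Same three headers as the article; the cookie is a placeholder
# (assumption: you paste your own from the browser's DevTools).
headers = {
    'Referer': 'https://www.douyin.com/',
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/96.0.4651.0 Safari/537.36'),
    'cookie': '<your cookie here>',
}

# Build the request without sending it, so nothing goes over the network.
prepared = requests.Request(
    'GET', 'https://www.douyin.com/video/7004173561606753539', headers=headers
).prepare()

# Header lookup on a prepared request is case-insensitive.
for name, value in prepared.headers.items():
    print(f'{name}: {value[:50]}')
```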



Fetching the data


With the request sent, let's look at the response data:

# 2. Fetch the data
resp = requests.get(url, headers=headers)
ic(resp.text)  # dump the raw HTML so we can search it



Decoding the data


The data has been fetched successfully, so we can move on to extracting what we need.



We take part of the link shown above and search for it in the data we fetched.

Our goal is simple: the title and the video link.
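The Media tab shows the link decoded, while the page source (as the next paragraphs show) stores it percent-encoded, so searching for a piece that contains / can come up empty. A minimal sketch with a hypothetical helper, find_fragment, that tries both the plain and the encoded form; the sample string is made up:

```python
from typing import Optional
from urllib.parse import quote


def find_fragment(fragment: str, page: str) -> Optional[str]:
    """Report whether `fragment` occurs in `page` as-is or percent-encoded."""
    if fragment in page:
        return 'plain'
    # safe='' also escapes '/', matching how the page source stores the link
    if quote(fragment, safe='') in page:
        return 'encoded'
    return None


# Made-up sample shaped like the encoded page data
sample = '%22src%22%3A%22%2F%2Fv26-web.douyinvod.com%2Fvideo%2Ftos%2F'
print(find_fragment('douyinvod.com', sample))             # plain
print(find_fragment('//v26-web.douyinvod.com/', sample))  # encoded
print(find_fragment('v3-web', sample))                    # None
```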



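We've found our video link in the fetched data.

Comparing it with the browser's version shows that the link in the response is percent-encoded (URL-encoded), so it can't be used to download the video directly.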
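Percent-encoding replaces a character with % followed by its byte value in hex: %22 is ", %3A is :, %2F is /, %3D is = and %26 is &. Python's standard library converts in both directions; a quick sketch with urllib.parse (the article later uses requests.utils.unquote, which, as far as we know, is this same function re-exported by requests):

```python
from urllib.parse import quote, unquote

# Escapes that show up in the captured links
print(unquote('%22%3A%22'))            # ":"
print(unquote('%2F%2F'))               # //
print(unquote('%3D'), unquote('%26'))  # = &

# Encoding is the reverse; safe='' also escapes '/'
print(quote('a=b&c', safe=''))  # a%3Db%26c

# %25 is an escaped '%', so each unquote() call peels off one layer
once = unquote('cd%3D0%257C0')
print(once)           # cd=0%7C0
print(unquote(once))  # cd=0|0
```

The captured links contain values such as cd%3D0%257C0%257C0, so a single decoding pass, which is what the article does, leaves cd=0%7C0%7C0 escaped inside the video URL's own query string.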


So we'll first decode it with an online tool to test it.

Next, we extract the link and then decode it; this is where regular expressions come in.


Extracting the title is easy: the trusty non-greedy (.*?) captures whatever sits between the tags.
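(.*?) is the non-greedy form of (.*): it stops at the first point where the rest of the pattern can match, instead of the last. A contrived string with two title tags shows the difference. Note also that . does not match a newline unless re.DOTALL is passed, so a title split across lines would not be captured.

```python
import re

# Contrived sample: two </title> closers in one string
html = ('<title data-react-helmet="true"> Video A - 抖音</title>'
        '<title>Other</title>')

# Non-greedy: stops at the first </title>
lazy = re.findall('<title data-react-helmet="true"> (.*?)</title>', html)
print(lazy)    # ['Video A - 抖音']

# Greedy: runs on to the last </title> in the string
greedy = re.findall('<title data-react-helmet="true"> (.*)</title>', html)
print(greedy)  # ['Video A - 抖音</title><title>Other']
```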



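The link, however, is encoded, so we work out its pattern by comparing it with the link in the Media tab.

In the response, each video link follows the key src and ends where the encoded vr=" appears (%3D is the equals sign and %22 the double quote). So we can build the pattern as follows: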

'src(.*?)vr%3D%22'
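The pattern starts capturing right after src and stops at the first encoded vr=" it meets. Applied to a made-up, shortened string shaped like the page data, the capture begins with the encoded ":" and ends just before vr%3D%22:

```python
import re

# Made-up, shortened sample in the same shape as the encoded page data
page = ('%22playAddr%22%3A%5B%7B%22src%22%3A%22%2F%2Fv26-web.douyinvod.com'
        '%2Fabc%2F%3Fbr%3D3071%26vl%3D%26vr%3D%22%7D%5D')

# Non-greedy capture between "src" and the encoded vr=" after each link
links = re.findall('src(.*?)vr%3D%22', page)
print(links)
# ['%22%3A%22%2F%2Fv26-web.douyinvod.com%2Fabc%2F%3Fbr%3D3071%26vl%3D%26']
```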

# 3. Parse the data
title = re.findall('<title data-react-helmet="true"> (.*?)</title>', resp.text)[0]
href = re.findall('src(.*?)vr%3D%22', resp.text)


That gives us quite a few href matches. Which one is the one we need?

Let's just loop over the list and print each item h with ic(h):

ic| h: ('="https://lf1-cdn-tos.bytescm.com/obj/static/log-sdk/collect/collect.js"></script><meta '
        'data-react-helmet="true" charset="UTF-8"/><meta data-react-helmet="true" '
        'name="viewport" content="width=device-width,initial-scale=1"/><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/xgplayerLive.7572286b.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lib.ui.9df85a56.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lib.util.43069e5f.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/react.vendor2.443c54c8.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/lottie.c89672be.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/bytejs.b9eaa346.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/vendors2.6038bfa3.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/chunk/vendors.58c74f55.js"></script><script '
        'defer="defer" '
        'src="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/index.16edc129.js"></script><link '
        'href="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/vendors.dbe96361.css" '
        'rel="stylesheet"><link '
        'href="https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/index.86bc1c1a.css" '
        'rel="stylesheet"><script id="RENDER_DATA" '
        'type="application/json">%7B%22_location%22%3A%22%2Fvideo%2F7004173561606753539%22%2C%22C_0%22%3A%7B%22abTestData%22%3A%7B%22navTabRecommendType%22%3A1%2C%22navTabFollowType%22%3A1%2C%22navTabHotType%22%3A1%7D%2C%22odin%22%3A%7B%22user_id%22%3A%22651331000075319%22%2C%22user_type%22%3A12%2C%22user_is_auth%22%3A1%2C%22user_unique_id%22%3A%227019856247298475561%22%7D%2C%22user%22%3A%7B%22isLogin%22%3Atrue%2C%22info%22%3A%7B%22uid%22%3A%22651331000075319%22%2C%22secUid%22%3A%22MS4wLjABAAAA6ZQ5xEpEBo8tspMAC7ehXEcHs7JybDRoyOQcEKaKMXI%22%2C%22shortId%22%3A%224105451342%22%2C%22nickname%22%3A%22%E4%B8%8D%E5%8D%96%E8%90%8C%E7%9A%84%E9%82%93%E8%82%AF%22%2C%22desc%22%3A%22%22%2C%22gender%22%3A2%2C%22avatarUrl%22%3A%22%2F%2Fp3.douyinpic.com%2Faweme%2F100x100%2Ftos-cn-i-0813%2Fd2cba90a048c46e7a3e55302e940b1dc.jpeg%22%2C%22avatar300Url%22%3A%22%2F%2Fp3.douyinpic.com%2Fimg%2Ftos-cn-i-0813%2Fd2cba90a048c46e7a3e55302e940b1dc~c5_300x300.webp%22%2C%22followStatus%22%3A0%2C%22followerStatus%22%3A0%2C%22awemeCount%22%3A12%2C%22followingCount%22%3A29%2C%22followerCount%22%3A14%2C%22mplatformFollowersCount%22%3A14%2C%22favoritingCount%22%3A173%2C%22totalFavorited%22%3A22%2C%22uniqueId%22%3A%22dk_818%22%2C%22customVerify%22%3A%22%22%2C%22enterpriseVerifyReason%22%3A%22%22%2C%22secret%22%3A0%2C%22userCanceled%22%3Afalse%2C%22roomData%22%3A%7B%7D%2C%22shareQrcodeUrl%22%3A%22https%3A%2F%2Fp3.douyinpic.com%2Fobj%2F31735000ab94954cbd483%22%2C%22roomId%22%3A0%2C%22favoritePermission%22%3A1%7D%2C%22statusCode%22%3A0%2C%22isSpider%22%3Afalse%7D%2C%22isSpider%22%3Afalse%7D%2C%22C_19%22%3A%7B%22awemeId%22%3A%227004173561606753539%22%2C%22logPb%22%3A%22%7B%5C%22impr_id%5C%22%3A%5C%22021634474240681fdbddc0100fff0030ad295880000002b3be82d%5C%22%7D%22%2C%22aweme%22%3A%7B%22statusCode%22%3A0%2C%22detail%22%3A%7B%22awemeId%22%3A%227004173561606753539%22%2C%22awemeType%22%3A0%2C%22groupId%22%3A%226982868964862971143%22%2C%22authorInfo%22%3A%7B%22uid%22%3A%222568934929994387%22%2C%22secUid%22%3A%22M
S4wLjABAAAAS4y5ucL_DoxSbSJBPC0doCNy94lXd6CnDTl7l8UTpWkwQ_ZK9GxsxqFxJRrl2doF%22%2C%22nickname%22%3A%22%E6%A5%A0%E6%A5%A0.%22%2C%22remarkName%22%3A%22%22%2C%22avatarUri%22%3A%22%2F%2Fp3-pc.douyinpic.com%2Fimg%2Ftos-cn-i-0813%2F33c618100012493cb30d0559fff786dc~c5_100x100.jpeg%3Ffrom%3D116350172%22%2C%22followerCount%22%3A453%2C%22totalFavorited%22%3A7050%2C%22followStatus%22%3A0%2C%22followerStatus%22%3A0%2C%22enterpriseVerifyReason%22%3A%22%22%2C%22customVerify%22%3A%22%22%7D%2C%22desc%22%3A%22%E8%8A%B1%E4%BA%86%E7%82%B9%E6%97%B6%E9%97%B4%E5%89%AA%E9%9B%86%23%E4%BA%95%E5%B7%9D%E9%87%8C%E4%BA%88%20%E8%B7%B3%E8%88%9E%E8%A7%86%E9%A2%91%E5%90%88%E9%9B%86%EF%BC%8C%E5%96%9C%E6%AC%A2%E5%8F%AF%E4%BB%A5%E6%94%AF%E6%8C%81%E4%B8%80%E4%B8%8B%E6%88%91%E4%B8%AB%E8%B0%A2%E8%B0%A2%E5%AE%B6%E9%93%B6%E4%BB%AC%E2%9D%A4%EF%B8%8F%23%E7%83%AD%E9%97%A8%23%E6%8E%A8%E5%B9%BF%E5%B0%8F%E5%8A%A9%E6%89%8B%20%20%40DOU%2B%E5%B0%8F%E5%8A%A9%E6%89%8B%22%2C%22authorUserId%22%3A%222568934929994387%22%2C%22createTime%22%3A1630786249%2C%22textExtra%22%3A%5B%7B%22start%22%3A7%2C%22end%22%3A12%2C%22type%22%3A1%2C%22hashtagId%22%3A%221662018461135883%22%2C%22hashtagName%22%3A%22%E4%BA%95%E5%B7%9D%E9%87%8C%E4%BA%88%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A37%2C%22end%22%3A40%2C%22type%22%3A1%2C%22hashtagId%22%3A%221588489879306259%22%2C%22hashtagName%22%3A%22%E7%83%AD%E9%97%A8%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A40%2C%22end%22%3A46%2C%22type%22%3A1%2C%22hashtagId%22%3A%221627250169964547%22%2C%22hashtagName%22%3A%22%E6%8E%A8%E5%B9%BF%E5%B0%8F%E5%8A%A9%E6%89%8B%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%22%22%2C%22isCommerce%22%3Afalse%7D%2C%7B%22start%22%3A49%2C%22end%22%3A58%2C%22type%22%3A0%2C%22hashtagId%22%3A%22%22%2C%22hashtagName%22%3A%22%22%2C%22awemeId%22%3A%22%22%2C%22userId%22%3A%2270258503077%22%2C%22isCommerce%22%3Afalse%7D%5D%2C%22userDigged%22%3Afalse%2C%22video%22%3A%
7B%22width%22%3A1080%2C%22height%22%3A1920%2C%22ratio%22%3A%221080p%22%2C%22duration%22%3A51851%2C%22playAddr%22%3A%5B%7B%22src%22%3A%22%2F%2Fv26-web.douyinvod.com%2F4655cc98bb0358b53cd639b679207958%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F8c10edb56f1546cc97f1ea484b390be8%2F%3Fa%3D6383%26br%3D3071%26bt%3D3071%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D4%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaDdpPDY6OmVpNzlmM2VoZmdqNDVrcjRvMG1gLS1kLTBzczJfXjUvM2A1Xy1jXzAuXi46Yw%253D%253D%26vl%3D%26')
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2F33673180fc481002021f006712e77be1%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F8c10edb56f1546cc97f1ea484b390be8%2F%3Fa%3D6383%26br%3D3071%26bt%3D3071%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D4%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaDdpPDY6OmVpNzlmM2VoZmdqNDVrcjRvMG1gLS1kLTBzczJfXjUvM2A1Xy1jXzAuXi46Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv26-web.douyinvod.com%2F8d29f309041395f0f17eeb2875bbfcb0%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F7f7d16ce9e8a4c06b25dc58132327a79%2F%3Fa%3D6383%26br%3D2054%26bt%3D2054%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D3%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApNzRkMzo8PGU3N2Y2PGg2OGdqNDVrcjRvMG1gLS1kLTBzc2I1YDBfMDBeXjE2MWBgNTQ6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2Facabc63318f8e681e83a2ff71a3c8206%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2F7f7d16ce9e8a4c06b25dc58132327a79%2F%3Fa%3D6383%26br%3D2054%26bt%3D2054%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D3%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApNzRkMzo8PGU3N2Y2PGg2OGdqNDVrcjRvMG1gLS1kLTBzc2I1YDBfMDBeXjE2MWBgNTQ6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv26-web.douyinvod.com%2F8fa32b926628584a42e222b32d81c239%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2Fccf5fff2e52f430896430aa9bd3e5c43%2F%3Fa%3D6383%26br%3D1901%26bt%3D1901%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D6%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaGhoZDtmZDs8Nzo5ZGc6OmdqNDVrcjRvMG1gLS1kLTBzcy4zMl5gLTJgLi82MS40NDI6Yw%253D%253D%26vl%3D%26'
ic| h: '%22%3A%22%2F%2Fv3-web.douyinvod.com%2F499af49c2cee102d8a427272bee6e903%2F616c2743%2Fvideo%2Ftos%2Fcn%2Ftos-cn-ve-15%2Fccf5fff2e52f430896430aa9bd3e5c43%2F%3Fa%3D6383%26br%3D1901%26bt%3D1901%26cd%3D0%257C0%257C0%26ch%3D26%26cr%3D0%26cs%3D0%26cv%3D1%26dr%3D0%26ds%3D6%26er%3D%26ft%3Djal9wj--bz7ThWZRpvct%26l%3D021634474240681fdbddc0100fff0030ad295880000002b3be82d%26lr%3Dall%26mime_type%3Dvideo_mp4%26net%3D0%26pl%3D0%26qs%3D0%26rc%3Dajw0bmc6Znh3NzMzNGkzM0ApaGhoZDtmZDs8Nzo5ZGc6OmdqNDVrcjRvMG1gLS1kLTBzcy4zMl5gLTJgLi82MS40NDI6Yw%253D%253D%26vl%3D%26'


The first item can clearly be ruled out: it's a jumble of HTML markup, not a link.
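One plausible reading of the output above: the pattern also fires on the src attribute of the page's ordinary script tags, and because (.*?) only stops at the first vr%3D%22, that first capture runs all the way into the first encoded link (the first ic| entry does end with a v26-web link). A sketch on a made-up string, plus a filter that keeps only captures starting with the encoded ":", as an alternative to hard-coding an index:

```python
import re

# Made-up sample: an ordinary <script src=...> tag precedes two encoded links
page = ('<script src="https://example.com/a.js"></script>'
        '%7B%22src%22%3A%22%2F%2Fv26-web.douyinvod.com%2Fabc%2F%3Fbr%3D3071%26vl%3D%26vr%3D%22%7D'
        '%7B%22src%22%3A%22%2F%2Fv3-web.douyinvod.com%2Fabc%2F%3Fbr%3D3071%26vl%3D%26vr%3D%22%7D')

matches = re.findall('src(.*?)vr%3D%22', page)

# The first match begins at the <script> tag's src and swallows the v26 link
print(matches[0][:40])

# A real link capture starts right after "src" with the encoded '":"'
links = [m for m in matches if m.startswith('%22%3A%22')]
print(len(matches), len(links))  # 2 1
```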

Any of the remaining ones could be it, so let's first decode the first of them to test it, using an online URL decoder:


http://tool.chinaz.com/tools/urlencode.aspx



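After decoding, we find that the leading ":" (quote, colon, quote) has to be replaced with https: to get the video's real request URL.

Also, among these video links, the first one is the highest-definition: its br parameter, which appears to be the bitrate, is 3071, versus 2054 and 1901 for the later ones.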
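Rather than relying on list order, one could pick the link with the largest br value. A sketch, assuming br (visible in each decoded link) is a bitrate where higher means better quality; the helper names to_video_url, bitrate and best_link are hypothetical, and the sample captures are made up:

```python
from typing import List
from urllib.parse import parse_qs, unquote, urlsplit


def to_video_url(captured: str) -> str:
    """Decode a captured link and turn its leading '":"' into 'https:'."""
    decoded = unquote(captured)
    if decoded.startswith('":"'):
        decoded = 'https:' + decoded[len('":"'):]
    return decoded


def bitrate(url: str) -> int:
    """Return the br query parameter as an int, or 0 if it is missing."""
    return int(parse_qs(urlsplit(url).query).get('br', ['0'])[0])


def best_link(captured: List[str]) -> str:
    """Keep real link captures only and return the one with the largest br."""
    urls = [to_video_url(c) for c in captured if c.startswith('%22%3A%22')]
    return max(urls, key=bitrate)


# Made-up captures in the same shape as the ic| output above
captured = [
    '="junk.js"></script>',
    '%22%3A%22%2F%2Fv3-web.douyinvod.com%2Fy%2F%3Fa%3D6383%26br%3D3071%26vl%3D%26',
    '%22%3A%22%2F%2Fv3-web.douyinvod.com%2Fz%2F%3Fa%3D6383%26br%3D2054%26vl%3D%26',
]
print(best_link(captured))
# https://v3-web.douyinvod.com/y/?a=6383&br=3071&vl=&
```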



Now let's do the decoding in code:


# 3. Parse the data
title = re.findall('<title data-react-helmet="true"> (.*?)</title>', resp.text)[0]
href = re.findall('src(.*?)vr%3D%22', resp.text)[1]  # index 1: the first real link
# 4. Decode the link and turn the leading '":"' into 'https:'
video_url = requests.utils.unquote(href).replace('":"', 'https:')




Saving the data


With parsing done, we have the URL and title we wanted, and the download itself is simple:


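# 5. Save the data
import os
os.makedirs('抖音高清视频', exist_ok=True)  # make sure the output folder exists
video_content = requests.get(url=video_url).content
with open(os.path.join('抖音高清视频', title + '.mp4'), mode='wb') as fin:
    fin.write(video_content)
    print(title + '.mp4文件下载完成!!')  # "<title>.mp4 downloaded!!"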
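Two things in the save step may bite in practice: .content holds the whole video in memory, and a title can contain characters such as / or ? that are not valid in file names. A sketch that streams the body in chunks with stream=True and iter_content, and replaces the characters Windows rejects; safe_filename and save_video are hypothetical names, and the 30-second timeout and 64 KiB chunk size are arbitrary choices:

```python
import os
import re

import requests


def safe_filename(title: str) -> str:
    """Replace characters Windows forbids in file names (and line breaks)."""
    return re.sub(r'[\\/:*?"<>|\r\n]+', '_', title).strip() or 'video'


def save_video(video_url: str, title: str, folder: str = '抖音高清视频') -> str:
    """Stream the video to disk in chunks instead of holding it all in memory."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(title) + '.mp4')
    with requests.get(video_url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
    return path
```

Usage would be path = save_video(video_url, title), with video_url and title from the parsing step.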


Finally, we wrap everything in a function, so that downloading a new video only takes its video ID:


def download_douyin(video_id):
    ...  # the steps above, with url = f'https://www.douyin.com/video/{video_id}'
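Put together, the steps above might look like the following. This is a sketch, not the author's code: it moves the parsing into a hypothetical helper, extract_title_and_url, so that part can be tried on a saved page without a network call; it keeps the article's regexes and index [1]; and it assumes headers is the dict from step 1 with a valid cookie.

```python
import os
import re
from typing import Tuple
from urllib.parse import unquote

import requests


def extract_title_and_url(html: str) -> Tuple[str, str]:
    """Apply the article's two regexes and decode the chosen link."""
    title = re.findall('<title data-react-helmet="true"> (.*?)</title>', html)[0]
    href = re.findall('src(.*?)vr%3D%22', html)[1]  # index 1, as in the article
    return title, unquote(href).replace('":"', 'https:')


def download_douyin(video_id: str, headers: dict, folder: str = '抖音高清视频') -> str:
    """Fetch the video page, extract title and link, and save the .mp4."""
    resp = requests.get(f'https://www.douyin.com/video/{video_id}',
                        headers=headers, timeout=30)
    resp.raise_for_status()
    title, video_url = extract_title_and_url(resp.text)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, title + '.mp4')
    with open(path, 'wb') as f:
        f.write(requests.get(video_url, timeout=60).content)
    print(title + '.mp4 downloaded')
    return path


# Usage (needs network access and a valid cookie in headers):
# download_douyin('7004173561606753539', headers)
```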


Let's see the results!


花了点时间剪集#井川里予 跳舞视频合集,喜欢可以支持一下我丫谢谢家银们 - 抖音.mp4文件下载完成!!心过好每一天。


万丈高楼平地起,辉煌只能靠自己。#社会慢摇 - 抖音.mp4


喜欢哪个#穿搭#身材 - 抖音.mp4


HD and watermark-free, right?



That's it for today.

Next time, I'll show how to batch-scrape Douyin videos with selenium~~~



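The 【印象python】 discussion group is now officially open!
Scan the QR code to add 老邓 as a friend, and you'll be added to the group once approved.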

Book giveaway



About the book



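Comprehensive: covers all the techniques needed for data analysis and big data processing, including fundamentals, core concepts and the implementation workflow, from preparing the programming language, data collection and cleaning, and data analysis and visualization, to distributed storage and distributed computing for large-scale data.


In-depth: one book thoroughly covers 1 programming language and 14 data analysis and big data processing tools, as well as big data analysis techniques and project development methods.


Rich: includes 45 "beginner Q&A" entries, "hands-on exercises" in 17 chapters, 3 comprehensive projects, and 50 selected Python interview questions.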


Limited-time offer 👇



Giveaway rules


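How: 2 books in total, shipped free, all given away through comments!


Comment: leave a comment below this article on the topic: What do you think about data analysis?

Draw: 8 thoughtful comments will be selected to enter a group draw for the books!


Draw time: October 22, 2021, 20:00. Winners who don't contact me within 12 hours of the draw forfeit the prize; late claims will not be accepted.


Draw rules:

1. Before the deadline, like this article and tap 在看 ("Wow"); you'll need to show a screenshot when claiming the prize, otherwise it is void.

2. Participants must add 老邓 as a friend before the deadline, otherwise the prize is void!

3. Limit one book per person!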
