A short-video crawler based on Scrapy that crawls the target sites by search query.
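Supported sites (✔️ working, 🚧 in progress):

| Site | Spider name | Status |
|---|---|---|
| Kuaishou | kuaishou | ✔️ |
| Xigua Video | ixigua | 🚧 |
| 新片场 (Xinpianchang) | xinpianchang | ✔️ |
| Haokan Video | haokan | 🚧 |
| 度小视/全民小视频 (Duxiaoshi / Quanmin Video)* | quanmin | ✔️ |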
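
*The official Duxiaoshi / Quanmin Video website has been taken offline, but the quanmin spider in this project still works (tested June 2024).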
Requirements:

- Python 3.10+
- Poetry
Install:

```bash
git clone https://github.com/dxsooo/ShortVideoCrawl
cd ShortVideoCrawl
poetry install --only main
# on Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin
poetry shell
```
For example:

```bash
cd shortvideocrawl

# main parameters:
#   query: search keyword
#   count: number of videos to fetch

# kuaishou
scrapy crawl kuaishou -a query='蔡徐坤' -a count=50

# ixigua (in progress): highest resolution, size under 64 MB, duration under 5 min
# scrapy crawl ixigua -a query='蔡徐坤' -a count=50

# xinpianchang: highest resolution, size under 64 MB, duration under 5 min;
# only a fixed number of videos can be fetched, so count is not passed
scrapy crawl xinpianchang -a query='蔡徐坤'

# haokan (in progress): highest resolution
# scrapy crawl haokan -a query='蔡徐坤' -a count=50

# quanmin
scrapy crawl quanmin -a query='蔡徐坤' -a count=50
```
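To crawl several keywords in one go, the `scrapy crawl` commands can be wrapped in a loop. The sketch below is not part of the project: `crawl_queries` is a hypothetical helper that assumes it runs from the `shortvideocrawl` directory inside the Poetry environment. It passes `count` to every run, so it suits spiders that accept `count` (kuaishou, quanmin) rather than xinpianchang.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ShortVideoCrawl): run one spider once per query.
# Assumes the current directory is shortvideocrawl/ and scrapy is on PATH.
crawl_queries() {
  local spider="$1" count="$2"
  shift 2
  local q
  for q in "$@"; do
    # each query becomes a separate, sequential `scrapy crawl` run
    scrapy crawl "$spider" -a query="$q" -a count="$count"
  done
}
```

Usage: `crawl_queries kuaishou 50 '蔡徐坤' 'keyword2'`. Running the crawls one after another keeps each run independent, so a failure on one keyword does not affect the others.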
Videos are saved in `./videos`, each named with its video ID on the source platform.
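Because each file is named after the platform's video ID, the IDs already downloaded can be read back from the file names, for example to check progress. This is a hypothetical helper, not part of the project; it assumes one file per video named `<video id>.<extension>` (the actual extension is not specified here).

```bash
# Hypothetical helper: print the video IDs found in a download directory.
# Assumes files are named "<video id>.<ext>"; defaults to ./videos.
list_video_ids() {
  local dir="${1:-./videos}"
  local f name
  for f in "$dir"/*; do
    [ -f "$f" ] || continue     # skip subdirectories and an unmatched glob
    name="${f##*/}"             # strip the directory part
    printf '%s\n' "${name%.*}"  # strip the last extension
  done
}
```

Usage: `list_video_ids ./videos | wc -l` counts the downloaded videos.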