name: bilibili-video-download-analyze
category: media
description: Download Bilibili videos from restricted network environments (China server, no browser access). Bypasses Bilibili's anti-scraping checks via the direct API, downloads the video and audio streams separately, merges them with ffmpeg, and extracts keyframe screenshots for analysis.
trigger: When the user shares a Bilibili video link (b23.tv or bilibili.com) and you need to view the content, but browser navigation times out or fails due to network restrictions.


Bilibili Video Download & Analysis Pipeline

Overview

When Bilibili (B站) is inaccessible via browser (timeouts, HTTP 412 errors, anti-bot challenges), use this pipeline to download and analyze video content programmatically.

Prerequisites

  • ffmpeg — already available on the system
  • Python 3 stdlib (urllib, json, base64) — already available
  • No external dependencies needed for the core pipeline (the optional thumbnail snippet in Step 6 uses Pillow)

Pipeline Steps

Step 1: Resolve Short URL → BV ID

curl -sL -o /dev/null -w "%{url_effective}" "https://b23.tv/fvAHbOe"
# Extract BV ID from the redirected URL: BV1Hz9fBuE29
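The extraction mentioned in the comment above can be sketched in Python. This is a minimal sketch that assumes a BV ID is the literal prefix "BV" followed by 10 alphanumeric characters; the function name and the query string in the sample URL are hypothetical.

```python
import re
from typing import Optional

# Assumed pattern: "BV" followed by 10 alphanumeric characters.
BVID_PATTERN = re.compile(r"BV[0-9A-Za-z]{10}")


def extract_bvid(url: str) -> Optional[str]:
    """Return the first BV ID found in a URL, or None if there is none."""
    match = BVID_PATTERN.search(url)
    return match.group(0) if match else None


# Hypothetical redirected URL, shaped like the output of the curl command.
redirected = "https://www.bilibili.com/video/BV1Hz9fBuE29?share_source=copy_web"
print(extract_bvid(redirected))
```

Returning None rather than raising lets the caller decide whether a missing ID means the redirect failed or the link was not a video link.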

Step 2: Get Video Info (AID, CID, Title, Duration)

import urllib.request, json

bvid = "BV1Hz9fBuE29"
url = f"https://api.bilibili.com/x/web-interface/view?bvid={bvid}"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
resp = urllib.request.urlopen(req, timeout=15)
data = json.loads(resp.read())
aid = data['data']['aid']
cid = data['data']['pages'][0]['cid']
title = data['data']['title']
duration = data['data']['duration']  # seconds
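The snippet above indexes straight into the response, so an API error surfaces as a KeyError. A hedged sketch of a safer parser follows. It assumes the response carries a top-level `code` field that is 0 on success and a `message` field with an error description; the function name and the sample payload are hypothetical.

```python
def parse_view_payload(payload: dict) -> dict:
    """Extract aid, cid, title and duration, failing loudly on API errors."""
    # Assumption: code == 0 means success; message describes the failure.
    if payload.get("code") != 0:
        raise RuntimeError(
            f"view API error {payload.get('code')}: {payload.get('message')}"
        )
    info = payload["data"]
    return {
        "aid": info["aid"],
        "cid": info["pages"][0]["cid"],  # first part of a multi-part video
        "title": info["title"],
        "duration": info["duration"],  # seconds
    }


# Made-up payload with the same shape the snippet above relies on.
sample = {
    "code": 0,
    "message": "0",
    "data": {"aid": 1, "title": "demo", "duration": 95, "pages": [{"cid": 2}]},
}
print(parse_view_payload(sample))
```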

Step 3: Get Video/Audio Stream URLs from the Bilibili DASH API

url = f"https://api.bilibili.com/x/player/playurl?avid={aid}&cid={cid}&qn=16&otype=json&fnver=0&fnval=4048"
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://www.bilibili.com"
})
resp = urllib.request.urlopen(req, timeout=20)
data = json.loads(resp.read())
dash = data['data']['dash']

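# Video: pick the lowest quality explicitly (smallest quality id, e.g. id=16 ~360p or id=32 ~480p).
# Do not rely on list order: index 0 is not guaranteed to be the lowest quality.
video_url = min(dash['video'], key=lambda v: v['id'])['base_url']

# Audio: pick the lowest bitrate explicitly (smallest id, e.g. id=30216, mp4a.40.2)
audio_url = min(dash['audio'], key=lambda a: a['id'])['base_url']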
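The fnval=4048 value in the request URL above can be read as a bit mask. The sketch below decodes it; the flag names come from community documentation of the API and are assumptions, not an official specification.

```python
# Assumed flag meanings, taken from community documentation of the API.
FNVAL_FLAGS = {
    16: "DASH",
    64: "HDR",
    128: "4K",
    256: "Dolby audio",
    512: "Dolby Vision",
    1024: "8K",
    2048: "AV1",
}


def decode_fnval(fnval: int) -> list:
    """List the names of all flags whose bit is set in fnval."""
    return [name for bit, name in FNVAL_FLAGS.items() if fnval & bit]


print(decode_fnval(4048))
print(decode_fnval(16))
```

Under these assumptions, 4048 is the sum of all seven flags, and fnval=16 alone would request plain DASH.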

Key parameters:

  • qn=16 → 360p quality (qn=32 is 480p)
  • fnval=4048 → DASH format
  • Headers must include Referer: https://www.bilibili.com to bypass simple checks

Step 4: Download Video & Audio Streams

def download_media(url, output_path, desc):
    # Send the same User-Agent and Referer headers as in Step 3
    req = urllib.request.Request(url, headers={...})
    with urllib.request.urlopen(req, timeout=120) as response:
        with open(output_path, 'wb') as f:
            while True:
                chunk = response.read(8192)
                if not chunk: break
                f.write(chunk)

download_media(video_url, "/tmp/bili_video.m4s", "video stream")
download_media(audio_url, "/tmp/bili_audio.m4s", "audio stream")
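The chunked copy loop can be isolated and exercised without network access. The following is a minimal sketch: `save_stream` is a hypothetical helper name, and an in-memory stream stands in for the object returned by `urlopen()`.

```python
import io
import os
import tempfile


def save_stream(response, output_path: str, chunk_size: int = 8192) -> int:
    """Copy a file-like response to disk in chunks and return the byte count."""
    written = 0
    with open(output_path, "wb") as f:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            written += len(chunk)
    return written


# In-memory stand-in for the HTTP response; the file name is arbitrary.
fake_response = io.BytesIO(b"x" * 20000)
target = os.path.join(tempfile.gettempdir(), "bili_demo.m4s")
size = save_stream(fake_response, target)
print(size, os.path.getsize(target))
```

Returning the byte count gives the caller a cheap way to detect an empty download before handing the file to ffmpeg.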

Step 5: Merge with ffmpeg

ffmpeg -y -i /tmp/bili_video.m4s -i /tmp/bili_audio.m4s -c:v copy -c:a aac -strict experimental /tmp/bili_final.mp4
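When the pipeline is driven from Python, the same merge can be run through `subprocess`. This sketch mirrors the command above argument for argument; the helper names are hypothetical, and `merge` is defined but not called here.

```python
import shutil
import subprocess


def build_merge_cmd(video_path: str, audio_path: str, output_path: str) -> list:
    """Build the ffmpeg argument list used to mux video and audio."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",            # keep the video stream as-is
        "-c:a", "aac",             # encode audio to AAC, as in the command above
        "-strict", "experimental",
        output_path,
    ]


def merge(video_path: str, audio_path: str, output_path: str) -> None:
    """Run the merge, raising if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_merge_cmd(video_path, audio_path, output_path), check=True)


cmd = build_merge_cmd("/tmp/bili_video.m4s", "/tmp/bili_audio.m4s", "/tmp/bili_final.mp4")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a path ever contains spaces.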

Step 6: Extract Keyframes for Vision Analysis

mkdir -p /tmp/bili_frames
# Every 30 seconds
ffmpeg -y -i /tmp/bili_final.mp4 -vf "fps=1/30" -q:v 2 /tmp/bili_frames/frame_%03d.jpg

# Or create smaller thumbnails for API compatibility
python3 << 'PYEOF'
from PIL import Image  # requires Pillow (pip install Pillow)
# For vision APIs with size limits, shrink to fit within 480x360
img = Image.open("/tmp/bili_frames/frame_001.jpg")
img.thumbnail((480, 360))
img.save("/tmp/bili_frames/small_001.jpg", "JPEG", quality=70)
PYEOF
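The `thumbnail((480, 360))` call keeps the aspect ratio and never enlarges an image, so the output is not always exactly 480x360. The arithmetic can be sketched without Pillow. This is an approximation for illustration only: `fit_within` is a hypothetical helper, and Pillow's own rounding may differ by a pixel in edge cases.

```python
def fit_within(width: int, height: int, max_w: int = 480, max_h: int = 360) -> tuple:
    """Scale (width, height) down to fit inside (max_w, max_h), keeping the ratio."""
    # Cap the scale at 1.0 so small images are never enlarged.
    scale = min(max_w / width, max_h / height, 1.0)
    return (max(1, round(width * scale)), max(1, round(height * scale)))


print(fit_within(1920, 1080))  # 16:9 frame, limited by width
print(fit_within(320, 240))    # already small enough, unchanged
```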

Step 7: Get Comments (for context)

# Comments API (may require auth for some videos)
url = f"https://api.bilibili.com/x/v2/reply?oid={aid}&type=1&pn=1&ps=20&sort=2"
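Once the comments response is parsed, the text of each comment has to be pulled out of the payload. The sketch below assumes the usual response envelope (`code`, then `data.replies[]` with the text under `content.message`); these field names are assumptions, and the function name and sample payload are hypothetical.

```python
def extract_comments(payload: dict, limit: int = 20) -> list:
    """Return up to `limit` comment texts from a parsed comments response."""
    # Assumption: code == 0 on success, comments under data.replies.
    if payload.get("code") != 0:
        return []
    replies = (payload.get("data") or {}).get("replies") or []
    messages = []
    for reply in replies[:limit]:
        message = (reply.get("content") or {}).get("message")
        if message:
            messages.append(message)
    return messages


# Made-up payload following the assumed shape.
sample_comments = {
    "code": 0,
    "data": {
        "replies": [
            {"content": {"message": "Great tutorial"}},
            {"content": {"message": "Which Unity version is this?"}},
            {"content": {}},
        ]
    },
}
print(extract_comments(sample_comments))
```

Treating a missing or null `replies` list as empty keeps the pipeline going for videos with comments disabled.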

Step 8: Check for Subtitles

url = f"https://api.bilibili.com/x/player/v2?cid={cid}&aid={aid}"
# With data = json.loads(resp.read()) as in Step 2, check
# data['data']['subtitle']['subtitles'] for available subtitle URLs

If subtitles exist, fetch them directly. If not, the video has no subtitle track and may have no narration at all: run the downloaded audio through a speech-to-text API (if available) or rely on comments for context.
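A fetched subtitle file still needs to be flattened into text before it is useful as context. The sketch below assumes the subtitle JSON has a `body` list whose items carry `from` (start time in seconds) and `content`, and that subtitle URLs may be protocol-relative (starting with `//`); both are assumptions, and the helper names and sample data are hypothetical.

```python
def normalize_subtitle_url(url: str) -> str:
    """Prefix protocol-relative URLs with https: so urllib can fetch them."""
    return "https:" + url if url.startswith("//") else url


def subtitle_to_text(subtitle_json: dict) -> str:
    """Render subtitle entries as '[MM:SS] text' lines."""
    lines = []
    for item in subtitle_json.get("body", []):
        start = int(item["from"])  # whole seconds from the start of the video
        lines.append(f"[{start // 60:02d}:{start % 60:02d}] {item['content']}")
    return "\n".join(lines)


# Made-up subtitle data following the assumed format.
sample_subtitle = {
    "body": [
        {"from": 0.5, "to": 2.0, "content": "hello"},
        {"from": 65.2, "to": 67.0, "content": "world"},
    ]
}
print(subtitle_to_text(sample_subtitle))
```

The timestamps make it easier to line up transcript text with the keyframes extracted in Step 6.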

Known Issues & Workarounds

❌ you-get / lux / BBDown fail

These tools hit Bilibili's HTTP 412 anti-bot response or are blocked outright. Don't waste time installing them; use the direct DASH API approach above, which works without cookies.

❌ Browser navigation to Bilibili times out

Don't retry browser_navigate — it will hang for 60s each time. Use the API pipeline instead.

❌ you-get bilibili parser is broken

As of 2026-05, you-get's bilibili extractor crashes with TypeError: the JSON object must be str, bytes or bytearray, not NoneType due to HTML structure changes. Don't rely on it.

✅ Direct Dash API works

The /x/player/playurl endpoint with proper headers (User-Agent + Referer) still works for public videos without login.

Step 9 (FINAL): Analyze Screenshots with mmx CLI (MiniMax VLM)

If mmx CLI is installed and authenticated (has MiniMax API key), use it to analyze screenshots directly — no vision-capable LLM provider needed:

# Check if mmx is available
which mmx && mmx --version

# Check auth status
mmx auth status

# Analyze a single frame
# Prompt: "Describe everything in this Unity screenshot in detail"
mmx vision describe --image /tmp/bili_frames/frame_015.jpg \
  --prompt "详细描述这个Unity截图中的所有内容" \
  --output text --quiet

# Analyze multiple frames in sequence to reconstruct video narrative
# Prompt: "Describe the layers, parameters, states, and transition logic in the Animator state machine"
mmx vision describe --image /tmp/bili_frames/frame_024.jpg \
  --prompt "请描述Animator状态机中的图层、参数、状态和连线逻辑" \
  --output text --quiet

Key mmx vision options:

  • --image <path-or-url>: local file path or URL (auto base64)
  • --file-id <id>: pre-uploaded file ID (for large images)
  • --prompt <text>: question about the image
  • --output text: human-readable response
  • --quiet: suppress banners

Optimal framing for Bilibili model-modding (改模) tutorial analysis:

  • Analyze keyframes every 10 to 30 seconds (fps=1/10 or fps=1/30)
  • Use prompts that specifically ask for: Animator layers, parameters (Parameters), conditions (Conditions), Animation window curves, Shader settings, and any visible text or buttons
  • Chinese-language prompts work best for Chinese UI screenshots
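To analyze a run of frames rather than one at a time, the mmx invocations can be generated in a loop. The sketch below only builds the argument lists and uses exactly the flags listed above; the helper names are hypothetical, and the frame file names follow the frame_%03d.jpg pattern from Step 6.

```python
def frame_paths(frame_dir: str, first: int, last: int, step: int = 1) -> list:
    """List frame file paths from first to last (inclusive), every `step` frames."""
    return [f"{frame_dir}/frame_{n:03d}.jpg" for n in range(first, last + 1, step)]


def build_mmx_cmd(image_path: str, prompt: str) -> list:
    """Build one 'mmx vision describe' argument list for a single frame."""
    return [
        "mmx", "vision", "describe",
        "--image", image_path,
        "--prompt", prompt,
        "--output", "text",
        "--quiet",
    ]


# Chinese prompt reused from the example above (Animator state machine details).
PROMPT = "请描述Animator状态机中的图层、参数、状态和连线逻辑"
commands = [
    build_mmx_cmd(path, PROMPT)
    for path in frame_paths("/tmp/bili_frames", 15, 24, step=3)
]
for command in commands:
    print(command[4])
```

Each list can be passed to `subprocess.run(..., capture_output=True, text=True)` to collect the descriptions in frame order.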

⚠️ Vision Analysis Fallback Chain

When vision is needed but the current LLM provider doesn't support image_url:

  1. Try mmx vision describe first (MiniMax VLM CLI): works if mmx is installed and an API key is configured
  2. Try browser_vision — if browser can load the page with the image
  3. Fallback: download the video locally via this skill's Steps 1-8, extract keyframes, then prompt the user to switch to a vision-capable session

Verification Checklist

  1. Step 1: BV ID extracted from short URL
  2. Step 2: AID + CID obtained from view API
  3. Step 3: Dash API returns video + audio URLs (check code=0)
  4. Step 4: Both streams downloaded (check file sizes > 0)
  5. Step 5: ffmpeg merge successful (output file exists)
  6. Step 6: Keyframe images extracted
  7. Comments/subtitles checked for technical details
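The file-based checks in this list (streams downloaded, merge output present) can be automated. The following is a minimal sketch with a hypothetical helper, demonstrated on throwaway files; in the real pipeline it would be given /tmp/bili_video.m4s, /tmp/bili_audio.m4s and /tmp/bili_final.mp4.

```python
import os
import tempfile


def check_outputs(paths: list) -> dict:
    """Map each path to True when the file exists and is larger than 0 bytes."""
    return {p: os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths}


# Throwaway files: one with content, one empty, one never created.
demo_dir = tempfile.mkdtemp()
good = os.path.join(demo_dir, "good.bin")
empty = os.path.join(demo_dir, "empty.bin")
missing = os.path.join(demo_dir, "missing.bin")
with open(good, "wb") as f:
    f.write(b"data")
open(empty, "wb").close()

report = check_outputs([good, empty, missing])
for path, ok in report.items():
    print("OK  " if ok else "FAIL", path)
```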