Files
yt-dlp-proxy/AGENTS.md
T

66 lines
6.0 KiB
Markdown
Raw Normal View History

# Building a yt-dlp Proxy to Watch Videos in Browser Without Ads
## Introduction
A yt-dlp proxy server allows bypassing ads and restrictions when watching videos on platforms like YouTube. This tool is useful for those who want to enjoy content without ad breaks or for those facing regional restrictions and censorship.
## Task
Our goal is to create a simple proxy server that handles requests like `http://localhost/www.youtube.com/watch&v=VIDEO_ID` and returns a page with an HLS player. As clear from the URL - the video URL to be proxied is passed in the query.
We need to proxy the HLS stream specifically, as it allows minimizing traffic and provides smoother video playback. Additionally, HLS is supported by most modern browsers, making it the ideal choice for our proxy server.
The proxy should handle requests like `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)`, where the Query String content is escaped for use as a query string and added to the path for HLS files such as index.m3u8 and ts files. For example, a request to `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/index.m3u8` should return the HLS playlist for the video with the specified VIDEO_ID.
The proxy's task is: on request `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/index.m3u8`, proxy the request to yt-dlp, which will download the HLS playlist for the video with the specified VIDEO_ID and return it to the client. Similarly, when requesting `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/segment.ts`, the proxy should proxy the request to yt-dlp to get the corresponding video segment and return it to the client.
Obviously, we need to temporarily cache yt-dlp sessions for some period to avoid re-parsing the video page on each request to the HLS playlist and segments. This will significantly improve performance and reduce server load.
## Implementation
To implement the yt-dlp proxy server, you can use Python and the Flask library to create a web server. You will also need the yt-dlp library to interact with YouTube and other platforms and get HLS streams. Examine yt_dlp/YoutubeDL.py in venv you download to understand how to use yt-dlp for getting HLS playlists and segments.
As an HTML templating engine, you can use Jinja2, which is built into Flask, for dynamically generating the page with the HLS player based on the video URL. Styles: `<link rel="stylesheet" href="https://unpkg.com/mvp.css">` for a simple and clean design.
### Implementation Rules
1. Minimum MVP: implement only what's necessary for HLS proxying one video URL per request through `yt-dlp`.
2. Minimum stack: only Python, Flask, `yt-dlp` and external HLS player; no Node.js and no complex frameworks.
3. One format: support only HLS (`.m3u8` + segments).
4. One platform in MVP: first PornHub; extending to other platforms — only after stabilization.
5. Simple routing: short and predictable URLs without extra magic.
6. External cache. Only HTTP headers for cache management, no complex in-memory solutions. Cache TTL — 365 days.
7. Security at minimum, but required: validate input URL, restrict target domains to those supported by yt-dlp, request timeouts.
8. Errors and logs — only practical minimum: understandable HTTP errors and basic structured logging.
9. Configuration only through environment variables: port, cache TTL, log level and timeouts.
10. HTTPS not in application: TLS terminates at external reverse proxy (Nginx/Caddy/Traefik), Flask runs behind it.
11. Tests only on critical path: URL parsing, cache, playlist and segment proxying, error handling.
12. Documentation and license: only `README.md`, `AGENTS.md` and MIT license.
### Project Structure
```
- app.py - main Flask application file that handles incoming HTTP requests and interacts with yt-dlp through functions from dlp.py.
- dlp.py - module for interacting with yt-dlp, containing functions to get HLS playlists and segments.
functions:
- get_hls_playlist(video_url): gets HLS playlist for the specified video as a string that can be returned to the client. The segment list should be filtered to only include those available for the given video and supported by yt-dlp.
- get_hls_segment(video_url, segment_name): gets the specified video segment: downloads it using yt-dlp and returns its content as bytes that can be returned to the client. It should also use yt-dlp to download the segment since only yt-dlp can handle the necessary authentication and access control for the video content.
caching:
- Caching of yt-dlp sessions will be implemented using a simple in-memory dictionary that will store video parsing results for each VIDEO_ID. No complex in-memory solutions, just a dictionary with TTL for each key. TTL will be set to 365 days, which will effectively cache results and minimize repeated requests to yt-dlp.
- utils.py - helper functions for URL validation, cache management and error handling.
- tests/ - folder for tests that will check critical application paths such as URL parsing, caching, playlist and segment proxying, and error handling.
1. functions tests
2. integration tests for the main application flow:
signle integration test that will consist of server serving a single test video (use ffmpeg for generating it). it should query that server over proxy and check if it works properly.
yt-dlp expects from the server a javascript player that it can recognize. also server should set a cookie on the video page and require that cookie for the HLS playlist and segments requests. this will ensure that only requests coming from the video page can access the HLS content, providing a basic level of security and preventing unauthorized access to the video streams.
- templates/index.html - simple HTML file with form for video URL input.
- templates/player.html - HTML file with HLS player that will be used to play video obtained through proxy.
- requirements.txt
- README.md
- AGENTS.md
- LICENSE
```
No files other than those listed above will be in the project. All functions and logic will be implemented in these files, and there will be no additional modules or complex data structures.