# Building a yt-dlp Proxy to Watch Videos in the Browser Without Ads
## Introduction

A yt-dlp proxy server lets you bypass ads and restrictions when watching videos on platforms like YouTube. It is useful for anyone who wants to watch content without ad breaks, and for those facing regional restrictions or censorship.
## Task

Our goal is to create a simple proxy server that handles requests like `http://localhost/www.youtube.com/watch&v=VIDEO_ID` and returns a page with an HLS player. As the URL shows, the video URL to be proxied is passed in the request path itself.

We proxy the HLS stream specifically because it is segmented: the player fetches only the segments it needs, which keeps traffic low and gives smoother playback. HLS is also supported by most modern browsers, either natively or through a JavaScript player, which makes it a good fit for this proxy.

The proxy should handle requests like `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)`, where the query string of the video URL is escaped and embedded in the path, so that HLS file names such as `index.m3u8` and `.ts` segment names can be appended after it. For example, a request to `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/index.m3u8` should return the HLS playlist for the video with the specified VIDEO_ID.

For a request to `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/index.m3u8`, the proxy passes the request to yt-dlp, which downloads the HLS playlist for the video with the specified VIDEO_ID, and returns it to the client. Similarly, for a request to `http://localhost/hls/www.youtube.com/watch(v=VIDEO_ID)/segment.ts`, the proxy uses yt-dlp to get the corresponding video segment and returns it to the client.

We also need to cache yt-dlp sessions for some period, so that the video page is not re-parsed on every request for the HLS playlist and its segments. This avoids repeating the most expensive step and reduces server load.
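The URL scheme described in this section can be sketched as a small parsing helper. This is only an illustration under stated assumptions: the name `decode_proxy_path` is hypothetical, the target site is assumed to be reachable over `https://`, and the parenthesized part of the path is assumed to map directly onto the query string of the original video URL.

```python
import re

# Assumed path layout: /hls/<host and path>(<query string>)/<file name>
_HLS_PATH = re.compile(
    r"^/hls/(?P<target>[^()]+?)(?:\((?P<query>[^()]*)\))?/(?P<filename>[^/()]+)$"
)


def decode_proxy_path(path: str) -> tuple[str, str]:
    """Split a proxy path into (original video URL, requested HLS file name).

    Hypothetical helper: raises ValueError when the path does not follow
    the assumed /hls/... layout.
    """
    match = _HLS_PATH.match(path)
    if match is None:
        raise ValueError(f"not a valid HLS proxy path: {path!r}")

    # Assumption: the target site is always served over HTTPS.
    video_url = "https://" + match.group("target")
    query = match.group("query")
    if query:
        video_url += "?" + query
    return video_url, match.group("filename")


if __name__ == "__main__":
    print(decode_proxy_path("/hls/www.youtube.com/watch(v=VIDEO_ID)/index.m3u8"))
```

Keeping this logic in one pure function makes it easy to cover with the URL-parsing unit tests the rules ask for, independently of Flask and yt-dlp.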
## Implementation

To implement the yt-dlp proxy server, use Python with the Flask library for the web server, and the yt-dlp library to interact with YouTube and other platforms and obtain HLS streams. Read `yt_dlp/YoutubeDL.py` in the virtual environment where yt-dlp is installed to understand how to use it for getting HLS playlists and segments.

For HTML templating, use Jinja2, which ships with Flask, to dynamically generate the page with the HLS player from the video URL. For styles, include `<link rel="stylesheet" href="https://unpkg.com/mvp.css">` for a simple and clean design.
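As a starting point for the yt-dlp side, the sketch below picks an HLS format out of an info dict of the kind returned by `YoutubeDL.extract_info(url, download=False)`. The helper name `pick_hls_format` and the `max_height` parameter are hypothetical, and the protocol values checked here are an assumption about how yt-dlp labels HLS formats, so verify them against the installed version.

```python
from typing import Any, Optional

# Assumption: yt-dlp marks HLS formats with one of these protocol values.
HLS_PROTOCOLS = {"m3u8", "m3u8_native"}


def pick_hls_format(
    info: dict[str, Any], max_height: Optional[int] = None
) -> Optional[dict[str, Any]]:
    """Return the tallest HLS format in a yt-dlp-style info dict, or None."""
    candidates = [
        fmt
        for fmt in info.get("formats") or []
        if fmt.get("protocol") in HLS_PROTOCOLS and fmt.get("url")
    ]
    if max_height is not None:
        candidates = [f for f in candidates if (f.get("height") or 0) <= max_height]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f.get("height") or 0)


# Possible usage with yt-dlp installed (not executed here):
#   import yt_dlp
#   with yt_dlp.YoutubeDL({"quiet": True}) as ydl:
#       info = ydl.extract_info(video_url, download=False)
#   fmt = pick_hls_format(info)
#   playlist_url = fmt["url"] if fmt else None
```

Returning `None` instead of raising leaves it to the Flask layer to decide which HTTP error to send when a video has no HLS stream.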
### Implementation Rules

1. Minimum MVP: implement only what is necessary for HLS proxying of one video URL per request through `yt-dlp`.
2. Minimum stack: only Python, Flask, `yt-dlp` and an external HLS player; no Node.js and no complex frameworks.
3. One format: support only HLS (`.m3u8` + segments).
4. One platform in the MVP: PornHub first; extend to other platforms only after stabilization.
5. Simple routing: short and predictable URLs without extra magic.
6. External cache: only HTTP headers for cache management, no complex in-memory solutions. Cache TTL: 365 days.
7. Security at a minimum, but required: validate the input URL, restrict target domains to those supported by yt-dlp, and set request timeouts.
8. Errors and logs, practical minimum only: understandable HTTP errors and basic structured logging.
9. Configuration only through environment variables: port, cache TTL, log level and timeouts.
10. HTTPS is not handled in the application: TLS terminates at an external reverse proxy (Nginx/Caddy/Traefik), and Flask runs behind it.
11. TDD: write a single integration test that downloads a few video URLs. It should request these videos through the proxy and check that everything works properly (yt-dlp is a fully capable substitute for a browser and can be configured to output all necessary debug information, such as headers and cookies). Also write tests for critical functions such as URL parsing, caching, playlist and segment proxying, and error handling. All tests should live in the `tests/` folder and use `pytest` as the testing framework. All tests should produce as much debugging output as possible, so that it is easy to understand what went wrong when one fails.
12. Documentation and license: only `README.md`, `AGENTS.md` and the MIT license.
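Rules 6 and 9 could be wired together roughly as follows. This is a sketch, not a prescribed layout: the environment variable names (`PROXY_PORT`, `PROXY_CACHE_TTL`, `PROXY_LOG_LEVEL`, `PROXY_TIMEOUT`), the default port and the default timeout are all assumptions made for illustration. Only the 365-day TTL comes from the rules above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

DEFAULT_CACHE_TTL = 365 * 24 * 60 * 60  # 365 days in seconds (rule 6)


@dataclass(frozen=True)
class Config:
    port: int
    cache_ttl_seconds: int
    log_level: str
    request_timeout_seconds: float


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from environment variables (hypothetical names)."""
    return Config(
        port=int(env.get("PROXY_PORT", "8080")),
        cache_ttl_seconds=int(env.get("PROXY_CACHE_TTL", str(DEFAULT_CACHE_TTL))),
        log_level=env.get("PROXY_LOG_LEVEL", "INFO").upper(),
        request_timeout_seconds=float(env.get("PROXY_TIMEOUT", "30")),
    )


def cache_control_header(config: Config) -> str:
    """Build the Cache-Control value an external cache would act on."""
    return f"public, max-age={config.cache_ttl_seconds}"


if __name__ == "__main__":
    cfg = load_config()
    print(cfg, cache_control_header(cfg))
```

Passing the environment in as a mapping keeps the loader testable without touching the real process environment.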
### Common Pitfalls

1. Do not disable tests or skip critical paths. If something is not working, fix it instead of skipping the test.
2. Do not create workarounds; they are not allowed. If something is not working, fix the underlying problem.
### Project Structure

```
- app.py - main Flask application file that handles incoming HTTP requests and interacts with yt-dlp through functions from dlp.py.
- dlp.py - module for interacting with yt-dlp, containing functions to get HLS playlists and segments. Read yt_dlp/YoutubeDL.py in the venv to understand how to use yt-dlp for getting HLS playlists and segments.
  functions:
  - get_hls_playlist(video_url): gets the HLS playlist for the specified video as a string that can be returned to the client. The segment list should be filtered to include only segments that are available for the given video and supported by yt-dlp.
    It should also rewrite segment filenames, in case the originals expire during or before download, so that segments can be requested through the proxy using a predictable URL structure.
  - get_hls_segment(video_url, segment_filename): gets the video segment for a rewritten filename: downloads it using yt-dlp and returns its content as bytes that can be returned to the client. The download must go through yt-dlp, since only yt-dlp can handle the authentication and access control required for the video content.
  caching:
  - yt-dlp sessions are cached in a simple in-memory dictionary that stores the video parsing results for each VIDEO_ID. No complex in-memory solutions, just a dictionary with a TTL for each key. The TTL is set to 365 days, which effectively caches results and minimizes repeated requests to yt-dlp.
- utils.py - helper functions for URL validation, cache management and error handling.
- tests/ - folder for tests that check critical application paths such as URL parsing, caching, playlist and segment proxying, and error handling.
  1. function tests
  2. integration tests for the main application flow:
     - test that the proxy can successfully retrieve and return HLS playlists and segments for valid video URLs. HTTP logging should show which type of URL is parsed and served, and when the cache is hit or missed. The test should also print the headers and partial content of the playlist and segment responses (as hex) to verify that they are correct and contain the expected data.
     - test that the proxy correctly handles invalid URLs, unsupported platforms and other error scenarios, returning appropriate HTTP error responses.
- templates/index.html - simple HTML file with a form for video URL input.
- templates/player.html - HTML file with the HLS player used to play video obtained through the proxy.
- requirements.txt
- README.md
- AGENTS.md
- LICENSE
```
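The filename rewriting described for `get_hls_playlist` might look like the sketch below. It relies on the HLS playlist convention that any non-blank line not starting with `#` is a URI. The function name `rewrite_playlist`, the `segment-00000.ts` naming pattern and the returned mapping are assumptions for illustration, and URIs embedded in tag attributes (for example encryption keys) are deliberately left untouched here.

```python
def rewrite_playlist(playlist_text: str) -> tuple[str, dict[str, str]]:
    """Replace segment URIs with predictable names.

    Returns the rewritten playlist and a mapping from each new name to the
    original URI, so a later segment request can be resolved.
    """
    mapping: dict[str, str] = {}
    output_lines: list[str] = []

    for line in playlist_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # A URI line: assign the next sequential, predictable name.
            name = f"segment-{len(mapping):05d}.ts"
            mapping[name] = stripped
            output_lines.append(name)
        else:
            # Tags, comments and blank lines pass through unchanged.
            output_lines.append(line)

    return "\n".join(output_lines) + "\n", mapping


if __name__ == "__main__":
    sample = "#EXTM3U\n#EXTINF:10.0,\nhttps://cdn.example.com/a.ts?token=abc\n#EXT-X-ENDLIST\n"
    text, names = rewrite_playlist(sample)
    print(text)
    print(names)
```

Because the client only ever sees the stable names, an expired upstream URL can be refreshed on the server side without changing the playlist the browser already holds.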
No files other than those listed above will be in the project. All functions and logic will be implemented in these files, and there will be no additional modules or complex data structures.
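The dictionary-with-TTL cache mentioned in the project structure can stay very small. The sketch below is one possible shape, assuming a hypothetical `TTLCache` class that would live in `utils.py`; the injectable `clock` parameter is an addition made purely so that expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """A plain dict whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the dict does not grow forever.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        """Store a value together with its expiry time."""
        self._entries[key] = (self._clock() + self._ttl, value)


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=365 * 24 * 60 * 60)
    cache.set("VIDEO_ID", {"title": "example"})
    print(cache.get("VIDEO_ID"))
```

A hit or miss is visible from the return value alone, which makes it straightforward to log cache behavior as the integration tests require.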