WIP: refactor #460

BennyThink · 2024-11-25T18:31:25Z

Ongoing...

Due to server-side restrictions, the Terabox download function currently does not work as intended.

ytdlbot/engine/special.py

This commit removes the `parse_cookie_file` function as it is no longer required following the removal of the Terabox download function.

src/engine/__init__.py

+def special_download_entrance(url: str, tempdir: str, bm, **kwargs) -> list:
+    """Specific link downloader"""
+    domain = urlparse(url).hostname
+    if "youtube.com" in domain or "youtu.be" in domain:


To fix the problem, we need to validate the parsed hostname properly. The code already extracts the hostname with urlparse, but a substring test such as "youtube.com" in domain also matches hosts like youtube.com.example.org. Instead, check that the hostname matches an allowed domain exactly or ends with an allowed domain preceded by a dot.

Parse the URL using urlparse to extract the hostname.

Check if the hostname matches "youtube.com" or "youtu.be" exactly, or ends with ".youtube.com" or ".youtu.be".

Update the conditions in the special_download_entrance function accordingly.
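The steps above can be sketched as a small helper. This is a minimal illustration, not the code merged in the PR: the names `host_matches`, `is_youtube_url`, and `YOUTUBE_DOMAINS` are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: this allow-list and the helper names are illustrative only.
YOUTUBE_DOMAINS = ("youtube.com", "youtu.be")


def host_matches(hostname, allowed_domains):
    """Return True if hostname equals an allowed domain or is a subdomain of one."""
    if not hostname:  # urlparse(...).hostname is None when the URL has no host
        return False
    hostname = hostname.lower().rstrip(".")  # tolerate a trailing-dot FQDN
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in allowed_domains
    )


def is_youtube_url(url):
    """Step 1: parse the URL; step 2: compare the hostname against the allow-list."""
    return host_matches(urlparse(url).hostname, YOUTUBE_DOMAINS)
```

With this helper, `https://www.youtube.com/watch?v=abc` is accepted, while `https://youtube.com.evil.example/` and `https://notyoutube.com/` are rejected.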

src/engine/__init__.py

+    domain = urlparse(url).hostname
+    if "youtube.com" in domain or "youtu.be" in domain:
+        raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
+    elif "www.instagram.com" in domain:


To fix the problem, we need to parse the URL and perform a proper check on its host value. This involves using the urlparse function to extract the hostname and then checking if the hostname matches the allowed domains correctly. We should ensure that the check handles arbitrary subdomain sequences correctly.

The best way to fix the problem without changing existing functionality is to accept a hostname only if it equals an allowed domain or ends with that domain preceded by a dot (checked with the endswith method). This ensures that only the allowed domains and their subdomains are accepted.
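One way to handle arbitrary subdomain depth is to pass a tuple of dotted suffixes to `str.endswith`. The sketch below assumes a hypothetical allow-list (`ALLOWED`) and helper name (`is_allowed_host`); neither appears in the PR.

```python
from urllib.parse import urlparse

# Hypothetical allow-list, for illustration only.
ALLOWED = ("pixeldrain.com", "krakenfiles.com")
# Build ".pixeldrain.com", ".krakenfiles.com" once; str.endswith accepts a tuple.
ALLOWED_SUFFIXES = tuple("." + domain for domain in ALLOWED)


def is_allowed_host(url):
    """Accept the bare domain or any depth of subdomain beneath it."""
    host = urlparse(url).hostname or ""
    # The dotted suffix alone would reject the bare domain, so test both forms.
    return host in ALLOWED or host.endswith(ALLOWED_SUFFIXES)
```

The explicit `host in ALLOWED` test matters: a dotted suffix such as `.pixeldrain.com` on its own never matches the bare host `pixeldrain.com`.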

src/engine/__init__.py

+        raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
+    elif "www.instagram.com" in domain:
+        return instagram(url, tempdir, bm, **kwargs)
+    elif "pixeldrain.com" in domain:


To fix the problem, we need to parse the URL and check the hostname properly instead of using substring checks. This can be done by using the urlparse function to extract the hostname and then verifying if it matches the allowed domains exactly or ends with the allowed domains for subdomains.

We will modify the special_download_entrance function to use urlparse to extract the hostname and then perform the necessary checks.
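A rough sketch of how the dispatch could look with hostname matching follows. It is not the merged implementation: the stub downloaders, the `HANDLERS` table, the name `special_download_entrance_sketch`, the `krakenfiles` handler name, the error messages, and the behavior for unmatched hosts are all assumptions.

```python
from urllib.parse import urlparse


# Stand-in downloaders with hypothetical bodies; the real ones live elsewhere in the PR.
def instagram(url, tempdir, bm, **kwargs):
    return ["instagram"]


def pixeldrain(url, tempdir, bm, **kwargs):
    return ["pixeldrain"]


def krakenfiles(url, tempdir, bm, **kwargs):  # assumed handler name
    return ["krakenfiles"]


# Assumption: registrable domains as keys, so www.instagram.com matches "instagram.com".
HANDLERS = {
    "instagram.com": instagram,
    "pixeldrain.com": pixeldrain,
    "krakenfiles.com": krakenfiles,
}


def _matches(host, domain):
    """Exact match, or a subdomain of the given domain."""
    return host == domain or host.endswith("." + domain)


def special_download_entrance_sketch(url, tempdir, bm, **kwargs):
    host = urlparse(url).hostname or ""
    if _matches(host, "youtube.com") or _matches(host, "youtu.be"):
        raise ValueError("YouTube links are not handled by the special downloaders")
    for domain, handler in HANDLERS.items():
        if _matches(host, domain):
            return handler(url, tempdir, bm, **kwargs)
    # Assumption: the source does not show what happens for unknown hosts.
    raise ValueError(f"No special downloader for host {host!r}")
```

Keeping the domain-to-handler mapping in one table means the matching rule is written once instead of being repeated in every `elif` branch.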

src/engine/__init__.py

+        return instagram(url, tempdir, bm, **kwargs)
+    elif "pixeldrain.com" in domain:
+        return pixeldrain(url, tempdir, bm, **kwargs)
+    elif "krakenfiles.com" in domain:


To fix the problem, we need to ensure that the domain is correctly validated by checking the hostname of the parsed URL. This can be done by using the urlparse function to parse the URL and then checking if the hostname matches the allowed domains exactly or ends with the allowed domains preceded by a dot. This approach ensures that the domain is correctly validated and prevents bypassing the check by embedding the allowed host in an unexpected location.

We will modify the special_download_entrance function to use this approach for all domain checks.
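To make the bypass concrete, the following sketch runs the flagged substring test and a stricter hostname test over the same URLs. The function names and sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def substring_check(url):
    """The flagged pattern: substring test against the parsed hostname."""
    return "krakenfiles.com" in (urlparse(url).hostname or "")


def strict_check(url):
    """Exact match, or the allowed domain preceded by a dot."""
    host = urlparse(url).hostname or ""
    return host == "krakenfiles.com" or host.endswith(".krakenfiles.com")


# Hypothetical sample URLs: one genuine, two that embed the allowed host.
CASES = [
    "https://krakenfiles.com/view/abc/file.html",
    "https://krakenfiles.com.attacker.example/x",
    "https://fakekrakenfiles.com/x",
]

# Each entry is (url, substring result, strict result).
RESULTS = [(url, substring_check(url), strict_check(url)) for url in CASES]
```

Only the first URL passes both checks; the other two pass the substring test but fail the strict one.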

src/engine/generic.py

+                # src/cookies.txt
+                ydl_opts["cookiefile"] = "youtube-cookies.txt"
+
+        if self._url.startswith("https://drive.google.com"):


To fix the problem, we should parse the URL and check its hostname to ensure it matches the intended domain. A prefix test such as startswith("https://drive.google.com") also accepts hosts like drive.google.com.example.org, whereas comparing the parsed hostname does not.

Import the urlparse function from the urllib.parse module.

Parse the URL using urlparse.

Check if the hostname of the parsed URL matches "drive.google.com".
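A minimal sketch of these steps is shown below. The helper name `is_google_drive_url` is hypothetical, and the extra scheme check is an assumption added to mirror the original `https://` prefix test; the suggested fix itself only compares the hostname.

```python
from urllib.parse import urlparse


def is_google_drive_url(url):
    """Return True only for https URLs whose host is exactly drive.google.com."""
    parsed = urlparse(url)
    # Assumption: keep the https requirement implied by the original prefix test.
    return parsed.scheme == "https" and parsed.hostname == "drive.google.com"
```

A URL such as `https://drive.google.com.evil.example/file` passes the old prefix test but fails this check.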

src/engine/krakenfiles.py

+    token_parts = []
+    for form_tag in soup.find_all("form"):
+        action = form_tag.get("action")
+        if action and "krakenfiles.com" in action:


To fix the problem, we need to parse the URL and check its hostname to ensure it matches the expected domain. This can be done using the urlparse function from the urllib.parse module. Specifically, we should extract the hostname from the action URL and verify that it ends with ".krakenfiles.com".

Parse the action URL using urlparse.

Extract the hostname from the parsed URL.

Check if the hostname ends with ".krakenfiles.com".

Only append the action to link_parts if the hostname check passes.
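The same steps, applied to a plain list of form action strings instead of parsed HTML, might look like this. The function name `trusted_actions` and the sample URLs are hypothetical; the rule itself follows the description above.

```python
from urllib.parse import urlparse


def trusted_actions(actions):
    """Keep only actions whose hostname ends with '.krakenfiles.com'."""
    link_parts = []
    for action in actions:
        if not action:  # a form may have no action attribute
            continue
        hostname = urlparse(action).hostname  # None for relative actions
        if hostname and hostname.endswith(".krakenfiles.com"):
            link_parts.append(action)
    return link_parts
```

Note that under this rule a relative action such as `/download/abc` and the bare host `krakenfiles.com` are both dropped, since neither hostname ends with `.krakenfiles.com`. Whether that is intended depends on what the site actually serves.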

BennyThink added 4 commits September 21, 2024 17:43

step1

d42e55b

remove

57b159c

add pdm

eedb8e3

going

bd47128

BennyThink changed the title ~~New~~ WIP: refactor Nov 25, 2024

BennyThink mentioned this pull request Nov 25, 2024

refactor of ytdlbot #461

Closed

BennyThink self-assigned this Nov 25, 2024

BennyThink and others added 4 commits November 25, 2024 19:42

going

5cb76e3

going

48c3304

Improved extract_code_from_instagram_url function

4f02bc2

Remove terabox download function

e0f095d

Due to server-side restrictions, the Terabox download function currently does not work as intended.

SanujaNS reviewed Nov 27, 2024

ytdlbot/engine/special.py (outdated)

SanujaNS and others added 11 commits November 27, 2024 20:44

Remove unused parse_cookie_file function

f10b07f

This commit removes the `parse_cookie_file` function as it is no longer required following the removal of the Terabox download function.

define abstract class

cf0c2b8

basic upload done?

991a036

db operation

580c499

db operation

072f1d1

db operation

deb7683

db operation

9ff9b20

db operation

4c1c710

fix name_pattern regex of extract_url_and_name function

0f8df41

pre-commit

83a8c73

pre-push

ef0beca

BennyThink force-pushed the new branch 2 times, most recently from 6e0dc62 to 52c4f86 December 1, 2024 17:02

pre-push

f7bd4fe

BennyThink force-pushed the new branch from 52c4f86 to f7bd4fe December 1, 2024 17:03

SanujaNS and others added 3 commits December 1, 2024 23:06

Refactor: Change single quotes to double quotes for name_pattern regex

f8dec38

add more methods

e484098

rename

6ba0cc2

BennyThink added 2 commits December 3, 2024 21:01

use self._bot_msg

5346fb6

download done?

38ed4c4

BennyThink force-pushed the new branch from 5d9a2ef to aaea329 December 4, 2024 19:59

update deps

3b1b9c7

BennyThink force-pushed the new branch from aaea329 to 3b1b9c7 January 11, 2025 11:00

BennyThink added 17 commits January 11, 2025 12:17

add entrance

de04b72

runnable

f5beefd

fixes

611d39a

fix

1da0edc

record usage fix

14e3368

add cookies

1546ed3

fix settings

897211f

hint

233193f

hint

f4b49af

reset

bd17efa

rename

ae84522

formats fix

86e896c

update README.md

da8d45f

update

4f9df5a

Merged master into your-branch-name, resolved conflicts in favor of y…

e909640

…our branch

rename

7f2f16e

rename

e30324b

BennyThink merged commit d472cdc into master Jan 14, 2025
1 check passed

github-advanced-security bot found potential problems Jan 14, 2025

@@ -91,3 +91,5 @@
-                    if self._url.startswith("https://drive.google.com"):
+                    from urllib.parse import urlparse
+                    parsed_url = urlparse(self._url)
+                    if parsed_url.hostname == "drive.google.com":
                         # Always use the `source` format for Google Drive URLs.

@@ -9,3 +9,3 @@
             from bs4 import BeautifulSoup
+            from urllib.parse import urlparse
@@ -19,4 +19,6 @@
                     action = form_tag.get("action")
-                    if action and "krakenfiles.com" in action:
-                        link_parts.append(action)
+                    if action:
+                        parsed_url = urlparse(action)
+                        if parsed_url.hostname and parsed_url.hostname.endswith(".krakenfiles.com"):
+                            link_parts.append(action)
                     input_tag = form_tag.find("input", {"name": "token"})
