WIP: refactor #460

Merged
merged 43 commits into from
Jan 14, 2025

BennyThink (Member)

Ongoing...

BennyThink changed the title from "New" to "WIP: refactor" on Nov 25, 2024
BennyThink mentioned this pull request on Nov 25, 2024
BennyThink self-assigned this on Nov 25, 2024
BennyThink and others added 4 commits on November 25, 2024, 19:42
Due to server-side restrictions, the Terabox download function currently does not work as intended.
ytdlbot/engine/special.py (outdated)
BennyThink force-pushed the new branch 2 times, most recently from 6e0dc62 to 52c4f86 on December 1, 2024, 17:02
BennyThink merged commit d472cdc into master on Jan 14, 2025
1 check passed
def special_download_entrance(url: str, tempdir: str, bm, **kwargs) -> list:
    """Specific link downloader"""
    domain = urlparse(url).hostname
    if "youtube.com" in domain or "youtu.be" in domain:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "youtube.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

To fix the problem, the hostname must be validated, not merely searched. The code already extracts it with urlparse, but the substring test that follows also matches look-alike hosts such as youtube.com.attacker.example or notyoutube.com. Instead, the hostname should match an allowed domain exactly or end with that domain preceded by a dot.

  • Parse the URL using urlparse to extract the hostname.
  • Check if the hostname matches "youtube.com" or "youtu.be" exactly, or ends with ".youtube.com" or ".youtu.be".
  • Update the conditions in the special_download_entrance function accordingly.
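The three steps above can be captured in one small helper. This is a minimal sketch, not the project's code; the names host_matches and is_youtube are hypothetical. It also guards against urlparse(...).hostname returning None (for example, for a string without a scheme such as "youtube.com/watch"), which would make both the substring test and an unguarded domain.endswith(...) call raise an exception.

```python
from typing import Optional
from urllib.parse import urlparse


def host_matches(hostname: Optional[str], domain: str) -> bool:
    """Return True if hostname is `domain` itself or one of its subdomains."""
    if not hostname:
        # urlparse() yields hostname=None when the input has no "//" netloc.
        return False
    # urllib already lowercases .hostname; also drop a trailing root dot.
    hostname = hostname.rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)


def is_youtube(url: str) -> bool:
    """Hypothetical wrapper applying host_matches to the two YouTube domains."""
    host = urlparse(url).hostname
    return any(host_matches(host, d) for d in ("youtube.com", "youtu.be"))
```

With this helper, is_youtube("https://m.youtube.com/watch?v=x") is True, while is_youtube("https://youtube.com.attacker.example/") and is_youtube("https://notyoutube.com/") are False.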
Suggested changeset 1
src/engine/__init__.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/__init__.py b/src/engine/__init__.py
--- a/src/engine/__init__.py
+++ b/src/engine/__init__.py
@@ -18,3 +18,3 @@
     domain = urlparse(url).hostname
-    if "youtube.com" in domain or "youtu.be" in domain:
+    if domain == "youtube.com" or domain.endswith(".youtube.com") or domain == "youtu.be" or domain.endswith(".youtu.be"):
         raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
    domain = urlparse(url).hostname
    if "youtube.com" in domain or "youtu.be" in domain:
        raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
    elif "www.instagram.com" in domain:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "www.instagram.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

To fix the problem, we need to parse the URL and perform a proper check on its host value. This involves using the urlparse function to extract the hostname and then checking if the hostname matches the allowed domains correctly. We should ensure that the check handles arbitrary subdomain sequences correctly.

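The fix is to replace the substring tests with endswith checks that include a preceding dot, so that look-alike hosts such as www.instagram.com.attacker.example are rejected and only subdomains of the allowed domains are accepted. Note that this does change existing behavior: a bare apex host such as youtube.com, pixeldrain.com, or krakenfiles.com no longer matches, because it does not end with a dot-prefixed suffix.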
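To see why the substring test is flagged, and what the dot-prefixed endswith form changes, the sketch below runs three checks against a few hostnames: the original substring test, the suffix-only form used in this patch, and an exact-or-subdomain test. The hostnames are made up for illustration, and the function names are hypothetical.

```python
DOMAIN = "pixeldrain.com"


def substring_check(host: str) -> bool:
    # Original style: matches the domain anywhere in the hostname.
    return DOMAIN in host


def suffix_only_check(host: str) -> bool:
    # Patch style: requires a leading dot, so the bare domain is rejected.
    return host.endswith("." + DOMAIN)


def exact_or_subdomain_check(host: str) -> bool:
    # Accepts the bare domain and any subdomain, nothing else.
    return host == DOMAIN or host.endswith("." + DOMAIN)


HOSTS = [
    "pixeldrain.com",
    "www.pixeldrain.com",
    "pixeldrain.com.attacker.example",
    "notpixeldrain.com",
]

if __name__ == "__main__":
    for host in HOSTS:
        print(
            f"{host:35} substring={substring_check(host)!s:5} "
            f"suffix_only={suffix_only_check(host)!s:5} "
            f"exact_or_sub={exact_or_subdomain_check(host)}"
        )
```

For these inputs, substring_check accepts all four hosts, suffix_only_check accepts only www.pixeldrain.com, and exact_or_subdomain_check accepts pixeldrain.com and www.pixeldrain.com while rejecting both look-alikes.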

Suggested changeset 1
src/engine/__init__.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/__init__.py b/src/engine/__init__.py
--- a/src/engine/__init__.py
+++ b/src/engine/__init__.py
@@ -18,9 +18,9 @@
     domain = urlparse(url).hostname
-    if "youtube.com" in domain or "youtu.be" in domain:
+    if domain and (domain.endswith(".youtube.com") or domain == "youtu.be"):
         raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
-    elif "www.instagram.com" in domain:
+    elif domain and domain.endswith(".instagram.com"):
         return instagram(url, tempdir, bm, **kwargs)
-    elif "pixeldrain.com" in domain:
+    elif domain and domain.endswith(".pixeldrain.com"):
         return pixeldrain(url, tempdir, bm, **kwargs)
-    elif "krakenfiles.com" in domain:
+    elif domain and domain.endswith(".krakenfiles.com"):
         return krakenfiles(url, tempdir, bm, **kwargs)
EOF
        raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
    elif "www.instagram.com" in domain:
        return instagram(url, tempdir, bm, **kwargs)
    elif "pixeldrain.com" in domain:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "pixeldrain.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

To fix the problem, we need to parse the URL and check the hostname properly instead of using substring checks. This can be done by using the urlparse function to extract the hostname and then verifying if it matches the allowed domains exactly or ends with the allowed domains for subdomains.

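We will modify the special_download_entrance function to compare the hostname returned by urlparse against the allowed domains. The patch below uses exact comparisons only, so subdomains such as www.youtube.com or www.pixeldrain.com would no longer match.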
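An exact-match check is the strictest option, but it recognizes only the hosts that are listed. A minimal sketch of an explicit allowlist, assuming (this is not taken from the project) that youtube.com, www.youtube.com, m.youtube.com, and youtu.be should all count as YouTube; the constant YOUTUBE_HOSTS and the function is_youtube_exact are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; every accepted host, including subdomains, must be listed.
YOUTUBE_HOSTS = frozenset({"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"})


def is_youtube_exact(url: str) -> bool:
    """Exact hostname match against a fixed set of known hosts."""
    host = urlparse(url).hostname  # urllib returns it lowercased, or None
    return host is not None and host in YOUTUBE_HOSTS
```

The trade-off is that an unlisted host such as music.youtube.com is simply not recognized. That fails safe, but the list needs maintenance as new hosts appear.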

Suggested changeset 1
src/engine/__init__.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/__init__.py b/src/engine/__init__.py
--- a/src/engine/__init__.py
+++ b/src/engine/__init__.py
@@ -18,9 +18,9 @@
     domain = urlparse(url).hostname
-    if "youtube.com" in domain or "youtu.be" in domain:
+    if domain in ["youtube.com", "youtu.be"]:
         raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
-    elif "www.instagram.com" in domain:
+    elif domain == "www.instagram.com":
         return instagram(url, tempdir, bm, **kwargs)
-    elif "pixeldrain.com" in domain:
+    elif domain == "pixeldrain.com":
         return pixeldrain(url, tempdir, bm, **kwargs)
-    elif "krakenfiles.com" in domain:
+    elif domain == "krakenfiles.com":
         return krakenfiles(url, tempdir, bm, **kwargs)
EOF
        return instagram(url, tempdir, bm, **kwargs)
    elif "pixeldrain.com" in domain:
        return pixeldrain(url, tempdir, bm, **kwargs)
    elif "krakenfiles.com" in domain:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "krakenfiles.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

To fix the problem, we need to ensure that the domain is correctly validated by checking the hostname of the parsed URL. This can be done by using the urlparse function to parse the URL and then checking if the hostname matches the allowed domains exactly or ends with the allowed domains preceded by a dot. This approach ensures that the domain is correctly validated and prevents bypassing the check by embedding the allowed host in an unexpected location.

We will modify the special_download_entrance function to use this approach for all domain checks.
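Applying the same exact-or-subdomain test to every branch can also be written as a lookup table, so each domain appears only once. The sketch below keeps the original function name, signature, and YouTube error message, but instagram, pixeldrain, and krakenfiles here are hypothetical stubs standing in for the project's downloaders. It matches instagram.com and all its subdomains rather than only www.instagram.com, and raising ValueError for unmatched hosts is an assumption.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse


# Hypothetical stand-ins for the project's real downloader functions.
def instagram(url: str, tempdir: str, bm, **kwargs) -> list:
    return ["instagram", url]


def pixeldrain(url: str, tempdir: str, bm, **kwargs) -> list:
    return ["pixeldrain", url]


def krakenfiles(url: str, tempdir: str, bm, **kwargs) -> list:
    return ["krakenfiles", url]


HANDLERS: Dict[str, Callable[..., list]] = {
    "instagram.com": instagram,
    "pixeldrain.com": pixeldrain,
    "krakenfiles.com": krakenfiles,
}
YOUTUBE_DOMAINS = ("youtube.com", "youtu.be")


def host_matches(hostname: Optional[str], domain: str) -> bool:
    """True if hostname equals domain or is a subdomain of it."""
    return hostname is not None and (hostname == domain or hostname.endswith("." + domain))


def special_download_entrance(url: str, tempdir: str, bm, **kwargs) -> list:
    """Specific link downloader, dispatching on the parsed hostname."""
    domain = urlparse(url).hostname
    if any(host_matches(domain, d) for d in YOUTUBE_DOMAINS):
        raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
    for allowed, handler in HANDLERS.items():
        if host_matches(domain, allowed):
            return handler(url, tempdir, bm, **kwargs)
    # Assumed behavior for unsupported hosts; adjust to the project's needs.
    raise ValueError(f"Unsupported host: {domain!r}")
```

A table keeps the allowed domains in one place, so adding a site means adding one entry instead of another elif with its own hand-written check.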

Suggested changeset 1
src/engine/__init__.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/__init__.py b/src/engine/__init__.py
--- a/src/engine/__init__.py
+++ b/src/engine/__init__.py
@@ -18,9 +18,9 @@
     domain = urlparse(url).hostname
-    if "youtube.com" in domain or "youtu.be" in domain:
+    if domain in ["youtube.com", "youtu.be"] or domain.endswith(".youtube.com") or domain.endswith(".youtu.be"):
         raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.")
-    elif "www.instagram.com" in domain:
+    elif domain == "www.instagram.com" or domain.endswith(".instagram.com"):
         return instagram(url, tempdir, bm, **kwargs)
-    elif "pixeldrain.com" in domain:
+    elif domain == "pixeldrain.com" or domain.endswith(".pixeldrain.com"):
         return pixeldrain(url, tempdir, bm, **kwargs)
-    elif "krakenfiles.com" in domain:
+    elif domain == "krakenfiles.com" or domain.endswith(".krakenfiles.com"):
         return krakenfiles(url, tempdir, bm, **kwargs)
EOF
# src/cookies.txt
ydl_opts["cookiefile"] = "youtube-cookies.txt"

if self._url.startswith("https://drive.google.com"):

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "https://drive.google.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

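To fix the problem, we should parse the URL and check that its hostname matches the intended domain. A prefix test on the raw URL string is not enough: https://drive.google.com.attacker.example/ and https://drive.google.com@attacker.example/ both start with "https://drive.google.com" but point to other hosts.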

  1. Import the urlparse function from the urllib.parse module.
  2. Parse the URL using urlparse.
  3. Check if the hostname of the parsed URL matches "drive.google.com".
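The steps above can be sketched as a small predicate. This is an illustration, not the project's code: is_google_drive_url is a hypothetical name, and the scheme check is an assumption added to preserve the "https://" part of the original prefix test (the autofix compares only the hostname).

```python
from urllib.parse import urlparse


def is_google_drive_url(url: str) -> bool:
    """True only when the scheme is https and the host is exactly drive.google.com."""
    parsed = urlparse(url)
    # Assumed: keep the original requirement that the URL uses https.
    return parsed.scheme == "https" and parsed.hostname == "drive.google.com"


if __name__ == "__main__":
    bypass = "https://drive.google.com.attacker.example/file/d/abc"
    print(bypass.startswith("https://drive.google.com"))  # the original check
    print(is_google_drive_url(bypass))
```

Running it prints True for the original startswith check and False for the parsed check, because urlparse reports the hostname as drive.google.com.attacker.example.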
Suggested changeset 1
src/engine/generic.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/generic.py b/src/engine/generic.py
--- a/src/engine/generic.py
+++ b/src/engine/generic.py
@@ -91,3 +91,5 @@
 
-        if self._url.startswith("https://drive.google.com"):
+        from urllib.parse import urlparse
+        parsed_url = urlparse(self._url)
+        if parsed_url.hostname == "drive.google.com":
             # Always use the `source` format for Google Drive URLs.
EOF
    token_parts = []
    for form_tag in soup.find_all("form"):
        action = form_tag.get("action")
        if action and "krakenfiles.com" in action:

Check failure (Code scanning / CodeQL): Incomplete URL substring sanitization, severity High

The string "krakenfiles.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix (AI)

To fix the problem, we need to parse the URL and check its hostname to ensure it matches the expected domain. This can be done using the urlparse function from the urllib.parse module. Specifically, we should extract the hostname from the action URL and verify that it ends with ".krakenfiles.com".

  • Parse the action URL using urlparse.
  • Extract the hostname from the parsed URL.
  • Check if the hostname ends with ".krakenfiles.com".
  • Only append the action to link_parts if the hostname check passes.
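The four steps above can be tried without BeautifulSoup by collecting form action attributes with the standard-library html.parser; this substitution, the helper names, and the sample HTML are assumptions made only to keep the sketch self-contained. Unlike the suggested patch, it also accepts the bare krakenfiles.com host, since endswith(".krakenfiles.com") alone would drop an action URL on exactly that host.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urlparse


class FormActionCollector(HTMLParser):
    """Collect the action attribute of every <form> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.actions: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action:
                self.actions.append(action)


def is_krakenfiles_host(hostname: Optional[str]) -> bool:
    return hostname is not None and (
        hostname == "krakenfiles.com" or hostname.endswith(".krakenfiles.com")
    )


def krakenfiles_actions(html: str) -> List[str]:
    collector = FormActionCollector()
    collector.feed(html)
    # Keep only absolute action URLs whose parsed host is krakenfiles.com or a subdomain.
    return [a for a in collector.actions if is_krakenfiles_host(urlparse(a).hostname)]


# Illustrative input only; these URLs are made up.
SAMPLE_HTML = """
<form action="https://s1.krakenfiles.com/download/abc"></form>
<form action="https://attacker.example/?next=krakenfiles.com"></form>
<form action="/relative/path"></form>
"""
```

For the sample, krakenfiles_actions keeps only the first action. The second contains "krakenfiles.com" only in its query string, which the original substring test would have accepted, and the third is relative, so urlparse finds no hostname.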
Suggested changeset 1
src/engine/krakenfiles.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/engine/krakenfiles.py b/src/engine/krakenfiles.py
--- a/src/engine/krakenfiles.py
+++ b/src/engine/krakenfiles.py
@@ -9,3 +9,3 @@
 from bs4 import BeautifulSoup
-
+from urllib.parse import urlparse
 
@@ -19,4 +19,6 @@
         action = form_tag.get("action")
-        if action and "krakenfiles.com" in action:
-            link_parts.append(action)
+        if action:
+            parsed_url = urlparse(action)
+            if parsed_url.hostname and parsed_url.hostname.endswith(".krakenfiles.com"):
+                link_parts.append(action)
         input_tag = form_tag.find("input", {"name": "token"})
EOF
Labels
None yet
Projects
None yet

2 participants