-
-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: refactor #460
WIP: refactor #460
Conversation
Due to server-side restrictions, the Terabox download function is currently does not work as intended.
This commit removes the `parse_cookie_file` function as it is no longer required following the removal of the Terabox download function.
6e0dc62
to
52c4f86
Compare
def special_download_entrance(url: str, tempdir: str, bm, **kwargs) -> list: | ||
"""Specific link downloader""" | ||
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
youtube.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we need to ensure that the domain is correctly parsed and validated. Instead of checking if "youtube.com" or "youtu.be" is a substring of the domain, we should use the urlparse
function to extract the hostname and then check if it matches the allowed domains exactly or ends with the allowed domains preceded by a dot.
- Parse the URL using
urlparse
to extract the hostname. - Check if the hostname matches "youtube.com" or "youtu.be" exactly, or ends with ".youtube.com" or ".youtu.be".
- Update the conditions in the
special_download_entrance
function accordingly.
-
Copy modified line R19
@@ -18,3 +18,3 @@ | ||
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: | ||
if domain == "youtube.com" or domain.endswith(".youtube.com") or domain == "youtu.be" or domain.endswith(".youtu.be"): | ||
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") |
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: | ||
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") | ||
elif "www.instagram.com" in domain: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
www.instagram.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we need to parse the URL and perform a proper check on its host value. This involves using the urlparse
function to extract the hostname and then checking if the hostname matches the allowed domains correctly. We should ensure that the check handles arbitrary subdomain sequences correctly.
The best way to fix the problem without changing existing functionality is to update the checks to use the endswith
method with a preceding dot for the allowed domains. This ensures that only the correct domains and their subdomains are accepted.
-
Copy modified line R19 -
Copy modified line R21 -
Copy modified line R23 -
Copy modified line R25
@@ -18,9 +18,9 @@ | ||
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: | ||
if domain and (domain.endswith(".youtube.com") or domain == "youtu.be"): | ||
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") | ||
elif "www.instagram.com" in domain: | ||
elif domain and domain.endswith(".instagram.com"): | ||
return instagram(url, tempdir, bm, **kwargs) | ||
elif "pixeldrain.com" in domain: | ||
elif domain and domain.endswith(".pixeldrain.com"): | ||
return pixeldrain(url, tempdir, bm, **kwargs) | ||
elif "krakenfiles.com" in domain: | ||
elif domain and domain.endswith(".krakenfiles.com"): | ||
return krakenfiles(url, tempdir, bm, **kwargs) |
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") | ||
elif "www.instagram.com" in domain: | ||
return instagram(url, tempdir, bm, **kwargs) | ||
elif "pixeldrain.com" in domain: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
pixeldrain.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we need to parse the URL and check the hostname properly instead of using substring checks. This can be done by using the urlparse
function to extract the hostname and then verifying if it matches the allowed domains exactly or ends with the allowed domains for subdomains.
We will modify the special_download_entrance
function to use urlparse
to extract the hostname and then perform the necessary checks.
-
Copy modified line R19 -
Copy modified line R21 -
Copy modified line R23 -
Copy modified line R25
@@ -18,9 +18,9 @@ | ||
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: | ||
if domain in ["youtube.com", "youtu.be"]: | ||
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") | ||
elif "www.instagram.com" in domain: | ||
elif domain == "www.instagram.com": | ||
return instagram(url, tempdir, bm, **kwargs) | ||
elif "pixeldrain.com" in domain: | ||
elif domain == "pixeldrain.com": | ||
return pixeldrain(url, tempdir, bm, **kwargs) | ||
elif "krakenfiles.com" in domain: | ||
elif domain == "krakenfiles.com": | ||
return krakenfiles(url, tempdir, bm, **kwargs) |
return instagram(url, tempdir, bm, **kwargs) | ||
elif "pixeldrain.com" in domain: | ||
return pixeldrain(url, tempdir, bm, **kwargs) | ||
elif "krakenfiles.com" in domain: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
krakenfiles.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we need to ensure that the domain is correctly validated by checking the hostname of the parsed URL. This can be done by using the urlparse
function to parse the URL and then checking if the hostname matches the allowed domains exactly or ends with the allowed domains preceded by a dot. This approach ensures that the domain is correctly validated and prevents bypassing the check by embedding the allowed host in an unexpected location.
We will modify the special_download_entrance
function to use this approach for all domain checks.
-
Copy modified line R19 -
Copy modified line R21 -
Copy modified line R23 -
Copy modified line R25
@@ -18,9 +18,9 @@ | ||
domain = urlparse(url).hostname | ||
if "youtube.com" in domain or "youtu.be" in domain: | ||
if domain in ["youtube.com", "youtu.be"] or domain.endswith(".youtube.com") or domain.endswith(".youtu.be"): | ||
raise ValueError("ERROR: This is ytdl bot for Youtube links just send the link.") | ||
elif "www.instagram.com" in domain: | ||
elif domain == "www.instagram.com" or domain.endswith(".instagram.com"): | ||
return instagram(url, tempdir, bm, **kwargs) | ||
elif "pixeldrain.com" in domain: | ||
elif domain == "pixeldrain.com" or domain.endswith(".pixeldrain.com"): | ||
return pixeldrain(url, tempdir, bm, **kwargs) | ||
elif "krakenfiles.com" in domain: | ||
elif domain == "krakenfiles.com" or domain.endswith(".krakenfiles.com"): | ||
return krakenfiles(url, tempdir, bm, **kwargs) |
# src/cookies.txt | ||
ydl_opts["cookiefile"] = "youtube-cookies.txt" | ||
|
||
if self._url.startswith("https://drive.google.com"): |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
https://drive.google.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we should parse the URL and check its hostname to ensure it matches the intended domain. This approach is more robust and less prone to errors compared to simple string comparisons.
- Import the
urlparse
function from theurllib.parse
module. - Parse the URL using
urlparse
. - Check if the hostname of the parsed URL matches "drive.google.com".
-
Copy modified lines R92-R94
@@ -91,3 +91,5 @@ | ||
|
||
if self._url.startswith("https://drive.google.com"): | ||
from urllib.parse import urlparse | ||
parsed_url = urlparse(self._url) | ||
if parsed_url.hostname == "drive.google.com": | ||
# Always use the `source` format for Google Drive URLs. |
token_parts = [] | ||
for form_tag in soup.find_all("form"): | ||
action = form_tag.get("action") | ||
if action and "krakenfiles.com" in action: |
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
krakenfiles.com
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix AI 1 day ago
To fix the problem, we need to parse the URL and check its hostname to ensure it matches the expected domain. This can be done using the urlparse
function from the urllib.parse
module. Specifically, we should extract the hostname from the action
URL and verify that it ends with ".krakenfiles.com".
- Parse the
action
URL usingurlparse
. - Extract the hostname from the parsed URL.
- Check if the hostname ends with ".krakenfiles.com".
- Only append the
action
tolink_parts
if the hostname check passes.
-
Copy modified line R10 -
Copy modified lines R20-R23
@@ -9,3 +9,3 @@ | ||
from bs4 import BeautifulSoup | ||
|
||
from urllib.parse import urlparse | ||
|
||
@@ -19,4 +19,6 @@ | ||
action = form_tag.get("action") | ||
if action and "krakenfiles.com" in action: | ||
link_parts.append(action) | ||
if action: | ||
parsed_url = urlparse(action) | ||
if parsed_url.hostname and parsed_url.hostname.endswith(".krakenfiles.com"): | ||
link_parts.append(action) | ||
input_tag = form_tag.find("input", {"name": "token"}) |
on going...