Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilized v1.1.2 #90

Merged
merged 52 commits into from
Oct 16, 2024
Merged

Stabilized v1.1.2 #90

merged 52 commits into from
Oct 16, 2024

Conversation

OSINT-TECHNOLOGIES
Copy link
Owner

No description provided.

@OSINT-TECHNOLOGIES OSINT-TECHNOLOGIES self-assigned this Oct 16, 2024
datagather_modules/crawl_processor.py Dismissed Show dismissed Hide dismissed
categorized_links['Facebook'].append(urllib.parse.unquote(link))
elif 'twitter.com' in link:
elif hostname and hostname.endswith('twitter.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
twitter.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that the hostname check is robust against subdomain attacks. Instead of using hostname.endswith('twitter.com'), we should check if the hostname is exactly twitter.com or ends with .twitter.com. This will prevent URLs like malicious-twitter.com from passing the check.

  • Parse the URL to extract the hostname.
  • Check if the hostname is exactly twitter.com or ends with .twitter.com.
  • Apply similar changes to other social media domain checks to ensure consistency and security.
Suggested changeset 1
datagather_modules/crawl_processor.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datagather_modules/crawl_processor.py b/datagather_modules/crawl_processor.py
--- a/datagather_modules/crawl_processor.py
+++ b/datagather_modules/crawl_processor.py
@@ -117,22 +117,22 @@
         hostname = parsed_url.hostname
-        if hostname and hostname.endswith('facebook.com'):
-            categorized_links['Facebook'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('twitter.com'):
-            categorized_links['Twitter'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('instagram.com'):
-            categorized_links['Instagram'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('t.me'):
-            categorized_links['Telegram'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('tiktok.com'):
-            categorized_links['TikTok'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('linkedin.com'):
-            categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('vk.com'):
-            categorized_links['VKontakte'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('youtube.com'):
-            categorized_links['YouTube'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('wechat.com'):
-            categorized_links['WeChat'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('ok.ru'):
-            categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
+        if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
+            categorized_links['Facebook'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
+            categorized_links['Twitter'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
+            categorized_links['Instagram'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
+            categorized_links['Telegram'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
+            categorized_links['TikTok'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
+            categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
+            categorized_links['VKontakte'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
+            categorized_links['YouTube'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
+            categorized_links['WeChat'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
+            categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
 
EOF
@@ -117,22 +117,22 @@
hostname = parsed_url.hostname
if hostname and hostname.endswith('facebook.com'):
categorized_links['Facebook'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('twitter.com'):
categorized_links['Twitter'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('instagram.com'):
categorized_links['Instagram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('t.me'):
categorized_links['Telegram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('tiktok.com'):
categorized_links['TikTok'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('linkedin.com'):
categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('vk.com'):
categorized_links['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('youtube.com'):
categorized_links['YouTube'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('wechat.com'):
categorized_links['WeChat'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('ok.ru'):
categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
categorized_links['Facebook'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
categorized_links['Twitter'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
categorized_links['Instagram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
categorized_links['Telegram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
categorized_links['TikTok'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
categorized_links['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
categorized_links['YouTube'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
categorized_links['WeChat'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
datagather_modules/crawl_processor.py Dismissed Show dismissed Hide dismissed
categorized_links['Telegram'].append(urllib.parse.unquote(link))
elif 'tiktok.com' in link:
elif hostname and hostname.endswith('tiktok.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
tiktok.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that the hostname check is robust and cannot be bypassed by embedding the target string in an unexpected location. The best way to achieve this is to use a more precise check that ensures the hostname is exactly the expected domain or a subdomain of it.

  • We will modify the hostname.endswith checks to ensure that the hostname is either exactly the target domain or a subdomain of it.
  • This involves checking if the hostname is equal to the target domain or ends with . followed by the target domain.
  • We will make these changes in the sm_gather function within the datagather_modules/crawl_processor.py file.
Suggested changeset 1
datagather_modules/crawl_processor.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datagather_modules/crawl_processor.py b/datagather_modules/crawl_processor.py
--- a/datagather_modules/crawl_processor.py
+++ b/datagather_modules/crawl_processor.py
@@ -117,22 +117,22 @@
         hostname = parsed_url.hostname
-        if hostname and hostname.endswith('facebook.com'):
-            categorized_links['Facebook'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('twitter.com'):
-            categorized_links['Twitter'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('instagram.com'):
-            categorized_links['Instagram'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('t.me'):
-            categorized_links['Telegram'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('tiktok.com'):
-            categorized_links['TikTok'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('linkedin.com'):
-            categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('vk.com'):
-            categorized_links['VKontakte'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('youtube.com'):
-            categorized_links['YouTube'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('wechat.com'):
-            categorized_links['WeChat'].append(urllib.parse.unquote(link))
-        elif hostname and hostname.endswith('ok.ru'):
-            categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
+        if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
+            categorized_links['Facebook'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
+            categorized_links['Twitter'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
+            categorized_links['Instagram'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
+            categorized_links['Telegram'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
+            categorized_links['TikTok'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
+            categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
+            categorized_links['VKontakte'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
+            categorized_links['YouTube'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
+            categorized_links['WeChat'].append(urllib.parse.unquote(link))
+        elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
+            categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
 
EOF
@@ -117,22 +117,22 @@
hostname = parsed_url.hostname
if hostname and hostname.endswith('facebook.com'):
categorized_links['Facebook'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('twitter.com'):
categorized_links['Twitter'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('instagram.com'):
categorized_links['Instagram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('t.me'):
categorized_links['Telegram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('tiktok.com'):
categorized_links['TikTok'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('linkedin.com'):
categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('vk.com'):
categorized_links['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('youtube.com'):
categorized_links['YouTube'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('wechat.com'):
categorized_links['WeChat'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('ok.ru'):
categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))
if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
categorized_links['Facebook'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
categorized_links['Twitter'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
categorized_links['Instagram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
categorized_links['Telegram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
categorized_links['TikTok'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
categorized_links['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
categorized_links['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
categorized_links['YouTube'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
categorized_links['WeChat'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
categorized_links['Odnoklassniki'].append(urllib.parse.unquote(link))

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
datagather_modules/crawl_processor.py Dismissed Show dismissed Hide dismissed
datagather_modules/crawl_processor.py Dismissed Show dismissed Hide dismissed
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif 'linkedin.com' in link:
elif hostname and hostname.endswith('linkedin.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
linkedin.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that the hostname is correctly validated to belong to the intended domain. This can be done by checking if the hostname ends with the domain and is either exactly the domain or has a preceding dot to allow for subdomains.

  • We will modify the code to check if the hostname ends with .linkedin.com or is exactly linkedin.com.
  • This change will be applied to all similar checks for other social media domains to ensure consistency and security.
  • The changes will be made in the file datagather_modules/crawl_processor.py from lines 217 to 236.
Suggested changeset 1
datagather_modules/crawl_processor.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datagather_modules/crawl_processor.py b/datagather_modules/crawl_processor.py
--- a/datagather_modules/crawl_processor.py
+++ b/datagather_modules/crawl_processor.py
@@ -216,22 +216,22 @@
             hostname = urlparse(link).hostname
-            if hostname and hostname.endswith('facebook.com'):
-                sd_socials['Facebook'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('twitter.com'):
-                sd_socials['Twitter'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('instagram.com'):
-                sd_socials['Instagram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('t.me'):
-                sd_socials['Telegram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('tiktok.com'):
-                sd_socials['TikTok'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('linkedin.com'):
-                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('vk.com'):
-                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('youtube.com'):
-                sd_socials['YouTube'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('wechat.com'):
-                sd_socials['WeChat'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('ok.ru'):
-                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
+            if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
+                sd_socials['Facebook'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
+                sd_socials['Twitter'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
+                sd_socials['Instagram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
+                sd_socials['Telegram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
+                sd_socials['TikTok'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
+                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
+                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
+                sd_socials['YouTube'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
+                sd_socials['WeChat'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
+                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
 
EOF
@@ -216,22 +216,22 @@
hostname = urlparse(link).hostname
if hostname and hostname.endswith('facebook.com'):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('twitter.com'):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('instagram.com'):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('t.me'):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('tiktok.com'):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('linkedin.com'):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('vk.com'):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('youtube.com'):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('wechat.com'):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('ok.ru'):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif 'vk.com' in link:
elif hostname and hostname.endswith('vk.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
vk.com
may be at an arbitrary position in the sanitized URL.
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif 'youtube.com' in link:
elif hostname and hostname.endswith('youtube.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
youtube.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that the hostname is correctly validated to belong to the intended domain. Instead of using hostname.endswith('youtube.com'), we should check if the hostname is exactly 'youtube.com' or a subdomain of 'youtube.com'. This can be done by ensuring the hostname ends with '.youtube.com' or is exactly 'youtube.com'.

  • Parse the URL using urlparse to extract the hostname.
  • Check if the hostname is either 'youtube.com' or ends with '.youtube.com'.
  • Apply similar checks for other social media domains to ensure consistency and security.
Suggested changeset 1
datagather_modules/crawl_processor.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datagather_modules/crawl_processor.py b/datagather_modules/crawl_processor.py
--- a/datagather_modules/crawl_processor.py
+++ b/datagather_modules/crawl_processor.py
@@ -216,22 +216,22 @@
             hostname = urlparse(link).hostname
-            if hostname and hostname.endswith('facebook.com'):
-                sd_socials['Facebook'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('twitter.com'):
-                sd_socials['Twitter'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('instagram.com'):
-                sd_socials['Instagram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('t.me'):
-                sd_socials['Telegram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('tiktok.com'):
-                sd_socials['TikTok'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('linkedin.com'):
-                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('vk.com'):
-                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('youtube.com'):
-                sd_socials['YouTube'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('wechat.com'):
-                sd_socials['WeChat'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('ok.ru'):
-                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
+            if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
+                sd_socials['Facebook'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
+                sd_socials['Twitter'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
+                sd_socials['Instagram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
+                sd_socials['Telegram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
+                sd_socials['TikTok'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
+                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
+                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
+                sd_socials['YouTube'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
+                sd_socials['WeChat'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
+                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
 
EOF
@@ -216,22 +216,22 @@
hostname = urlparse(link).hostname
if hostname and hostname.endswith('facebook.com'):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('twitter.com'):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('instagram.com'):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('t.me'):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('tiktok.com'):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('linkedin.com'):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('vk.com'):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('youtube.com'):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('wechat.com'):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('ok.ru'):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif 'wechat.com' in link:
elif hostname and hostname.endswith('wechat.com'):

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string
wechat.com
may be at an arbitrary position in the sanitized URL.

Copilot Autofix AI about 1 month ago

To fix the problem, we need to ensure that the hostname check is more robust and cannot be easily bypassed by malicious URLs. The best way to achieve this is to use a stricter check that ensures the hostname is exactly the expected domain or a subdomain of it. We can use the urlparse function to parse the URL and then check if the hostname ends with the expected domain, preceded by a dot or being exactly the domain.

  • Modify the hostname.endswith checks to ensure that the hostname is either exactly the expected domain or a subdomain of it.
  • Update the code in the datagather_modules/crawl_processor.py file, specifically lines 217-236, to implement this stricter check.
Suggested changeset 1
datagather_modules/crawl_processor.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/datagather_modules/crawl_processor.py b/datagather_modules/crawl_processor.py
--- a/datagather_modules/crawl_processor.py
+++ b/datagather_modules/crawl_processor.py
@@ -216,22 +216,22 @@
             hostname = urlparse(link).hostname
-            if hostname and hostname.endswith('facebook.com'):
-                sd_socials['Facebook'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('twitter.com'):
-                sd_socials['Twitter'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('instagram.com'):
-                sd_socials['Instagram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('t.me'):
-                sd_socials['Telegram'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('tiktok.com'):
-                sd_socials['TikTok'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('linkedin.com'):
-                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('vk.com'):
-                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('youtube.com'):
-                sd_socials['YouTube'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('wechat.com'):
-                sd_socials['WeChat'].append(urllib.parse.unquote(link))
-            elif hostname and hostname.endswith('ok.ru'):
-                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
+            if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
+                sd_socials['Facebook'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
+                sd_socials['Twitter'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
+                sd_socials['Instagram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
+                sd_socials['Telegram'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
+                sd_socials['TikTok'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
+                sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
+                sd_socials['VKontakte'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
+                sd_socials['YouTube'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
+                sd_socials['WeChat'].append(urllib.parse.unquote(link))
+            elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
+                sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
 
EOF
@@ -216,22 +216,22 @@
hostname = urlparse(link).hostname
if hostname and hostname.endswith('facebook.com'):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('twitter.com'):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('instagram.com'):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('t.me'):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('tiktok.com'):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('linkedin.com'):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('vk.com'):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('youtube.com'):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('wechat.com'):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and hostname.endswith('ok.ru'):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))
if hostname and (hostname == 'facebook.com' or hostname.endswith('.facebook.com')):
sd_socials['Facebook'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'twitter.com' or hostname.endswith('.twitter.com')):
sd_socials['Twitter'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'instagram.com' or hostname.endswith('.instagram.com')):
sd_socials['Instagram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 't.me' or hostname.endswith('.t.me')):
sd_socials['Telegram'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'tiktok.com' or hostname.endswith('.tiktok.com')):
sd_socials['TikTok'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'linkedin.com' or hostname.endswith('.linkedin.com')):
sd_socials['LinkedIn'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'vk.com' or hostname.endswith('.vk.com')):
sd_socials['VKontakte'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'youtube.com' or hostname.endswith('.youtube.com')):
sd_socials['YouTube'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'wechat.com' or hostname.endswith('.wechat.com')):
sd_socials['WeChat'].append(urllib.parse.unquote(link))
elif hostname and (hostname == 'ok.ru' or hostname.endswith('.ok.ru')):
sd_socials['Odnoklassniki'].append(urllib.parse.unquote(link))

Copilot is powered by AI and may make mistakes. Always verify output.
Unable to commit as this autofix suggestion is now outdated
Positive Feedback
Negative Feedback

Provide additional feedback

Please help us improve GitHub Copilot by sharing more details about this comment.

Please select one or more of the options
@OSINT-TECHNOLOGIES OSINT-TECHNOLOGIES merged commit 9010620 into main Oct 16, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant