Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-6095] validate decoded url in jdbc interpreter #4838

Merged
merged 6 commits into from
Nov 7, 2024

Conversation

s2moon98
Copy link
Contributor

@s2moon98 s2moon98 commented Sep 19, 2024

What is this PR for?

Add some validation check conditions to existing url validator in jdbc interpreter. So now it can check URLs with the conditions below if it has an unallowable configuration.

  • UTF-8 encoded

What type of PR is it?

Improvement

Todos

  • - Task

What is the Jira issue?

ZEPPELIN-6095

How should this be tested?

Input the url with unallowable configurations in UTF-8 encoded in JDBC type interpreter. Then run the command in notebook and see if the command is blocked from running.

Screenshots (if appropriate)

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

Copy link
Contributor

@tbonelee tbonelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good and works as intended!
I’ve checked that URLs with unallowable configurations, whether they are URL-encoded or contain white spaces, are properly blocked.

Btw, I'm curious-do we really need to remove white spaces here?
Can't we just leave the white spaces in the middle of the URL as is and let JDBC appropriately throw an error?

@pan3793
Copy link
Member

pan3793 commented Sep 19, 2024

do we really need to remove white spaces here?

I have the same question too. Is there any potential risk if we have encoded while spaces in the JDBC URL?

In Apache Kyuubi cases, it allows the user to pass the spark.app.name as part of JDBC URL, so the encoded URL might be like

jdbc:kyuubi://host:port/db;#spark.app.name=Hello%20World;spark.driver.memory=4g

For example,

$  bin/beeline -u 'jdbc:kyuubi://localhost:10009/#spark.app.name=Hello%20World'
0: jdbc:hive2://localhost:10009/#spark.app.na> set spark.app.name;

24/09/20 00:30:25 INFO ExecuteStatement:
           Spark application name: Hello World
                 application ID: local-1726763393107
                 application web UI: http://192.168.8.150:53761
                 master: local[*]
                 deploy mode: client
                 version: 3.3.1
           Start time: 2024-09-20T00:29:52.154
           User: anonymous
...
2024-09-20 00:30:26.123 INFO org.apache.kyuubi.operation.ExecuteStatement: Processing anonymous's query[a8771241-5233-44fe-8c69-ad94a8f0217d]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.128 seconds
+-----------------+--------------+
|       key       |    value     |
+-----------------+--------------+
| spark.app.name  | Hello World  |
+-----------------+--------------+
1 row selected (0.401 seconds)

@s2moon98
Copy link
Contributor Author

@tbonelee @pan3793
Thank you for mentioning the white space issue.

I put the eliminating white spaces logic to block the unallowable configurations with white space included such as :
allow%20LoadLocalInfile=true

The URL with eliminating white spaces is used only when validating, not when creating connection with the URL. So I think it does not cause risky situation like a URL containing white space is not connected.

However I confirmed that if user put url with white space contained, the configuration does not apply and JDBC throws an error. So I will remove the eliminating white spaces logic.

@s2moon98 s2moon98 force-pushed the add-jdbc-interpreter-url-validate branch from 9328c68 to 29d4921 Compare September 22, 2024 14:26
containsIgnoreCase(url, ALLOW_URL_IN_LOCAL_IN_FILE_NAME)) {
String decodedUrl;
try {
decodedUrl = URLDecoder.decode(url, "UTF-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wait for #4882 we can use decode(String s, Charset charset)
URLDecoder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't noticed that decode(String s, Charser charset) doesn't exist in JAVA8. Thank you for letting me know.
Then, if #4882 merged into master, I will request approval again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4882 was merged into master. Please use StandardCharsets as second input Parameter.

@s2moon98
Copy link
Contributor Author

s2moon98 commented Oct 27, 2024

#4882 (Java release to 11) is merged to master branch.
Can @pan3793 review this pr, so I can merge this branch?

@pan3793
Copy link
Member

pan3793 commented Oct 28, 2024

some thoughts:

  • PR title and desc are misleading, the key change of this PR is, validating the decoded URL instead of the encoded URL
  • if the decoded URL is an illegal UTF-8 sequence, the new code just skips validation, I'm not sure this is the correct behavior. from the security perspective, we should fail immediately.

containsIgnoreCase(url, ALLOW_URL_IN_LOCAL_IN_FILE_NAME)) {
String decodedUrl;
try {
decodedUrl = URLDecoder.decode(url, "UTF-8");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4882 was merged into master. Please use StandardCharsets as second input Parameter.

@s2moon98 s2moon98 changed the title [ZEPPELIN-6095] add jdbc interpreter url validation check condition [ZEPPELIN-6095] validate decoded jdbc interpreter url Nov 5, 2024
@s2moon98 s2moon98 changed the title [ZEPPELIN-6095] validate decoded jdbc interpreter url [ZEPPELIN-6095] validate decoded url in jdbc interpreter Nov 5, 2024
@s2moon98
Copy link
Contributor Author

s2moon98 commented Nov 5, 2024

  • changed the PR title to show the key change.
  • changed code to fail immediately if url decoding throws error. Previously, I thought if decoding step fails, validating logic has a fault. However since @pan3793 says above, fail immediately is more close behavior to secure, I changed it.
  • use StandardCharsets in URLDecoder.decode method's second input parameter as @Reamer says above

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor adjustment necessary.

containsIgnoreCase(url, ALLOW_URL_IN_LOCAL_IN_FILE_NAME)) {
String decodedUrl;
try {
decodedUrl = URLDecoder.decode(url, StandardCharsets.UTF_8.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
decodedUrl = URLDecoder.decode(url, StandardCharsets.UTF_8.toString());
decodedUrl = URLDecoder.decode(url, StandardCharsets.UTF_8);

should be sufficient. Check https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/net/URLDecoder.html

@pan3793
Copy link
Member

pan3793 commented Nov 5, 2024

LGTM, I have no further concerns.

@s2moon98
Copy link
Contributor Author

s2moon98 commented Nov 6, 2024

Thanks for @Reamer, I fix the URLDecoder.decode method to fit for java 11.
Additionally, in java 11 URLDecoder, it never throws UnsupportedEncodingException so I removed the relevant try-catch statement.

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked out your pull request, please perform a git rebase on the current master so that your pull request benefits from the JDK 11 transition.

@@ -42,6 +42,9 @@
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.UnsupportedEncodingException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary import.

String decodedUrl;
decodedUrl = URLDecoder.decode(url, StandardCharsets.UTF_8);

if (containsIgnoreCase(decodedUrl, ALLOW_LOAD_LOCAL_IN_FILE_NAME) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation

Suggested change
if (containsIgnoreCase(decodedUrl, ALLOW_LOAD_LOCAL_IN_FILE_NAME) ||
if (containsIgnoreCase(decodedUrl, ALLOW_LOAD_LOCAL_IN_FILE_NAME) ||

@s2moon98 s2moon98 force-pushed the add-jdbc-interpreter-url-validate branch from eecf6c5 to b22eafe Compare November 6, 2024 11:44
@s2moon98 s2moon98 requested a review from Reamer November 6, 2024 11:48
@s2moon98
Copy link
Contributor Author

s2moon98 commented Nov 6, 2024

@Reamer
I rebased my pr after the master branch, and applied your reviews (fix indent & remove unused import).
Please check for it!

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Reamer Reamer merged commit e463373 into apache:master Nov 7, 2024
17 checks passed
asfgit pushed a commit that referenced this pull request Nov 7, 2024
### What is this PR for?

Add some validation check conditions to existing url validator in jdbc interpreter. So now it can check URLs with the conditions below if it has an unallowable configuration.
- UTF-8 encoded

### What type of PR is it?

Improvement

### Todos
* [ ] - Task

### What is the Jira issue?

[ZEPPELIN-6095](https://issues.apache.org/jira/browse/ZEPPELIN-6095)

### How should this be tested?

Input the url with unallowable configurations in UTF-8 encoded in JDBC type interpreter.  Then run the command in notebook and see if the command is blocked from running.

### Screenshots (if appropriate)

### Questions:
* Does the license files need to update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Closes #4838 from s2moon98/add-jdbc-interpreter-url-validate.

Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
(cherry picked from commit e463373)
Signed-off-by: Philipp Dallig <philipp.dallig@gmail.com>
@Reamer
Copy link
Contributor

Reamer commented Nov 7, 2024

Merged into master/branch-0.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants