Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1785 (backport of #1782) broke some like queries on SQL DBs #1869

Open
FedericoMassaioli opened this issue Sep 24, 2020 · 17 comments
Open

#1785 (backport of #1782) broke some like queries on SQL DBs #1869

FedericoMassaioli opened this issue Sep 24, 2020 · 17 comments
Assignees
Labels

Comments

@FedericoMassaioli
Copy link

FedericoMassaioli commented Sep 24, 2020

Starting from loopback-datasource-juggler@3.33.2 queries using like, nlike, ilike, and nilike operators on SQL DBs (PostgreSQL in my case) fails, because the presence of parentheses (like in {where: {name: {nlike: 'Internal service (%'}}) or other characters forbidden by the escapeRegEx() cause the rewrite of the query string and the query to fail.

@dhmlau
Copy link
Member

dhmlau commented Sep 24, 2020

@mitsos1os, thanks for taking a look at this issue.

@mitsos1os
Copy link
Contributor

mitsos1os commented Sep 24, 2020

Hi @FedericoMassaioli . Could you please give me some extra details about the problem you are facing since I am not extremely familiar with PostgreSQL?

  • Isn't nlike supposed to accept a RegExp expression?
  • What would be the expected query to end up in PostgreSQL DB?

Thanks

@mitsos1os
Copy link
Contributor

mitsos1os commented Sep 24, 2020

Ok nevermind, I saw how the nlike is translated to PostgreSQL here

So since the actual values in the like, nlike, etc.. are not expected to be RegExp in all DB systems, should this functioality be removed and moved to connector specific logic as @FedericoMassaioli suggested?

What do you think @bajtos ?

@FedericoMassaioli
Copy link
Author

@mitsos1os is not a PostgreSQL only thing, the like operator & its friends are standard SQL, so the issue affects all connectors to SQL DBs.

@FedericoMassaioli
Copy link
Author

@mitsos1os is not a PostgreSQL only thing, the like operator & its friends are standard SQL, so the issue affects all connectors to SQL DBs.

After looking at the code you referenced, I devised the workaround of escaping with a double backslash the characters that trigger escapeRegEx(). I wonder if this works with connectors to different SQL DBs, though.

@mitsos1os
Copy link
Contributor

mitsos1os commented Sep 25, 2020

@FedericoMassaioli you are right. My SQL was a bit rusty 😄 ...

After looking at the code you referenced, I devised the workaround of escaping with a double backslash the characters that trigger escapeRegEx(). I wonder if this works with connectors to different SQL DBs, though.

Without looking into it, I think that is the same result with escapeRegEx... Does it produce the DB results you want this way?

The thing is I did the fix while using MongoDB and also got carried away by the documentation which references like operators to be used with Regular Expressions https://loopback.io/doc/en/lb3/Where-filter.html#operators so maybe @dhmlau this needs to be revised as well..

So I believe a decision should be taken regarding the usage of the like operators. I believe since they end up in different DB systems without the need to be valid Regular Expressions, we need to take the path of per connector logic.

Let's take @bajtos opinion as well and I can then move on to the fix

@FedericoMassaioli
Copy link
Author

Without looking into it, I think that is the same result with escapeRegEx... Does it produce the DB results you want this way?

It works with the escape I added. When escapeRegEx() sanitized the original string it didn't return the same results, but this was discovered on a live DB, undergoing modifications to the data in records targeted by the queries, I need to find time for a more complete analysis.
IF all connectors to SQL DBs take care to properly set the escape character to \ in LIKE strings, that would work.

For sure, the documentation is too terse on this aspect, there is no mention of the need to escape, or what character to escape, not even in the PostgreSQL specific section https://loopback.io/doc/en/lb3/Where-filter.html#ilike-and-nilike.
Moreover, wrt. documentation the console warning emitted by escapeRegEx() and caught by the log collector appears illogical.

And wrt. to tests, I think that this slipped by because no SQL tests are present in the test suite, as I commented in #1785
You can do a lot of 'noSQL' data management with modern SQL DBs, getting the best of both worlds, so do not discard them lightly ;-)

@mitsos1os
Copy link
Contributor

It works with the escape I added. When escapeRegEx() sanitized the original string it didn't return the same results, but this was discovered on a live DB, undergoing modifications to the data in records targeted by the queries, I need to find time for a more complete analysis.

@FedericoMassaioli I would appreciate it if you could have the chance if it actually produced the same results.

IF all connectors to SQL DBs take care to properly set the escape character to \ in LIKE strings, that would work.

Connector logic is deeper in the execution since this escape is done in DAO layer before calling the connector methods... It will probably need to be removed...

@FedericoMassaioli
Copy link
Author

FedericoMassaioli commented Sep 25, 2020

@mitsos1os I had extensive checks made in a controlled situation and, as far as PostgreSQL is concerned, the escape is not harmful and results are correct, only an annoying and hard to explain console warning.
So, wrt. PostgreSQL, it reduces to a documentation issue: the documentation should be made more consistent with the implementation, the spurious console warning could be changed.

However, I had a look at source code of other SQL DBs loopback connectors (I do not have time or DBMSs available at the moment to perform actual tests), with the following results:

  • a \ escape for like expressions is explicitly enforced for MS SQL Server, Oracle, and PostgreSQL, so none of them should experience query errors, just a surprising console warning;
  • no escape character for like expressions is explicitly enforced for Informix, MySQL, and SQLite3, all of them anyhow default to \, again no query errors, just a surprising console warning;
  • no escape character for like expressions is explicitly enforced for DB2, which to my knowledge has no default. Problems are foreseeable in this combination.

Moreover, let me make another remark.
Let's imagine you have LB running against memory and MongoDB datasources. basically, if something is wrong with the regex, escapeRegEx() will escape every control character, even the one which appears in a correct combination. Albeti the query will proceed smoothly to its completion, the resulting, escaped regex would probably not match what originally intended, but something different.
Now, if the developer/user is lucky, completely unexpected results will be returned, and the problem with the query will be detected. If, however, empty results or results similar to expectations are returned, the developer/user could even not notice. Unless it looks carefully at the console log, which is maybe overabundant for some other reason.
Wouldn't it be better if LB refused such a query returning a clear, meaningful error response?

FWIW in this scenario I'd definitely prefer to be rejected with an error response.

@mitsos1os
Copy link
Contributor

@FedericoMassaioli Thank you for taking the time to perform the tests.

We wad a similar discussion when first implementing the issue. So since this is an architectural decision let's wait to get an extra opinion on how to proceed.

@dhmlau What do you think?

@dhmlau
Copy link
Member

dhmlau commented Sep 27, 2020

sounds good.
Looping in @bajtos @raymondfeng @jannyHou @emonddr who commented/reviewed the original PR: #1782.

@bajtos
Copy link
Member

bajtos commented Oct 6, 2020

Thank you @FedericoMassaioli for detailed investigation of the issue and how different connector/databases handle escaping.

From my point of view, I would expect the following behavior at high-level:

  • The operator like & friends are connector specific. Some database/connectors accept a regular expression, some expect an SQL LIKE wildcard, etc. IMO, that means sanitization and input validation should be implemented at connector level, to take into account any connector specific aspects.
  • When the client provides an invalid like value, then the request should fail with a descriptive error. Ideally, this error should include machine-readable information (e.g. a code property set to INVALID_PATTERN) allowing the clients to detect this particular problem and handle it specially in the UI. (See the use case described in Make sure invalid RegExp string is properly sanitized in query #1782 (comment).)
  • To fix the crash reported in Unsanitized RegExp special content crashes application server #1781, we should modify our memory connector to properly handle the case where a string cannot be converted to a regular expression.

If we agree that's the direction to take, then I think we need to take the following steps:

  1. Improve the memory connector to reject queries with invalid like values.
  2. Revert Make sure invalid RegExp string is properly sanitized in query #1782
  3. If needed, modify connectors like MongoDB to detect errors caused by invalid like values and modify the error in such way that this particular error case can be more easily detected by the clients.

Steps 1 and 2 should be backported to loopback-datasource-juggler 3.x too.

Thoughts?

@bajtos bajtos added the major label Oct 6, 2020
@mitsos1os
Copy link
Contributor

@bajtos I agree. The initial #1781 should be solved in connector level, Memory and Mongo.
I believe that the same path should be followed, as it was in the original issue.
If the connector (Memory or Mongo) accepts a RegEx value but a non valid RegEx value is provided, escape it and print a warning.

@bajtos
Copy link
Member

bajtos commented Oct 13, 2020

The initial #1781 should be solved in connector level, Memory and Mongo.

+1

If the connector (Memory or Mongo) accepts a RegEx value but a non valid RegEx value is provided, escape it and print a warning.

Based on the discussion above, I am afraid printing a warning is not enough to let app developers notice the problem. IMO, we should implement a more robust solution and reject requests with invalid RegExp values. It's the responsibility of the client to send a valid query, escaping invalid RegExp in a way that makes sense in the client/GUI context.

@mitsos1os
Copy link
Contributor

So how would you like to proceed? Should I proceed issuing pull requests for memory and mongo?

Based on the discussion above, I am afraid printing a warning is not enough to let app developers notice the problem. IMO, we should implement a more robust solution and reject requests with invalid RegExp values. It's the responsibility of the client to send a valid query, escaping invalid RegExp in a way that makes sense in the client/GUI context.

My only objection to this, is that I find it inconsistent as behavior since it is not related to the client's knowledge of the backend.

For example, since Loopback provides a generic interface for REST API no matter the underlying datasource, a client could issue the same like query that will pass when sent to PostgreSQL Database and will be rejected as invalid Regex when sent to Mongo

@bajtos
Copy link
Member

bajtos commented Oct 13, 2020

So how would you like to proceed? Should I proceed issuing pull requests for memory and mongo?

That would be awesome! Let's start with fixing juggler (reverting your original change) and the memory connector (to handle invalid RegExp values).

My only objection to this, is that I find it inconsistent as behavior since it is not related to the client's knowledge of the backend.

For example, since Loopback provides a generic interface for REST API no matter the underlying datasource, a client could issue the same like query that will pass when sent to PostgreSQL Database and will be rejected as invalid Regex when sent to Mongo

I see your point.

Unfortunately, the inconsistency cannot be avoided, because SQL connectors expect a wildcard (e.g. Hello %), not a regular expresion (e.g. Hello .*). See PostgreSQL.prototype.buildExpression, PostgreSQL - LIKE Clause and Pattern Matching in PostgreSQL docs.

IMO, the REST API should make it clear what kind of patterns is expected for like and related operators, and it's up to the client to ensure they send a valid pattern. That means escaping RegExp-like values for memory and mongodb, escaping SQL wildcard for PostgreSQL, and so on.

@mitsos1os
Copy link
Contributor

mitsos1os commented Oct 13, 2020

IMO, the REST API should make it clear what kind of patterns is expected for like and related operators, and it's up to the client to ensure they send a valid pattern. That means escaping RegExp-like values for memory and mongodb, escaping SQL wildcard for PostgreSQL, and so on.

Hmmmm, I still consider the backend silent escaping instead of throwing excpection a more suitable solution. But no real problem with that... 😉

I can issue the pull requests that implement the query rejection, if you can revert the pull request!

@dhmlau dhmlau added the community-contribution Patches contributed by community label Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants