Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dbplyr native backends instead of relying on SqlRender translation #257

Open
ablack3 opened this issue Dec 6, 2023 · 16 comments
Open
Assignees
Milestone

Comments

@ablack3
Copy link
Collaborator

ablack3 commented Dec 6, 2023

Hi @schuemie,

@edward-burn, @catalamarti, and I would like get the Darwin tools working with DatabaseConnector as a DBI database driver.
I think to do this we would like to change how DatabaseConnector integrates with dbplyr. Currently it works very differently than all other DBI drivers and we would like to do the work to make it work in the same way as other DBI drivers.

Basically the changes would be

  1. Use a DatabaseConnectorConnection class instead of the "Microsoft sql server" class for the connection.
  2. Add a DatabaseConnector-backend.R file to either dbplyr or DatabaseConnector so that dbply would generate the SQL for each database platform.

This would help a lot since we are currently having success using this approach with other DBI drivers but have issues with DatabaseConnector which many people would like to use with Darwin tools. This change allows us to use DatabaseConnector as a DBI driver backend without using SqlRender for SQL translation. dbplyr has been quite good for us in that regard so far.

Would you support these changes in principle? Ed, Marti, and I would do all the work and make a PR but we don't want to work on it if you're not supportive of the idea.

Also tagging @PRijnbeek and @rossdwilliams for awareness.

@schuemie
Copy link
Member

schuemie commented Dec 7, 2023

Obviously, having a dedicated OhdsiSql backend in dbplyr would be great, but it would also be a lot of work, implying a lot of future maintenance (have you looked at the existing backend implementations?)

What problem would this solve compared to the current solution (piggy-backing on the SQL Server backend)?

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 7, 2023

so I think we would indeed piggy back of the SQL server backend but only for Sql server.
We would use the postgres backend for postgres
Etc. for the other dbms

So I don't think I'll need to implement much. Just make sure the appropriate backend gets used based on the dbms.
This is what we do currently with other DBI drivers and we implemented very little.
If there's not strong objection to the idea then I'd suggest I'll make fork and try to implement it to see if it works. Then you can review and provide feedback.

I'm not sure all the issues we run into with the current implementaion but the double sql translation means we get different sql depending on if users are connecting with DatabaseConnector or another DBI driver so there is just more places for potential issues. Currently some functionality works and some does not. And we need to tackle each issue one by one.

@schuemie
Copy link
Member

schuemie commented Dec 7, 2023

I think I'm misunderstanding what you're proposing. I thought you wanted to create a backend that generates OhdsiSql, which would then be translated into the various supported dialects using SqlRender. But it seems you want to create a backend that generates different SQL based on the DBMS?

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 7, 2023

Yea it's hard to explain on github. Here is the PR which might help #258.
Basically we would like to use DatabaseConnector as a database driver only and use dbplyr to generate SQL. The PR above is one way to do that but there could be other approaches that could work as well. We would like to skip SqlRender.

So instead of dbplyr -> sqlrender -> database
we do dbplyr -> database

DatabaseConnector handles delivering SQL to the database and returning the result but would not modify the dbplyr generated SQL which currently works for us.

@schuemie
Copy link
Member

schuemie commented Dec 8, 2023

But if I read the PR correctly, you don't need a DatabaseConnector backend? You just give the DatabaseConnector connection the class corresponding to the DBMS, so the correct dbplyr backend will be invoked?

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 8, 2023

Yes that is correct. I'm just extending your initial approach to other dbms and removing the need for the sqlrender::translate step.

I haven't tested this much yet though so I will need to make sure it works the way I expect it to. But I think you get the idea.

@schuemie
Copy link
Member

schuemie commented Dec 8, 2023

I think I understand now. This is what I also was trying to suggest here. We could keep the backend using SqlRender for platforms that don't yet have a dbplyr backend (if there are any left).

This definitely fits with the idea of making DatabaseConnector more lightweight, so I like it. Since this is a pretty big change, I think we should not work in the develop branch for now (which can still be used for quick patches). Should I create a separate branch where you can work on it, or do you want to keep working in the fork?

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 8, 2023

Ok great! Yes please create a separate branch for me to target with my pull request. I'll do a bit more testing and make sure the fork is working as I expect.

@schuemie
Copy link
Member

Ok, I've created https://github.com/OHDSI/DatabaseConnector/tree/dbplyr_backends, and merged your PR there.

We already have lots of unit tests for dbplyr, let's see how well they perform with this new approach.

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 11, 2023

May I have the ability to push directly to the OHDSI/DatabaseConnector:dbplyr_backends branch? This would help becuase then the full set of tests will run on my PR. If I make a PR from an external branch it does not have access to the environment variables on github and only a subset of tests run meaning that I cannot accurately ascertain if tests are actually passing on my PR until you merge to a branch that is in the OHDSI organization.

@ablack3
Copy link
Collaborator Author

ablack3 commented Dec 11, 2023

Also do you have an objection to changing the default value of translate on the DBI methods? If we can we use a global option to set this default? If not it will be a breaking change and the tests will need to be modified. Personally I think it would be better for the DBI methods not to translate but that is only my opinion and as long as I can opt of that with a global option for dplyr stuff it should be fine I think.

@schuemie
Copy link
Member

You should have write access now.

I would prefer the DBI functions to continue to translate by default when the user directly calls these functions. That fits the idea that DatabaseConnector is a universal DBI driver (code once, run on any platform).

I'm sure we can make it so it doesn't translate if the SQL is coming from one of the dbplyr backends.

@schuemie schuemie added this to the v7.0.0 milestone Dec 22, 2023
@schuemie
Copy link
Member

Note that the changes to the odbc package Hadley mentioned are now (since 1.4.0) in effect. The various database classes are now formally exported, so we don't have to re-define them ourselves.

@schuemie schuemie changed the title Rethink a bit DatabaseConnector dplyr sql generation Use dbplyr native backends instead of relying on SqlRender translation Aug 15, 2024
@schuemie
Copy link
Member

@ablack3 : have you had a change to work on this, as discussed? #275

@ablack3
Copy link
Collaborator Author

ablack3 commented Aug 17, 2024

Yes I have. I'm confused about method dispatch though because it isn't working as I expect it to.

I'll try to explain with a simple example.

In DatabaseConnector I added a method for the dbplyr_edition generic function.

#' @export
dbplyr_edition.DatabaseConnectorJdbcConnection <- function(con) {
  2L
}

I regenerated package documentation and rebuilt the package. I can see that the method is exported in NAMESPACE.

I connect to a local database and check the class of the connection.

> library(DatabaseConnector)
> connectionDetails <- createConnectionDetails(dbms = "postgresql",
+                                              server = "localhost/eunomia",
+                                              port = 5432,
+                                              user = "postgres", 
+                                              password = "postgres")
> write_schema <- "scratch"
> cdm_schema <- "cdm"
> con <- connect(connectionDetails)
Connecting using PostgreSQL driver
> class(con)
[1] "DatabaseConnectorJdbcConnection"
attr(,"package")
[1] ".GlobalEnv"

When I run dbplyr_edition my method is not being called.

> dbplyr_edition(con)
[1] 1

I used sloop to investigate the dispatch and it looks as if my method should be called.

image image

I used debugonce to follow the execution of dbplyr_edition and I can see that the method I added is not being called.

image image

If I load the method into the global environment it does get called as I expect.

image

I thought maybe this has do with using S3 methods on S4 classes but the connection object seems similar to the duckdb connection. However the similar method in the duckdb R package does get called correctly.

I think I need to understand why method dispatch isn't working the way I expect in DatabaseConnector. It's probably something simple but I need it to make sure the other dbplyr methods are dispatched correctly. And I also want to understand it. Any ideas? I might try making a small package with just the methods I'm working on.

@ablack3
Copy link
Collaborator Author

ablack3 commented Aug 17, 2024

Ah my problem was that I was not importing the generic into the package.
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants