Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using RMariaDB with Columnstore Tables #183

Open
Mosk915 opened this issue Nov 3, 2020 · 5 comments
Open

Using RMariaDB with Columnstore Tables #183

Mosk915 opened this issue Nov 3, 2020 · 5 comments
Labels

Comments

@Mosk915
Copy link

Mosk915 commented Nov 3, 2020

I am trying to use the RMariaDB package to query MariaDB columnstore tables. Queries that should take seconds are taking much longer.

For example a simple query like dbGetQuery(conn, "select count(*) from example_table"), where example_table has about 300 million rows, takes under a second using other IDEs or using the odbc R package. But when using RMariaDB it takes much longer.

Is there some setting or option that needs to be changed to make the RMariaDB package perform optimally with columnstore tables?

@Mosk915 Mosk915 changed the title Using RMariaDB with a Columnstore Database Using RMariaDB with Columnstore Tables Nov 18, 2020
@krlmlr
Copy link
Member

krlmlr commented Dec 26, 2020

Thanks, this is really weird (and doesn't look like a duplicate of #131). Is this still a problem? Is the client doing any work (=full load on one or more CPU cores)? How can I replicate this?

@krlmlr krlmlr added the reprex label Dec 26, 2020
@Mosk915
Copy link
Author

Mosk915 commented Dec 26, 2020

Yes, still a problem. Our DBAs took a look at this and noticed that when queries are executed using RMariaDB, it is not multi-threading. They ran the vmstat command on the MariaDB host and saw that the CPU idle time only dips slightly below 100. When the queries are executed using the odbc package or through some other IDE, the idle time dips into the single digits.

As far as how to replicate this, I'm not sure. When you run a select count(*) on a columnstore table with several hundred million rows, does the result come back within seconds or take several minutes? We use RStudio Server Pro, but I've also tested this using RStudio on my local Windows machine and I have the same issue, which is why we are thinking it has something to do with the RMariaDB package. If you don't have the same issue, I'm hoping it's just a matter of us setting some configuration that allows the queries to multi-thread when they are executed.

@krlmlr
Copy link
Member

krlmlr commented Dec 27, 2020

IIUC, the database host has multiple CPUs, and only one is utilized by SELECT COUNT(*) when issued from RMariaDB.

According to https://mariadb.com/kb/en/mariadb-columnstore-performance-concepts/, SELECT COUNT(*) performs a scan over one column. This explains why multiple processors work faster on the task.

One thing RMariaDB does differently is the use of parametrized queries throughout. I have no evidence that this might be the actual reason, it's worth a try. Would you be able to run a source installation when #147 is implemented?

@Mosk915
Copy link
Author

Mosk915 commented Dec 27, 2020

Yes I can run a source installation. Just let me know when.

Just to clarify, select count(*) was just a simple example of a query that is slower when using RmariaDB, but it's really any query run against columnstore tables that is slower. In order to reproduce an example using a more complicated query, you'd need tables of a sufficiently large size so that the query doesn't complete instantaneously. I'm not sure how to go about providing a reproducible example like that, but in the event you find a solution for select count(*) queries but not all queries in general, I can look into it more at that time.

@krlmlr krlmlr added this to the 1.1.2 milestone Aug 29, 2021
@krlmlr
Copy link
Member

krlmlr commented Sep 5, 2021

Blocked by #147 which I had to postpone.

@krlmlr krlmlr modified the milestones: 1.1.2, 1.2.0 Sep 5, 2021
@krlmlr krlmlr removed this from the 1.2.0 milestone Dec 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants