Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIKE queries with left wildcard (%) make them unindexable (probably) #85

Open
thiell opened this issue Oct 19, 2017 · 1 comment
Open

Comments

@thiell
Copy link
Contributor

thiell commented Oct 19, 2017

I'm using rbh-report and while the filesystem is growing (but still less than 100M inodes), some commands that used to work within seconds start to be slow (several minutes). I tried to add an index on ENTRIES.fileclass but unfortunately, this doesn't seem to help.

We have slow query when using commands like:

$ rbh-report --user-info -cqS -C stanford+groups+nosnap -f /etc/robinhood.d/oak.conf
Cmd    ID      State         User      Host       DB            Time   Query                                                                                                                                                                                                                                                                                                                   
Query    1580  Sending data  robinhoo  localhost  robinhood_oa  01:32  SELECT uid as attr0,gid as
attr1,type as attr2,COUNT(*) as attr3,SUM(size) as attr4,SUM(blocks) as attr5,MIN(size) as
attr6,MAX(size) as attr7,ROUND(AVG(size)) as attr8 FROM ENTRIES WHERE ENTRIES.fileclass
LIKE BINARY '%+stanford+groups+nosnap+%' GROUP BY attr0,attr1,attr2 ORDER BY attr0
ASC,attr1 ASC,attr2 ASC

For this filesystem, MD rate is low enough that we can use additional indexes to speed up queries:

mysql> desc ENTRIES;
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| Field            | Type                                                   | Null | Key | Default | Extra |
+------------------+--------------------------------------------------------+------+-----+---------+-------+
| id               | varbinary(64)                                          | NO   | PRI | NULL    |       |
| uid              | varbinary(127)                                         | YES  | MUL | unknown |       |
| gid              | varbinary(127)                                         | YES  | MUL | unknown |       |
| size             | bigint(20) unsigned                                    | YES  |     | 0       |       |
| blocks           | bigint(20) unsigned                                    | YES  |     | 0       |       |
| creation_time    | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_access      | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_mod         | int(10) unsigned                                       | YES  |     | NULL    |       |
| last_mdchange    | int(10) unsigned                                       | YES  |     | NULL    |       |
| type             | enum('symlink','dir','file','chr','blk','fifo','sock') | YES  |     | file    |       |
| mode             | smallint(5) unsigned                                   | YES  |     | NULL    |       |
| nlink            | int(10) unsigned                                       | YES  |     | NULL    |       |
| md_update        | int(10) unsigned                                       | YES  |     | NULL    |       |
| invalid          | tinyint(1)                                             | YES  |     | NULL    |       |
| fileclass        | varbinary(1023)                                        | YES  | MUL | NULL    |       |
| class_update     | int(10) unsigned                                       | YES  |     | NULL    |       |
| alert_status     | enum('','clear','alert')                               | YES  |     |         |       |
| modeguard_status | enum('','ok','invalid')                                | YES  |     |         |       |
| alert_lstchk     | int(10) unsigned                                       | YES  |     | 0       |       |
| alert_lstalrt    | int(10) unsigned                                       | YES  |     | 0       |       |
+------------------+--------------------------------------------------------+------+-----+---------+-------+

According to some online web pages like the ones below, a left wildcard makes LIKE queries unable to use the column index, which is pretty coherent to what I'm seeing with MySQL 5.7:

https://makandracards.com/makandra/10843-mysql-can-i-speed-up-like-queries-by-adding-an-index
http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning

Would it be possible to remove the left wildcard to make it indexable and probably faster then?

@thiell
Copy link
Contributor Author

thiell commented Oct 19, 2017

Or what about getting rid of LIKE at all?

This is a link that could help:
https://stackoverflow.com/questions/2885564/ways-to-implement-tags-pros-and-cons-of-each

Way 3 would be more common for an SQL database, it is simply to use 2 additional tables: one containing the file classes (tags in the example above), and one other doing the mapping (taggings in the example).

This would probably be better than the current LIKE that needs to scan all entries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant