[FEATURE REQUEST]: It would be very nice to use index when underlying table is applied with CACHE TABLE #130
Replies: 5 comments
-
Thanks @SynapsePOC for reporting this. AFAIK, caching the base table doesn't make sense in Hyperspace, since an index can contain only a subset of the original columns. (I can see that it could be beneficial to cache the index directly via some mechanism, but we need to investigate this further.) You could always cache the result of a query: val t = spark.sql("SELECT C1, C2 FROM tbl WHERE C1=1")
t.cache, in which case the cache can be populated from the index, but I don't think this is the scenario you are targeting. cc-ing a few people in case I missed anything: @rapoth @apoorvedave1
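A minimal sketch of the workaround described above, assuming a Hyperspace build that matches your Spark version is on the classpath. The sample data, the temporary Parquet path, the index name `tblIndex`, and the object name are all hypothetical; whether the cached result is actually built from the index depends on the optimizer picking the index for this query.

```scala
import org.apache.spark.sql.SparkSession
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index.IndexConfig

object CacheQueryResultSketch {
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-query-result-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, written as Parquet because Hyperspace
    // indexes file-based sources.
    val path = java.nio.file.Files.createTempDirectory("tbl").resolve("data").toString
    Seq((1, "a"), (2, "b")).toDF("C1", "C2").write.parquet(path)

    val df = spark.read.parquet(path)
    df.createOrReplaceTempView("tbl")

    // Covering index on C1 that also carries C2. The name is made up and
    // must not clash with an index that already exists.
    val hs = new Hyperspace(spark)
    hs.createIndex(df, IndexConfig("tblIndex", indexedColumns = Seq("C1"), includedColumns = Seq("C2")))
    spark.enableHyperspace()

    // Cache the query result rather than the base table, so the cached
    // data is computed from the optimized plan of this query.
    val t = spark.sql("SELECT C1, C2 FROM tbl WHERE C1 = 1")
    t.cache()
    val rows = t.count() // materializes the cache
    t.explain()

    spark.stop()
    rows
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The base table is left uncached here on purpose: the point of the sketch is that only the query result is held in memory.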
-
We will investigate what is required and whether this is feasible, and get back to you. Thanks!
-
The value of this proposal could be many times greater if the cache is co-located with the partition, which would leave control over how data is distributed among the nodes of the cluster in the user's hands. Applications with the most intensive query activity could then harvest significant volumes of data without having to deal with shuffling or the limitations of the driver node.
-
For the scenario in the description, the problem is that Spark's Cache Manager replaces the "Relation" of the cached df with "InMemoryTableScan" before the optimizer, which applies indexes where possible, gets to run. These are some possible scenarios:
(new API change required, #254)
In this way, we could apply index data while using caching. Or would you still like to use the "cached" table along with "cached" index data, despite the inefficient memory footprint?
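The plan substitution described above can be observed with plain Spark, without Hyperspace. The following is a rough sketch under assumed sample data and a hypothetical temporary path: it checks whether the plan with cached data substituted in (`queryExecution.withCachedData`) contains an `InMemoryRelation`, the logical node that is displayed as InMemoryTableScan in the physical plan.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object CachedPlanSketch {
  // True if the cache manager swapped part of the plan for cached data.
  def planReadsFromCache(q: DataFrame): Boolean =
    q.queryExecution.withCachedData.collect { case r: InMemoryRelation => r }.nonEmpty

  def run(): Boolean = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cached-plan-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the table in the issue.
    val path = java.nio.file.Files.createTempDirectory("tbl").resolve("data").toString
    Seq((1, "a"), (2, "b")).toDF("C1", "C2").write.parquet(path)
    spark.read.parquet(path).createOrReplaceTempView("tbl")

    spark.sql("CACHE TABLE tbl")

    // The file-based relation is replaced before optimizer rules run, so
    // a rule that matches on that relation no longer finds it.
    val q = spark.sql("SELECT C1, C2 FROM tbl WHERE C1 = 1")
    val cached = planReadsFromCache(q)
    q.explain()

    spark.stop()
    cached
  }

  def main(args: Array[String]): Unit = println(s"plan reads from cache: ${run()}")
}
```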
-
Consider this script:
The last explain produces the following plan:
In my mind, this truly reduces the set of scenarios in which an index can be used. It would be very nice if caching a table actually promoted index use; at the moment, caching actually turns index use off.
I think one of the issues already logged, #30, can help address it, but I would like to make sure this is not lost in translation.