Clustered index #354
-
Does Hyperspace support clustered indexes? This feature is important in quick select all (
Can Hyperspace help in this respect?
Replies: 4 comments 1 reply
-
Since the index data is sorted and bucketed, pushed-down filters can skip reading parts of the Parquet files by using the min/max statistics in the file footer. Otherwise, we could consider a Z-order index for this.
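To illustrate the idea of skipping via min/max statistics, here is a minimal sketch in plain Python. It does not use Hyperspace or Parquet APIs; `FileStats` and `files_to_read` are hypothetical names, and the per-file ranges are made-up values standing in for what a Parquet footer would record.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Min/max of the filtered column for one file (stand-in for footer stats)."""
    path: str
    min_value: int
    max_value: int


def files_to_read(stats, lo, hi):
    """Return paths whose [min, max] range overlaps the filter lo <= x <= hi."""
    return [s.path for s in stats if s.max_value >= lo and s.min_value <= hi]


# Sorted data: each file holds a narrow, non-overlapping range of values.
sorted_files = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Unsorted data: every file spans the whole value range, so nothing is skipped.
unsorted_files = [
    FileStats("part-0.parquet", 0, 299),
    FileStats("part-1.parquet", 0, 299),
    FileStats("part-2.parquet", 0, 299),
]

print(files_to_read(sorted_files, 120, 130))    # only part-1.parquet overlaps
print(files_to_read(unsorted_files, 120, 130))  # all three files overlap
```

The same filter touches one file when the data is sorted and all files when it is not, which is why sorting the index data matters for this kind of pruning.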
-
I don't know if I was clear enough when I used the term "clustered index". A clustered index is an index whose leaves point to the position (like a C++ pointer) in the table where the data (i.e., the full record) can be found. In my opinion, a Hyperspace clustered index should be an index that positionally maps the indexed column value to one or more files, plus the positions in those files of the full row matching the indexed column value. I specified "files" (plural) because in columnar formats a row may land in multiple places.

Let me give you the following example. In Hyperspace's case, given a table with 50 columns, 10 of them nested, similar to the following schema:
I have created the following index:
When I do
The index will not be used because it won't match the projection fields against the To add all the 50 columns to the I would expect Hyperspace to go to the index and fetch the real data files that contain those 10
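As an illustration of why a covering index gets skipped in a case like this, here is a minimal sketch of a coverage check: the index can answer a query only if every column the query references is stored in the index. This is an assumption about the general behavior of covering indexes, not Hyperspace's actual rule implementation; `index_covers_query` and the column names are hypothetical.

```python
def index_covers_query(indexed_columns, included_columns, query_columns):
    """True if every column the query references is stored in the index."""
    stored = set(indexed_columns) | set(included_columns)
    return set(query_columns) <= stored


# Hypothetical index: keyed on "id", also storing "name".
indexed = ["id"]
included = ["name"]

# Query touching only columns stored in the index: the index can be used.
print(index_covers_query(indexed, included, ["id", "name"]))

# Query projecting a column the index does not store: the index is skipped.
print(index_covers_query(indexed, included, ["id", "name", "address"]))
```

Under this model, selecting all 50 columns would require all 50 to be stored in the index, which amounts to duplicating the table.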
-
Yes, we are exploring this as a non-covering index where we store indexed columns + pointers back to the source files (we will start with file-level granularity). We will update this thread as we make progress. Related issue: #342
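A rough sketch of what "indexed columns + pointers back to the source files" could look like at file-level granularity, in plain Python. This is only an illustration of the concept, not the planned Hyperspace design; `build_index`, `lookup`, the file names, and the rows are all hypothetical.

```python
from collections import defaultdict


def build_index(source_files, key):
    """Map each value of the indexed column to the set of files containing it."""
    index = defaultdict(set)
    for path, rows in source_files.items():
        for row in rows:
            index[row[key]].add(path)
    return index


def lookup(index, source_files, key, value):
    """Fetch full rows by scanning only the files the index points to."""
    rows = []
    for path in sorted(index.get(value, ())):
        rows.extend(r for r in source_files[path] if r[key] == value)
    return rows


# Hypothetical source data: file path -> full rows stored in that file.
source_files = {
    "a.parquet": [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}],
    "b.parquet": [{"id": 2, "name": "z"}, {"id": 3, "name": "w"}],
    "c.parquet": [{"id": 4, "name": "v"}],
}

index = build_index(source_files, "id")
print(sorted(index[2]))                      # files that contain id == 2
print(lookup(index, source_files, "id", 2))  # full rows, read from those files only
```

The index stores only the key and file paths, so it stays small even for a 50-column table; the cost is a second read against the source files to get the full rows.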
-
@andrei-ionescu I was thinking more about this and want to make sure I'm not missing any scenarios. In your question, were you referring to a From a traditional database standpoint, there can only be one clustered index, because when we choose to build a clustered index on a particular column (or set of columns), the underlying table is physically reorganized according to the clustering key. While this kind of index gives a significant performance boost for certain classes of queries, it may also require that Hyperspace On the other hand, if we go with a