[Bug]: Accessing an indexed (ragged) column by attribute returns VectorData instead of VectorIndex #1210

Open
rly opened this issue Nov 14, 2024 · 1 comment

rly commented Nov 14, 2024

What happened?

For a ragged (indexed) column, the attribute syntax table.col_name currently returns the VectorData instead of the VectorIndex. It should return the same VectorIndex as table[col_name] and table.get(col_name). All three access methods should return the same result; having them disagree is confusing. See also NeurodataWithoutBorders/pynwb#1990

Steps to Reproduce

from hdmf.common import DynamicTable

dt = DynamicTable(name="test", description="desc")
dt.add_column(name="col1", description="desc", index=True)
dt.add_row(col1=[0, 1, 2])

print(dt["col1"])  # returns VectorIndex
print(dt.get("col1"))  # returns VectorIndex
print(dt.col1)  # returns VectorData

print(dt["col1"][0])  # returns [0, 1, 2]
print(dt.get("col1")[0])  # returns [0, 1, 2]
print(dt.col1[0])  # returns 0
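
To see why the last line prints 0 rather than [0, 1, 2], it helps to recall how a ragged column is stored: a flat data vector plus an index vector of end offsets, one per row. The snippet below is a minimal, self-contained sketch of that layout. ToyVectorData and ToyVectorIndex are hypothetical stand-ins written for illustration, not HDMF's classes, and they model only item access.

```python
class ToyVectorData:
    """Flat storage: all rows' elements concatenated (hypothetical stand-in)."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):
        # Indexing the flat vector yields a single element, not a row.
        return self.data[i]


class ToyVectorIndex:
    """End offsets into a target vector; entry i marks where row i ends."""

    def __init__(self, ends, target):
        self.ends = list(ends)
        self.target = target

    def __getitem__(self, i):
        # Row i spans from the previous row's end offset up to its own.
        start = 0 if i == 0 else self.ends[i - 1]
        return self.target.data[start:self.ends[i]]


# Two ragged rows: [0, 1, 2] and [3, 4]
flat = ToyVectorData([0, 1, 2, 3, 4])
index = ToyVectorIndex([3, 5], flat)

row_via_index = index[0]  # the whole first row
elem_via_data = flat[0]   # only the first element of the flat vector
print(row_via_index, elem_via_data)
```

Under this model, handing the caller the flat data vector instead of the index silently changes what [0] means, which is the mismatch reported here.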

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.12

Package Versions

No response

@rly rly added the category: bug errors in the code or code behavior label Nov 14, 2024
@rly rly added this to the Future milestone Nov 14, 2024

rly commented Nov 14, 2024

@oruebel, @stephprince, and I discussed this today. We agree that the three access methods are currently inconsistent and that this should be fixed. The plan is to make a breaking change in HDMF 5.0 (not the one this week):

  1. Change the dot accessor to return the VectorIndex, consistent with the other two methods of accessing the column.
  2. Stop exposing the VectorIndex columns under their own names through these three methods. It's confusing to have dt.col1 and dt.col1_index return the same thing. These methods should only return the high-level columns after ragged/index processing.
  3. Users still need an easy way to get the raw columns. Add a new table attribute: a dictionary that maps each column name to the column object, whether it is a VectorData or a VectorIndex.
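
As a rough illustration of the three points above, here is a self-contained toy table, not HDMF code and not a proposed implementation. ToyTable, add_ragged_column, and the attribute name raw_columns are all invented for this sketch (the plan does not name the new attribute), and the high-level column is simplified to a plain list of rows.

```python
class ToyTable:
    """Hypothetical table illustrating the three-point plan (not HDMF code)."""

    def __init__(self):
        # High-level view: one entry per logical column, after index resolution.
        self._columns = {}
        # Raw view: every stored vector, data and index alike.
        # `raw_columns` is an invented name for the proposed dictionary.
        self.raw_columns = {}

    def add_ragged_column(self, name, rows):
        flat, ends = [], []
        for row in rows:
            flat.extend(row)
            ends.append(len(flat))  # end offset of this row in the flat vector
        self.raw_columns[name] = flat
        self.raw_columns[name + "_index"] = ends
        self._columns[name] = [list(row) for row in rows]

    def __getitem__(self, name):
        return self._columns[name]

    def get(self, name, default=None):
        return self._columns.get(name, default)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)


table = ToyTable()
table.add_ragged_column("col1", [[0, 1, 2], [3, 4]])

# All three accessors return the same high-level column.
same = table["col1"] is table.get("col1") is table.col1
print(same, table.col1[0], sorted(table.raw_columns))
```

In this sketch the index vector is reachable only through raw_columns, so table.col1_index raises AttributeError while table.raw_columns["col1_index"] returns the end offsets.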
