Commit

Use custom resolver for query and eval with nested frames.
Verify preflighting of nested expressions using AST visitation.

Remove logic for splitting queries by string.  Now the evaluation is
handled by a nested column resolver, and the mixed-mode expressions
are preflighted by examining the parsed abstract syntax tree for
the query expression.
gitosaurus committed Oct 9, 2024
1 parent 12d1293 commit 850bd03
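
The commit message describes preflighting mixed-mode expressions by walking the parsed abstract syntax tree instead of splitting the query string. The following is a minimal sketch of that general idea using only Python's standard `ast` module. It is not the code from this commit: the names `NestedColumnVisitor` and `preflight` are hypothetical, and the sketch assumes that a nested sub-column is written as `layer.field` and that the set of nested layer names is known in advance.

```python
import ast


class NestedColumnVisitor(ast.NodeVisitor):
    """Collect base-column and nested-column references (hypothetical helper)."""

    def __init__(self, nested_names):
        self.nested_names = set(nested_names)
        self.nested_refs = set()  # e.g. {"nested.t"}
        self.base_refs = set()  # e.g. {"a"}

    def visit_Attribute(self, node):
        # `nested.t` parses as Attribute(value=Name("nested"), attr="t").
        if isinstance(node.value, ast.Name) and node.value.id in self.nested_names:
            self.nested_refs.add(f"{node.value.id}.{node.attr}")
        else:
            # Not a known nested layer: keep walking into the sub-expression.
            self.generic_visit(node)

    def visit_Name(self, node):
        # Any other bare name is treated as a base-frame column in this sketch.
        self.base_refs.add(node.id)


def preflight(expr, nested_names):
    """Return (base columns, nested sub-columns, nested layers) used by expr."""
    visitor = NestedColumnVisitor(nested_names)
    visitor.visit(ast.parse(expr, mode="eval"))
    layers = {ref.split(".")[0] for ref in visitor.nested_refs}
    return visitor.base_refs, visitor.nested_refs, layers


base, nested, layers = preflight("a > 2 and nested.t < 5", ["nested"])
is_mixed = bool(base) and bool(nested)
```

Here `is_mixed` flags an expression that touches both the base frame and a nested layer. What the library actually does with such an expression is not shown in this excerpt. The sketch only illustrates why inspecting the tree is more robust than string splitting: operators, parentheses, and whitespace no longer affect which columns are detected.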
Showing 10 changed files with 320 additions and 93 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -133,6 +133,9 @@ dmypy.json
# vscode
.vscode/

# PyCharm
.idea/

# dask
dask-worker-space/

16 changes: 8 additions & 8 deletions docs/tutorials/data_loading_notebook.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"With a valid Python environment, nested-pandas and it's dependencies are easy to install using the `pip` package manager. The following command can be used to install it:"
"With a valid Python environment, nested-pandas and its dependencies are easy to install using the `pip` package manager. The following command can be used to install it:"
]
},
{
@@ -47,7 +47,7 @@
"\n",
"We can use the `NestedFrame` constructor to create our base frame from a dictionary of our columns.\n",
"\n",
"We can then create an addtional pandas dataframes and pack them into our `NestedFrame` with `NestedFrame.add_nested`"
"We can then create an addtional pandas dataframes and pack them into our `NestedFrame` with `NestedFrame.add_nested`."
]
},
{
@@ -97,7 +97,7 @@
"# Note: that we use the `tempfile` module to create and then cleanup a temporary directory.\n",
"# You can of course remove this and use your own directory and real files on your system.\n",
"with tempfile.TemporaryDirectory() as temp_path:\n",
" # Generates parquet files with random data within our temporary directorye.\n",
" # Generates parquet files with random data within our temporary directory.\n",
" generate_parquet_file(10, {\"nested1\": 100, \"nested2\": 10}, temp_path, file_per_layer=True)\n",
"\n",
" # Read each individual parquet file into its own dataframe.\n",
@@ -148,7 +148,7 @@
"source": [
"So inspect `nf`, a `NestedFrame` we created from our call to `read_parquet` with the `to_pack` argument, we're able to pack nested parquet files according to the shared index values with the index in `base.parquet`.\n",
"\n",
"The resulting `NestedFrame` having the same number of rows as `base.parquet` and with `nested1.parquet` and `nested2.parquet` packed into the 'nested1' and 'nested2' columns respectively."
"The resulting `NestedFrame` having the same number of rows as `base.parquet` and with `nested1.parquet` and `nested2.parquet` packed into the `nested1` and `nested2` columns respectively."
]
},
{
@@ -164,7 +164,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we loaded each individual parquet file into its own dataframe, we can also verify that using `read_parquet` with the `to_pack` argument is equivalent to the following method of packing the dataframes directly with `NestedFrame.add_nested`"
"Since we loaded each individual parquet file into its own dataframe, we can also verify that using `read_parquet` with the `to_pack` argument is equivalent to the following method of packing the dataframes directly with `NestedFrame.add_nested`."
]
},
{
@@ -189,11 +189,11 @@
"source": [
"# Saving NestedFrames to Parquet Files\n",
"\n",
"Additionally we can save an existing `NestedFrame` as one of more parquet files using `NestedFrame.to_parquet``\n",
"Additionally we can save an existing `NestedFrame` as one of more parquet files using `NestedFrame.to_parquet`.\n",
"\n",
"When `by_layer=True` we save each individual layer of the NestedFrame into its own parquet file in a specified output directory.\n",
"\n",
"The base layer will be outputted to \"base.parquet\", and each nested layer will be written to a file based on its column name. So the nested layer in column `nested1` will be written to \"nested1.parquet\"."
"The base layer will be outputted to `base.parquet`, and each nested layer will be written to a file based on its column name. So the nested layer in column `nested1` will be written to `nested1.parquet`."
]
},
{
@@ -233,7 +233,7 @@
"source": [
"We also support saving a `NestedFrame` as a single parquet file where the packed layers are still packed in their respective columns.\n",
"\n",
"Here we provide `NestedFrame.to_parquet` with the desired path of the *single* output file (rather than the path of a directory to store *multiple* output files) and use `per_layer=False'\n",
"Here we provide `NestedFrame.to_parquet` with the desired path of the *single* output file (rather than the path of a directory to store *multiple* output files) and use `per_layer=False`.\n",
"\n",
"Our `read_parquet` function can load a `NestedFrame` saved in this single file parquet without requiring any additional arguments. "
]
4 changes: 2 additions & 2 deletions docs/tutorials/data_manipulation.ipynb
@@ -49,7 +49,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we can directly fetch a column from our nested column (aptly called \"nested\"). For example, below we can fetch the time column, \"t\", by specifying `\"nested.t\"` as the column to retrieve. This returns a \"flat\" view of the nested t column, where all rows from all dataframes are present in one dataframe."
"First, we can directly fetch a column from our nested column (aptly called \"nested\"). For example, below we can fetch the time column, \"t\", by specifying `\"nested.t\"` as the column to retrieve. This returns a \"flat\" view of the nested `t` column, where all rows from all dataframes are present in one dataframe."
]
},
{
@@ -170,7 +170,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is functionally equivalent to using `add_nested`"
"This is functionally equivalent to using `add_nested`:"
]
},
{
18 changes: 9 additions & 9 deletions docs/tutorials/low_level.ipynb
@@ -8,7 +8,7 @@
"# Lower-level interface for performance and flexibility\n",
"## Reveal the hidden power of nested Series\n",
"\n",
"This section is for users looking to optimize the performance, both computationally and in memory-usage, of their workflows. This section also details a broader suite of data representations usable within `nested-pandas`.\n",
"This section is for users looking to optimize both the compute and memory performance of their workflows. This section also details a broader suite of data representations usable within `nested-pandas`.\n",
"It shows how to deal with individual nested columns: add, remove, and modify data using both \"flat-array\" and \"list-array\" representations.\n",
"It also demonstrates how to convert nested Series to and from different data types, like `pd.ArrowDtype`d Series, flat dataframes, list-array dataframes, and collections of nested elements."
]
@@ -36,7 +36,7 @@
"source": [
"## Generate some data and get a Series of `NestedDtype` type\n",
"\n",
"We are going to use built-in data generator to get a `NestedFrame` with a \"nested\" column being a `Series` of `NestedDtype` type.\n",
"We are going to use the built-in data generator to get a `NestedFrame` with a \"nested\" column being a `Series` of `NestedDtype` type.\n",
"This column would represent [light curves](https://en.wikipedia.org/wiki/Light_curve) of some astronomical objects. "
]
},
@@ -94,7 +94,7 @@
"id": "33d8caacf0bf042e",
"metadata": {},
"source": [
"You can also get a list of fields with `.fields` attribute"
"You can also get a list of fields with `.fields` attribute:"
]
},
{
@@ -130,7 +130,7 @@
"id": "7167f5a9c947d96f",
"metadata": {},
"source": [
"You can also get a subset of nested columns as a new nested Series"
"You can also get a subset of nested columns as a new nested Series:"
]
},
{
@@ -479,7 +479,7 @@
"source": [
"#### pd.Series from an array\n",
"\n",
"Construction with `pyarrow` struct arrays is the cheapest way to create a nested Series. It is very semilliar to initialisation of a `pd.Series` of `pd.ArrowDtype` type."
"Construction with `pyarrow` struct arrays is the cheapest way to create a nested Series. It is very similar to the initialization of a `pd.Series` of `pd.ArrowDtype` type."
]
},
{
@@ -611,21 +611,21 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
10 changes: 5 additions & 5 deletions docs/tutorials/nested_spectra.ipynb
@@ -79,7 +79,7 @@
"flux = np.array([])\n",
"err = np.array([])\n",
"index = np.array([])\n",
"# Loop over each spectrum, adding it's data to the arrays\n",
"# Loop over each spectrum, adding its data to the arrays\n",
"for i, hdu in enumerate(sp):\n",
" wave = np.append(wave, 10 ** hdu[\"COADD\"].data.loglam) # * u.angstrom\n",
" flux = np.append(flux, hdu[\"COADD\"].data.flux * 1e-17) # * u.erg/u.second/u.centimeter**2/u.angstrom\n",
@@ -115,7 +115,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And we can see that each object now has the \"coadd_spectrum\" nested column with the full spectrum available."
"And we can see that each object now has the `coadd_spectrum` nested column with the full spectrum available."
]
},
{
@@ -161,7 +161,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -175,9 +175,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}