Input Data Formats
NOTE: Some of the information on this page is also summarized here.
For each of the input files, the following formatting should be taken into consideration:
Well Intervals with Descriptions: well_data
A file or files that can be read by pandas.read_csv() should have the following data and columns (the columns can be named whatever you like, but the column names must be specified in the appropriate function calls):
- Unique well ID: Unique identifying name or number for each well (e.g., API Number)
- Top: the depth (or elevation) at the top of the well interval in each row
- Bottom: the depth (or elevation) at the bottom of the well interval in each row
- Description: a geologic description of the well interval in each row
Individual well information and location data: metadata
This file (or another metadata file) should have the following data and columns:
- Unique well ID (if another file, it will need to be joined with the other table using this ID)
- X location: the x-coordinate for each well (often Longitude)
- Y location: the y-coordinate for each well (often Latitude)
- Z location: the z-coordinate for each well (i.e., the surface elevation)
- Coordinate Reference System: The coordinate reference system of the coordinates above (does not need to be a separate column if all coordinates are in the same CRS)
Examples of different file/column/field organization are shown below. The first line of each table contains the field names, and the following lines contain example data.
Data (i.e., the descriptions for each well interval) and metadata (the well locations) are contained in separate files
Well Data File
API_NUMBER | FORMATION | TOP | BOTTOM | ... |
---|---|---|---|---|
120010001400 | dolomite, sandy, silty, light gray, fine | 0 | 25 | ... |
120010001400 | limestone, cherty, fine/coarse, medium gray | 25 | 71 | ... |
120010001500 | clay, sandy, & silty | 0 | 60 | ... |
120010001500 | shale | 60 | 80 | ... |
... | ... | ... | ... | ... |
Description of each column in well data file:
- API_NUMBER: unique well identifier
- FORMATION: description of the geology at each interval
- TOP: Depth at top of well interval
- BOTTOM: Depth at bottom of well interval
- ...: other potential columns (as desired, not needed)
Well Metadata File
API_NUMBER | LATITUDE | LONGITUDE | ELEVATION_FT | ... |
---|---|---|---|---|
120010000400 | 40.051772 | -90.995909 | 731 | ... |
120010001400 | 40.021306 | -91.086866 | 760 | ... |
120010001500 | 40.109237 | -91.40366 | 493 | ... |
... | ... | ... | ... | ... |
Description of each column in well metadata file:
- API_NUMBER: unique well identifier
- LATITUDE: Y coordinate of well (all rows are known to be EPSG:4269)
- LONGITUDE: X coordinate of well (all rows are known to be EPSG:4269)
- ELEVATION_FT: Z coordinate of well (feet above sea level)
- ...: other potential columns (as desired, not needed)
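As a minimal sketch, the two files above could be loaded and joined on the well ID with pandas. The in-memory CSV text stands in for real files, and the variable names are illustrative assumptions, not part of this package's API:

```python
import io
import pandas as pd

# Stand-ins for the well data and metadata files (contents copied from
# the example tables above; real files would be passed by path instead).
well_data_csv = io.StringIO(
    'API_NUMBER,FORMATION,TOP,BOTTOM\n'
    '120010001400,"dolomite, sandy, silty, light gray, fine",0,25\n'
    '120010001400,"limestone, cherty, fine/coarse, medium gray",25,71\n'
    '120010001500,"clay, sandy, & silty",0,60\n'
    '120010001500,shale,60,80\n'
)
metadata_csv = io.StringIO(
    'API_NUMBER,LATITUDE,LONGITUDE,ELEVATION_FT\n'
    '120010000400,40.051772,-90.995909,731\n'
    '120010001400,40.021306,-91.086866,760\n'
    '120010001500,40.109237,-91.40366,493\n'
)

# Read the ID as text so identifiers are not altered by numeric parsing.
well_data = pd.read_csv(well_data_csv, dtype={'API_NUMBER': str})
metadata = pd.read_csv(metadata_csv, dtype={'API_NUMBER': str})

# Attach the location of each well to every one of its intervals.
wells = well_data.merge(metadata, on='API_NUMBER', how='left')
print(wells[['API_NUMBER', 'TOP', 'BOTTOM', 'LATITUDE', 'ELEVATION_FT']])
```

Descriptions that contain commas need to be quoted in a comma-delimited file, as in the stand-in text above; a left join keeps every interval even if a well has no metadata row.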
Data and metadata are contained in the same file
Well Data File
API_NUMBER | LATITUDE | LONGITUDE | SURF_ELEV_FT | FORMATION | TOP | BOTTOM | ... |
---|---|---|---|---|---|---|---|
120010001400 | 40.051772 | -90.995909 | 731 | dolomite, sandy, silty, light gray, fine | 0 | 25 | ... |
120010001400 | 40.051772 | -90.995909 | 731 | limestone, cherty, fine/coarse, medium gray | 25 | 71 | ... |
120010001500 | 40.109237 | -91.40366 | 493 | clay, sandy, & silty | 0 | 60 | ... |
120010001500 | 40.109237 | -91.40366 | 493 | shale | 60 | 80 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
Description of each column in well data file:
- API_NUMBER: unique well identifier
- LATITUDE: Y coordinate of well (all rows are known to be EPSG:4269)
- LONGITUDE: X coordinate of well (all rows are known to be EPSG:4269)
- SURF_ELEV_FT: Z coordinate of well (feet above sea level)
- FORMATION: description of the geology at each interval
- TOP: Depth at top of well interval
- BOTTOM: Depth at bottom of well interval
- ...: other potential columns (as desired, not needed)
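When everything is in one file, the location columns repeat on every interval row. The sketch below shows one way to recover a one-row-per-well table and to convert interval depths to elevations. It assumes TOP and BOTTOM are depths below the surface in feet; the derived column names are hypothetical and not names used by this package:

```python
import io
import pandas as pd

# Stand-in for the combined well data file (rows from the table above).
combined_csv = io.StringIO(
    'API_NUMBER,LATITUDE,LONGITUDE,SURF_ELEV_FT,FORMATION,TOP,BOTTOM\n'
    '120010001400,40.051772,-90.995909,731,"dolomite, sandy, silty, light gray, fine",0,25\n'
    '120010001400,40.051772,-90.995909,731,"limestone, cherty, fine/coarse, medium gray",25,71\n'
    '120010001500,40.109237,-91.40366,493,"clay, sandy, & silty",0,60\n'
    '120010001500,40.109237,-91.40366,493,shale,60,80\n'
)
wells = pd.read_csv(combined_csv, dtype={'API_NUMBER': str})

# One row per well: drop the repeated location rows.
well_locations = (
    wells[['API_NUMBER', 'LATITUDE', 'LONGITUDE', 'SURF_ELEV_FT']]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Assumption: TOP and BOTTOM are depths below the surface, in feet.
wells['TOP_ELEV_FT'] = wells['SURF_ELEV_FT'] - wells['TOP']
wells['BOT_ELEV_FT'] = wells['SURF_ELEV_FT'] - wells['BOTTOM']
```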
The following raster files/grids are needed for this package:
- Surface Elevation: The surface elevation in the area of interest
- Bedrock Elevation: The bedrock elevation in the area of interest
- Model Grid: A grid (can be empty) with the CRS, node/cell size, and node/cell locations that the other data will be aligned with.
- This can be one of the other rasters above, or it can be its own grid/raster, or the parameters can be specified and the grid can be created
- This should align with a grid being used in MODFLOW or other modeling software, if that is the purpose of using this package
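If the model grid is created from parameters rather than read from an existing raster, the cell-center coordinates follow from the extent and the cell size. A minimal sketch with NumPy is shown here; the function name, extent, and cell size are hypothetical values chosen for illustration, and this is not the package's own grid-building routine:

```python
import numpy as np

def make_grid_centers(xmin, ymin, xmax, ymax, cell_size):
    """Return 1-D arrays of cell-center x and y coordinates for a regular grid."""
    ncols = int(np.ceil((xmax - xmin) / cell_size))
    nrows = int(np.ceil((ymax - ymin) / cell_size))
    x = xmin + cell_size * (np.arange(ncols) + 0.5)
    # Rows run from north to south, as in most raster formats.
    y = ymax - cell_size * (np.arange(nrows) + 0.5)
    return x, y

# Hypothetical extent in projected units (e.g., meters) with 250-unit cells.
x, y = make_grid_centers(xmin=0, ymin=0, xmax=1000, ymax=500, cell_size=250)
print(x)
print(y)
```

To align with a MODFLOW or other model grid, the extent and cell size would be taken from that model rather than chosen freely.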
Raster data is read in using the rioxarray.open_rasterio() function, which uses rasterio.open() to open a raster file as an xarray.Dataset or xarray.DataArray (or a list of xarray.Dataset objects, see rasterio.open()). rasterio in turn uses GDAL, which supports the raster file formats listed in the GDAL driver documentation.
The study area is read in using the geopandas Python package. Specifically, the geopandas.read_file() function is used to read geospatial data into a geopandas.GeoDataFrame. The geopandas.read_file() function uses fiona.open() to open the geospatial files and insert their geometry into a geopandas.GeoDataFrame. Fiona, in turn, also uses GDAL drivers and supports the file formats listed in its documentation.
The study area will be converted to the Coordinate Reference System (CRS) of your choosing using geopandas.GeoDataFrame.to_crs(), which accepts any CRS format accepted by pyproj.
import geopandas as gpd
study_area = gpd.read_file(study_area_filepath)
study_area.to_crs(study_area_crs, inplace=True)
File(s) with descriptions in one column, and a lithological classification in another column
A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names must be specified in the appropriate function calls):
- Description: geological descriptions that match all or parts of descriptions in the main well_data file.
- These descriptions may be used for either exact matches with the well_data descriptions, a match of an initial substring, or a match of a substring anywhere in the well_data description
- Lithology: the lithology of the geologic description to be matched, narrowed down to a handful of useful (hydrogeologically or otherwise) categories
- For example, a description of "blue dolomite" could be Bedrock; a description of "coarse sand w/gravel" could be "Sand and Gravel"
- Definition Type: a column containing either 'exact', 'start', 'substring'/'wildcard', or 'all' specifying what type of match should be attempted [currently not implemented]
- If more than one type of match is specified, the search proceeds in order: exact, start, substring/wildcard.
NOTE: These descriptions should match whatever is in your well_data file, even if there are typos. For example, if your well database contains a description "course gravvel", you will probably want that to register as something like "gravel". So, unless you plan to change the source database, you will want that incorrectly spelled description in your dictionary.
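The three match types described above (exact, start, and substring) can be sketched in plain pandas as follows. The column names, dictionary rows, and the classify() helper are hypothetical and are not the package's implementation; the lithology assigned to each term is for illustration only:

```python
import pandas as pd

# Hypothetical dictionary rows; the misspelled "gravvel" is deliberate so
# that it matches a misspelled description in the well data.
dictionary = pd.DataFrame({
    'DESCRIPTION': ['shale', 'clay', 'gravvel'],
    'LITHOLOGY': ['Bedrock', 'Clay', 'Sand and Gravel'],
    'MATCH_TYPE': ['exact', 'start', 'substring'],
})

def classify(description, dictionary):
    """Return the first lithology matched, trying exact, start, then substring."""
    text = description.strip().lower()
    tests = {
        'exact': lambda term: text == term,
        'start': lambda term: text.startswith(term),
        'substring': lambda term: term in text,
    }
    for match_type in ('exact', 'start', 'substring'):
        rows = dictionary[dictionary['MATCH_TYPE'] == match_type]
        for term, lithology in zip(rows['DESCRIPTION'], rows['LITHOLOGY']):
            if tests[match_type](term.lower()):
                return lithology
    return None  # no dictionary entry matched

descriptions = ['shale', 'clay, sandy, & silty', 'course gravvel', 'granite']
results = [classify(d, dictionary) for d in descriptions]
print(results)
```

Descriptions that match nothing come back as None, which makes it easy to list the terms still missing from the dictionary.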
File with lithology in one column, and a target code in another column
A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names must be specified in the appropriate function calls):
- Lithology: One row for each of the lithological classifications specified in your dictionary file(s)
- Target Code: some indicator (boolean or otherwise) as to whether each lithological class fits the target of interest
- There are likely many ways to do this. The advantage of using a boolean or 0/1 scale is that you can calculate/interpolate it along a numerical scale in later steps
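To illustrate why a 0/1 target code is convenient, the sketch below joins a target table to classified intervals and computes the thickness-weighted share of target material in one well. All table contents and column names are hypothetical, and the calculation is only one possible use of the codes, not necessarily what this package does:

```python
import pandas as pd

# Hypothetical target file: 1 marks lithologies that fit the target of interest.
targets = pd.DataFrame({
    'LITHOLOGY': ['Bedrock', 'Clay', 'Sand and Gravel'],
    'TARGET': [0, 0, 1],
})

# Hypothetical classified intervals for a single well (depths in feet).
intervals = pd.DataFrame({
    'TOP': [0, 10, 40],
    'BOTTOM': [10, 40, 50],
    'LITHOLOGY': ['Clay', 'Sand and Gravel', 'Bedrock'],
})

# Look up the target code for each interval's lithology.
intervals = intervals.merge(targets, on='LITHOLOGY', how='left')
thickness = intervals['BOTTOM'] - intervals['TOP']

# Thickness-weighted share of the well that is target material.
target_fraction = (intervals['TARGET'] * thickness).sum() / thickness.sum()
print(target_fraction)
```

Because the code is numeric, the resulting fraction can be interpolated between wells in the same way as any other continuous variable.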