Input Data Formats

NOTE: Some of the information on this page is also summarized elsewhere in the documentation.

For each of the input files, the following formatting should be taken into consideration:

Well Data

Well Intervals with Descriptions: well_data

A file or files that can be read by pandas.read_csv() should have the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

Unique well ID: Unique identifying name or number for each well (e.g., API Number)
Top: the depth (or elevation) at the top of the well interval in each row
Bottom: the depth (or elevation) at the bottom of the well interval in each row
Description: a geologic description of the well interval in each row

Individual well information and location data: metadata

This file (or another metadata file) should have the following data and columns:

Unique well ID (if another file, it will need to be joined with the other table using this ID)
X location: the x-coordinate for each well (often Longitude)
Y location: the y-coordinate for each well (often Latitude)
Z location: the z-coordinate for each well (i.e., the surface elevation)
Coordinate Reference System: The coordinate reference system of the coordinates above (does not need to be a separate column if all coordinates are in the same CRS)

Examples:

Examples of different file/column/field organization are shown below. The first line of each table is the field names, and the following lines have example data.

OPTION 1: Separate data file and metadata file

Data (i.e., the descriptions for each well interval) and metadata (the well location) are contained in separate files.

Well Data File

API_NUMBER	FORMATION	TOP	BOTTOM	...
120010001400	dolomite, sandy, silty, light gray, fine	0	25	...
120010001400	limestone, cherty, fine/coarse, medium gray	25	71	...
120010001500	clay, sandy, & silty	0	60	...
120010001500	shale	60	80	...
...	...	...	...	...

Description of each column in well data file:

API_NUMBER: unique well identifier
FORMATION: description of the geology at each interval
TOP: Depth at top of well interval
BOTTOM: Depth at bottom of well interval
...: other potential columns (as desired, not needed)

Well Metadata File

API_NUMBER	LATITUDE	LONGITUDE	ELEVATION_FT	...
120010000400	40.051772	-90.995909	731	...
120010001400	40.021306	-91.086866	760	...
120010001500	40.109237	-91.40366	493	...
...	...	...	...	...

Description of each column in well metadata file:

API_NUMBER: unique well identifier
LATITUDE: Y coordinate of well (all rows are known to be in EPSG:4269)
LONGITUDE: X coordinate of well (all rows are known to be in EPSG:4269)
ELEVATION_FT: Z coordinate of well (feet above sea level)
...: other potential columns (as desired, not needed)
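
A minimal sketch of how the two files in Option 1 might be read and joined on the unique well ID. The in-memory strings stand in for real files, and the variable names (well_data, metadata, wells) are illustrative rather than names required by the package. Reading the ID column as text is a suggestion, so that long identifiers such as API numbers are not altered by numeric parsing.

```python
import io

import pandas as pd

# Stand-ins for the two tab-separated files shown above
well_data_tsv = (
    "API_NUMBER\tFORMATION\tTOP\tBOTTOM\n"
    "120010001400\tdolomite, sandy, silty, light gray, fine\t0\t25\n"
    "120010001400\tlimestone, cherty, fine/coarse, medium gray\t25\t71\n"
    "120010001500\tclay, sandy, & silty\t0\t60\n"
    "120010001500\tshale\t60\t80\n"
)
metadata_tsv = (
    "API_NUMBER\tLATITUDE\tLONGITUDE\tELEVATION_FT\n"
    "120010000400\t40.051772\t-90.995909\t731\n"
    "120010001400\t40.021306\t-91.086866\t760\n"
    "120010001500\t40.109237\t-91.40366\t493\n"
)

# Read the well ID as a string in both tables so the join keys match exactly
well_data = pd.read_csv(io.StringIO(well_data_tsv), sep="\t", dtype={"API_NUMBER": str})
metadata = pd.read_csv(io.StringIO(metadata_tsv), sep="\t", dtype={"API_NUMBER": str})

# Attach the location of each well to every one of its intervals;
# validate= raises an error if the metadata has duplicate well IDs
wells = well_data.merge(metadata, on="API_NUMBER", how="left", validate="many_to_one")
```

A left join keeps every interval row even when a well has no metadata, so missing locations show up as empty values instead of silently dropping intervals.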

OPTION 2: Single data file for both well data and metadata

Well Data File

API_NUMBER	LATITUDE	LONGITUDE	SURF_ELEV_FT	FORMATION	TOP	BOTTOM	...
120010001400	40.051772	-90.995909	731	dolomite, sandy, silty, light gray, fine	0	25	...
120010001400	40.051772	-90.995909	731	limestone, cherty, fine/coarse, medium gray	25	71	...
120010001500	40.109237	-91.40366	493	clay, sandy, & silty	0	60	...
120010001500	40.109237	-91.40366	493	shale	60	80	...
...	...	...	...	...	...	...	...

Description of each column in well data file:

API_NUMBER: unique well identifier
LATITUDE: Y coordinate of well (all rows are known to be in EPSG:4269)
LONGITUDE: X coordinate of well (all rows are known to be in EPSG:4269)
SURF_ELEV_FT: Z coordinate of well (feet above sea level)
FORMATION: description of the geology at each interval
TOP: Depth at top of well interval
BOTTOM: Depth at bottom of well interval
...: other potential columns (as desired, not needed)
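
With the single-file layout of Option 2, the per-well metadata is repeated on every interval row. The sketch below shows one way to recover a one-row-per-well table and, assuming TOP and BOTTOM are depths below the surface in the same units as SURF_ELEV_FT, to convert them to elevations. The new column names TOP_ELEV_FT and BOT_ELEV_FT are hypothetical and used only for illustration.

```python
import io

import pandas as pd

# Stand-in for the combined tab-separated file shown above
combined_tsv = (
    "API_NUMBER\tLATITUDE\tLONGITUDE\tSURF_ELEV_FT\tFORMATION\tTOP\tBOTTOM\n"
    "120010001400\t40.051772\t-90.995909\t731\tdolomite, sandy, silty, light gray, fine\t0\t25\n"
    "120010001400\t40.051772\t-90.995909\t731\tlimestone, cherty, fine/coarse, medium gray\t25\t71\n"
    "120010001500\t40.109237\t-91.40366\t493\tclay, sandy, & silty\t0\t60\n"
    "120010001500\t40.109237\t-91.40366\t493\tshale\t60\t80\n"
)
wells = pd.read_csv(io.StringIO(combined_tsv), sep="\t", dtype={"API_NUMBER": str})

# One row per well: keep only the location columns and drop the repeats
meta_cols = ["API_NUMBER", "LATITUDE", "LONGITUDE", "SURF_ELEV_FT"]
well_metadata = wells[meta_cols].drop_duplicates().reset_index(drop=True)

# Assumption: TOP/BOTTOM are depths, so elevation = surface elevation - depth
wells["TOP_ELEV_FT"] = wells["SURF_ELEV_FT"] - wells["TOP"]
wells["BOT_ELEV_FT"] = wells["SURF_ELEV_FT"] - wells["BOTTOM"]
```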

Raster/Grid Files

The following raster files/grids are needed for this package:

Surface Elevation: The surface elevation in the area of interest
Bedrock Elevation: The bedrock elevation in the area of interest
Model Grid: A grid (can be empty) with the CRS, node/cell size, and node/cell locations that the other data will be aligned with.
- This can be one of the other rasters above, or it can be its own grid/raster, or the parameters can be specified and the grid can be created
- This should align with a grid being used in MODFLOW or other modeling software, if that is the purpose of using this package

Raster/grid formats:

Data is read in using the rioxarray.open_rasterio() function, which uses rasterio.open() to open a raster file as an xarray.Dataset or xarray.DataArray (or a list of xarray.Datasets; see rioxarray.open_rasterio()). rasterio in turn uses GDAL, which supports the raster formats listed in its driver documentation.

Study Area Files

The study area is read in using the geopandas Python package. Specifically, the geopandas.read_file() function is used to read geospatial data into a geopandas.GeoDataFrame. The geopandas.read_file() function uses fiona.open() to open the geospatial files and insert their geometry into a geopandas.GeoDataFrame. Fiona, in turn, also uses GDAL drivers, and supports the vector formats listed in the GDAL driver documentation.

The study area will be converted to the Coordinate Reference System (CRS) of your choosing using geopandas.GeoDataFrame.to_crs(), which accepts any CRS format accepted by pyproj.

import geopandas as gpd

# study_area_filepath and study_area_crs are supplied by the user
study_area = gpd.read_file(study_area_filepath)
study_area = study_area.to_crs(study_area_crs)

Dictionary Files

File(s) with descriptions in one column, and a lithological classification in another column.

A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

Description: geological descriptions that match all or parts of descriptions in the main well_data file.
- These descriptions may be used for either exact matches with the well_data descriptions, a match of an initial substring, or a match of a substring anywhere in the well_data description
Lithology: the lithology of the geologic description to be matched, narrowed down to a handful of useful (hydrogeologically or otherwise) categories
- For example, a description of "blue dolomite" could be "Bedrock"; a description of "coarse sand w/gravel" could be "Sand and Gravel"
Definition Type: a column containing either 'exact', 'start', 'substring'/'wildcard', or 'all', specifying what type of match should be attempted [currently not implemented]
- If more than one type of match is specified, matches will be searched for in this order: exact, start, substring/wildcard.

NOTE: These descriptions should match whatever is in your well_data file, even if there are typos. For example, if your well database contains a description "course gravvel", you will probably want that to register as something like "gravel." So, unless you plan to change the source database, you will want that incorrectly spelled description in your dictionary.
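
To make the three match types concrete, here is an illustrative sketch of how a description could be classified by trying exact, then start, then substring matches. It is not the package's implementation (the Definition Type column is noted above as not yet implemented); the classify() function, the column names, and the dictionary rows are all hypothetical.

```python
import pandas as pd

# Hypothetical dictionary; note the misspelled entry kept to match the source data
dictionary = pd.DataFrame({
    "DESCRIPTION": ["shale", "clay", "dolomite", "course gravvel"],
    "LITHOLOGY": ["Bedrock", "Clay", "Bedrock", "Sand and Gravel"],
    "DEFINITION_TYPE": ["exact", "start", "substring", "exact"],
})


def classify(description, dictionary):
    """Return the lithology of the first matching entry, or None if nothing matches."""
    text = description.strip().lower()
    # Match types are tried in order: exact, start, substring
    checks = [
        ("exact", lambda t, key: t == key),
        ("start", lambda t, key: t.startswith(key)),
        ("substring", lambda t, key: key in t),
    ]
    for match_type, is_match in checks:
        rows = dictionary[dictionary["DEFINITION_TYPE"] == match_type]
        for key, lithology in zip(rows["DESCRIPTION"], rows["LITHOLOGY"]):
            if is_match(text, key.lower()):
                return lithology
    return None


examples = ["shale", "clay, sandy, & silty", "blue dolomite", "course gravvel", "granite"]
results = {d: classify(d, dictionary) for d in examples}
```

Descriptions that return None are the ones worth reviewing, since they indicate gaps in the dictionary.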

Lithology File

File with lithology in one column, and a target code in another column.

A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

Lithology: One row for each of the lithological classifications specified in your dictionary file(s)
Target Code: some indicator (boolean or otherwise) as to whether each lithological class fits the target of interest
- There are likely many ways to do this. The advantage of using a boolean or 0/1 scale is that you can calculate/interpolate it along a numerical scale in later steps
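
The sketch below illustrates why a 0/1 target code is convenient: once each interval carries a numeric code, a thickness-weighted fraction of target material can be computed per well. The well IDs, lithology assignments, and column names are invented for the example, and this is only one of many possible summaries.

```python
import pandas as pd

# Hypothetical classified intervals (depths in feet)
intervals = pd.DataFrame({
    "API_NUMBER": ["A", "A", "B", "B"],
    "TOP": [0, 25, 0, 60],
    "BOTTOM": [25, 71, 60, 80],
    "LITHOLOGY": ["Sand and Gravel", "Bedrock", "Clay", "Sand and Gravel"],
})

# Hypothetical lithology file: 1 = target material, 0 = not target
lithology_targets = pd.DataFrame({
    "LITHOLOGY": ["Sand and Gravel", "Clay", "Bedrock"],
    "TARGET": [1, 0, 0],
})

# Attach the target code to each interval
coded = intervals.merge(lithology_targets, on="LITHOLOGY", how="left")

# Weight each interval's code by its thickness, then total per well
coded["THICKNESS"] = coded["BOTTOM"] - coded["TOP"]
coded["TARGET_THICKNESS"] = coded["THICKNESS"] * coded["TARGET"]
totals = coded.groupby("API_NUMBER")[["TARGET_THICKNESS", "THICKNESS"]].sum()

# Fraction of each well's logged length that is target material
target_fraction = totals["TARGET_THICKNESS"] / totals["THICKNESS"]
```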
