
Input Data Formats

RJbalikian edited this page Jun 29, 2023 · 15 revisions

NOTE: Some of the information on this page is also summarized here.

For each of the input files, the following formatting should be taken into consideration:

Well Data

Well Intervals with Descriptions: well_data

A file or files that can be read by pandas.read_csv() should have the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

  • Unique well ID: Unique identifying name or number for each well (e.g., API Number)
  • Top: the depth (or elevation) at the top of the well interval in each row
  • Bottom: the depth (or elevation) at the bottom of the well interval in each row
  • Description: a geologic description of the well interval in each row

Individual well information and location data: metadata

The well data file (or a separate metadata file) should have the following data and columns:

  • Unique well ID (if another file, it will need to be joined with the other table using this ID)
  • X location: the x-coordinate for each well (often Longitude)
  • Y location: the y-coordinate for each well (often Latitude)
  • Z location: the z-coordinate for each well (i.e., the surface elevation)
  • Coordinate Reference System: The coordinate reference system of the coordinates above (does not need to be a separate column if all coordinates are in the same CRS)

Examples:

Examples of different file/column/field organization are shown below. The first line of each table contains the field names, and the following lines contain example data.

OPTION 1: Separate data file and metadata file


Data (i.e., the descriptions for each well interval) and metadata (the well location) are contained in separate files.

Well Data File

API_NUMBER   | FORMATION                                   | TOP | BOTTOM | ...
120010001400 | dolomite, sandy, silty, light gray, fine    | 0   | 25     | ...
120010001400 | limestone, cherty, fine/coarse, medium gray | 25  | 71     | ...
120010001500 | clay, sandy, & silty                        | 0   | 60     | ...
120010001500 | shale                                       | 60  | 80     | ...
...          | ...                                         | ... | ...    | ...

Description of each column in well data file:

  • API_NUMBER: unique well identifier
  • FORMATION: description of the geology at each interval
  • TOP: Depth at top of well interval
  • BOTTOM: Depth at bottom of well interval
  • ...: other potential columns (as desired, not needed)

Well Metadata File

API_NUMBER   | LATITUDE  | LONGITUDE  | ELEVATION_FT | ...
120010000400 | 40.051772 | -90.995909 | 731          | ...
120010001400 | 40.021306 | -91.086866 | 760          | ...
120010001500 | 40.109237 | -91.40366  | 493          | ...
...          | ...       | ...        | ...          | ...

Description of each column in well metadata file:

  • API_NUMBER: unique well identifier
  • LATITUDE: Y coordinate of well (all rows are known to be in EPSG:4269)
  • LONGITUDE: X coordinate of well (all rows are known to be in EPSG:4269)
  • ELEVATION_FT: Z coordinate of well (feet above sea level)
  • ...: other potential columns (as desired, not needed)
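For Option 1, the two tables have to be joined on the unique well ID before the intervals can be located in space. The following is a minimal sketch of that join using plain pandas, not the package's own loading routine; the in-memory CSV text stands in for the two files, and the variable names are illustrative assumptions.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the two files; in practice, pass file paths.
well_data_csv = io.StringIO(
    'API_NUMBER,FORMATION,TOP,BOTTOM\n'
    '120010001400,"dolomite, sandy, silty, light gray, fine",0,25\n'
    '120010001400,"limestone, cherty, fine/coarse, medium gray",25,71\n'
    '120010001500,"clay, sandy, & silty",0,60\n'
    '120010001500,shale,60,80\n'
)
metadata_csv = io.StringIO(
    'API_NUMBER,LATITUDE,LONGITUDE,ELEVATION_FT\n'
    '120010000400,40.051772,-90.995909,731\n'
    '120010001400,40.021306,-91.086866,760\n'
    '120010001500,40.109237,-91.40366,493\n'
)

# Read the ID as text so that both tables use an identical key type
well_data = pd.read_csv(well_data_csv, dtype={'API_NUMBER': str})
metadata = pd.read_csv(metadata_csv, dtype={'API_NUMBER': str})

# Left join: keep every interval and attach the location of its well.
# validate raises an error if a well ID appears more than once in the metadata.
wells = well_data.merge(metadata, on='API_NUMBER', how='left',
                        validate='many_to_one')
print(wells[['API_NUMBER', 'TOP', 'BOTTOM', 'LATITUDE', 'LONGITUDE', 'ELEVATION_FT']])
```

Wells that appear only in the metadata table (such as 120010000400 here) are dropped by the left join, while intervals whose well has no metadata would be kept with missing coordinates.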

OPTION 2: Single data file for both well data and metadata


Well Data File

API_NUMBER   | LATITUDE  | LONGITUDE  | SURF_ELEV_FT | FORMATION                                   | TOP | BOTTOM | ...
120010001400 | 40.051772 | -90.995909 | 731          | dolomite, sandy, silty, light gray, fine    | 0   | 25     | ...
120010001400 | 40.051772 | -90.995909 | 731          | limestone, cherty, fine/coarse, medium gray | 25  | 71     | ...
120010001500 | 40.109237 | -91.40366  | 493          | clay, sandy, & silty                        | 0   | 60     | ...
120010001500 | 40.109237 | -91.40366  | 493          | shale                                       | 60  | 80     | ...
...          | ...       | ...        | ...          | ...                                         | ... | ...    | ...

Description of each column in well data file:

  • API_NUMBER: unique well identifier
  • LATITUDE: Y coordinate of well (all rows are known to be in EPSG:4269)
  • LONGITUDE: X coordinate of well (all rows are known to be in EPSG:4269)
  • SURF_ELEV_FT: Z coordinate of well (feet above sea level)
  • FORMATION: description of the geology at each interval
  • TOP: Depth at top of well interval
  • BOTTOM: Depth at bottom of well interval
  • ...: other potential columns (as desired, not needed)
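With Option 2, the location columns repeat on every interval row. As a sketch (plain pandas, with illustrative variable names), a one-row-per-well metadata table can be recovered by dropping duplicates. The elevation columns at the end assume that TOP and BOTTOM are depths in feet below the surface, which the table above does not state explicitly.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the combined file
combined_csv = io.StringIO(
    'API_NUMBER,LATITUDE,LONGITUDE,SURF_ELEV_FT,FORMATION,TOP,BOTTOM\n'
    '120010001400,40.051772,-90.995909,731,"dolomite, sandy, silty, light gray, fine",0,25\n'
    '120010001400,40.051772,-90.995909,731,"limestone, cherty, fine/coarse, medium gray",25,71\n'
    '120010001500,40.109237,-91.40366,493,"clay, sandy, & silty",0,60\n'
    '120010001500,40.109237,-91.40366,493,shale,60,80\n'
)
wells = pd.read_csv(combined_csv, dtype={'API_NUMBER': str})

# One row per well: the location columns are identical for all of a well's intervals
location_cols = ['API_NUMBER', 'LATITUDE', 'LONGITUDE', 'SURF_ELEV_FT']
metadata = wells[location_cols].drop_duplicates().reset_index(drop=True)

# Assumption: TOP and BOTTOM are depths (ft) below the surface elevation
wells['TOP_ELEV_FT'] = wells['SURF_ELEV_FT'] - wells['TOP']
wells['BOTTOM_ELEV_FT'] = wells['SURF_ELEV_FT'] - wells['BOTTOM']
```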

Raster/Grid Files

The following raster files/grids are needed for this package:

  • Surface Elevation: The surface elevation in the area of interest
  • Bedrock Elevation: The bedrock elevation in the area of interest
  • Model Grid: A grid (can be empty) with the CRS, node/cell size, and node/cell locations that the other data will be aligned with.
    • This can be one of the other rasters above, or it can be its own grid/raster, or the parameters can be specified and the grid can be created
    • This should align with a grid being used in MODFLOW or other modeling software, if that is the purpose of using this package
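To illustrate what "specifying the parameters" of a model grid amounts to, the sketch below builds cell-center coordinates and an empty grid with NumPy. This is not the package's grid-creation routine; the corner coordinates, cell size, and grid dimensions are hypothetical values in an assumed projected CRS.

```python
import numpy as np

# Hypothetical grid parameters, in the units of a projected CRS (e.g., meters)
x_min, y_max = 500000.0, 4450000.0   # upper-left corner of the grid
cell_size = 100.0
n_cols, n_rows = 4, 3

# Cell-center coordinates; y decreases from the top row, as in most raster layouts
x_centers = x_min + cell_size * (np.arange(n_cols) + 0.5)
y_centers = y_max - cell_size * (np.arange(n_rows) + 0.5)

# Empty grid (all NaN) with one value per cell, shaped (rows, columns)
model_grid = np.full((n_rows, n_cols), np.nan)
```

Other rasters would then be resampled so that their cell centers coincide with x_centers and y_centers.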

Raster/grid formats:

Data is read in using the rioxarray.open_rasterio() function, which uses rasterio.open() to open a raster file as an xarray.Dataset or xarray.DataArray (or a list of xarray.Dataset objects, see rasterio.open()). rasterio in turn uses GDAL, which claims to support the file formats listed here.

Study Area Files

The study area is read in using the geopandas Python package. Specifically, the geopandas.read_file() function is used to read geospatial data into a geopandas.GeoDataFrame. The geopandas.read_file() function uses fiona.open() to open the geospatial files and insert their geometry into a geopandas.GeoDataFrame. Fiona, in turn, also uses GDAL drivers, and supports the file formats listed here.

The study area will be converted to the Coordinate Reference System (CRS) of your choosing using geopandas.GeoDataFrame.to_crs(), which accepts any CRS format accepted by pyproj.

import geopandas as gpd

# study_area_filepath and study_area_crs are supplied by the user
study_area = gpd.read_file(study_area_filepath)
study_area.to_crs(study_area_crs, inplace=True)

Dictionary Files

File(s) with descriptions in one column, and a lithological classification in another column

A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

  • Description: geological descriptions that match all or parts of descriptions in the main well_data file.
    • These descriptions may be used for either exact matches with the well_data descriptions, a match of an initial substring, or a match of a substring anywhere in the well_data description
  • Lithology: the lithology of the geologic description to be matched, narrowed down to a handful of useful (hydrogeologically or otherwise) categories
    • For example, a description of "blue dolomite" could be "Bedrock"; a description of "coarse sand w/gravel" could be "Sand and Gravel"
  • Definition Type: a column containing either 'exact', 'start', 'substring'/'wildcard', or 'all', specifying what type of match should be attempted [currently not implemented]
    • If more than one type of match is specified, the search will proceed in order: exact, start, wildcard.

NOTE: These descriptions should match whatever is in your well_data file, even if there are typos. For example, if your well database contains a description "course gravvel", you will probably want that to register as something like "gravel". So, unless you plan to change the source database, you will want that incorrectly spelled description in your dictionary.
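The three match types and their search order can be sketched in plain Python as follows. This is a standalone illustration, not the package's implementation (the Definition Type column is noted above as not yet implemented); the classify function, the tuple layout, and the "Clay" category are hypothetical.

```python
def classify(description, dictionary):
    """Return the lithology for a description, trying exact matches first,
    then start-of-string matches, then substring matches anywhere."""
    text = description.strip().lower()
    for match_type in ("exact", "start", "substring"):
        for pattern, lithology, definition_type in dictionary:
            if definition_type != match_type:
                continue
            pattern = pattern.lower()
            if match_type == "exact":
                matched = text == pattern
            elif match_type == "start":
                matched = text.startswith(pattern)
            else:
                matched = pattern in text
            if matched:
                return lithology
    return None  # no dictionary entry matched


# Hypothetical dictionary rows: (description, lithology, definition type)
dictionary = [
    ("shale", "Bedrock", "exact"),
    ("dolomite", "Bedrock", "start"),
    ("clay", "Clay", "start"),
    ("gravvel", "Sand and Gravel", "substring"),  # misspelling kept on purpose
]

print(classify("dolomite, sandy, silty, light gray, fine", dictionary))
print(classify("course gravvel", dictionary))
```

Keeping the misspelled "gravvel" entry lets the typo in the source database still resolve to a useful category.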

Lithology File

File with lithology in one column, and a target code in another column

A file or files that can be read by pandas.read_csv() should have at least the following data and columns (the columns can be named whatever you like, but the column names will need to be specified in the appropriate function calls):

  • Lithology: One row for each of the lithological classifications specified in your dictionary file(s)
  • Target Code: some indicator (boolean or otherwise) as to whether each lithological class fits the target of interest
    • There are likely many ways to do this. The advantage of using a boolean or 0/1 scale is that you can calculate/interpolate it along a numerical scale in later steps
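As an illustration of why a 0/1 target code is convenient, the sketch below attaches the code to each classified interval and computes the thickness-weighted fraction of target material per well. It uses plain pandas; the column names, the lithology assigned to each interval, and the target codes are hypothetical.

```python
import pandas as pd

# Hypothetical classified intervals for one well
intervals = pd.DataFrame({
    'API_NUMBER': ['120010001500', '120010001500'],
    'TOP': [0, 60],
    'BOTTOM': [60, 80],
    'LITHOLOGY': ['Sand and Gravel', 'Bedrock'],
})

# Hypothetical lithology file: 1 = target of interest, 0 = not a target
lithology_codes = pd.DataFrame({
    'LITHOLOGY': ['Sand and Gravel', 'Bedrock'],
    'TARGET': [1, 0],
})

# Attach the target code to each interval
coded = intervals.merge(lithology_codes, on='LITHOLOGY', how='left')

# Thickness-weighted fraction of target material in each well
coded['THICKNESS'] = coded['BOTTOM'] - coded['TOP']
coded['TARGET_THICKNESS'] = coded['TARGET'] * coded['THICKNESS']
totals = coded.groupby('API_NUMBER')[['TARGET_THICKNESS', 'THICKNESS']].sum()
target_fraction = totals['TARGET_THICKNESS'] / totals['THICKNESS']
print(target_fraction)
```

Because the code is numeric, the resulting fraction is a continuous value between 0 and 1 that can be interpolated between wells.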