182 read logger files from solinst levelogger as obs #184

Merged
137 changes: 137 additions & 0 deletions hydropandas/io/solinst.py
@@ -0,0 +1,137 @@
import logging
import os
import zipfile

import numpy as np
import pandas as pd
from pyproj import Transformer

logger = logging.getLogger(__name__)


def read_solinst_file(
    path,
    transform_coords=True,
):
    """Read Solinst logger file (XLE)

    Parameters
    ----------
    path : str
        path to Solinst file (.xle)
    transform_coords : boolean
        convert coordinates from WGS84 to RD

    Returns
    -------
    df : pandas.DataFrame
        DataFrame containing file content
    meta : dict, optional
        dict containing meta
    """

    # open file
    name = os.path.splitext(os.path.basename(path))[0]
    if path.endswith(".xle"):
        f = path
    elif path.endswith(".zip"):
        zf = zipfile.ZipFile(path)
        f = zf.open("{}.xle".format(name))
    else:
        raise NotImplementedError(
            "File type '{}' not supported!".format(os.path.splitext(path)[-1])
        )

    logger.info("reading -> {}".format(f))

    # read channel 1 data header
    df_ch1_data_header = pd.read_xml(path, xpath="/Body_xle/Ch1_data_header")
    series_ch1_data_header = df_ch1_data_header.T.iloc[:, 0]
    colname_ch1 = (
        series_ch1_data_header.Identification.lower()
        + "_"
        + series_ch1_data_header.Unit.lower()
    )

    # read channel 2 data header
    df_ch2_data_header = pd.read_xml(path, xpath="/Body_xle/Ch2_data_header")
    series_ch2_data_header = df_ch2_data_header.T.iloc[:, 0]
    colname_ch2 = (
        series_ch2_data_header.Identification.lower()
        + "_"
        + series_ch2_data_header.Unit.lower()
    )

    # read observations
    df = pd.read_xml(
        path,
        xpath="/Body_xle/Data/Log",
    )
    df.rename(columns={"ch1": colname_ch1, "ch2": colname_ch2}, inplace=True)
    if "ms" in df.columns:
        df["date_time"] = pd.to_datetime(
            df["Date"] + " " + df["Time"]
        ) + pd.to_timedelta(df["ms"], unit="ms")
        drop_cols = ["id", "Date", "Time", "ms"]
    else:
        df["date_time"] = pd.to_datetime(df["Date"] + " " + df["Time"])
        drop_cols = ["id", "Date", "Time"]
    df.set_index("date_time", inplace=True)

    df.drop(columns=drop_cols, inplace=True)

    # parse meta into dict, per group in XLE file
    meta = {}
    # read file info
    df_file_info = pd.read_xml(path, xpath="/Body_xle/File_info")
    dict_file_info = df_file_info.T.iloc[:, 0].to_dict()

    # read instrument info
    df_instrument_info = pd.read_xml(path, xpath="/Body_xle/Instrument_info")
    dict_instrument_info = df_instrument_info.T.iloc[:, 0].to_dict()

    # read instrument info
    df_instrument_info_data_header = pd.read_xml(
        path, xpath="/Body_xle/Instrument_info_data_header"
    )
    dict_instrument_info_data_header = df_instrument_info_data_header.T.iloc[
        :, 0
    ].to_dict()

    meta = {
        **dict_file_info,
        **dict_instrument_info,
        **dict_instrument_info_data_header,
    }

    if transform_coords:
        # lat and lon has 0,000 when location is not supplied
        # replace comma with point first
        if isinstance(meta["Latitude"], str):
            meta["Latitude"] = float(meta["Latitude"].replace(",", "."))
        if isinstance(meta["Longtitude"], str):
            meta["Longtitude"] = float(meta["Longtitude"].replace(",", "."))
        if (meta["Latitude"] != 0) & (meta["Longtitude"] != 0):
            # NOTE: check EPSG:28992 definition and whether location is showing up in
            # the right spot.
            transformer = Transformer.from_crs("epsg:4326", "epsg:28992")
            x, y = transformer.transform(meta["Latitude"], meta["Longtitude"])
            x = np.round(x, 2)
            y = np.round(y, 2)
        else:
            logger.warning("file has no location included")
            x = None
            y = None
    else:
        x = meta["Latitude"]
        y = meta["Longtitude"]
    meta["x"] = x
    meta["y"] = y
    meta["filename"] = f
    meta["source"] = meta["Created_by"]
    meta["name"] = name
    meta["monitoring_well"] = name
    meta["unit"] = series_ch1_data_header.Unit.lower()
    meta["metadata_available"] = True

    return df, meta
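
To make the XPath groups used above easier to follow, here is a minimal standard-library sketch of the same two steps: building a column name from a channel header and combining `Date`, `Time` and `ms` into one timestamp. It is not the code from this PR. The XML sample is a hypothetical, heavily trimmed XLE layout, the date format `%Y/%m/%d` is an assumption, and `column_name` and `parse_log` are illustrative helper names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Hypothetical, heavily trimmed XLE content; real files carry more groups and fields.
XLE_TEXT = """<Body_xle>
  <Ch1_data_header>
    <Identification>LEVEL</Identification>
    <Unit>m</Unit>
  </Ch1_data_header>
  <Ch2_data_header>
    <Identification>TEMPERATURE</Identification>
    <Unit>C</Unit>
  </Ch2_data_header>
  <Data>
    <Log id="1">
      <Date>2023/01/01</Date>
      <Time>12:00:00</Time>
      <ms>500</ms>
      <ch1>1.234</ch1>
      <ch2>10.5</ch2>
    </Log>
  </Data>
</Body_xle>"""


def column_name(root, channel):
    """Build '<identification>_<unit>' in lower case for one channel header."""
    header = root.find(f"Ch{channel}_data_header")
    ident = header.findtext("Identification").lower()
    unit = header.findtext("Unit").lower()
    return f"{ident}_{unit}"


def parse_log(log):
    """Return (timestamp, ch1, ch2) for one <Log> element."""
    # Assumed date format; the PR lets pandas infer it instead.
    stamp = datetime.strptime(
        log.findtext("Date") + " " + log.findtext("Time"), "%Y/%m/%d %H:%M:%S"
    )
    # The ms field is optional, mirroring the '"ms" in df.columns' check above.
    stamp += timedelta(milliseconds=int(log.findtext("ms", default="0")))
    return stamp, float(log.findtext("ch1")), float(log.findtext("ch2"))


root = ET.fromstring(XLE_TEXT)
rows = [parse_log(log) for log in root.findall("Data/Log")]
print(column_name(root, 1), column_name(root, 2), rows)
```

The PR itself relies on `pd.read_xml` with one XPath per group, which yields DataFrames directly; the sketch only shows which elements those XPaths are expected to match.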
41 changes: 41 additions & 0 deletions hydropandas/observation.py
@@ -777,6 +777,47 @@ def from_pastastore(cls, pstore, libname, name, metadata_mapping=None):

        return cls(data, meta=metadata, **kwargs)

    @classmethod
    def from_solinst(
        cls,
        path,
        transform_coords=True,
        screen_bottom=None,
        screen_top=None,
        ground_level=None,
        tube_nr=None,
        tube_top=None,
    ):
        """Read data from Solinst xle file.

        Parameters
        ----------
        path : str
            path to file (file can be zip or xle)

        """
        from .io import solinst

        df, meta = solinst.read_solinst_file(path, transform_coords=transform_coords)

        return cls(
            df,
            meta=meta,
            name=meta.pop("name"),
Collaborator

Add the full metadata dict first, then pop the items?

Collaborator Author

I copied this working order from another read package.

Collaborator

Should probably also be changed there then, I think :). But I modified the order, so it should be fine now.

Collaborator

Why do you want to change the order, Davíd? I used this order because I don't want the metadata that is added as attributes to also appear in the 'meta' dictionary. But now that I think about it, I think the order does not even matter.

Collaborator

I thought you would want the metadata to be available both in the Obs object and as columns in the ObsCollection?

Collaborator

But you say it doesn't matter? Does the dictionary get emptied regardless of the order in which you provide the keyword arguments?

Collaborator

Just to be sure we have the same idea of what is happening: When we use x = meta.pop('x') we remove x from the meta dict and add x as an attribute to the obs object. Then we use meta=meta at the end to add the meta dict as the attribute meta to the Obs object. When we do it in this order the meta attribute of an Obs object does not have any keys that are also attributes of that Obs object.

I like this approach because I don't like to store the same information in multiple places. If we would change the order we might end up with an Obs object that has a meta attribute which is a dictionary with a key x and an attribute x which have the same value.

But then I am not sure what happens if you use a dictionary as the first function argument. I think a reference to that dictionary is passed to the function and not a copy, so if you modify the dictionary in other function arguments (by popping items) I think it will also affect the first argument. This should be easy to test.

I hope this makes it more clear. For now I think we can leave it as is and I can pick this up later.

Collaborator

Fair enough, I somehow thought it would be nice to have both metadata available as a single dictionary containing everything, and certain attributes for easy quick access. But I get that storing the same data in multiple locations is a bit redundant and maybe more confusing. So all good to keep things as they are, and maybe add a comment why the data is being popped :). For future me, or future someone else.
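
The question raised in this thread can be checked with a small standalone sketch. Python evaluates every argument expression before the call starts and passes a reference to the dict, not a copy, so the popped keys are gone from `meta` wherever `meta=meta` appears in the argument list. `make_obs` below is a hypothetical stand-in for the `Obs` constructor and only records what it receives; the real constructor may handle `meta` differently.

```python
def make_obs(data, meta=None, **attrs):
    """Hypothetical stand-in for the Obs constructor; it only records its inputs."""
    return {"data": data, "meta": meta, "attrs": attrs}


# Case 1: pass the dict itself first, then pop items in later arguments.
meta = {"name": "well_1", "x": 1.0, "y": 2.0, "Serial_number": "123"}
obs = make_obs(
    None,
    meta=meta,  # a reference to the same dict object
    name=meta.pop("name"),
    x=meta.pop("x"),
    y=meta.pop("y"),
)
# All pops ran before make_obs was entered, so the shared dict has lost those keys.
print(obs["meta"])

# Case 2: pass a copy first if the full dict should be kept as well.
meta2 = {"name": "well_1", "x": 1.0, "y": 2.0, "Serial_number": "123"}
obs2 = make_obs(
    None,
    meta=dict(meta2),  # shallow copy taken before any pop is evaluated
    name=meta2.pop("name"),
    x=meta2.pop("x"),
    y=meta2.pop("y"),
)
print(obs2["meta"])
```

In case 1 the `meta` entry holds only the keys that were not popped, which matches the "no duplicated information" approach described above. Case 2 is the variant that would keep everything in `meta` while still exposing selected values as attributes.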

            x=meta.pop("x"),
            y=meta.pop("y"),
            filename=meta.pop("filename"),
            source=meta.pop("source"),
            unit=meta.pop("unit"),
            screen_bottom=screen_bottom,
            screen_top=screen_top,
            ground_level=ground_level,
            metadata_available=meta.pop("metadata_available"),
            monitoring_well=meta.pop("monitoring_well"),
            tube_nr=tube_nr,
            tube_top=tube_top,
        )


class WaterQualityObs(Obs):
    """class for water quality ((grond)watersamenstelling) point