Releases: kaburia/filter-stations
Kieni Weather Data and Aggregate Variables
Kieni Data
"""
Retrieves weather data from the Kieni API endpoint and returns it as a pandas DataFrame after processing.
Parameters:
-----------
- start_date (str, optional): The start date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None, in which case data is returned from the beginning of the record.
- end_date (str, optional): The end date for retrieving weather data in 'YYYY-MM-DD' format. Defaults to None, in which case data is returned up to the end of the record.
- variable (str, optional): The weather variable to retrieve, using the same shortcodes as TAHMO, e.g., 'pr', 'ap', 'rh'.
- method (str, optional): The aggregation method to apply to the data ('sum', 'mean', 'min', 'max' and custom functions). Defaults to 'sum'.
- freq (str, optional): The frequency for data aggregation (e.g., '1D' for daily, '1H' for hourly). Defaults to '1D'.
Returns:
-----------
- pandas.DataFrame: DataFrame containing the weather data for the specified parameters, with columns containing NaN values dropped.
Usage:
-----------
To retrieve daily rainfall data from January 1, 2024, to January 31, 2024:
```python
# Instantiate the Kieni class
api_key, api_secret = '', '' # Request the API key and secret from DSAIL
kieni = Kieni(api_key, api_secret)
kieni_weather_data = kieni.kieni_weather_data(start_date='2024-01-01', end_date='2024-01-31', variable='pr', freq='1D', method='sum')
```
To retrieve hourly temperature data from February 1, 2024, to February 7, 2024:
```python
kieni_weather_data = kieni.kieni_weather_data(start_date='2024-02-01', end_date='2024-02-07', variable='te', method='mean', freq='1H')
```
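The open-ended date range and the NaN-column cleanup described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: `select_and_clean`, the sensor column names, and the sample values are hypothetical, and it assumes that a column is dropped when it holds any NaN in the selected range.

```python
import numpy as np
import pandas as pd


def select_and_clean(data, start_date=None, end_date=None):
    """Slice a time-indexed frame; None leaves that side of the range open."""
    subset = data.loc[start_date:end_date]
    # Assumption: a column is dropped if it holds any NaN in the selected range.
    return subset.dropna(axis=1)


# Hypothetical daily readings from two sensors, one with a gap on day 2.
index = pd.date_range("2024-01-01", periods=5, freq="D")
data = pd.DataFrame(
    {
        "sensor_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "sensor_b": [1.0, np.nan, 3.0, 4.0, 5.0],
    },
    index=index,
)

full = select_and_clean(data)                                # whole record
from_jan_3 = select_and_clean(data, start_date="2024-01-03")  # open-ended end
```

Over the whole record `sensor_b` is dropped because of its gap, while the range starting on 2024-01-03 keeps both columns, so the set of returned columns can depend on the dates requested.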
"""
Aggregate Variables
"""
Aggregates a pandas DataFrame of weather variables by applying a specified method across a given frequency.
Parameters:
-----------
- dataframe (pandas.DataFrame): DataFrame containing weather variable data.
- freq (str, optional): Frequency to aggregate the data by. Defaults to '1D'.
Examples include '1H' for hourly, '12H' for every 12 hours, '1D' for daily, '1W' for weekly, '1M' for monthly, etc.
- method (str or callable, optional): Method to use for aggregation. Defaults to 'sum'.
Acceptable string values are 'sum', 'mean', 'min', 'max'.
Alternatively, you can provide a custom aggregation function (callable).
Example of a custom method:
```python
import numpy as np

def custom_median(x):
    # Return NaN for an empty bin, otherwise the median of the bin
    return np.nan if x.isnull().all() else x.median()
daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)
```
Returns:
-----------
- pandas.DataFrame: DataFrame containing aggregated weather variable data according to the specified frequency and method.
Usage:
-----------
Define the DataFrame containing the weather variable data:
```python
dataframe = ret.get_measurements('TA00001', '2020-01-01', '2020-01-31', ['pr']) # data comes at 5-minute intervals
```
To aggregate data hourly:
```python
hourly_data = aggregate_variables(dataframe, freq='1H')
```
To aggregate data by 12 hours:
```python
half_day_data = aggregate_variables(dataframe, freq='12H')
```
To aggregate data by day:
```python
daily_data = aggregate_variables(dataframe, freq='1D')
```
To aggregate data by week:
```python
weekly_data = aggregate_variables(dataframe, freq='1W')
```
To aggregate data by month:
```python
monthly_data = aggregate_variables(dataframe, freq='1M')
```
To use a custom aggregation method:
```python
import numpy as np

def custom_median(x):
    # Return NaN for an empty bin, otherwise the median of the bin
    return np.nan if x.isnull().all() else x.median()
daily_median_data = aggregate_variables(dataframe, freq='1D', method=custom_median)
```
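For readers who want to see what this kind of aggregation amounts to, here is a minimal sketch built on pandas `resample`. It is an assumption about the general technique, not the package's actual code: `aggregate_sketch` is a hypothetical stand-in for `aggregate_variables`, and the 5-minute rainfall series is synthetic.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(dataframe, freq="1D", method="sum"):
    """Hypothetical stand-in for aggregate_variables, built on resample."""
    if not callable(method) and method not in {"sum", "mean", "min", "max"}:
        raise ValueError(f"Unsupported method: {method!r}")
    # agg accepts both a method name and a callable applied per column.
    return dataframe.resample(freq).agg(method)


# Synthetic 5-minute rainfall: 0.5 mm per reading over two hours.
index = pd.date_range("2020-01-01", periods=24, freq="5min")
dataframe = pd.DataFrame({"TA00001": 0.5}, index=index)

hourly_sum = aggregate_sketch(dataframe, freq="1h")
hourly_median = aggregate_sketch(
    dataframe,
    freq="1h",
    method=lambda x: np.nan if x.isnull().all() else x.median(),
)
```

Two pandas details are worth keeping in mind. The built-in `sum` returns 0 for a bin that contains only NaN unless `min_count` is set, which is one reason a custom callable that returns NaN for empty bins can be useful. Also, pandas 2.2 and later deprecate the uppercase `'H'` and `'M'` aliases in favour of `'h'` and `'ME'`, so the `'1H'` and `'1M'` strings shown above may emit warnings on recent versions.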
"""
Water Level Class
The `Water_level` class retrieves water level data and the coordinates of gauging stations.
Example:
Getting the coordinates:
```python
wl = Water_level()
coords = wl.coordinates('muringato')
print(coords) # Output: (-0.406689, 36.96301)
```
Retrieving water level data:
```python
from filter_stations import Water_level
wl = Water_level()
# get water level data for the muringato gauging station
muringato_data = wl.water_level_data('muringato')
# get water level data for the ewaso gauging station
ewaso_data = wl.water_level_data('ewaso')
```
Duplicate Coordinates for stations with multiple sensors
Retrieves the longitude and latitude for each name in a list of station_sensor names, repeating a station's coordinates once for each of its sensors.
Parameters:
-----------
- station_sensor (list): List of station_sensor names.
- normalize (bool): If True, normalize the coordinates using MinMaxScaler to the range (0,1).
Returns:
-----------
- pd.DataFrame: DataFrame containing longitude and latitude coordinates for each station_sensor.
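The duplication and the (0, 1) scaling described above can be sketched in plain Python. This is only an illustration under stated assumptions: `coordinates_for` and `min_max_scale` are hypothetical helpers, the station coordinates are made up, names are assumed to follow a `<station>_<sensor>` pattern, and the result is a dict rather than the DataFrame the method returns.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; constant input maps to 0.0."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


def coordinates_for(station_sensors, station_coords, normalize=False):
    """Return one (longitude, latitude) pair per station_sensor name."""
    # Assumption: names look like '<station>_<sensor>', e.g. 'TA00001_S1'.
    stations = [name.split("_")[0] for name in station_sensors]
    lons = [station_coords[s][0] for s in stations]
    lats = [station_coords[s][1] for s in stations]
    if normalize:
        lons, lats = min_max_scale(lons), min_max_scale(lats)
    return dict(zip(station_sensors, zip(lons, lats)))


# Hypothetical (longitude, latitude) pairs keyed by station code.
station_coords = {
    "TA00001": (36.0, -1.0),
    "TA00002": (38.0, 0.0),
    "TA00003": (37.0, -0.5),
}
sensors = ["TA00001_S1", "TA00001_S2", "TA00002_S1", "TA00003_S1"]

raw = coordinates_for(sensors, station_coords)
scaled = coordinates_for(sensors, station_coords, normalize=True)
```

Both sensors of `TA00001` receive identical coordinates, which keeps the coordinate rows aligned one-to-one with the data columns they describe.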
Usage:
-----------
To retrieve coordinates:
```python
start_date = '2023-01-01'
end_date = '2023-12-31'
country = 'Kenya'
# get the precipitation data for the stations
ke_pr = filt.filter_pr(start_date=start_date, end_date=end_date,
                       country=country).set_index('Date')
# get the coordinates
xs = ret.get_coordinates(ke_pr.columns, normalize=True)
```
Retrieve Data from BigQuery
"""
Retrieves precipitation data from BigQuery based on specified parameters.
Parameters:
-----------
- start_date (str): Start date for data query.
- end_date (str): End date for data query.
- country (str): Country name for filtering stations.
- region (str): Region name for filtering stations.
- radius (str): Radius for stations within a specified region.
- multiple_stations (str): Comma-separated list of station IDs.
- station (str): Single station ID for data filtering.
Returns:
-----------
- pd.DataFrame: A Pandas DataFrame containing the filtered precipitation data.
Usage:
-----------
To get precipitation data for a specific date range:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
pr_data = fs.filter_pr(start_date, end_date)
```
To get precipitation data for a specific date range and country:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
country = 'Kenya'
pr_data = fs.filter_pr(start_date, end_date, country=country)
```
To get precipitation data for a specific date range and region:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
pr_data = fs.filter_pr(start_date, end_date, region=region)
```
To get precipitation data for a specific date range and region with a radius:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
region = 'Nairobi'
radius = 100
pr_data = fs.filter_pr(start_date, end_date, region=region, radius=radius)
```
To get precipitation data for a specific date range and multiple stations:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
multiple_stations = ['TA00001', 'TA00002', 'TA00003']
pr_data = fs.filter_pr(start_date, end_date, multiple_stations=multiple_stations)
```
To get precipitation data for a specific date range and a single station:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
start_date = '2021-01-01'
end_date = '2021-01-31'
station = 'TA00001'
pr_data = fs.filter_pr(start_date, end_date, station=station)
```
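The region-plus-radius filter can be pictured as a great-circle distance check around a centre point. The sketch below is an assumption about the idea, not how `filter_pr` is implemented: the docstring does not state the unit of `radius` (kilometres are assumed here), and `haversine_km`, `stations_within`, the station codes, and their coordinates are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def stations_within(centre, stations, radius_km):
    """Return the codes of stations no farther than radius_km from centre."""
    lat0, lon0 = centre
    return [
        code
        for code, (lat, lon) in stations.items()
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]


nairobi = (-1.286, 36.817)  # approximate (latitude, longitude) of Nairobi
# Hypothetical station codes and (latitude, longitude) positions.
stations = {
    "ST_A": (-1.30, 36.80),
    "ST_B": (-4.04, 39.67),
    "ST_C": (-1.00, 37.00),
}

within_100 = stations_within(nairobi, stations, radius_km=100)
within_10 = stations_within(nairobi, stations, radius_km=10)
```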
"""
filter-stations v0.5.3
Includes the ground truth data and fixes the maps configuration path.
Ground truth data
Retrieves ground truth data for a specified date range.
Parameters:
-----------
- start_date (str): The start date for the report in 'yyyy-mm-dd' format.
- end_date (str, optional): The end date for the report in 'yyyy-mm-dd' format.
If not provided, only data for the start_date is returned.
Returns:
-----------
- pandas.DataFrame: A DataFrame containing ground truth data with columns 'startDate',
'station_sensor', 'description' and 'level'. The 'startDate' column is used as the index.
Raises:
-----------
- Exception: If there's an issue with the API request.
Usage:
-----------
To retrieve ground truth data for a specific date range:
```python
start_date = '2023-01-01'
end_date = '2023-01-31'
report_data = ret.ground_truth(start_date, end_date)
```
To retrieve ground truth data for a specific date:
```python
start_date = '2023-01-01'
report_data = ret.ground_truth(start_date)
```
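The single-date default can be illustrated with a small date filter over report records. This is a sketch only: `filter_reports` is a hypothetical helper, the records are invented, and the real method fetches its data from an API rather than from an in-memory list.

```python
from datetime import date


def filter_reports(reports, start_date, end_date=None):
    """Keep reports dated within [start_date, end_date], sorted by date."""
    start = date.fromisoformat(start_date)
    # With no end_date, only reports for start_date itself are kept.
    end = date.fromisoformat(end_date) if end_date else start
    selected = [
        r for r in reports
        if start <= date.fromisoformat(r["startDate"]) <= end
    ]
    return sorted(selected, key=lambda r: r["startDate"])


# Invented records using the documented column names.
reports = [
    {"startDate": "2023-01-15", "station_sensor": "TA00002_S1",
     "description": "hypothetical note", "level": 2},
    {"startDate": "2023-01-01", "station_sensor": "TA00001_S1",
     "description": "hypothetical note", "level": 1},
    {"startDate": "2023-02-02", "station_sensor": "TA00003_S1",
     "description": "hypothetical note", "level": 3},
]

single_day = filter_reports(reports, "2023-01-01")
january = filter_reports(reports, "2023-01-01", "2023-01-31")
```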
Stations by region
Subsets weather stations by a specific geographical region and optionally plots them on a map with a scale bar.
Parameters:
-----------
- region (str): The name of the region to subset stations from (47 Kenyan counties).
- plot (bool, optional): If True, a map with stations and a scale bar is plotted. Default is False.
Returns:
-----------
- list or None: If plot is False, returns a list of station codes in the specified region. Otherwise, returns None.
Usage:
-----------
To get a list of station codes in the 'Nairobi' region without plotting:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
station_list = fs.stations_region('Nairobi')
```
To subset stations in the 'Nairobi' region and display them on a map with a scale bar:
```python
fs = Filter(api_key, api_secret, maps_key) # Create an instance of your class
fs.stations_region('Nairobi', plot=True)
```