Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Offset calculation and correction for irradiance data #170

Open
kperrynrel opened this issue Nov 22, 2022 · 18 comments · May be fixed by #173
Open

Offset calculation and correction for irradiance data #170

kperrynrel opened this issue Nov 22, 2022 · 18 comments · May be fixed by #173
Assignees

Comments

@kperrynrel
Copy link
Member

Hey all, Josh Peterson has a method for calculating nighttime offsets in irradiance data and using these offsets to correct irradiance data. He has published on it here: This code is adapted from the following publication: http://solardat.uoregon.edu/download/Papers/Structure_of_a_comprehensive_solar_radiation_dataset_ASES_2017_Final.pdf

He requested adding this functionality to the package. I went ahead and adapted the code he sent over, which is as follows (it's still rough and needs work):


import pandas as pd
import numpy as np
from datetime import timedelta


def _filter_nighttime_data(irradiance_series, sza_series,
                           lower_limit,
                           upper_limit,
                           sza_night_limit):
    """
    Filter irradiance data to nighttime periods only.
    """
    # Find the times when the sun is above the SZA limit, representing
    # daytime minutes
    mask = (sza_night_limit > sza_series)
    # Set daytime values to np.nan
    irradiance_series[mask] = np.nan
    # Find the times when the data is below the lower data limit
    # or above the upper data limit and filter these periods out
    mask = ((irradiance_series < lower_limit) |
            (irradiance_series > upper_limit))
    # Set outlier periods to np.nan
    irradiance_series[mask] = np.nan
    return irradiance_series


def _groupby_nighttime_data(irradiance_series):
    """
    Group the irradiance nighttime data by day.
    """
    # Group the data together by date
    irradiance_dates = pd.Series(irradiance_series.index.date,
                                 index=irradiance_series.index)
    # Include midnight values as the previous date, not the current date.
    # This is at Josh's request!
    irradiance_dates.loc[(irradiance_dates.index.minute == 0) &
                         (irradiance_dates.index.hour == 0)] = \
        irradiance_dates - timedelta(days=1)
    return irradiance_dates


def calculate_nighttime_offset_uncertainty(irradiance_series, sza_series,
                                           lower_limit=-10,
                                           upper_limit=10,
                                           sza_night_limit=108):
    """
    Computes the standard deviation of nighttime data, to act as the
    uncertainty for each night period. Meant to act in unison with the
    calculate_nighttime_offset() function.
    
    For best results, filter out outliers before running this function.

    Parameters
    ----------
    irradiance_series : Pandas Series
        Pandas Series of irradiance data with datetime index
    sza_series : Pandas series
        Pandas series of zenith angles with datetime index.
        Used to determine when nighttime periods occur
    lower_limit : float, optional
        Outlier detection lower limit. The default is -10.
    upper_limit : float, optional
        Outlier detection upper limit. The default is 10.
    sza_night_limit : float, optional
        Solar zenith angle boundary limit. The default
        is 108.
    offset_type: String.
        Can be 'median' or 'mean'. If median is selected, then the median
        of nighttime values is computed as the offset. Likewise, if mean
        is selected, the mean of nighttime values is computed as the offset.

    Returns
    -------
    uncertainty_irradiance : Pandas series
        Pandas datetime series of standard deviation of daily irradiance
        nighttime values, to act as a proxy for uncertainty in the nighttime
        offset calculations
    """
    # Filter to nighttime periods only
    irradiance_series = _filter_nighttime_data(irradiance_series,
                                               sza_series,
                                               lower_limit,
                                               upper_limit,
                                               sza_night_limit)
    # Group the irradiance data by daily intervals, from [midnight, midnight)
    # each day
    irradiance_dates = _groupby_nighttime_data(irradiance_series)
    # Calculate the standard deviation of the nightly data, which is equal to
    # the uncertainty
    uncertainty_irradiance = irradiance_series.groupby(
        irradiance_dates).transform('std')
    return uncertainty_irradiance


def calculate_nighttime_offset(irradiance_series, sza_series,
                               lower_limit=-10,
                               upper_limit=10,
                               sza_night_limit=108,
                               offset_type='median'):
    """
    Determines the nighttime offset value of a data set (can be median or
    mean). Computes the offset value for each night.
    If a nighttime value can not be found for a particular date,
    the median offset value for the entire data set is used.

    For best results, filter out outliers before running this function.

    Parameters
    ----------
    irradiance_series : Pandas Series
        Pandas Series of irradiance data with datetime index
    sza_series : Pandas series
        Pandas series of zenith angles with datetime index.
        Used to determine when nighttime periods occur
    lower_limit : float, optional
        Outlier detection lower limit. The default is -10.
    upper_limit : float, optional
        Outlier detection upper limit. The default is 10.
    sza_night_limit : float, optional
        Solar zenith angle boundary limit. The default
        is 108.
    offset_type: String.
        Can be 'median' or 'mean'. If median is selected, then the median
        of nighttime values is computed as the offset. Likewise, if mean
        is selected, the mean of nighttime values is computed as the offset.

    Returns
    -------
    offset_irradiance : Pandas series
        Pandas datetime series of offset nighttime values for the data set
    """
    # Throw an error if the offset_type isn't 'median' or 'mean'
    if (offset_type not in ['median', 'mean']):
        raise ValueError(
            "Please check offset_type variable. Must be 'mean' or 'median'."
        )
    # Filter to nighttime periods only
    irradiance_series = _filter_nighttime_data(irradiance_series,
                                               sza_series,
                                               lower_limit,
                                               upper_limit,
                                               sza_night_limit)
    # Group the irradiance data by daily intervals, from [midnight, midnight)
    # each day
    irradiance_dates = _groupby_nighttime_data(irradiance_series)
    offset_irradiance = irradiance_series.groupby(
        irradiance_dates).transform(offset_type)
    # Fill any NaN periods, representing days with no adequate nighttime data,
    # with the median value associated with the time series
    offset_irradiance = offset_irradiance.fillna(offset_irradiance.median())
    return offset_irradiance


def correct_offset(irradiance_series, sza_series,
                   lower_limit=-10,
                   upper_limit=10,
                   sza_night_limit=108,
                   offset_type='median'):
    """

    Parameters
    ----------
    irradiance_series : Pandas Series
        Pandas Series of irradiance data with datetime index
    sza_series : Pandas series
        Pandas series of zenith angles with datetime index.
        Used to determine when nighttime periods occur
    lower_limit : float, optional
        Outlier detection lower limit. The default is -10.
    upper_limit : float, optional
        Outlier detection upper limit. The default is 10.
    sza_night_limit : float, optional
        Solar zenith angle boundary limit. The default
        is 108.
    offset_type: String
        Can be 'median' or 'mean'. If median is selected, then the median
        of nighttime values is computed as the offset. Likewise, if mean
        is selected, the mean of nighttime values is computed as the offset.

    Returns
    -------
    irradiance_series_offset_corrected : Pandas series
        Pandas datetime series of irradiance data, corrected using the daily
        nighttime offset values.

    Notes
    -----
    This method is adapted from the conference publication "Structure of a
    comprehensive solar radiation dataset" by J. Peterson and F. Vignola,
    in particular the methods for calculating nighttime offsets and subtracting
    them from solar irradiance data. The sza_night_limit value of 108 is
    directly taken from this publication.

    References
    ----------
    .. [1] J. Peterson and F. Vignola. Structure of a comprehensive solar
        radiation dataset, ASES National Solar Conference 2017,
        Denver, Colorado, 9-12 October 2017.
        http://solardat.uoregon.edu/download/Papers/Structure_of_a_comprehensive_solar_radiation_dataset_ASES_2017_Final.pdf
    """
    # Get the nightttime medians for the irradiance time series
    nighttime_offsets = calculate_nighttime_offset(irradiance_series,
                                                   sza_series,
                                                   lower_limit,
                                                   upper_limit,
                                                   sza_night_limit)
    # Subtract the nighttime offset from the time series
    # TODO: Josh tell me if this logic is right
    irradiance_series_offset_corrected = irradiance_series - nighttime_offsets
    return irradiance_series_offset_corrected


# Load a sample data set
input_file = 'HEO_01_2019-02.csv'
# Load the csv file
df = pd.read_csv(input_file, index_col=0, parse_dates=True)

# Test the functions
irradiance_offset_corrected = correct_offset(
    irradiance_series=df['GHI_(PSP)'],
    sza_series=df['SZA'],
    lower_limit=-10,
    upper_limit=10,
    sza_night_limit=108)

# calculate daily uncertainty in offset calculations
irradiance_offset_uncertainty = calculate_nighttime_offset_uncertainty(
    irradiance_series=df['GHI_(PSP)'], sza_series=df['SZA'])

Essentially, the above set of functions calculates the daily nighttime offset for irradiance data (can be median or mean, settable), and subtracts the offset from the irradiance data to correct it.

How do we feel about adding it to the irradiance module? @PetersonUOregon let me know if I missed anything here; I did take a stab at the actual offset correction in the correct_offset() function.

@kperrynrel kperrynrel self-assigned this Nov 22, 2022
@PetersonUOregon
Copy link

The equation to compute the offset corrected irradiance looks good to me.

irradiance_series_offset_corrected = irradiance_series - nighttime_offsets
where
irradiance_series_offset_corrected is the adjusted irradiance. The values at night should average out to zero.
irradiance_series is the unadjusted irradiance data. Irradiance data direct from the sensor.
nighttime_offsets is the median nighttime value. This is often a negative number on the order of a -1 W/m^2.

The effect of subtracting negative number will increase the adjusted irradiance by a Watt.

@kandersolar
Copy link
Member

The reference mentions that the thermal offset at night can be different from the daytime offset, depending on sky condition. This other paper I found goes so far as to say that the difference is large enough to make the nighttime offset unsuitable for correcting daytime data. Is the idea here that using the nighttime offset as a partial correction to daytime data is better than nothing?

A couple comments about the code:

  • _filter_nighttime_data currently modifies its inputs, which is generally to be avoided. Better to create modified copies of the inputs than modify the inputs themselves.
  • I think the group by day operation assumes input timestamps to be in local time; if so, that should be documented (and perhaps enforced in the code somehow?)

@cwhanse
Copy link
Member

cwhanse commented Nov 23, 2022

thermal offset at night can be different from the daytime offset

This is also the question in my mind. I have no objection in principle to adding this function, but the documentation needs to inform the user when it is appropriate to apply the nighttime adjustment.

@PetersonUOregon
Copy link

PetersonUOregon commented Nov 23, 2022

@kperrynrel When you made the adjustment to the midnight value, how will this handle the data if it is not in one minute intervals. For example, how will second level data be handled with the adjustment you have in place? I suspect that it will change the entire minute of data in this case.

@PetersonUOregon
Copy link

@kanderso-nrel Yes that is the thinking here. That a partial correction is better than nothing.

You are correct the timestamps are assumed to be in local time.

@kperrynrel
Copy link
Member Author

kperrynrel commented Nov 28, 2022

@PetersonUOregon I reran the above logic with 1-second resampled data and you are correct. The entire midnight minute is included in the previous day when calculating the average offset:
image
I can filter down to second, but I think filtering further may be overkill given that most data is not sub-second. I don't want to make the logic too complicated as well so as to not be understood.

@cwhanse
Copy link
Member

cwhanse commented Nov 28, 2022

This business of aligning data to the timestamp (it appears that @PetersonUOregon prefers left-labeled intervals, others prefer right-labeled) suggests that maybe the implementation could be done using Series.resample(..., label=..., closed=...)' label tells pandas how the data relate to each timestamp; closed tells the resampler whether to use "<" or "<=" on the right end (similar for left end).

@PetersonUOregon
Copy link

@cwhanse

I agree different users may have different preferences as to how they timestamp their data. I prefer timestamps that occur at the end of the time interval. My understanding is this would be right-labeled.

I think the resample function you mentioned could work. All the data would be timestamped at the start of the interval (instead of the end). In doing this, the time stamps would all be shifted one time interval to the left.

For one minute data this would look like:
00:01 (right labeled) -->00:00 (left labeled)
00:02 (right labeled) -->00:01 (left labeled)
...
24:00 (right labeled) -->23:59 (left labeled)

In doing this the left labeled time stamps for a day would all be correctly selected.

I suggest that a user input be added to the function that gets how the data is labeled. Then the function would convert the right and center labeled data sets to left labeled and the function would then proceed as we have it.

@AdamRJensen
Copy link
Member

AdamRJensen commented Nov 28, 2022

Just as a comment, Pandas use the term "closed" for the bin edge (e.g., in the resample function).

closed {‘right’, ‘left’}, default None
    Which side of bin interval is closed.

Calling the parameter labeled or label might be better (the pvlib get_cams function calls it label).

In general, I think this would be a great function, even if it is not always the right choice to apply it!

On another note, would it be an option to do the nighttime grouping based on the solar zenith angle instead of the time index (I think this would solve a lot of issues in regards to time-zone localization)?

Update: I create a gist demonstrating how nighttime offset could be determined based on the zenith angle instead of the time.

@AdamRJensen
Copy link
Member

AdamRJensen commented Nov 29, 2022

I came upon another recently published study that uses a very similar method:

For this study, corrections have been made to both GHI and DIF after estimating their respective thermal offset by using only their own readings. As suggested before [6], the offset for a specific day is estimated using (typically negative) irradiance values observed after midnight during the previous night until SZA reaches 100° (i.e., sometime before sunrise) and, in a symmetrical way, after sunset until midnight of the following night. The average of the values measured during these two periods is calculated and its absolute value is added to each 1-min diurnal GHI and DIF observation. This correction does not take into account possible variations in the conditions between day-vs.-night moments but represents an improvement compared to if it were not considered.

In general, I think it is rather common to calculate a nighttime offset, and certainly would be worth having available.

It might be a good idea to have the function return the corrected irradiance and not the nighttime offset. This way we avoid users getting the sign wrong (i.e., is the offset negative and should be subtracted, or is it positive and should be added).

@kperrynrel
Copy link
Member Author

Thanks for posting this @AdamRJensen--the method you describe is almost identical to @PetersonUOregon's method, so it's encouraging that multiple people are approaching the problem similarly. I do like your idea of grouping on zenith angles instead of datetimes, I'll have to dig into that gist and see if I can adapt some of it in (perhaps adding a passed parameter for grouping?). I do have a function for calculating the corrected irradiance (correct_offset()), but perhaps I should make the naming more explicit here.

@AdamRJensen
Copy link
Member

AdamRJensen commented Nov 29, 2022

@kperrynrel I'll happily draft up a PR if that would be of help? (I have pasted my code below, which all is contained in one neat function - I see that you have three different ones).

The following code snippet should be able to do both the zenith and time based method:

def offset_correction(irradiance, zenith, sza_night_limit=100,
                      midnight_method='zenith', aggregation_method='median'):
    """
    Apply nighttime correction to irradiance time series.

    Parameters
    ----------
    irradiance : pd.Series
        Pandas Series of irradiance data.
    zenith : pd.Series
        Pandas Series of zenith angles corresponding to the irradiance time
        series.
    sza_night_limit : float, optional
        Solar zenith angle boundary limit. The default is 100.
    midnight_method : {'time', 'zenith'}
        Method for determining midnight.
    aggregation_method : {'median', 'mean'}, optional
        Method for calculating nighttime offset. The default is 'median'.

    Returns
    -------
    corrected_irradiance : pd.Series
        Pandas Series of nighttime corrected irradiance time series.
    """
    # Create boolean series where nighttime is one (calculated based on the
    # zenith angle)
    zenith_midnight = (zenith.diff().apply(np.sign).diff() < 0)

    time_midnight = pd.Series(data=irradiance.index.time == dt.time(0),
                              index=irradiance.index)

    if midnight_method == 'zenith':
        midnight = zenith_midnight
    else:
        midnight = time_midnight

    # Assign unique number to each day
    day_number = midnight.cumsum()
    # Create Pandas Series only containing nighttime irradiance
    nighttime_irradiance = irradiance.copy()
    nighttime_irradiance  = nighttime_irradiance [zenith < sza_night_limit]
    # Calculate nighttime offset
    nighttime_offset = nighttime_irradiance.groupby(day_number).transform(aggregation_method)
    # Calculate corrected irradiance time series
    corrected_irradiance = irradiance - nighttime_offset
    return corrected_irradiance

@kperrynrel
Copy link
Member Author

@AdamRJensen I'm happy to go the route you're suggesting here as it simplifies the code considerably. Since I'm just directly adapting @PetersonUOregon's logic here though, I want to be respectful of his input. To better inform our decision, I did run some calculations on how Josh's function compares to Adam's, to see if changing the logic has a large effect on the overall correction. Here are the results:
image
I also made a histogram of the distribution of difference (Josh's offset-Adam's offset):
image

Overall, the differences are fairly minimal. I do want to point out that Adam's currently doesn't do the daytime correction, but Josh's does. Maybe we'd want to include a parameter called correct_daytime or something (for either function) to correct daytime values if desired, even though there may be issues with the daytime offset as mentioned above (but, as Josh mentioned, a partial correction may be better than nothing).

@cwhanse
Copy link
Member

cwhanse commented Nov 29, 2022

If the two methods (@PetersonUOregon and in the reference @AdamRJensen found) give very similar results, they could both be included in the documentation.

@AdamRJensen AdamRJensen linked a pull request Nov 30, 2022 that will close this issue
8 tasks
@AdamRJensen
Copy link
Member

@kperrynrel In regards to your comment

Overall, the differences are fairly minimal. I do want to point out that Adam's currently doesn't do the daytime correction

I think there may be some confusion - the purpose of my function is to do exactly the same as Josh/yours - the only difference being how midnight is determined. Both functions correct the entire day by adding the nighttime offset.

The PR that I have opened proposes one function that is capable of both Josh's method with midnight as 00:00 and my method (midnight when the zenith angle is at its max). I've attempted to add a parameter called label as previously discussed, such that the user can choose between left/right labeled data.

@kperrynrel
Copy link
Member Author

kperrynrel commented Nov 30, 2022

@AdamRJensen apologies--I just noticed when I ran your function on the HEO data that daytime values weren't filling in and thought it was intentional:
image
Do you want me to send you the HEO data to test on?

@AdamRJensen
Copy link
Member

@kperrynrel That is simply due to poor coding on my part :) I have updated the PR and the example above. Please do share the data though.

@AdamRJensen
Copy link
Member

AdamRJensen commented Nov 30, 2022

When I test my updated function with the code written by @kperrynrel above, then I get exactly the same results. This of course requires that the same sza_night_limit is used (some use 108 others 100) and that my function has label='right'. Also, do note that my function does currently NOT implement the lower and upper limits. I also did not recreate the calculate_nighttime_offset_uncertainty as I have my doubts about its usefulness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants