Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spectral_factor_firstsolar changes user input data #2086

Open
RDaxini opened this issue Jun 14, 2024 · 13 comments · May be fixed by #2100
Open

spectral_factor_firstsolar changes user input data #2086

RDaxini opened this issue Jun 14, 2024 · 13 comments · May be fixed by #2100
Labels
enhancement GSoC Contributions related to Google Summer of Code.
Milestone

Comments

@RDaxini
Copy link
Contributor

RDaxini commented Jun 14, 2024

Is your feature request related to a problem? Please describe.
pvlib.spectrum.spectral_factor_firstsolar contains data screens for precipitable_water and airmass_absolute that change input data. Max/min values can be set for precipitable_water by the user but not airmass_absolute.

I see two issues:

  1. Changing data in any circumstance is a serious question, but in this case specifically I do not think it is pvlib's job to change user data.
  2. User control and the function's flexibility is impeded by the absence of module parameters to control min/max airmass_absolute .

Describe the solution you'd like
I can think of several potential solutions but would like community feedback on what should be implemented. Based on the feedback, I can create a PR with proposed changes. Current ideas:

  1. Do nothing.
  2. Remove all data screens for airmass_absolute and precipitable_water
  3. Retain data screens but replace actions to change user data with only warnings to the user that the input data exceeds the thresholds, and these thresholds can either be:
  • Defined by the user, or
  • Defined by the developer
  1. Add new arguments (max_airmass_absolute and min_airmass_absolute) to pvlib.spectrum.spectral_factor_firstsolar so that the user can define min/max values for airmass_absolute in the same way they can define max_precipitable_water and min_precipitable_water, and their input data will be changed accordingly.
  2. Something else?

Additional context
I understand this issue touches on a question about the role of pvlib as a toolbox and to what extent it should hold the user's hand through the application of its functions. I touched on the subject in this informal blog post where I used the same spectral factor function as an example. At the very least, I don't think we should be changing user data without letting the user control this change. Going a step further, I don't think this function should include within it an action to change user data. It's job is to calculate the spectral mismatch factor given a set of airmass_absolute and precipitable_water values. I see room for including warnings to advise the user.

@wholmgren
Copy link
Member

that change input data.

Can you clarify what you mean by this? To my knowledge, pvlib is not changing any user data. The test below demonstrates that pvlib does not change the input pw data. Rather, it applies a minimum value before passing the data on to the polynomial (aside: I think the choice of "replaced" in the warning message is not ideal).

In [8]: s = pd.Series([0, 1])

In [9]: pvlib.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
/Users/holmgren/git_repos/pvlib-python/pvlib/spectrum/mismatch.py:359: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
  warn('Exceptionally low pw values replaced with '
Out[9]: array([0.93459226, 0.9905102 ])

In [10]: s
Out[10]:
0    0
1    1
dtype: int64

Can you please provide an example of pvlib changing user data? Or are we only talking about pvlib restricting the valid input data to the polynomial? Note that First Solar contributed the pvlib implementation and considers this restriction to be part of the model.

I'm not opposed to adding similar limits parameters for airmass.

@mikofski
Copy link
Member

@RDaxini correlations like firstsolar spectral function frequently have narrow bounds in which they are valid. The documentation states over what range the correlations are valid. I think a faithful implementation should enforce these limits and not allow extrapolation

@RDaxini
Copy link
Contributor Author

RDaxini commented Jun 19, 2024

Thanks both for the comments. I'm lumping my responses together there's some overlap.

@mikofski For sure there are published limits, but the limits in the publication and docs do not match the pvlib implementation. For example, airmass between 0 and 10 in pvlib but 0 and 5 in the paper/docs. Is there another document that justifies the implemented range? Then if any limits/restrictions are to be enforced, I'm not convinced calculating a mismatch value based on a value the user did not enter is the best enforcement. @wholmgren mentioned First Solar considers the limits part of the model, which I understand, but do they advise on how inputs outside of the prescribed range should be handled? At least with pw inputs exceeding max_precipitable_water, these default to np.nan --- this seems like a better approach if the consensus is that the computation should not take place with anything other than values in the specified range.

@wholmgren Ah, I did not mean the variable is redefined. I meant the input is changed for the calculation of mismatch, ie mismatch is calculated for an input that differs from the user input. In the case of pw, the user is warned and has the option to change the limits. In the case of airmass, the user is not warned and is unable to change the limits.

Eg1: Warning, M calculated from default pw 0.1 rather than user input 0:

In [28]: s = pd.Series([0, 0.1])

In [29]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(s, 1, module_type='cdte')
C:\Users\ezxrd3\Documents\GitHub\pvlib-python\pvlib\spectrum\mismatch.py:369: UserWarning: Exceptionally low pw values replaced with 0.1 cm to prevent model divergence
  warn('Exceptionally low pw values replaced with '

In [30]: M
Out[30]: array([0.93459226, 0.93459226])

Eg2: No warning, M calculated from default ama 10 rather than user input 11, default cannot be changed:

In [34]: a = pd.Series([11, 10])

In [35]: M = pv.spectrum.mismatch.spectral_factor_firstsolar(0.5, a, module_type='cdte')

In [36]: M
Out[36]: 
0    0.778779
1    0.778779

So, just my view, for sure the docs should explain the data ranges over which the model was developed, but currently I still lean more towards letting the user execute their own data filtering. I am not entirely opposed to retaining data screens in the function though, but then in this case I think the user should have control over the air mass limits (as with pw already) and be warned if they are triggered. What do you think?

@AdamRJensen
Copy link
Member

Given the discrepancy between using a max airmass of 5 vs 10, I am in support of adding a max_airmass_absolute parameter.

First, please go through the discussion on GH of the initial addition of the function to see whether there is arguments for why the max airmass is 10 and not 5 as per the paper.

If you cannot find anything on this, then I suggest asking the authors. If it turns out we have not good reason for using 10 and not 5, then I suggest a deprecating cycle where we will change the default value max_airmass_absolute from 10 to 5. For an example on this, then see lies 351-357 of the changes in #2094.

@cwhanse
Copy link
Member

cwhanse commented Jun 20, 2024

please go through the discussion on GH of the initial addition of the function

#208

Looks to me that AM<10 is a choice that the author (M. Lee) made when implementing the model in code. Same limit is present in the Matlab function.

@AdamRJensen
Copy link
Member

@RDaxini Could you paste the quote from the paper where a limit of 5 is described?

@RDaxini
Copy link
Contributor Author

RDaxini commented Jun 20, 2024

@RDaxini Could you paste the quote from the paper where a limit of 5 is described?

fs_am_pw
Image 1 (above): pw and am filter requested @AdamRJensen (extracted from Section II of the manuscript)


Image 2 (below): just some other filters/assumptions in the paper, not sure if relevant but when talking about ensuring the implementation in pvlib is true to the original model, it got me thinking about what other limits there are. How do we decide which limits are implemented in pvlib? This is just a general question from me as a new contributor, I want to understand the thinking behind this a bit more
fs_other

@AdamRJensen
Copy link
Member

@RDaxini I read it as simulations were done for 1 to 5 AM, and NOT as an explicit statement that the model is valid for 1 to 5 AM. Therefore, I think it's probably ok to keep the current upper limit of 10. However, I still think there's benefit to adding the parameter to the function.

@cwhanse
Copy link
Member

cwhanse commented Jun 20, 2024

The regression was performed on the simulations with AM<5. In M. Lee's judgement, its extrapolation to 5<AM<10 was considered reasonable. I don't know if the extrapolation is specifically described in the paper.

The second excerpt reads (to me) like a description of how they cleaned up measurements to either fit or verify the model.

@RDaxini
Copy link
Contributor Author

RDaxini commented Jun 20, 2024

@AdamRJensen agreed, it does not state that it is a specific range of model validity, nowhere in the paper are there any specific statements pertaining to model validity within any particular range (as far as I see...).
@cwhanse unless I missed it, the extrapolation is not described in the paper (neither is the upper pw limit extrapolation)
So, I was not sure how these upper limits were decided or linked to the paper to be justified as part of the implementation.

Based on this discussion, I think for now I will create a simple PR to add the parameter into the function and revise the docs accordingly.

What are your thoughts on the handling of inputs exceeding the limits --- return np.nan as is currently the case for exceeding max_precipitable_water, or stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water?

@AdamRJensen
Copy link
Member

I vote

stick with setting the user input to equal the specified parameter value, as is currently the case with min_precipitable_water?

@RDaxini RDaxini linked a pull request Jun 21, 2024 that will close this issue
7 tasks
@RDaxini
Copy link
Contributor Author

RDaxini commented Jun 21, 2024

Created PR #2100

@adriesse
Copy link
Member

Good and important observations and discussion. Recognizing and managing the limits of model validity and validation are high on my priority/wish list.

The clipping of pwat at 0.1 (not 8.0) and airmass at 10 was also done in a spreadsheet FS once distributed, so it was presumably also done during model validation. We don't know whether any data points were affected by pwat clipping in the validation data sets, so we don't know whether this behaviour is truly validated. I would consider the clipping/extrapolation to be part of the model.

@kandersolar kandersolar added enhancement GSoC Contributions related to Google Summer of Code. labels Jul 3, 2024
@kandersolar kandersolar added this to the v0.11.1 milestone Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement GSoC Contributions related to Google Summer of Code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants