Clipping levels function #103

Open
csptvlt opened this issue Mar 5, 2021 · 4 comments

Comments

@csptvlt

csptvlt commented Mar 5, 2021

I was trying the clipping detection functions, and clipping.levels shows a behavior that I was not expecting.
The default number of levels is 2, and in my case (see the figure below, which shows a small part of a full year of data) it identifies night periods as clipping. If I force levels=1, clipping is identified only during the night, presumably because that is the interval with the most data points (?).

Am I using the function wrong? Or should there be, for example, a previous step where daytime is identified?

@cwhanse
Member

cwhanse commented Mar 5, 2021

Or should there be, for example, a previous step where daytime is identified?

Short answer: Yes.

An explanation:

The clipping.levels function is looking for peaks in a histogram of the data. It doesn't filter nighttime periods. So when nighttime periods are included in the time series, these periods will create the highest peak at power=0 in the histogram.
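To make the effect concrete, here is a minimal sketch of the histogram argument. It does not use pvanalytics and is not how clipping.levels is implemented; `histogram_peaks` and the hourly power profile are hypothetical, made up only to show how nighttime zeros dominate a histogram of power values.

```python
from collections import Counter


def histogram_peaks(values, bin_width, n_peaks):
    """Return the n_peaks most populated bins as (bin_start, count) pairs.

    Hypothetical helper for illustration only; it is a plain histogram,
    not the peak detection that pvanalytics performs.
    """
    counts = Counter(int(v // bin_width) for v in values)
    return [(b * bin_width, c) for b, c in counts.most_common(n_peaks)]


# Made-up AC power for one day in kW at hourly resolution: 10 night hours
# at zero, a ramp up, 6 hours flat at an assumed 5.0 kW limit, a ramp down.
night = [0.0] * 10
day = [1.0, 2.5, 3.8, 4.6] + [5.0] * 6 + [4.6, 3.8, 2.5, 1.0]
power = night[:5] + day + night[5:]

# With night included, the fullest bin is the one at power = 0.
print(histogram_peaks(power, bin_width=0.5, n_peaks=1))  # [(0.0, 10)]

# With night removed, the fullest bin is the clipped plateau.
print(histogram_peaks(day, bin_width=0.5, n_peaks=1))    # [(5.0, 6)]
```

Asking for two peaks on the unfiltered series returns both the zero bin and the plateau, which matches the behavior reported above for levels=2.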

The reason that clipping.levels doesn't filter nighttime periods is to keep the function to one task: detecting levels in the data. There are functions in features.daytime that can help label day/night periods if you don't already have an indicator (power, solar angle) that you are confident in.

We designed pvanalytics to emphasize re-use, which is the reason for the one-function, one-task rule.

We have thought about providing pre-built workflows (sequences of functions) such as what I think your case needs: first filter nighttime periods, then apply clipping.levels. I am interested to hear your thoughts about the value of pre-built workflows.

@camsilva

camsilva commented Mar 7, 2021

Great, understood, the re-use is indeed relevant.

I can see the workflows you mention having good value. Small workflows like the one we were discussing would be valuable, at least where we know that a feature or metric is strongly affected when a specific function is not applied in a previous step, as is the case for clipping here.

In the extreme case, using data through pvanalytics for a final purpose (calculating some metric, for example) will usually require first addressing a set of problems (gaps, consistency, filtering, inference/imputation). You already have solutions here for some of these, and maybe more general workflows could be thought of. I don't know if this is feasible or whether it falls within the scope of the project, but I certainly see value in it. (https://onlinelibrary.wiley.com/doi/10.1002/pip.3349)

On another note, I saw issue #68. Functions for parsing the names of the measured physical quantities could probably enable some semi-automated processes, at least for functions that deal with physical limits/consistency/inference, which could also fall under this workflows topic.

(now writing from my main account)

@cwhanse
Member

cwhanse commented Mar 8, 2021

@camsilva thanks for sharing your views.

I am picturing a layer of functions built on top of the basic clipping library, something in the spirit of

def label_clipping(data, how='method name', filters={'night':True, 'outliers': False, ...})

where the filters argument controls any subsetting of the data prior to applying the clipping detection method.
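A rough sketch of how such a layer could behave, under loud assumptions: it works on plain lists rather than a pandas Series, `above_zero` and `fullest_bin` are hypothetical stand-ins for a real night filter and a real clipping detector, and `how` takes a callable instead of a method name. None of this is existing pvanalytics API.

```python
from collections import Counter


def above_zero(data):
    """Hypothetical night filter: indices where power is above zero."""
    return {i for i, v in enumerate(data) if v > 0.0}


def fullest_bin(values, bin_width=0.5):
    """Hypothetical detector: (low, high) edges of the most populated bin."""
    counts = Counter(int(v // bin_width) for v in values)
    bin_index, _count = counts.most_common(1)[0]
    low = bin_index * bin_width
    return low, low + bin_width


def label_clipping(data, how=fullest_bin, filters=None):
    """Subset the data according to filters, then flag the detected level.

    Only the 'night' filter is sketched here; other keys are ignored.
    """
    filters = {"night": True} if filters is None else filters
    keep = set(range(len(data)))
    if filters.get("night", False):
        keep &= above_zero(data)
    if not keep:
        # Nothing left to analyze, so nothing can be flagged.
        return [False] * len(data)
    low, high = how([v for i, v in enumerate(data) if i in keep])
    # Samples removed by a filter are never flagged as clipped.
    return [i in keep and low <= v < high for i, v in enumerate(data)]


# Made-up hourly AC power in kW with a plateau at an assumed 5.0 kW limit.
power = ([0.0] * 5 + [1.0, 2.5, 3.8, 4.6] + [5.0] * 6
         + [4.6, 3.8, 2.5, 1.0] + [0.0] * 5)

flags = label_clipping(power)                           # night filtered out
unfiltered = label_clipping(power, filters={"night": False})
print(sum(flags), sum(unfiltered))                      # 6 10
```

With the night filter on, the six plateau samples are flagged; with it off, the ten nighttime zeros are flagged instead, which is the failure mode this issue started from.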

@wfvining
Collaborator

wfvining commented Mar 8, 2021

A "cookbook" in the docs might fill some of the gap for these kinds of small workflows. I think it is still early days, and without a solid footing, building an API like the one @cwhanse suggests could be hard to get right.
