Hi,
I am new to symbolic regression and have some questions after using PySR. My goal is to discover analytic equations from observational data. I have 12 features and 1 target. However, I have not been able to find any useful equations.
Q1: I tried using 5 or 8 input features, but the final equations always contain only the same 3 features. Does this indicate that my target variable is influenced only by these features? Physically, all of the features could influence the variation of the target variable.
Q2: Another issue is that the complexity of the equations is always far smaller than the maxsize parameter (I set maxsize to 90 but only get equations with a complexity of no more than 40). I tried using more complex operators (e.g., tanh, atan, erf), but the results did not improve. The resulting equations are too simple. Is there any solution to this problem?
Q3: I have more than two billion samples available for training, but PySR can hardly process more than 10,000 samples. I set batching to True, but I did not observe an obvious difference in the results. Is there any way to handle data at this scale? Is there a useful sub-sampling method beyond random selection?
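For reference, the setup described in Q2 and Q3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the arrays are random placeholders standing in for the real observations, the binary operators are an assumed choice, and only maxsize, the unary operators, and batching reflect the settings mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data standing in for the real observations (hypothetical shapes;
# the real dataset has far more rows).
n_total, n_features = 100_000, 12
X_full = rng.normal(size=(n_total, n_features))
y_full = rng.normal(size=n_total)

# Uniform random sub-sample without replacement, about 10,000 rows,
# which is the baseline sub-sampling approach referred to in Q3.
n_sub = 10_000
idx = rng.choice(n_total, size=n_sub, replace=False)
X_sub, y_sub = X_full[idx], y_full[idx]

# Keyword arguments mirroring the settings described in Q2 and Q3.
pysr_kwargs = dict(
    maxsize=90,
    binary_operators=["+", "-", "*", "/"],    # assumed, not stated in the post
    unary_operators=["tanh", "atan", "erf"],  # operators mentioned in Q2
    batching=True,
)
```

With PySR installed, these settings would be passed along the lines of `PySRRegressor(**pysr_kwargs).fit(X_sub, y_sub)`.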
Q4: Do the features need to be normalized? I found that normalized features can result in better equations, but it is then hard to explain the relationship between the target variable and the normalized features. Any suggestions?
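To make Q4 concrete, here is a minimal sketch of the kind of normalization meant, assuming plain z-score standardization. The data and the equation are hypothetical and only illustrate that an expression in normalized features can be rewritten in raw features by substituting z_i = (x_i - mu_i) / sigma_i, at the cost of a less readable formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical raw features with very different scales.
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(1000, 2))

# Z-score standardization: z_i = (x_i - mu_i) / sigma_i
mu = X.mean(axis=0)
sigma = X.std(axis=0)
Z = (X - mu) / sigma


def eq_normalized(Z):
    """A made-up equation expressed in normalized features."""
    return 1.5 * Z[:, 0] - 0.5 * Z[:, 1] ** 2


def eq_raw(X):
    """The same equation after substituting the z-score definition."""
    z0 = (X[:, 0] - mu[0]) / sigma[0]
    z1 = (X[:, 1] - mu[1]) / sigma[1]
    return 1.5 * z0 - 0.5 * z1 ** 2


# Both forms give the same predictions; only the readability differs.
same = np.allclose(eq_normalized(Z), eq_raw(X))
```

The raw-feature form carries mu and sigma inside every term, which is what makes the relationship harder to read.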
Best regards,
Lu Li