A priori functional form with repeated functions #528
How would I force PySR to output functions only of the form y = f(x1) + f(x2) + f(x3)? It is important that f is the same for all three. I've been following discussion #291 and have been trying to achieve this by modifying the objective function. So far, I have the following:

    # Check whether any of the given features appears in an expression:
    function contains(t, features)
        if t.degree == 0
            return !t.constant && t.feature in features
        elseif t.degree == 1
            return contains(t.l, features)
        else
            return contains(t.l, features) || contains(t.r, features)
        end
    end

    function objective_function(tree, dataset::Dataset{T,L}, options) where {T,L}
        # Ensure that the top node has two children
        tree.degree != 2 && return L(Inf)
        left = tree.l
        right = tree.r
        # Ensure that the left side also has two children
        left.degree != 2 && return L(Inf)
        bot_left = left.l
        bot_right = left.r
        # Evaluate each of the parts
        bot_left_pred, flag = eval_tree_array(bot_left, dataset.X, options)
        !flag && return L(Inf)
        bot_right_pred, flag = eval_tree_array(bot_right, dataset.X, options)
        !flag && return L(Inf)
        right_pred, flag = eval_tree_array(right, dataset.X, options)
        !flag && return L(Inf)
        # Build the form f(r12) + f(r13) + f(r23)
        prediction = right_pred .+ bot_left_pred .+ bot_right_pred
        # Count violations: each part must contain its own variable and none of the others
        right_violating = Int(contains(right, (2, 3))) + Int(!contains(right, 1))
        bot_left_violating = Int(contains(bot_left, (1, 3))) + Int(!contains(bot_left, 2))
        bot_right_violating = Int(contains(bot_right, (1, 2))) + Int(!contains(bot_right, 3))
        # Penalize having the wrong variables in the wrong parts
        regularization = L(10000) * (right_violating + bot_left_violating + bot_right_violating)
        # Return mean squared error plus the penalty
        diffs = prediction - dataset.y
        return sum(diffs .^ 2) / length(diffs) + regularization
    end

This has problems, however. First of all, many of the generated functions do not even take the form y = f(x1) + g(x2) + h(x3), let alone have f = g = h. I'm stuck on how to proceed, and any help would be appreciated. Please note that this is also my first time seriously working with Julia, so any pointers on the language itself would also be very welcome.
Replies: 1 comment 1 reply
Oh, if f is the same for all three, it might be easier to just use the original tree passed and simply evaluate with different inputs each time? For example:

    function objective_function(tree, dataset::Dataset{T,L}, options) where {T,L}
        # Want base tree to have x1 as only feature; any other feature node will return early:
        if any(node -> node.degree == 0 && !node.constant && node.feature != 1, tree)
            return L(1e9)
        end
        # Evaluate once per input with only that feature passed,
        # which is like setting x1=x1, then x1=x2, then x1=x3.
        # (For the first call, passing `dataset.X` directly also works, since the tree only reads feature 1 anyway.)
        f_x1, flag = eval_tree_array(tree, (@view dataset.X[[1], :]), options)
        !flag && return L(1e8)
        f_x2, flag = eval_tree_array(tree, (@view dataset.X[[2], :]), options)
        !flag && return L(1e8)
        f_x3, flag = eval_tree_array(tree, (@view dataset.X[[3], :]), options)
        !flag && return L(1e8)
        prediction = f_x1 .+ f_x2 .+ f_x3
        ...
    end

Just keep in mind the printed output is just going to be Hopefully this helps. Btw the
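To make the "same tree, different inputs" trick concrete, here is a minimal dependency-free sketch. It does not use SymbolicRegression.jl: `eval_stand_in` and `shared_form_loss` are hypothetical names, the stand-in plays the role of a tree that only references feature 1, and its second return value mimics the `flag` from `eval_tree_array`. The mean squared error at the end is borrowed from the question's code and is only an assumption about what the elided part of the objective might compute.

```julia
# Hypothetical stand-in for an expression tree that only references feature 1.
# Like eval_tree_array, it takes a (features x samples) matrix and returns
# one value per sample plus a success flag.
function eval_stand_in(X::AbstractMatrix)
    out = X[1, :] .^ 2 .+ 1.0          # plays the role of f(x1) = x1^2 + 1
    return out, all(isfinite, out)
end

# Hypothetical loss for the form y = f(x1) + f(x2) + f(x3) with a shared f.
function shared_form_loss(evalf, X::AbstractMatrix, y::AbstractVector)
    prediction = zeros(eltype(y), length(y))
    for i in 1:3
        # A one-row view makes feature i look like feature 1 to the evaluator.
        f_xi, ok = evalf(@view X[[i], :])
        !ok && return Inf
        prediction .+= f_xi
    end
    # Mean squared error, as in the question's objective.
    return sum(abs2, prediction .- y) / length(y)
end

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]        # 3 features, 2 samples
y = vec(sum(X .^ 2 .+ 1.0; dims=1))    # y = f(x1) + f(x2) + f(x3)
loss = shared_form_loss(eval_stand_in, X, y)
```

Because the evaluator only ever sees a single row, it cannot mix features, so every candidate automatically has the additive shared-f form and no structural penalty term is needed.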