Maybe we can get better accuracy in the meal outlier classifier (as far as I can remember, the only one in which the value of the reimbursement is relevant) by adjusting prices over time for inflation.
Probably something like this would do the adjustment: df["adjusted_value"] = df.apply(lambda row: ipca.adjust(row['expense_date'], row['total_value']), axis=1), and then we compare the results to see whether accuracy improves, as my hypothesis suggests.
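To make the one-liner above concrete, here is a minimal sketch under stated assumptions. It does not use the package's real API: `adjust`, `PRICE_INDEX` and `REFERENCE_MONTH` are hypothetical stand-ins for `ipca.adjust`, and the index numbers are made up for illustration (they are not real IPCA figures). Only the column names `expense_date` and `total_value` come from the snippet.

```python
import pandas as pd

# Hypothetical price index levels per month (illustrative numbers, NOT real IPCA data).
PRICE_INDEX = {"2015-01": 100.0, "2016-01": 110.0, "2017-01": 115.5}
REFERENCE_MONTH = "2017-01"  # assumption: express every value in this month's prices


def adjust(expense_date, value, index=PRICE_INDEX, reference=REFERENCE_MONTH):
    """Scale `value` by the ratio of the reference index to the index at `expense_date`."""
    month = pd.Timestamp(expense_date).strftime("%Y-%m")
    return value * index[reference] / index[month]


df = pd.DataFrame(
    {
        "expense_date": ["2015-01-20", "2016-01-05", "2017-01-31"],
        "total_value": [50.0, 50.0, 50.0],
    }
)

# Same shape as the proposed one-liner, with the stand-in helper.
df["adjusted_value"] = df.apply(
    lambda row: adjust(row["expense_date"], row["total_value"]), axis=1
)
```

With a table like this, three meals that each cost 50.00 at the time end up with different adjusted values, which is the effect the classifier would then see.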
Hi @cuducos,
Honest question: how far back in the past are we analyzing data? Would inflation make a noticeable difference in a span of less than five years?
And yes, a library like the one you mentioned would give the most accurate correction, but a simple table defining accumulated inflation per quarter (or even per semester) would already improve accuracy. Maybe it's a good first test.
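A rough sketch of the table idea, assuming a hand-built lookup keyed by (year, quarter). The names `ACCUMULATED_FACTOR`, `quarter_of` and `adjust_by_quarter` are hypothetical, and the factors are placeholders rather than real inflation figures.

```python
from datetime import date

# Hypothetical accumulated inflation factors per quarter, relative to the
# most recent quarter (illustrative numbers, NOT real inflation data).
ACCUMULATED_FACTOR = {
    (2016, 1): 1.08,
    (2016, 2): 1.06,
    (2016, 3): 1.04,
    (2016, 4): 1.02,
    (2017, 1): 1.00,
}


def quarter_of(day):
    """Map a date to its (year, quarter) key, with quarters numbered 1 to 4."""
    return day.year, (day.month - 1) // 3 + 1


def adjust_by_quarter(day, value, table=ACCUMULATED_FACTOR):
    """Multiply `value` by the accumulated factor of the quarter `day` falls in."""
    return value * table[quarter_of(day)]
```

The trade-off is that every expense within the same quarter gets the same factor, so the correction is coarser than a per-date lookup.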
Would inflation make a noticeable difference in a span of less than five years?
In five years the impact was 26% (according to IPCA), 34% (according to IGPM) or 56% (according to SELIC). Not sure which one better serves this purpose, but given the nature of the expenses, I would guess IPCA (but it is still merely a guess).
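To get a feel for those numbers, here is a small sketch that turns the five-year accumulated figures quoted above into an equivalent constant yearly rate. It assumes plain compounding, and the helper name `annualized` is mine, not from any library.

```python
# Accumulated five-year figures quoted above, as fractions.
accumulated = {"IPCA": 0.26, "IGPM": 0.34, "SELIC": 0.56}
YEARS = 5


def annualized(total, years=YEARS):
    """Constant yearly rate that compounds to `total` over `years`."""
    return (1 + total) ** (1 / years) - 1


yearly = {name: annualized(total) for name, total in accumulated.items()}

# A meal reimbursed at 50.00 five years ago, expressed in today's prices under IPCA.
meal_today = 50.0 * (1 + accumulated["IPCA"])
```

That works out to roughly 4.7%, 6.0% and 9.3% a year respectively, and a 50.00 meal from five years ago corresponds to about 63.00 now under IPCA, which looks large enough to move values across an outlier threshold.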
And yes, a library like the one you mentioned would give the most accurate correction, but a simple table defining accumulated inflation per quarter (or even per semester) would already improve accuracy.
IMHO, given this library already exists (see nested example above), it would take more work to build this table than to automate the thing. The library lets you use the date already present in our dataset, while working with quarters/semesters would need an extra conversion from date to quarter/semester.
What is the problem?
Maybe we can get better accuracy in the meal outlier classifier (as far as I can remember, the only one in which the value of the reimbursement is relevant) by adjusting prices over time for inflation.
How can this be addressed?
There's a package that can easily do that (using IPCA at this point), probably in the fit or transform stages of the classifier (sorry, scikit-learn, I never remember the difference between these two).
Probably something like this would do the adjustment:
df["adjusted_value"] = df.apply(lambda row: ipca.adjust(row['expense_date'], row['total_value']), axis=1)
and then we compare the results to see whether accuracy improves, as my hypothesis suggests.
Who could help with this issue?
Anyone interested in doing some exploratory work with data and, maybe, contributing to https://github.com/okfn-brasil/notebooks ; )
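On the fit versus transform question raised above: in scikit-learn, fit learns parameters from the training data and transform applies them. An inflation adjustment learns nothing from the data, so it would naturally sit in transform. Below is a minimal sketch of that idea, not the project's actual classifier: `InflationAdjuster` is a hypothetical class, and the `index` mapping holds made-up index levels rather than real IPCA data.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class InflationAdjuster(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: rescales `total_value` to reference-month prices."""

    def __init__(self, index, reference):
        self.index = index          # mapping "YYYY-MM" -> price index level
        self.reference = reference  # month whose prices everything is expressed in

    def fit(self, X, y=None):
        # Nothing is learned from the data: the index table is supplied up front.
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's DataFrame untouched
        months = pd.to_datetime(X["expense_date"]).dt.strftime("%Y-%m")
        factors = self.index[self.reference] / months.map(self.index)
        X["adjusted_value"] = X["total_value"] * factors
        return X
```

A step like this could be placed ahead of the existing classifier, and the classifier pointed at `adjusted_value` instead of `total_value`, so the two runs can be compared as the issue suggests.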