Use monetary adjustment to try a better accuracy in meal outlier classifier #514

cuducos · 2020-01-14T15:21:48Z

What is the problem?

Maybe we can get better accuracy in the meal outlier classifier (as far as I can remember, the only one in which the value of the reimbursement is relevant) by adjusting prices over time for inflation.

How can this be addressed?

There's a package that can easily do that (using IPCA at this point), probably in the fit or transform stages of the classifier (sorry, scikit-learn, I never remember the difference between these two).

Probably something like this would do the adjustment: df["adjusted_value"] = df.apply(lambda row: ipca.adjust(row['expense_date'], row['total_value']), axis=1). Then we compare the results to see whether accuracy improves, as my hypothesis suggests.
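A minimal sketch of that comparison, using only the standard library so it runs offline. Everything here is an assumption for illustration: `toy_adjust` and its yearly factors are invented and merely stand in for `ipca.adjust`, the six expenses are made-up numbers, and a plain z-score stands in for whatever the real classifier does. The row keys mirror the column names used in the snippet above.

```python
from datetime import date
from statistics import mean, pstdev

# Invented cumulative price factors per year (NOT real IPCA data).
TOY_FACTORS = {2010: 1.80, 2019: 1.00}


def toy_adjust(expense_date, value):
    """Hypothetical stand-in with the same call shape as ipca.adjust(date, value)."""
    return value * TOY_FACTORS[expense_date.year]


def add_adjusted_value(rows, adjust):
    """Return copies of the rows with an extra 'adjusted_value' key."""
    return [
        {**row, "adjusted_value": adjust(row["expense_date"], row["total_value"])}
        for row in rows
    ]


def z_scores(values):
    """Population z-score of each value."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up expenses: the third one is expensive for 2010 but looks ordinary in 2019 prices.
rows = [
    {"expense_date": date(2010, 5, 1), "total_value": 50.0},
    {"expense_date": date(2010, 6, 1), "total_value": 55.0},
    {"expense_date": date(2010, 7, 1), "total_value": 100.0},
    {"expense_date": date(2019, 5, 1), "total_value": 90.0},
    {"expense_date": date(2019, 6, 1), "total_value": 95.0},
    {"expense_date": date(2019, 7, 1), "total_value": 100.0},
]

adjusted = add_adjusted_value(rows, toy_adjust)
nominal_z = z_scores([row["total_value"] for row in rows])
adjusted_z = z_scores([row["adjusted_value"] for row in adjusted])

# Compare how far the third expense sits from the mean before and after adjusting.
print(f"nominal z-score:  {nominal_z[2]:.2f}")
print(f"adjusted z-score: {adjusted_z[2]:.2f}")
```

With these toy numbers the 2010 expense only stands out after the adjustment; whether the same happens on the real dataset is exactly what the exploratory work would need to check.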

Who could help with this issue?

Anyone interested in doing some exploratory work with data and, maybe, contributing to https://github.com/okfn-brasil/notebooks ; )


willianpaixao · 2020-04-24T08:35:43Z

Hi @cuducos,
honest question: how far back in the past are we analyzing data? Would inflation make a noticeable difference in a span of less than five years?

And yes, a library like the one you mentioned would give the most accurate correction, but a simple table defining accumulated inflation per quarter (or even per semester) would already improve accuracy. Maybe it's a good first test.

cuducos · 2020-04-24T12:16:53Z

how far back in the past are we analyzing data?

Data goes back to 2009.

Would inflation make a noticeable difference in a span of less than five years?

In five years the impact was 26% (according to IPCA), 34% (according to IGPM) or 56% (according to SELIC). Not sure which one better serves this purpose, but given the nature of the expenses, I would guess IPCA (but it is still merely a guess).

In [1]: from calculadora_do_cidadao import Ipca, Igpm, Selic

In [2]: from datetime import datetime

In [3]: from datetime import timedelta

In [4]: for Adapter in (Ipca, Igpm, Selic):
   ...:     adapter = Adapter()
   ...:     diff = adapter.adjust((datetime.now() - timedelta(days=365 * 5)).date())
   ...:     print(diff)
   ...: 
1.259894139013801502406252724
1.340223541188715175178488001
1.558435971207303521700667897
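Each number printed above is a multiplier, so the percentage change is (factor - 1) * 100. A small sketch of that arithmetic, with the three values copied from the output and labeled in the order of the loop; the dictionary and the `as_percentage` helper are names introduced here for illustration only.

```python
from decimal import Decimal

# Factors copied from the session output, in loop order (Ipca, Igpm, Selic).
factors = {
    "IPCA": Decimal("1.259894139013801502406252724"),
    "IGPM": Decimal("1.340223541188715175178488001"),
    "SELIC": Decimal("1.558435971207303521700667897"),
}


def as_percentage(factor):
    """Convert a multiplier such as 1.26 into a percentage change such as 26."""
    return (factor - 1) * 100


for name, factor in factors.items():
    print(f"{name}: {as_percentage(factor):.0f}%")
```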

And yes, a library like the one you mentioned would give the most accurate correction, but a simple table defining accumulated inflation per quarter (or even per semester) would already improve accuracy.

IMHO, given this library already exists (see the example above), building this table would mean more work than automating the thing. The library lets you use the date already present in our dataset, while working with quarters/semesters would need an extra conversion from date to quarter/semester.
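For comparison, a rough sketch of what the table-based alternative would involve, including the extra date-to-quarter conversion. The factors in `ACCUMULATED_FACTOR` are invented placeholders, not real IPCA figures, and `to_quarter` and `adjust_with_table` are hypothetical helper names.

```python
from datetime import date

# Invented accumulated factors per (year, quarter); real values would have to be
# compiled by hand from an official index.
ACCUMULATED_FACTOR = {
    (2019, 1): 1.04,
    (2019, 2): 1.03,
    (2019, 3): 1.02,
    (2019, 4): 1.00,
}


def to_quarter(day):
    """Map a date to its (year, quarter) key, with quarters numbered 1 to 4."""
    return day.year, (day.month - 1) // 3 + 1


def adjust_with_table(day, value, table=ACCUMULATED_FACTOR):
    """Scale value by the accumulated factor of the quarter that contains day."""
    return value * table[to_quarter(day)]


print(to_quarter(date(2019, 8, 15)))
print(adjust_with_table(date(2019, 2, 10), 100.0))
```

Every expense in the same quarter gets the same factor here, and a date outside the table raises a `KeyError`, so the table would need to cover the whole range of the dataset.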

cuducos added enhancement question analysis labels Jan 14, 2020
