GitHub - observo/Modeling

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Data		Data
ReadMe.txt		ReadMe.txt

Repository files navigation

Here I have added a dataset containing several CSV files on Training and Testing both on Individual and Household level where the Heading and Data both are obfuscated, I do not know where you can call this Encoded or Unlabeled, that has been a different discussion or perhaps a serious debate. The good news is that we need not concentrate on Economic Theory or any Domain Knowledge is entirely meaningless for this task at hand, this has been by all means a Data Mining only however if the Meaning of Data is concerned. The CSV tells us here we have a Structural form in Tabular way and I am not telling anything more on what Learning Strategy is amenable to fulfill our Objective.

Our Objective is to finding all Subsets of a Set where all Subsets are Statistically Significant as well as can act like as a Classifier.
Let me put this mathematically-
Here are N vectors containing M features each and we are trying to find all subsets containing K features where 1<=K<M.
So we are trying to find a Bag eventually.

Now you may guess we can apply KNN, that is good. But this will not lead us to anywhere.
Now may be you are guessing we can apply Best-Subset Regression. That is great too.
I can tell you there is a Stepwise Regression and perhaps this is leading to Model Selection, perhaps LASSO.
Is this advancing?
May be or may be not.
Perhaps you are thinking about Genetic Algorithms to select the Regression.
Hopefully that is great too.
Now may be you are focusing on Deep Learning, we all know Deep Learning is just a Representation Learning.
Great.
Should I suggest you Tensorflow or Theano for Deep Learning?
I can suggest but how far we can go?
Now we can take other approaches.
We can think on Restricted Boltzmann Machine(RBM) but how do we make a Bipartite Graph, a serious challenge so far.
Do we know the Predictive Power of RBM? How to make the Partition Function and Energy Function?
A serious thing perhaps you know but I am not sure you know.
The central question is can we use RBM on Tabular Data?
Do we know Graph-Theoretic Regularization?

Combining all these things together, can we write some code either in R or Python to make a solution that will maintain the constraints imposed by any Random Forest Mean Log Loss?

Hopefully yes but I am not sure when this comes out.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

observo/Modeling

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages