Hi, I created this pull request to give some hints on CF generation @gaugup.
Regarding Random: although the method is clear and fast, how it combines feature sampling with substitution is unclear. Instead of gradually replacing more features, the inner loop in your code replaces only one feature per sample:
selected_features = np.random.choice(self.features_to_vary, (sample_size, 1), replace=True)
The 1 should be replaced by num_features_to_vary, and the replacement should then use .loc instead of .at.
This variant is slower but more complete, and still faster than Genetic/KDtree (I have deliberately left it commented out for you).
If you prefer to keep the single-feature variation, I suggest changing .at to ._get_value in the replacement for faster access.
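To illustrate the multi-feature variant described above, here is a minimal sketch. It is not the code from this pull request: the function name sample_random_candidates, the feature_ranges dictionary, and the use of uniform sampling over continuous ranges are all assumptions made for the example, and it draws distinct features per row (replace=False) rather than sampling with replacement.

```python
import numpy as np
import pandas as pd


def sample_random_candidates(query, feature_ranges, features_to_vary,
                             sample_size, num_features_to_vary, seed=None):
    """Hypothetical helper: perturb several features per candidate row.

    query            -- one-row DataFrame holding the instance to explain
    feature_ranges   -- assumed dict: feature name -> (low, high) bounds
    features_to_vary -- list of feature names that may be changed
    """
    rng = np.random.default_rng(seed)
    # Start from sample_size copies of the query instance.
    candidates = pd.concat([query] * sample_size, ignore_index=True)
    for row in range(sample_size):
        # Choose num_features_to_vary distinct features for this row.
        idx = rng.choice(len(features_to_vary), size=num_features_to_vary,
                         replace=False)
        chosen = [features_to_vary[i] for i in idx]
        new_values = np.array(
            [rng.uniform(*feature_ranges[name]) for name in chosen]
        )
        # .loc accepts a list of columns, so one call writes every chosen feature.
        candidates.loc[row, chosen] = new_values
    return candidates


query = pd.DataFrame({"age": [30.0], "income": [50.0], "hours": [40.0]})
ranges = {"age": (0.0, 1.0), "income": (0.0, 1.0), "hours": (0.0, 1.0)}
candidates = sample_random_candidates(query, ranges, ["age", "income"],
                                      sample_size=5, num_features_to_vary=2,
                                      seed=0)
```

The per-row Python loop keeps the sketch readable; a vectorized version would likely be needed to stay competitive on large sample sizes.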
As for Genetic: on datasets with many features, random initialization is very slow and seems never to finish. For this reason, I suggest increasing the population taken from the KDtree initialization (which also lowers the initialization time considerably). In addition, I recommend switching to a binary search when a large number of CFs is requested.
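The idea of drawing more of the initial population from the KD-tree can be sketched roughly as follows. This is an illustration only, not the project's implementation: init_population, kd_fraction, and the low/high bounds are hypothetical names, the data is assumed to be purely numeric, and scipy.spatial.KDTree stands in for whatever tree the code base actually uses.

```python
import numpy as np
from scipy.spatial import KDTree


def init_population(query, desired_class_points, population_size,
                    kd_fraction, low, high, seed=None):
    """Hypothetical helper: seed a genetic population from nearest neighbours.

    query                -- 1-D array, the instance to explain
    desired_class_points -- 2-D array of training points with the desired class
    kd_fraction          -- assumed knob: share of the population from the tree
    """
    rng = np.random.default_rng(seed)
    # Never ask the tree for more neighbours than it holds.
    n_kd = min(int(population_size * kd_fraction), len(desired_class_points))
    parts = []
    if n_kd > 0:
        tree = KDTree(desired_class_points)
        _, idx = tree.query(query, k=n_kd)
        # atleast_1d covers k=1, where query returns a scalar index.
        parts.append(desired_class_points[np.atleast_1d(idx)])
    n_random = population_size - n_kd
    if n_random > 0:
        # Fill the remainder with uniform random individuals.
        parts.append(rng.uniform(low, high, size=(n_random, query.shape[0])))
    return np.vstack(parts)


points = np.arange(20, dtype=float).reshape(10, 2)
population = init_population(np.zeros(2), points, population_size=8,
                             kd_fraction=0.5, low=0.0, high=1.0, seed=0)
```

Raising kd_fraction shifts work from random sampling to a single tree query, which is the trade-off suggested above.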