featurefuse is a Python package that lets you choose, through a YAML configuration, which features to build for machine learning models.
This implementation is based on the references below.
- Ref
- Implement a feature engineering class that inherits from the Feature abstract class.
- Create a config file.

      # Path to implemented feature class.
      use_feature:
        - .tmp.TmpFeature1
        - .tmp.TmpFeature2
      # Params for each implemented class.
      feature_params:
        TmpFeature1:
          shift_b_to: 3
        TmpFeature2:
          const: 1000

- Pass the config and the base (original) input DataFrame to featurefuse.generator.run().
- You get back the feature DataFrame and a description of the features.
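
The config shown in the steps above keys each entry of feature_params by the class name at the end of the matching use_feature path. The sketch below only illustrates that relationship with a plain dict. `params_for_features` is a hypothetical helper, not part of featurefuse, and the assumption that a class without an entry receives no extra parameters is ours, not something the library documents.

```python
# Hypothetical helper, not part of featurefuse: it only illustrates how the
# two config sections line up with each other.
def params_for_features(config: dict) -> dict:
    """Map each class path in use_feature to its entry in feature_params."""
    feature_params = config.get("feature_params", {})
    resolved = {}
    for path in config["use_feature"]:
        # ".tmp.TmpFeature1" -> class name "TmpFeature1"
        class_name = path.rsplit(".", 1)[-1]
        # Assumption: a class without an entry gets no extra parameters.
        resolved[path] = feature_params.get(class_name, {})
    return resolved


# Same content as the YAML config above, written as a plain dict.
config = {
    "use_feature": [".tmp.TmpFeature1", ".tmp.TmpFeature2"],
    "feature_params": {
        "TmpFeature1": {"shift_b_to": 3},
        "TmpFeature2": {"const": 1000},
    },
}

for path, params in params_for_features(config).items():
    print(path, params)
```

Each printed line pairs a class path with the keyword arguments that its create_feature method would be expected to accept.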
The following code is a minimal example.

    # Saved this code as tmp.py
    import pandas as pd

    from featurefuse.base import Feature


    class TmpFeature1(Feature):
        def create_feature(self, tmp_df: pd.DataFrame, shift_b_to: int, **kwargs) -> pd.DataFrame:
            fe = tmp_df[["col1", "col2"]].copy(deep=True)
            self.create_description(
                "feature col names generated by TmpFeature1", "description of each cols of TmpFeature1"
            )
            for i in range(1, shift_b_to + 1):
                fe[f"col2_b{i}"] = fe["col2"].shift(i)
            return fe


    class TmpFeature2(Feature):
        def create_feature(self, tmp_df: pd.DataFrame, const: int, **kwargs) -> pd.DataFrame:
            fe = tmp_df[["col1", "col2"]].copy(deep=True)
            self.create_description(
                "feature col names generated by TmpFeature2", "description of each cols of TmpFeature2"
            )
            fe[f"col2_add_{const}"] = fe["col2"] + const
            return fe


    def main():
        import yaml

        from featurefuse.generator import run

        config = yaml.safe_load(
            """
        use_feature:
          # Path to TmpFeature1. In this case, this file saved as tmp.py, so path is .tmp.TmpFeature1
          - .tmp.TmpFeature1
          - .tmp.TmpFeature2
        feature_params:
          TmpFeature1:
            shift_b_to: 3
          TmpFeature2:
            const: 1000
        """
        )
        tmp_df = pd.DataFrame([["aaa", 123], ["bbb", 456], ["ccc", 789], ["ddd", 101]], columns=["col1", "col2"])
        feature, description = run(config, join_key="col1", tmp_df=tmp_df)
        print(feature)
        # you can use feature DataFrame for machine learning model etc.
        # model.fit(feature.drop("col1", axis="columns"), feature[["col1"]])


    if __name__ == "__main__":
        main()

Run it and you get the following output.

    $ python tmp.py
      col1  col2  col2_b1  col2_b2  col2_b3  col2_add_1000
    0  aaa   123      NaN      NaN      NaN           1123
    1  bbb   456    123.0      NaN      NaN           1456
    2  ccc   789    456.0    123.0      NaN           1789
    3  ddd   101    789.0    456.0    123.0           1101

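
The col2_b1 to col2_b3 columns in this output are lagged copies of col2: each one moves the values down by i rows and leaves the first i rows empty. The pure-Python sketch below reproduces that pattern without pandas. The `shift` function here is an illustrative stand-in written for this README, not the pandas implementation, and it pads with None where pandas uses NaN (which is also why pandas shows the lagged integers as floats such as 123.0).

```python
def shift(values: list, periods: int) -> list:
    """Move values down by `periods` positions, padding the top with None."""
    if periods <= 0:
        return list(values)
    # Prepend the padding, then cut back to the original length.
    return ([None] * periods + list(values))[: len(values)]


col2 = [123, 456, 789, 101]

# Same loop shape as TmpFeature1.create_feature with shift_b_to = 3.
lags = {f"col2_b{i}": shift(col2, i) for i in range(1, 4)}

for name, column in lags.items():
    print(name, column)
# col2_b1 [None, 123, 456, 789]
# col2_b2 [None, None, 123, 456]
# col2_b3 [None, None, None, 123]
```

Rows with missing lag values usually need to be dropped or imputed before fitting a model that cannot handle them.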
The name of this tool was suggested by ChatGPT.
prompt: tell me own develop library name idea which select feature for machine learning in experiment phase.
Sure, here are a few potential names for a library that helps with feature selection for machine learning experiments:
- FeatureFuse
- Selectify
- FeatureCraft
- MachinaSelect
- FeatLab
- ModelMuse
- SmartFeat
- FeatureFlex
- MindFeat
- IntelliSelect

I hope these ideas inspire you and help you find the perfect name for your library!
prompt: Which is your recommendation?
As an AI language model, I don't have personal preferences or biases, but I can provide some guidance based on the names I suggested earlier.
If you're looking for a name that highlights the library's ability to fuse or combine different features, "FeatureFuse" or "Selectify" could be good options. "FeatureCraft" and "MachinaSelect" convey a sense of precision and automation, which could be suitable if those are important qualities for your library. "FeatLab" and "ModelMuse" both sound like they could be useful tools for researchers or data scientists, while "SmartFeat" and "IntelliSelect" suggest a focus on intelligent and efficient feature selection. Finally, "MindFeat" could work if your library is designed to help researchers gain a deeper understanding of the features used in their models.
Ultimately, the name you choose will depend on your target audience and the specific features and benefits of your library. I hope this helps!
- translate the documentation into English
- add tests
- write usage instructions
- write an example with Hydra
- set up GitHub Actions or another CI service