GitHub - imyutaro/featurefuse: select feature for machine learning models using yaml configuration.

featurefuse

featurefuse is a Python package that lets you choose which features to generate for machine learning models through a YAML configuration.

This implementation is based on the references below.

Ref
- https://amalog.hateblo.jp/entry/kaggle-feature-management
- https://tech.connehito.com/entry/2019/11/15/110000

How to use

Implement a feature engineering class that inherits from the Feature abstract class.

Create a config file.

# Path to implemented feature class.
use_feature:
  - .tmp.TmpFeature1
  - .tmp.TmpFeature2
# Params for each implemented class.
feature_params:
  TmpFeature1:
    shift_b_to: 3
  TmpFeature2:
    const: 1000

Pass the config and the base/original input DataFrame to featurefuse.generator.run().
It returns the feature DataFrame and a description of the features.
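
Each use_feature entry is a dotted path that names a module and a class inside it. The sketch below shows one way such a path could be turned into a class with the standard library. It is only an illustration: resolve_dotted_path is a hypothetical helper, not part of featurefuse, and stripping the leading dot is an assumption (featurefuse may handle it differently, for example as a relative import).

```python
import importlib


def resolve_dotted_path(path: str):
    """Return the object named by a dotted path such as ".tmp.TmpFeature1".

    Hypothetical helper for illustration only; not part of featurefuse.
    """
    # Assumption: drop the leading dot, then split "module.path.Name"
    # into the module part and the attribute name.
    module_name, _, attr_name = path.lstrip(".").rpartition(".")
    if not module_name:
        raise ValueError(f"expected '<module>.<name>', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated with a standard-library class so the snippet runs on its own.
cls = resolve_dotted_path(".collections.OrderedDict")
print(cls.__name__)
```

With tmp.py importable from the working directory, the same helper would return the TmpFeature1 class for ".tmp.TmpFeature1".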

The following code is a minimal example.

# Save this code as tmp.py
import pandas as pd

from featurefuse.base import Feature


class TmpFeature1(Feature):
    def create_feature(self, tmp_df: pd.DataFrame, shift_b_to: int, **kwargs) -> pd.DataFrame:
        fe = tmp_df[["col1", "col2"]].copy(deep=True)
        self.create_description(
            "feature col names generated by TmpFeature1", "description of each cols of TmpFeature1"
        )
        for i in range(1, shift_b_to + 1):
            fe[f"col2_b{i}"] = fe["col2"].shift(i)
        return fe


class TmpFeature2(Feature):
    def create_feature(self, tmp_df: pd.DataFrame, const: int, **kwargs) -> pd.DataFrame:
        fe = tmp_df[["col1", "col2"]].copy(deep=True)
        self.create_description(
            "feature col names generated by TmpFeature2", "description of each cols of TmpFeature2"
        )
        fe[f"col2_add_{const}"] = fe["col2"] + const
        return fe


def main():
    import yaml

    from featurefuse.generator import run

    config = yaml.safe_load(
        """
    use_feature:
      # Path to TmpFeature1. This file is saved as tmp.py, so the path is .tmp.TmpFeature1
      - .tmp.TmpFeature1
      - .tmp.TmpFeature2
    feature_params:
      TmpFeature1:
        shift_b_to: 3
      TmpFeature2:
        const: 1000
    """
    )

    tmp_df = pd.DataFrame([["aaa", 123], ["bbb", 456], ["ccc", 789], ["ddd", 101]], columns=["col1", "col2"])
    feature, description = run(config, join_key="col1", tmp_df=tmp_df)
    print(feature)

    # you can use feature DataFrame for machine learning model etc.
    # model.fit(feature.drop("col1", axis="columns"), feature[["col1"]])


if __name__ == "__main__":
    main()

Run the script and you get the following output.

$ python tmp.py
  col1  col2  col2_b1  col2_b2  col2_b3  col2_add_1000
0  aaa   123      NaN      NaN      NaN           1123
1  bbb   456    123.0      NaN      NaN           1456
2  ccc   789    456.0    123.0      NaN           1789
3  ddd   101    789.0    456.0    123.0           1101
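
The NaN values and the float formatting in the col2_b* columns come from pandas shift(), which TmpFeature1 calls in its loop. A small pandas-only sketch, independent of featurefuse, that reproduces just those lag columns:

```python
import pandas as pd

col2 = pd.Series([123, 456, 789, 101], name="col2")

# shift(i) moves values down by i rows and fills the vacated rows with NaN.
lagged = pd.DataFrame({f"col2_b{i}": col2.shift(i) for i in range(1, 4)})
print(lagged)

# NaN is a float, so the shifted integer columns are upcast to float64.
print(lagged.dtypes)
```

If the missing leading rows are a problem for your model, they can be dropped or filled before training.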

Other

The name of this tool was suggested by ChatGPT.

prompt: tell me own develop library name idea which select feature for machine learning in experiment phase.

Sure, here are a few potential names for a library that helps with feature selection for machine learning experiments:

- FeatureFuse
- Selectify
- FeatureCraft
- MachinaSelect
- FeatLab
- ModelMuse
- SmartFeat
- FeatureFlex
- MindFeat
- IntelliSelect

I hope these ideas inspire you and help you find the perfect name for your library!

prompt: Which is your recommendation?

As an AI language model, I don't have personal preferences or biases, but I can provide some guidance based on the names I suggested earlier.

If you're looking for a name that highlights the library's ability to fuse or combine different features, "FeatureFuse" or "Selectify" could be good options. "FeatureCraft" and "MachinaSelect" convey a sense of precision and automation, which could be suitable if those are important qualities for your library. "FeatLab" and "ModelMuse" both sound like they could be useful tools for researchers or data scientists, while "SmartFeat" and "IntelliSelect" suggest a focus on intelligent and efficient feature selection. Finally, "MindFeat" could work if your library is designed to help researchers gain a deeper understanding of the features used in their models.

Ultimately, the name you choose will depend on your target audience and the specific features and benefits of your library. I hope this helps!
