Transform job aggregations for missing field causing issues... #1273

Open
MahendraAkkina opened this issue Oct 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@MahendraAkkina

What is the bug?
In a transform job, the max, min, and avg aggregations on a missing field produce -Infinity, Infinity, and NaN, respectively. The value_count and sum aggregations produce 0.

The problem is that the target index is populated with new fields holding these values, and some of those fields end up mapped as text. Setting "missing": 0 for numeric fields in the aggregation behaves better, but it is not ideal because it misrepresents the data.
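
To illustrate why a default of 0 misrepresents the data, consider a bucket in which only some documents carry the field. The sketch below is plain Python, not OpenSearch code; the sample values and the helper name avg_with_missing are made up for illustration.

```python
from statistics import mean

def avg_with_missing(values, missing=None):
    """Average a bucket, optionally substituting `missing` for absent values."""
    if missing is None:
        # Ignore documents that do not carry the field.
        present = [v for v in values if v is not None]
    else:
        # Count documents without the field as if they reported `missing`.
        present = [missing if v is None else v for v in values]
    return mean(present) if present else None

# Hypothetical bucket: two documents carry the field, one does not.
bucket = [10.0, None, 20.0]

skipped = avg_with_missing(bucket)               # absent values ignored
defaulted = avg_with_missing(bucket, missing=0)  # absent values counted as 0
```

With the default in place, a bucket where the field never appears reports 0, which cannot be told apart from a device that genuinely reported 0.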

What I really want is for fields derived from a missing source field to be left out of the target index for those documents. Is there a way to accomplish this? This is a showstopper for us: the default behavior does not work, and using missing misrepresents the data.

How can one reproduce the bug?
Steps to reproduce the behavior:
Here is an example:
Transform job:

{
    "transform": {
        "enabled": true,
        "continuous": true,
        "schedule": {
            "interval": {
                "period": 5,
                "unit": "Minutes"
            }
        },
        "description": "Sample transform job",
        "source_index": "sample",
        "target_index": "sample_transform",
        "data_selection_query": {
            "match_all": {}
        },
        "page_size": 1,
        "groups": [
            {
                "date_histogram": {
                    "source_field": "timestamp",
                    "fixed_interval": "60m",
                    "timezone": "UTC"
                }
            },
            {
                "terms": {
                    "source_field": "device.keyword",
                    "target_field": "device"
                }
            }
        ],
        "aggregations": {
            "m1_value_count": {
                "value_count": {
                    "field": "m1"
                }
            },
            "m1_avg": {
                "avg": {
                    "field": "m1"
                }
            },
            "m1_max": {
                "max": {
                    "field": "m1"
                }
            },
            "m1_min": {
                "min": {
                    "field": "m1"
                }
            },
            "m1_sum": {
                "sum": {
                    "field": "m1"
                }
            },
            "m3_value_count": {
                "value_count": {
                    "field": "m3"
                }
            },
            "m3_avg": {
                "avg": {
                    "field": "m3"
                }
            },
            "m3_max": {
                "max": {
                    "field": "m3"
                }
            },
            "m3_min": {
                "min": {
                    "field": "m3"
                }
            },
            "m3_sum": {
                "sum": {
                    "field": "m3"
                }
            }
        }
    }
}

In the target index, the m3-related fields show up as follows when m3 is missing in the time interval.

"_source": {
    "transform._id": "metric_all_3_transform_job",
    "_doc_count": 22,
    "transform._doc_count": 22,
    "timestamp": 1728007200000,
    "device": "1.1.1.1",
    "m1_max": 99.13,
    "m1_min": 17.66,
    "m1_avg": 56.58500000000001,
    "m1_value_count": 22.0,
    "m1_sum": 1244.8700000000001,
    "m3_max": "-Infinity",
    "m3_min": "Infinity",
    "m3_avg": "NaN",
    "m3_sum": 0.0,
    "m3_value_count": 0.0
}
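
The m3 values in this document match what accumulators hold when they start from their identity elements and never see an input. The Python sketch below only illustrates that arithmetic. It is an assumption about where the values come from, not the plugin's actual code, and the function name aggregate is hypothetical.

```python
import math

def aggregate(values):
    """Fold values into max/min/sum/count/avg, starting from identity elements."""
    acc_max = -math.inf   # identity element for max
    acc_min = math.inf    # identity element for min
    acc_sum = 0.0         # identity element for sum
    count = 0
    for v in values:
        acc_max = max(acc_max, v)
        acc_min = min(acc_min, v)
        acc_sum += v
        count += 1
    # Python raises on division by zero, so return NaN explicitly,
    # which is what IEEE 754 floating point gives for 0.0 / 0.0.
    avg = acc_sum / count if count else math.nan
    return {"max": acc_max, "min": acc_min, "avg": avg,
            "sum": acc_sum, "value_count": float(count)}

empty = aggregate([])   # no document in the bucket carries the field
```

Infinity and NaN are not valid JSON numbers, so they are written as strings in the document above. That would also explain why dynamic mapping assigns those fields a text type, although this thread does not confirm it.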

What is the expected behavior?
The m3* fields should not be generated in such cases.

What is your host/environment?
v2.11

@bharath-techie

[ Triage attendees - 1 2 3 4]

One solution is to add a flag that skips missing fields when building the transformed documents.
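
As a rough sketch of what such a flag could do, the Python below drops a metric's aggregation fields from a transformed document when its value_count is 0. The function name drop_empty_metrics and the reliance on the metric_aggregation naming pattern (m3_max, m3_value_count, and so on) are assumptions taken from the example in this issue; no such option is confirmed to exist in the plugin.

```python
def drop_empty_metrics(doc, suffixes=("max", "min", "avg", "sum", "value_count")):
    """Return a copy of doc without the fields of metrics whose value_count is 0."""
    marker = "_value_count"
    # Metrics for which no source document carried the field.
    empty = {
        key[: -len(marker)]
        for key, value in doc.items()
        if key.endswith(marker) and value == 0
    }
    blocked = {f"{metric}_{suffix}" for metric in empty for suffix in suffixes}
    return {k: v for k, v in doc.items() if k not in blocked}

# Trimmed-down version of the target document from this issue.
doc = {
    "device": "1.1.1.1",
    "m1_max": 99.13, "m1_value_count": 22.0,
    "m3_max": "-Infinity", "m3_min": "Infinity", "m3_avg": "NaN",
    "m3_sum": 0.0, "m3_value_count": 0.0,
}
cleaned = drop_empty_metrics(doc)
```

This approach only works when a value_count aggregation is defined for every metric, since that is the one output that distinguishes an empty bucket from a real sum of 0.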

@MahendraAkkina
Author

@bharath-techie Is this option available now (or is there a workaround to achieve this), or are you just suggesting an enhancement that adds an option to skip?
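
One possible workaround, not confirmed by anyone in this thread and not tested against the plugin, is to run one transform job per metric and restrict each job to documents that contain that metric through an exists query in data_selection_query. Buckets with no such documents would then never be produced. The Python sketch below derives those jobs from a single job body; the helper name split_by_metric and the per-metric target index names are hypothetical.

```python
import copy

def split_by_metric(job, metrics):
    """Build one transform job per metric, selecting only documents that carry it."""
    jobs = {}
    for metric in metrics:
        body = copy.deepcopy(job)  # leave the original job untouched
        t = body["transform"]
        # Keep the original selection and additionally require the field.
        t["data_selection_query"] = {
            "bool": {"filter": [t["data_selection_query"],
                                {"exists": {"field": metric}}]}
        }
        # Assumption: each job writes to its own target index.
        t["target_index"] = f'{t["target_index"]}_{metric}'
        # Keep only the aggregations that read this metric.
        t["aggregations"] = {
            name: agg for name, agg in t["aggregations"].items()
            if next(iter(agg.values())).get("field") == metric
        }
        jobs[metric] = body
    return jobs

# Reduced version of the job from this issue.
job = {
    "transform": {
        "source_index": "sample",
        "target_index": "sample_transform",
        "data_selection_query": {"match_all": {}},
        "aggregations": {
            "m1_avg": {"avg": {"field": "m1"}},
            "m3_avg": {"avg": {"field": "m3"}},
        },
    }
}
jobs = split_by_metric(job, ["m1", "m3"])
```

The trade-off is that each metric ends up in a separate target index, so queries that need several metrics for the same bucket have to combine them on the client side.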

@MahendraAkkina
Author

This is kind of a show stopper for us. Any thoughts from anyone?
