Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aws_cost_by_service_daily - date filtering parameters not sent to the API #2149

Open
shaicoleman opened this issue Mar 27, 2024 · 21 comments · May be fixed by #2168
Open

aws_cost_by_service_daily - date filtering parameters not sent to the API #2149

shaicoleman opened this issue Mar 27, 2024 · 21 comments · May be fixed by #2168
Assignees
Labels
bug Something isn't working

Comments

@shaicoleman
Copy link

shaicoleman commented Mar 27, 2024

For the following query:

SELECT   service, period_start, period_end
FROM     aws_cost_by_service_daily
WHERE    period_start >= current_timestamp - interval '30d' AND
         period_end >= current_timestamp - interval '30d' AND
         service = 'Tax'

Generates the following filter request:

{
  "Filter": { "Dimensions": {"Key": "SERVICE", "Values": ["Tax"] }},
  "Granularity": "DAILY",
  "GroupBy": [{ "Key": "SERVICE", "Type": "DIMENSION" }],
  "Metrics": ["BlendedCost", "UnblendedCost", "NetUnblendedCost", "AmortizedCost", 
              "NetAmortizedCost", "UsageQuantity", "NormalizedUsageAmount"],
  "TimePeriod": { "End": "2024-03-27", "Start": "2023-03-27" }
}

This requests data for a whole year, instead of just for the requested period, and thus causes many unnecessary API requests which each cost $0.01.

Steampipe v0.22.0
turbot/aws v0.132.0

@shaicoleman shaicoleman added the bug Something isn't working label Mar 27, 2024
@ParthaI
Copy link
Contributor

ParthaI commented Mar 29, 2024

Hi @shaicoleman, the aws_cost_by_service_daily table provides you with a simplified view of cost for services in your account (or all linked accounts when run against the organization master), summarized by day, for the last year.

It's crafted to deliver cost and usage insights per service on a daily basis for the previous year.

From what I gather, you're looking to obtain cost and usage details over a specific time frame.

Regrettably, the aws_cost_usage_* tables lack the capability to filter by a specified time period. Enhancing the aws_cost_usage table or introducing a new table could be potential improvements to support this feature.

I'm currently addressing this issue and will keep you posted once the table design is finalized.

Thank you!

@ParthaI
Copy link
Contributor

ParthaI commented Mar 29, 2024

Hello @shaicoleman, I've made some updates in the issue-2149 branch for the aws_cost_usage table:

  • The table now accepts a specific time frame as input through the search_start_time and search_end_time columns in the WHERE clause, allowing you to retrieve results within a particular period.
  • In the absence of a specified time range, the table will revert to its default behavior, where the time range is determined by the granularity specified in the WHERE clause.

Query result:

> select
  period_start,
  period_end,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time = '2023-04-01T05:30:00+05:30'
  and search_end_time = '2023-04-05T05:30:00+05:30'
order by
  dimension_1,
  period_start;
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| period_start              | period_end                | account_id   | service_name                           | net_unblended_cost_amount |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Tax                                    | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Storage Service          | $1.84                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Elastic Compute Cloud - Compute | $2.23                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Glacier                         | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Security Hub                       | $0.01                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS CloudTrail                         | $1.30                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Cost Explorer                      | $0.45                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Amplify                            | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Queue Service            | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Service Catalog                    | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Inspector                       | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Workflow Service         | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Step Functions                     | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon GuardDuty                       | $0.02                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Simple Notification Service     | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Glue                               | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS WAF                                | $1.47                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Config                             | $0.39                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | Amazon Security Lake                   | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Secrets Manager                    | $0.05                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AWS Key Management Service             | $0.94                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | AmazonCloudWatch                       | $0.23                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 33333333333 | EC2 - Other                            | $0.53                     |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+

Time: 7.3s. Rows fetched: 23. Hydrate calls: 23.

We'd greatly appreciate it if you would like to test the changes in the PR branch and share your feedback with us to ensure the code changes meet the requirements.

Here are the steps to test the PR branch:

  1. Clone the repository: git clone https://github.com/turbot/steampipe-plugin-aws.git
  2. Change directory: cd steampipe-plugin-aws
  3. Switch to the specific branch: git checkout issue-2149
  4. Execute the make command: make
  5. Run the query

Thank you!

@shaicoleman
Copy link
Author

shaicoleman commented Mar 29, 2024

I'm having issues testing that branch with the instructions above:

$ steampipe -v
Steampipe v0.22.1
$ go version
go version go1.22.1 linux/amd64

$ rm -rf ~/.steampipe
$ git clone https://github.com/turbot/steampipe-plugin-aws.git
$ cd steampipe-plugin-aws
$ git checkout issue-2149
$ git show --summary
commit b2d235cb035cf1ca5fba0a018c06920bf19b2bff (HEAD -> issue-2149, origin/issue-2149)
    Enhanced the table aws_cost_usage to accept custom time frame as input param Closes #2149

$ make
go build -o ~/.steampipe/plugins/hub.steampipe.io/plugins/turbot/aws@latest/steampipe-plugin-aws.plugin -tags "" *.go

$ steampipe query
> .inspect aws
Error: could not find connection or table called 'aws'. Is the plugin installed? Is the connection configured?

$ cp config/* ~/.steampipe/config
$ vi ~/.steampipe/config/aws.spc # no changes
$ steampipe query
> .inspect aws
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+
| connection | plugin                                     | schema mode | state | error                | state updated        |
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+
| aws        | hub.steampipe.io/plugins/turbot/aws@latest |             | error | plugin not installed | 2024-03-29T17:40:03Z |
+------------+--------------------------------------------+-------------+-------+----------------------+----------------------+

@ParthaI
Copy link
Contributor

ParthaI commented Mar 30, 2024

Hey, @shaicoleman, could you please try running the command steampipe plugin install aws first and then run the make command from the dir ~/steampipe-plugin-aws?

@shaicoleman
Copy link
Author

shaicoleman commented Mar 30, 2024

I can't get it to run:

  • aws_cost_by_service_daily doesn't have a search_start_time field. The date filters should be applicable to all relevant tables.
SELECT service, unblended_cost_amount, period_start, period_end
FROM   aws_cost_by_service_daily
WHERE  service = 'Tax' AND
       search_start_time >= current_timestamp - interval '30d'

Error: column "search_start_time" does not exist (SQLSTATE 42703)

  • Doesn't work with aws_cost_usage either:
SELECT *
FROM   aws_cost_usage
WHERE  search_start_time >= curerent_timestamp - interval '30d'

No results - no network activity - query doesn't seem to run

  • I tried also with a date, but still doesn't work
SELECT *
FROM   aws_cost_usage
WHERE  search_start_time >= '2024-03-01'

No results - no network activity - query doesn't seem to run

@ParthaI
Copy link
Contributor

ParthaI commented Apr 1, 2024

Hi @shaicoleman, we haven't yet implemented support for the search_start_time and search_end_time columns across all aws_cost_* tables. Currently, these changes have been applied to the aws_cost_usage table only.

I've made additional updates in the issue-2149 branch. Could you please fetch these updates to your local setup, rebuild the plugin, and run your queries again?

Note: When querying the aws_cost_usage table, it's mandatory to include granularity, dimension_type_1, and dimension_type_2 in the WHERE clause. You may also optionally include search_start_time or search_end_time. For further details, kindly refer to the Important Notes section in the table documentation.

Here are the query results with different operators(=, >=, <=):


> select
  period_start,
  period_end,
  search_start_time,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time >= '2023-08-01'
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| period_start              | period_end                | search_start_time         | account_id   | service_name                                   | net_unblended_cost_amount |
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Config                                     | $3.03                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Security Hub                               | $0.06                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Key Management Service                     | $2.87                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Cost Explorer                              | $2.69                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Cloud Map                                  | $0.20                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Secrets Manager                            | $0.40                     |
| 2023-08-01T05:30:00+05:30 | 2023-09-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Step Functions                             | $0.00                     |

 select
  period_start,
  period_end,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time = '2023-04-01T05:30:00+05:30'
  and search_end_time = '2023-04-05T05:30:00+05:30'
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| period_start              | period_end                | account_id   | service_name                           | net_unblended_cost_amount |
+---------------------------+---------------------------+--------------+----------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | Tax                                    | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | AWS Config                             | $0.39                     |
| 2023-04-01T05:30:00+05:30 | 2023-04-05T05:30:00+05:30 | 333333333333 | AWS Amplify                            | $0.00                     |


 select
  period_start,
  period_end,
  search_start_time,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time <= '2023-08-01'
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| period_start              | period_end                | search_start_time         | account_id   | service_name                                   | net_unblended_cost_amount |
+---------------------------+---------------------------+---------------------------+--------------+------------------------------------------------+---------------------------+
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudShell                                 | $0.00                     |
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Amplify                                    | $0.00                     |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Config                                     | $5.47                     |
| 2023-04-01T05:30:00+05:30 | 2023-05-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudTrail                                 | $10.88                    |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS CloudTrail                                 | $11.35                    |
| 2023-05-01T05:30:00+05:30 | 2023-06-01T05:30:00+05:30 | 2023-08-01T05:30:00+05:30 | 333333333333 | AWS Security Hub                               | $0.06                     |

Please feel free to share your feedback.

Thanks!

@shaicoleman
Copy link
Author

shaicoleman commented Apr 20, 2024

It seems that there's an issue returning empty cached results when repeating the same query and changing the date filters.

e.g. the first query will return results correctly:

select
  period_start,
  period_end,
  search_start_time,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time >= '2023-08-01'

But then repeating the same query with a different date will return zero resuts and will not actually execute the query.

select
  period_start,
  period_end,
  search_start_time,
  dimension_1 as account_id,
  dimension_2 as service_name,
  net_unblended_cost_amount::numeric::money
from
  aws_cost_usage
where
  granularity = 'MONTHLY'
  and dimension_type_1 = 'LINKED_ACCOUNT'
  and dimension_type_2 = 'SERVICE'
  and search_start_time >= '2023-09-01'
+--------------+------------+-------------------+------------+--------------+---------------------------+
| period_start | period_end | search_start_time | account_id | service_name | net_unblended_cost_amount |
+--------------+------------+-------------------+------------+--------------+---------------------------+
+--------------+------------+-------------------+------------+--------------+---------------------------+

And also would be good to get that functionality into the other tables as well

@ParthaI
Copy link
Contributor

ParthaI commented Apr 23, 2024

Hi @shaicoleman, thank you for testing the changes.

Regarding the issue where repeating the same query with a different date returns zero results, I have pushed a fix to the same branch (issue-2149). I hope this resolves the problem.

Additionally, I've expanded the functionality to include search_start_time and search_end_time columns for all aws_cost_* tables in the issue-2149 branch.

Could you please pull the latest changes from the issue-2149 branch to your local setup and build the plugin locally?

Let me know if you encounter any further issues.

Thanks!

@shaicoleman
Copy link
Author

shaicoleman commented Apr 24, 2024

Hi,

It doesn't filter correctly the following query:

SELECT   service, unblended_cost_amount, period_start, period_end
FROM     aws_cost_by_service_daily
WHERE    search_start_time >= current_timestamp - interval '30d' AND service = 'Tax'

It requests the default 1 year period, e.g.

{"TimePeriod":{"End":"2024-04-24","Start":"2023-04-24"}}

It also requests unnecessary metrics which aren't part of the query:

{"Metrics":["BlendedCost","UnblendedCost","NetUnblendedCost","AmortizedCost","NetAmortizedCost","UsageQuantity","NormalizedUsageAmount"]}

It would be good if it would only request the necessary metrics.

Also, consider making the search_start_time a required field. Although it's a breaking change, I think it makes sense in this case as it currently chooses an unintuitive default, which generates a lot of extra queries and can incur additional costs.

@ParthaI
Copy link
Contributor

ParthaI commented Apr 26, 2024

@shaicoleman, thank you for your insightful feedback. I've made the following updates to the branch issue-2149:

It doesn't filter correctly the following query

Apologies, that was my oversight. I neglected to push a commit from my local machine. Now, all tables should be able to accept a custom time range based on the query parameter.

It would be good if it would only request the necessary metrics.

Excellent suggestion! I've implemented it, and the results are impressive. I've added an optional string type column called metrics that can be used in the WHERE clause. For selecting a single metric, use ... where metrics = 'BlendedCost'. For multiple metrics, it should be ... where metrics = 'BlendedCost,AmortizedCost,NetUnblendedCost' (listed without spaces and separated by commas).

For example, the query select * from aws_cost_by_service_usage_type_daily where metrics = 'BlendedCost' (for a single metric) took 94.1s to complete, while the query select * from aws_cost_by_service_usage_type_daily (for all metrics) took 535.7s.

I would appreciate it if you could pull the latest changes from the issue-2149 branch, build it locally, and share your feedback.

Note: The metrics optional quals support has not been added to the tables aws_cost_forecast_daily and aws_cost_forecast_monthly, as these tables are designed to return statistics for the UNBLENDED_COST metric only.

Thanks again for your valuable input!

@shaicoleman
Copy link
Author

Looks good to me! Thanks

@ParthaI
Copy link
Contributor

ParthaI commented May 8, 2024

Hello, @shaicoleman, just to let you know, we've finalized the table design and implementation, and I've pushed the updates to the branch issue-2149.

Changes details:

  • Removed the search_start_time and search_end_time columns, as their functionality is effectively replicated by the period_start and period_end columns.
  • Eliminated the metrics column. Now, the table is designed to dynamically pass metric values based on the columns specified in the select statement:
    • A query like select * from aws_cost_by_account_monthly will internally call the API passing all supported metric values.
    • A query such as select period_end, blended_cost_amount from aws_cost_by_account_monthly will trigger an API call for only the BlendedCost metric.

Query performance outcomes:

select * from aws_cost_by_account_monthly (12.6s)
select period_end, blended_cost_amount from aws_cost_by_account_monthly (5.6s)

It would be great if you could pull the latest changes from the branch issue-2149 and share your observations or feedback.

Thanks!

@shaicoleman
Copy link
Author

shaicoleman commented May 9, 2024

It doesn't seem to calculate correctly the TimePeriod, e.g. for the following query:

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start >= current_timestamp - interval '32d' AND
       period_end BETWEEN current_timestamp - interval '31d' AND current_timestamp - interval '1d' AND
       unblended_cost_amount <> 0

It sends the following {"TimePeriod":{"End":"2024-04-08","Start":" 2024-05-08"}} which generates an error: response error StatusCode: 400, RequestID: 8fe1f2ed-3ea1-49c4-8f95-8c6c3be05df6, api error ValidationException: Start date (and hour) should be before end date (and hour)

The error message suggests it might accept a timestamp, but I haven't tried it, and the documentation says otherwise.

Also note: The start date is inclusive, but the end date is exclusive.
https://docs.aws.amazon.com/aws-cost-management/latest/APIReference/API_GetCostAndUsage.html#awscostmanagement-GetCostAndUsage-request-TimePeriod
The dates are in UTC timezone.

@ParthaI
Copy link
Contributor

ParthaI commented Jun 10, 2024

Hi @shaicoleman,

Apologies for the late response. I have pushed another change to the issue-2149 branch to handle the edge case mentioned above.

I have tested the following query parameter combinations.

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_end >= current_timestamp - interval '32d' AND
       unblended_cost_amount <> 0


SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_end >= current_timestamp - interval '32d' AND
       unblended_cost_amount <> 0

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start >= current_timestamp - interval '32d' AND
       period_end <= current_timestamp - interval '25d' AND
       unblended_cost_amount <> 0

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start <= current_timestamp - interval '32d' AND
       period_end >= current_timestamp - interval '35d' AND
       unblended_cost_amount <> 0


SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start <= current_timestamp - interval '32d' AND
       unblended_cost_amount <> 0

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start >= current_timestamp - interval '32d' AND
       unblended_cost_amount <> 0

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'DAILY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       period_start >= current_timestamp - interval '32d' AND
       period_end BETWEEN current_timestamp - interval '31d' AND current_timestamp - interval '1d' AND
       unblended_cost_amount <> 0

Could you please give it a shot again by pulling the latest changes and let me know if I missed any edge cases?

Also note: The start date is inclusive, but the end date is exclusive.

Yes, we are passing the input as it is given in the query parameter, and the results are being displayed based on query parameter conditions. we are not modifying any of the rows.

Could you please expand more, particularly for which query you see the deviation?
Also, Do the relevant AWS CLI commands return the result as expected?

Thanks

@shaicoleman
Copy link
Author

shaicoleman commented Jun 17, 2024

@ParthaI , This works for most cases (including my usecase), but it fails with some edge cases.

e.g. this will fall back to the default one year period, thus missing some expected results.

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'MONTHLY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND
       unblended_cost_amount <> 0

The ideal solution would be to handle the common cases automatically, but have fields for a manual override as before.
If it can't be detected automatically, and no manual dates were provided, it should give an error rather than falling back to an arbitary date range.

I'm not sure it will always be possible to correctly detect if it is indeed a simple case or not (OR conditions, CTEs, unions, subqueries, etc.)

Explicitly setting the fields as before is the most reliable way to do it.

@ParthaI
Copy link
Contributor

ParthaI commented Jun 17, 2024

Thanks @shaicoleman, For your feedback. I will retake a look.

@shaicoleman
Copy link
Author

shaicoleman commented Jun 17, 2024

One thing that would be nice to keep from the current solution is the ability to specify timestamps instead of just dates

@ParthaI
Copy link
Contributor

ParthaI commented Jun 21, 2024

Hello, @shaicoleman. I appreciate your suggestions. I have been comparing the query results from the main branch with those from the issue-2149 branch. I did not notice any deviations in the results. Additionally, I observed significant improvements in query timing. Please take a look at my observations below.

Query:

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'MONTHLY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND
       unblended_cost_amount <> 0

Plugin Environment:

  • Single connection with a specific AWS profile
  • Plugin built from the current main branch
  • Plugin built from the issue-2149 branch

There is no difference in the results from either branch.

Branch main Branch issue-2149
Query: SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type Query: SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type
Number Of Rows return: 6687 Number Of Rows return: 6687
Hydrated calls: 6687 Hydrated calls: 6687
Query Timing: 25257ms Query Timing: 7013ms

Also, I have validated the result as well, I did not see any deviation between the results.

  • From the main branch plugin build ran the command: steampipe query "SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type" --output json > current_main_result.json
  • From the issue-2149 branch plugin build ran the command: steampipe query "SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end FROM aws_cost_usage WHERE granularity = 'MONTHLY' AND dimension_type_1 = 'SERVICE' AND dimension_type_2 = 'USAGE_TYPE' AND (period_end >= current_timestamp - interval '32d' OR period_end >= current_timestamp - interval '500d') AND unblended_cost_amount <> 0 order by period_start, service, usage_type" --output json > branch_2149_result.json
  • Then I checked the diff between the files current_main_result.json and branch_2149_result.json I did not notice any deviation between the results.

Screenshot 2024-06-21 at 9 46 57 AM

Could you please let me know if you noticed any mismatch in your case?

The ideal solution would be to handle the common cases automatically, but have fields for a manual override as before.
If it can't be detected automatically, and no manual dates were provided, it should give an error rather than falling back to an arbitary date range.

I believe it would be a breaking change for the user to give an error rather than fall back to an arbitrary date range. Instead, let the API return the results and perform the filtration at the Steampipe query level based on the query parameters (this approach looks good to me).

One thing that would be nice to keep from the current solution is the ability to specify timestamps instead of just dates

Great idea. According to our current code, if the granularity is set to HOURLY, we use timestamps; otherwise, we use dates. The changes in this PR branch follow the same approach. However, handling timestamps in different formats can be challenging. We have made these changes while also considering backward compatibility.

Thank you!

@shaicoleman
Copy link
Author

shaicoleman commented Jun 21, 2024

Could you please let me know if you noticed any mismatch in your case?

I checked the date range that was sent to the API, if the date range is wrong, there is no way the results can be correct.
It could be that you have less than 12 months of billing data (AWS retains 13 months), which explains why you're getting the same result. You may need to change the granuality to daily to see the differences.

SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
FROM   aws_cost_usage
WHERE  granularity = 'MONTHLY' AND
       dimension_type_1 = 'SERVICE' AND
       dimension_type_2 = 'USAGE_TYPE' AND
       (period_end >= current_timestamp - interval '1d' OR period_end >= current_timestamp - interval '2d') AND
       unblended_cost_amount <> 0

The example above requests a whole year instead of 2 days. If the granuality is daily it can be very expensive, thus doesn't fix the issue it was supposed to solve in the first place.

I understand not wanting to break compatibility, but the current behaviour can cause silently excessive costs, silently truncated data and inconsistent behaviour. This justifies breaking compatibility for the small amount of users who are using this table. Giving a clear error that can be easily fixed is much better than the silent bugs above, which required spending a lot of my time and yours to figure out.

If breaking compatibility is not something you're willing to consider, then at least allow to optionally set the search_start_time and search_end_time to override the calculated value.

It could also print a warning to stderr (e.g. WARNING: future versions will require explicitly setting the search_start_time and search_end_time, defaulting to a year period)

@ParthaI
Copy link
Contributor

ParthaI commented Jun 27, 2024

Hi @shaicoleman,

I hope this message finds you well.

I have reviewed your requirements from a costing perspective and would like to address the following points:

  • The API behavior of GetCostAndUsage accepts a single date range and does not support any operator. Both the start time and end time are required to make the API call.
  • Given this API behavior, we adjust the time period based on the operator and conditions from the query parameter.
  • Specifically for the OR operator, while Steampipe FDW parses the query correctly, the quals value remains empty.

If breaking compatibility is not something you're willing to consider, then at least allow to optionally set the search_start_time and search_end_time to override the calculated value.

  • As far as I know, we already have period_start and period_end columns to support passing the time period. Adding search_start_time and search_end_time columns for a specific query requirement might not be the best approach due to redundancy.
  • If we do add search_start_time and search_end_time columns, the column type would be timestamp.
  • We are unsure how users will utilize these columns, which might lead to similar handling issues as with period_start and period_end.
  • Additionally, may I suggest revising the query plan to achieve the desired outcome?

For the query you mentioned, I have made a slight modification to make the API call based on the query parameter, achieving the same result as the original query.

WITH period_end_1d AS (
  SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
  FROM   aws_cost_usage
  WHERE  granularity = 'MONTHLY' AND
         dimension_type_1 = 'SERVICE' AND
         dimension_type_2 = 'USAGE_TYPE' AND
         period_end >= current_timestamp - interval '1d' AND
         unblended_cost_amount <> 0
),
period_end_2d AS (
  SELECT dimension_1 AS service, dimension_2 AS usage_type, unblended_cost_amount, period_start, period_end
  FROM   aws_cost_usage
  WHERE  granularity = 'MONTHLY' AND
         dimension_type_1 = 'SERVICE' AND
         dimension_type_2 = 'USAGE_TYPE' AND
         period_end >= current_timestamp - interval '2d' AND
         unblended_cost_amount <> 0
)
SELECT * FROM period_end_1d
UNION
SELECT * FROM period_end_2d;

Thank you for your attention to these details. I look forward to your feedback.

@shaicoleman
Copy link
Author

shaicoleman commented Jun 27, 2024

Hi @ParthaI ,

I don't really have anything more to add to what I've already said.
From your point of view breaking compatibility is a dealbreaker, and the current solution works most of the time, and it requires one less parameter in the query.

In my opinion, having the possibility of silently truncating data or silently causing unforeseen expenses is a huge dealbreaker, and is much worse than breaking compatibility and having some redundancy in the query. There is no easy way to know if a query will trigger the correct behaviour or not without tracing the API calls.

This means adding the search_start_time and search_end_time columns as required fields. These are required fields in the API, and thus they need to be required fields in steampipe as well, as they cannot always be inferred correctly.

@misraved , maybe give your feedback on what approach you prefer, and we'll go with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants