Skip to content

Files

Latest commit

 

History

History
378 lines (292 loc) · 14.7 KB

how-to-write-queries.md

File metadata and controls

378 lines (292 loc) · 14.7 KB

How to write queries

Table of content

Intro

Google Ads API Report Fetcher uses GAQL syntax with some extended capabilities.

This is how a generic query might look like:

SELECT
    ad_group.id,
    ad_group.name
FROM ad_group

When running this query and saving the results we get pretty long and unreadable column names - ad_group.id and ad_group.name.

Things might be more complicated if you want to extract and save such objects as unselectable elements, complex messages and resource names.

In order to simplify data extraction and processing when querying data from Ads API we introduce additional syntax (see an example below):

SELECT
    resource.attribute AS column_name_1,
    resource.attribute:nested.resource AS column_name_3
    resource.attribute~1 AS column_name_4
FROM resource

Elements

  • Aliases (AS column_name)
  • Nested resources (:nested.resource.name)
  • Resource indices (~position)
  • Virtual columns (metric.name / metric.name_2 AS alias)

Aliases

Alias is used to give a descriptive name to a metric or attribute fetched from API when saving data. So instead of column name campaign.app_campaign_setting.bidding_strategy_goal_type you may use something more user friendly, like bidding_type.

Aliases are specified using AS keyword as shown below:

SELECT
    campaign.app_campaign_setting.bidding_strategy_goal_type AS bidding_type
FROM campaign

If you don't specify an alias it will be generated as full column name where "." replaced with "_".

In NodeJS version autogenerated aliases won't contains current resource name (for example while selecting "campaign.app_campaign_setting.bidding_strategy_goal_type" from campaign there will be a column "app_campaign_setting_bidding_strategy_goal_type")

Nested Resources

Some fields return structs, and if you want to get a nested attribute scalar value you can use nested resource selectors. One particular example is working with change_event - change_event.new_resource consists of various changes made to an entity and looks something like that:

new_resource {
    campaign {
        target_cpa {
            target_cpa_micros: 1000000
        }
    }
}

In order to extract a particular element (i.e., final value for target_cpa_micros) we use the : syntax - change_event.new_resource:campaign.target_cpa.target_cpas_micros:

SELECT
    change_event.old_resource:campaign.target_cpa.target_cpa_micros AS old_target_cpa,
    change_event.new_resource:campaign.target_cpa.target_cpa_micros AS new_target_cpa
FROM change_event

Resource Indices

Resource indices are used to extract a particular element from data type RESOURCE_NAME. I.e., if we want to get resource name for campaign_audience_view.resource_name and save it somewhere, the saved result will contain a string customers/{customer_id}/campaignAudienceViews/{campaign_id}~{criterion_id}. Usually we want to get only the last element from (criterion_id) and it can be extracted with ~N syntax where N is a position of an element you want to extract (indexing is starting from 0).

If the resource you're selecting looks like this customers/111/campaignAudienceViews/222~333 you can specify campaign_audience_view.resource_name~1 to extract the second element (333). If you specify campaign_audience_view.resource_name~0 you'll get '222' (the last resource id before ~).

SELECT
    campaign_audience_view.resource_name~1 AS criterion_id
FROM campaign_audience_view

Virtual Columns

Virtual columns allow to specify in GAQL query some fields or expressions that are not present in Google Ads API.

SELECT
    1 AS counter,
    metrics.clicks / metrics.impressions AS ctr,
    metrics.cost_micros * 1e6 AS cost,
    campaign.app_campaign_setting.bidding_strategy_goal_type AS bidding_type
FROM campaign

Virtual columns can contain constants (i.e. 1 AS counter will add new column counter filled with 1) or expressions. Expressions can contain field selectors, constants and any arithmetics operations with them. For example metrics.clicks / metrics.impressions AS ctr will calculate metrics.clicks / metrics.impressions for each GoogleAdsRow and store the results in a new column ctr. For this the fields metrics.clicks and metrics.impressions will be fetched implicitly. Or for example campaign.target_cpa.target_cpa_micros / 1000000 AS target_cpa expression will fetch campaign.target_cpa.target_cpa_micros field but return the result of its division by 1000000.

The query parser parses a query and remove all columns which are not simple field accessors (i.e. contains anything except field names). For constants columns they will be re-added into result after executing the query. For more complex columns with expressions (i.e. some operations with fields) the result will evaluated using the response from Ads API (GoogleAdsRow).

NodeJS versions:

For constants you can also use same syntax as for macros - ${..}. For example, a query SELECT '${today()}' as date returns a column 'date' with values like '2023-02-11'. For virtual columns expressions you can use same syntax as inside ${} with all operators from MathJS library but operating any of fields available for a current resource.

There can be a confusion between two types of expressions - in ${} blocks and virtual columns. First of all virtual columns are supported by both versions (Python/NodeJS) while ${}-expressions are supported by NodeJS only. Expressions in ${} blocks are parsed in the very beginning of query parsing, before submitting a query to Ads API. They are more like templates.

Macros

You queries can contain macros. Macro is just a substitution in script text, i.e.

SELECT
    campaign.id AS campaign_id,
    metrics.clicks AS clicks
FROM campaign
WHERE
    segments.date BETWEEN "{start_date}" AND "{end_date}"

When this query is executed it's expected that two macros --macros.start_date=... and --macros.end_date=... are supplied to gaarf.

Macros in virtual columns

Macros can be used not only in WHERE statements as in the example above but also in the SELECT statement. In that case this macros will be expanded and then treated as a virtual column.

SELECT
    "{current_date}" AS date,
    campaign.id AS campaign_id,
    campaign_budget.budget_amount_micros AS budget
FROM campaign

This will return all campaign budgets and attach current date (i.e. 2023-06-01) as a date column in the output.

Common macros

gaarf by default has several common macros:

  • date_iso - current date in YYYYMMDD format (i.e. 19700101)
  • yesterday_iso - previous day date in YYYY-MM-DD format (i.e. 19700101)
  • current_date - current_date in YYYY-MM-DD format (i.e. 1970-01-01)
  • current_datetime - current datetime in YYYY-MM-DD HH:mm-ss format (i.e. 1970-01-01 00:00:00)

Templates

Your SQL scripts can be templates using a template engine: Jinja for Python and Nunjucks for NodeJS. Inside templates you can use appropriate syntax and control structures of a template engine (Jinja/Nunjucks). They are mostly compatible but please consult the documentations if you migrate between platforms (Python <-> NodeJS).

SELECT
  customer_id AS
  {% if level == "0"  %}
  root_account_id
  {% else %}
  leaf_account_id
  {% endif %}
FROM dataset1.table1
WHERE name LIKE @name

When this query is executed it's expected to have template --template.level=... is supplied to gaarf.

This will create a column named either root_account_id since the specified level is 0.

Template are great when you need to create multiple column based on condition:

SELECT
    {% for day in cohort_days %}
        SUM(GetCohort(lag_data.installs, {{day}})) AS installs_{{day}}_day,
    {% endfor %}
FROM asset_performance

When this query is executed it's expected to have template --template.cohort_days=0,1,3,7 is supplied to gaarf.

Please note that all values passed through CLI arguments are strings. But there's a special case - a value containing "," - it's converted to an array. It will create 4 columns (named installs_0_day, installs_1_day, etc).

Expressions and Macros

Note: currently expressions are supported only in NodeJS version.

Your queries can contain expressions. The syntax for expressions is ${expression}. They will be executed right after macros substitution. So macros can contain expressions inside. Both expressions and macros deal with query text before submitting it for execution. Inside expression block we can do anything that the MathJS library supports - see https://mathjs.org/docs/index.html, plus work with date and time. It's all sort of arithmetic operations, strings and dates manipulations.

One typical use-case - evaluate date/time expressions to get dynamic date conditions in queries. These are when you don't provide a specific date but evaluate it right in the query. For example, applying a condition for date range for last month, which can be expressed as a range from today minus 1 month to today (or yesterday):

WHERE start_date >= '${today()-period('P1M')}' AND end_date <= '${today()}'

will be evaluated to: WHERE start_date >= '2022-06-20 AND end_date <= '2022-07-20' if today is 2022 July 20th.

Also you can use expressions for making table names dynamic (in BQ scripts), e.g.

CREATE OR REPLACE TABLE `{bq_dataset}_bq.assetssnapshots_${format(yesterday(),'yyyyMMdd')}` AS

Supported functions:

  • datetime - factory function to create a DateTime object, by default in ISO format (datetime('2022-12-31T23:59:59')) or in a specified format in the second argument (datetime('12/31/2022 23:59','M/d/yyyy hh:mm'))
  • date - factory function to create a Date object, supported formats: date(2022,12,31), date('2022-12-31'), date('12/31/2022','M/d/yyyy')
  • duration - returns a Duration object for a string in ISO_8601 format (PnYnMnDTnHnMnS)
  • period - returns a Period object for a string in ISO_8601 format (PnYnMnD)
  • today - returns a Date object for today date
  • yesterday - returns a Date object for yesterday date
  • tomorrow - returns a Date object for tomorrow date
  • now - returns a DateTime object for current timestamp (date and time)
  • format - formats Date or DateTime using a provided format, e.g. ${format(date('2022-07-01'), 'yyyyMMdd')} returns '20220701'

Please note functions without arguments still should called with brackets (e.g. today())

For dates and datetimes the following operations are supported:

  • add or subtract Date and Period, e.g. today()-period('P1D') - subtract 1 day from today (i.e. yesterday)
  • add or subtract DateTime and Duration, e.g. now()-duration('PT12H') - subtract 12 hours from the current datetime
  • for both Date and DateTime add or subtract a number meaning it's a number of days, e.g. today()-1
  • subtract two Dates to get a Period, e.g. tomorrow()-today() - subtract today from tomorrow and get 1 day, i.e. 'P1D'
  • subtract two DateTimes to get a Duration - similar to subtracting dates but get a duration, i.e. a period with time (e.g. PT10H for 10 hours)

By default all dates will be parsed and converted from/to strings in [ISO format]((https://en.wikipedia.org/wiki/ISO_8601) (yyyy-mm-dd for dates and yyyy-mm-ddThh:mm:ss.SSS for datetimes). But additionally you can specify a format explicitly (for parsing with datetime and date function and formatting with format function) using standard Java Date and Time Patterns:

  • G Era designator
  • y Year
  • Y Week year
  • M Month in year (1-based)
  • w Week in year
  • W Week in month
  • D Day in year
  • d Day in month
  • F Day of week in month
  • E Day name in week (e.g. Tuesday)
  • u Day number of week (1 = Monday, ..., 7 = Sunday)
  • a Am/pm marker
  • H Hour in day (0-23)
  • k Hour in day (1-24)
  • K Hour in am/pm (0-11)
  • h Hour in am/pm (1-12)
  • m Minute in hour
  • s Second in minute
  • S Millisecond
  • z Time zone - General time zone (e.g. Pacific Standard Time; PST; GMT-08:00)
  • Z Time zone - RFC 822 time zone (e.g. -0800)
  • X Time zone - ISO 8601 time zone (e.g. -08; -0800; -08:00)

Examples:

${today() - period('P2D')}

output: today minus 2 days, e.g. '2022-07-19' if today is 2022-07-21

${today()+1}

output: today plus 1 days, e.g. '2022-07-22' if today is 2022-07-21

${date(2022,7,20).plusMonths(1)}

output: "2022-08-20"

Functions

NodeJS version support in-place functions in JavaScript. The functions support consists of two parts:

  • function execution - it's suffix :$func at any select field (where func is a funciton name)
  • function definition - a block below the main query starting with FUNCTIONS after which a normal JS code with a function definition follows

Example:

SELECT
  campaign.id AS campaign_id,
  campaign_criterion.ad_schedule.day_of_week:$format AS ad_schedule_day_of_week
FROM campaign_criterion
FUNCTIONS
function format(val) {
  let days = ['пн', 'вт', 'ср', 'чт', 'пт', 'сб', 'вс'];
  if (!val) return '';
  return val === 8 ? days[6] : days[val-2];
}

Here we defined a function format (converts a numeric week day into a localized day name) and then called it for campaign_criterion.ad_schedule.day_of_week column. So functions always accept only one parameter - a field value.

Built-in queries

Google Ads API Report Fetcher can also works with built-in queries, which use the following syntax:

SELECT * FROM builtin.builtin_query_name

It expacts to provide a built-in query name when selecting from special builtin namespace.

Currently the following queries are available:

  • ocid_mapping - return account_id and ocid from each child account under MCC; ocid can be used to build links to Google Ads UI.

SQL Parameters

You can use normal sql type parameters with sql argument (NodeJS only):

SELECT *
FROM {dst_dataset}.{table-src}
WHERE name LIKE @name

and to execute: gaarf-bq --macro.table-src=table1 --macro.dst_dataset=dataset1 --sql.name='myname%'

it will create a parameterized query to run in BQ:

SELECT *
FROM dataset1.table1
WHERE name LIKE @name