Feature Request: allow DDL statements to run for unlimited time #16710

Closed
shlomi-noach opened this issue Sep 4, 2024 · 0 comments · Fixed by #16735

Feature Description

The vttablet flag --queryserver-config-query-timeout sets a timeout on all statements served in TabletServer/QueryExecutor. It makes sense for most queries to have a reasonable timeout.

However, for DDL statements this is problematic, and even wrong. Consider the following arguments:

  • Some DDLs, like ALTER TABLE or OPTIMIZE TABLE, can run for hours or even days, well beyond any reasonable queryserver-config-query-timeout setting.
  • Say we do run such a long-running DDL. The timeout is implemented as follows:
    1. The statement's context is created with context.WithTimeout.

    2. The timeout is detected here:

      case <-ctx.Done():
          dbc.terminate(ctx, insideTxn, now)
          if !insideTxn {
              // wait for the execute method to finish to make connection reusable.
              <-ch
          }
          return nil, dbc.Err()

    3. A few function calls further down, a KILL QUERY statement is issued here:

      sql := fmt.Sprintf("kill query %d", dbc.conn.ID())
      go func() {
          _, err := killConn.Conn.ExecuteFetch(sql, -1, false)
          ch <- err
          close(ch)
      }()

The problem is that KILL QUERY does not work reliably with long-running DDLs. In the good scenario, MySQL does roll back the ALTER TABLE statement, but the rollback can, and will, take time proportional to how long the statement ran before being killed. So from the moment KILL QUERY is issued, it may still take hours for the original query to return, and all that time the killing code remains blocked waiting for it.

To summarize:

  • DDLs have very different timeout requirements than DML.
  • Timing out a DDL may not work as expected.

In light of that, I'm suggesting that DDL statements should run with no timeout at all.

It should go without saying that we advocate the use of Online DDL in Vitess.

Use Case(s)

Protecting against unreasonably long runtimes of normal queries, while still allowing long-running DDLs.
