Deprecate relying on the current implementation of the database object name parser #6592

morozov · 2024-11-12T01:58:05Z

Q	A
Type	deprecation

Currently, when building DDL, the DBAL quotes only the identifiers that are explicitly quoted or are reserved keywords. The reason for not quoting all of them is that databases that follow the SQL-92 standard normalize the case of unquoted identifiers but preserve the case of quoted ones. If the DBAL quoted all identifiers, then an object containing lower-case letters in its name would have to be referenced as quoted in SQL (e.g., "id") on Oracle and IBM DB2. Similarly, a column containing upper-case letters would have to be quoted on PostgreSQL.

The current design has the following major downsides:

The need to maintain lists of reserved keywords for supported platforms.
Inconsistency of the resulting object names on Oracle and IBM DB2 depending on whether the name is a reserved keyword. For instance, an unquoted column named "id" will be created in the database as upper-case ID, but an unquoted "select" (implicitly quoted because it is a reserved keyword) will be created as lower-case select.
Lack of auto-quoting the names that aren't keywords but need to be quoted (e.g. the ones that begin with a digit).
Potential security issues.
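
To make the first three downsides concrete, here is a minimal Python sketch of the rule described above. It is not the DBAL code: the keyword set is a made-up three-word subset, and both function names are hypothetical. It only models "quote if explicitly quoted or a keyword" and how an upper-case-folding database such as Oracle or IBM DB2 would then store the name.

```python
# Hypothetical, tiny subset of a platform keyword list (illustration only).
RESERVED_KEYWORDS = {"SELECT", "TABLE", "ORDER"}


def current_ddl_name(name: str) -> str:
    """Model the current rule: quote only explicitly quoted names and keywords."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name  # explicitly quoted by the user, kept as is
    if name.upper() in RESERVED_KEYWORDS:
        return f'"{name}"'  # implicitly quoted because it is a keyword
    return name  # everything else is emitted unquoted


def stored_name_upper_folding(ddl_name: str) -> str:
    """Model how an upper-case-folding database stores the emitted name."""
    if len(ddl_name) >= 2 and ddl_name.startswith('"') and ddl_name.endswith('"'):
        return ddl_name[1:-1]  # quoted: case is preserved
    return ddl_name.upper()  # unquoted: case is folded to upper


for column in ("id", "select", "1st"):
    emitted = current_ddl_name(column)
    print(column, "->", emitted, "->", stored_name_upper_folding(emitted))
```

Under these assumptions, "id" ends up stored as ID while "select" stays lower-case, and "1st" is emitted unquoted even though a name beginning with a digit needs quoting.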

Additionally, the current object name parser is rudimentary and has various issues (see #4357 for examples).

While the solution proposed in #4772 is hard to pull off at once, this change should be a manageable first step. The implementation is drafted in #6589.

The logic is the following:

Parse the name according to simplified SQL syntax (see the details in the parser's PHPDoc).
Normalize unquoted identifiers according to the rules of the destination database platforms.
Consistently quote all identifiers. At this point, knowing whether the identifier is a reserved keyword is unnecessary.
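
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the parser from #6589: the grammar is reduced to dot-separated identifiers that are either double-quoted (with `""` as an escaped quote) or unquoted, the `folding` argument ("upper" for Oracle and IBM DB2, "lower" for PostgreSQL) stands in for the platform-specific rules, and all names (`Identifier`, `parse`, `normalize`, `quote`, `to_sql`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    value: str
    quoted: bool


def parse(name: str) -> list[Identifier]:
    """Step 1: split a qualified name into identifiers (simplified syntax)."""
    parts: list[Identifier] = []
    i, n = 0, len(name)
    while True:
        if i < n and name[i] == '"':
            i += 1
            chars: list[str] = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted identifier")
                if name[i] == '"':
                    if i + 1 < n and name[i + 1] == '"':
                        chars.append('"')  # "" is an escaped quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(name[i])
                i += 1
            parts.append(Identifier("".join(chars), quoted=True))
        else:
            start = i
            while i < n and name[i] not in '."':
                i += 1
            if i == start:
                raise ValueError("empty identifier")
            parts.append(Identifier(name[start:i], quoted=False))
        if i == n:
            return parts
        if name[i] != ".":
            raise ValueError(f"unexpected character at offset {i}")
        i += 1


def normalize(identifier: Identifier, folding: str) -> str:
    """Step 2: fold the case of unquoted identifiers only."""
    if identifier.quoted:
        return identifier.value
    if folding == "upper":
        return identifier.value.upper()
    if folding == "lower":
        return identifier.value.lower()
    raise ValueError(f"unknown folding: {folding}")


def quote(value: str) -> str:
    """Step 3: quote every identifier; no keyword list is consulted."""
    return '"' + value.replace('"', '""') + '"'


def to_sql(name: str, folding: str) -> str:
    return ".".join(quote(normalize(part, folding)) for part in parse(name))
```

With this sketch, `to_sql("id", "upper")` and `to_sql("select", "upper")` are treated the same way (`"ID"` and `"SELECT"`), while an explicitly quoted `"id"` keeps its case.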

The upgrade path:

Compare the results of parsing the name using the current and the new implementations.
Trigger a deprecation notice in case of a mismatch. Apart from the incorrect parsing of quoted identifiers containing dots, all deprecations should be addressable by properly formatting the object name.

The overhead of using two parsers should be insignificant since schema management is usually not a hot path.
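
The upgrade path can be illustrated with a small Python sketch. Both parsers here are stand-ins: `legacy_parse` assumes, purely for illustration, that the current implementation splits on every dot and strips the surrounding quotes, and `new_parse` is a simplified quote-aware parser. Neither mirrors the actual DBAL code, and all function names are hypothetical.

```python
import re
import warnings

# A double-quoted identifier ("" escapes a quote) or an unquoted one.
_IDENTIFIER = re.compile(r'"((?:[^"]|"")*)"|([^."]+)')


def legacy_parse(name: str) -> list[str]:
    """Hypothetical stand-in for the current parser: naive split on dots."""
    return [part.strip('"') for part in name.split(".")]


def new_parse(name: str) -> list[str]:
    """Hypothetical stand-in for the new parser: dots inside quotes are kept."""
    parts: list[str] = []
    pos = 0
    while True:
        match = _IDENTIFIER.match(name, pos)
        if match is None:
            raise ValueError(f"cannot parse {name!r} at offset {pos}")
        if match.group(1) is not None:
            parts.append(match.group(1).replace('""', '"'))
        else:
            parts.append(match.group(2))
        pos = match.end()
        if pos == len(name):
            return parts
        if name[pos] != ".":
            raise ValueError(f"cannot parse {name!r} at offset {pos}")
        pos += 1


def parsers_agree(name: str) -> bool:
    """Compare the results of both parsers for one object name."""
    try:
        return new_parse(name) == legacy_parse(name)
    except ValueError:
        return False


def parse_with_deprecation(name: str) -> list[str]:
    """Keep the legacy result, but warn when the new parser would differ."""
    if not parsers_agree(name):
        warnings.warn(
            f"Relying on the current parsing of {name!r} is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
    return legacy_parse(name)
```

Returning the legacy result keeps existing behavior unchanged during the deprecation period; only the warning is new. A name such as `public.users` passes silently, while `"my.table"` triggers the warning because the two parsers disagree on the dot inside the quotes.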

morozov force-pushed the new-asset-name-parser branch from 0fc75bb to 4e67695 Compare November 12, 2024 02:04

morozov added 3 commits November 11, 2024 18:07

Introduce new object name parser

d4acb22

Introduce normalizeUnquotedIdentifier()

98617a6

Add deprecations for current parser behavior

358c6ea

morozov force-pushed the new-asset-name-parser branch from 4e67695 to 358c6ea Compare November 12, 2024 02:07

morozov added Quoting Reserved Keywords Deprecation Identifiers labels Nov 12, 2024

morozov marked this pull request as ready for review November 12, 2024 02:17

morozov added this to the 4.3.0 milestone Nov 12, 2024

morozov requested review from greg0ire and derrabus November 12, 2024 02:17

greg0ire approved these changes Nov 12, 2024

morozov merged commit 43d8490 into doctrine:4.3.x Nov 13, 2024
90 of 91 checks passed

morozov deleted the new-asset-name-parser branch November 13, 2024 17:49
