Merge pull request ClickHouse#58274 from ClickHouse/revert-58267
alexey-milovidov authored Dec 28, 2023
2 parents 224e937 + c7efd2a commit 583b963
Showing 57 changed files with 69 additions and 786 deletions.
@@ -25,7 +25,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, clean_deleted_rows=value, ...]
[SETTINGS name=value, ...]
```

For a description of request parameters, see [statement description](../../../sql-reference/statements/create/table.md).
@@ -88,53 +88,6 @@ SELECT * FROM mySecondReplacingMT FINAL;
└─────┴─────────┴─────────────────────┘
```

### is_deleted

`is_deleted` — Name of a column used during a merge to determine whether the data in this row represents the state or is to be deleted; `1` is a “deleted“ row, `0` is a “state“ row.

Column data type — `UInt8`.

:::note
`is_deleted` can only be enabled when `ver` is used.

The row is deleted when `OPTIMIZE ... FINAL CLEANUP` or `OPTIMIZE ... FINAL` is used, or if the engine setting `clean_deleted_rows` has been set to `Always`.

No matter the operation on the data, the version must be increased. If two inserted rows have the same version number, the last inserted row is the one kept.

:::

Example:
```sql
-- with ver and is_deleted
CREATE OR REPLACE TABLE myThirdReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime,
`is_deleted` UInt8
)
ENGINE = ReplacingMergeTree(eventTime, is_deleted)
ORDER BY key;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 0);
INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 1);

select * from myThirdReplacingMT final;

0 rows in set. Elapsed: 0.003 sec.

-- delete rows with is_deleted
OPTIMIZE TABLE myThirdReplacingMT FINAL CLEANUP;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 00:00:00', 0);

select * from myThirdReplacingMT final;

┌─key─┬─someCol─┬───────────eventTime─┬─is_deleted─┐
│   1 │ first   │ 2020-01-01 00:00:00 │          0 │
└─────┴─────────┴─────────────────────┴────────────┘
```
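The rules in the removed note above can be modelled outside ClickHouse. The following is a minimal sketch, not the engine's code: `Row`, `collapse`, and the `cleanup` flag are hypothetical names, the version is a plain integer standing in for the `ver` column, and the input is assumed to be in insertion order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical row shape loosely mirroring the example table above.
struct Row
{
    int64_t key;
    std::string some_col;
    int64_t version;    // stands in for the `ver` column (eventTime in the example)
    uint8_t is_deleted; // 1 = "deleted" row, 0 = "state" row
};

// Collapse rows given in insertion order: per key keep the highest version,
// with ties going to the row inserted last. When `cleanup` is true, keys whose
// surviving row is flagged as deleted are dropped from the result.
std::vector<Row> collapse(const std::vector<Row> & rows, bool cleanup)
{
    std::map<int64_t, Row> latest;
    for (const Row & row : rows)
    {
        auto it = latest.find(row.key);
        if (it == latest.end())
            latest.emplace(row.key, row);
        else if (row.version >= it->second.version) // non-strict: equal version replaces
            it->second = row;
    }

    std::vector<Row> result;
    for (const auto & entry : latest)
        if (!(cleanup && entry.second.is_deleted))
            result.push_back(entry.second);
    return result;
}
```

Under these assumptions, two rows with the same key and version where the later one has `is_deleted = 1` collapse to nothing when `cleanup` is true, and to the single flagged row when it is false.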

## Query clauses

When creating a `ReplacingMergeTree` table the same [clauses](../../../engines/table-engines/mergetree-family/mergetree.md) are required, as when creating a `MergeTree` table.
10 changes: 0 additions & 10 deletions docs/en/operations/settings/merge-tree-settings.md
@@ -852,16 +852,6 @@ If the file name for column is too long (more than `max_file_name_length` bytes)

The maximal length of the file name to keep it as is without hashing. Takes effect only if setting `replace_long_file_name_to_hash` is enabled. The value of this setting does not include the length of file extension. So, it is recommended to set it below the maximum filename length (usually 255 bytes) with some gap to avoid filesystem errors. Default value: 127.

## clean_deleted_rows

Enable/disable automatic deletion of rows flagged as `is_deleted` when performing `OPTIMIZE ... FINAL` on a table using the ReplacingMergeTree engine. When disabled, the `CLEANUP` keyword has to be added to `OPTIMIZE ... FINAL` to get the same behaviour.

Possible values:

- `Always` or `Never`.

Default value: `Never`
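
How the removed setting and the `CLEANUP` keyword combine can be written as a single predicate. This is a sketch of the behaviour as the paragraph above describes it, not the server's implementation: `shouldCleanup` is a hypothetical helper, and returning `false` without `FINAL` is an assumption, since the text only covers `OPTIMIZE ... FINAL`.

```cpp
#include <cassert>

// The two documented values; the names follow the enum removed by this commit.
enum class CleanDeletedRows
{
    Never = 0,
    Always,
};

// Hypothetical helper: decides whether an OPTIMIZE run drops rows flagged as deleted.
bool shouldCleanup(bool final, bool cleanup_keyword, CleanDeletedRows setting)
{
    if (!final)
        return false; // assumption: only OPTIMIZE ... FINAL is described
    return cleanup_keyword || setting == CleanDeletedRows::Always;
}
```

With the default `Never`, only an explicit `CLEANUP` keyword makes the predicate true.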

## allow_experimental_block_number_column

Persists virtual column `_block_number` on merges.
@@ -86,59 +86,6 @@ SELECT * FROM mySecondReplacingMT FINAL;
│   1 │ first   │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘
```
### is_deleted

`is_deleted`: name of the column used during a merge to indicate whether the row should be shown or is to be deleted; `1` deletes the row, `0` shows the row.

Column data type: `UInt8`.

:::note
`is_deleted` can be used only if `ver` is used.

A row is deleted in the following cases:

- when the `OPTIMIZE ... FINAL CLEANUP` statement is used
- when the `OPTIMIZE ... FINAL` statement is used
- the engine setting `clean_deleted_rows` is set to `Always` (default: `Never`)
- newer versions of the row exist

Running `FINAL CLEANUP` or using the engine setting `clean_deleted_rows` with the value `Always` is not recommended: it can lead to unexpected results, for example deleted rows may reappear.

Regardless of the changes made to the data, the version must increase. If two rows have the same version, only the last inserted row is kept.
:::

Example:

```sql
-- with ver and is_deleted
CREATE OR REPLACE TABLE myThirdReplacingMT
(
`key` Int64,
`someCol` String,
`eventTime` DateTime,
`is_deleted` UInt8
)
ENGINE = ReplacingMergeTree(eventTime, is_deleted)
ORDER BY key;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 0);
INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 1);

select * from myThirdReplacingMT final;

0 rows in set. Elapsed: 0.003 sec.

-- delete rows with is_deleted
OPTIMIZE TABLE myThirdReplacingMT FINAL CLEANUP;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 00:00:00', 0);

select * from myThirdReplacingMT final;

┌─key─┬─someCol─┬───────────eventTime─┬─is_deleted─┐
│   1 │ first   │ 2020-01-01 00:00:00 │          0 │
└─────┴─────────┴─────────────────────┴────────────┘
```

## Query clauses

1 change: 1 addition & 0 deletions programs/server/config.d/graphite_alternative.xml
2 changes: 0 additions & 2 deletions src/Core/SettingsEnums.cpp
@@ -98,8 +98,6 @@ IMPLEMENT_SETTING_AUTO_ENUM(DefaultDatabaseEngine, ErrorCodes::BAD_ARGUMENTS)

IMPLEMENT_SETTING_AUTO_ENUM(DefaultTableEngine, ErrorCodes::BAD_ARGUMENTS)

IMPLEMENT_SETTING_AUTO_ENUM(CleanDeletedRows, ErrorCodes::BAD_ARGUMENTS)

IMPLEMENT_SETTING_MULTI_ENUM(MySQLDataTypesSupport, ErrorCodes::UNKNOWN_MYSQL_DATATYPES_SUPPORT_LEVEL,
{{"decimal", MySQLDataTypesSupport::DECIMAL},
{"datetime64", MySQLDataTypesSupport::DATETIME64},
8 changes: 0 additions & 8 deletions src/Core/SettingsEnums.h
@@ -140,14 +140,6 @@ enum class DefaultTableEngine

DECLARE_SETTING_ENUM(DefaultTableEngine)

enum class CleanDeletedRows
{
Never = 0, /// Disable.
Always,
};

DECLARE_SETTING_ENUM(CleanDeletedRows)

enum class MySQLDataTypesSupport
{
DECIMAL, // convert MySQL's decimal and number to ClickHouse Decimal when applicable
3 changes: 0 additions & 3 deletions src/Interpreters/Context.cpp
@@ -15,7 +15,6 @@
#include <Common/thread_local_rng.h>
#include <Common/FieldVisitorToString.h>
#include <Common/getMultipleKeysFromConfig.h>
#include <Common/getNumberOfPhysicalCPUCores.h>
#include <Common/callOnce.h>
#include <Common/SharedLockGuard.h>
#include <Coordination/KeeperDispatcher.h>
@@ -33,7 +32,6 @@
#include <Storages/StorageS3Settings.h>
#include <Disks/DiskLocal.h>
#include <Disks/ObjectStorages/DiskObjectStorage.h>
#include <Disks/ObjectStorages/IObjectStorage.h>
#include <Disks/StoragePolicy.h>
#include <Disks/IO/IOUringReader.h>
#include <IO/SynchronousReader.h>
@@ -45,7 +43,6 @@
#include <Interpreters/Cache/FileCacheFactory.h>
#include <Interpreters/SessionTracker.h>
#include <Core/ServerSettings.h>
#include <Interpreters/PreparedSets.h>
#include <Core/Settings.h>
#include <Core/SettingsQuirks.h>
#include <Access/AccessControl.h>
2 changes: 1 addition & 1 deletion src/Interpreters/InterpreterOptimizeQuery.cpp
@@ -79,7 +79,7 @@ BlockIO InterpreterOptimizeQuery::execute()
if (auto * snapshot_data = dynamic_cast<MergeTreeData::SnapshotData *>(storage_snapshot->data.get()))
snapshot_data->parts = {};

table->optimize(query_ptr, metadata_snapshot, ast.partition, ast.final, ast.deduplicate, column_names, ast.cleanup, getContext());
table->optimize(query_ptr, metadata_snapshot, ast.partition, ast.final, ast.deduplicate, column_names, getContext());

return {};
}
3 changes: 0 additions & 3 deletions src/Parsers/ASTOptimizeQuery.cpp
@@ -24,9 +24,6 @@ void ASTOptimizeQuery::formatQueryImpl(const FormatSettings & settings, FormatSt
if (deduplicate)
settings.ostr << (settings.hilite ? hilite_keyword : "") << " DEDUPLICATE" << (settings.hilite ? hilite_none : "");

if (cleanup)
settings.ostr << (settings.hilite ? hilite_keyword : "") << " CLEANUP" << (settings.hilite ? hilite_none : "");

if (deduplicate_by_columns)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " BY " << (settings.hilite ? hilite_none : "");
4 changes: 1 addition & 3 deletions src/Parsers/ASTOptimizeQuery.h
@@ -21,12 +21,10 @@ class ASTOptimizeQuery : public ASTQueryWithTableAndOutput, public ASTQueryWithO
bool deduplicate = false;
/// Deduplicate by columns.
ASTPtr deduplicate_by_columns;
/// Delete 'is_deleted' data
bool cleanup = false;
/** Get the text that identifies this element. */
String getID(char delim) const override
{
return "OptimizeQuery" + (delim + getDatabase()) + delim + getTable() + (final ? "_final" : "") + (deduplicate ? "_deduplicate" : "")+ (cleanup ? "_cleanup" : "");
return "OptimizeQuery" + (delim + getDatabase()) + delim + getTable() + (final ? "_final" : "") + (deduplicate ? "_deduplicate" : "");
}

ASTPtr clone() const override
8 changes: 3 additions & 5 deletions src/Parsers/ParserOptimizeQuery.cpp
@@ -39,7 +39,6 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
ASTPtr partition;
bool final = false;
bool deduplicate = false;
bool cleanup = false;
String cluster_str;

if (!s_optimize_table.ignore(pos, expected))
@@ -70,9 +69,6 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
if (s_deduplicate.ignore(pos, expected))
deduplicate = true;

if (s_cleanup.ignore(pos, expected))
cleanup = true;

ASTPtr deduplicate_by_columns;
if (deduplicate && s_by.ignore(pos, expected))
{
@@ -81,6 +77,9 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
return false;
}

/// Obsolete feature, ignored for backward compatibility.
s_cleanup.ignore(pos, expected);

auto query = std::make_shared<ASTOptimizeQuery>();
node = query;

@@ -90,7 +89,6 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
query->final = final;
query->deduplicate = deduplicate;
query->deduplicate_by_columns = deduplicate_by_columns;
query->cleanup = cleanup;
query->database = database;
query->table = table;
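
The parser change above keeps accepting `CLEANUP` while no longer storing it. A toy version of that pattern is sketched here, assuming pre-split tokens with upper-case keywords; `OptimizeQuery`, `ignoreKeyword`, and `parseOptimize` are hypothetical stand-ins and share nothing with ClickHouse's parser classes beyond the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the parsed query; not ClickHouse's AST.
struct OptimizeQuery
{
    std::string table;
    bool final = false;
    bool deduplicate = false;
};

// Consume `keyword` at `pos` if present; returns whether it was consumed.
static bool ignoreKeyword(const std::vector<std::string> & tokens, size_t & pos, const std::string & keyword)
{
    if (pos < tokens.size() && tokens[pos] == keyword)
    {
        ++pos;
        return true;
    }
    return false;
}

// Parse tokens of the form: OPTIMIZE TABLE name [FINAL] [DEDUPLICATE] [CLEANUP]
bool parseOptimize(const std::vector<std::string> & tokens, OptimizeQuery & query)
{
    size_t pos = 0;
    if (!ignoreKeyword(tokens, pos, "OPTIMIZE") || !ignoreKeyword(tokens, pos, "TABLE"))
        return false;
    if (pos >= tokens.size())
        return false;
    query.table = tokens[pos++];

    query.final = ignoreKeyword(tokens, pos, "FINAL");
    query.deduplicate = ignoreKeyword(tokens, pos, "DEDUPLICATE");

    // Obsolete keyword: accepted so old statements still parse, then discarded.
    ignoreKeyword(tokens, pos, "CLEANUP");

    return pos == tokens.size();
}
```

In this sketch a statement with a trailing `CLEANUP` parses to the same `OptimizeQuery` as one without it, which is the backward-compatibility effect the comment in the hunk describes.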

41 changes: 5 additions & 36 deletions src/Processors/Merges/Algorithms/ReplacingSortedAlgorithm.cpp
@@ -3,30 +3,22 @@
#include <Columns/ColumnsNumber.h>
#include <IO/WriteBuffer.h>

namespace DB
{

namespace ErrorCodes
namespace DB
{
extern const int INCORRECT_DATA;
}

ReplacingSortedAlgorithm::ReplacingSortedAlgorithm(
const Block & header_,
size_t num_inputs,
SortDescription description_,
const String & is_deleted_column,
const String & version_column,
size_t max_block_size_rows,
size_t max_block_size_bytes,
WriteBuffer * out_row_sources_buf_,
bool use_average_block_sizes,
bool cleanup_)
bool use_average_block_sizes)
: IMergingAlgorithmWithSharedChunks(header_, num_inputs, std::move(description_), out_row_sources_buf_, max_row_refs)
, merged_data(header_.cloneEmptyColumns(), use_average_block_sizes, max_block_size_rows, max_block_size_bytes), cleanup(cleanup_)
, merged_data(header_.cloneEmptyColumns(), use_average_block_sizes, max_block_size_rows, max_block_size_bytes)
{
if (!is_deleted_column.empty())
is_deleted_column_number = header_.getPositionByName(is_deleted_column);
if (!version_column.empty())
version_column_number = header_.getPositionByName(version_column);
}
@@ -73,15 +65,7 @@ IMergingAlgorithm::Status ReplacingSortedAlgorithm::merge()

/// Write the data for the previous primary key.
if (!selected_row.empty())
{
if (is_deleted_column_number!=-1)
{
if (!(cleanup && assert_cast<const ColumnUInt8 &>(*(*selected_row.all_columns)[is_deleted_column_number]).getData()[selected_row.row_num]))
insertRow();
}
else
insertRow();
}
insertRow();

selected_row.clear();
}
@@ -91,13 +75,6 @@ IMergingAlgorithm::Status ReplacingSortedAlgorithm::merge()
if (out_row_sources_buf)
current_row_sources.emplace_back(current.impl->order, true);

if ((is_deleted_column_number!=-1))
{
const UInt8 is_deleted = assert_cast<const ColumnUInt8 &>(*current->all_columns[is_deleted_column_number]).getData()[current->getRow()];
if ((is_deleted != 1) && (is_deleted != 0))
throw Exception(ErrorCodes::INCORRECT_DATA, "Incorrect data: is_deleted = {} (must be 1 or 0).", toString(is_deleted));
}

/// A non-strict comparison, since we select the last row for the same version values.
if (version_column_number == -1
|| selected_row.empty()
@@ -128,15 +105,7 @@ IMergingAlgorithm::Status ReplacingSortedAlgorithm::merge()

/// We will write the data for the last primary key.
if (!selected_row.empty())
{
if (is_deleted_column_number!=-1)
{
if (!(cleanup && assert_cast<const ColumnUInt8 &>(*(*selected_row.all_columns)[is_deleted_column_number]).getData()[selected_row.row_num]))
insertRow();
}
else
insertRow();
}
insertRow();

return Status(merged_data.pull(), true);
}
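
After the revert, `merge()` again writes the selected row for every key unconditionally. The selection loop the hunks above leave in place can be approximated as follows. This is a simplified sketch over an in-memory vector rather than the chunk and cursor machinery of the real class; `SortedRow` and `replacingMerge` are hypothetical names, and the input is assumed to be sorted by key.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flattened row: sorting key, version value, and some payload.
struct SortedRow
{
    int64_t key;
    uint64_t version;
    int payload;
};

// Merge rows already sorted by key, keeping one row per key.
// With `has_version == false` the last row of each key wins; otherwise the
// highest version wins and equal versions go to the later row.
std::vector<SortedRow> replacingMerge(const std::vector<SortedRow> & rows, bool has_version)
{
    std::vector<SortedRow> merged;
    bool has_selected = false;
    SortedRow selected{};

    for (const SortedRow & current : rows)
    {
        if (has_selected && current.key != selected.key)
        {
            merged.push_back(selected); // write the data for the previous key
            has_selected = false;
        }
        // Non-strict comparison, so the last row is chosen for equal versions.
        if (!has_version || !has_selected || current.version >= selected.version)
        {
            selected = current;
            has_selected = true;
        }
    }

    if (has_selected)
        merged.push_back(selected); // write the data for the last key
    return merged;
}
```

The `>=` mirrors the "non-strict comparison" comment in the source: among rows with the same key and version, the one seen last is kept.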
6 changes: 1 addition & 5 deletions src/Processors/Merges/Algorithms/ReplacingSortedAlgorithm.h
@@ -21,23 +21,19 @@ class ReplacingSortedAlgorithm final : public IMergingAlgorithmWithSharedChunks
ReplacingSortedAlgorithm(
const Block & header, size_t num_inputs,
SortDescription description_,
const String & is_deleted_column,
const String & version_column,
size_t max_block_size_rows,
size_t max_block_size_bytes,
WriteBuffer * out_row_sources_buf_ = nullptr,
bool use_average_block_sizes = false,
bool cleanup = false);
bool use_average_block_sizes = false);

const char * getName() const override { return "ReplacingSortedAlgorithm"; }
Status merge() override;

private:
MergedData merged_data;

ssize_t is_deleted_column_number = -1;
ssize_t version_column_number = -1;
bool cleanup = false;

using RowRef = detail::RowRefWithOwnedChunk;
static constexpr size_t max_row_refs = 2; /// last, current.
9 changes: 3 additions & 6 deletions src/Processors/Merges/ReplacingSortedTransform.h
@@ -14,24 +14,21 @@ class ReplacingSortedTransform final : public IMergingTransform<ReplacingSortedA
ReplacingSortedTransform(
const Block & header, size_t num_inputs,
SortDescription description_,
const String & is_deleted_column, const String & version_column,
const String & version_column,
size_t max_block_size_rows,
size_t max_block_size_bytes,
WriteBuffer * out_row_sources_buf_ = nullptr,
bool use_average_block_sizes = false,
bool cleanup = false)
bool use_average_block_sizes = false)
: IMergingTransform(
num_inputs, header, header, /*have_all_inputs_=*/ true, /*limit_hint_=*/ 0, /*always_read_till_end_=*/ false,
header,
num_inputs,
std::move(description_),
is_deleted_column,
version_column,
max_block_size_rows,
max_block_size_bytes,
out_row_sources_buf_,
use_average_block_sizes,
cleanup)
use_average_block_sizes)
{
}
