73 changes: 73 additions & 0 deletions docs/admin-manual/cluster-management/tso.md
@@ -0,0 +1,73 @@
---
{
"title": "Timestamp Oracle (TSO)",
"language": "en",
"description": "Timestamp Oracle (TSO) provides globally monotonic timestamps for Doris."
}
---

## Overview

Timestamp Oracle (TSO) is a service running on the **Master FE** that generates **globally monotonic** 64-bit timestamps. Doris uses TSO as a unified version reference in distributed scenarios, avoiding the correctness risks caused by physical clock skew across nodes.

Typical use cases include:

- A unified “transaction version” across multiple tables and nodes.
- Incremental processing / version-based reads using a single global ordering.
- Better observability: a timestamp is easier to interpret than an internal version counter.

## Timestamp Format

TSO is a 64-bit integer:

- High bits: **physical time (milliseconds)** since Unix epoch
- Low bits: **logical counter** for issuing multiple unique timestamps within the same millisecond

TSO guarantees **monotonicity**. It does not guarantee that the physical part matches the wall clock exactly.
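
As a minimal sketch of the layout described above, the following Python snippet packs and unpacks a TSO value. The bit widths are not stated here, so the 18-bit logical counter (`LOGICAL_BITS`) is an assumption borrowed from other TSO designs, and the helper names `compose_tso` and `split_tso` are hypothetical. Check the Doris source for the actual split.

```python
# Assumed layout: the width of the logical counter is NOT taken from Doris.
LOGICAL_BITS = 18
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose_tso(physical_ms: int, logical: int) -> int:
    """Pack physical time (ms since Unix epoch) and a logical counter into one integer."""
    if not 0 <= logical <= LOGICAL_MASK:
        raise ValueError("logical counter out of range")
    return (physical_ms << LOGICAL_BITS) | logical


def split_tso(tso: int) -> tuple:
    """Return (physical_ms, logical) extracted from a composed TSO."""
    return tso >> LOGICAL_BITS, tso & LOGICAL_MASK


a = compose_tso(1625097600000, 5)
b = compose_tso(1625097600000, 6)  # same millisecond, next logical value
c = compose_tso(1625097600001, 0)  # next millisecond, counter starts over
```

Because the physical time occupies the high bits, plain integer comparison orders timestamps first by millisecond and then by logical counter, so `a < b < c` holds in this sketch.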

## Architecture and Lifecycle

- **Master FE** hosts the `TSOService` daemon.
- FE components (for example, transaction publish and metadata repair flows) obtain timestamps from `Env.getCurrentEnv().getTSOService().getTSO()`.
- The service uses a **time window lease** (window end physical time) to reduce persistence overhead while ensuring monotonicity across master failover.

### Monotonicity Across Master Failover

On master switch, the new Master FE replays the persisted window end and calibrates the initial physical time to ensure the first TSO it issues is strictly greater than any TSO issued by the previous master.
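
The failover rule can be illustrated with a toy model. This is a sketch under stated assumptions, not the Doris implementation: the class `ToyTsoMaster`, the 18-bit logical counter, and the way the window is extended are all hypothetical. It only shows why starting above the persisted window end keeps timestamps monotonic even when the new master's clock lags behind.

```python
LOGICAL_BITS = 18  # assumed width of the logical counter, not taken from Doris
LOGICAL_MAX = (1 << LOGICAL_BITS) - 1


class ToyTsoMaster:
    """Toy window-leased timestamp allocator (hypothetical, for illustration only)."""

    def __init__(self, persisted_window_end_ms, now_ms, window_ms=5000):
        self.window_ms = window_ms
        # Calibrate: start strictly after anything the previous master could have issued.
        self.physical_ms = max(now_ms, persisted_window_end_ms + 1)
        self.logical = 0
        # A real service would persist this value before issuing timestamps from the window.
        self.window_end_ms = self.physical_ms + window_ms

    def get_tso(self):
        if self.logical == LOGICAL_MAX:
            # Logical counter exhausted: advance the physical part proactively.
            self.physical_ms += 1
            self.logical = 0
            if self.physical_ms > self.window_end_ms:
                self.window_end_ms = self.physical_ms + self.window_ms
        self.logical += 1
        return (self.physical_ms << LOGICAL_BITS) | self.logical


old_master = ToyTsoMaster(persisted_window_end_ms=0, now_ms=1_625_097_600_000)
last_from_old = [old_master.get_tso() for _ in range(1000)][-1]

# Failover: the new master's clock is 3 seconds behind the old master's clock.
new_master = ToyTsoMaster(
    persisted_window_end_ms=old_master.window_end_ms,
    now_ms=1_625_097_597_000,
)
first_from_new = new_master.get_tso()
```

In this model the old master never issues a physical time beyond its persisted window end, so any timestamp the new master issues from `window_end_ms + 1` onward is strictly greater, regardless of clock skew between the two nodes.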

## Configuration

TSO is controlled by FE configuration items (see [FE Configuration](../config/fe-config.md) for how to set and persist configs):

- `enable_feature_tso`
- `tso_service_update_interval_ms`
- `max_update_tso_retry_count`
- `max_get_tso_retry_count`
- `tso_service_window_duration_ms`
- `tso_time_offset_debug_mode` (test only)
- `enable_tso_persist_journal` (may affect rollback compatibility)
- `enable_tso_checkpoint_module` (may affect older versions reading newer images)

## Observability and Debugging

### FE HTTP API

You can fetch the current TSO without consuming the logical counter via the FE HTTP API:

- `GET /api/tso`

See [TSO Action](../open-api/fe-http/tso-action.md) for authentication, response fields, and examples.

### System Table: `information_schema.rowsets`

When the TSO feature is enabled (see `enable_feature_tso`), Doris records the commit TSO in rowset metadata and exposes it via:

- `information_schema.rowsets.COMMIT_TSO`

See [rowsets](../system-tables/information_schema/rowsets.md).

## FAQ

### Can I treat TSO as a wall clock?

No. Although the physical part is in milliseconds, the physical time may be advanced proactively (for example, to handle high logical counter usage), so TSO should be used as a **monotonic version** rather than a precise wall clock.
82 changes: 82 additions & 0 deletions docs/admin-manual/config/fe-config.md
@@ -360,6 +360,88 @@ Is it possible to dynamically configure: true

Is it a configuration item unique to the Master FE node: false

### TSO (Timestamp Oracle)

#### `enable_feature_tso`

Default:false

IsMutable:true

Is it a configuration item unique to the Master FE node: true

Whether to enable experimental TSO (Timestamp Oracle) features, such as recording the rowset commit TSO and exposing it via system tables.

#### `tso_service_update_interval_ms`

Default:50(ms)

IsMutable:false

Is it a configuration item unique to the Master FE node: true

The update interval of the TSO service in milliseconds. The daemon periodically checks for clock drift or backward clock jumps and renews the time window.

#### `max_update_tso_retry_count`

Default:3

IsMutable:true

Is it a configuration item unique to the Master FE node: true

Maximum retry count when the TSO service updates the global timestamp (for example, when persisting a new window end).

#### `max_get_tso_retry_count`

Default:10

IsMutable:true

Is it a configuration item unique to the Master FE node: true

Maximum retry count when generating a new TSO.

#### `tso_service_window_duration_ms`

Default:5000(ms)

IsMutable:true

Is it a configuration item unique to the Master FE node: true

The duration of a leased TSO time window in milliseconds. The Master FE persists the window end to reduce persistence frequency while keeping monotonicity across master failover.

#### `tso_time_offset_debug_mode`

Default:0(ms)

IsMutable:true

Is it a configuration item unique to the Master FE node: false

Time offset for the TSO service in milliseconds. For test/debug only.

#### `enable_tso_persist_journal`

Default:false

IsMutable:true

Is it a configuration item unique to the Master FE node: true

Whether to persist the TSO window end to the edit log. Enabling this may emit new operation codes and may break rollback compatibility with older versions.

#### `enable_tso_checkpoint_module`

Default:false

IsMutable:true

Is it a configuration item unique to the Master FE node: true

Whether to include TSO information as a checkpoint image module for faster recovery. Older versions may need to ignore unknown modules when reading newer images.

### Service

#### `query_port`
66 changes: 66 additions & 0 deletions docs/admin-manual/open-api/fe-http/tso-action.md
@@ -0,0 +1,66 @@
---
{
"title": "TSO Action",
"language": "en",
"description": "Get current TSO (Timestamp Oracle) information from the Master FE."
}
---

# TSO Action

## Request

`GET /api/tso`

## Description

Returns the current TSO (Timestamp Oracle) information from the **Master FE**.

- This endpoint is **read-only**: it returns the current TSO value **without increasing** it.
- Authentication is required. Use an account with **administrator privileges**.

## Path parameters

None.

## Query parameters

None.

## Request body

None.

## Response

On success, the response body has `code = 0` and the `data` field contains:

| Field | Type | Description |
| --- | --- | --- |
| window_end_physical_time | long | The end physical time (ms) of the current TSO window on the Master FE. |
| current_tso | long | The current composed 64-bit TSO value. |
| current_tso_physical_time | long | The extracted physical time part (ms) from `current_tso`. |
| current_tso_logical_counter | long | The extracted logical counter part from `current_tso`. |

Example:

```json
{
    "code": 0,
    "msg": "success",
    "data": {
        "window_end_physical_time": 1625097600000,
        "current_tso": 123456789012345678,
        "current_tso_physical_time": 1625097600000,
        "current_tso_logical_counter": 123
    }
}
```
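
A client for this endpoint could look like the sketch below, which uses only the Python standard library. Several details are assumptions rather than facts from this page: HTTP Basic authentication, port `8030` as the FE HTTP port, and the hypothetical helper names `build_tso_request`, `parse_tso_response`, and `fetch_tso`. Adjust them to match your deployment.

```python
import base64
import json
import urllib.request


def build_tso_request(host, user, password, port=8030):
    """Build a GET /api/tso request. Basic auth and the port are assumptions."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:{port}/api/tso",
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def parse_tso_response(body):
    """Return the `data` object, or raise if the response code is not 0."""
    payload = json.loads(body)
    if payload.get("code") != 0:
        raise RuntimeError(f"TSO request failed: {payload.get('msg')}")
    return payload["data"]


def fetch_tso(host, user, password, port=8030, timeout=5.0):
    """Send the request to the Master FE and return the parsed `data` object."""
    request = build_tso_request(host, user, password, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_tso_response(response.read().decode("utf-8"))
```

Calling `fetch_tso("<master-fe-host>", "<admin-user>", "<password>")` would return a dictionary with the four fields from the table above. Because the endpoint is read-only, polling it does not advance the TSO.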

## Errors

Common error cases include:

- FE is not ready
- Current FE is not master
- Authentication failure
@@ -32,4 +32,5 @@ Returns basic information about the Rowset.
| DATA_DISK_SIZE | bigint | The storage space for data within the Rowset. |
| CREATION_TIME | datetime | The creation time of the Rowset. |
| NEWEST_WRITE_TIMESTAMP | datetime | The most recent write time of the Rowset. |
| SCHEMA_VERSION | int | The Schema version number of the table corresponding to the Rowset data. |
| COMMIT_TSO | bigint | The commit TSO recorded in the Rowset metadata (64-bit). |
@@ -370,6 +370,7 @@ The functionality of creating synchronized materialized views with rollup is lim
| enable_mow_light_delete | Whether to enable writing Delete predicate with Delete statements on Unique tables with Mow. If enabled, it will improve the performance of Delete statements, but partial column updates after Delete may result in some data errors. If disabled, it will reduce the performance of Delete statements to ensure correctness. The default value of this property is `false`. This property can only be enabled on Unique Merge-on-Write tables. |
| Dynamic Partitioning Related Properties | For dynamic partitioning, refer to [Data Partitioning - Dynamic Partitioning](../../../../table-design/data-partitioning/dynamic-partitioning) |
| enable_unique_key_skip_bitmap_column | Whether to enable the [Flexible Column Update feature](../../../../data-operate/update/update-of-unique-model.md#flexible-partial-column-updates) on Unique Merge-on-Write tables. This property can only be enabled on Unique Merge-on-Write tables. |
| enable_tso | Whether to enable TSO-related features for this table (for example, recording Rowset commit TSO and exposing `information_schema.rowsets.COMMIT_TSO`). |

## Access Control Requirements

@@ -735,4 +736,4 @@ AS SELECT * FROM t1;

```sql
CREATE TABLE t11 LIKE t10;
```
@@ -0,0 +1,73 @@
---
{
"title": "Global Timestamp Service (TSO)",
"language": "zh-CN",
"description": "TSO (Timestamp Oracle) provides globally monotonically increasing timestamps for Doris."
}
---

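## Overview

TSO (Timestamp Oracle) is a service running on the **Master FE** that generates **globally monotonically increasing** 64-bit timestamps. In distributed scenarios, Doris uses TSO as a unified version reference, which avoids the correctness risks caused by physical clock skew across nodes.

Typical use cases include:

- A unified “transaction version number” across tables and nodes.
- Incremental computation and version-based reads that rely on a single global order.
- Better observability: a timestamp is more readable than an internal version number.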

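## Timestamp Structure

TSO is a 64-bit integer:

- High bits: **physical time (milliseconds)** since the Unix epoch
- Low bits: **logical counter** for issuing multiple timestamps within the same millisecond

TSO guarantees that timestamps are **monotonically increasing**. It does not guarantee that they reflect the physical clock (wall clock) exactly.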

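## Architecture and Lifecycle

- The **Master FE** runs the `TSOService` daemon thread.
- Internal FE components (for example, the transaction publish and metadata repair flows) obtain timestamps through `Env.getCurrentEnv().getTSOService().getTSO()`.
- The service uses a “**time window lease**” (the physical time of the window's upper bound) to reduce persistence overhead while preserving monotonicity after a master switch.

### Monotonicity Guarantee During Master Switch

When a master switch occurs, the new Master FE replays the persisted window upper bound and performs time calibration, ensuring that the first TSO issued by the new master is strictly greater than every TSO already issued by the old master.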

## Configuration Items

TSO is controlled by FE configuration items (see [FE Configuration](../config/fe-config.md) for how to set and persist them):

- `enable_feature_tso`
- `tso_service_update_interval_ms`
- `max_update_tso_retry_count`
- `max_get_tso_retry_count`
- `tso_service_window_duration_ms`
- `tso_time_offset_debug_mode` (test/debug only)
- `enable_tso_persist_journal` (may affect rollback compatibility)
- `enable_tso_checkpoint_module` (older versions may need to ignore unknown modules when reading newer images)

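## Observability and Debugging

### FE HTTP API

You can read the current TSO information through the FE HTTP API without consuming the logical counter:

- `GET /api/tso`

See [TSO Action](../open-api/fe-http/tso-action.md) for authentication, response fields, and examples.

### System Table: `information_schema.rowsets`

Once the related capability is enabled, Doris writes the commit TSO into the Rowset metadata at commit time and exposes it through a system table:

- `information_schema.rowsets.COMMIT_TSO`

See [rowsets](../system-tables/information_schema/rowsets.md).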

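## FAQ

### Can TSO be used as a physical clock (wall clock)?

No. Although the high bits contain millisecond-level physical time, the physical part may be advanced proactively in some cases (for example, when logical counter usage is high). TSO should therefore be treated as a **monotonically increasing version**, not as a precise physical clock.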