
Cache type codes to eliminate shared_ptr overhead in hot paths #465

Merged
slabko merged 1 commit into ClickHouse:master from iskakaushik:cache-type-codes-perf on Mar 2, 2026



iskakaushik commented Feb 25, 2026

  • ColumnDecimal: Cache data_type_code_ at construction to avoid creating shared_ptr<Type> temporaries and performing a dynamic_cast via As<>() on every Append(Int128) and At() call. This is safe because the underlying storage type (Int32/Int64/Int128) is fixed once the column is constructed.
  • ColumnLowCardinality: Cache index_type_code_ so that getDictionaryIndex, appendIndex, and removeLastIndex can use a direct static_cast instead of VisitIndexColumn(), which on every invocation called Type()->GetCode() (creating a shared_ptr temporary) and performed a dynamic_cast. The cache is updated in LoadBody and Swap, where index_column_ may change.
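A minimal sketch of the ColumnDecimal idea, using hypothetical, simplified classes rather than the clickhouse-cpp API: only Int32 and Int64 storage are modeled (Int128 is omitted, and Append takes int64_t instead of Int128), with precision ≤ 9 selecting 32-bit storage as ClickHouse's Decimal32 does. The storage type is resolved once in the constructor; Append and At then switch on the cached code and use static_cast, with no per-call type lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins (not the clickhouse-cpp classes).
enum class TypeCode { Int32, Int64 };

struct Column {
    virtual ~Column() = default;
    virtual TypeCode Code() const = 0;
};

struct ColumnInt32 : Column {
    std::vector<int32_t> data;
    TypeCode Code() const override { return TypeCode::Int32; }
};

struct ColumnInt64 : Column {
    std::vector<int64_t> data;
    TypeCode Code() const override { return TypeCode::Int64; }
};

class DecimalLikeColumn {
public:
    explicit DecimalLikeColumn(int precision)
        : data_(MakeStorage(precision)),
          data_type_code_(data_->Code()) {}  // resolved once; never changes afterwards

    // The value is assumed to fit the column's precision.
    void Append(int64_t value) {
        switch (data_type_code_) {  // cached code: no shared_ptr copy, no dynamic_cast
            case TypeCode::Int32:
                static_cast<ColumnInt32&>(*data_).data.push_back(static_cast<int32_t>(value));
                break;
            case TypeCode::Int64:
                static_cast<ColumnInt64&>(*data_).data.push_back(value);
                break;
        }
    }

    int64_t At(std::size_t i) const {
        switch (data_type_code_) {
            case TypeCode::Int32: return static_cast<const ColumnInt32&>(*data_).data.at(i);
            case TypeCode::Int64: return static_cast<const ColumnInt64&>(*data_).data.at(i);
        }
        throw std::logic_error("unknown storage type");
    }

private:
    static std::shared_ptr<Column> MakeStorage(int precision) {
        assert(precision >= 1 && precision <= 18);  // this sketch stops at 64-bit storage
        if (precision <= 9) return std::make_shared<ColumnInt32>();
        return std::make_shared<ColumnInt64>();
    }

    std::shared_ptr<Column> data_;  // declared first, so initialized before the cache
    TypeCode data_type_code_;
};
```

The static_cast is only valid because the cached code and the concrete storage class are set together in the constructor and never diverge afterwards.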

Profiling showed shared_ptr<Type> destructors consuming 5.54% of CPU in ColumnDecimal::Append and 3.91% in VisitIndexColumn, making these the highest-impact non-compression optimizations available.
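The profile attributes the cost to shared_ptr<Type> destructors. A minimal sketch of where that cost comes from, with hypothetical names (TypeInfo, ColumnWithType, CodeViaTemporary, CodeViaCache) standing in for the real classes: a getter that returns a shared_ptr by value creates a temporary whose construction increments the shared reference count and whose destructor decrements it again (typically an atomic operation), whereas a cached code is a plain member read.

```cpp
#include <memory>

// Hypothetical stand-ins for illustration; not clickhouse-cpp's Type/Column API.
struct TypeInfo {
    int code;
    int GetCode() const { return code; }
};

struct ColumnWithType {
    std::shared_ptr<TypeInfo> type_ = std::make_shared<TypeInfo>(TypeInfo{42});
    int cached_code_ = type_->GetCode();  // filled once, after type_ (declaration order)

    // Returning by value copies the shared_ptr: the copy bumps the reference
    // count and destroying it decrements the count again.
    std::shared_ptr<TypeInfo> Type() const { return type_; }

    // Hot-path access through a temporary shared_ptr (the pattern the PR removes).
    int CodeViaTemporary() const { return Type()->GetCode(); }

    // Hot-path access through the cached code: a plain integer load.
    int CodeViaCache() const { return cached_code_; }

    // While a copy returned by Type() is alive, the count is 2 (member + copy).
    long UseCountWhileCopyAlive() const {
        std::shared_ptr<TypeInfo> copy = Type();
        return copy.use_count();
    }
};
```

Both accessors return the same value; only the first pays for a reference-count round trip on every call.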

Estimated gain: ~7-9% combined insert throughput improvement.


slabko commented Mar 2, 2026

Hi @iskakaushik,

Thank you very much for this improvement. The shared_ptr type check was indeed quite slow.

Would you be able to rebase on master to pick up the fixed tests?

ColumnDecimal: cache data_type_code_ to avoid shared_ptr<Type>
temporaries and dynamic_cast via As<>() on every Append/At call.

ColumnLowCardinality: cache index_type_code_ to replace
VisitIndexColumn (which called Type()->GetCode() + dynamic_cast
per invocation) with direct static_cast in getDictionaryIndex,
appendIndex, and removeLastIndex.
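The LowCardinality change follows the same pattern, with one extra invariant: the cached code must be refreshed whenever index_column_ is replaced. A minimal sketch, assuming only UInt8/UInt16/UInt32 index widths. The method names getDictionaryIndex, appendIndex, removeLastIndex, and Swap come from the PR, but their signatures here are simplified guesses; IndexCode, IndexColumnT, LowCardinalityLike, and ReplaceIndex (a stand-in for LoadBody) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified index columns (not the clickhouse-cpp classes).
enum class IndexCode { UInt8, UInt16, UInt32 };

struct IndexColumn {
    virtual ~IndexColumn() = default;
    virtual IndexCode Code() const = 0;
};

template <typename T, IndexCode C>
struct IndexColumnT : IndexColumn {
    using value_type = T;
    std::vector<T> data;
    IndexCode Code() const override { return C; }
};

using Index8 = IndexColumnT<uint8_t, IndexCode::UInt8>;
using Index16 = IndexColumnT<uint16_t, IndexCode::UInt16>;
using Index32 = IndexColumnT<uint32_t, IndexCode::UInt32>;

class LowCardinalityLike {
public:
    explicit LowCardinalityLike(std::shared_ptr<IndexColumn> index)
        : index_column_(std::move(index)), index_type_code_(CodeOf(index_column_.get())) {}

    // Each operation switches on the cached code and uses static_cast.
    void appendIndex(uint64_t value) {  // value assumed to fit the index width
        switch (index_type_code_) {
            case IndexCode::UInt8:  Push<Index8>(value);  break;
            case IndexCode::UInt16: Push<Index16>(value); break;
            case IndexCode::UInt32: Push<Index32>(value); break;
        }
    }

    uint64_t getDictionaryIndex(std::size_t i) const {
        switch (index_type_code_) {
            case IndexCode::UInt8:  return Concrete<Index8>().data.at(i);
            case IndexCode::UInt16: return Concrete<Index16>().data.at(i);
            case IndexCode::UInt32: return Concrete<Index32>().data.at(i);
        }
        throw std::logic_error("unknown index type");
    }

    void removeLastIndex() {
        switch (index_type_code_) {
            case IndexCode::UInt8:  Pop<Index8>();  break;
            case IndexCode::UInt16: Pop<Index16>(); break;
            case IndexCode::UInt32: Pop<Index32>(); break;
        }
    }

    // index_column_ changes here, so the cached code must move with it.
    void Swap(LowCardinalityLike& other) {
        std::swap(index_column_, other.index_column_);
        std::swap(index_type_code_, other.index_type_code_);
    }

    // Hypothetical stand-in for LoadBody: a newly loaded index column may
    // have a different width, so the cache is refreshed.
    void ReplaceIndex(std::shared_ptr<IndexColumn> index) {
        index_type_code_ = CodeOf(index.get());  // throws before any state changes
        index_column_ = std::move(index);
    }

private:
    static IndexCode CodeOf(const IndexColumn* column) {
        if (!column) throw std::invalid_argument("null index column");
        return column->Code();  // the only virtual type query
    }

    template <typename C> C& Concrete() { return static_cast<C&>(*index_column_); }
    template <typename C> const C& Concrete() const { return static_cast<const C&>(*index_column_); }

    template <typename C> void Push(uint64_t value) {
        Concrete<C>().data.push_back(static_cast<typename C::value_type>(value));
    }

    template <typename C> void Pop() {
        auto& data = Concrete<C>().data;
        if (data.empty()) throw std::out_of_range("no index to remove");
        data.pop_back();
    }

    std::shared_ptr<IndexColumn> index_column_;  // declared first: initialized before the cache
    IndexCode index_type_code_;
};
```

Forgetting to update index_type_code_ in Swap or in the load path would make the static_cast target the wrong concrete class, which is why the PR refreshes the cache exactly where index_column_ can change.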

Benchmarked with a 10M row insert (43-column schema, 100K rows/batch,
LZ4 1MB chunks) over 10 interleaved iterations. Block build time
improved by 7.0% (3477ms → 3233ms, t=30.8, p≈0) and grand total
by 1.6% (12086ms → 11890ms, t=6.4, p<0.001). The insert phase itself
is unchanged because it remains bound by LZ4 compression.
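The two reported percentages follow directly from the reported timings:

```latex
\[
\frac{3477 - 3233}{3477} = \frac{244}{3477} \approx 7.0\%,
\qquad
\frac{12086 - 11890}{12086} = \frac{196}{12086} \approx 1.6\%
\]
```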
iskakaushik force-pushed the cache-type-codes-perf branch from 25cf2c7 to 19c4443 on March 2, 2026 16:56
slabko merged commit 4a278b5 into ClickHouse:master on Mar 2, 2026
7 of 18 checks passed