Document distribution statistics. #9945

Open
jmcarp wants to merge 1 commit into main from jmcarp/document-distribution-estimates
jmcarp commented Feb 27, 2026

Distributions have two kinds of statistics: histogram bucket counts, which can be converted from cumulative values to deltas by subtraction; and streaming estimates (min, max, quantiles, etc.), which can't. Because streaming estimates can't be subtracted, they always reflect all values since the start of the series's epoch, and can't be used to examine a specific time range (e.g., the last hour). For monitoring purposes, users should probably rely on histogram bucket counts rather than streaming estimates.

This patch updates the inline docs to clarify these differences and steer users toward histograms in most use cases.
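
To illustrate the distinction, here is a minimal sketch of why bucket counts can be windowed and streaming estimates can't. The `CumulativeSample` type and both helper functions are hypothetical names made up for this example; they are not part of the crate's API, and the real types may be shaped quite differently.

```rust
/// Hypothetical cumulative sample (not the real type): `counts[i]` is the
/// total number of observations that have landed in bin `i` since the start
/// time of the series.
#[derive(Debug, Clone, PartialEq)]
struct CumulativeSample {
    counts: Vec<u64>,
    /// Streaming estimate: the largest value seen since the series start.
    max: f64,
}

/// Per-bin counts for the interval between two cumulative samples.
///
/// Returns `None` if the bin layouts differ or if any counter went
/// backwards (for example, because the series was reset in between).
fn delta_counts(
    earlier: &CumulativeSample,
    later: &CumulativeSample,
) -> Option<Vec<u64>> {
    if earlier.counts.len() != later.counts.len() {
        return None;
    }
    later
        .counts
        .iter()
        .zip(&earlier.counts)
        .map(|(l, e)| l.checked_sub(*e))
        .collect()
}

/// Index of the highest bin that received any observations in the interval.
/// This bounds the interval's maximum to a bin, which is the best that can
/// be recovered from bin counts alone.
fn highest_nonempty_bin(delta: &[u64]) -> Option<usize> {
    delta.iter().rposition(|&c| c > 0)
}
```

Given two samples whose `max` fields are both 97.0, subtracting their `counts` shows which bins were hit in between, but nothing in the two `max` values says whether the interval's own maximum was 97.0 or something much smaller. Only the bin-level bound from `highest_nonempty_bin` is specific to the interval.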

@jmcarp jmcarp requested a review from bnaecker February 27, 2026 22:33
jmcarp commented Feb 27, 2026

I almost want to make the distribution streaming estimates private for now, or hide them behind a flag, because I think users probably shouldn't be using them for monitoring applications. For now, I'm just proposing a docs change, but we can also talk about hiding these somehow if useful.

/// ranges. However, other statistical fields represent streaming
/// calculations: their value at a given point represents the cumulative
/// streaming estimate dating to the start time of that series. Streaming
/// estimates are also only available for the 0th point in a series and
Collaborator commented:

This particular sentence seems more confusing than helpful, IMO. I'd probably cut it.

jmcarp (Contributor Author) replied:

Done. I couldn't think of a clear explanation that wasn't really long, so better to omit for now.

/// streaming estimate dating to the start time of that series. Streaming
/// estimates are also only available for the 0th point in a series and
/// after gaps in the series, since they can't be subtracted over time
/// points. Use histogram statistics rather than streaming statistics
Collaborator commented:

This is the first time "histogram statistics" is used. Could we define it or use another phrase, like "statistics derived from bin counts"?

jmcarp (Contributor Author) replied:

I'm just changing this to "bin counts".

@jmcarp jmcarp force-pushed the jmcarp/document-distribution-estimates branch 2 times, most recently from 0674a20 to bfc529f Compare March 2, 2026 20:50
@jmcarp jmcarp force-pushed the jmcarp/document-distribution-estimates branch from bfc529f to fda073a Compare March 2, 2026 21:27