HDDS-14702. Make lazy source container replica deletion interval configurable #9837

Open
Gargi-jais11 wants to merge 1 commit into apache:master from Gargi-jais11:HDDS-14702

@Gargi-jais11
Contributor

What changes were proposed in this pull request?

Currently, deletion of the source container replica is postponed by 1 hour after the container is successfully moved from the source volume to the destination volume during disk balancing.
This PR makes that delay configurable and reduces the default value from 1 hour to 5 minutes.

<property>
  <name>hdds.datanode.disk.balancer.replica.deletion.delay</name>
  <value>5m</value>
  <tag>OZONE, DATANODE, DISKBALANCER</tag>
  <description>The delay after a container is successfully moved from source
    volume to destination volume before the source container replica is deleted.
    By default this is set to 5 minutes. Unit: ns, ms, s, m, h, d.
  </description>
</property>
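
As a rough illustration of how a value such as `5m` with one of the listed unit suffixes could be interpreted, here is a minimal sketch. It is not Ozone's own configuration parser: `DelayParser` and `parse` are hypothetical names, and the sketch assumes a suffix is always present, which the real configuration layer may not require.

```java
import java.time.Duration;
import java.util.Locale;

/** Hypothetical helper (not part of Ozone): parses values such as "5m" or "1h". */
final class DelayParser {
  private DelayParser() { }

  /** Converts a number followed by ns, ms, s, m, h, or d into a Duration. */
  static Duration parse(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    // Check the two-letter suffixes first so "ms" and "ns" are not read as "s".
    if (v.endsWith("ns")) {
      return Duration.ofNanos(number(v, 2));
    }
    if (v.endsWith("ms")) {
      return Duration.ofMillis(number(v, 2));
    }
    if (v.endsWith("s")) {
      return Duration.ofSeconds(number(v, 1));
    }
    if (v.endsWith("m")) {
      return Duration.ofMinutes(number(v, 1));
    }
    if (v.endsWith("h")) {
      return Duration.ofHours(number(v, 1));
    }
    if (v.endsWith("d")) {
      return Duration.ofDays(number(v, 1));
    }
    // Assumption of this sketch: a value without a suffix is rejected.
    throw new IllegalArgumentException("Missing or unknown unit suffix: " + value);
  }

  /** Parses the numeric part that precedes a suffix of the given length. */
  private static long number(String v, int suffixLength) {
    return Long.parseLong(v.substring(0, v.length() - suffixLength).trim());
  }
}
```

With this sketch, the old default would be written as `1h` and the new default as `5m`.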

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14702

How was this patch tested?

Passed existing test cases for replica deletion delay.

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review February 26, 2026 15:48
@Gargi-jais11
Contributor Author

@ChenSammi Please review the patch.

Contributor

@sreejasahithi sreejasahithi left a comment

Thanks @Gargi-jais11 for working on this.

Comment on lines +123 to +125
description = "The deletion delay after a container is successfully moved from source volume to " +
"destination volume before the source container replica is deleted. " +
"Unit could be defined with postfix (ns,ms,s,m,h,d).")
Contributor

The word "deletion" sounds redundant here.

Suggested change
- description = "The deletion delay after a container is successfully moved from source volume to " +
- "destination volume before the source container replica is deleted. " +
- "Unit could be defined with postfix (ns,ms,s,m,h,d).")
+ description = "The delay after a container is successfully moved from source volume to " +
+ "destination volume before the source container replica is deleted. " +
+ "Unit could be defined with postfix (ns,ms,s,m,h,d).")

| `hdds.datanode.disk.balancer.parallel.thread` | `5` | The number of worker threads to use for moving containers in parallel. |
| `hdds.datanode.disk.balancer.service.interval` | `60s` | The time interval at which the Datanode DiskBalancer service checks for imbalance and updates its configuration. |
| `hdds.datanode.disk.balancer.stop.after.disk.even` | `true` | If true, the DiskBalancer will automatically stop its balancing activity once disks are considered balanced (i.e., all volume densities are within the threshold). |
| `hdds.datanode.disk.balancer.replica.deletion.delay` | `5m` | The deletion delay after a container is successfully moved from source volume to destination volume before the source container replica is deleted. This lazy deletion allows time before failing the read thread holding the old container replica. Unit: ns, ms, s, m, h, d. |
Contributor

Suggested change
- | `hdds.datanode.disk.balancer.replica.deletion.delay` | `5m` | The deletion delay after a container is successfully moved from source volume to destination volume before the source container replica is deleted. This lazy deletion allows time before failing the read thread holding the old container replica. Unit: ns, ms, s, m, h, d. |
+ | `hdds.datanode.disk.balancer.replica.deletion.delay` | `5m` | The delay after a container is successfully moved from source volume to destination volume before the source container replica is deleted. This lazy deletion provides a grace period before failing the read thread holding the old container replica. Unit: ns, ms, s, m, h, d. |
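
To make the "grace period" wording concrete, the lazy deletion can be pictured as a delay queue: a moved container's old replica is enqueued with a due time and is only handed to deletion once the delay has elapsed. The sketch below is an assumption for illustration, not the DiskBalancer implementation; `LazyDeletionQueue`, `PendingDeletion`, `onMoveCompleted`, and `drainExpired` are hypothetical names, and the clock is injected only so the behavior can be checked deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of delayed source-replica deletion (not Ozone code). */
final class LazyDeletionQueue {

  /** A source replica waiting for its grace period to elapse. */
  static final class PendingDeletion implements Delayed {
    final long containerId;
    private final long dueAtMillis;
    private final LongSupplier clock;

    PendingDeletion(long containerId, long dueAtMillis, LongSupplier clock) {
      this.containerId = containerId;
      this.dueAtMillis = dueAtMillis;
      this.clock = clock;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      // Remaining time until the replica may be deleted; <= 0 means due.
      return unit.convert(dueAtMillis - clock.getAsLong(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS),
          other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  private final DelayQueue<PendingDeletion> queue = new DelayQueue<>();
  private final long delayMillis;
  private final LongSupplier clock;

  LazyDeletionQueue(long delayMillis, LongSupplier clock) {
    this.delayMillis = delayMillis;
    this.clock = clock;
  }

  /** Called after a move succeeds; the old replica stays in place for now. */
  void onMoveCompleted(long containerId) {
    queue.add(new PendingDeletion(containerId, clock.getAsLong() + delayMillis, clock));
  }

  /** Returns the IDs of replicas whose grace period has elapsed. */
  List<Long> drainExpired() {
    List<Long> expired = new ArrayList<>();
    PendingDeletion pending;
    // poll() returns null while the head of the queue is not yet due.
    while ((pending = queue.poll()) != null) {
      expired.add(pending.containerId);
    }
    return expired;
  }
}
```

In this picture, a shorter delay frees space on the source volume sooner, while a longer one gives in-flight readers of the old replica more time to finish.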
