HDDS-14730. Update Recon container sync to use container IDs #9842

Draft
jasonosullivan34 wants to merge 4 commits into apache:master from jasonosullivan34:HDDS-14730-container-ids

@jasonosullivan34 jasonosullivan34 commented Feb 27, 2026

What changes were proposed in this pull request?

  • Use container IDs instead of full ContainerInfo objects in the container sync RPCs between Recon and SCM, to reduce the payload size

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14730

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@devmadhuu devmadhuu self-requested a review February 27, 2026 11:43

@devmadhuu devmadhuu left a comment

Thanks @jasonosullivan34 for the patch. I left a few comments inline with the code; please check.

Also, please describe in the PR description how the patch was tested. It would also be good to add the following tests:

  • A unit test for ContainerStateMap.getContainerIDs(state, start, count) verifying pagination and state filtering

  • A unit or integration test for syncWithSCMContainerInfo() covering the "container missing from Recon, add it" path, and the "container already present, skip it" path

* @return the list of containers from SCM in a given state
* @throws IOException
*/
List<ContainerInfo> getListOfContainers(long startContainerID,

This method appears to be unused now. Please remove it if nothing else calls it.

}

@Override
public List<ContainerInfo> getListOfContainers(

This method appears to be unused now. Please remove it if nothing else calls it.

@Override
public List<ContainerID> getListOfContainerIDs(
ContainerID startContainerID, int count, HddsProtos.LifeCycleState state)
throws IOException {

Kindly check whether this method needs to declare IOException. It looks like the method throws Exception.

ContainerID startContainerID, int count, HddsProtos.LifeCycleState state)
throws IOException;

List<ContainerInfo> getListOfContainers(

Please remove it if it is not used.


public GetContainerCountResponseProto getClosedContainerCount(
StorageContainerLocationProtocolProtos.GetContainerCountRequestProto
GetContainerCountRequestProto

Some of the changes in this class are formatting-only. Please revert them, since they are not related to this PR's change.

if (isSyncDataFromSCMRunning.compareAndSet(false, true)) {
try {
- List<ContainerInfo> containers = containerManager.getContainers();
+ List<ContainerID> containerIDs = containerManager.getContainerIDs();

We should not load this list into memory. containerManager.getContainerIDs() loads all container IDs from Recon's local ContainerStateManager into a List up front, and then List.contains() does an O(n) linear scan for every SCM-returned ID. On a cluster with a million containers, that is O(n²) work for a full sync.
The ContainerManager interface already exposes containerExist(ContainerID id), which does a direct O(log n) map lookup without materializing any list.
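
To make the cost difference concrete, here is a minimal sketch that contrasts the two lookup styles. It is not Ozone code: the class and method names are hypothetical, plain Long values stand in for ContainerID, and a TreeMap stands in for Recon's internal container map, which is an assumption about how containerExist() is backed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for Recon's local container state; not Ozone code.
class MembershipLookupSketch {

  // Slow path: each contains() call scans the list, so a batch of m IDs
  // checked against n local IDs costs O(n * m).
  static int countMissingWithList(List<Long> localIds, List<Long> scmIds) {
    int missing = 0;
    for (Long id : scmIds) {
      if (!localIds.contains(id)) {
        missing++;
      }
    }
    return missing;
  }

  // Fast path: a sorted map answers containsKey() in O(log n) per lookup,
  // and no separate list of IDs has to be built first.
  static int countMissingWithMap(TreeMap<Long, String> localContainers,
      List<Long> scmIds) {
    int missing = 0;
    for (Long id : scmIds) {
      if (!localContainers.containsKey(id)) {
        missing++;
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<Long> localIds = new ArrayList<>();
    TreeMap<Long, String> localContainers = new TreeMap<>();
    for (long id = 1; id <= 1000; id++) {
      localIds.add(id);
      localContainers.put(id, "container-" + id);
    }
    // Two of these IDs (1001 and 2000) are not present locally.
    List<Long> scmIds = Arrays.asList(5L, 999L, 1001L, 2000L);
    System.out.println(countMissingWithList(localIds, scmIds));
    System.out.println(countMissingWithMap(localContainers, scmIds));
  }
}
```

Both methods return the same count; only the per-lookup cost differs, which is what matters once the local set reaches millions of entries.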

listOfContainers.forEach(containerID -> {
boolean isContainerPresentAtRecon =
- containers.contains(containerInfo);
+ containerIDs.contains(containerID);

Suggested change
- containerIDs.contains(containerID);
+ boolean isContainerPresentAtRecon = containerManager.containerExist(containerID);

This can be optimized as suggested above.

SCMListContainerIDsResponseProto.newBuilder();

List<ContainerID> containerIDs = impl.getListOfContainerIDs(
startContainerID, request.getCount(), request.getState());

state is declared optional in the proto. In the proto2 Java bindings, calling getState() on an unset optional enum field returns the field's default, which is the first enum value (OPEN), not null. So the handler should guard the read as below:

if (request.hasState()) {
  state = request.getState();
}

Without this guard, a caller that omits the state (e.g., future tooling or a different client version) will receive only OPEN containers instead of all containers, which would be a silent correctness bug.
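
The behaviour described above can be illustrated without protobuf on the classpath. The sketch below is not generated code: ListRequest is a hypothetical hand-written class that only mimics the hasState()/getState() semantics of a proto2 optional enum field, and the LifeCycleState enum here is a reduced stand-in for HddsProtos.LifeCycleState.

```java
// Hypothetical stand-ins; the real types are generated by protoc.
class OptionalStateGuardSketch {

  // Reduced stand-in enum; OPEN is first, so it acts as the default.
  enum LifeCycleState { OPEN, CLOSING, CLOSED }

  // Mimics proto2 accessors for an optional enum field.
  static final class ListRequest {
    private final LifeCycleState state; // null models "field not set"

    ListRequest(LifeCycleState state) {
      this.state = state;
    }

    boolean hasState() {
      return state != null;
    }

    // Like proto2, never returns null: an unset field yields the default.
    LifeCycleState getState() {
      return state != null ? state : LifeCycleState.OPEN;
    }
  }

  // Guarded read: null means "no state filter, return all containers".
  static LifeCycleState resolveFilter(ListRequest request) {
    LifeCycleState state = null;
    if (request.hasState()) {
      state = request.getState();
    }
    return state;
  }

  public static void main(String[] args) {
    ListRequest unset = new ListRequest(null);
    // Unguarded read silently turns "no filter" into an OPEN filter.
    System.out.println("unguarded: " + unset.getState());
    System.out.println("guarded:   " + resolveFilter(unset));
  }
}
```

With the guard, an unset field stays distinguishable from an explicit OPEN filter, so the server can choose to return containers in every state.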

try {
List<ContainerID> results = scm.getContainerManager().getContainerIDs(
startContainerID, count, state);
AUDIT.logReadSuccess(buildAuditMessageForSuccess(

This uses the same audit action, SCMAction.LIST_CONTAINER, for two semantically different operations, which makes it harder to distinguish ID-only queries from full container data queries in the audit logs.


/**
* Returns container IDs under certain conditions.
* Search container IDs from start ID(exclusive),

ContainerStateMap.getContainerIDs() uses tailMap(start), which is inclusive of the start ID. Please correct the javadoc.
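
The inclusive/exclusive distinction is easy to check with the JDK alone. The following is a small sketch, assuming the container map behaves like a java.util.TreeMap keyed by ID; the class and helper names are hypothetical and Long stands in for ContainerID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of paginating over a sorted map of container IDs.
class TailMapBoundsSketch {

  // Returns up to `count` IDs starting at `start`. The single-argument
  // TreeMap.tailMap(start) is equivalent to inclusive == true.
  static List<Long> idsFrom(TreeMap<Long, String> containers, long start,
      int count, boolean inclusive) {
    List<Long> page = new ArrayList<>();
    for (Long id : containers.tailMap(start, inclusive).keySet()) {
      if (page.size() >= count) {
        break;
      }
      page.add(id);
    }
    return page;
  }

  public static void main(String[] args) {
    TreeMap<Long, String> containers = new TreeMap<>();
    for (long id = 1; id <= 5; id++) {
      containers.put(id, "container-" + id);
    }
    System.out.println(idsFrom(containers, 3L, 2, true));  // [3, 4]
    System.out.println(idsFrom(containers, 3L, 2, false)); // [4, 5]
  }
}
```

If the javadoc's "exclusive" wording is the intended contract, tailMap(start, false) would implement it; otherwise the javadoc should say "inclusive". Either way, a paginating caller needs to know which one applies so that it does not skip or repeat the boundary ID between pages.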
