Provider lifecycle race condition in multi-threaded SDKs

I've been digging into the provider lifecycle in the Go and Swift SDKs, and I believe there are a couple of things in the OF spec that make it hard to implement correctly in the presence of true multi-threading.

The main one is that there's no clear ownership of the provider state and provider events: ownership is split between the provider and the API, and both are allowed to emit events. This is an issue because neither the provider nor the API is equipped to ensure a consistent ordering of events.

The API is required to emit the "provider ready" event and update the state after initialization is complete. However, once the provider returns from the init function, there's a brief window before the API updates the state and emits the event. This is not an issue in single-threaded languages like JavaScript or Python. With multi-threading, however, the provider is allowed to do work in the background and can emit an error event within that window, and now the order of the provider ready and error events is undefined.
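To make the window concrete, here is a minimal Go sketch that forces the problematic interleaving with channels. The `Event` type, `eventLog`, and `raceWindow` are hypothetical names made up for this illustration and are not taken from any OpenFeature SDK; a real SDK would hit this order only some of the time, depending on scheduling.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical event type for this sketch (not an SDK type).
type Event string

const (
	Ready Event = "PROVIDER_READY"
	Error Event = "PROVIDER_ERROR"
)

// eventLog records events in the order handlers would observe them.
type eventLog struct {
	mu     sync.Mutex
	events []Event
}

func (l *eventLog) emit(e Event) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

func (l *eventLog) snapshot() []Event {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]Event(nil), l.events...)
}

// raceWindow deterministically reproduces the bad interleaving:
// the provider's background error lands between "init returned"
// and the API emitting ready.
func raceWindow() []Event {
	log := &eventLog{}
	initReturned := make(chan struct{})
	errorEmitted := make(chan struct{})

	// Provider background worker: fails right after init has returned.
	go func() {
		<-initReturned
		log.emit(Error)
		close(errorEmitted)
	}()

	close(initReturned) // the provider's init function returns here
	<-errorEmitted      // the window: the background error gets in first
	log.emit(Ready)     // the API now emits ready and would set state to READY

	return log.snapshot()
}

func main() {
	fmt.Println(raceWindow()) // prints [PROVIDER_ERROR PROVIDER_READY]
}
```

In this interleaving the last event observed is ready, so an API that derives state from events would end up in the ready state even though the provider failed after initialization.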

Current state across multi-threaded SDKs:
- Go: race condition
- Swift: 🐛 implements eventing but does not update state on provider events
- Java: race condition
- Kotlin: race condition
- Dotnet: race condition
- Rust: eventing not implemented

This would be trivial to fix if only one of the API or the provider were allowed to emit events (or update the state), because that single owner could use locks/queues to order the events/updates properly.
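As a rough sketch of the single-owner idea, the following Go code funnels every event through one queue that is drained by one goroutine, which is also the only place where state changes. `Dispatcher`, `Emit`, and `Close` are hypothetical names for this illustration, not part of any SDK, and which party owns the dispatcher (API or provider) is left open here.

```go
package main

import "fmt"

// Hypothetical types for this sketch (not SDK types).
type State string
type Event string

const (
	Ready Event = "PROVIDER_READY"
	Error Event = "PROVIDER_ERROR"

	StateNotReady State = "NOT_READY"
	StateReady    State = "READY"
	StateError    State = "ERROR"
)

// Dispatcher is the single owner of state and events: only its own
// goroutine updates state and records delivered events.
type Dispatcher struct {
	queue chan Event
	done  chan struct{}
	state State
	seen  []Event
}

func NewDispatcher() *Dispatcher {
	d := &Dispatcher{
		queue: make(chan Event, 16),
		done:  make(chan struct{}),
		state: StateNotReady,
	}
	go d.run()
	return d
}

// run drains the queue in FIFO order, so state updates and event
// delivery always happen in the order events were enqueued.
func (d *Dispatcher) run() {
	defer close(d.done)
	for e := range d.queue {
		switch e {
		case Ready:
			d.state = StateReady
		case Error:
			d.state = StateError
		}
		d.seen = append(d.seen, e)
	}
}

// Emit enqueues an event; it never touches state directly.
func (d *Dispatcher) Emit(e Event) { d.queue <- e }

// Close stops the dispatcher, waits for the queue to drain, and
// returns the final state and the delivered events.
func (d *Dispatcher) Close() (State, []Event) {
	close(d.queue)
	<-d.done
	return d.state, d.seen
}

func main() {
	d := NewDispatcher()
	d.Emit(Ready) // the owner decides ready comes first...
	d.Emit(Error) // ...and a later failure is ordered after it
	state, events := d.Close()
	fmt.Println(state, events) // prints ERROR [PROVIDER_READY PROVIDER_ERROR]
}
```

Because there is exactly one writer of state and one ordered queue, the final state always matches the last event delivered.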

If both are allowed to emit, the only way to prevent a race is for the API to guard eventing with a lock and to hold this lock throughout the provider initialization. This way, no provider events would come through until the API has processed its own ready event first.

The main issue with this approach is that it locks up the API/provider and serializes all calls to init/shutdown/on context change. This more or less completely destroys concurrency and prevents the provider from reacting to context change / shutdown if initialization is taking a while. It also easily leads to deadlocks if the provider tries to emit events from/during lifecycle handlers.
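The locking approach and its deadlock hazard can be sketched in Go as follows. The `API` type and its methods are hypothetical and only model the idea; `TryEmit` uses `sync.Mutex.TryLock` (Go 1.18+) purely so the example can report that the lock is unavailable instead of hanging. A real emit that called `Lock()` from inside the init handler would block forever at that point, since Go mutexes are not reentrant.

```go
package main

import (
	"fmt"
	"sync"
)

// API is a hypothetical stand-in for the SDK's API object.
type API struct {
	mu     sync.Mutex
	events []string
}

// SetProvider holds the eventing lock for the whole initialization,
// so no provider event can be recorded before the API's own ready event.
func (a *API) SetProvider(initFn func()) {
	a.mu.Lock()
	defer a.mu.Unlock()
	initFn()
	a.events = append(a.events, "PROVIDER_READY")
}

// TryEmit is what a provider would call to emit an event. It returns
// false if the eventing lock is currently held. A blocking variant
// using Lock() would deadlock when called from inside initFn.
func (a *API) TryEmit(e string) bool {
	if !a.mu.TryLock() {
		return false
	}
	defer a.mu.Unlock()
	a.events = append(a.events, e)
	return true
}

func main() {
	api := &API{}

	var emittedDuringInit bool
	api.SetProvider(func() {
		// The provider tries to report an error from its init handler.
		emittedDuringInit = api.TryEmit("PROVIDER_ERROR")
	})
	fmt.Println(emittedDuringInit, api.events) // prints false [PROVIDER_READY]

	// After initialization the lock is free and events flow again.
	fmt.Println(api.TryEmit("PROVIDER_ERROR"), api.events) // prints true [PROVIDER_READY PROVIDER_ERROR]
}
```

The ordering is now well defined, but only because the provider is shut out of eventing for as long as initialization runs.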

Both the race condition and total locking sound bad, so I would really suggest moving state management responsibility back to providers. They are the only party that knows its own state at all times, and they are equipped to resolve any races.

I see that the spec is trying to lift the burden from provider authors, but imo the SDK trying to manage providers' state is not helping. Anecdotally, I can add that as a provider author, eventing and state management have been the most confusing part of the OF spec/SDKs, and I've spent quite a bit of time trying to make the SDK set the state I want.

Provider lifecycle race condition in multi-threaded SDKs #365

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Provider lifecycle race condition in multi-threaded SDKs #365

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions