Decision Goal
Should OpenCHAMI adopt openchami/inventory-service as the canonical replacement for SMD, retargeted to the data model proposed in #112 with TypeID as the primary identifier format, and bridged to existing SMD consumers through a time-bounded compatibility shim? This decision unblocks a cluster of dependent work: TPM enrollment (#40), geolocated IP assignment for hot-swap (#29), the node lifecycle state machine (#91), and the hardware inventory data model (#112), all of which are blocked or constrained by the identity question.
Category
Architecture
Stakeholders / Affected Areas
Every site operating OpenCHAMI and every service that currently talks to SMD: bss, boot-service, ochami, cloud-init, metadata-service, power-control, coresmd, magellan, ansible-smd-inventory, and all downstream community deployments. The TSC owns the outcome. HPE and Dell (as vendor members) have production deployment interest. This decision also directly affects openchami/fabrica — the code generation framework — and every service built on it (fru-tracker, boot-service, and future fabrica-generated services).
Decision Needed By
Before further investment lands in inventory-service, before #112 is accepted or rejected, and before #40 can complete its identity model. Realistically the next 1–2 TSC meetings.
Problem Statement
A. The xname conflation problem
SMD inherited from Cray's CSM a single string — the xname (e.g., x1000c1s7b0n0) — that simultaneously encodes two distinct concepts:
Physical location — which cabinet, chassis, slot, BMC port, and node
index a component currently occupies.
Hardware identity — which physical device is being referred to.
These are not the same thing, and treating them as the same causes compounding
problems throughout the stack:
RFD #40 (TPM Enrollment, in progress) states this directly: "location,
while unique across the system, isn't stable. It is possible and even
somewhat common to remove a blade from one chassis and replace it in
another." The TPM work requires a stable, hardware-bound identity — IDevID /
LAK certificates — that travels with the device, not the slot.
RFD #29 (Geolocated IPs for hot-swap) identifies that when hardware moves
to a new xname, its BMC MAC address changes, breaking the expectation that
management IPs are location-stable. The proposed workaround — rewriting MAC
addresses — is hardware-dependent. The root cause is the xname simultaneously
driving DHCP identity and hardware identity.
Lifecycle state (RFD #91, Node Lifecycle State Machine) asks whether node state should live in the inventory
service. The answer is ambiguous as long as "the node" means "the location"
rather than "the device." Moving a blade silently resets its lifecycle history
under the current model.
Cray vocabulary burden: The xname encoding (x, c, s, b, n for
cabinet, chassis, slot, BMC, node) is Cray-specific and meaningless to
operators using non-Cray hardware or non-Cray rack naming conventions.
B. inventory-service inherits this debt by construction
openchami/inventory-service
is a fabrica-generated, API-compatible rewrite of SMD. It has valuable
properties — BMC-discovery has been pruned, tokensmith handles AuthN/AuthZ, and
it runs as a lightweight Go binary. But because it is API-compatible, it
inherits SMD's schema, including the xname as primary key. Continuing inventory-service as-is is implicitly a decision to continue the
xname-as-identity model. That should be a conscious choice, not a default.
C. RFD #112 already proposed a clean data model
RFD #112 (Hardware Inventory API, proposed Sept 2025) independently arrived at
a different model:
Device.id is a UUID — stable, system-assigned, not location-encoded.
Physical location is a separate concept; the xname, if needed at all, is
a properties entry that magellan populates when it knows it.
parentID expresses hierarchy (GPU belongs to Node, Node belongs to Chassis)
without Cray vocabulary.
An arbitrary properties map handles vendor-specific fields without schema
changes.
This model directly resolves the instability problems in #40 and #29. It is
also the foundation of #112's proposed Inventory / History / Collection API
trio. However, #112 has no implementation and its relationship to inventory-service has not been decided. This RFD decides it.
D. Plain UUIDs are stable but opaque
A bare UUID (d5e6f7a8-b9c0-41e2-a3f4-b5c6d7e8f9a0) carries no information
about what kind of thing it identifies. An operator grepping a log file or
debugging a boot failure has no way to distinguish a node ID from a BMC ID
without a separate lookup. This replaces Cray vocabulary friction with total
opacity.
It also creates a practical bug class: nothing in the type system prevents
passing a BMC identifier to a function expecting a node identifier. SMD's xnames
at least made the shape visible — x1000c1s7b0 and x1000c1s7b0n0 are
obviously different kinds of thing. A UUID-only model loses that signal entirely.
The xname's readability was a feature. The proposed solution preserves it.
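The bug class described above can be made concrete in Go. The following is a minimal sketch, assuming type-prefixed identifiers of the form `node_...` and `bmc_...`; the names `NodeID`, `BMCID`, `ParseNodeID`, and `PowerCycle` are hypothetical and are not part of any existing OpenCHAMI package.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeID and BMCID are hypothetical wrapper types. Because they are distinct
// named types, a BMCID variable cannot be passed where a NodeID is expected.
type NodeID string
type BMCID string

// splitPrefix returns the text before and after the last underscore.
func splitPrefix(id string) (prefix, suffix string, ok bool) {
	i := strings.LastIndex(id, "_")
	if i <= 0 || i == len(id)-1 {
		return "", "", false
	}
	return id[:i], id[i+1:], true
}

// ParseNodeID accepts only identifiers that carry the "node" prefix.
func ParseNodeID(id string) (NodeID, error) {
	prefix, _, ok := splitPrefix(id)
	if !ok || prefix != "node" {
		return "", fmt.Errorf("not a node identifier: %q", id)
	}
	return NodeID(id), nil
}

// PowerCycle is an illustrative operation that only makes sense for nodes.
func PowerCycle(n NodeID) string { return "power-cycling " + string(n) }

func main() {
	n, err := ParseNodeID("node_01926e3ca4f17xyz8ab9cd0ef1")
	fmt.Println(PowerCycle(n), err)

	var b BMCID = "bmc_01926e3ca4f17xyz8ab9cd0ef1"
	_ = b // PowerCycle(b) would fail to compile: BMCID is not NodeID.

	_, err = ParseNodeID(string(b))
	fmt.Println(err)
}
```

With bare UUID strings neither check is available: both identifiers would be plain `string` values with no prefix to inspect.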
E. fabrica already has a prefixed ID scheme — but with insufficient entropy
openchami/fabrica already generates IDs in the form prefix-<8-random-hex-chars>
(e.g., device-1a2b3c4d). fru-tracker registers "Device" → "device" and
uses GenerateUIDForResource("Device") to produce these IDs at record creation.
This scheme has the right instinct — type prefix plus opaque suffix — but two
problems prevent using it as-is for the inventory plane:
32 bits of entropy (about 4.3 billion values per prefix) is insufficient for
global uniqueness across federated sites. By the birthday bound, the chance of
at least one collision is already about 39% at ~65,000 records per prefix and
reaches 50% at ~77,000.
ParseUID enforces prefix-hex format using strings.Split(uid, "-"),
so it would reject TypeID's underscore-separated, base32-encoded format.
TypeID is the natural evolution of fabrica's existing instinct, with a published
spec, a Go library, and a UUID v7 suffix that is both globally unique and
time-sortable.
F. The existing SMD bugs that motivated this work
Real and unresolved, but secondary to the identity question:
[...] remote-console.
[...] second-class.
[...] profile-based rollouts awkward.
Any replacement path addresses these. The identity question cannot be fixed by
patching SMD.
Proposed Solution
Retarget inventory-service to implement the #112 Device model with TypeID
as the primary identifier, and bridge existing SMD consumers through a
time-bounded compatibility shim.
1. What TypeID is
TypeID is a published open specification
for type-safe, time-sortable, globally unique identifiers, inspired by Stripe's cus_, ch_ pattern. A TypeID has two parts separated by an underscore:
node_01926e3ca4f17xyz8ab9cd0ef1
└──┘ └────────────────────────┘
type   UUID v7 in Crockford base32
prefix (26 chars, URL-safe)
The type prefix is a lowercase snake_case string. An operator reading
any log line, API response, or database record immediately knows the kind of
resource.
The suffix is a UUID v7 encoded in Crockford base32: globally unique,
time-sortable to millisecond precision, 26 characters, URL-safe without quoting.
The full identifier is a single string with no hyphens: node_01926e3ca4f17xyz8ab9cd0ef1.
TypeID is supported by a Go library (github.com/jetify-com/typeid-go) and
implementations in most major languages. UUID v7 is an IETF standard (RFC 9562,
2024); Postgres stores it natively in the uuid column type, and the
time-ordered insertion pattern largely avoids the B-tree index fragmentation
associated with UUID v4.
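To show how the two parts of a TypeID relate to a native uuid column, here is a minimal stdlib-only sketch that splits an identifier and decodes the base32 suffix into a canonical UUID string. It is not the typeid-go API; `splitTypeID` is a hypothetical helper, and it deliberately rejects prefix-less identifiers because the inventory use case always wants a type prefix.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
	"regexp"
	"strings"
)

// Crockford base32 alphabet in lowercase (no i, l, o, u).
const crockford = "0123456789abcdefghjkmnpqrstvwxyz"

// Lowercase letters and underscores, starting and ending with a letter.
var prefixRE = regexp.MustCompile(`^[a-z]([a-z_]{0,61}[a-z])?$`)

// splitTypeID separates "prefix_suffix" at the last underscore and decodes
// the 26-character suffix into a canonical UUID string.
func splitTypeID(id string) (string, string, error) {
	i := strings.LastIndex(id, "_")
	if i < 0 {
		return "", "", errors.New("missing type prefix")
	}
	prefix, suffix := id[:i], id[i+1:]
	if !prefixRE.MatchString(prefix) {
		return "", "", fmt.Errorf("invalid prefix %q", prefix)
	}
	// 26 characters carry 130 bits, so the first character must be <= '7'
	// for the value to fit in 128 bits.
	if len(suffix) != 26 || suffix[0] > '7' {
		return "", "", fmt.Errorf("invalid suffix %q", suffix)
	}
	n := new(big.Int)
	for _, c := range suffix {
		v := strings.IndexRune(crockford, c)
		if v < 0 {
			return "", "", fmt.Errorf("invalid base32 character %q", c)
		}
		n.Lsh(n, 5).Or(n, big.NewInt(int64(v)))
	}
	var b [16]byte
	n.FillBytes(b[:])
	uuid := fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
	return prefix, uuid, nil
}

func main() {
	prefix, uuid, err := splitTypeID("node_01h455vb4pex5vsknk084sn02q")
	fmt.Println(prefix, uuid, err)
}
```

The decoded UUID is what would be written to the Postgres uuid column, while the prefix would be checked against the registry at the service layer.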
2. The prefix vocabulary
The TSC should own and publish a canonical OpenCHAMI TypeID prefix registry.
Proposed initial prefixes:
Prefix    Resource
node      Compute node
bmc       Baseboard management controller
chassis   Chassis or blade enclosure
rack      Physical rack
switch    Network switch
gpu       GPU device (child of node)
nic       Network interface card
dimm      Memory module
pdu       Power distribution unit
This list is not exhaustive and grows by TSC approval. Just as the xname grammar
was governed by Cray, the TypeID prefix registry is governed by the OpenCHAMI
TSC. It is a governance artifact, not a code artifact — a short Markdown
document in the .github or community repo.
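3. What the new inventory record looks like
{
  "id": "node_01926e3ca4f17xyz8ab9cd0ef1",
  "deviceType": "Node",
  "manufacturer": "HPE",
  "partNumber": "P38472-B21",
  "serialNumber": "CZ123456789",
  "parentID": "chassis_01926e3ca4f07abc1de2fg3hi",
  "properties": {
    "xname": "x1000c1s7b0n0",
    "nid": 42,
    "tpm.idevid_cert": "..."
  }
}
node_... is a node. chassis_... is its parent. The parentID is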
type-legible without a lookup. When a blade moves from slot 7 to slot 3,
magellan updates properties.xname. FRU history, lifecycle state, attestation
certificates, and error logs all remain attached to the same node_... TypeID.
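The separation of identity from location can be sketched as a Go struct. The field set follows the record shape used in this RFD (id, deviceType, parentID, properties, and so on); the `Device` type and `Relocate` method shown here are illustrative assumptions, not inventory-service's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Device is a hypothetical Go mirror of the proposed inventory record.
type Device struct {
	ID           string                     `json:"id"`
	DeviceType   string                     `json:"deviceType"`
	Manufacturer string                     `json:"manufacturer"`
	PartNumber   string                     `json:"partNumber"`
	SerialNumber string                     `json:"serialNumber"`
	ParentID     string                     `json:"parentID"`
	Properties   map[string]json.RawMessage `json:"properties"`
}

// Relocate records a new physical location without touching the identity.
func (d *Device) Relocate(xname string) error {
	raw, err := json.Marshal(xname)
	if err != nil {
		return err
	}
	if d.Properties == nil {
		d.Properties = map[string]json.RawMessage{}
	}
	d.Properties["xname"] = raw
	return nil
}

func main() {
	d := Device{ID: "node_01926e3ca4f17xyz8ab9cd0ef1", DeviceType: "Node"}
	_ = d.Relocate("x1000c1s7b0n0") // blade sits in slot 7
	_ = d.Relocate("x1000c1s3b0n0") // blade moved to slot 3
	out, _ := json.Marshal(d)
	fmt.Println(string(out))
}
```

Anything keyed on `ID` (history, lifecycle state, certificates) is unaffected by the move; only `properties.xname` changes.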
4. Changes to inventory-service
inventory-service is the right vehicle for this implementation; the concern
is the schema, not the code. Concretely:
Adopt TypeID as the primary key. The fabrica code-generation model
derives schema from Go structs. The id field changes from an xname string
to a TypeID wrapper type. The UUID v7 suffix is stored in a native uuid
Postgres column for index efficiency; the prefix is validated at the service
layer. This is a struct change, not a live-database migration.
Implement the Device model from RFD #112 (Hardware Inventory API and Data Model Proposal): properties as map[string]json.RawMessage, parentID as a TypeID reference, deviceType
as an extensible string rather than a hard-coded Cray enum.
Magellan writes xname into properties. Magellan populates properties.xname as part of its inventory push. Location is discoverable
data, not foundational identity.
Query by property. GET /devices?properties.xname=x1000c1s7b0n0 lets
existing callers resolve a TypeID from an xname without knowing it up front.
The service returns the TypeID; callers use it for all subsequent operations.
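A caller-side sketch of the xname-to-TypeID resolution might look like the following. The request path and query parameter come from this RFD; the response shape (a JSON array of devices, each with an `id` field) and the helper names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// lookupURL builds the property-filter request for a given xname.
func lookupURL(base, xname string) string {
	q := url.Values{}
	q.Set("properties.xname", xname)
	return strings.TrimRight(base, "/") + "/devices?" + q.Encode()
}

// resolveID extracts the TypeID from an assumed response body: a JSON array
// of matching devices. Exactly one match is required.
func resolveID(body []byte) (string, error) {
	var matches []struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(body, &matches); err != nil {
		return "", err
	}
	if len(matches) != 1 {
		return "", fmt.Errorf("expected exactly one device, got %d", len(matches))
	}
	return matches[0].ID, nil
}

func main() {
	// inventory.example is a placeholder host.
	fmt.Println(lookupURL("http://inventory.example/", "x1000c1s7b0n0"))
	id, err := resolveID([]byte(`[{"id":"node_01926e3ca4f17xyz8ab9cd0ef1"}]`))
	fmt.Println(id, err)
}
```

Requiring exactly one match is a design choice in this sketch: an xname that resolves to zero or several devices is treated as an error rather than silently picking one.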
5. The SMD compatibility shim
Services that currently talk to SMD — bss, cloud-init, coresmd, power-control, ansible-smd-inventory — cannot all migrate simultaneously.
An explicit compatibility shim provides runway.
Design: The shim is a thin reverse proxy that translates between the
SMD-compatible wire format (xname-primary, Cray vocabulary) and the inventory-service API (TypeID-primary, #112 model). It is deployed alongside inventory-service, not inside it. Specifically, the shim:
Accepts requests on the /hsm/v2/... path prefix (SMD's existing API shape).
Translates xname-based queries to properties.xname lookups against inventory-service.
Translates TypeID responses back to xname-primary responses for the legacy
caller.
Is stateless and carries no inventory data of its own.
Ships with a committed sunset date — not open-ended compatibility.
Scope: The shim covers the read-heavy paths that existing services actually
use: State/Components, Inventory/Hardware, hsm/v2/State/Components group
queries. It does not need to replicate SMD's discovery endpoints, which are
magellan's responsibility.
Sunset timeline: The shim should be deprecated on a published date,
suggested 12–18 months after inventory-service reaches feature parity with
SMD's inventory capabilities. Sites that have not migrated by that date are
pinned to the previous SMD release.
6. Changes to fabrica
TypeID adoption in inventory-service surfaces a natural next step: update
fabrica so that all fabrica-generated services benefit automatically.
What changes in fabrica:
GenerateUID / GenerateUIDForResource in pkg/resource/resource.go:
replace prefix-<8hex> generation with TypeID generation
(prefix_<uuidv7-base32>). Add github.com/jetify-com/typeid-go as a
dependency.
ParseUID: currently assumes strings.Split(uid, "-") with a hex suffix.
Replace with TypeID parsing, or retain as a format-aware utility that accepts
both formats during a transition window.
IsValidUID and GetResourceTypeFromUID: update to match.
routes.go.tmpl: the line resource.RegisterResourcePrefix("{{.Name}}", "{{toLower .Name}}") continues to work, because toLower produces a valid TypeID prefix for any resource name made up only of ASCII letters. For example, DiscoverySnapshot → discoverysnapshot is a valid TypeID prefix.
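The "format-aware utility" option for ParseUID could look roughly like this. It is a sketch, not fabrica's code: `parseUIDCompat` and the `uidFormat` type are hypothetical, and the legacy pattern assumes lowercase alphanumeric prefixes followed by exactly eight hex characters.

```go
package main

import (
	"fmt"
	"regexp"
)

type uidFormat int

const (
	formatLegacy uidFormat = iota // prefix-<8 hex chars>
	formatTypeID                  // prefix_<26 base32 chars>
)

var (
	// Assumed shape of the existing fabrica IDs.
	legacyRE = regexp.MustCompile(`^([a-z][a-z0-9]*)-([0-9a-f]{8})$`)
	// TypeID: letter/underscore prefix, then a Crockford base32 suffix whose
	// first character is at most '7'.
	typeIDRE = regexp.MustCompile(`^([a-z](?:[a-z_]{0,61}[a-z])?)_([0-7][0-9a-hjkmnp-tv-z]{25})$`)
)

// parseUIDCompat returns the resource prefix and the detected format.
func parseUIDCompat(uid string) (string, uidFormat, error) {
	if m := typeIDRE.FindStringSubmatch(uid); m != nil {
		return m[1], formatTypeID, nil
	}
	if m := legacyRE.FindStringSubmatch(uid); m != nil {
		return m[1], formatLegacy, nil
	}
	return "", 0, fmt.Errorf("unrecognized UID format: %q", uid)
}

func main() {
	for _, uid := range []string{"device-1a2b3c4d", "device_01926e3ca4f17xyz8ab9cd0ef1", "device_1a2b3c4d"} {
		prefix, format, err := parseUIDCompat(uid)
		fmt.Println(prefix, format, err)
	}
}
```

Checking the TypeID pattern first keeps the two formats unambiguous, since a legacy ID always contains a hyphen and a TypeID never does.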
Breaking change assessment: The UID field is typed string throughout the
fabrica storage layer (StorageBackend, ent schema field.String("uid")).
There are no uuid.UUID typed fields to migrate. ParseUID is not called in
any generated handler or storage template — it is a diagnostic utility. The file
backend uses the UID only as a filename component, guarding only against path
traversal (. and /). New-format TypeIDs are valid filenames.
The breaking change is data-level, not code-level: existing records stored
with device-1a2b3c4d format UIDs coexist with new device_01926e... format
records in the same database, because the UID column is a plain string. A
one-time migration utility should be provided to rewrite existing records to
TypeID format for services that want consistency. fru-tracker is at v0.1.0
with no production deployments, so the timing is favorable.
7. Composability constraint (unchanged)
OpenCHAMI components must not hard-depend on heavy external systems to function. inventory-service must be deployable standalone. Sites that run DCIM platforms
(Nautobot, NetBox, or others) can build integration adapters against the inventory-service API; those adapters are optional components outside inventory-service's deployment footprint.
8. Spec publication (end state, not precondition)
Once inventory-service has converged through real use, extract the OpenAPI
spec from the implementation and publish it with a conformance test suite.
The spec is a deliverable, not a constraint. The TypeID prefix registry should
be published alongside it as a companion governance document.
Alternatives Considered
Option A: Continue inventory-service as xname-compatible SMD rewrite
Keep the xname as primary key, fix the discrete bugs, and accept the identity
model as-is. Lowest implementation risk. Preserves full backward compatibility
with all existing consumers.
Why not recommended: The instability problems in #40 and #29 remain
structural. The lifecycle state machine (#91) cannot cleanly answer "what
happens to node state when hardware moves?" The TPM work (#40) must either bolt
TPM IDs on as secondary keys alongside xnames (increasing complexity) or create
a separate identity service (introducing consistency problems). Every downstream
issue that stems from location-as-identity gets harder to address the longer the
xname is the primary key.
Option B: Plain UUID without a type prefix
Use bare UUID as in #112's original proposal. Globally unique, location-stable,
universally supported by databases and language runtimes.
Why not recommended as primary format: Opaque to operators — no signal about
resource type in the identifier itself. Creates a latent bug class where a BMC
UUID can be passed to a function expecting a node UUID without any type-system
signal. TypeID adds readability and Go type safety at negligible cost over bare
UUID v7. If the TSC judges TypeID's non-IETF-standard status unacceptable, UUID
v7 (RFC 9562) is the preferred fallback. UUID v4 should not be chosen because
its random insertion order causes B-tree index fragmentation at HPC scale.
Option C: Build from scratch rather than evolve inventory-service
Implement the #112 model in a new service rather than evolving inventory-service.
Why not recommended: inventory-service already has meaningful work done:
fabrica alignment, tokensmith integration, BMC-plane separation, initial Go
structure. The concern is the schema, not the code. The schema change (struct
update + TypeID dependency) is far less work than building a new service.
Option D: Permanent dual-API (TypeID internal, xname external forever)
Maintain both a TypeID-primary internal model and a permanent xname-primary API
surface, supporting both indefinitely rather than time-bounding the shim.
Why not recommended: Permanent dual-API creates maintenance burden and
removes the incentive for consumers to migrate. The shim should have a published
sunset date. Open-ended compatibility is a slow path to SMD's current situation.
Option E: Incremental SMD (do nothing)
Patch bugs in SMD, defer replacement. Lowest risk, but architectural debt
compounds. The xname instability is not addressable through bug patches.
xname doesn't disappear; it loses primary key status. Sites with
operational tooling built on xnames — console servers, Slurm node names,
runbooks — don't need to change overnight. The xname lives in properties,
is queryable via the property filter API, and magellan continues to populate
it. The shim keeps legacy services working during the transition.
Magellan's role expands slightly. Under the current xname model, magellan
registers hardware by submitting an xname. Under the TypeID model, magellan
submits hardware with its manufacturer/serial/part data; inventory-service
assigns the TypeID; magellan annotates the returned device with properties.xname based on its scan context. This is a small but real change
to the magellan → inventory-service handshake and warrants a follow-on RFD.
coresmd is the hardest downstream consumer. It calls SMD's API directly
for DHCP lease generation keyed on xname. The shim covers this during the
transition. Long-term, coresmd resolves the xname to a TypeID via GET /devices?properties.xname=... and then uses the TypeID for all
subsequent operations within a session.
Lifecycle state machine (RFD #91). State attaches to a TypeID-identified
device, not a location. Moving hardware updates properties.xname; the
lifecycle record stays attached to the TypeID and is unaffected.
Multi-vendor Redfish (roadmap RFD #123, Redfish Interface Strategy). Vendor-specific Redfish quirks belong
only in magellan. Magellan's vendor normalization produces a clean Device
record regardless of vendor; inventory-service never sees vendor-specific
encoding.
Single-maintainer risk. inventory-service has one primary committer.
Growing active contributors is a parallel objective, independent of this
decision.
Performance. TypeID suffixes are UUID v7 values, which are time-ordered, so
new rows land at the right-hand edge of the B-tree and cause far fewer page
splits than random keys. This should be a performance improvement over UUID v4
and is expected to be comparable to string xname columns at HPC scale. Verify
against SMD's production workload before full promotion.
Phasing and feature parity. The first milestone for inventory-service
under the TypeID model: feature parity with SMD's inventory capabilities (not
its discovery capabilities, which are magellan's responsibility). New features
and SMD feature pruning are deferred to subsequent RFDs.
Data sovereignty. Once the implementation has converged, the TSC should
own and publish both the API spec and the TypeID prefix registry independently
of any single implementation.
TypeID primary key in inventory-service. Add github.com/jetify-com/typeid-go. Change id field to TypeID; store UUID
v7 suffix in a native uuid Postgres column.
properties map. Implement map[string]json.RawMessage including xname as a first-class property key that magellan populates.
Query by property. GET /devices?properties.xname=... and similar filter
expressions for xname-based lookup.
SMD compatibility shim. Build and deploy the translation layer covering
the read paths bss, cloud-init, coresmd, remote-console and power-control actually
use. Commit a sunset date.
TSC prefix registry. Publish the canonical prefix list as a governance
document. Establish the process for adding prefixes.
TypeID in fabrica. Update GenerateUID / GenerateUIDForResource to
produce TypeIDs. Update ParseUID to accept TypeID format. Ship a migration
utility for services with existing prefix-<hex> records. Bump fabrica minor
version.
BSS, cloud-init, coresmd, power-control, remote-console adapters. Each downstream service
migrates from xname lookup to TypeID-first lookup with properties.xname
filter for legacy resolution. Coordinate with shim sunset.
Magellan handshake update. Adapt magellan to submit device data without
an xname primary key; receive a TypeID back; annotate with properties.xname.
Follow-on RFD for the full interface spec.
Decision Goal
Should OpenCHAMI adopt
openchami/inventory-serviceas the canonical replacement for SMD, retargeted to the data model proposed in #112 with TypeID as the primary identifier format, and bridged to existing SMD consumers through a time-bounded compatibility shim? This decision unblocks a cluster of dependent work — TPM enrollment (#40), geolocated IP assignment for hot-swap (#29), the node lifecycle state machine (#91), and the hardware inventory data model (#112) — all of which are blocked or constrained by the identity question.Category
Architecture
Stakeholders / Affected Areas
Every site operating OpenCHAMI and every service that currently talks to SMD:
bss,boot-service,ochami,cloud-init,metadata-service,power-control,coresmd,magellan,ansible-smd-inventory, and all downstream community deployments. The TSC owns the outcome. HPE and Dell (as vendor members) have production deployment interest. This decision also directly affectsopenchami/fabrica— the code generation framework — and every service built on it (fru-tracker,boot-service, and future fabrica-generated services).Decision Needed By
Before further investment lands in
inventory-service, before #112 is accepted or rejected, and before #40 can complete its identity model. Realistically the next 1–2 TSC meetings.Problem Statement
A. The xname conflation problem
SMD inherited from Cray's CSM a single string — the xname (e.g.,
x1000c1s7b0n0) — that simultaneously encodes two distinct concepts:index a component currently occupies.
These are not the same thing, and treating them as the same causes compounding
problems throughout the stack:
while unique across the system, isn't stable. It is possible and even
somewhat common to remove a blade from one chassis and replace it in
another." The TPM work requires a stable, hardware-bound identity — IDevID /
LAK certificates — that travels with the device, not the slot.
to a new xname, its BMC MAC address changes, breaking the expectation that
management IPs are location-stable. The proposed workaround — rewriting MAC
addresses — is hardware-dependent. The root cause is the xname simultaneously
driving DHCP identity and hardware identity.
service. The answer is ambiguous as long as "the node" means "the location"
rather than "the device." Moving a blade silently resets its lifecycle history
under the current model.
x,c,s,b,nforcabinet, chassis, slot, BMC, node) is Cray-specific and meaningless to
operators using non-Cray hardware or non-Cray rack naming conventions.
B.
inventory-serviceinherits this debt by constructionopenchami/inventory-serviceis a
fabrica-generated, API-compatible rewrite of SMD. It has valuableproperties — BMC-discovery has been pruned, tokensmith handles AuthN/AuthZ, and
it runs as a lightweight Go binary. But because it is API-compatible, it
inherits SMD's schema, including the xname as primary key. Continuing
inventory-serviceas-is is implicitly a decision to continue thexname-as-identity model. That should be a conscious choice, not a default.
C. RFD #112 already proposed a clean data model
RFD #112 (Hardware Inventory API, proposed Sept 2025) independently arrived at
a different model:
Device.idis a UUID — stable, system-assigned, not location-encoded.a
propertiesentry that magellan populates when it knows it.parentIDexpresses hierarchy (GPU belongs to Node, Node belongs to Chassis)without Cray vocabulary.
propertiesmap handles vendor-specific fields without schemachanges.
This model directly resolves the instability problems in #40 and #29. It is
also the foundation of #112's proposed Inventory / History / Collection API
trio. However, #112 has no implementation and its relationship to
inventory-servicehas not been decided. This RFD decides it.D. Plain UUIDs are stable but opaque
A bare UUID (
d5e6f7a8-b9c0-d1e2-f3g4-h5i6j7k8l9m0) carries no informationabout what kind of thing it identifies. An operator grepping a log file or
debugging a boot failure has no way to distinguish a node ID from a BMC ID
without a separate lookup. This replaces Cray vocabulary friction with total
opacity.
It also creates a practical bug class: nothing in the type system prevents
passing a BMC identifier to a function expecting a node identifier. SMD's xnames
at least made the shape visible —
x1000c1s7b0andx1000c1s7b0n0areobviously different kinds of thing. A UUID-only model loses that signal entirely.
The xname's readability was a feature. The proposed solution preserves it.
E. fabrica already has a prefixed ID scheme — but with insufficient entropy
openchami/fabricaalready generates IDs in the formprefix-<8-random-hex-chars>(e.g.,
device-1a2b3c4d).fru-trackerregisters"Device" → "device"anduses
GenerateUIDForResource("Device")to produce these IDs at record creation.This scheme has the right instinct — type prefix plus opaque suffix — but two
problems prevent using it as-is for the inventory plane:
global uniqueness across federated sites. A birthday attack reaches 50%
collision probability at ~65,000 records per prefix.
ParseUIDenforcesprefix-hexformat usingstrings.Split(uid, "-"),so it would reject TypeID's underscore-separated, base32-encoded format.
TypeID is the natural evolution of fabrica's existing instinct, with a published
spec, a Go library, and a UUID v7 suffix that is both globally unique and
time-sortable.
F. The existing SMD bugs that motivated this work
Real and unresolved, but secondary to the identity question:
remote-console.second-class.
profile-based rollouts awkward.
Any replacement path addresses these. The identity question cannot be fixed by
patching SMD.
Proposed Solution
Retarget
inventory-serviceto implement the #112Devicemodel with TypeIDas the primary identifier, and bridge existing SMD consumers through a
time-bounded compatibility shim.
1. What TypeID is
TypeID is a published open specification
for type-safe, time-sortable, globally unique identifiers, inspired by Stripe's
cus_,ch_pattern. A TypeID has two parts separated by an underscore:snake_casestring. An operator readingany log line, API response, or database record immediately knows the kind of
resource.
monotonically time-sortable, 26 characters, URL-safe without quoting.
node_01926e3ca4f17xyz8ab9cd0ef1.TypeID is supported by a Go library (
github.com/jetify-com/typeid-go) andimplementations in most major languages. UUID v7 is an IETF standard (RFC 9562,
2024); Postgres stores it natively in the
uuidcolumn type, and thetime-ordered insertion pattern eliminates the B-tree index fragmentation of UUID
v4.
2. The prefix vocabulary
The TSC should own and publish a canonical OpenCHAMI TypeID prefix registry.
Proposed initial prefixes:
nodebmcchassisrackswitchgpunicdimmpduThis list is not exhaustive and grows by TSC approval. Just as the xname grammar
was governed by Cray, the TypeID prefix registry is governed by the OpenCHAMI
TSC. It is a governance artifact, not a code artifact — a short Markdown
document in the
.githuborcommunityrepo.3. What the new inventory record looks like
{ "id": "node_01926e3ca4f17xyz8ab9cd0ef1", "deviceType": "Node", "manufacturer": "HPE", "partNumber": "P38472-B21", "serialNumber": "CZ123456789", "parentID": "chassis_01926e3ca4f07abc1de2fg3hi", "properties": { "xname": "x1000c1s7b0n0", "nid": 42, "tpm.idevid_cert": "..." } }node_...is a node.chassis_...is its parent. TheparentIDistype-legible without a lookup. When a blade moves from slot 7 to slot 3,
magellan updates
properties.xname. FRU history, lifecycle state, attestationcertificates, and error logs all remain attached to the same
node_...TypeID.4. Changes to
inventory-serviceinventory-serviceis the right vehicle for this implementation; the concernis the schema, not the code. Concretely:
fabricacode-generation modelderives schema from Go structs. The
idfield changes from an xname stringto a TypeID wrapper type. The UUID v7 suffix is stored in a native
uuidPostgres column for index efficiency; the prefix is validated at the service
layer. This is a struct change, not a live-database migration.
Devicemodel from [RFD]: Hardware Inventory API and Data Model Proposal #112 —propertiesasmap[string]json.RawMessage,parentIDas a TypeID reference,deviceTypeas an extensible string rather than a hard-coded Cray enum.
properties. Magellan populatesproperties.xnameas part of its inventory push. Location is discoverabledata, not foundational identity.
GET /devices?properties.xname=x1000c1s7b0n0letsexisting callers resolve a TypeID from an xname without knowing it up front.
The service returns the TypeID; callers use it for all subsequent operations.
5. The SMD compatibility shim
Services that currently talk to SMD —
bss,cloud-init,coresmd,power-control,ansible-smd-inventory— cannot all migrate simultaneously.An explicit compatibility shim provides runway.
Design: The shim is a thin reverse proxy that translates between the
SMD-compatible wire format (xname-primary, Cray vocabulary) and the
inventory-serviceAPI (TypeID-primary, #112 model). It is deployed alongsideinventory-service, not inside it.Specifically, the shim:
/hsm/v2/...path prefix (SMD's existing API shape).properties.xnamelookups againstinventory-service.caller.
Scope: The shim covers the read-heavy paths that existing services actually
use:
State/Components,Inventory/Hardware,hsm/v2/State/Componentsgroupqueries. It does not need to replicate SMD's discovery endpoints, which are
magellan's responsibility.
Sunset timeline: The shim should be deprecated on a published date,
suggested 12–18 months after
inventory-servicereaches feature parity withSMD's inventory capabilities. Sites that have not migrated by that date are
pinned to the previous SMD release.
6. Changes to
fabricaTypeID adoption in
inventory-servicesurfaces a natural next step: updatefabrica so that all fabrica-generated services benefit automatically.
What changes in fabrica:
GenerateUID/GenerateUIDForResourceinpkg/resource/resource.go:replace
prefix-<8hex>generation with TypeID generation(
prefix_<uuidv7-base32>). Addgithub.com/jetify-com/typeid-goas adependency.
ParseUID: currently assumesstrings.Split(uid, "-")with a hex suffix.Replace with TypeID parsing, or retain as a format-aware utility that accepts
both formats during a transition window.
IsValidUIDandGetResourceTypeFromUID: update to match.routes.go.tmpl: the lineresource.RegisterResourcePrefix("{{.Name}}", "{{toLower .Name}}")continues to work —toLowerproduces valid TypeID prefixes.DiscoverySnapshot→discoverysnapshotis a valid TypeID prefix.Breaking change assessment: The UID field is typed
stringthroughout thefabrica storage layer (
StorageBackend, ent schemafield.String("uid")).There are no
uuid.UUIDtyped fields to migrate.ParseUIDis not called inany generated handler or storage template — it is a diagnostic utility. The file
backend uses the UID only as a filename component, guarding only against path
traversal (
.and/). New-format TypeIDs are valid filenames.The breaking change is data-level, not code-level: existing records stored
with
device-1a2b3c4dformat UIDs coexist with newdevice_01926e...formatrecords in the same database, because the UID column is a plain string. A
one-time migration utility should be provided to rewrite existing records to
TypeID format for services that want consistency.
fru-trackeris at v0.1.0with no production deployments, so the timing is favorable.
7. Composability constraint (unchanged)
OpenCHAMI components must not hard-depend on heavy external systems to function.
inventory-servicemust be deployable standalone. Sites that run DCIM platforms(Nautobot, NetBox, or others) can build integration adapters against the
inventory-serviceAPI; those adapters are optional components outsideinventory-service's deployment footprint.8. Spec publication (end state, not precondition)
Once
inventory-servicehas converged through real use, extract the OpenAPIspec from the implementation and publish it with a conformance test suite.
The spec is a deliverable, not a constraint. The TypeID prefix registry should
be published alongside it as a companion governance document.
Alternatives Considered
Option A: Continue inventory-service as xname-compatible SMD rewrite
Keep the xname as primary key, fix the discrete bugs, and accept the identity
model as-is. Lowest implementation risk. Preserves full backward compatibility
with all existing consumers.
Why not recommended: The instability problems in #40 and #29 remain
structural. The lifecycle state machine (#91) cannot cleanly answer "what
happens to node state when hardware moves?" The TPM work (#40) must either bolt
TPM IDs on as secondary keys alongside xnames (increasing complexity) or create
a separate identity service (introducing consistency problems). Every downstream
issue that stems from location-as-identity gets harder to address the longer the
xname is the primary key.
Option B: Plain UUID without a type prefix
Use bare UUID as in #112's original proposal. Globally unique, location-stable,
universally supported by databases and language runtimes.
Why not recommended as primary format: Opaque to operators — no signal about
resource type in the identifier itself. Creates a latent bug class where a BMC
UUID can be passed to a function expecting a node UUID without any type-system
signal. TypeID adds readability and Go type safety at negligible cost over bare
UUID v7. If the TSC judges TypeID's non-IETF-standard status unacceptable, UUID
v7 (RFC 9562) is the preferred fallback. UUID v4 should not be chosen because
its random insertion order causes B-tree index fragmentation at HPC scale.
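The type-safety argument can be illustrated with a small Go sketch. It deliberately uses only the standard library and does not reproduce the typeid-go API; NodeID, BMCID, ParseNodeID, ParseBMCID, and powerCycle are hypothetical names, and the bmc prefix is an assumed registry entry. Only the prefix is checked here; suffix validation is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeID and BMCID are distinct Go types, so one cannot be passed where
// the other is expected, even though both wrap a string.
type NodeID struct{ value string }
type BMCID struct{ value string }

func (id NodeID) String() string { return id.value }
func (id BMCID) String() string  { return id.value }

// parseWithPrefix checks only the type prefix of a TypeID-style string.
func parseWithPrefix(s, prefix string) (string, error) {
	if !strings.HasPrefix(s, prefix+"_") {
		return "", fmt.Errorf("expected %q prefix, got %q", prefix, s)
	}
	return s, nil
}

func ParseNodeID(s string) (NodeID, error) {
	v, err := parseWithPrefix(s, "node")
	return NodeID{v}, err
}

func ParseBMCID(s string) (BMCID, error) {
	v, err := parseWithPrefix(s, "bmc")
	return BMCID{v}, err
}

// powerCycle accepts only a NodeID; passing a BMCID is a compile error.
func powerCycle(id NodeID) string { return "power-cycling " + id.String() }

func main() {
	node, err := ParseNodeID("node_01h455vb4pex5vsknk084sn02q")
	if err != nil {
		panic(err)
	}
	fmt.Println(powerCycle(node))

	// A BMC identifier is rejected at parse time by its prefix.
	if _, err := ParseNodeID("bmc_01h455vb4pex5vsknk084sn02q"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With bare UUIDs, neither check is available: the string carries no prefix to validate at the boundary, and both identifiers would typically share one Go type.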
Option C: Build from scratch rather than evolve inventory-service
Implement the #112 model in a new service rather than evolving
inventory-service.
Why not recommended: inventory-service already has meaningful work done: fabrica alignment, tokensmith integration, BMC-plane separation, initial Go
structure. The concern is the schema, not the code. The schema change (struct
update + TypeID dependency) is far less work than building a new service.
Option D: Permanent dual-API (TypeID internal, xname external forever)
Maintain both a TypeID-primary internal model and a permanent xname-primary API
surface, supporting both indefinitely rather than time-bounding the shim.
Why not recommended: Permanent dual-API creates maintenance burden and
removes the incentive for consumers to migrate. The shim should have a published
sunset date. Open-ended compatibility is a slow path to SMD's current situation.
Option E: Incremental SMD (do nothing)
Patch bugs in SMD, defer replacement. Lowest risk, but architectural debt
compounds. The xname instability is not addressable through bug patches.
Comparison of approaches
✓✓ = strong · ✓ = adequate · ○ = workable but lossy · ✗ = poor
Other Considerations
Relationship between this RFD, #112 ([RFD]: Hardware Inventory API and Data Model Proposal), and inventory-service. These three things are currently independent. This RFD proposes resolving that: #112's Device model is the target schema for inventory-service, and TypeID replaces #112's original bare UUID id field. If there are concerns with #112's model, raise them in that issue; this RFD assumes the location-decoupled, properties-map approach is directionally correct.
xname doesn't disappear; it loses primary key status. Sites with operational tooling built on xnames (console servers, Slurm node names, runbooks) don't need to change overnight. The xname lives in properties, is queryable via the property filter API, and magellan continues to populate it. The shim keeps legacy services working during the transition.
Magellan's role expands slightly. Under the current xname model, magellan
registers hardware by submitting an xname. Under the TypeID model, magellan
submits hardware with its manufacturer/serial/part data; inventory-service assigns the TypeID; magellan annotates the returned device with properties.xname based on its scan context. This is a small but real change to the magellan → inventory-service handshake and warrants a follow-on RFD.
coresmd is the hardest downstream consumer. It calls SMD's API directly
for DHCP lease generation keyed on xname. The shim covers this during the
transition. Long-term, coresmd resolves the xname to a TypeID via GET /devices?properties.xname=... and then uses the TypeID for all subsequent operations within a session.
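The resolve-once-per-session pattern described for coresmd might look like the sketch below. Everything beyond the GET /devices?properties.xname=... filter is an assumption made for illustration: the host inventory.example, the helper names xnameLookupURL and sessionResolver, and the stand-in lookup function that replaces the real HTTP call and response decoding.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// xnameLookupURL builds the property-filter query named in the text.
// The base URL is a placeholder supplied by the caller.
func xnameLookupURL(base, xname string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/devices"
	q := url.Values{}
	q.Set("properties.xname", xname)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// sessionResolver caches xname -> TypeID so the lookup happens once per
// session. The lookup func performs the HTTP GET in real use.
type sessionResolver struct {
	lookup func(xname string) (string, error)
	cache  map[string]string
}

func newSessionResolver(lookup func(string) (string, error)) *sessionResolver {
	return &sessionResolver{lookup: lookup, cache: map[string]string{}}
}

// Resolve returns the cached TypeID, calling lookup only on a miss.
func (r *sessionResolver) Resolve(xname string) (string, error) {
	if id, ok := r.cache[xname]; ok {
		return id, nil
	}
	id, err := r.lookup(xname)
	if err != nil {
		return "", err
	}
	r.cache[xname] = id
	return id, nil
}

func main() {
	u, err := xnameLookupURL("http://inventory.example", "x1000c1s7b0n0")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)

	calls := 0
	r := newSessionResolver(func(xname string) (string, error) {
		calls++ // stand-in for the HTTP GET and response decoding
		return "node_01h455vb4pex5vsknk084sn02q", nil
	})
	for i := 0; i < 3; i++ {
		id, _ := r.Resolve("x1000c1s7b0n0")
		fmt.Println(id)
	}
	fmt.Println("lookups performed:", calls)
}
```

A cache like this is only safe for the length of a session, since properties.xname changes when hardware moves; how long a coresmd session may hold a mapping is a design question this sketch does not answer.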
TPM identity storage (#40, TPM Enrollment and secure secret delivery). The TPM RFD can store tpm.idevid_cert and related fields in properties without a separate identity service. The stable node_... TypeID is the anchor. This resolves the design fork in #40 between "add to SMD" vs. "create a new service."
Lifecycle state machine (#91). State attaches to a TypeID-identified device, not a location. Moving hardware updates properties.xname; the lifecycle record stays attached to the TypeID and is unaffected.
Multi-vendor Redfish (roadmap #123, Redfish Interface Strategy). Vendor-specific Redfish quirks belong only in magellan. Magellan's vendor normalization produces a clean Device record regardless of vendor; inventory-service never sees vendor-specific encoding.
Single-maintainer risk. inventory-service has one primary committer. Growing active contributors is a parallel objective, independent of this decision.
Performance. TypeID suffixes are UUID v7 values: they are time-ordered, so new rows append at the right-hand edge of a B-tree index instead of splitting pages at random positions. This is expected to be a performance improvement over UUID v4 and comparable to string xname columns at HPC scale. Verify against SMD's production workload before full promotion.
Phasing and feature parity. The first milestone for inventory-service under the TypeID model is feature parity with SMD's inventory capabilities (not
its discovery capabilities, which are magellan's responsibility). New features
and SMD feature pruning are deferred to subsequent RFDs.
Data sovereignty. Once the implementation has converged, the TSC should
own and publish both the API spec and the TypeID prefix registry independently
of any single implementation.
Work Items
Device model as inventory-service's target schema.
github.com/jetify-com/typeid-go. Change id field to TypeID; store UUIDv7 suffix in a native uuid Postgres column.
properties map. Implement map[string]json.RawMessage including xname as a first-class property key that magellan populates.
GET /devices?properties.xname=... and similar filter expressions for xname-based lookup.
the read paths bss, cloud-init, coresmd, remote-console and power-control actually use. Commit a sunset date.
document. Establish the process for adding prefixes.
GenerateUID/GenerateUIDForResource to produce TypeIDs. Update ParseUID to accept TypeID format. Ship a migration utility for services with existing prefix-<hex> records. Bump fabrica minor version.
migrates from xname lookup to TypeID-first lookup with properties.xname filter for legacy resolution. Coordinate with shim sunset.
an xname primary key; receive a TypeID back; annotate with properties.xname. Follow-on RFD for the full interface spec.
tpm.idevid_cert and related fields in
properties.xname.
inventory-service.
deviceType field plus open prefix registry; virtual nodes use prefix node, distinguished by deviceType.
Coordinate shim sunset with magellan rollout progress.
SMD's production workload before promoting to default.
Related Docs / PRs
id field)
openchami/smd
openchami/inventory-service
openchami/magellan · magellan #129
openchami/fabrica
openchami/fru-tracker