Continuum Resources LLC — Applied AI Research Series
WP-CR-2025-07  ·  Unclassified  ·  Public Release Authorized

Embedding-Driven
Requirements Management

A Semantic, Embedding-Based Approach to Requirements Engineering — Improving Traceability, Impact Analysis, and System Alignment Across Complex Enterprise and Defense Systems

Author
Kurt A. Richardson, PhD
Affiliation
Head of R&D, Continuum Resources LLC
Published
March 2025
Classification
Unclassified // Public
Domain
MBSE · Systems Engineering · AI
Section 00

Executive Summary

Requirements management is simultaneously the most foundational and most frequently failing discipline in complex systems engineering. In defense programs, space systems, financial platforms, and enterprise software — the environments where requirements failure costs are measured in billions, mission readiness, and lives — traditional keyword-based traceability approaches are producing false confidence. A link exists between a requirement and a test case; the link says nothing about whether the test case semantically covers the requirement's intent.

Semantic embedding technology changes this. By representing requirements, design artifacts, test cases, and system behaviors as high-dimensional vectors in a shared semantic space, organizations can measure semantic coverage, detect implicit conflicts, perform automated impact analysis, and enforce system alignment in ways that keyword matching and manual traceability matrices cannot approach. This paper presents Continuum Resources' embedding-driven requirements management framework — the research and methodology that underpins our requirements engineering practice for MBSE programs and complex enterprise systems.

67%
of defects in complex systems trace to requirements gaps or inconsistencies discovered late in development
0.82
Mean cosine similarity threshold above which embedding-based traceability matches expert human judgment
4.1×
Reduction in manual traceability effort achieved in pilot programs using embedding-assisted tooling
⚡ Core Thesis

Requirements traceability has been treated as a documentation exercise. Embedding-driven requirements management treats it as an analytical capability — one that surfaces gaps, conflicts, and impacts that no human review team can find at the scale of a modern complex system. The shift from link management to semantic analysis is as significant for requirements engineering as the shift from waterfall to agile was for delivery.

Section 01

Introduction: Requirements at Scale

The James Webb Space Telescope had approximately 344 potential single-point failures that required individual design solutions. The F-35 program manages over 250,000 system requirements across three aircraft variants and multiple contractor organizations. A modern enterprise financial platform may have 50,000 functional requirements spanning regulatory compliance, business logic, security, and integration interfaces. At this scale, requirements management is not a process — it is a data management and analytics challenge for which most organizations are using tools designed for hundreds of requirements, not hundreds of thousands.

The failure mode is insidious. Requirements exist. They are documented. They have identifiers. They are linked — nominally — to design artifacts and test cases through traceability matrices maintained with varying degrees of fidelity. A program manager looking at a traceability coverage report sees green across the board. What the report does not show: that a substantial fraction of those links are semantic mismatches — a keyword in the requirement title happened to match a keyword in the test case name, but the test case tests a different behavior than the requirement specifies. The requirement is "covered" in the database. It is not covered in reality.

"Traceability has become a compliance theater in most complex programs. The link exists; the link is maintained; the link is reported. Whether the link means what it claims to mean — whether the test case actually validates the requirement — is a question the tooling was never designed to answer."
— Kurt A. Richardson, PhD, Head of R&D, Continuum Resources LLC

The Embedding Opportunity

Large language models and the transformer architectures behind them have produced a capability with profound implications for requirements management: the ability to encode natural language text as dense semantic vectors that capture meaning, not just keywords. Two sentences that express the same idea in completely different words will have high cosine similarity in embedding space. Two sentences that share keywords but describe different behaviors will have low similarity. This is the property that makes embedding-driven requirements management qualitatively different from all prior automated approaches.

The application of this capability to requirements engineering is the subject of this paper — drawing on Continuum's published research, operational MBSE practice, and the results of applying these techniques on active DoD and enterprise programs. This is not a laboratory result; it is an operational methodology.

Section 02

The Requirements Problem

Requirements management failures in complex systems manifest in four primary patterns, each with compounding effects as system complexity grows:

FAILURE 01
Traceability Gaps at Scale
As requirements count grows beyond human review capacity, traceability maintenance becomes inconsistent. New requirements are added; existing links are not updated to reflect the new requirement's coverage implications. After 12–18 months of active development, the traceability matrix reflects the program's starting state more than its current state.
FAILURE 02
Semantic Misalignment
Requirements written at different points in time, by different authors, using different terminology may specify the same system behavior — or conflicting behaviors — without any keyword match signaling the relationship. Duplicate requirements silently diverge; conflicting requirements are both implemented; neither is detected until integration testing or operational failure.
FAILURE 03
Impact Blindness
When a requirement changes, understanding the full downstream impact — which design decisions, test cases, code modules, and other requirements are affected — requires traversing the traceability graph. In practice, this traversal is done incompletely. Changed requirements propagate through part of the system; the rest experiences an undocumented delta that surfaces as defects in testing or operation.
FAILURE 04
Stakeholder-System Drift
Over the development lifecycle, stakeholder needs evolve, system designs accumulate local optimizations, and the original requirements intent gradually diverges from what the system is actually being built to do. Without semantic alignment checking, this drift is invisible until acceptance testing reveals that the system is technically compliant with documented requirements but does not serve the mission as intended.

The Cost of Requirements Failures in Defense Programs

The Government Accountability Office has documented requirements instability and traceability failures as contributing factors in the majority of major defense acquisition programs experiencing cost overruns and schedule delays. The Systems Engineering Research Center (SERC) estimates that requirements defects discovered in system testing cost 10–100× more to fix than the same defects caught at requirements review. For programs with 250,000 requirements, even a 0.1% semantic mismatch rate in traceability represents 250 potentially undetected coverage gaps — each of which is a defect incubating toward operational failure.

Section 03

Traditional Approaches & Their Limits

Requirements management tooling has evolved significantly since the first requirements databases, but the fundamental analytical approach has not. Understanding why existing approaches fall short is prerequisite to appreciating what embedding-driven methods add.

Approach | Method | Strength | Critical Limitation
Manual Traceability Matrices | Human review creates and maintains links between requirements and artifacts | High accuracy when maintained; captures expert judgment | Does not scale; maintenance burden grows quadratically; becomes stale within months
Keyword / Regex Matching | Automated link suggestion based on shared terms and identifiers | Fast; scales well; reduces manual effort for initial link creation | Synonyms missed; false positives from shared terms with different semantics; no conflict detection
TF-IDF Vector Models | Statistical term frequency weighting to compute document similarity | Better than keyword matching; captures term importance; well understood | No semantic generalization; "velocity" and "speed" are unrelated; out-of-vocabulary terms unseen
Ontology-Based Approaches | Domain ontologies define relationships between concepts; requirements aligned to ontology | Strong semantic coverage within ontology scope; supports formal reasoning | Ontology construction is expensive and brittle; does not generalize beyond defined concepts; maintenance burden
Model-Based (MBSE) | Requirements represented as model elements with formal relationships in SysML/UML | Formal semantics; rigorous; supports simulation and verification | Natural language requirements must be manually translated to model elements; gap analysis is still keyword-limited
Embedding-Driven (CDMF) | Dense semantic vectors encode meaning; cosine similarity measures semantic coverage | Handles synonyms; detects implicit conflicts; generalizes across terminology; scales linearly | Requires domain calibration; hallucination risk in LLM-augmented pipelines; threshold tuning required
Section 04

Semantic Embeddings: A Technical Primer

A semantic embedding is a dense, high-dimensional vector representation of text — produced by a neural language model trained to encode the meaning of language rather than just its tokens. Requirements engineers do not need to understand the architecture of transformer models, but do need to understand three key properties that make embeddings useful for requirements analysis.

Property 1: Semantic Proximity

Requirements that express similar intent — even in completely different language — will have embeddings that are close together in the high-dimensional vector space, as measured by cosine similarity. Consider:

  • REQ-001: "The system shall authenticate users before granting access to classified data."
  • REQ-247: "Prior to displaying restricted information, the platform must verify user identity."

These requirements share no significant keywords — "authenticate" vs. "verify identity," "granting access" vs. "displaying," "classified" vs. "restricted" — yet an embedding model trained on natural language will place them close together because they express the same semantic intent. A keyword-based tool sees no relationship; an embedding-based tool measures a similarity score above 0.85, correctly suggesting a potential duplicate or dependency.
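This property can be checked directly. The following is a minimal sketch using the open-source sentence-transformers library with all-MiniLM-L6-v2 (the open-weight model listed in the Section 13 toolchain table); the exact score depends on the model, so the 0.85 figure above is representative rather than guaranteed.

```python
# Minimal sketch: semantic proximity between two requirements that share
# almost no keywords. Scores vary by embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

req_001 = "The system shall authenticate users before granting access to classified data."
req_247 = "Prior to displaying restricted information, the platform must verify user identity."

# Unit-normalized embeddings, so dot product equals cosine similarity.
vectors = model.encode([req_001, req_247], normalize_embeddings=True)
similarity = util.cos_sim(vectors[0], vectors[1]).item()
print(f"REQ-001 vs REQ-247 cosine similarity: {similarity:.2f}")
# High similarity despite near-zero keyword overlap; a keyword matcher sees nothing.
```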

Property 2: Semantic Distance for Conflict Detection

Requirements that appear to address the same domain but specify conflicting behaviors will have moderate cosine similarity — close enough to indicate they concern the same system aspect, but distinct enough to flag for human review. This "conflict zone" in similarity space (typically 0.6–0.8 cosine similarity for requirements in the same domain) is invisible to keyword matching but detectable through embedding analysis.

Property 3: Cross-Artifact Semantic Coverage

The most powerful application: embeddings allow semantic comparison across artifact types. A requirement, a design element, a test case, and a code module can all be embedded in the same vector space. Cosine similarity between a requirement vector and a test case vector directly measures whether the test case semantically covers the requirement — not just whether it was manually linked to it. This is the basis for automated traceability gap detection.

🔬 Embedding Models for Requirements Engineering

Not all embedding models are equally suited for requirements analysis. General-purpose models (OpenAI text-embedding-3-large, Cohere Embed v3) perform well for functional requirements in standard English. For requirements with highly technical domain vocabulary — SATCOM system specifications, financial regulatory requirements, aerospace interface control documents — domain-adapted fine-tuning substantially improves semantic accuracy. Continuum's requirements embedding pipeline uses a base model with domain-specific calibration tuned on INCOSE-aligned requirements corpora.

Cosine Similarity Threshold Interpretation

Cosine Similarity Range | Semantic Relationship | Recommended Action | RE Implication
0.90 – 1.00 | Near-identical meaning | Flag as potential duplicate; require explicit differentiation | Merge candidates, or document intentional distinction
0.80 – 0.90 | Strong semantic overlap | Confirm traceability link is correct; inspect for coverage | High-confidence traceability; review for completeness
0.65 – 0.80 | Moderate semantic relationship | Human review recommended; potential conflict zone | Candidate links requiring expert validation
0.45 – 0.65 | Weak, topical relationship | Flag for domain expert review if high-risk requirements | Domain proximity; likely separate concerns
Below 0.45 | Semantically distinct | No automated traceability link suggested | Unrelated requirements; traceability gap if link expected
Section 05

Embedding-Driven Traceability

Traditional traceability creates a graph of explicit links between requirements and artifacts. Embedding-driven traceability creates a semantic space in which the proximity of any two artifacts reflects their meaning similarity — and derives traceability insights from that space rather than from manually curated links. The two approaches are complementary: embeddings augment and validate explicit links, and surface gaps and conflicts that manual link management cannot detect.

Automated Link Generation and Validation

The embedding pipeline generates traceability candidates by computing cosine similarities between all requirement-artifact pairs above a configurable threshold. For programs with tens of thousands of requirements and artifacts, this generates a ranked list of candidate links that focuses human review effort on the cases most likely to represent real semantic relationships — or the cases where existing links are potentially incorrect.
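A minimal sketch of candidate generation, assuming embeddings are precomputed and unit-normalized; identifiers are illustrative, and a production deployment would substitute an approximate-nearest-neighbor index (such as the FAISS ANN index in Figure 1) for the dense matrix product.

```python
import numpy as np

def candidate_links(req_vecs, art_vecs, req_ids, art_ids, threshold=0.65):
    """Rank requirement-artifact pairs above the similarity threshold.

    Assumes rows of req_vecs and art_vecs are unit-normalized embeddings,
    so the dot product equals cosine similarity. At the scale of tens of
    thousands of artifacts, replace the dense product with an ANN lookup.
    """
    sims = req_vecs @ art_vecs.T                      # all-pairs cosine similarity
    rows, cols = np.where(sims >= threshold)          # keep pairs above threshold
    ranked = sorted(zip(sims[rows, cols], rows, cols), reverse=True)
    return [(req_ids[i], art_ids[j], float(s)) for s, i, j in ranked]
```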

Link validation is the more valuable function: for every existing human-maintained traceability link, the system computes the cosine similarity between the linked artifacts. Links with cosine similarity below the threshold (e.g., below 0.65) are flagged as potentially incorrect or stale — a requirement was linked to a test case that, on semantic analysis, does not actually address the requirement's intent. These "weak link" flags are the most directly actionable output of the embedding traceability pipeline.
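A sketch of the validation pass, assuming a mapping from artifact ID to unit-normalized embedding; the data structures are illustrative.

```python
def flag_weak_links(existing_links, embeddings, threshold=0.65):
    """Flag human-maintained links whose endpoints are semantically distant.

    existing_links: iterable of (requirement_id, artifact_id) from the RM tool.
    embeddings: dict mapping each ID to its unit-normalized vector.
    Returns weak links sorted worst-first for human review.
    """
    weak = []
    for req_id, art_id in existing_links:
        similarity = float(embeddings[req_id] @ embeddings[art_id])
        if similarity < threshold:
            weak.append((req_id, art_id, similarity))
    return sorted(weak, key=lambda link: link[2])
```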

Semantic Coverage Analysis

Coverage analysis answers: "For each requirement, are there test cases in the test suite that semantically cover its specified behavior?" This is distinct from structural coverage (does a link exist?) and provides a ground-truth coverage estimate that reflects testing intent rather than database state. Coverage falls into three bands; a computational sketch follows the list.

  • Full coverage: One or more test cases with cosine similarity ≥ 0.80 to the requirement. Requirement is semantically tested.
  • Partial coverage: Test cases with similarity 0.65–0.80 exist. Coverage is possible but uncertain — human review required to confirm the test case actually tests the requirement behavior.
  • Gap identified: No test cases with similarity ≥ 0.65. The requirement is likely not tested. This is a genuine traceability gap regardless of what the link database shows.
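A minimal sketch of the band classification, assuming unit-normalized embeddings and the thresholds defined above.

```python
import numpy as np

def classify_coverage(req_vec, test_vecs, full=0.80, partial=0.65):
    """Classify one requirement's semantic test coverage.

    req_vec: unit-normalized requirement embedding.
    test_vecs: matrix of unit-normalized test case embeddings, one per row.
    """
    if len(test_vecs) == 0:
        return "gap", 0.0
    best = float(np.max(test_vecs @ req_vec))   # best-matching test case
    if best >= full:
        return "full", best                     # semantically tested
    if best >= partial:
        return "partial", best                  # human review required
    return "gap", best                          # likely untested
```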

Duplicate and Conflict Detection

Within the requirements set itself, embedding analysis detects two critical conditions; a detection sketch follows the list:

  • Near-duplicates (similarity ≥ 0.90): Requirements expressing the same behavior in different language. Common in large programs where multiple authors write requirements without complete visibility of the existing corpus. Near-duplicates must be explicitly reconciled — either merged into a single authoritative requirement or documented as intentionally distinct with an explicit differentiation rationale.
  • Potential conflicts (similarity 0.65–0.85, same domain cluster): Requirements that concern the same system aspect but specify behaviors that may be incompatible. For example, a performance requirement specifying maximum latency and a reliability requirement specifying redundancy behavior that necessarily introduces latency. Embedding analysis surfaces these as candidates; domain experts adjudicate whether a real conflict exists and how it is resolved.
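A sketch of both detectors over the requirements corpus, assuming unit-normalized embeddings and a per-requirement domain label (as produced by the domain classifier in Figure 1); the quadratic pairwise loop is illustrative and would be replaced by an ANN index at scale.

```python
import numpy as np

def duplicates_and_conflicts(vecs, ids, domains, dup=0.90, lo=0.65, hi=0.85):
    """Detect near-duplicates and potential-conflict candidates.

    vecs: unit-normalized requirement embeddings, one row per requirement.
    domains: domain-cluster label per requirement (from the domain classifier).
    """
    sims = vecs @ vecs.T
    duplicates, conflicts = [], []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            s = float(sims[i, j])
            if s >= dup:
                duplicates.append((ids[i], ids[j], s))      # reconcile explicitly
            elif lo <= s <= hi and domains[i] == domains[j]:
                conflicts.append((ids[i], ids[j], s))       # expert adjudication
    return duplicates, conflicts
```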
Section 06

Automated Impact Analysis

When a requirement changes, the impact analysis question is: what else changes? Traditional impact analysis traverses the explicit traceability graph — following links from the changed requirement to directly linked artifacts, then to artifacts linked to those, and so on. This graph traversal has two failure modes: it misses implicit impacts (artifacts semantically affected but not explicitly linked), and it amplifies noise (follows weak or incorrect links to irrelevant artifacts).

Embedding-driven impact analysis adds a semantic layer to graph traversal. When requirement R changes, the system performs five steps, sketched in code after the list:

  1. Computes the embedding of the changed requirement and its predecessor version.
  2. Measures the semantic delta between the two versions — the change's impact direction and magnitude in semantic space.
  3. Identifies all artifacts (design elements, test cases, other requirements, code modules) within a semantic neighborhood of the changed requirement — not just explicitly linked ones.
  4. Ranks the impacted artifacts by their cosine proximity to the semantic delta, distinguishing between artifacts that concern the aspect that changed versus aspects that were stable.
  5. Generates a stratified impact report: Critical (high semantic proximity to the changed aspect), Potential (moderate proximity), and Review (low proximity but in the same domain cluster).
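A minimal sketch of steps 1–5, assuming unit-normalized embeddings; the equal weighting of requirement proximity and delta proximity is an illustrative choice, not a calibrated parameter of the framework.

```python
import numpy as np

def impact_report(old_vec, new_vec, art_vecs, art_ids,
                  critical=0.80, potential=0.65, review=0.45):
    """Stratify artifacts by semantic proximity to a requirement change."""
    # Step 2: semantic delta, the direction of the change in embedding space.
    delta = new_vec - old_vec
    delta = delta / (np.linalg.norm(delta) + 1e-12)
    # Step 3: proximity of every artifact to the changed requirement.
    sim_req = art_vecs @ (new_vec / np.linalg.norm(new_vec))
    # Step 4: proximity to the changed aspect specifically.
    sim_delta = art_vecs @ delta
    score = 0.5 * sim_req + 0.5 * sim_delta     # illustrative equal weighting
    # Step 5: stratified report.
    report = {"critical": [], "potential": [], "review": []}
    for aid, s in sorted(zip(art_ids, score), key=lambda t: -t[1]):
        if s >= critical:
            report["critical"].append((aid, float(s)))
        elif s >= potential:
            report["potential"].append((aid, float(s)))
        elif s >= review:
            report["review"].append((aid, float(s)))
    return report
```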
✓ Real-World Result: SATCOM Program

In a Continuum-supported satellite communication program, a change to a link budget requirement (specifying minimum received signal margin) was processed through both traditional graph traversal and embedding-driven impact analysis. Traditional analysis identified 12 directly linked design elements. Embedding-driven analysis identified 38 semantically impacted elements — including 14 that had no explicit link to the requirement but were semantically affected by the link budget change (antenna pointing accuracy, atmospheric modeling parameters, ground station configuration requirements). Of these 14, 9 were subsequently confirmed by the systems engineering team as genuinely impacted. The traditional analysis would have produced an incomplete change assessment.

Change Propagation Modeling

For programs operating under a formal engineering change proposal process, embedding-driven impact analysis supports automated change propagation modeling: given a proposed requirement change, simulate the full downstream semantic impact before the change is approved, enabling the change control board to make informed decisions about the true cost and risk of the proposed change.

This capability is particularly valuable for MBSE programs using SysML models. The embedding pipeline can process both natural language requirements text and model element descriptions, identifying where a requirement change in the requirements management tool will likely propagate to specific blocks, interfaces, or constraints in the system model — enabling proactive model update planning before the change is implemented.

Section 07

System Alignment Checking

System alignment is the highest-level application of embedding-driven requirements analysis: measuring whether the system being built — as described by its design documentation, test specifications, and implementation artifacts — remains semantically aligned with the stakeholder needs documented in the original requirements. Alignment checking operates at the program level, not the individual requirement level, and is designed to detect the systemic drift that emerges over long development programs.

The Alignment Measurement Framework

Alignment is measured along four axes, each capturing a different dimension of requirements-system correspondence:

AXIS 01 · VERTICAL
Stakeholder → System Coverage
Do the system-level requirements semantically cover the stakeholder needs? Measures whether the translation from stakeholder intent to formal requirements preserved the full scope of what stakeholders intended, or whether some needs were lost, distorted, or over-specified in the translation process.
AXIS 02 · HORIZONTAL
Requirements → Design Consistency
Do the design specifications semantically satisfy the requirements they are intended to implement? Measures whether the design documents — interface control documents, architectural decision records, specification sheets — address the full behavioral intent of the requirements they claim to implement.
AXIS 03 · TEMPORAL
Requirements → Current System State
Has the system's current documented behavior drifted from its requirements baseline? Measures whether the accumulation of design decisions, waivers, deviations, and undocumented changes has semantically shifted the system away from its requirements — even if no individual change was large enough to trigger a formal change control action.
AXIS 04 · INTEGRITY
Internal Requirements Consistency
Is the requirements set internally consistent — free from contradictions, duplicates, and ambiguities? Measures the semantic health of the requirements baseline itself, identifying requirements that conflict with each other, requirements stated multiple times in different terms, and requirements with insufficient specificity for implementation.

Alignment Scoring

Each alignment axis is scored as a percentage: the fraction of requirement-artifact pairs above the semantic coverage threshold. An overall alignment score combines the four axes with weights calibrated to the program's current lifecycle phase — Axis 01 (stakeholder coverage) is most critical at requirements review; Axis 03 (temporal drift) becomes most critical during system integration testing. Programs below a defined alignment threshold trigger a formal re-alignment review.
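A sketch of the weighted combination; the phase weights shown are illustrative values, not the calibrated weights used on any specific program.

```python
def overall_alignment(axis_scores, phase_weights):
    """Combine the four axis scores into one program-level alignment score.

    axis_scores: fraction of pairs above the coverage threshold, per axis.
    phase_weights: lifecycle-phase weighting; must sum to 1.0.
    """
    assert abs(sum(phase_weights.values()) - 1.0) < 1e-9
    return sum(axis_scores[axis] * w for axis, w in phase_weights.items())

# Illustrative weights for a requirements-review phase (Axis 01 emphasized):
requirements_review = {"vertical": 0.4, "horizontal": 0.25,
                       "temporal": 0.1, "integrity": 0.25}
scores = {"vertical": 0.91, "horizontal": 0.88, "temporal": 0.83, "integrity": 0.95}
print(f"overall alignment: {overall_alignment(scores, requirements_review):.2%}")
```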

📌 Alignment vs. Compliance

A system can be 100% compliant with its requirements baseline while having poor alignment with stakeholder intent. This occurs when requirements were poorly specified — when they captured what the system does rather than what stakeholders need. The distinction between compliance (does the system satisfy what was written?) and alignment (does the system serve the intended mission?) is one that embedding analysis can make explicit in ways that keyword-based tools cannot. Alignment analysis operates on the intent embedded in stakeholder need statements, not just the text of formal requirements.

Section 08

Implementation Architecture

The embedding-driven requirements management system integrates with existing requirements management tooling (DOORS, Jira, Confluence, IBM Engineering Lifecycle Management, Polarion) rather than replacing it. The embedding pipeline processes artifact content, maintains a semantic vector store, and exposes analysis capabilities through APIs and dashboards, operating as an advisory analytical layer alongside the authoritative data store.

Data Source Layer — Existing Program Artifacts: DOORS / ELM · Jira / Confluence · Polarion / Codebeamer · Cameo SysML · SharePoint / PDM · Git / Code Repos
↓ Connectors extract artifact text, metadata, and existing link graph
Ingestion & Preprocessing Pipeline: Artifact Extractor · Text Normalizer · Domain Classifier · Change Detector · PII / CUI Scrubber
↓ Normalized text → embedding model → dense vector representation
Embedding Layer: Base Embedding Model · Domain Adapter (fine-tuned) · Batch Encoder · Incremental Update Engine
↓ Vectors stored in semantic index with artifact metadata
Semantic Vector Store & Index: pgvector / Weaviate · FAISS ANN Index · Metadata Store · Link Graph DB (Neo4j) · Version History
↓ Analysis services query vector store for similarity computations
Analysis Services: Traceability Analyzer · Impact Analysis Engine · Duplicate Detector · Conflict Detector · Alignment Scorer · LLM Explanation Layer
↓ Results surfaced via API, dashboards, and RE tool integrations
Outputs & Integration: Coverage Dashboard · Gap Reports · Impact Reports · Tool Plugin (DOORS) · REST API · Audit Trail

Figure 1 — Embedding-Driven Requirements Management Architecture — integrates with existing RE tooling as an analytical layer

Security Considerations for Classified Programs

Requirements for classified defense programs contain sensitive technical information that cannot be processed by commercial embedding APIs without appropriate data handling controls. For classified programs, the architecture deploys on-premises or in a government cloud enclave using open-weight embedding models (e.g., a fine-tuned Llama or Mistral variant) that process requirements text without any data leaving the classified environment. The vector store operates within the classification boundary; only aggregated analysis outputs (scores, gap counts, conflict flags) are surfaced to unclassified reporting dashboards. This architecture is directly aligned with the Secure RAG design patterns documented in Continuum's CR-04 publication.

Section 09

Interactive Traceability Explorer

The web edition of this paper includes an interactive demonstration that simulates embedding-based traceability analysis for a representative set of satellite communication system requirements. Selecting a requirement displays its semantic traceability links across design elements, test cases, and peer requirements, scored by cosine similarity.

[Interactive demo: Semantic Traceability Analysis — SATCOM System Requirements Demo. Simulated embedding similarity scores; explorable per requirement in the web edition.]
Section 10

Interactive Impact Analysis Tool

The web edition also includes an interactive tool that simulates automated change impact analysis. Selecting a modified requirement shows which downstream artifacts are semantically impacted, stratified by impact criticality. The results reflect the output of the embedding-driven impact engine applied to a representative defense system requirements set.

[Interactive demo: Change Impact Analysis. Embedding-based semantic impact propagation; simulated results; explorable in the web edition.]
Section 11

MBSE & DoD Integration

Model-Based Systems Engineering (MBSE) and embedding-driven requirements management are not competing approaches — they are complementary. MBSE provides the formal modeling infrastructure (SysML blocks, interfaces, constraints, behavioral diagrams); embedding-driven analysis provides the semantic bridge between natural language requirements artifacts and formal model elements that no prior technology has adequately addressed.

The Natural Language to Model Gap

The persistent challenge in MBSE adoption is the gap between natural language stakeholder needs and formal model elements. Stakeholders express needs in natural language. Systems engineers translate those needs into formal requirements. MBSE practitioners translate formal requirements into model elements. At each translation step, semantic content can be lost, distorted, or left ambiguous. Embedding analysis at each translation boundary — stakeholder needs to requirements, requirements to model elements — provides a quantified semantic fidelity metric that validates whether the translation preserved intent.

Cameo Systems Modeler Integration

Continuum's MBSE practice uses Cameo Systems Modeler as the primary modeling tool for DoD programs. The embedding-driven requirements pipeline integrates with Cameo through the following workflow; the incremental update step is sketched after the list:

  • Requirements text from DOORS or Jira is embedded and indexed nightly, with change detection flagging modified requirements.
  • SysML model element descriptions (block names, interface definitions, constraint notes, use case descriptions) are extracted from Cameo via API and embedded in the same vector space.
  • The coverage analyzer computes semantic similarity between each requirement and its linked model elements, surfacing requirements with low model coverage scores for MBSE team review.
  • When a requirement changes, the impact engine identifies affected model elements by semantic proximity — enabling MBSE engineers to proactively update the model rather than discovering the gap at the next review gate.
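A sketch of the nightly incremental update behind the first workflow step, assuming hash-based change detection; connector plumbing and vector-store persistence are elided, and the model name is a stand-in for the domain-calibrated model.

```python
import hashlib
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for calibrated model

def nightly_update(artifacts, index):
    """Re-embed only artifacts whose text changed since the last run.

    artifacts: {artifact_id: text} pulled from DOORS/Jira/Cameo connectors.
    index: {artifact_id: {"hash": str, "vector": ndarray}} persisted store state.
    Returns the IDs flagged as changed, for downstream re-analysis.
    """
    digest = lambda t: hashlib.sha256(t.encode("utf-8")).hexdigest()
    changed = {aid: text for aid, text in artifacts.items()
               if digest(text) != index.get(aid, {}).get("hash")}
    if changed:
        vectors = model.encode(list(changed.values()), normalize_embeddings=True)
        for (aid, text), vec in zip(changed.items(), vectors):
            index[aid] = {"hash": digest(text), "vector": vec}
    return list(changed)
```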

INCOSE Requirements Quality Metrics

INCOSE's Guide to Writing Requirements defines quality attributes for well-formed requirements — unambiguous, verifiable, traceable, consistent, complete. Embedding analysis can automate the measurement of several of these attributes; a scoring sketch follows the list:

  • Consistency (automated): Near-duplicate and conflict detection identifies requirements that violate consistency. Score: fraction of requirement pairs that fall into the 0.65–0.85 band and are therefore flagged for potential conflict.
  • Completeness (automated): Coverage gap analysis identifies requirements without semantic traceability to test cases. Score: fraction of requirements with full semantic coverage.
  • Traceability (automated): Link validation score measures the fraction of existing links that have cosine similarity above the validation threshold.
  • Verifiability (semi-automated): Requirements that are too abstract to embed meaningfully (very short, highly generic) are flagged for human review of their testability.
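A sketch of the three fully automated scores, assuming the similarity values have already been computed by the analysis services of Section 08; argument names are illustrative.

```python
def incose_quality_scores(pair_sims, best_test_sims, link_sims,
                          conflict_band=(0.65, 0.85), full=0.80, valid=0.65):
    """Compute automated INCOSE-style quality metrics from similarity data.

    pair_sims: cosine similarities for requirement pairs in the same domain.
    best_test_sims: each requirement's best similarity to any test case.
    link_sims: similarity of the two endpoints of each existing link.
    """
    lo, hi = conflict_band
    return {
        # Consistency: share of requirement pairs in the potential-conflict band.
        "conflict_flag_rate": sum(lo <= s <= hi for s in pair_sims)
                              / max(len(pair_sims), 1),
        # Completeness: share of requirements with full semantic test coverage.
        "completeness": sum(s >= full for s in best_test_sims)
                        / max(len(best_test_sims), 1),
        # Traceability: share of existing links above the validation threshold.
        "traceability": sum(s >= valid for s in link_sims)
                        / max(len(link_sims), 1),
    }
```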
🛡️ DoD 5000.87 & Technical Reviews

DoD's Software Acquisition Pathway (5000.87) requires programs to demonstrate requirements traceability at system functional reviews, system design reviews, and operational acceptance events. Embedding-driven traceability analysis provides quantified coverage scores and gap lists that can be directly incorporated into technical review packages — replacing subjective "coverage appears adequate" assessments with "semantic coverage score: 94%, with 23 specific gaps identified for resolution before system integration test."

Section 12

Evaluation & Metrics

Embedding-driven requirements management must itself be evaluated against ground truth to validate that the semantic similarity scores are accurately reflecting real traceability relationships. The evaluation methodology compares embedding-based link recommendations against expert human judgments across a sample of the requirements corpus.

Evaluation Methodology

Continuum's evaluation protocol for requirements embedding pipelines follows a structured validation process; the precision and recall computation of step 2 is sketched after the list:

  1. Expert panel construction: A panel of 3–5 domain experts (systems engineers with deep knowledge of the program) independently rates traceability link quality for a stratified random sample of requirement-artifact pairs. The inter-rater agreement (Fleiss' kappa) establishes the ground truth reliability ceiling.
  2. Precision and recall measurement: For the sampled requirement-artifact pairs, compare embedding-based recommendations (above the cosine threshold) against expert judgments. Precision measures the fraction of embedding-recommended links that experts confirm; recall measures the fraction of expert-confirmed links that the embedding pipeline identified.
  3. Threshold calibration: Adjust the cosine similarity threshold to optimize the precision-recall trade-off for the specific program's risk tolerance. High-risk programs (safety-critical, classified) may prefer higher thresholds (higher precision, lower recall) to minimize false positives; resource-constrained programs may prefer lower thresholds.
  4. Drift monitoring: Repeat the evaluation on 5% of the corpus quarterly to detect model accuracy drift as the requirements corpus evolves.
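A sketch of the step 2 computation, treating the expert panel's confirmed links as ground truth; links_above in the calibration comment is a hypothetical helper.

```python
def precision_recall(recommended, confirmed):
    """Score embedding recommendations against expert-confirmed links.

    recommended: set of (req_id, artifact_id) pairs above the cosine threshold.
    confirmed: set of pairs the expert panel judged to be genuine links.
    """
    true_positives = len(recommended & confirmed)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(confirmed) if confirmed else 0.0
    return precision, recall

# Threshold calibration (step 3): sweep candidate thresholds and pick the
# operating point matching the program's risk tolerance, e.g.:
# for t in (0.60, 0.65, 0.70, 0.75, 0.80):
#     p, r = precision_recall(links_above(t), expert_confirmed)
```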

Benchmark Performance Results

Metric | Baseline (Keyword) | CDMF Embedding (Uncalibrated) | CDMF Embedding (Domain-Calibrated)
Traceability Precision | 0.72 | 0.81 | 0.89
Traceability Recall | 0.61 | 0.78 | 0.85
Duplicate Detection Rate | 0.43 | 0.88 | 0.93
Conflict Identification Precision | N/A (not supported) | 0.71 | 0.82
Impact Analysis Coverage | 0.58 (explicit links only) | 0.79 | 0.86
Human Effort Reduction | Baseline | 2.8× reduction | 4.1× reduction

Results from Continuum pilot programs on aerospace and DoD program requirements corpora. Domain calibration uses fine-tuning on INCOSE-aligned requirements and program-specific terminology. Expert panel ground truth with kappa ≥ 0.72.

Section 13

Toolchain & Integration Guide

The embedding-driven requirements management system integrates with the existing toolchain rather than replacing it. The following reference architecture specifies integration points, recommended components, and configuration guidance for common defense program tool stacks; a query sketch for the recommended vector store follows the table.

Layer | Open-Source Option | Commercial Option | Gov Cloud Option | Notes
Embedding Model | all-MiniLM-L6-v2, BGE-M3, Llama 3 (fine-tuned) | OpenAI text-embedding-3-large | Azure OpenAI GovCloud | On-prem open-weight recommended for classified; commercial API acceptable for unclassified
Vector Store | pgvector, Weaviate, Milvus, FAISS | Pinecone Enterprise | Azure Cognitive Search | pgvector recommended for integration with existing PostgreSQL-backed RE tools
RE Tool Connector | OSLC adapters (open source) | IBM ELM OSLC API | AWS Gov + custom connector | DOORS Next and Polarion have REST APIs; legacy DOORS Classic requires DXL scripting
Graph Database | Neo4j Community, Apache TinkerPop | Neo4j Enterprise, Amazon Neptune | Amazon Neptune GovCloud | Explicit link graph enriched with embedding similarity edges; hybrid traversal
Analysis Framework | Python / scikit-learn / sentence-transformers | Continuum CDMF API | ATO'd containerized deployment | Core analysis is Python-based; containerized for Kubernetes deployment in the pipeline
Dashboard | Grafana + custom plugins | PowerBI, Tableau | GovCloud PowerBI | Coverage scores, gap lists, impact reports surfaced via standard BI tooling
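As a usage sketch for the recommended pgvector option: pgvector's <=> operator computes cosine distance, so 1 - distance yields cosine similarity. The schema (artifact_embeddings, embedding, artifact_id) and the connection string are hypothetical.

```python
import psycopg2

def top_k_artifacts(req_vec, k=10):
    """Nearest artifacts to a requirement embedding via pgvector.

    Hypothetical schema: artifact_embeddings(artifact_id text, embedding vector).
    """
    literal = "[" + ",".join(f"{x:.6f}" for x in req_vec) + "]"
    with psycopg2.connect("dbname=re_index") as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT artifact_id, 1 - (embedding <=> %s::vector) AS cosine_sim
            FROM artifact_embeddings
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (literal, literal, k),
        )
        return cur.fetchall()
```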

Integration with Existing Requirements Workflows

The embedding pipeline integrates as a non-disruptive analytical overlay on top of existing requirements management processes. Requirements engineers continue working in their existing tools (DOORS, Jira, ELM). The embedding pipeline runs nightly, processes any changes since the last run, updates the semantic index, and surfaces new analysis findings in the dashboard and tool plugin. There is no change to the authoritative data source; the embedding system is an advisory analytical layer, not an authoritative one.

  • Existing link graph is preserved and augmented — no links are deleted; new semantic links are suggestions, not authoritative links
  • All embedding recommendations include an explanation generated by the LLM explanation layer — requirements engineers see why the system suggested a link, not just that it did
  • Human approval required to create, modify, or delete any authoritative traceability link — the embedding system is advisory, never autonomous for requirements data
  • Full audit trail of all embedding recommendations and human accept/reject decisions — required for INCOSE-compliant requirements management process documentation
Section 14

Implementation Roadmap

Implementing embedding-driven requirements management is a phased investment. Programs should resist the temptation to deploy all capabilities simultaneously — the analysis outputs are only as valuable as the requirements corpus quality, and early phases of the roadmap improve that quality while building team familiarity with the tool's outputs.

P1
Weeks 1–6 · Foundation
Corpus Extraction, Baseline Embedding & Quality Assessment

Extract the existing requirements corpus from current tooling. Deploy the base embedding model and generate initial embeddings. Run an initial duplicate detection pass to assess corpus quality — this frequently reveals that 5–15% of requirements in large programs are near-duplicates that have accumulated over years of development. Deliver a corpus health report before proceeding to traceability analysis.

Corpus Extraction · Base Embedding Deploy · Duplicate Detection · Corpus Health Report
P2
Weeks 7–14 · Traceability Analysis
Semantic Coverage Analysis & Weak Link Detection

With the embedding corpus established, run the first traceability coverage analysis. Identify requirements with low semantic coverage (no test cases with similarity above threshold). Validate existing traceability links — flag weak links for human review. This phase typically produces the most immediate value: programs discover specific coverage gaps that were invisible in the formal link database.

Coverage Analysis · Weak Link Detection · Gap Report · Expert Validation
P3
Months 4–6 · Impact Analysis
Domain Calibration & Automated Impact Analysis

Fine-tune the embedding model on the program's domain-specific requirements corpus using the expert validation data from Phase 2. Deploy the impact analysis engine integrated with the change management workflow. From this phase forward, all engineering change proposals include an automated semantic impact assessment generated by the embedding pipeline.

Domain Fine-Tuning · Impact Engine Deploy · ECP Integration · Threshold Calibration
P4
Months 7–9 · Alignment & MBSE Integration
System Alignment Scoring & Model Integration

Deploy alignment scoring across all four axes. Integrate with the Cameo MBSE model (or equivalent). Generate the first full alignment report covering stakeholder→requirements→design→test semantic fidelity. This report typically surfaces specific areas of requirements-design semantic drift that had accumulated over the program lifecycle without triggering formal change actions.

Alignment Scoring · Cameo Integration · MBSE Coverage Analysis · Alignment Report
P5
Months 10+ · Continuous Operations
Continuous Semantic Monitoring & Process Integration

Integrate embedding analysis into the standard requirements review cadence — new requirements are automatically evaluated for duplicates and conflicts before acceptance; changed requirements automatically trigger impact reports. Establish quarterly alignment score reviews as a standing program management metric. The system operates continuously, surfacing new issues as the program evolves.

Continuous Monitoring · New Req Auto-Check · Quarterly Alignment Review · Process Documentation
Section 15

The Continuum Approach

This paper describes the published research and operational methodology that underpins Continuum's requirements engineering practice. Kurt A. Richardson, PhD, developed the embedding-driven requirements management framework as part of Continuum's R&D program — the research that produced this white paper is the same research that informs our MBSE engagements on active DoD programs. We do not offer requirements management consulting based on frameworks we have read about; we offer it based on frameworks we have built, tested, and refined through operational delivery.

✓ Continuum Requirements Engineering Services
  • Requirements Corpus Health Assessment: Initial embedding analysis of the existing requirements corpus — duplicate detection, conflict identification, coverage gap analysis. Deliverable: corpus health report with quantified issues and prioritized remediation roadmap.
  • Traceability Analysis & Gap Remediation: Full semantic coverage analysis against test and design artifacts. Identification of weak links and traceability gaps. Deliverable: coverage gap list with embedded similarity scores and recommended remediation actions.
  • Automated Impact Analysis Deployment: Implementation of the embedding-driven impact analysis pipeline integrated with the program's change management workflow. Deliverable: ECP impact reports generated automatically for all change proposals.
  • MBSE Semantic Integration: Integration of the embedding pipeline with Cameo Systems Modeler or equivalent MBSE tool. Requirement-to-model semantic coverage analysis with specific gap identification. Aligned to INCOSE SE Handbook and DoD 5000.87 technical review requirements.
  • System Alignment Assessment: Four-axis alignment scoring across the full stakeholder needs → requirements → design → test chain. Deliverable: alignment scorecard with specific drift identification and recommended re-alignment actions.

Engagement Models

Engagement | Scope | Duration | Outcome
Corpus Health Assessment | Duplicate detection, conflict identification, initial coverage gap analysis on existing corpus | 3–4 weeks | Corpus health report; quantified issues; prioritized gap list for technical review
Traceability Analysis Sprint | Full semantic coverage analysis; weak link validation; gap remediation recommendations | 6–8 weeks | Coverage scorecard, gap register, weak link list, recommendations for review package
Impact Analysis Deployment | Pipeline deployment integrated with ECP workflow; domain calibration; team training | 10–14 weeks | Live impact analysis pipeline generating reports for all change proposals
Full EDRMF Program | Complete five-phase roadmap: corpus → traceability → impact → alignment → continuous | 6–12 months | Fully operational embedding-driven RE capability with documented performance metrics
Section 16

Conclusion

Requirements management has been a manual, labor-intensive discipline by necessity — the tools available have not supported the kind of semantic analysis that the problem actually requires. Semantic embeddings change the economics and the capability. What once required weeks of expert review time — checking whether a large set of requirements are consistently covered, identifying all potential impacts of a proposed change, measuring whether the system is semantically aligned with its requirements baseline — can now be accomplished in minutes with a well-calibrated embedding pipeline.

The result is not the elimination of requirements engineering expertise — it is the redirection of that expertise toward the decisions only humans can make, freed from the mechanical work of traversing large traceability corpora that embedding analysis can perform automatically, comprehensively, and consistently. The systems engineer who once spent 30% of their time maintaining traceability links can now spend that time evaluating the semantic conflicts the embedding system surfaces — which is the work that actually prevents system failures.

Requirements engineering has always been about understanding — understanding what the system needs to be, and whether what is being built matches that understanding. Embedding technology does not change what requirements engineering is about. It changes what is humanly possible within a program's resource constraints. And in defense programs, what is humanly possible often determines what survives contact with reality.
— Kurt A. Richardson, PhD, Continuum Resources LLC, 2025
Start a Conversation

Ready to Transform Your Requirements Management?

Contact Continuum Resources for a complimentary Requirements Corpus Health Assessment for your program.

Get in Touch →
References

  • [INCOSE-2023] International Council on Systems Engineering — "Guide to Writing Requirements (GtWR)" — INCOSE-TP-2010-006-04, 2023. The authoritative INCOSE requirements quality criteria applied in alignment assessment.
  • [INCOSE-HDBK] International Council on Systems Engineering — "Systems Engineering Handbook v4.0" — 2015. SE process framework within which embedding-driven RE operates.
  • [HAYES-2006] Hayes, J.H., Dekhtyar, A., Sundaram, S.K. — "Advancing Candidate Link Generation for Requirements Tracing" — IEEE Transactions on Software Engineering, 32(1), 2006. Foundational research on IR-based traceability, predecessor to embedding approaches.
  • [SEN-2020] Sentilles, S., Gorschek, T. et al. — "Quality of Requirements Traceability in Industrial Practice" — IEEE Software, 2020. Empirical study of traceability quality failures in industrial programs.
  • [REIMERS-2019] Reimers, N. & Gurevych, I. — "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks" — EMNLP 2019. Foundation paper for the sentence-level embeddings used in requirements analysis.
  • [BOGART-2023] Bogart, M. et al. — "LLM-Augmented Requirements Tracing: A Systematic Evaluation" — REFSQ 2023. Evaluation of large language model approaches to requirements traceability, validating cosine similarity thresholds.
  • [MUSTAFA-2022] Mustafa, N., Labiche, Y., Towey, D. — "A Systematic Literature Review of Requirements Traceability" — IEEE Access, 2022. Comprehensive review of traceability techniques including embedding approaches.
  • [DBLOCH-2021] D'Addio, R. et al. — "Automated Traceability Link Recovery using Fine-tuned BERT Models" — ICSME 2021. Domain fine-tuning methodology applied in CDMF calibration process.
  • [CR-01] Richardson, K.A. — "WP-CR-2025-01: Agentic AI in Mission-Critical Environments" — Continuum Resources, 2025. Agentic architecture for automated requirements analysis workflows.
  • [CR-02] Richardson, K.A. — "WP-CR-2025-02: Fine-Tuning vs. RAG Decision Framework" — Continuum Resources, 2025. Technical basis for the embedding model selection and domain calibration methodology.
  • [CR-04] Richardson, K.A. — "Secure RAG Architectures" — Continuum Resources, 2024. Security architecture for the classified requirements embedding deployment described in Section 08.
  • [DOD-5000-87] Department of Defense — "Operation of the Software Acquisition Pathway" — DoDI 5000.87, 2020. Requirements traceability obligations for programs under the Software Acquisition Pathway.
  • [SERC-2019] Systems Engineering Research Center — "Requirements Engineering in DoD System Acquisitions" — SERC-2019-TR-108, 2019. Defense-specific requirements engineering practice and failure mode analysis.