Continuum Resources — Technical White Paper
WP-CR-2026-02  ·  Computational Biomechanics

Computational Surrogate
Optimization Using AI
FE Model Calibration

Applying machine learning to finite element human body model calibration and ATD correlation — Gaussian process surrogates, Bayesian parameter estimation, and active learning for next-generation crash simulation.

Author
Kurt A. Richardson, PhD
Domain
Computational Biomechanics / Simulation
Published
April 2026
Classification
Public / Uncontrolled
Section 01

Executive Summary

Finite element (FE) simulation has become indispensable to vehicle safety research — enabling virtual crash tests that would be impractical or impossible to conduct physically, and providing biomechanical insight at spatial and temporal resolution no physical ATD can match. The world's leading human body models (HBMs) — THUMS, GHBMC, SAFER — now contain upward of two million elements representing bones, soft tissues, and organs at anatomical fidelity. Yet the calibration of these models remains a fundamental bottleneck: each FE simulation consumes hours of high-performance computing time, and the material parameter space governing biological tissue behavior spans dozens of uncertain dimensions. Exhaustive exploration of that space by brute force is computationally intractable.

Machine learning resolves this bottleneck through a class of techniques collectively called surrogate modeling: training fast ML emulators on a strategically chosen set of FE simulation outputs, then using those emulators to drive parameter optimization, uncertainty quantification, and experimental design at a fraction of the direct simulation cost. Combined with Bayesian calibration frameworks and active learning acquisition strategies, the ML-augmented calibration pipeline can reduce the number of FE simulations required by one to two orders of magnitude while producing posterior distributions over material parameters rather than single-point estimates — a scientifically and regulatorily more defensible output.

This paper provides a technically grounded examination of these methods as applied to HBM calibration and ATD biofidelity correlation, with a proposed architecture for integrating them into NHTSA's experimental and analytical biomechanics research programs.

  • 2M+ elements in a full-body HBM
  • 8–24 h per full-body FE simulation
  • 100× simulation cost reduction via surrogate modeling

Key Contributions

  • Gaussian process surrogates trained on Latin Hypercube Sampling (LHS) batches provide uncertainty-aware emulation of HBM thorax and lower extremity response, enabling Bayesian calibration against PMHS corridor data at negligible marginal cost per evaluation.
  • Bayesian calibration via the Kennedy-O'Hagan framework with MCMC sampling produces posterior distributions over material parameters that explicitly represent calibration uncertainty — enabling credible interval propagation into downstream injury risk predictions.
  • Active learning with Expected Information Gain (EIG) acquisition reduces the FE simulation budget required to achieve target posterior precision by 60–80% compared to space-filling LHS alone.
  • ATD-HBM transfer learning trains ML mappings from ATD kinematic and force time-history features to HBM injury metric predictions, establishing a data-driven biofidelity bridge that supplements the classical CORA corridor comparison methodology.
  • Physics-informed neural networks (PINNs) enforce continuum mechanics constraints during tissue mechanics parameter identification, improving extrapolation fidelity beyond the experimental loading rate range.
Section 02

Introduction — The Virtual Testing Imperative

The 2012 Moving Ahead for Progress in the 21st Century Act (MAP-21, Section 24109) directed NHTSA to research and develop advanced crash test dummies using virtual and physical testing. This legislative mandate formalized what the biomechanics research community had been building toward for two decades: a future in which virtual human body models, validated against physical PMHS data, can augment or partially replace physical dummy tests in the regulatory process. That future is closer than commonly appreciated — but its realization depends on solving the calibration problem credibly.

The calibration challenge is not unique to biomechanics. In aerospace structural analysis, geophysical reservoir modeling, and nuclear engineering, the same fundamental problem arises: a complex physical simulation contains material or process parameters that cannot be directly measured at the scale of interest and must be inferred from coarser experimental observations. The computational science community has developed a sophisticated methodology for addressing this problem — centered on surrogate models, Bayesian inference, and active experimental design — that is directly applicable to HBM calibration with relatively modest adaptation.

The limiting factor in human body model utility is not element count, not anatomical detail, and not solver capability. It is the fidelity of the material constitutive models — and those models are only as good as the calibration process that produced them.
— WP-CR-2026-02, Continuum Resources LLC

This paper is organized as a progressive technical argument: beginning with the simulation landscape (Section 3), characterizing the calibration problem structurally (Section 4), and then building the ML-augmented solution architecture layer by layer — surrogates (5), Bayesian calibration (6), active learning (7), ATD correlation (8), physics-informed approaches (9), and shape-based population modeling (10). Section 11 provides an interactive sensitivity reference; Sections 12–14 address the Continuum approach, conclusions, and references.

Section 03

The Simulation Landscape

Two classes of computational model define the current vehicle safety simulation ecosystem: anthropomorphic test device (ATD) finite element models — digital twins of physical crash test dummies — and human body models (HBMs) that represent biological tissue directly. They serve complementary regulatory roles, and the tension between them defines the central challenge that ML-augmented calibration must resolve.

Human Body Models

The dominant HBMs in active research and regulatory use are three:

  • THUMS (Total Human Model for Safety): Developed by Toyota Motor Corporation with Toyota Central R&D Labs. Version 5 (THUMS v5) contains approximately 2 million elements representing all major skeletal and soft tissue structures, and has been validated against over 100 PMHS test corridors spanning head, neck, thorax, abdomen, pelvis, and lower extremity. Runs in LS-DYNA; a full frontal crash simulation requires 8–16 hours on a 16-core HPC node.
  • GHBMC (Global Human Body Models Consortium): The GHBMC M50-O (50th-percentile male occupant) model contains approximately 2.2 million elements with detailed organ representation including liver, spleen, kidney, heart, and lungs. The GHBMC family includes M95-O, F05-O (5th-percentile female), and a detailed thorax model. Extensive peer-reviewed validation against PMHS corridors; the gold standard for thorax injury research. Also runs in LS-DYNA.
  • SAFER HBM / VIVA+: The SAFER HBM from Chalmers University and its open-source successor VIVA+ (Open Human Body Model) emphasize seated occupant posture and active muscle modeling — critical for out-of-position and autonomous vehicle pre-crash posture scenarios. Lower element count (~700k elements) enables faster simulation but at some cost to anatomical detail.

ATD Finite Element Models

Physical crash test dummies are expensive, fragile, and limited in measurement modality. Their FE model counterparts — commercially distributed by Humanetics for Hybrid III, WorldSID, and THOR families — enable virtual compliance testing at dramatically lower cost and higher throughput. Key properties: 100,000–500,000 elements, simulation runtime of minutes rather than hours, tuned to match physical ATD response within defined corridors.

ATD Model | Crash Mode | Biofidelity Level | FMVSS Application | FE Elements
Hybrid III 50M | Frontal | Low (by design — optimized for repeatability) | FMVSS 208 | ~250k
THOR-50M | Frontal (next-gen) | Moderate — more biofidelic thorax, improved head/neck | FMVSS 208 (proposed update) | ~380k
WorldSID 50M | Side | Moderate | FMVSS 214 | ~200k
ES-2re | Side (legacy) | Low | FMVSS 214 (legacy) | ~120k
Q-Series (Q3, Q6) | Frontal / Side | Moderate (pediatric) | FMVSS 213 (CRS) | ~150–200k

The Biofidelity Gap

The Hybrid III — the dominant regulatory ATD — was designed in the 1970s primarily for repeatability and durability, not biofidelity. Its thorax, with a rigid sternum and steel ribs, is mechanically unlike human thorax tissue. Its head, a rigid aluminum shell covered with a vinyl skin, does not exhibit the frequency-dependent viscoelastic response of human skull and brain. Regulatory thresholds derived from Hybrid III measurements are therefore twice-removed from human injury tolerance: first by the ATD-to-human biofidelity gap, then by the injury risk function (IRF) calibration uncertainty discussed in WP-CR-2026-01.

The THOR-50M ATD, developed by NHTSA specifically to improve biofidelity, addresses many of these gaps: a more flexible thoracic spine, a multi-point chest deflection measurement system, and improved head/neck kinematics. The transition from Hybrid III to THOR as the primary frontal crash dummy — currently underway in both NCAP and proposed FMVSS 208 revisions — makes ATD-HBM correlation a pressing near-term research priority. ML methods accelerate that transition by enabling rapid, data-driven development of THOR-HBM transfer functions.

Section 04

The Calibration Problem — Structure and Difficulty

HBM calibration is an inverse problem: given experimental observations (PMHS force-deflection corridors, strain gauge recordings, fracture timing data) and a parameterized forward model (the FE simulation), find the parameter values that produce simulation outputs consistent with the observations. The difficulty is not conceptual — it is computational and statistical.

The Parameter Space

A thorax-focused calibration illustrates the dimensionality. The thorax response in frontal impact is governed by material properties of at least six tissue types: rib cortical bone (Young's modulus E, yield stress σ_y, failure strain ε_f, strain rate sensitivity coefficient), costal cartilage (nonlinear stiffness curve parameters), intercostal muscle (passive stiffness, active tone scaling), sternum (E, σ_y), lung parenchyma (bulk modulus, Poisson ratio), and the pericardium and cardiac tissue interaction terms. Across these tissue types, a thorax calibration involves 20–40 uncertain scalar parameters.

Calibration as an Inverse Problem
θ* = argmin_θ ‖ y_sim(θ) − y_exp ‖²_W

where θ ∈ ℝᵈ is the material parameter vector (d = 20–40 for thorax),
y_sim(θ) is the FE simulation output at parameters θ,
y_exp is the PMHS experimental observation vector,
W is a weighting matrix (e.g., inverse corridor width)
This point-estimate formulation is the classical least-squares approach. Its fundamental limitation: it produces a single θ* without characterizing parameter identifiability or posterior uncertainty. Multiple parameter combinations may produce equally good fits to the experimental data (non-identifiability), a fact that classical calibration cannot detect or quantify.

The Computational Bottleneck

The FE forward model y_sim(θ) is expensive: each evaluation requires a full crash simulation (8–24 hours for a full-body model, 1–4 hours for an isolated thorax model on a 32-core node). A naive grid search over a 30-dimensional parameter space at even 5 levels per dimension requires 5³⁰ ≈ 10²¹ evaluations — a number that exceeds the estimated lifetime of the solar system in compute hours. Even sophisticated classical approaches — Latin Hypercube Sampling with response surface polynomial fitting — typically require 200–2,000 evaluations, corresponding to 200–2,000 full FE simulations. At 4 compute-hours per simulation on 32 cores, that is 800–8,000 core-hours per calibration cycle.

This cost is not merely inconvenient — it is scientifically limiting. It means that most HBM calibration work involves manual tuning by experienced biomechanics engineers, who iteratively adjust parameter values based on physical intuition and visual inspection of force-deflection corridor overlays. This process is slow, non-reproducible, and does not produce uncertainty estimates. The result: calibrated HBMs that match their validation corridors but whose parameter values are poorly identified and whose response under extrapolation conditions is unpredictably uncertain.

Classical Methods and Their Limits

  • One-at-a-time (OAT) sensitivity analysis: Vary each parameter independently while holding others at nominal values. Misses interactions; computationally efficient per parameter but requires O(d) simulations and produces a misleading sensitivity picture when parameter interactions are strong — which they routinely are in tissue mechanics.
  • Factorial DOE / Taguchi arrays: Systematic exploration of a discrete grid in parameter space. Captures main effects and low-order interactions. Scales poorly with dimension (2^d for full factorial); Taguchi L-arrays reduce this but at the cost of interaction resolution.
  • Response surface methodology (RSM): Fit a polynomial surrogate (typically quadratic) to DOE simulation outputs. Fast prediction once trained, but polynomial form is a poor approximation for the nonlinear, non-monotonic response surfaces typical of biomechanical simulations near yield/failure thresholds.
  • Gradient-based optimization: Requires gradient information (finite differences add simulation cost) and is easily trapped in local minima on the rough, multimodal response surfaces produced by models with fracture and contact nonlinearities.
Section 05

Surrogate Modeling — ML Emulation of FE Simulations

A surrogate model (also called an emulator or metamodel) is a fast-evaluating ML function trained to approximate the input-output behavior of an expensive computational model. Once trained, the surrogate replaces the FE model in all downstream analyses — optimization, sensitivity analysis, Bayesian calibration — enabling thousands of evaluations per second where the FE model required hours per evaluation. The key design choices are: how to generate the training data (experimental design), which ML architecture to use, and how to quantify and communicate surrogate prediction uncertainty.

Training Data Generation: Latin Hypercube Sampling

Latin Hypercube Sampling (LHS) is the standard experimental design for surrogate training in computational science. It partitions each input dimension into n equal-probability strata and samples exactly once from each stratum, ensuring that the n training points are well-spread across the full parameter space with no clustering or replication. For a d-dimensional parameter space with n training runs, LHS provides substantially better space coverage than random sampling (which permits clustering) with only O(n) evaluations.

For HBM thorax calibration, a practical LHS training set of n = 200–400 simulations typically suffices for surrogate training when the input dimension d ≤ 30, provided the parameter ranges are physically motivated. Range specification is not arbitrary: each parameter's sampling range should span from biomechanically plausible minimum to maximum values drawn from tissue mechanics literature, not from unbounded numerical ranges. Unrealistically wide ranges waste training budget on physically impossible parameter combinations.
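As a concrete illustration, the sketch below builds a 300-run LHS design over six illustrative thorax parameters using scipy.stats.qmc. The parameter names and ranges are placeholders standing in for literature-derived bounds, not calibrated values.

```python
# Minimal LHS design sketch for a thorax calibration. Parameter names and
# ranges are illustrative placeholders, not validated material bounds.
import numpy as np
from scipy.stats import qmc

param_bounds = {
    "rib_cortical_E_GPa":          (10.0, 25.0),
    "rib_cortical_yield_MPa":      (80.0, 200.0),
    "rib_cortical_fail_strain":    (0.015, 0.04),
    "costal_cartilage_E_MPa":      (20.0, 120.0),
    "intercostal_stiffness_scale": (0.5, 2.0),
    "sternum_E_GPa":               (8.0, 20.0),
}
names = list(param_bounds)
lower = np.array([param_bounds[n][0] for n in names])
upper = np.array([param_bounds[n][1] for n in names])

sampler = qmc.LatinHypercube(d=len(names), seed=42)
unit_samples = sampler.random(n=300)                  # 300 runs in [0, 1]^d
theta_train = qmc.scale(unit_samples, lower, upper)   # map to physical ranges

# Each row of theta_train defines one FE input deck queued for HPC execution.
for i, row in enumerate(theta_train[:3]):
    print(f"run {i:03d}: " + ", ".join(f"{n}={v:.3g}" for n, v in zip(names, row)))
```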

Gaussian Process Regression (Kriging)

Gaussian Process Regression (GPR) is the surrogate architecture of choice for HBM calibration applications, for two reasons that are both theoretically and practically significant. First, GPR provides a full predictive distribution — mean prediction plus uncertainty — at every point in input space, not just a point estimate. This uncertainty is essential for Bayesian calibration and active learning. Second, GPR is an interpolating method: it passes exactly through the training data (zero residual at observed points), which is correct behavior when the FE simulation is deterministic (no simulation noise).

Gaussian Process Surrogate — Posterior Distribution
ŷ(θ) | θ*, y_sim(θ*) ~ GP(μ_post(θ), k_post(θ, θ'))

μ_post(θ) = k(θ, θ*) [K + σ²I]⁻¹ y_sim(θ*)
k_post(θ,θ') = k(θ,θ') − k(θ,θ*)[K+σ²I]⁻¹k(θ*,θ')

Matérn 5/2 kernel: k(r) = (1 + √5·r/ℓ + 5r²/(3ℓ²)) · exp(−√5·r/ℓ)
The Matérn 5/2 kernel is preferred over the squared-exponential (RBF) kernel for biomechanical applications because it assumes twice-differentiable but not infinitely smooth functions — more realistic for material response surfaces that exhibit yield/fracture discontinuities. Lengthscale ℓ and output scale are optimized by maximizing the marginal likelihood on training data.

The computational cost of GPR scales as O(n³) for training (matrix inversion of the n×n kernel matrix) and O(n²) for prediction. For n = 400 training points, this is trivial on modern hardware. However, GPR degrades in quality as dimensionality increases beyond d ≈ 20–30 ("curse of dimensionality" in kernel methods). For high-dimensional full-body calibrations, dimensionality reduction via global sensitivity analysis (Sobol indices, described below) is a prerequisite — identifying the 15–20 most influential parameters before surrogate training.
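A minimal surrogate-fitting sketch using scikit-learn's GaussianProcessRegressor with a Matérn 5/2 kernel is shown below. The training arrays are synthetic stand-ins for the scaled LHS inputs and extracted FE outputs, and the near-zero nugget reflects the deterministic-simulation assumption discussed above.

```python
# GP surrogate sketch: Matern 5/2 kernel with per-dimension lengthscales (ARD),
# marginal-likelihood hyperparameter optimization, and predictive uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ConstantKernel

rng = np.random.default_rng(0)
theta_train = rng.uniform(size=(300, 6))              # stand-in for the scaled LHS design
y_train = np.sin(theta_train @ rng.uniform(size=6))   # stand-in for extracted FE outputs

kernel = ConstantKernel(1.0) * Matern(length_scale=np.ones(6), nu=2.5)
gp = GaussianProcessRegressor(
    kernel=kernel,
    alpha=1e-8,               # tiny nugget: the FE code is deterministic
    normalize_y=True,
    n_restarts_optimizer=5,   # maximize marginal likelihood from several starts
)
gp.fit(theta_train, y_train)

theta_new = rng.uniform(size=(5, 6))
mean, std = gp.predict(theta_new, return_std=True)    # mean + predictive uncertainty
print(np.c_[mean, std])
```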

Sobol Global Sensitivity Analysis

Sobol variance decomposition partitions the total variance of the FE simulation output, taken over the full parameter space, into contributions from each individual parameter and from parameter interactions. The total-order Sobol index S_T^i for parameter i quantifies the fraction of output variance attributable to parameter i, including all its interactions with other parameters. Parameters with S_T^i below a threshold (commonly 0.01 — 1% of total variance) can be fixed at nominal values, reducing the effective calibration dimension before surrogate construction.

📊 Typical Sobol Sensitivity Structure — GHBMC Thorax

In frontal thorax calibration studies, Sobol analysis consistently identifies a small dominant set: rib cortical bone Young's modulus and failure strain account for 40–55% of the variance in peak sternal deflection, with costal cartilage stiffness contributing a further 15–20%. The remaining 20+ parameters collectively account for the residual variance. This concentration justifies surrogate construction in a 6–8 dimensional active subspace — dramatically reducing training data requirements.
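The sketch below illustrates Sobol total-order screening with the SALib package, evaluated against a cheap stand-in for the trained surrogate (in practice, gp.predict). Parameter names, bounds, and the stand-in response are illustrative only.

```python
# Sobol total-order screening sketch using SALib, run against a surrogate stand-in.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 6,
    "names": ["rib_E", "rib_yield", "rib_fail_strain",
              "cartilage_E", "intercostal_scale", "sternum_E"],
    "bounds": [[10, 25], [80, 200], [0.015, 0.04],
               [20, 120], [0.5, 2.0], [8, 20]],
}

def surrogate_predict(X):
    # Stand-in for the trained GP surrogate's posterior mean (gp.predict in practice).
    return 0.04 * X[:, 0] + 0.8 * X[:, 2] * X[:, 0] + 0.002 * X[:, 3]

# 1024 base samples -> N*(2d+2) evaluations; cheap because we query the surrogate.
X = saltelli.sample(problem, 1024)
Y = surrogate_predict(X)

Si = sobol.analyze(problem, Y)
for name, st in sorted(zip(problem["names"], Si["ST"]), key=lambda t: -t[1]):
    keep = "keep" if st >= 0.01 else "fix at nominal"
    print(f"{name:20s}  S_T = {st:5.3f}  ->  {keep}")
```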

Neural Network Surrogates for High-Dimensional Outputs

When the simulation output is high-dimensional — for example, a full force-deflection time history (200–500 time steps) rather than a scalar peak value — GPR becomes impractical as a direct emulator. Dimensionality reduction of the output (PCA on the output time histories, retaining the top k principal components that explain 95% of output variance) combined with independent GP emulators per component provides a tractable approach. Alternatively, neural networks — specifically architectures with residual connections — can directly emulate high-dimensional time-series outputs given sufficient training data (typically n ≥ 1,000 for time-series surrogates).
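A minimal sketch of the PCA-plus-per-component-GP approach follows. The 400-step "histories" array is synthetic, and the retained component count is fixed at five for brevity; in practice it would be chosen to capture roughly 95% of output variance.

```python
# Output dimensionality reduction for time-history emulation: PCA on the simulated
# force histories, then one GP per retained principal component.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(1)
theta_train = rng.uniform(size=(300, 6))        # LHS inputs
histories = rng.normal(size=(300, 400))         # stand-in for 400-step force histories

pca = PCA(n_components=5)                       # in practice: enough PCs for ~95% variance
scores = pca.fit_transform(histories)           # (300, 5)

gps = []
for j in range(scores.shape[1]):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-8, normalize_y=True)
    gp.fit(theta_train, scores[:, j])
    gps.append(gp)

def emulate_history(theta):
    """Predict a full time history at new parameters by recombining PC predictions."""
    z = np.array([gp.predict(theta.reshape(1, -1))[0] for gp in gps])
    return pca.inverse_transform(z.reshape(1, -1))[0]

print(emulate_history(rng.uniform(size=6)).shape)   # (400,)
```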

Section 06

Bayesian Calibration — Posterior Parameter Estimation

The classical inverse problem formulation of Section 4 produces a point estimate θ*. Bayesian calibration replaces that point estimate with a full posterior distribution p(θ | y_exp) — a probability distribution over parameter values that is consistent with both the experimental observations and any prior knowledge about plausible parameter ranges. The posterior distribution is the scientifically correct output of a calibration exercise: it communicates not just which parameter values are consistent with the data, but how well-identified the parameters are, which directions in parameter space are unconstrained by the available experiments, and how calibration uncertainty propagates into downstream injury predictions.

The Kennedy-O'Hagan Framework

The Kennedy-O'Hagan (KO) framework (2001) is the foundational Bayesian calibration model for computer simulations. It accounts for two sources of discrepancy between the FE simulation and physical reality: parameter uncertainty (the θ we want to infer) and systematic model error — the bias introduced by structural simplifications in the FE model itself. Failing to account for model discrepancy causes Bayesian calibration to overfit parameters to experimental noise, producing a biased posterior.

Kennedy-O'Hagan Bayesian Calibration Model
y_exp(x) = y_sim(x, θ) + δ(x) + ε

where y_sim(x, θ) ≈ ĝ(x, θ) [GP surrogate for FE model]
δ(x) ~ GP(0, k_δ) [model discrepancy term]
ε ~ N(0, σ_ε²) [observation/measurement error]
θ ~ π(θ) [prior over material parameters]
Identifiability tension: the model discrepancy term δ(x) absorbs systematic bias, but it competes with θ for the observed data signal. Regularization is required — typically through informative priors on δ that enforce smoothness and zero mean, and on θ that reflect tissue mechanics literature bounds.

MCMC Sampling of the Posterior

The posterior distribution p(θ | y_exp) is not available in closed form for the KO model. Markov Chain Monte Carlo (MCMC) sampling — generating a sequence of θ samples that converge to the target posterior distribution — is the standard computational approach. Two MCMC algorithms are particularly relevant for HBM calibration:

  • Metropolis-Hastings (MH): Proposes new θ values from a proposal distribution, accepting or rejecting based on the posterior density ratio. Simple to implement, but scales poorly with dimension: random-walk proposal steps must shrink as d grows to maintain a useful acceptance rate, so mixing slows sharply. Practical for d ≤ 10 after Sobol screening.
  • Hamiltonian Monte Carlo (HMC) / NUTS: Uses gradient information to propose long, nearly deterministic trajectories through parameter space that are accepted at high rates even as dimension grows. The No-U-Turn Sampler (NUTS) automates HMC tuning (trajectory length and step size) and is the algorithm of choice for d ≥ 10; implementations are available in Stan and PyMC. The required gradients are available analytically from the GP surrogate.

A practical MCMC calibration run for a thorax model with d = 8 active parameters (post-Sobol) typically requires 5,000–20,000 posterior samples with 1,000–2,000 warm-up samples for NUTS, running against the GP surrogate at microsecond-per-evaluation cost. Total MCMC wall-clock time: minutes on a laptop. The binding cost was the n = 300 FE simulations used to train the surrogate.
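The sketch below shows a deliberately compact calibration in PyMC in the spirit of the Kennedy-O'Hagan formulation: the GP emulator is replaced by an analytic stand-in expressed in PyMC tensor operations (NUTS needs a differentiable forward model), the discrepancy term δ(x) is collapsed to a constant bias, and all priors, data values, and parameter names are illustrative.

```python
# Compact Bayesian calibration sketch in PyMC (Kennedy-O'Hagan spirit, simplified).
import numpy as np
import arviz as az
import pymc as pm

# Stand-in experimental observations: peak sternal deflection (mm) at 3 impact speeds.
impact_speed = np.array([4.3, 5.5, 6.7])     # m/s
y_exp = np.array([38.0, 46.5, 54.0])         # mm, illustrative PMHS corridor values

with pm.Model() as calib:
    # Priors over two active material parameters (literature-motivated ranges).
    rib_E = pm.TruncatedNormal("rib_E_GPa", mu=17.0, sigma=4.0, lower=10, upper=25)
    fail_strain = pm.TruncatedNormal("fail_strain", mu=0.027, sigma=0.006,
                                     lower=0.015, upper=0.04)

    # Analytic stand-in for the GP surrogate mean (differentiable for NUTS).
    y_sim = -0.9 * rib_E + 400.0 * fail_strain + 9.0 * impact_speed

    # Simplified model discrepancy (constant bias) and measurement error.
    delta = pm.Normal("delta_mm", mu=0.0, sigma=3.0)
    sigma_eps = pm.HalfNormal("sigma_eps", sigma=2.0)
    pm.Normal("y_obs", mu=y_sim + delta, sigma=sigma_eps, observed=y_exp)

    # NUTS sampling of the posterior; runs in seconds against the cheap surrogate.
    idata = pm.sample(2000, tune=1000, chains=4, target_accept=0.9)

print(az.summary(idata, var_names=["rib_E_GPa", "fail_strain", "delta_mm"]))
```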

Posterior Uncertainty Propagation

The posterior distribution p(θ | y_exp) is the output of calibration, but the scientific product is the posterior predictive distribution over simulation outputs at new experimental conditions: p(y_new | y_exp) = ∫ p(y_new | θ) p(θ | y_exp) dθ. This integral is approximated by Monte Carlo: draw θ_i from the posterior, evaluate ŷ(θ_i) from the surrogate, collect the distribution of outputs. The resulting credible intervals on HBM response predictions quantify how well the physical PMHS data has determined the model's material parameters — and therefore how much uncertainty should be carried forward into injury risk predictions that use the calibrated HBM.
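A minimal Monte Carlo propagation sketch follows. The posterior draws and the surrogate for the new condition are synthetic stand-ins; in practice the draws come from the MCMC trace (e.g., idata.posterior) and the response function from the trained GP surrogate.

```python
# Monte Carlo propagation of calibration uncertainty into a prediction at a new
# (extrapolated) condition. Posterior draws and the surrogate are stand-ins.
import numpy as np

rng = np.random.default_rng(2)
rib_E = rng.normal(17.0, 1.5, size=8000)            # stand-in posterior draws (GPa)
fail_strain = rng.normal(0.027, 0.003, size=8000)   # stand-in posterior draws

def surrogate_new_condition(rib_E, fail_strain, speed=7.5):
    # Stand-in for the GP surrogate evaluated at a new impact speed.
    return 9.0 * speed - 0.9 * rib_E + 400.0 * fail_strain

draws = surrogate_new_condition(rib_E, fail_strain)
lo, med, hi = np.percentile(draws, [2.5, 50, 97.5])
print(f"peak deflection at 7.5 m/s: {med:.1f} mm (95% CrI {lo:.1f}-{hi:.1f} mm)")
```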

⚠ Non-Identifiability — The Practical Challenge

In thorax calibration, rib cortical bone Young's modulus and failure strain are partially non-identifiable from force-deflection data alone: many combinations of (E, ε_f) produce similar peak deflection but different rib fracture timing and count. This non-identifiability manifests in the posterior as a banana-shaped ridge in the (E, ε_f) joint distribution. Adding independent experimental constraints — rib strain gauge data, fracture count, acoustic emission timing — resolves non-identifiability by providing orthogonal information. Bayesian calibration is the only framework that makes this non-identifiability visible rather than hiding it in a point estimate.

Section 07

Active Learning for Efficient Simulation Budgets

LHS generates the initial surrogate training set in a single batch, without any feedback from the simulation outputs. Active learning — also called sequential experimental design or Bayesian optimization — improves on this by selecting new simulation points one at a time (or in batches), informed by what has already been learned from previous simulations. The goal is to maximize information gain per simulation run, concentrating the finite compute budget on the regions of parameter space that matter most for calibration.

Acquisition Functions for Surrogate Improvement

An acquisition function maps each candidate parameter vector θ to a scalar score representing how valuable a simulation at that point would be. The next simulation is run at the argmax of the acquisition function. Three acquisition strategies are relevant for HBM calibration:

Acquisition Function | Objective | HBM Application | Limitation
Integrated Mean Squared Error (IMSE) | Minimize surrogate prediction variance globally across parameter space | General-purpose surrogate refinement; ensures no region is poorly emulated | Does not prioritize calibration-relevant regions; wastes budget in low-probability posterior zones
Expected Improvement (EI) | Maximize expected improvement over current best-fit parameter set | Optimization-focused calibration; drives toward θ* efficiently | Exploitative — converges to local optima on multimodal surfaces; ignores uncertainty quantification goal
Expected Information Gain (EIG) | Maximize expected reduction in posterior entropy of p(θ | y_exp) | Calibration-focused; concentrates simulations where posterior uncertainty is largest | Computationally intensive to evaluate; requires nested Monte Carlo or variational approximation
Max Variance in Posterior Predictive | Maximize surrogate uncertainty in high-posterior-probability region | Practical compromise: cheap to evaluate, well-targeted to calibration-relevant space | Requires pre-computation of approximate posterior; iterative
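As a concrete example of the last strategy in the table, the sketch below scores a cheap candidate pool by the product of surrogate predictive standard deviation and an approximate posterior density. Both functions are stand-ins for the trained GP and a pilot posterior estimate.

```python
# Posterior-weighted max-variance acquisition sketch: simulate next where the
# surrogate is uncertain AND the approximate posterior puts appreciable mass.
import numpy as np

rng = np.random.default_rng(3)

def surrogate_std(theta):
    # Stand-in for gp.predict(theta, return_std=True)[1]: predictive std dev.
    return 0.5 + np.abs(np.sin(theta @ np.arange(1, theta.shape[1] + 1)))

def approx_log_posterior(theta):
    # Stand-in for a cheap posterior density estimate (e.g., from a pilot MCMC run).
    return -0.5 * np.sum((theta - 0.6) ** 2 / 0.05, axis=1)

candidates = rng.uniform(size=(5000, 6))                    # cheap candidate pool
score = surrogate_std(candidates) * np.exp(approx_log_posterior(candidates))
next_theta = candidates[np.argmax(score)]                   # point to simulate next
print("next FE run at:", np.round(next_theta, 3))
```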

The Sequential Calibration Loop

The ML-augmented calibration workflow integrates LHS initialization, surrogate training, Bayesian calibration, and active learning acquisition into a sequential loop. Each iteration adds new FE simulation points targeted by the acquisition function, retrains the surrogate, and updates the posterior. Convergence is declared when the posterior credible intervals on the primary biomechanical outputs (e.g., peak sternal deflection, rib fracture count) fall below a pre-specified width.

  • Phase 0 — Initialization: Sobol sensitivity screening; active subspace identification (d ≤ 12); prior specification from literature.
  • Phase 1 — Initial LHS batch (n₀ = 200–300 FE runs): Latin Hypercube design; HPC parallel FE execution; output feature extraction.
  • Phase 2 — Surrogate training and calibration: GP surrogate fit (Matérn 5/2); NUTS/HMC posterior sampling; convergence assessment (R̂, ESS).
  • Phase 3 — Active learning loop, repeated until convergence (Δn = 10–20 FE runs per cycle): EIG acquisition evaluation; next-point selection and FE execution; incremental surrogate update.
  • Output: posterior p(θ | y_exp); calibrated HBM with uncertainty; posterior predictive corridors; identifiability report.

Figure 2 — Sequential ML-Augmented HBM Calibration Loop. Typical total FE simulation budget: 350–500 runs, vs. 2,000–10,000 for classical RSM approaches. Active learning delivers a 4–20× simulation budget reduction.

Batch Active Learning for HPC Parallelism

Sequential acquisition selects one simulation at a time — suboptimal when the HPC cluster can run 20 simulations in parallel. Batch active learning algorithms (e.g., the Kriging Believer and Constant Liar heuristics, or the more principled batch EIG using determinantal point processes) select batches of k points that are simultaneously informative and mutually non-redundant. For practical HBM calibration, a batch size equal to the available parallel simulation capacity — typically 10–32 runs — provides near-optimal utilization without the complexity of exact batch EIG.
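The sketch below illustrates the Constant Liar heuristic with a simple predictive-variance acquisition. The design, outputs, and batch size are illustrative, and in a production pipeline the acquisition would be EIG or the posterior-weighted variance criterion described above.

```python
# Constant Liar batch selection sketch: pick a point, pretend its outcome equals a
# "lie" (here the current mean response), refit the GP, and repeat until the batch
# is full, so the selected points are mutually non-redundant.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(4)
X = rng.uniform(size=(60, 4))                        # existing design
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2               # stand-in FE outputs

def select_batch(X, y, batch_size=8, n_candidates=2000):
    X_aug, y_aug, batch = X.copy(), y.copy(), []
    lie = float(y.mean())                            # the "constant lie"
    for _ in range(batch_size):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-8, normalize_y=True)
        gp.fit(X_aug, y_aug)
        cand = rng.uniform(size=(n_candidates, X.shape[1]))
        _, std = gp.predict(cand, return_std=True)
        pick = cand[np.argmax(std)]                  # most uncertain candidate
        batch.append(pick)
        X_aug = np.vstack([X_aug, pick])             # pretend we already ran it...
        y_aug = np.append(y_aug, lie)                # ...and that it returned the lie
    return np.array(batch)

print(select_batch(X, y).round(3))                   # 8 well-spread FE runs to submit
```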

Section 08

ATD-HBM Correlation via Transfer Learning

The classical biofidelity assessment methodology — CORA (CORrelation and Analysis) corridor comparison — evaluates how closely an ATD's time-history response matches a PMHS or HBM reference corridor. CORA produces scores for phase correlation (timing alignment), magnitude correlation (amplitude ratio), and shape correlation (normalized waveform similarity), combined into an overall biofidelity score from 0 to 1. While CORA is established and standardized (ISO 18571), it has a fundamental limitation: it operates in signal space, measuring how similar the ATD response is to the reference without predicting what injury the occupant would sustain.

The Biofidelity Transfer Problem

The scientifically meaningful question is not whether the ATD's force-deflection response looks like the HBM's — it is whether the ATD measurements, as transformed by the applicable IRF, predict the same injury probability that the HBM predicts for the same crash. These two questions have different answers. A THOR chest deflection measurement that produces a CORA score of 0.92 against the GHBMC thorax response might still produce an injury probability estimate 30% different from the GHBMC's, because the HIC-based or deflection-based IRF does not capture the same injury mechanisms that the GHBMC models through tissue strain and fracture criteria.

ML transfer learning addresses this gap directly: train a mapping from ATD kinematic and force time-history features to HBM injury metric predictions, using matched ATD-HBM simulation pairs (same crash pulse, same boundary conditions). This mapping is the biofidelity transfer function — a data-driven transformation that converts ATD measurements into HBM-consistent injury predictions without requiring the ATD to be biomechanically identical to the human.

Training Data Construction

Constructing matched ATD-HBM simulation pairs requires running both the ATD FE model and the HBM in the same crash simulation environment, with the same crash pulse, restraint system, and boundary conditions. For each pair, features are extracted from the ATD response (peak forces, peak deflection, deflection rate, HIC value, Nij components, D-ring force, chest band deflections at multiple ribs) and the target label is the HBM injury metric (peak rib strain, costal cartilage strain energy, liver maximum principal stress, P(AIS ≥ 3) from the tissue-level response).

A well-designed training dataset spans the relevant range of crash conditions: delta-v from 15 to 56 km/h (roughly 10 to 35 mph), PDOF from 0° to 30° (for frontal oblique), barrier types (ODB, SORB, full frontal rigid), and restraint configurations (with/without pretensioner, with/without load limiter, multiple airbag variants). A dataset of 500–1,000 matched pairs — achievable in 500–1,000 HBM simulations at 2–8 hours each — provides sufficient coverage for a frontal thorax transfer function.
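The sketch below illustrates feature extraction for one matched pair. The channel names, synthetic signals, and the single HBM label are placeholders for the full feature and metric sets listed above.

```python
# Feature extraction sketch for one matched ATD-HBM simulation pair.
# Signals are synthetic stand-ins for THOR channels; the label is illustrative.
import numpy as np

dt = 1e-4                                            # 0.1 ms sampling
t = np.arange(0, 0.12, dt)
chest_defl = 55 * np.sin(np.clip(t / 0.08, 0, 1) * np.pi) ** 2    # mm, stand-in channel
upper_neck_fz = -3000 * np.exp(-((t - 0.04) / 0.01) ** 2)         # N, stand-in channel

def atd_features(defl, fz, dt):
    """Scalar features from ATD time histories used as ML inputs."""
    return {
        "peak_chest_defl_mm": float(defl.max()),
        "peak_defl_rate_m_s": float(np.abs(np.gradient(defl, dt)).max() / 1000.0),
        "peak_neck_fz_kN": float(np.abs(fz).max() / 1000.0),
        "defl_time_integral_mm_s": float(defl.sum() * dt),
    }

features = atd_features(chest_defl, upper_neck_fz, dt)
label = {"peak_rib_strain": 0.021}   # paired HBM metric from the same pulse (illustrative)
print(features, label)
```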

Transfer Learning Architecture

The ML architecture for ATD-HBM transfer proceeds in two stages. In the first stage, a feature extraction network is pre-trained on the large Hybrid III ATD simulation dataset (where simulation volumes are higher because Hybrid III FE runs are faster). In the second stage, the feature extractor is fine-tuned on matched THOR-HBM pairs, with only the final prediction layers re-trained from scratch. This transfer learning approach provides two advantages: it requires fewer THOR-HBM matched pairs for good performance, and it learns representations that are robust to ATD-specific artifacts present in the Hybrid III training data.
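A minimal PyTorch sketch of the two-stage transfer follows: the feature extractor stands in for a network pre-trained on Hybrid III pairs (the checkpoint path is hypothetical), the extractor is frozen, and only the prediction head is re-trained on THOR-HBM matched pairs. Architecture sizes and data are illustrative.

```python
# Transfer learning sketch: freeze a pre-trained feature extractor, re-train the head.
import torch
import torch.nn as nn

class BiofidelityNet(nn.Module):
    def __init__(self, n_features=32, n_outputs=3):
        super().__init__()
        self.extractor = nn.Sequential(             # pre-trained on Hybrid III pairs
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_outputs)        # HBM injury metrics (e.g., rib strain)

    def forward(self, x):
        return self.head(self.extractor(x))

model = BiofidelityNet()
# model.load_state_dict(torch.load("hybrid3_pretrained.pt"))  # hypothetical checkpoint

for p in model.extractor.parameters():              # freeze the shared representation
    p.requires_grad = False
model.head = nn.Linear(64, 3)                       # re-initialize the prediction head

opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-in THOR-HBM matched pairs: ATD features -> HBM injury metrics.
x_thor, y_hbm = torch.randn(500, 32), torch.randn(500, 3)
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x_thor), y_hbm)
    loss.backward()
    opt.step()
print(f"final fine-tuning loss: {loss.item():.4f}")
```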

✓ Hybrid III → THOR Transfer Learning — Practical Value

NHTSA's transition from Hybrid III to THOR as the primary frontal ATD creates an immediate need for THOR-specific IRFs and biofidelity transfer functions — but the THOR PMHS calibration database is smaller than the legacy Hybrid III dataset. Transfer learning from Hybrid III to THOR dramatically reduces the THOR data requirements for comparable biofidelity function performance. This is one of the highest-value near-term ML applications in ATD development research.

Section 09

Physics-Informed Neural Networks for Tissue Mechanics

Standard data-driven surrogates — GP and neural networks alike — are purely statistical: they learn the input-output mapping from data without any knowledge of the underlying physical laws. For HBM calibration, this creates an extrapolation problem: biological tissue properties derived from low-rate quasi-static experiments must extrapolate to high-rate impact conditions (strain rates of 1–100 s⁻¹ vs. 0.001 s⁻¹ in standard mechanical testing). A data-driven surrogate has no mechanism to enforce physical consistency in this extrapolation regime; a Physics-Informed Neural Network (PINN) does.

PINN Architecture for Constitutive Modeling

A PINN for soft tissue constitutive modeling is a neural network that simultaneously minimizes two loss terms: a data loss penalizing deviation from experimentally observed stress-strain curves, and a physics loss penalizing violations of continuum mechanics constraints — thermodynamic consistency (Clausius-Duhem inequality), objectivity (frame indifference), and material symmetry (isotropy or transverse isotropy as appropriate for the tissue). These constraints are embedded as automatic differentiation penalties through the network's output, enforced at collocation points throughout the deformation space.

PINN Loss — Constitutive Identification
L_total = λ_data · L_data + λ_phys · L_phys

L_data = Σᵢ ‖ σ_NN(Fᵢ; θ) − σ_exp(Fᵢ) ‖² [stress prediction error]
L_phys = Σⱼ [max(0, −∂²Ψ/∂I₁²)]² [convexity of strain energy Ψ]
+ ‖ Ψ(F = I) ‖² [zero strain → zero energy]
F is the deformation gradient tensor; I₁, I₂, I₃ are strain invariants; Ψ is the strain energy density function predicted by the network. The physics penalties enforce thermodynamic admissibility — preventing the network from learning constitutive laws that violate conservation of energy.
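The sketch below implements the two-term loss for a one-invariant strain energy network in PyTorch, with the convexity and zero-reference-energy penalties computed by automatic differentiation. The incompressible uniaxial kinematics, synthetic stress data, and loss weights are illustrative assumptions rather than a validated tissue model.

```python
# PINN-style constitutive identification sketch: data misfit on stress plus AD-computed
# penalties for non-convex strain energy and nonzero energy at zero strain.
import torch
import torch.nn as nn

psi_net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(),
                        nn.Linear(32, 1))

def uniaxial_stress(stretch):
    """Cauchy stress from Psi(I1) under incompressible uniaxial stretch (assumed kinematics)."""
    lam = stretch.clone().requires_grad_(True)
    I1 = lam**2 + 2.0 / lam
    psi = psi_net(I1.unsqueeze(-1)).squeeze(-1)
    dpsi_dI1 = torch.autograd.grad(psi.sum(), I1, create_graph=True)[0]
    return 2.0 * dpsi_dI1 * (lam**2 - 1.0 / lam)     # sigma = 2 dPsi/dI1 (lam^2 - 1/lam)

# Stand-in quasi-static stress-stretch "measurements".
lam_exp = torch.linspace(1.0, 1.3, 20)
sigma_exp = 0.8 * (lam_exp**2 - 1.0 / lam_exp)

colloc_I1 = torch.linspace(3.0, 4.5, 50, requires_grad=True)   # collocation points in I1

opt = torch.optim.Adam(psi_net.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    # Data loss: predicted vs. observed stress.
    loss_data = ((uniaxial_stress(lam_exp) - sigma_exp) ** 2).mean()
    # Physics loss 1: convexity of Psi in I1 (penalize negative second derivative).
    psi_c = psi_net(colloc_I1.unsqueeze(-1)).squeeze(-1)
    d1 = torch.autograd.grad(psi_c.sum(), colloc_I1, create_graph=True)[0]
    d2 = torch.autograd.grad(d1.sum(), colloc_I1, create_graph=True)[0]
    loss_convex = torch.relu(-d2).pow(2).mean()
    # Physics loss 2: zero strain energy at the reference configuration (I1 = 3).
    loss_ref = psi_net(torch.tensor([[3.0]])).pow(2).mean()
    loss = loss_data + 10.0 * (loss_convex + loss_ref)
    loss.backward()
    opt.step()
print(f"total loss after training: {loss.item():.4e}")
```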

Application to Rib Cortical Bone

Rib cortical bone presents a canonical PINN calibration problem. Its elastic modulus (10–25 GPa), yield stress (100–200 MPa), and post-yield softening behavior vary substantially with donor age, anatomical level (rib 3 vs. rib 9), and loading rate. Direct measurement at impact rates requires specialized Hopkinson bar experiments that generate a small number of high-noise data points. A PINN trained jointly on quasi-static coupon data (high N, low rate) and Hopkinson bar data (low N, high rate) can extrapolate the rate-dependent constitutive model to the intermediate crash rates (50–500 s⁻¹) that HBM thorax simulations actually experience — a regime where direct data is sparse.

Comparison with Standard Neural Surrogate

Property | Standard NN Surrogate | Physics-Informed NN
Interpolation accuracy | High (fits training data) | Comparable
Extrapolation accuracy | Poor (unconstrained) | Good (physics-constrained)
Thermodynamic consistency | Not guaranteed | Enforced by construction
Training data requirements | Moderate | Comparable (physics provides regularization)
Training complexity | Simple | Moderate (requires AD through physics terms)
Interpretability | Low | Moderate (physics structure is interpretable)
Section 10

Statistical Shape Models and Population Diversity

Current HBMs represent specific reference individuals — the GHBMC M50-O is a 50th-percentile male occupant by stature and mass, seated in a standard upright posture. Real-world crash occupants span an enormous range of stature (5th-percentile female at 152 cm to 95th-percentile male at 190 cm), body mass index (18–40+ kg/m²), seated posture, and age-related skeletal geometry changes. A calibrated M50-O model, however well-calibrated, cannot directly represent the injury risk of a 70-year-old 5th-percentile female. Statistical shape models, combined with ML, provide the machinery to extend a single well-calibrated HBM to the occupant population.

Constructing a Statistical Shape Model

A Statistical Shape Model (SSM) is constructed from a population of segmented anatomical geometries — CT or MRI scans from a representative subject sample, processed to extract bone and organ surface meshes. After rigid alignment (Procrustes analysis) and non-rigid registration to a common template mesh topology, each subject's geometry is represented as a displacement field from the mean shape. Principal Component Analysis (PCA) on the stacked displacement fields identifies the dominant modes of shape variation.

For thorax SSMs derived from thoracic CT databases (e.g., NIH NLST, NHLBI), the leading principal components typically encode: overall size (PC1, ~45% of variance), rib cage aspect ratio / barrel-chest vs. flat-chest morphology (PC2, ~18%), kyphosis and spinal curvature (PC3, ~12%), and asymmetry (PC4, ~7%). Together, the top 6–8 PCs capture ~90% of the shape variability in a diverse adult population.
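A minimal SSM construction sketch follows. It assumes the meshes are already registered to a common template topology, uses a rotation-only Procrustes alignment, and runs PCA on the stacked vertex displacements; all geometry here is synthetic.

```python
# SSM construction sketch: rigid alignment of registered meshes, displacement fields
# from the mean shape, and PCA for the dominant shape modes.
import numpy as np
from scipy.linalg import orthogonal_procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_subjects, n_vertices = 120, 4000
meshes = rng.normal(size=(n_subjects, n_vertices, 3))   # stand-in registered geometries

template = meshes.mean(axis=0)
template_centered = template - template.mean(axis=0)
aligned = np.empty_like(meshes)
for i, mesh in enumerate(meshes):
    centered = mesh - mesh.mean(axis=0)
    R, _ = orthogonal_procrustes(centered, template_centered)
    aligned[i] = centered @ R                            # rigid rotation onto the template

mean_shape = aligned.mean(axis=0)
displacements = (aligned - mean_shape).reshape(n_subjects, -1)   # (subjects, 3*V)

ssm = PCA(n_components=8)
scores = ssm.fit_transform(displacements)
print("variance explained by PCs 1-8:", np.round(ssm.explained_variance_ratio_, 3))

def morph(pc_scores):
    """Instantiate a new thorax geometry from shape-mode scores."""
    disp = ssm.inverse_transform(pc_scores.reshape(1, -1)).reshape(n_vertices, 3)
    return mean_shape + disp

new_geometry = morph(2.0 * np.sqrt(ssm.explained_variance_[:8]) * rng.standard_normal(8))
```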

ML-Predicted Injury Risk Across Shape Space

Given an SSM and a calibrated HBM, a morphed model library can be generated by instantiating HBM meshes at a set of LHS-sampled shape parameter vectors (shape PC scores). Running each morphed HBM through the crash simulation produces injury outcome labels — and a training dataset for an ML model that predicts injury risk as a function of both crash parameters and occupant shape.

This architecture — SSM-parameterized HBM morphing + crash simulation + ML surrogate — enables continuous injury risk prediction across the occupant size-shape distribution without running a separate full HBM calibration for each body size. The key assumption that must be validated: that the calibrated material properties from the M50-O model transfer to morphed geometries representing other body sizes. Age-related changes in bone density and tissue stiffness require separate covariate adjustment beyond geometric morphing.
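As a sketch of the final emulation step, the code below trains a gradient-boosted classifier to predict an AIS 3+ thorax injury flag from crash parameters and shape PC scores. The data, labels, and model choice are illustrative stand-ins for a morphed-HBM simulation library.

```python
# Injury-risk emulation sketch over the combined crash-condition x occupant-shape space.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_runs = 800
delta_v = rng.uniform(15, 56, n_runs)             # km/h
pdof = rng.uniform(0, 30, n_runs)                 # degrees
shape_pcs = rng.normal(size=(n_runs, 6))          # occupant shape PC scores
X = np.column_stack([delta_v, pdof, shape_pcs])

# Stand-in labels: AIS3+ thorax injury flag from the morphed-HBM simulations.
logit = 0.12 * (delta_v - 35) + 0.4 * shape_pcs[:, 0] + 0.02 * pdof
y = (rng.uniform(size=n_runs) < 1 / (1 + np.exp(-logit))).astype(int)

clf = GradientBoostingClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
print("5-fold AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").round(3))
clf.fit(X, y)

# Predicted AIS3+ risk for a small-stature occupant (negative PC1) at 40 km/h, 0 deg PDOF.
occupant = np.r_[40.0, 0.0, -2.0, np.zeros(5)].reshape(1, -1)
print("P(AIS3+):", clf.predict_proba(occupant)[0, 1].round(3))
```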

Statistical shape models transform a single validated reference human body model into a population-representative simulation capability. They are the computational bridge between ATD-based regulatory testing and the occupant diversity that real-world safety standards must protect.
— WP-CR-2026-02, Continuum Resources LLC
Section 11

Interactive Sensitivity Explorer

The following reference presents illustrative Sobol total-order sensitivity indices for key HBM calibration outputs across three anatomical regions. Values reflect the parameter importance structure reported in the peer-reviewed literature on GHBMC and THUMS thorax, head, and lower extremity calibration studies, and indicate which material parameters dominate the response variance for each output metric.

Sobol Total-Order Sensitivity Indices — HBM Calibration
Fraction of output variance attributable to each material parameter (including interactions) · Literature-derived estimates
Section 12

The Continuum Approach

Continuum Resources' ML engineering and knowledge graph capabilities address three specific technical challenges in the HBM calibration and ATD correlation research pipeline described in this paper. Each aligns to a distinct NHTSA EMBIR research workstream.

CAPABILITY 01
Surrogate Construction Pipeline
End-to-end ML surrogate development for FE simulation emulation — Sobol screening, LHS design, GP training (Matérn 5/2), LOOCV validation, and posterior predictive uncertainty quantification. Deployable as a containerized pipeline on HPC environments.
CAPABILITY 02
Bayesian Calibration Engine
Kennedy-O'Hagan Bayesian calibration with NUTS/HMC sampling via Stan or PyMC. Includes model discrepancy specification, identifiability diagnostics (R̂, ESS, rank plots), and posterior predictive corridor generation against PMHS experimental data.
CAPABILITY 03
PathRAG — Simulation Knowledge Graph
PathRAG deployed over NHTSA biomechanics literature, PMHS test report databases, and HBM validation study corpora. Multi-hop queries connecting tissue mechanics parameters → validation corridor studies → ATD performance data → FMVSS compliance history.
CAPABILITY 04
ATD-HBM Transfer Learning
Transfer learning architecture for Hybrid III → THOR biofidelity transfer. Pre-trained on Hybrid III simulation database; fine-tuned on THOR-HBM matched pairs. Output: THOR measurement → HBM injury probability prediction function for FMVSS 208 update research.
CAPABILITY 05
Active Learning Orchestration
Sequential experimental design with batch EIG acquisition. Integrates with LS-DYNA HPC job scheduling to automate the surrogate-update-acquire-simulate loop without manual intervention. Target: 400 total FE simulations for full posterior convergence on thorax model.
CAPABILITY 06
PINN for Rate-Dependent Tissue Mechanics
Physics-informed neural networks for cortical bone and cartilage constitutive identification from mixed-rate experimental data (quasi-static + Hopkinson bar). Thermodynamic consistency enforced via automatic differentiation penalties. PyTorch implementation with AD-computed physics gradients.
🔬 Proposed Research Contribution — NHTSA EMBIR

Continuum's proposed contribution to the EMBIR Computational Modeling and Data Analysis workstreams: (1) ML-accelerated Bayesian calibration of the GHBMC thorax model against the NHTSA PMHS sled test database, producing posterior material parameter distributions and posterior predictive corridors for THOR biofidelity assessment; (2) development of a THOR-50M → GHBMC injury prediction transfer function using matched simulation pairs, directly supporting the FMVSS 208 thorax criterion update research. Both workstreams are fully contained within Continuum's analytical and computational capability — no wet lab or physical testing infrastructure required.

Section 13

Conclusion

The computational bottleneck in HBM calibration is not a hardware problem — it is a methodology problem. The FE simulation will remain expensive; what can change is how intelligently the available simulation budget is allocated. Gaussian process surrogates, Bayesian calibration, and active learning with EIG acquisition collectively transform an intractable brute-force parameter search into a principled, efficient Bayesian inference problem that converges to a scientifically defensible posterior distribution within a simulation budget achievable on a modern HPC cluster.

The regulatory implications extend beyond efficiency. A posterior distribution over HBM material parameters is a fundamentally better scientific product than a single-point calibrated model. It makes parameter identifiability visible, propagates calibration uncertainty into downstream injury predictions, and provides an honest accounting of what the experimental data does and does not determine. When calibrated HBM predictions inform FMVSS tolerance criteria or ATD biofidelity assessments, that honesty matters for the quality of the resulting safety standards.

ATD-HBM transfer learning and statistical shape models extend the ML impact beyond calibration into the two most consequential remaining gaps in virtual testing methodology: the biofidelity bridge between regulatory dummy measurements and human injury outcomes, and the extension of single-reference HBM predictions to the full diversity of the protected occupant population. Together, these advances define a credible technical path toward the MAP-21 vision of virtual testing as a regulatory tool — not as a replacement for physical testing, but as a scientifically rigorous complement that dramatically expands the safety knowledge accessible from each physical test program dollar.

Bayesian calibration does not just fit the model to the data. It tells you what the data does not determine — and that is equally important for knowing how much to trust the model's predictions where the data runs out.
— WP-CR-2026-02, Continuum Resources LLC
Section 14

References

  1. Kennedy, M.C., & O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B, 63(3), 425–464.
  2. Sacks, J., Welch, W.J., Mitchell, T.J., & Wynn, H.P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4), 409–423.
  3. Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., ... & Tarantola, S. (2008). Global Sensitivity Analysis: The Primer. Wiley.
  4. Stein, M. (1987). Large sample properties of simulations using Latin Hypercube Sampling. Technometrics, 29(2), 143–151.
  5. Rasmussen, C.E., & Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. MIT Press.
  6. Raissi, M., Perdikaris, P., & Karniadakis, G.E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707.
  7. Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., & de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148–175.
  8. Neal, R.M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (pp. 113–162). CRC Press.
  9. Gayzik, F.S., Moreno, D.P., Vavalle, N.A., Rhyne, A.C., & Stitzel, J.D. (2012). Development of the Global Human Body Models Consortium Midsize Male Full Body Model. Injury Biomechanics Symposium, IRCOBI.
  10. Maeno, T., & Hasegawa, J. (2001). Development of a finite element model of the Total Human Model for Safety (THUMS) and application to car pedestrian impacts. Proceedings of the 17th International Technical Conference on Enhanced Safety of Vehicles (ESV), Paper 494.
  11. Gehre, C., Gades, H., & Wernicke, P. (2009). Objective rating of signals using test and simulation responses. Proceedings of the 21st ESV Conference, Paper 09-0407.
  12. ISO 18571:2014. Road vehicles — Objective rating metric for non-ambiguous signals. International Organization for Standardization.
  13. Comellas, E., Pica Ciamarra, M., & Fortunato, G. (2020). Mechanical characterization of biological soft tissues. Journal of the Mechanical Behavior of Biomedical Materials, 103, 103551.
  14. Cootes, T.F., Taylor, C.J., Cooper, D.H., & Graham, J. (1995). Active shape models — Their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
  15. Yoganandan, N., Pintar, F.A., Sances, A., Walsh, P.R., Ewing, C.L., Thomas, D.J., & Snyder, R.G. (1995). Biomechanics of skull fracture. Journal of Neurotrauma, 12(4), 659–668.
  16. Shaw, G., Parent, D., Purtsezov, S., Lessley, D., Crandall, J., Kent, R., ... & Bass, C. (2009). Impact response of restrained PMHS in frontal sled tests. Stapp Car Crash Journal, 53, 1–42.