using EpiAware, Distributions
ar2 = AR(;
damp_priors=[truncated(Normal(0.2, 0.2), 0, 1),
truncated(Normal(0.1, 0.05), 0, 1)],
init_priors=[Normal(0, 0.2), Normal(0, 0.2)],
ϵ_t=HierarchicalNormal(std_prior=HalfNormal(0.1))
)Composable probabilistic models can lower barriers to rigorous infectious disease modelling
Recent outbreaks of Ebola, COVID-19 and mpox, and routine surveillance of endemic pathogens such as influenza, have demonstrated the value of modelling for synthesising data to inform decision making. Effective models require integration of expert domain knowledge from multiple domains and outputs to be timely enough to inform policy yet current modelling approaches create barriers to meeting these goals. Methods used to synthesise available data broadly fall into approaches that chain separate models together, offering flexibility but losing information and potentially introducing bias, or rigorous joint models that are often monolithic and difficult to adapt. These barriers have prevented advances across multiple settings where models could have provided actionable insights. Composable models where components can be reused across different contexts and combined in various configurations whilst maintaining statistical rigour could address these limitations. We outline proposed requirements for a composable infectious disease modelling framework and present a proof of concept that addresses these requirements through composable epidemiological components built on Julia’s type system and Turing.jl. We demonstrate a prototype R interface showing how such frameworks can bridge software ecosystems. Three case studies show how latent process components can be composed with epidemiological models to estimate time-varying reproduction numbers. The first replicates a COVID-19 analysis for South Korea using a renewal process. The second extends these components with reporting delays and day-of-week effects to replicate EpiNow2, a real-time nowcasting tool. The third replicates an ordinary differential equation model analysis of influenza outbreak data. We then discuss strengths, limitations, and alternative approaches. Our approach demonstrates promise for enabling interdisciplinary collaboration by lowering technical barriers for domain experts to contribute directly to model development. Future work is needed to solve remaining composability challenges, explore other options, expand the component library, and explore opportunities for large language model assisted model construction.
* These authors contributed equally to this work.
1 Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, United Kingdom
2 Center for Forecasting and Outbreak Analysis; Centers for Disease Control, United States of America
3 University of Cambridge, United Kingdom
4 MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, United Kingdom
5 Microsoft Health Futures, Microsoft Research, United States of America
✉ Correspondence: Sam Abbott <sam.abbott@lshtm.ac.uk>
Introduction
Modelling infectious disease dynamics can be a valuable tool for synthesising information and developing quantitative evidence for understanding and control [1]. The utility of such models lies in their ability to combine expert knowledge about transmission with the often complex ways in which infections are ultimately observed. During the COVID-19 pandemic, models integrating case counts, prevalence surveys, severity data, and hospital bed occupancy provided evidence that supported policy decisions [2–4], yet these models were not used during the 2022 mpox outbreak as contact structure and behaviour needed explicit representation and these models were not adapted to include this [5]. Similarly, cross-sectional viral load measurements have been shown to provide useful information on epidemic dynamics [6], yet this method has not been incorporated into most models used to inform policy even where such data was routinely available [3,7]. Multi-model efforts synthesise cross-domain expertise by combining forecasts or scenario projections from multiple teams (e.g., [8,9]) and have been shown to improve predictive accuracy [10,11], yet teams rarely collaborate on elements of their models despite shared goals. For evidence to inform policy it must be timely, explicit about methods and limitations, and as simple as possible whilst remaining rigorous [12]. In practice, while rigorous policy-relevant modelling is possible it usually comes at the cost of the flexibility needed for cross-domain collaboration and timely evidence generation.
Probabilistic programming languages (PPLs) such as Stan or LibBi [13,14], and modelling frameworks such as pomp and monty [15,16], have enabled rigorous joint modelling and inference, reduced development effort, improved model quality, and supported workflow best practices compared with manual specification [17–19]. However, they have not enabled the flexibility and cross-domain collaboration needed to provide evidence in a timely manner. This may be because they do not provide straightforward ways to share or reuse model components; the practical unit of reuse remains the whole model or its codebase, not the underlying epidemiological concepts. This fragmentation is evident in the field. For example, at least seven packages estimate effective reproduction numbers using renewal-based approaches in Stan [20–26], yet none share components despite overlapping features and, in some cases, the same authors. Reproduction number estimation using wastewater highlights challenges when different domain expertise is required: tools were implemented by non-wastewater experts with different levels of focus on sampling details, share no components and it is not clear which modelling choices matter [25,26]. To resolve these issues some efforts have aimed for greater generality: epinowcast provides a modular Bayesian framework supporting flexible model specification, and epidist [27] extends Bayesian Regression Models using Stan (brms) [28] for delay distribution estimation. Yet these, and other similar efforts, remain monolithic and inflexible, difficult for users to contribute to or adapt beyond supported functionality, and limited in scope.
To work around these flexibility constraints, some analyses choose to chain multiple models together, fitting separate models sequentially and passing outputs to subsequent stages [29–31]. This allows for flexibility but prevents full propagation of uncertainty and can introduce biased estimates compared with the rigour of fully joint models [32]. Modelling using data from the UK Office for National Statistics Community Infection Survey (CIS) demonstrates this approach and resulting challenges [3]. From April 2020 to March 2023, the CIS tested over four million swabs from more than 150,000 households at a cost exceeding £500 million. Identifiability concerns meant external modellers were able to access only summarised prevalence estimates rather than underlying observations. This created a cascade where estimates were used as inputs to subsequent analyses: prevalence estimates became inputs for incidence [33] and reproduction number estimates [34], which themselves informed policy analyses [4,35,36]. At each stage, uncertainty from earlier estimates was approximated and modelling assumptions inherited without the ability to evaluate their impact. This may have impacted the evidence used in policy-relevant decisions about variant transmissibility and severity [4,35,36].
Recent developments in computational statistics and scientific computing demonstrate the potential for composable modelling, in which components can be reused across contexts and combined in different configurations whilst maintaining statistical rigour, to address the tension between flexibility and rigour. As an example, advances in Turing.jl, a probabilistic programming language built in Julia [37], have introduced submodel interfaces that enable composable probabilistic programming, offering a pathway for epidemiological model composition [38,39]. In other fields, domain-specific languages (programming languages tailored to specific application domains that use terminology and concepts familiar to domain experts rather than general programming constructs) built in Julia have enabled composable models: HydroModels.jl [40] for hydrological modelling and SpeedyWeather.jl [41]. For the CIS example above, if those analysing the primary data used models composed from submodels, they could publish intermediate and fully deidentified results that others could, more easily, integrate into their modelling using Markov melding or related methods [42]. This would avoid implicit assumptions by making connections between components explicit and improve uncertainty propagation.
In this paper, we first propose requirements for composable infectious disease modelling based on the challenges we have identified, our experience developing infectious disease models during recent outbreaks, and established best practices for Bayesian modelling. We then present a proof of concept that addresses these requirements, combining the flexibility of chaining models together with the statistical rigour of joint models through composable epidemiological components. Our approach enables building block style model construction through standardised interfaces similar to those used in SpeedyWeather.jl [41]. Our proof of concept supports composability of a range of model families including ordinary differential equations and mixed equation types, with the potential for different computational backends, implemented as a domain-specific language intended to be implemented as optional package extensions [43]. We demonstrate our approach using an autoregressive model example to illustrate the proposed compositional pattern (building models by combining composable components) and component swapping capabilities. Through three case studies using the prototype, we show how these latent process components can be composed with epidemiological models to estimate time-varying reproduction numbers: the first replicates a COVID-19 analysis for South Korea using a renewal process [44]; the second extends these components with reporting delays and day-of-week effects to replicate EpiNow2, a real-time nowcasting tool [21]; the third replicates an ordinary differential equation model analysis of influenza outbreak data [45]. Finally, we discuss alternative design approaches, evaluate the strengths and limitations of our compositional approach, and identify key areas for future development.
Proposed requirements for composable infectious disease modelling
Based on the challenges outlined above, our experience developing infectious disease models during recent outbreaks, and established best practices for Bayesian modelling [18], we propose requirements for composable infectious disease modelling organised into six themes (Table 1). The requirements in the uncertainty and inference theme and the model structure theme aim to address building and fitting models with appropriate uncertainty propagation. The requirements in the population heterogeneity theme aim to address the need to model diseases across different groups whilst reducing the additional bookkeeping this typically requires. The requirements in the accessibility and adoption theme aim to lower barriers for diverse users, enable incremental uptake, and support the propagation of domain expertise into composed models. The requirements in the interoperability theme aim to ensure components work together and integrate with external tools. The requirements in the modelling workflow theme aim to support the development of validated models that can answer key stakeholder questions in decision maker relevant timelines [12], and enable emerging opportunities for large language model assisted model construction. Whilst we have presented this as a mapping of problems to requirements, in reality there is overlap; for example, standardised interfaces should enable component compatibility, support handling of multiple strata, and ensure proper uncertainty propagation.
| Theme | Proposed requirement | Motivating problem |
|---|---|---|
| Uncertainty & inference | Probabilistic model specification with full uncertainty propagation | Decisions must account for incomplete knowledge about infection risk, transmission, and observation [46] |
| Joint modelling and staged inference [42] with proper uncertainty propagation | Chaining models introduces implicit assumptions and cascading biases (see CIS example in the introduction) | |
| Automatic differentiation [47] | Complex models require efficient fitting methods, and many modern inference approaches rely on gradients [48] | |
| Switchable inference backends | The best inference method varies by model and data combination; no single approach works universally | |
| Model structure | Clear separation between model components | Infection processes, other latent processes, and observation processes require distinct expertise and should be reasoned about separately |
| Easy nesting of models within models | Adding new data sources (such as case observations to a wastewater model) or processes with existing models (such as incubation periods to early outbreak analysis) can require modifying entire models (Figure 1) | |
| Support for multiple modelling paradigms | Infectious disease models take many forms (such as compartmental models, agent-based simulations, and network transmission models) | |
| Population heterogeneity | Arbitrary stratification for all model components | Diseases affect populations heterogeneously across age, location, and risk group [1] |
| Automatic management of multi-group structure | Managing dimensions and interactions across groups (such as age groups and locations) can be complex and error-prone | |
| Programmatic generation and modification of model components | Extending models to additional groups (such as age groups or locations) and adapting existing components often requires repetitive code | |
| Partial pooling [49] in order to share information across groups | Sparse data in individual groups can require borrowing information between them | |
| Accessibility & adoption | Clear, concise modelling language | Diverse users including those without programming expertise need to specify models quickly |
| Components contain both the structure and parameter prior distributions | Domain expertise can be difficult to transfer between models or modelling teams | |
| Support for incremental adoption of the approach | Requiring complete rebuilds of existing models can be wasteful and time-consuming [12] | |
| Components functional as standalone tools | Components should provide value even outside the compositional framework | |
| Interoperability | Standardised interfaces between components | Components developed independently may be incompatible and uncertainty may not propagate correctly between them |
| Data-agnostic model definitions | Models need to generalise across datasets, support prior predictive checks, and enable forecasting | |
| Composability with machine learning (ML) approaches | Integration with neural networks, Gaussian processes, and other ML tools enables leveraging that expertise [50] | |
| Modelling workflow | A single specification for simulation and inference | Both simulation (for prior predictive checks and forecasting) and inference are needed [18], but implementing both separately can be resource intensive and difficult to keep in sync |
| Rapid model iteration | Modelling workflows [18], especially during outbreaks where evidence and requirements can change rapidly [12], require models that are easy to iterate on | |
| Easy model and model component validation and exploration | Complex models can be difficult to debug, validate, and understand | |
| Modelling language optimised for large language model assisted construction with automated validation | Large language models may be useful for constructing models but can make errors and often do not fully capture expert knowledge [51] |
Our approach
Meeting these requirements requires programming with probabilities, for which probabilistic programming languages are designed [17]. We also need a probabilistic programming language that supports automatic differentiation for modern inference, the ability to program over model structure itself to enable model nesting and composition, and access to as wide an ecosystem as possible to avoid lock-in and enable integration with existing scientific computing tools. As far as we are aware, only probabilistic programming languages built in Julia [37] provide the metaprogramming capabilities (writing code that generates or manipulates other code) needed to create domain-specific abstractions that can handle arbitrary stratification, standardised interfaces between components, and programming over the model structure. Among Julia’s PPL options, Turing.jl [39] best meets our requirements with mature submodel support for nesting models within models, extensive inference algorithm choices, and its implementation as a light abstraction layer (a simplified interface that hides complex implementation details whilst enabling access to underlying functionality) on top of the wider Julia ecosystem. Additional benefits of Julia include eliminating the two-language problem, leveraging multiple dispatch (a programming paradigm where function behaviour is determined by the types of all arguments rather than just the first, enabling automatic selection of the most specific implementation for given inputs) for clean component composition, and accessing the mature SciML ecosystem [52] for differential equations, neural networks [50,53], and other scientific computing tools.
Our approach uses a two-layer architecture with a high-level domain-specific language (DSL) for epidemiological modelling and a low-level implementation using Turing.jl. Though importantly we are not locked in to this choice as the DSL is agnostic of the backend used. This separation enables incremental adoption without rebuilding existing models to use our DSL, with all components remaining functional as standalone tools outside the compositional framework. The DSL layer provides clear, concise model specification using epidemiological concepts, enabling domain experts to contribute components encapsulating their specialised knowledge without understanding the low-level details of all framework elements. The backend layer aims to handle the automated bookkeeping of stratification, interface validation, and uncertainty propagation whilst supporting multiple inference approaches and auto differentiation options by leveraging the Turing.jl and wider Julia ecosystems. This structure enables integration of diverse data sources by allowing researchers to add a submodel with its own latent and observation process to an existing model, for example adding wastewater measurements to a case-based renewal model or clinical biomarkers to transmission dynamics without modifying the core infection model. It also facilitates staged inference approaches such as Markov melding [42], meaning that components can be fitted separately and combined with proper uncertainty propagation.
To demonstrate cross-ecosystem accessibility, we also developed EpiAwareR [54], an R interface to the prototype using JuliaCall [55]. This enables R users to access the compositional modelling framework without requiring Julia programming knowledge, specifying models using familiar R syntax while gaining access to the full composable component library.
Figure 1 demonstrates how our proposed approach could enable component reuse across different epidemiological applications. Four example applications (wastewater surveillance, biomarker modelling, early outbreak analysis, and incubation period estimation) each compose models from components of different types. The schematic also highlights two examples of component reuse. The incubation period model appears in all four applications, and the within-host viral kinetics model is shared between biomarker modelling and wastewater surveillance. This highlights how components developed for one application could be incorporated into others when they share common underlying processes.
Domain-specific language structure
Our prototype DSL builds on Julia’s type system to enable composable epidemiological modelling through two key design patterns. First, abstract types define interfaces that implementations must follow, analogous to contracts specifying what operations a model component must support rather than how it implements them. All model components inherit from a parent AbstractModel type, establishing a common foundation whilst allowing specialised behaviour through subtypes. Second, structures (data containers that group related variables and their types together) contain other structures as fields, a struct-in-struct pattern, similar to the approach used in SpeedyWeather.jl [41], that allows complex models to be built by nesting simpler components. This pattern enables models to be assembled like building blocks whilst maintaining clear boundaries between different epidemiological processes.
We organise model components into three abstract type hierarchies, each of which inherits from AbstractModel, corresponding to distinct epidemiological processes. AbstractEpiModel represents infection generation processes such as renewal models or ordinary differential equation transmission dynamics. AbstractLatentModel captures time-varying parameters and unobserved processes such as changing reproduction numbers or reporting rates, implemented through structures like autoregressive processes, random walks, or moving averages. AbstractObservationModel links latent states to observed data by encoding measurement processes such as reporting delays, aggregation over time periods, and observation error distributions. These structures are data-agnostic, specifying what to do when they encounter data rather than containing data themselves, making model definitions reusable across different datasets and scenarios. Each hierarchy supports multiple concrete implementations that can be swapped to compare modelling assumptions whilst keeping other components fixed.
Models can compose across these hierarchies rather than being restricted to combining components within a single type. For example, an observation model can contain a latent process as a field to represent time-varying ascertainment, or wrap another observation model to add reporting delays. This cross-hierarchy composition extends the building block analogy to include more complex models. For instance, we can construct a delay convolution observation model, LatentDelay, using an underlying observation model and a delay distribution model.
The EpiProblem structure is a top-level container assembling these components into a complete epidemiological model. It holds an infection process (epi_model), a latent process (latent_model), an observation process (observation_model), and a time span for inference or simulation. The latent model generates time-varying epidemiological parameters, the infection model uses these parameters to simulate disease transmission and generate infections, and the observation model links these latent infections to observed data through measurement processes. However, this is just the top-level structure: each submodel can itself contain any of these abstract types, enabling flexible composition such as observation models that include latent processes for time-varying ascertainment. Structures can be modified in place using tools like Accessors.jl [56], enabling both model iteration and patterns such as partially pooled models that update low-level priors based on grouping structures. The abstract type system enables shared methods across all model components, such as print methods (shown below) for displaying model specifications or functions for visualising model directed acyclic graphs. This approach also allows for the creation of mappings between submodels such as one to one, one to many, and many to many mappings so that, for example, a single infection process can be linked to multiple observations models by specifying a mapping model.
To demonstrate the approach we focus on AbstractLatentModels, and build an ARIMA(2,1,1) process through composition, as shown in the ARIMA panel of Figure 2. In the case studies (Section 4) this is then composed with infection and observation models to estimate time-varying reproduction numbers. We start with an autoregressive order two (AR(2)) process. Mathematically, the AR(2) process is:
\[Z_t = \rho_1 Z_{t-1} + \rho_2 Z_{t-2} + \epsilon_t, \quad \epsilon_t \sim \text{Normal}(0, \sigma)\]
In our prototype DSL, this is defined using the AR struct. We use priors inspired by [57] for illustration. Prior distributions are specified using Distributions.jl [58], the standard probability distributions package in the Julia ecosystem, which provides a unified interface for probability distributions that is interoperable with both our framework and Turing.jl.
This constructor has created the following struct definition.
ar2AR(damp_prior = Product(Truncated(0.2, 0.2, 0.0, 1.0)),
init_prior = TuringScalMvNormal([0.0, 0.0], 0.2),
p = 2,
ϵ_t = HierarchicalNormal(mean = 0.0,
std_prior = HalfNormal(0.1),
add_mean = false))
Prior samples from this model are shown in Figure 3 A. Another common latent model is the moving average model
\[Z_t = \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_t \sim \text{Normal}(0, \sigma)\]
The MA struct defines this in the same way that the AR did for the AR process.
ma1 = MA(;
θ_priors=[truncated(Normal(0.0, 0.2), -1, 1)],
ϵ_t=HierarchicalNormal(std_prior=HalfNormal(0.1))
)Prior samples from this process are shown in Figure 3 B. A popular combination of these models is the autoregressive moving average (ARMA) model that can have different orders for both the AR and MA components. An ARMA(2,1) can be defined as:
\[Z_t = \rho_1 Z_{t-1} + \rho_2 Z_{t-2} + \epsilon_t + \theta \epsilon_{t-1}\]
The ARIMA panel of Figure 2 shows this composition as a directed graph where AR(2) connects to MA(1) to form the combined process. In our DSL this can be represented as a composition of the AR and MA structs by updating the AR error term:
using Accessors
arma21 = @set ar2.ϵ_t = ma1The result of this step has been to update the ar2 struct so that the definition of the moving average model is nested inside it.
arma21AR(damp_prior = Product(Truncated(0.2, 0.2, 0.0, 1.0)),
init_prior = TuringScalMvNormal([0.0, 0.0], 0.2),
p = 2,
ϵ_t = MA(θ = Product(Truncated(0.0, 0.2, -1.0, 1.0)),
q = 1,
ϵ_t = HierarchicalNormal(mean = 0.0,
std_prior = HalfNormal(0.1),
add_mean = false)))
The combined model dynamics are shown in Figure 3 C. Similarly, autoregressive integrated moving average (ARIMA) models extend ARMA by adding differencing operations that transform the series
\[\Delta Z_t = Z_{t} - Z_{t-1}\]
So that the ARIMA(2,1,1) model is defined as an ARMA(2,1) model for the first order differences:
\[\Delta Z_t = \rho_1 \Delta Z_{t-1} + \rho_2 \Delta Z_{t-2} + \epsilon_t + \theta \epsilon_{t-1}\]
We compose the ARMA model with a differencing operation using the DiffLatentModel wrapper:
arima211 = DiffLatentModel(arma21, Normal(0, 0.2); d=1)Prior samples from the full model are shown in Figure 3 D. Alternatively, EpiAware provides an arima() constructor function that simplifies this specification.
Other latent models extend modelling options through combining models additively, multiplicative scaling, and piecewise processes. This approach enables representation of arbitrary latent processes through composition. In the case studies (Figure 2), we use these latent processes to generate trajectories of time-varying reproduction numbers \(R_t\). These are then linked with infection models via AbstractEpiModel implementations such as Renewal, which computes expected infections using the discrete-time renewal equation \(I_t = R_t \sum_{s} g_s I_{t-s}\) where \(g_s\) is the generation interval distribution. These latent process models are also used to capture time-varying ascertainment rates in observation models.
Backend implementation: Turing.jl interface
Our DSL translates to Turing.jl models through generate functions that dispatch on abstract type hierarchies. Each abstract type (AbstractLatentModel, AbstractEpiModel, AbstractObservationModel) has a generate function interface that concrete model types must implement a method to dispatch upon. When a generate function receives a model component, Julia’s multiple dispatch automatically selects the appropriate implementation based on the component’s type, producing Turing.jl [39] code using the @model macro. Unit tests are used to ensure concrete implementations satisfy interface requirements, enabling all components to interoperate correctly regardless of which specific model types are composed together. This approach means new model types integrate seamlessly without modifying existing code, and users can swap components to compare modelling assumptions whilst keeping other parts of the model fixed.
The generated Turing.jl models serve dual purposes: simulation by sampling from prior distributions, or inference by conditioning on observed data and applying Turing.jl’s suite of algorithms including gradient-based methods like the No U-Turn Sampler (NUTS) [48]. These are standard Turing.jl models with no special constraints beyond those imposed by the Turing.jl framework itself, supporting all DynamicPPL.jl [59] operations such as parameter fixing, model conditioning, and posterior predictive sampling. Generated components can be used directly as standalone models or nested within other Turing.jl models using the @submodel macro, enabling incremental adoption where only parts of a model use the compositional framework whilst other parts use custom Turing.jl.
Many computational components are backend-agnostic, containing no probabilistic programming constructs and therefore portable to other frameworks. For example, we use a common pattern, accumulate_scan, which builds on the base accumulate function to model iterative temporal processes through step functions. These step functions are structs built on the AbstractAccumulationStep interface that implement a callable (a struct that can be invoked like a function) defining the single-step update rule. Mathematical utilities such as functions for converting between reproduction numbers and growth rates, discretising continuous delay distributions, and reparameterising observation error models are similarly backend-agnostic. This separation enables a package extension pattern where standard Julia packages provide domain-specific functionality (e.g. CensoredDistributions.jl [60]) whilst the compositional layer is added as an optional extension, allowing users to adopt the framework incrementally without requiring their entire workflow to commit to the compositional approach.
Returning to our example from the DSL section, the autoregressive process can be mapped from its high-level representation to a Turing.jl model using the generate_latent function and multiple dispatch, which generates the following Turing.jl model:
@model function EpiAwareBase.generate_latent(latent_model::AR, n)
p = latent_model.p
@assert n>p "n must be longer than order of the autoregressive process"
ar_init ~ latent_model.init_prior
damp_AR ~ latent_model.damp_prior
@submodel ϵ_t = generate_latent(latent_model.ϵ_t, n - p)
ar = accumulate_scan(ARStep(damp_AR), ar_init, ϵ_t)
return ar
endThe key line @submodel ε_t = generate_latent(latent_model.ε_t, n - p) enables composition by delegating to whatever error model was provided (here, the MA(1) component). The AR dynamics are implemented through a custom accumulation step that maintains the autoregressive state.
function (ar::ARStep)(state, ϵ)
new_val = dot(ar.damp_AR, state) + ϵ
new_state = vcat(state[2:end], new_val)
return new_state
endThis step function works with accumulate_scan to build the AR series by applying the autoregressive equation at each time step. The MA model has a similar structure with its own internal step function. The accumulate_scan pattern enables composable iteration steps, allowing complex processes like renewal models with susceptible depletion to be built by composing simple step operations. This design means we only need to write the single-step operation without worrying about the iteration process, making components more modular and reusable.
The full ARIMA(2,1,1) model we defined in the DSL section can then be generated using the same approach, producing the following Turing.jl model:
@model function EpiAwareBase.generate_latent(latent_model::DiffLatentModel, n)
d = latent_model.d
@assert n>d "n must be longer than d"
latent_init ~ latent_model.init_prior
@submodel diff_latent = generate_latent(latent_model.model, n - d)
return _combine_diff(latent_init, diff_latent, d)
endThe DiffLatentModel’s @submodel diff_latent = generate_latent(latent_model.model, n - d) calls the ARMA model, which in turn calls its composed AR and MA components, then applies differencing through cumulative summation. This recursion through @submodel enables arbitrary composition while maintaining separation between components.
To demonstrate fitting, we combine our ARIMA(2,1,1) with a log-parameterised Poisson observation model, modelling the growth rate using our ARIMA process. From here we use Turing.jl functionality directly to illustrate the interoperability between our DSL and the underlying PPL.
using DynamicPPL, Turing
@model function arima_with_obs(arima_spec, n_timesteps)
@submodel Z_t = generate_latent(arima_spec, n_timesteps)
y_obs ~ product_distribution(LogPoisson.(Z_t))
return y_obs
endWe can then generate synthetic data with fixed parameters which we sample from the prior distribution using the rand, and fix functions and calling the model to simulate the observations. We first define the model.
n_timesteps = 40
gen_model = arima_with_obs(arima211, n_timesteps)Then sample from it,
simulated_params = rand(gen_model)Now we have parameters we can simulated some data using the generative model by fixing the random variables using the sampled parameters and the fix function. We can then call it, like any normal function, to get simulated observations for y.
fixed_model = fix(gen_model, simulated_params)
y_observed = fixed_model()For inference, we condition on the generative model using simulated observations and the condition function or here the equivalent | notation. Now we have a model conditioned on data we can fit it using our choice of approach supported by Turing.jl. Here we decide to use the No-U-Turn Sampler (NUTS) [48], a popular variant of MCMC.
conditioned_model = gen_model | (; y_obs = y_observed)
chains = sample(conditioned_model, NUTS(), MCMCThreads(), 2000, 4)We can then compare our posterior distributions to the true sampled values from our ARIMA(2, 1, 1) model using PairPlots.jl [61] with the CairoMakie.jl [62] backend for visualisation (Figure 3 E). The posterior distributions recover the simulated parameter values.
Case studies
We demonstrate our prototype compositional modelling DSL by recreating three published epidemiological analyses (Figure 2): “On the derivation of the renewal equation from an age-dependent branching process” [44], “EpiNow2: Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters” [63], and “Contemporary statistical inference for infectious disease models using Stan” [45]. Figure 2 highlights component reuse across the case studies: Mishra et al. reuses the AR(2) component from the ARIMA example, the EpiNow2 replication reuses the full ARIMA structure, and Chatzilena et al. demonstrates an alternative Susceptible-Infected-Recovered (SIR) infection process.
We provide an overview of each case study in the following subsections, with full implementation details in the Supplementary Information. The first two case studies have also been replicated using the EpiAwareR R interface [54], with R implementations available in the package vignettes, demonstrating cross-ecosystem accessibility. All code and data for reproducing the analyses are available [64].
On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective
Mishra et al. [44] estimate time-varying effective reproduction numbers (\(R_t\), the average number of secondary infections caused by an infected individual at time \(t\)) from case data by combining the renewal equation with a negative binomial observation model.
Our replication composes an AR(2) latent process for log \(R_t\) (Figure 2), a renewal infection process [65] using a discretised serial interval distribution [66], and a negative binomial observation model for case counts. We use AR to generate the latent log \(R_t\) trajectory, Renewal to compute expected infections via the renewal equation \(I_t = R_t \sum_{s} g_s I_{t-s}\), and NegativeBinomialError to link expected infections to observed cases with overdispersion. We assemble these components and use Pathfinder [67] to initialise NUTS sampling.
Figure 4 shows model components and posterior analysis. Panels A-C demonstrate prior predictive checks for each component in isolation: the AR(2) latent process for log \(R_t\), the renewal model for latent infections, and the negative binomial observation model. Panel D compares the continuous serial interval distribution with its discretised form. Panels E-F show posterior predictive distributions for daily cases and time-varying \(R_t\), recovering the main finding that \(R_t\) in South Korea peaked at approximately 10 before rapidly dropping below 1 in early March 2020.
EpiNow2: Estimate real-time case counts and time-varying epidemiological parameters
EpiNow2 [63] is a widely used package for real-time situational awareness in infectious disease surveillance, estimating case counts and epidemiological parameters whilst accounting for reporting delays, right-truncation, and day-of-week effects.
Our replication reuses the ARIMA(2,1,1) and negative binomial components from the main text and Section 4.1, extending them with piecewise constant weekly \(R_t\) values, incubation period and reporting delay convolutions [68], and day-of-week reporting effects (Figure 2). Unlike Section 4.1 which uses the serial interval, this case study uses the generation time distribution; whilst serial intervals are often used as a proxy, this can be problematic [69] and is primarily done when generation time estimates are unavailable [70]. We use broadcast_weekly to convert the ARIMA(2,1,1) process to piecewise constant weekly \(R_t\) values, LatentDelay to convolve latent infections with incubation period and reporting delay distributions, and ascertainment_dayofweek to multiply expected observations by day-of-week effects via a softmax transformation.
Figure 5 shows model components and posterior analysis. Panels A-C show prior predictive checks for the piecewise constant weekly ARIMA process, the renewal model, and the composite observation model with delays and day-of-week effects. Panel D displays the incubation period and reporting delay distributions. Panels E-F show posterior predictive distributions, demonstrating that the model explicitly accounts for reporting delays whilst recovering similar \(R_t\) dynamics to Section 4.1.
Contemporary statistical inference for infectious disease models using Stan
Chatzilena et al. [45] demonstrate Bayesian inference for compartmental disease models using Stan, fitting an SIR model to single strain influenza outbreak data with both a simple Poisson observation model and a stochastic model with time-varying ascertainment to account for model misspecification and observation error.
Our replication introduces the SIR model as an alternative infection generating process to the renewal model used in the previous case studies (Figure 2). We compose an SIR compartmental model using ODEProcess from the SciML ecosystem [71], a Poisson observation model, and a stochastic observation model using Ascertainment for time-varying effects parameterised as an AR(1) process. We use a softplus transformation to ensure numerical stability when the ODE solver returns small negative values.
Figure 6 shows model components and posterior analysis. Panels A-B demonstrate prior predictive checks for the SIR dynamics and stochastic observation model including time-varying ascertainment. Panel C compares posterior \(R_0\) distributions, with both models recovering \(R_0 \approx 2\), consistent with Chatzilena et al. Panels D-F show posterior compartment trajectories and predictive cases. The deterministic model posterior predictive trajectories (Figure 6 E) appear less well calibrated than those from the stochastic model (Figure 6 F), with several data points falling outside the 95% prediction envelopes. This indicates that the flexible observation model compensates for potential misspecification in the deterministic model.
Discussion
In this paper we have identified barriers in epidemiological modelling arising from the tension between statistical rigour and the flexibility needed for cross-domain collaboration and timely policy-relevant evidence, proposed compositional infectious disease models as a solution, laid out a set of requirements for such frameworks, and showcased a proof of concept approach that meets these requirements. Our proof of concept consists of a domain specific language (DSL) built on top of the Turing.jl probabilistic programming framework, enabling compositional model construction through reusable components. This approach enables building block style model construction, maintaining the statistical rigour of joint models whilst providing the flexibility of chaining together simpler models. The autoregressive model example illustrated how more complex models emerge from simple component combinations using our proposed struct-in-struct pattern (where structures, data containers that group related variables and their types together, contain other structures as fields). Our three case studies (Figure 2), which were based on previous studies [21,44,45], demonstrate how these latent process components composed with infection and observation models can address real-world epidemiological problems across a range of modelling approaches.
A limitation of our proposal is that the requirements were developed without broader community input. However, they were based on our experience building infectious disease modelling frameworks and applied modelling. A key strength of our proposed approach is its modular design, which enables faster model development and component reuse whilst allowing comparison of modelling assumptions without significant reimplementation. For instance, adding a new data source such as wastewater surveillance to an existing case-based model requires only defining a wastewater submodel with its own latent and observation process, avoiding the need to reimplement existing components. This modularity also enables interdisciplinary collaboration, as domain experts such as virologists, clinicians, or environmental scientists can contribute specialised components encoding their knowledge without needing to review other components or understand the programming details. This design supports routine forecasting, where validated components can be refined across seasons for established forecasting and monitoring efforts [72–74]. It also supports outbreak response, where analysts could rapidly adapt existing models to novel pathogens by swapping infection processes whilst retaining validated observation components. For multi-model efforts, a diversity of approaches often improves robustness of findings (e.g., [4,75]), but the lack of common components risks masking differences in model assumptions from differences in implementation, preventing attribution of different results to their underlying causes [76]. However, our proposed approach is a proof of concept and not designed for ongoing use or widespread adoption. We only partially implemented support for automated mappings between latent and observed states, partial pooling approaches, and the current component library remains limited to relatively basic epidemiological patterns. The DSL layer manipulation tooling also cannot coordinate updates of related parameters such as model order and corresponding priors without manual intervention. Whilst our proof of concept has these limitations, the three case studies establish viability for real-world epidemiological problems.
A final limitation of our approach is the learning curve required to adopt this compositional pattern, though this should be mitigated as researchers can embed elements within their existing models and implement their own compositional elements with minimal understanding of the overall architecture.
Key strengths of building on Turing.jl [39] include mature submodel support for nesting models within models, extensive inference algorithm choices, and integration with the wider Julia ecosystem through Distributions.jl [58] and other packages. However, building on an evolving probabilistic programming language involves practical considerations. The Turing.jl PPL is not fully stable and since developing our DSL, the @submodel macro has changed syntax with breaking changes in how it handles prefixes and variables within submodels, meaning our current implementation is not compatible with the latest release. Turing.jl also has limited handling of numerical instability, so large counts being simulated can cause errors, and vectors that combine missing values with other values, which require manual handling. Profiling tools can be difficult to use because the probabilistic programming abstractions and macro expansions obscure the relationship between source code and runtime behaviour, making it challenging to identify performance bottlenecks. That said, recent Turing.jl changes may benefit composability: the reduced difference between distributions and submodels could enable more flexible model construction, and new conditioning workflows avoid passing observations through model layers.
Implementing in Julia [37] provides access to the full features of a programming language rather than a restricted modelling syntax, multiple dispatch for clean component composition, and the mature SciML ecosystem [52] for differential equations and scientific computing. A key strength of the Julia ecosystem for compositional work is that domain experts develop specialised tools that interoperate through shared interfaces. However, this distributed development also creates challenges as it can be difficult to locate appropriate tools, and when issues emerge users may not know whether problems originate in Turing.jl, our DSL layer, or elsewhere in the ecosystem. Computational overhead from abstraction layers is also a concern, though new automatic differentiation backends like Enzyme [77] and Mooncake [78] could reduce costs in future implementations. Whilst our approach requires implementation in Julia, which is not currently common in epidemiology, the EpiAwareR R interface [54] demonstrates that such frameworks can bridge software ecosystems, and similar interfaces could be built for Python using PythonCall.jl [79].
Alternative approaches to compositional modelling exist in the literature. Category theory [80] provides formal mathematical foundations for composability, enabling complex models to be built from smaller building blocks, with the composed model being correct by construction. The AlgebraicJulia ecosystem demonstrates how category theory can be operationalised in software, with AlgebraicPetri.jl [81] and AlgebraicDynamics.jl [82] implementing operadic composition for hierarchical model construction. Symbolic-numeric frameworks such as ModelingToolkit.jl [83] and Catalyst.jl [84] from the SciML ecosystem [52] offer acausal component-based models with performance improvements through automated parallelisation and equation simplification, and link to AlgebraicJulia through shared reaction network representations [85]. In comparison, our approach currently lacks the formal compositional guarantees of category theory and automated optimisations of symbolic-numeric frameworks. However, current AlgebraicJulia tools focus on composing dynamical systems and provide limited direct support for probabilistic modelling; in contrast, our approach targets the full range of probabilistic models used in infectious disease inference. Although category theory could in principle provide guardrails for composing probabilistic models, the software does not yet support this. More complex compartmental models could be composed from reusable reaction components using AlgebraicPetri.jl or Catalyst.jl, with our approach managing the probabilistic aspects such as priors and observation models.
In comparison to our proposed compositional DSL approach, domain-specific infectious disease tools currently provide useful features but are generally designed for a limited number of use cases and do not cover the full range of epidemiological modelling approaches needed in practice [20–22,24]. Of the currently available tools, agent-based modelling approaches show the most promise for compositional modelling [86,87]. However, ABM calibration remains challenging, though differentiable ABMs may improve this [88], and there are settings where ABM approach may not be appropriate. A key aspect of our proposal is being backend agnostic: our DSL could support agent-based backends [89], and could also support hybrid approaches [90]. Generic modelling frameworks can be easier to generalise but difficult to adapt to include domain specific elements [91]. Probabilistic programming languages (PPLs) commonly used in infectious disease modelling also do not meet our proposed composability requirements. Stan is probably the most commonly used, but is optimised for implementing complete models rather than composing submodels [13]. Domain-specific PPLs such as odin/monty and greta trade generality for specialisation, none of the available options fully support nesting models within models, and there is limited support for state-of-the-art inference methods [16,92,93]. Alternative PPLs could support compositional approaches. Gen.jl [94] focuses on lower-level implementation of probabilistic programs compared to Turing.jl, with composition support through nested generative function calls and combinators of generative functions. Despite this, Gen-based tools demonstrate useful functionality for epidemiological modelling, such as AutoGP.jl [95], which implements real-time inference and automated composition of Gaussian process kernels. Genify [96] provides an approach that translates Julia code into Gen models, potentially easing adoption. Python-based probabilistic programming languages such as NumPyro [97] and Oryx [98] built on JAX [99] could enable similar composability [101], with potential advantages from JAX’s focus on efficiency and GPU scaling. However, the smaller JAX-specific ecosystem compared to general Python packages, JAX’s optimisation for neural network tasks, and Python PPLs’ more programmer-oriented syntax that diverges from mathematical notation may create barriers for epidemiological modellers. As well as imperative approaches like Turing.jl, where models are written as procedural code, declarative graph-based approaches such as JuliaBUGS.jl [102] and RxInfer.jl [103] specify models through explicit conditional dependencies between variables. This graph-based representation offers several advantages including more transparent model structure and assumptions, and efficient inference through algorithms that leverage the model’s graphical structure. However, the declarative approach trades off the flexibility of Turing.jl’s imperative style for procedural code, such as loops over recursive updates for dynamical systems, which may limit the range of epidemiological models that can be composed. Building a domain-specific backend on Distributions.jl rather than Turing.jl could enable compatibility with multiple PPLs including JuliaBUGS.jl whilst trading off Turing.jl’s expressive submodel interface.
Areas for future work on our approach include gathering community feedback on the proposed requirements, and expanding the component library to address epidemiological applications across multiple scales and data types. More work is also needed to allow for modifying deeply nested model specifications without the user needing to be aware of the structure of the nested model. Exploring the alternative approaches discussed above also warrants further investigation. This includes integrating our approach with AlgebraicJulia tools for composing dynamical systems, extending category theory based software to support probabilistic models, using symbolic-numeric frameworks for automated optimisation, and exploring alternative PPL backends. Methodological advances are also needed for joint estimation of interdependent epidemiological parameters and integration of individual and population-level observations. Further developing and integrating modular Bayesian inference methods, including Markov melding [42] for combining submodels fitted separately and cut likelihoods [104] for controlling feedback between modules, could enable methodological sharing that is currently difficult. Better understanding where diversity in modelling assumptions is important in multi-model efforts, and separating such differences from implementation details, also requires further study. Future surveillance programmes could also be designed with composable modelling frameworks in mind, enabling estimates and model components to be shared and reused from initial data collection through to policy-relevant analyses. Compositional frameworks also create opportunities for large language model integration. Language models could serve as model construction agents, using a composable framework to construct epidemiological models from component libraries [51]. The explicit structure and validation tools of a composable framework should enable language models to reason about model design and propose structural adaptations more easily and with less room for error. Further work is needed to assess this.
Ecosystem developments would also support composable modelling and could reduce the need for domain-specific abstractions. Turing.jl could better support composability through improvements to submodel handling, for example by returning random variables from nested submodels without requiring explicit return statements, enabling flexible conditioning on submodels, and simplifying post-processing when variable names change due to prefixing. Other ecosystem improvements such as better numerical stability handling, debugging tools that preserve probabilistic model semantics, and generalised inference chaining at the PPL level would also benefit composable approaches. Improved tooling for the Bayesian workflow [18,19] would also strengthen the case for composable modelling in Julia. Improving inference method efficiency through GPU support and within-thread parallelisation would enable scaling composable models to larger problems. One sensible development path would be to build a Turing.jl based composable version of the Bayesian Regression modelling package [28], an R package for flexible Bayesian regression modelling, which could provide a non-domain-specific testbed for composable components as well as being directly usable with our domain specific approach. Finally, ongoing work on compiling Julia applications to standalone executables could enable composable interfaces in other languages with fewer dependencies.
Composable infectious disease modelling offers a path to maintain statistical rigour whilst enabling the rapid model development needed to inform policy decisions. By enabling domain experts to contribute specialised components without understanding entire modelling frameworks, such approaches could accelerate integration of diverse expertise for both routine surveillance and outbreak response. Composable frameworks are also likely key for enabling robust large language model assisted model construction, where explicit component structure and validation tools could reduce errors. The proof of concept presented here indicates that this approach is feasible but substantial work remains to realise this potential. Given the unpredictable nature of future infectious disease threats, investment in adaptable modelling infrastructure that can incorporate diverse data sources and domain knowledge to provide timely evidence for policy is critical.
References
Acknowledgements
We thank the epiforecasts team for helpful comments and suggestions. We thank Poppy for growling as needed.
Data and software availability
All code and data for reproducing the analyses are available [64].
Funding information
We acknowledge the financial support from CDC Grant NU38FT00008 (K.J., S.A, S.F.). This project was made possible by cooperative agreement CDC-RFA-FT-23-0069 from the CDC’s Center for Forecasting and Outbreak Analytics. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention. We acknowledge the financial support from Wellcome 210758/Z/18/Z (S.F.).
AI disclosure
Claude Sonnet 4.5 (https://www.anthropic.com/claude/sonnet) and Claude Opus 4.5 (https://www.anthropic.com/claude/opus) were used to assist with preparation of the manuscript text and as part of code review for the EpiAware and EpiAwareR packages and the analysis applying them to the case studies presented here.