Introduction

This vignette demonstrates how to replicate the analysis from Mishra et al. (2020) of COVID-19 transmission dynamics in South Korea using EpiAwareR. This case study showcases the compositional modelling approach by building a complete epidemiological model from three reusable components.

The analysis estimates the time-varying reproduction number (RtR_t) using:

  • AR(2) latent process for smooth evolution of log RtR_t
  • Renewal equation for infection dynamics
  • Negative binomial observation model for overdispersed case counts

Data Preparation

For this vignette, we simulate data similar to the South Korea COVID-19 outbreak analyzed in Mishra et al. (2020):

# Simulated daily case counts
set.seed(123)
dates <- seq.Date(as.Date("2020-02-15"), by = "day", length.out = 36)

# Simulate epidemic with changing Rt (intervention effect)
infections <- numeric(36)
infections[1:7] <- 50
for (i in 8:36) {
  rt <- if (i < 20) 2.5 else 0.8 # Intervention reduces transmission
  infections[i] <- rpois(1, rt * mean(infections[(i - 7):(i - 1)]))
}

training_data <- data.frame(
  date = dates,
  y_t = pmax(infections, 1) # Ensure positive counts
)

head(training_data)
#>         date y_t
#> 1 2020-02-15  50
#> 2 2020-02-16  50
#> 3 2020-02-17  50
#> 4 2020-02-18  50
#> 5 2020-02-19  50
#> 6 2020-02-20  50

Model Components

1. Latent Process: AR(2) Model

The AR(2) process models the evolution of log RtR_t over time with temporal autocorrelation:

logRt=ϕ1logRt1+ϕ2logRt2+ϵt\log R_t = \phi_1 \log R_{t-1} + \phi_2 \log R_{t-2} + \epsilon_t

where ϵtNormal(0,σ2)\epsilon_t \sim \text{Normal}(0, \sigma^2).

ar2 <- AR(
  order = 2,
  damp_priors = list(
    truncnorm(0.2, 0.2, 0, 1), # φ₁: First damping coefficient
    truncnorm(0.1, 0.05, 0, 1) # φ₂: Second damping coefficient
  ),
  init_priors = list(
    norm(0, 0.2), # Initial value for lag 1
    norm(0, 0.2) # Initial value for lag 2
  ),
  std_prior = halfnorm(0.1) # σ: Innovation standard deviation
)

print(ar2)
#> <EpiAware AR(2) Latent Model>
#>   Damping priors: 2 
#>   Init priors: 2 
#>   Innovation std prior: specified

Prior choices (from Mishra et al. 2020):

  • Damping coefficients truncated to [0,1] ensure stationarity
  • Weak priors on initial values (centered at 0 on log scale → Rt1R_t \approx 1)
  • Small innovation variance for smooth trajectories

2. Infection Model: Renewal Equation

The renewal equation models new infections based on past infections and generation time:

It=Rts=1tItsgsI_t = R_t \sum_{s=1}^{t} I_{t-s} \cdot g_s

where gsg_s is the discretized generation time distribution.

renewal <- Renewal(
  gen_distribution = gamma_dist(6.5, 0.62), # Shape and scale parameters
  initialisation_prior = norm(log(1.0), 0.1) # Prior on initial log infections
)

print(renewal)
#> <EpiAware Renewal Infection Model>
#>   Generation distribution: Gamma 
#>   Initialisation prior: specified

Key parameters:

  • Generation time: Mean ~6.5 days (typical for COVID-19 in early 2020)
  • Gamma distribution allows flexible shape
  • Initial infections seeded near 1 (on natural scale)

3. Observation Model: Negative Binomial

Links latent infections to observed case counts with overdispersion:

YtNegBinom(μ=It,ϕ)Y_t \sim \text{NegBinom}(\mu = I_t, \phi)

where ϕ\phi controls overdispersion (clustering).

negbin <- NegativeBinomialError(
  cluster_factor_prior = halfnorm(0.1)
)

print(negbin)
#> <EpiAware Negative Binomial Observation Model>
#>   Cluster factor prior: truncated(Normal(0, sd), 0, Inf)

Parameterization:

  • Cluster factor = 1/ϕ\sqrt{1/\phi} (coefficient of variation)
  • Half-normal prior encourages moderate overdispersion
  • More interpretable than directly specifying ϕ\phi

Compose and Fit

Create Complete Model

model <- EpiProblem(
  epi_model = renewal,
  latent_model = ar2,
  observation_model = negbin,
  tspan = c(1, 36) # Time span matching our data
)

print(model)
#> <EpiAware Epidemiological Model>
#>   Time span: 1 to 36 
#>   Components:
#>     - Infection model: epiaware_renewal 
#>     - Latent model: epiaware_ar 
#>     - Observation model: epiaware_negbin

The EpiProblem combines components into a joint Bayesian model with automatic validation.

Bayesian Inference

We use NUTS (No-U-Turn Sampler) for posterior inference:

results <- fit(
  model = model,
  data = training_data,
  method = nuts_sampler(
    warmup = 1000, # Adaptation iterations
    draws = 1000, # Posterior samples per chain
    chains = 4 # Independent chains for convergence assessment
  )
)

Note: Fitting typically takes 2-5 minutes on modern hardware.

Results

Convergence Diagnostics

print(results)

Check for:

  • Rhat < 1.1: Chains have converged to the same distribution
  • ESS > 100: Sufficient effective sample size for inference
  • No divergent transitions: NUTS is exploring the posterior efficiently

Parameter Summaries

# Detailed posterior summaries
summary(results)

Key parameters to examine:

  • AR damping coefficients (ϕ1\phi_1, ϕ2\phi_2): Autocorrelation strength
  • Innovation std (σ\sigma): Variability in RtR_t changes
  • Cluster factor: Degree of case count overdispersion
  • Initial infections: Epidemic seeding

Visualization

# Time-varying reproduction number with credible intervals
plot(results, type = "Rt")

# Observed vs predicted case counts
plot(results, type = "cases")

# Posterior distributions for key parameters
plot(results, type = "posterior")

Model Comparisons

Comparing Latent Processes

Test sensitivity to AR order:

# AR(1) alternative
ar1 <- AR(
  order = 1,
  damp_priors = list(truncnorm(0.5, 0.2, 0, 1)),
  init_priors = list(norm(0, 0.2)),
  std_prior = halfnorm(0.1)
)

model_ar1 <- EpiProblem(
  epi_model = renewal,
  latent_model = ar1,
  observation_model = negbin,
  tspan = c(1, 36)
)

results_ar1 <- fit(model_ar1, data = training_data)

# Compare using LOO (leave-one-out cross-validation)
library(loo)
loo_compare(loo(results$samples), loo(results_ar1$samples))

Adding Observation Delays

Account for incubation and reporting delays:

# Incubation period (~5 days)
obs_incubation <- LatentDelay(
  model = negbin,
  delay_distribution = lognorm(1.6, 0.42)
)

# Reporting delay (~2 days)
obs_full <- LatentDelay(
  model = obs_incubation,
  delay_distribution = lognorm(0.58, 0.47)
)

model_delays <- EpiProblem(
  epi_model = renewal,
  latent_model = ar2,
  observation_model = obs_full,
  tspan = c(1, 36)
)

results_delays <- fit(model_delays, data = training_data)

Interpretation

The Mishra et al. (2020) analysis demonstrated:

  1. Flexible RtR_t estimation: AR(2) captures both smooth trends and rapid changes
  2. Uncertainty quantification: Bayesian credible intervals reflect parameter uncertainty
  3. Intervention detection: Sharp decline in RtR_t coincides with public health measures
  4. Compositional flexibility: Easy to test alternative assumptions (AR order, delays)

Key Findings

  • Initial Rt2.5R_t \approx 2.5: Rapid exponential growth phase
  • Post-intervention Rt<1R_t < 1: Control achieved through distancing measures
  • Moderate overdispersion: Case counts show clustering but not extreme variance
  • Good model fit: Posterior predictive checks show data within credible intervals

Extensions

Try modifying the analysis:

  1. Different priors: Test sensitivity to prior choices
  2. Alternative latent models: MA, random walk, spline models
  3. Stratification: Separate models by age group or region
  4. Forecast evaluation: Hold-out validation of predictive performance

References

Mishra, S., Berah, T., Mellan, T. A., et al. (2020). On the derivation of the renewal equation from an age-dependent branching process: an epidemic modelling perspective. arXiv preprint arXiv:2006.16487.

Session Info

sessionInfo()
#> R version 4.5.2 (2025-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8          LC_NUMERIC=C             
#>  [3] LC_TIME=C.UTF-8           LC_COLLATE=C.UTF-8       
#>  [5] LC_MONETARY=C.UTF-8       LC_MESSAGES=C.UTF-8      
#>  [7] LC_PAPER=C.UTF-8          LC_NAME=C.UTF-8          
#>  [9] LC_ADDRESS=C.UTF-8        LC_TELEPHONE=C.UTF-8     
#> [11] LC_MEASUREMENT=C.UTF-8    LC_IDENTIFICATION=C.UTF-8
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> other attached packages:
#> [1] EpiAwareR_0.1.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5         knitr_1.50        rlang_1.1.6       xfun_0.54        
#>  [5] renv_1.1.5        textshaping_1.0.4 jsonlite_2.0.0    backports_1.5.0  
#>  [9] htmltools_0.5.9   ragg_1.5.0        sass_0.4.10       rmarkdown_2.30   
#> [13] evaluate_1.0.5    jquerylib_0.1.4   fastmap_1.2.0     yaml_2.3.11      
#> [17] lifecycle_1.0.4   compiler_4.5.2    fs_1.6.6          Rcpp_1.1.0       
#> [21] JuliaCall_0.17.6  systemfonts_1.3.1 digest_0.6.39     R6_2.6.1         
#> [25] bslib_0.9.0       checkmate_2.3.3   tools_4.5.2       pkgdown_2.2.0    
#> [29] cachem_1.1.0      desc_1.4.3