Description Usage Arguments Value Generation of Continuous Non-Mixture and Mixture Variables Choice of Fleishman's third-order or Headrick's fifth-order method Reasons for Function Errors References See Also Examples

This function extends the techniques of Headrick and Beasley (2004, doi: 10.1081/SAC-120028431) to
create correlated systems of statistical equations containing continuous variables with normal,
non-normal, or mixture distributions. The method allows the user to control the distributions
for the stochastic disturbance (error) terms *E* and independent variables *X*. The user specifies the correlation structure
between *X* terms within an outcome and across outcomes. For a given equation, the user also specifies the correlation between
the outcome *Y* and *X* terms. These correlations are used to calculate the beta (slope) coefficients for the equations with
`calc_betas`

. If the system contains mixture variables and `corr.yx`

is specified in terms of non-mixture
and mixture variables, the `betas`

will be calculated in terms of non-mixture and mixture independent variables. If `corr.yx`

Finally, the user specifies the correlations across error terms. The assumptions are that
1) there are at least 2 equations and a total of at least 1 independent variable, 2) the independent variables are uncorrelated
with the error terms, 3) each equation has an error term,
and 4) all error terms have either a non-mixture or mixture distribution. The outcomes *Y* are calculated as the *E* terms
added to the products of the beta coefficients and the *X* terms. There are no
interactions between independent variables or distinction between subject and group-level terms (as in the hierarchical linear models approach
of `corrsys`

or `corrsys2`

). However, the user does not have to provide the beta coefficients
(except for the intercepts). Since the equations do not include random slopes (i.e. for the *X* terms), the effects of the independent
variables are all considered "fixed." However, a random intercept or a "time" effect with a random slope could be added by specifying
them as independent variables. There are no parameter input checks in order to decrease simulation time. All inputs should be checked
prior to simulation with `checkpar`

. Summaries of the simulation results can be found with `summary_sys`

.
The functions `calc_corr_y`

, `calc_corr_yx`

, and `calc_corr_ye`

use equations
from Headrick and Beasley (2004) to calculate the expected correlations for the outcomes, among a given outcome and covariates from
the other outcomes, and for the error terms. The vignette **Theory and Equations for Correlated Systems of Continuous Variables**
gives the equations, and the vignette **Correlated Systems of Statistical Equations with Non-Mixture and Mixture Continuous
Variables** gives examples. There are also vignettes in `SimCorrMix`

which provide more details on continuous
non-mixture and mixture variables.

1 2 3 4 5 6 7 8 9 10 | ```
nonnormsys(n = 10000, M = NULL, method = c("Fleishman", "Polynomial"),
error_type = c("non_mix", "mix"), means = list(), vars = list(),
skews = list(), skurts = list(), fifths = list(), sixths = list(),
Six = list(), mix_pis = list(), mix_mus = list(), mix_sigmas = list(),
mix_skews = list(), mix_skurts = list(), mix_fifths = list(),
mix_sixths = list(), mix_Six = list(), same.var = NULL,
betas.0 = NULL, corr.x = list(), corr.yx = list(), corr.e = NULL,
seed = 1234, use.nearPD = TRUE, eigmin = 0, adjgrad = FALSE,
B1 = NULL, tau = 0.5, tol = 0.1, steps = 100, msteps = 10,
errorloop = FALSE, epsilon = 0.001, maxit = 1000, quiet = FALSE)
``` |

`n` |
the sample size (i.e. the length of each simulated variable; default = 10000) |

`M` |
the number of dependent variables |

`method` |
the PMT method used to generate all continuous variables, including independent variables (covariates) and error terms; "Fleishman" uses Fleishman's third-order polynomial transformation and "Polynomial" uses Headrick's fifth-order transformation |

`error_type` |
"non_mix" if all error terms have continuous non-mixture distributions, "mix" if all error terms have continuous mixture distributions |

`means` |
a list of length |

`vars` |
a list of length |

`skews` |
a list of length |

`skurts` |
a list of length |

`fifths` |
a list of length |

`sixths` |
a list of length |

`Six` |
a list of length |

`mix_pis` |
a list of length |

`mix_mus` |
a list of length |

`mix_sigmas` |
a list of length |

`mix_skews` |
a list of length |

`mix_skurts` |
a list of length |

`mix_fifths` |
a list of length |

`mix_sixths` |
a list of length |

`mix_Six` |
a list of length |

`same.var` |
either a vector or a matrix; if a vector, |

`betas.0` |
vector of length |

`corr.x` |
list of length |

`corr.yx` |
a list of length |

`corr.e` |
correlation matrix for continuous non-mixture or components of mixture error terms |

`seed` |
the seed value for random number generation (default = 1234) |

`use.nearPD` |
TRUE to convert the overall intermediate correlation matrix formed by the |

`eigmin` |
minimum replacement eigenvalue if overall intermediate correlation matrix is not positive-definite (default = 0) |

`adjgrad` |
TRUE to use |

`B1` |
the initial matrix for algorithm; if NULL, uses a scaled initial matrix with diagonal elements |

`tau` |
parameter used to calculate theta (default = 0.5) |

`tol` |
maximum error for Frobenius norm distance between new matrix and original matrix (default = 0.1) |

`steps` |
maximum number of steps for k (default = 100) |

`msteps` |
maximum number of steps for m (default = 10) |

`errorloop` |
if TRUE, uses |

`epsilon` |
the maximum acceptable error between the final and target correlation matrices (default = 0.001) in the error loop |

`maxit` |
the maximum number of iterations to use (default = 1000) in the error loop |

`quiet` |
if FALSE prints messages, if TRUE suppresses messages |

A list with the following components:

`Y`

matrix with `n`

rows and `M`

columns of outcomes

`X`

list of length `M`

containing *X_{cont(pj)}, X_{comp(pj)}*

`X_all`

list of length `M`

containing *X_{cont(pj)}, X_{mix(pj)}*

`E`

matrix with `n`

rows containing continuous non-mixture or components of continuous mixture error terms

`E_mix`

matrix with `n`

rows containing continuous mixture error terms

`betas`

a matrix of the slope coefficients calculated with `calc_betas`

, rows represent the
outcomes

`constants`

a list of length `M`

with data.frames of the constants for the *X_{cont(pj)}*,
*X_comp(pj)* and *E_p*

`SixCorr`

a list of length `M`

of lists of sixth cumulant correction values used to obtain valid
*pdf*'s for the *X_{cont(pj)}*, *X_comp(pj)*, and *E_p*

`valid.pdf`

a list of length `M`

of vectors where the i-th element is "TRUE" if the constants for the i-th
continuous variable generate a valid pdf, else "FALSE"

`Sigma_X0`

matrix of intermediate correlations calculated by `intercorr`

`Sigma_X`

matrix of intermediate correlations after `nearPD`

or `adj_grad`

function has been used;
applied to generate the normal variables transformed to get the desired distributions

`Error_Time`

the time in minutes required to use the error loop

`Time`

the total simulation time in minutes

`niter`

a matrix of the number of iterations used in the error loop

Mixture distributions describe random variables that are drawn from more than one component distribution. For a random variable
*X_{mix}* from a finite continuous mixture distribution with *k* components, the probability density function (PDF) can be
described by:

*h_X(x) = ∑_{i=1}^{k} π_i f_{X_{comp_i}}(x), ∑_{i=1}^{k} π_i = 1.*

The *π_i* are mixing parameters which determine the weight of each component distribution *f_{X_{comp_i}}(x)* in the overall
probability distribution. As long as each component has a valid PDF, the overall distribution *h_X()* has a valid PDF.
The main assumption is statistical independence between the process of randomly selecting the component distribution and the
distributions themselves. Simulation is done at the component-level for mixture variables.

All continuous variables are simulated using either Fleishman's third-order (`method`

= "Fleishman",
doi: 10.1007/BF02293811) or Headrick's fifth-order (`method`

= "Polynomial", doi: 10.1016/S0167-9473(02)00072-5)
power method transformation (PMT). It works by matching standardized cumulants – the first four (mean,
variance, skew, and standardized kurtosis) for Fleishman's method, or the first six (mean, variance,
skew, standardized kurtosis, and standardized fifth and sixth cumulants) for Headrick's method. The transformation is expressed
as follows:

*Y = c_{0} + c_{1} * Z + c_{2} * Z^2 + c_{3} * Z^3 + c_{4} * Z^4 + c_{5} * Z^5, Z \sim N(0,1),*

where *c_{4}* and *c_{5}* both equal *0* for Fleishman's method. The real constants are calculated by
`find_constants`

for non-mixture and components of mixture variables. Continuous mixture
variables are generated componentwise and then transformed to the desired mixture variables using random multinomial variables
generated based on mixing probabilities. The correlation matrices are specified in terms of correlations with components of the
mixture variables.

Using the fifth-order approximation allows additional control over the fifth and sixth moments of the generated distribution, improving
accuracy. In addition, the range of feasible standardized kurtosis values, given skew and standardized fifth (*γ_3*) and sixth
(*γ_4*) cumulants, is larger than with Fleishman's method (see `calc_lower_skurt`

).
For example, the Fleishman method can not be used to generate a non-normal distribution with a ratio of
*γ_3^2/γ_4 > 9/14* (see Headrick & Kowalchuk, 2007). This eliminates the Chi-squared family of distributions, which has
a constant ratio of *γ_3^2/γ_4 = 2/3*. The fifth-order method also generates more distributions with valid PDF's.
However, if the fifth and sixth cumulants are unknown or do not exist, the Fleishman approximation should be used.

1) The most likely cause for function errors is that the parameter inputs are mispecified. Using `checkpar`

prior to simulation can help decrease these errors.

2) No solutions to `fleish`

or
`poly`

converged when using `find_constants`

. If this happens,
the simulation will stop. It may help to first use `find_constants`

for each continuous variable to
determine if a sixth cumulant correction value is needed. If the standardized cumulants are obtained from `calc_theory`

, the user
may need to use rounded values as inputs (i.e. `skews = round(skews, 8)`

). For example, in order to ensure that skew is exactly 0 for symmetric distributions.

3) The kurtosis for a continuous variable may be outside the region of possible values. There is an associated lower kurtosis boundary for
associated with a given skew (for Fleishman's method) or skew and fifth and sixth cumulants (for Headrick's method). Use
`calc_lower_skurt`

to determine the boundary for a given set of cumulants.

4) No solutions to `calc_betas`

converged when trying to find the beta coefficients. Try different correlation
matrices.

Davenport JW, Bezder JC, & Hathaway RJ (1988). Parameter Estimation for Finite Mixture Distributions. Computers & Mathematics with Applications, 15(10):819-28.

Everitt BS (1996). An Introduction to Finite Mixture Distributions. Statistical Methods in Medical Research, 5(2):107-127. doi: 10.1177/096228029600500202.

Fialkowski AC (2017). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.1. https://CRAN.R-project.org/package=SimMultiCorrData.

Fialkowski AC (2018). SimCorrMix: Simulation of Correlated Data of Multiple Variable Types including Continuous and Count Mixture Distributions. R package version 0.1.0. https://github.com/AFialkowski/SimCorrMix

Fleishman AI (1978). A Method for Simulating Non-normal Distributions. Psychometrika, 43:521-532. doi: 10.1007/BF02293811.

Headrick TC (2002). Fast Fifth-order Polynomial Transforms for Generating Univariate and Multivariate Non-normal Distributions. Computational Statistics & Data Analysis, 40(4):685-711. doi: 10.1016/S0167-9473(02)00072-5. (ScienceDirect)

Headrick TC (2004). On Polynomial Transformations for Simulating Multivariate Nonnormal Distributions. Journal of Modern Applied Statistical Methods, 3(1):65-71. doi: 10.22237/jmasm/1083370080.

Headrick TC, Beasley TM (2004). A Method for Simulating Correlated Non-Normal Systems of Linear Statistical Equations. Communications in Statistics - Simulation and Computation, 33(1). doi: 10.1081/SAC-120028431

Headrick TC, Kowalchuk RK (2007). The Power Method Transformation: Its Probability Density Function, Distribution Function, and Its Further Use for Fitting Data. Journal of Statistical Computation and Simulation, 77:229-249. doi: 10.1080/10629360600605065.

Headrick TC, Sawilowsky SS (1999). Simulating Correlated Non-normal Distributions: Extending the Fleishman Power Method. Psychometrika, 64:25-35. doi: 10.1007/BF02294317.

Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using
Mathematica. Journal of Statistical Software, 19(3):1 - 17.

doi: 10.18637/jss.v019.i03.

Higham N (2002). Computing the nearest correlation matrix - a problem from finance; IMA Journal of Numerical Analysis 22:329-343.

McCulloch CE, Searle SR, Neuhaus JM (2008). *Generalized, Linear, and Mixed Models* (2nd ed.). Wiley Series in Probability and
Statistics. Hoboken, New Jersey: John Wiley & Sons, Inc.

Pearson RK (2011). Exploring Data in Engineering, the Sciences, and Medicine. In. New York: Oxford University Press.

Schork NJ, Allison DB, & Thiel B (1996). Mixture Distributions in Human Genetics Research. Statistical Methods in Medical Research, 5:155-178. doi: 10.1177/096228029600500204.

Vale CD & Maurelli VA (1983). Simulating Multivariate Nonnormal Distributions. Psychometrika, 48:465-471. doi: 10.1007/BF02293687.

`calc_betas`

, `calc_corr_y`

, `calc_corr_yx`

,
`calc_corr_ye`

, `checkpar`

, `summary_sys`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | ```
M <- 3
B <- calc_theory("Beta", c(4, 1.5))
skews <- lapply(seq_len(M), function(x) c(0, B[3]))
skurts <- lapply(seq_len(M), function(x) c(0, B[4]))
fifths <- lapply(seq_len(M), function(x) c(0, B[5]))
sixths <- lapply(seq_len(M), function(x) c(0, B[6]))
Six <- lapply(seq_len(M), function(x) list(NULL, 0.03))
corr.e <- matrix(c(1, 0.4, 0.4^2, 0.4, 1, 0.4, 0.4^2, 0.4, 1), M, M,
byrow = TRUE)
means <- lapply(seq_len(M), function(x) c(0, B[1]))
vars <- lapply(seq_len(M), function(x) c(1, B[2]^2))
corr.x <- list(list(matrix(1, 1, 1), matrix(0.4, 1, 1), matrix(0.4, 1, 1)),
list(matrix(0.4, 1, 1), matrix(1, 1, 1), matrix(0.4, 1, 1)),
list(matrix(0.4, 1, 1), matrix(0.4, 1, 1), matrix(1, 1, 1)))
corr.yx <- list(matrix(0.4, 1), matrix(0.5, 1), matrix(0.6, 1))
Sys1 <- nonnormsys(10000, M, "Polynomial", "non_mix", means, vars,
skews, skurts, fifths, sixths, Six, corr.x = corr.x, corr.yx = corr.yx,
corr.e = corr.e)
## Not run:
# Example: system of three equations for 2 independent variables, where each
# error term has unit variance, from Headrick & Beasley (2002)
# Y_1 = beta_10 + beta_11 * X_11 + beta_12 * X_12 + sigma_1 * e_1
# Y_2 = beta_20 + beta_21 * X_21 + beta_22 * X_22 + sigma_2 * e_2
# Y_3 = beta_30 + beta_31 * X_31 + beta_32 * X_32 + sigma_3 * e_3
# X_11 = X_21 = X_31 = Exponential(2)
# X_12 = X_22 = X_32 = Laplace(0, 1)
# e_1 = e_2 = e_3 = Cauchy(0, 1)
seed <- 1234
M <- 3
Stcum1 <- calc_theory("Exponential", 2)
Stcum2 <- calc_theory("Laplace", c(0, 1))
Stcum3 <- c(0, 1, 0, 25, 0, 1500) # taken from paper
means <- lapply(seq_len(M), function(x) c(0, 0, 0))
vars <- lapply(seq_len(M), function(x) c(1, 1, 1))
skews <- lapply(seq_len(M), function(x) c(Stcum1[3], Stcum2[3], Stcum3[3]))
skurts <- lapply(seq_len(M), function(x) c(Stcum1[4], Stcum2[4], Stcum3[4]))
fifths <- lapply(seq_len(M), function(x) c(Stcum1[5], Stcum2[5], Stcum3[5]))
sixths <- lapply(seq_len(M), function(x) c(Stcum1[6], Stcum2[6], Stcum3[6]))
# No sixth cumulant corrections will be used in order to match the results
# from the paper.
corr.yx <- list(matrix(c(0.4, 0.4), 1), matrix(c(0.5, 0.5), 1),
matrix(c(0.6, 0.6), 1))
corr.x <- list()
corr.x[[1]] <- corr.x[[2]] <- corr.x[[3]] <- list()
corr.x[[1]][[1]] <- matrix(c(1, 0.1, 0.1, 1), 2, 2)
corr.x[[1]][[2]] <- matrix(c(0.1974318, 0.1859656, 0.1879483, 0.1858601),
2, 2, byrow = TRUE)
corr.x[[1]][[3]] <- matrix(c(0.2873190, 0.2589830, 0.2682057, 0.2589542),
2, 2, byrow = TRUE)
corr.x[[2]][[1]] <- t(corr.x[[1]][[2]])
corr.x[[2]][[2]] <- matrix(c(1, 0.35, 0.35, 1), 2, 2)
corr.x[[2]][[3]] <- matrix(c(0.5723303, 0.4883054, 0.5004441, 0.4841808),
2, 2, byrow = TRUE)
corr.x[[3]][[1]] <- t(corr.x[[1]][[3]])
corr.x[[3]][[2]] <- t(corr.x[[2]][[3]])
corr.x[[3]][[3]] <- matrix(c(1, 0.7, 0.7, 1), 2, 2)
corr.e <- matrix(0.4, nrow = 3, ncol = 3)
diag(corr.e) <- 1
# Check the parameter inputs
checkpar(M, "Polynomial", "non_mix", means, vars, skews,
skurts, fifths, sixths, corr.x = corr.x, corr.yx = corr.yx,
corr.e = corr.e)
# Generate the system
Sys1 <- nonnormsys(10000, M, "Polynomial", "non_mix", means, vars, skews,
skurts, fifths, sixths, corr.x = corr.x, corr.yx = corr.yx,
corr.e = corr.e, seed = seed)
# Summarize the results
Sum1 <- summary_sys(Sys1$Y, Sys1$E, E_mix = NULL, Sys1$X, X_all = list(), M,
"Polynomial", means, vars, skews, skurts, fifths, sixths, corr.x = corr.x,
corr.e = corr.e)
# Calculate theoretical correlations for comparison to simulated values
calc_corr_y(Sys1$betas, corr.x, corr.e, vars)
Sum1$rho.y
calc_corr_ye(Sys1$betas, corr.x, corr.e, vars)
Sum1$rho.ye
calc_corr_yx(Sys1$betas, corr.x, vars)
Sum1$rho.yx
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.