Title: | Tools for Assessing Estimability of Linear Predictions |
---|---|
Description: | Provides tools for determining estimability of linear functions of regression coefficients, and 'epredict' methods that handle non-estimable cases correctly. Estimability theory is discussed in many linear-models textbooks including Chapter 3 of Monahan, JF (2008), "A Primer on Linear Models", Chapman and Hall (ISBN 978-1-4200-6201-4). |
Authors: | Russell Lenth [aut, cre, cph] |
Maintainer: | Russell Lenth <[email protected]> |
License: | GPL(>=3) |
Version: | 1.5.1 |
Built: | 2024-12-01 06:56:23 UTC |
Source: | https://github.com/rvlenth/estimability |
Provides tools for determining estimability of linear functions of regression coefficients,
and alternative epredict
methods for lm
, glm
, and mlm
objects that handle non-estimable cases correctly.
Package: | estimability |
Type: | Package |
Details: | See DESCRIPTION file |
When a linear model is not of full rank, the regression coefficients are not uniquely estimable. However, the predicted values are unique, as are other linear combinations where the coefficients lie in the row space of the data matrix. Thus, estimability of a linear function of regression coefficients can be determined by testing whether the coefficients lie in this row space – or equivalently, are orthogonal to the corresponding null space.
This package provides functions nonest.basis
and
is.estble
to facilitate such an estimability test.
Package developers may find these useful for incorporating in their
predict
methods when new predictor settings are involved.
The function estble.subspace
is useful for projecting
a matrix onto an estimable subspace whose rows are all estimable.
The package also provides epredict
methods –
alternatives to the predict
methods in the stats
package for "lm"
, "glm"
, and "mlm"
objects.
When the newdata
argument is specified, estimability of each
new prediction is checked and any non-estimable cases are replaced by NA
.
Russell V. Lenth <[email protected]>
Monahan, John F. (2008) A Primer on Linear Models, CRC Press. (Chapter 3)
lm
and Relatives
These functions call the corresponding S3 predict
methods in the stats package, but with a check for estimability of new predictions, and with appropriate actions for non-estimable cases.
## S3 method for class 'lm' epredict(object, newdata, ..., type = c("response", "terms", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) ## S3 method for class 'glm' epredict(object, newdata, ..., type = c("link", "response", "terms", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) ## S3 method for class 'mlm' epredict(object, newdata, ..., type = c("response", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) eupdate(object, ...)
## S3 method for class 'lm' epredict(object, newdata, ..., type = c("response", "terms", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) ## S3 method for class 'glm' epredict(object, newdata, ..., type = c("link", "response", "terms", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) ## S3 method for class 'mlm' epredict(object, newdata, ..., type = c("response", "matrix", "estimability"), nonest.tol = 1e-8, nbasis = object$nonest) eupdate(object, ...)
object |
An object inheriting from |
newdata |
A |
... |
|
nonest.tol |
Tolerance used by |
type |
Character string specifying the desired result. See Details. |
nbasis |
Basis for the null space, e.g., a result of a call to |
If newdata
is missing or object
is not rank-deficient, this method passes its arguments directly to the same method in the stats library. In rank-deficient cases with newdata
provided, each row of newdata
is tested for estimability against the null basis provided in nbasis
. Any non-estimable cases found are replaced with NA
s.
The type
argument is passed to predict
when it is one of "response"
, "link"
, or "terms"
. With newdata
present and type = "matrix"
, the model matrix for newdata
is returned, with an attribute "estble"
that is a logical vector of length ‘nrow(newdata)’ indicating whether each row is estimable. With type = "estimability"
, just the logical vector is returned.
If you anticipate making several epredict
calls with new data, it improves efficiency to either obtain the null basis and provide it in the call, or add it to object
with the name "nonest"
(perhaps via a call to eupdate
).
eupdate
is an S3 generic function with a method provided for "lm"
objects. It updates the object according to any arguments in ...
, then obtains the updated object's nonestimable basis and returns it in object$nonest
.
The same as the result of a call to the predict
method in the stats package, except rows or elements corresponding to non-estimable predictor combinations are set to NA
. The value for type
is "matrix"
or "estimability"
is explained under details.
The capabilities of the epredict
function for lm
objects is provided
by predict.lm
(if using R version 4.3.0 or later) with rankdeficient = "NA"
;
however, epredict
uses estimability's own criteria to determine which predictions
are set to NA
. An advantage of using epredict
is one of
efficiency: we can compute the null basis once and for all and have it available
additional predictions, whereas predict.lm
will re-compute it each time.
If the user wishes to see a message explaining why NA
s were displayed,
set ‘options(estimability.verbose = TRUE)’.
Russell V. Lenth <[email protected]>
predict.lm
in the stats package;
nonest.basis
.
require("estimability") # Fake data where x3 and x4 depend on x1, x2, and intercept x1 <- -4:4 x2 <- c(-2,1,-1,2,0,2,-1,1,-2) x3 <- 3*x1 - 2*x2 x4 <- x2 - x1 + 4 y <- 1 + x1 + x2 + x3 + x4 + c(-.5,.5,.5,-.5,0,.5,-.5,-.5,.5) # Different orderings of predictors produce different solutions mod1234 <- lm(y ~ x1 + x2 + x3 + x4) mod4321 <- eupdate(lm(y ~ x4 + x3 + x2 + x1)) # (Estimability checking with mod4321 will be more efficient because # it will not need to recreate the basis) mod4321$nonest # test data: testset <- data.frame( x1 = c(3, 6, 6, 0, 0, 1), x2 = c(1, 2, 2, 0, 0, 2), x3 = c(7, 14, 14, 0, 0, 3), x4 = c(2, 4, 0, 4, 0, 4)) # Look at predictions when we don't check estimability suppressWarnings( # Disable the warning from stats::predict.lm rbind(p1234 = predict(mod1234, newdata = testset), p4321 = predict(mod4321, newdata = testset))) # Compare with results when we do check: rbind(p1234 = epredict(mod1234, newdata = testset), p4321 = epredict(mod4321, newdata = testset)) # stats::predict has same capability for lm objects starting in version 4.3.0: if((R.Version()$major >= 4) && (R.Version()$minor >= 3)) stats::predict(mod1234, newdata = testset, rankdeficient = "NA") # Note that estimable cases have the same predictions # change mod1234 and include nonest basis mod134 <- eupdate(mod1234, . ~ . - x2, subset = -c(3, 7)) mod134$nonest # When row spaces are the same, bases are interchangeable # so long as you account for the ordering of parameters: epredict(mod4321, newdata = testset, type = "estimability", nbasis = nonest.basis(mod1234)[c(1,5:2), ]) # Comparison with predict.lm in R >= 4.3.0 if((R.Version()$major >= 4) && (R.Version()$minor >= 3)) stats::predict(mod4321, newdata = testset, rankdeficient = "NA") ## Not run: ### Additional illustration example(nonest.basis) ## creates model objects warp.lm1 and warp.lm2 # The two models have different contrast specs. But the empty cell # is correctly identified in both: fac.cmb <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H")) cbind(fac.cmb, pred1 = epredict(warp.lm1, newdata = fac.cmb), pred2 = epredict(warp.lm2, newdata = fac.cmb)) ## End(Not run)
require("estimability") # Fake data where x3 and x4 depend on x1, x2, and intercept x1 <- -4:4 x2 <- c(-2,1,-1,2,0,2,-1,1,-2) x3 <- 3*x1 - 2*x2 x4 <- x2 - x1 + 4 y <- 1 + x1 + x2 + x3 + x4 + c(-.5,.5,.5,-.5,0,.5,-.5,-.5,.5) # Different orderings of predictors produce different solutions mod1234 <- lm(y ~ x1 + x2 + x3 + x4) mod4321 <- eupdate(lm(y ~ x4 + x3 + x2 + x1)) # (Estimability checking with mod4321 will be more efficient because # it will not need to recreate the basis) mod4321$nonest # test data: testset <- data.frame( x1 = c(3, 6, 6, 0, 0, 1), x2 = c(1, 2, 2, 0, 0, 2), x3 = c(7, 14, 14, 0, 0, 3), x4 = c(2, 4, 0, 4, 0, 4)) # Look at predictions when we don't check estimability suppressWarnings( # Disable the warning from stats::predict.lm rbind(p1234 = predict(mod1234, newdata = testset), p4321 = predict(mod4321, newdata = testset))) # Compare with results when we do check: rbind(p1234 = epredict(mod1234, newdata = testset), p4321 = epredict(mod4321, newdata = testset)) # stats::predict has same capability for lm objects starting in version 4.3.0: if((R.Version()$major >= 4) && (R.Version()$minor >= 3)) stats::predict(mod1234, newdata = testset, rankdeficient = "NA") # Note that estimable cases have the same predictions # change mod1234 and include nonest basis mod134 <- eupdate(mod1234, . ~ . - x2, subset = -c(3, 7)) mod134$nonest # When row spaces are the same, bases are interchangeable # so long as you account for the ordering of parameters: epredict(mod4321, newdata = testset, type = "estimability", nbasis = nonest.basis(mod1234)[c(1,5:2), ]) # Comparison with predict.lm in R >= 4.3.0 if((R.Version()$major >= 4) && (R.Version()$minor >= 3)) stats::predict(mod4321, newdata = testset, rankdeficient = "NA") ## Not run: ### Additional illustration example(nonest.basis) ## creates model objects warp.lm1 and warp.lm2 # The two models have different contrast specs. But the empty cell # is correctly identified in both: fac.cmb <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H")) cbind(fac.cmb, pred1 = epredict(warp.lm1, newdata = fac.cmb), pred2 = epredict(warp.lm2, newdata = fac.cmb)) ## End(Not run)
Determine a transformation B
of the rows of a matrix L
such that B %*% L
is estimable.
A practical example is in jointly testing a set of contrasts L
in a linear model, and we need to restrict to the subspace spanned by
the rows of L
that are estimable.
estble.subspace (L, nbasis, tol = 1e-8)
estble.subspace (L, nbasis, tol = 1e-8)
L |
A matrix of dimensions k by p |
nbasis |
A k by b matrix whose columns form a
basis for non-estimable linear functions – such as is returned
by |
tol |
Numeric tolerance for assessing nonestimability. See
|
We require B
such that all the rows of M = B %*% L
are estimable, i.e. orthogonal to the columns of nbasis
.
Thus, we need B %*% L %*% nbasis
to be zero, or equivalently,
t(B)
must be in the null space of t(L %*% nbasis)
.
This can be found using nonest.basis
.
An r by p matrix M = B %*% L
whose rows are all orthogonal to the columns of
nbasis
. The matrix B
is attached as attr(M, "B")
.
Note that if any rows of L
were non-estimable, then r
will be less than k. In fact, if there are no estimable functions
in the row space of L
, then r = 0.
Russell V. Lenth <[email protected]>
### Find a set of estimable interaction contrasts for a 3 x 4 design ### with two empty cells. des <- expand.grid(A = factor(1:3), B = factor(1:4)) des <- des[-c(5, 12), ] # cells (2,2) and (3,4) are empty X <- model.matrix(~ A * B, data = des) N <- nonest.basis(X) L <- cbind(matrix(0, nrow = 6, ncol = 6), diag(6)) # i.e., give nonzero weight only to interaction effects estble.subspace(L, N) # Tougher demo: create a variation where all rows of L are non-estimable LL <- matrix(rnorm(36), ncol = 6) %*% L estble.subspace(LL, N)
### Find a set of estimable interaction contrasts for a 3 x 4 design ### with two empty cells. des <- expand.grid(A = factor(1:3), B = factor(1:4)) des <- des[-c(5, 12), ] # cells (2,2) and (3,4) are empty X <- model.matrix(~ A * B, data = des) N <- nonest.basis(X) L <- cbind(matrix(0, nrow = 6, ncol = 6), diag(6)) # i.e., give nonzero weight only to interaction effects estble.subspace(L, N) # Tougher demo: create a variation where all rows of L are non-estimable LL <- matrix(rnorm(36), ncol = 6) %*% L estble.subspace(LL, N)
This documents the functions needed to test estimability of linear functions of regression coefficients.
nonest.basis(x, ...) ## Default S3 method: nonest.basis(x, ...) ## S3 method for class 'qr' nonest.basis(x, ...) ## S3 method for class 'matrix' nonest.basis(x, ...) ## S3 method for class 'lm' nonest.basis(x, ...) ## S3 method for class 'svd' nonest.basis(x, tol = 5e-8, ...) legacy.nonest.basis(x, ...) all.estble is.estble(x, nbasis, tol = 1e-8)
nonest.basis(x, ...) ## Default S3 method: nonest.basis(x, ...) ## S3 method for class 'qr' nonest.basis(x, ...) ## S3 method for class 'matrix' nonest.basis(x, ...) ## S3 method for class 'lm' nonest.basis(x, ...) ## S3 method for class 'svd' nonest.basis(x, tol = 5e-8, ...) legacy.nonest.basis(x, ...) all.estble is.estble(x, nbasis, tol = 1e-8)
x |
For |
nbasis |
Matrix whose columns span the null space of the model matrix. Such a matrix is returned by |
tol |
Numeric tolerance for assessing rank or nonestimability.
For determining rank, singular values less than |
... |
Additional arguments passed to other methods. |
Consider a linear model . If
is not of full rank, it is not possible to estimate
uniquely. However,
is uniquely estimable, and so is
for any conformable vector
. Since
comprises a linear combination of the rows of
, it follows that we can estimate any linear function where the coefficients lie in the row space of
. Equivalently, we can check to ensure that the coefficients are orthogonal to the null space of
.
The nonest.basis
method for class 'svd'
is not really functional as a method because there is no "svd"
class (at least in R <= 4.2.0). But the function nonest.basis.svd
is exported and may be called directly; it works with results of svd
or La.svd
. We require x$v
to be the complete matrix of right singular values; but we do not need x$u
at all.
The default
method does serve as an svd
method, in that it only works if x
has the required elements of an SVD result, in which case it passes it to nonest.basis.svd
.
The matrix
method runs nonest.basis.svd(svd(x, nu = 0))
. The lm
method runs the qr
method on x$qr
.
The function legacy.nonest.basis
is the original default method in early
versions of the estimability package. It may be called with x
being
either a matrix or a qr
object, and after obtaining the R
matrix,
it uses an additional QR decomposition of t(R)
to obtain the needed basis.
(The current nonest.basis
method for qr
objects is instead based on the
singular-value decomposition of R, and requires much simpler code.)
The constant all.estble
is simply a 1 x 1 matrix of NA
. This specifies a trivial non-estimability basis, and using it as nbasis
will cause everything to test as estimable.
When is not full-rank, the methods for
nonest.basis
return a basis for the null space of . The number of rows is equal to the number of regression coefficients (including any
NA
s); and the number of columns is equal to the rank deficiency of the model matrix. The columns are orthonormal. If the model is full-rank, then nonest.basis
returns all.estble
. The matrix
method uses itself, the
qr
method uses the decomposition of
, and the
lm
method recovers the required information from the object.
The function is.estble
returns a logical value (or vector, if x
is a matrix) that is TRUE
if the function is estimable and FALSE
if not.
Russell V. Lenth <[email protected]>
Monahan, John F. (2008) A Primer on Linear Models, CRC Press. (Chapter 3)
require(estimability) X <- cbind(rep(1,5), 1:5, 5:1, 2:6) ( nb <- nonest.basis(X) ) SVD <- svd(X, nu = 0) # we don't need the U part of UDV' nonest.basis.svd(SVD) # same result as above # Test estimability of some linear functions for this X matrix lfs <- rbind(c(1,4,2,5), c(2,3,9,5), c(1,2,2,1), c(0,1,-1,1)) is.estble(lfs, nb) # Illustration on 'lm' objects: warp.lm1 <- lm(breaks ~ wool * tension, data = warpbreaks, subset = -(26:38), contrasts = list(wool = "contr.treatment", tension = "contr.treatment")) zapsmall(warp.nb1 <- nonest.basis(warp.lm1)) warp.lm2 <- update(warp.lm1, contrasts = list(wool = "contr.sum", tension = "contr.helmert")) zapsmall(warp.nb2 <- nonest.basis(warp.lm2)) # These bases look different, but they both correctly identify the empty cell wcells = with(warpbreaks, expand.grid(wool = levels(wool), tension = levels(tension))) epredict(warp.lm1, newdata = wcells, nbasis = warp.nb1) epredict(warp.lm2, newdata = wcells, nbasis = warp.nb2)
require(estimability) X <- cbind(rep(1,5), 1:5, 5:1, 2:6) ( nb <- nonest.basis(X) ) SVD <- svd(X, nu = 0) # we don't need the U part of UDV' nonest.basis.svd(SVD) # same result as above # Test estimability of some linear functions for this X matrix lfs <- rbind(c(1,4,2,5), c(2,3,9,5), c(1,2,2,1), c(0,1,-1,1)) is.estble(lfs, nb) # Illustration on 'lm' objects: warp.lm1 <- lm(breaks ~ wool * tension, data = warpbreaks, subset = -(26:38), contrasts = list(wool = "contr.treatment", tension = "contr.treatment")) zapsmall(warp.nb1 <- nonest.basis(warp.lm1)) warp.lm2 <- update(warp.lm1, contrasts = list(wool = "contr.sum", tension = "contr.helmert")) zapsmall(warp.nb2 <- nonest.basis(warp.lm2)) # These bases look different, but they both correctly identify the empty cell wcells = with(warpbreaks, expand.grid(wool = levels(wool), tension = levels(tension))) epredict(warp.lm1, newdata = wcells, nbasis = warp.nb1) epredict(warp.lm2, newdata = wcells, nbasis = warp.nb2)