Package 'estimability'

Title: Tools for Assessing Estimability of Linear Predictions
Description: Provides tools for determining estimability of linear functions of regression coefficients, and 'epredict' methods that handle non-estimable cases correctly. Estimability theory is discussed in many linear-models textbooks including Chapter 3 of Monahan, JF (2008), "A Primer on Linear Models", Chapman and Hall (ISBN 978-1-4200-6201-4).
Authors: Russell Lenth [aut, cre, cph]
Maintainer: Russell Lenth <[email protected]>
License: GPL(>=3)
Version: 1.5.1
Built: 2024-11-01 11:26:22 UTC
Source: https://github.com/rvlenth/estimability

Help Index


Estimability Tools for Linear Models

Description

Provides tools for determining estimability of linear functions of regression coefficients, and alternative epredict methods for lm, glm, and mlm objects that handle non-estimable cases correctly.

Details

Package: estimability
Type: Package
Details: See DESCRIPTION file

When a linear model is not of full rank, the regression coefficients are not uniquely estimable. However, the predicted values are unique, as are other linear combinations where the coefficients lie in the row space of the data matrix. Thus, estimability of a linear function of regression coefficients can be determined by testing whether the coefficients lie in this row space – or equivalently, are orthogonal to the corresponding null space.

This package provides functions nonest.basis and is.estble to facilitate such an estimability test. Package developers may find these useful for incorporating in their predict methods when new predictor settings are involved.

The function estble.subspace is useful for projecting a matrix onto an estimable subspace whose rows are all estimable.

The package also provides epredict methods – alternatives to the predict methods in the stats package for "lm", "glm", and "mlm" objects. When the newdata argument is specified, estimability of each new prediction is checked and any non-estimable cases are replaced by NA.

Author(s)

Russell V. Lenth <[email protected]>

References

Monahan, John F. (2008) A Primer on Linear Models, CRC Press. (Chapter 3)


Estimability Enhancements for lm and Relatives

Description

These functions call the corresponding S3 predict methods in the stats package, but with a check for estimability of new predictions, and with appropriate actions for non-estimable cases.

Usage

## S3 method for class 'lm'
epredict(object, newdata, ..., 
    type = c("response", "terms", "matrix", "estimability"), 
    nonest.tol = 1e-8, nbasis = object$nonest)
## S3 method for class 'glm'
epredict(object, newdata, ..., 
    type = c("link", "response", "terms", "matrix", "estimability"), 
    nonest.tol = 1e-8, nbasis = object$nonest)
## S3 method for class 'mlm'
epredict(object, newdata, ..., 
    type = c("response", "matrix", "estimability"), 
    nonest.tol = 1e-8, nbasis = object$nonest)
    
eupdate(object, ...)

Arguments

object

An object inheriting from lm

newdata

A data.frame containing predictor combinations for new predictions

...

Arguments passed to predict or update

nonest.tol

Tolerance used by is.estble to check estimability of new predictions

type

Character string specifying the desired result. See Details.

nbasis

Basis for the null space, e.g., a result of a call to nonest.basis. If nbasis is NULL, a basis is constructed from object.

Details

If newdata is missing or object is not rank-deficient, this method passes its arguments directly to the same method in the stats library. In rank-deficient cases with newdata provided, each row of newdata is tested for estimability against the null basis provided in nbasis. Any non-estimable cases found are replaced with NAs.

The type argument is passed to predict when it is one of "response", "link", or "terms". With newdata present and type = "matrix", the model matrix for newdata is returned, with an attribute "estble" that is a logical vector of length ‘⁠nrow(newdata)⁠’ indicating whether each row is estimable. With type = "estimability", just the logical vector is returned.

If you anticipate making several epredict calls with new data, it improves efficiency to either obtain the null basis and provide it in the call, or add it to object with the name "nonest" (perhaps via a call to eupdate).

eupdate is an S3 generic function with a method provided for "lm" objects. It updates the object according to any arguments in ..., then obtains the updated object's nonestimable basis and returns it in object$nonest.

Value

The same as the result of a call to the predict method in the stats package, except rows or elements corresponding to non-estimable predictor combinations are set to NA. The value for type is "matrix" or "estimability" is explained under details.

Note

The capabilities of the epredict function for lm objects is provided by predict.lm (if using R version 4.3.0 or later) with rankdeficient = "NA"; however, epredict uses estimability's own criteria to determine which predictions are set to NA. An advantage of using epredict is one of efficiency: we can compute the null basis once and for all and have it available additional predictions, whereas predict.lm will re-compute it each time. If the user wishes to see a message explaining why NAs were displayed, set ‘⁠options(estimability.verbose = TRUE)⁠’.

Author(s)

Russell V. Lenth <[email protected]>

See Also

predict.lm in the stats package; nonest.basis.

Examples

require("estimability")

# Fake data where x3 and x4 depend on x1, x2, and intercept
x1 <- -4:4
x2 <- c(-2,1,-1,2,0,2,-1,1,-2)
x3 <- 3*x1 - 2*x2
x4 <- x2 - x1 + 4
y <- 1 + x1 + x2 + x3 + x4 + c(-.5,.5,.5,-.5,0,.5,-.5,-.5,.5)

# Different orderings of predictors produce different solutions
mod1234 <- lm(y ~ x1 + x2 + x3 + x4)
mod4321 <- eupdate(lm(y ~ x4 + x3 + x2 + x1))
# (Estimability checking with mod4321 will be more efficient because
#  it will not need to recreate the basis)
mod4321$nonest

 
# test data:
testset <- data.frame( 
              x1 = c(3,  6,  6,  0,  0,  1), 
              x2 = c(1,  2,  2,  0,  0,  2), 
              x3 = c(7, 14, 14,  0,  0,  3), 
              x4 = c(2,  4,  0,  4,  0,  4))

# Look at predictions when we don't check estimability
suppressWarnings( # Disable the warning from stats::predict.lm
    rbind(p1234 = predict(mod1234, newdata = testset),
          p4321 = predict(mod4321, newdata = testset)))

# Compare with results when we do check:
rbind(p1234 = epredict(mod1234, newdata = testset),
      p4321 = epredict(mod4321, newdata = testset))

# stats::predict has same capability for lm objects starting in version 4.3.0:
if((R.Version()$major >= 4) && (R.Version()$minor >= 3))
  stats::predict(mod1234, newdata = testset, rankdeficient = "NA")

# Note that estimable cases have the same predictions

# change mod1234 and include nonest basis 
mod134 <- eupdate(mod1234, . ~ . - x2, subset = -c(3, 7))
mod134$nonest

# When row spaces are the same, bases are interchangeable
# so long as you account for the ordering of parameters:
epredict(mod4321, newdata = testset, type = "estimability",
    nbasis = nonest.basis(mod1234)[c(1,5:2), ])
    
# Comparison with predict.lm in R >= 4.3.0
if((R.Version()$major >= 4) && (R.Version()$minor >= 3))
  stats::predict(mod4321, newdata = testset, rankdeficient = "NA")

## Not run: 
### Additional illustration
example(nonest.basis)  ## creates model objects warp.lm1 and warp.lm2

# The two models have different contrast specs. But the empty cell
# is correctly identified in both:
fac.cmb <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H"))
cbind(fac.cmb, 
      pred1 = epredict(warp.lm1, newdata = fac.cmb), 
      pred2 = epredict(warp.lm2, newdata = fac.cmb))

## End(Not run)

Find an estimable subspace

Description

Determine a transformation B of the rows of a matrix L such that B %*% L is estimable. A practical example is in jointly testing a set of contrasts L in a linear model, and we need to restrict to the subspace spanned by the rows of L that are estimable.

Usage

estble.subspace (L, nbasis, tol = 1e-8)

Arguments

L

A matrix of dimensions k by p

nbasis

A k by b matrix whose columns form a basis for non-estimable linear functions – such as is returned by nonest.basis

tol

Numeric tolerance for assessing nonestimability. See is.estble.

Details

We require B such that all the rows of M = B %*% L are estimable, i.e. orthogonal to the columns of nbasis. Thus, we need B %*% L %*% nbasis to be zero, or equivalently, t(B) must be in the null space of t(L %*% nbasis). This can be found using nonest.basis.

Value

An r by p matrix M = B %*% L whose rows are all orthogonal to the columns of nbasis. The matrix B is attached as attr(M, "B"). Note that if any rows of L were non-estimable, then r will be less than k. In fact, if there are no estimable functions in the row space of L, then r = 0.

Author(s)

Russell V. Lenth <[email protected]>

Examples

### Find a set of estimable interaction contrasts for a 3 x 4 design 
### with two empty cells.
des <- expand.grid(A = factor(1:3), B = factor(1:4))
des <- des[-c(5, 12), ]  # cells (2,2) and (3,4) are empty

X <- model.matrix(~ A * B, data = des)
N <- nonest.basis(X)

L <- cbind(matrix(0, nrow = 6, ncol = 6), diag(6))
# i.e., give nonzero weight only to interaction effects

estble.subspace(L, N)

# Tougher demo: create a variation where all rows of L are non-estimable
LL <- matrix(rnorm(36), ncol = 6) %*% L
estble.subspace(LL, N)

Estimability Tools

Description

This documents the functions needed to test estimability of linear functions of regression coefficients.

Usage

nonest.basis(x, ...)
## Default S3 method:
nonest.basis(x, ...)
## S3 method for class 'qr'
nonest.basis(x, ...)
## S3 method for class 'matrix'
nonest.basis(x, ...)
## S3 method for class 'lm'
nonest.basis(x, ...)
## S3 method for class 'svd'
nonest.basis(x, tol = 5e-8, ...)

legacy.nonest.basis(x, ...)

all.estble

is.estble(x, nbasis, tol = 1e-8)

Arguments

x

For nonest.basis, an object of a class in ‘⁠methods("nonest.basis")⁠’. Or, in is.estble, a numeric vector or matrix for assessing estimability of ‘⁠sum(x * beta)⁠’, where beta is the vector of regression coefficients.

nbasis

Matrix whose columns span the null space of the model matrix. Such a matrix is returned by nonest.basis.

tol

Numeric tolerance for assessing rank or nonestimability. For determining rank, singular values less than tol times the largest singular value are regarded as zero. For determining estimability with a nonzero xx, βx\beta'x is assessed by whether or not Nx2<τxx2||N'x||^2 < \tau ||x'x||^2, where NN and τ\tau denote nbasis and tol, respectively.

...

Additional arguments passed to other methods.

Details

Consider a linear model y=Xβ+Ey = X\beta + E. If XX is not of full rank, it is not possible to estimate β\beta uniquely. However, XβX\beta is uniquely estimable, and so is aXβa'X\beta for any conformable vector aa. Since aXa'X comprises a linear combination of the rows of XX, it follows that we can estimate any linear function where the coefficients lie in the row space of XX. Equivalently, we can check to ensure that the coefficients are orthogonal to the null space of XX.

The nonest.basis method for class 'svd' is not really functional as a method because there is no "svd" class (at least in R <= 4.2.0). But the function nonest.basis.svd is exported and may be called directly; it works with results of svd or La.svd. We require x$v to be the complete matrix of right singular values; but we do not need x$u at all.

The default method does serve as an svd method, in that it only works if x has the required elements of an SVD result, in which case it passes it to nonest.basis.svd. The matrix method runs nonest.basis.svd(svd(x, nu = 0)). The lm method runs the qr method on x$qr.

The function legacy.nonest.basis is the original default method in early versions of the estimability package. It may be called with x being either a matrix or a qr object, and after obtaining the R matrix, it uses an additional QR decomposition of t(R) to obtain the needed basis. (The current nonest.basis method for qr objects is instead based on the singular-value decomposition of R, and requires much simpler code.)

The constant all.estble is simply a 1 x 1 matrix of NA. This specifies a trivial non-estimability basis, and using it as nbasis will cause everything to test as estimable.

Value

When XX is not full-rank, the methods for nonest.basis return a basis for the null space of XX. The number of rows is equal to the number of regression coefficients (including any NAs); and the number of columns is equal to the rank deficiency of the model matrix. The columns are orthonormal. If the model is full-rank, then nonest.basis returns all.estble. The matrix method uses XX itself, the qr method uses the QRQR decomposition of XX, and the lm method recovers the required information from the object.

The function is.estble returns a logical value (or vector, if x is a matrix) that is TRUE if the function is estimable and FALSE if not.

Author(s)

Russell V. Lenth <[email protected]>

References

Monahan, John F. (2008) A Primer on Linear Models, CRC Press. (Chapter 3)

Examples

require(estimability)

X <- cbind(rep(1,5), 1:5, 5:1, 2:6)
( nb <- nonest.basis(X) )

SVD <- svd(X, nu = 0)    # we don't need the U part of UDV'
nonest.basis.svd(SVD)    # same result as above

# Test estimability of some linear functions for this X matrix
lfs <- rbind(c(1,4,2,5), c(2,3,9,5), c(1,2,2,1), c(0,1,-1,1))
is.estble(lfs, nb)

# Illustration on 'lm' objects:
warp.lm1 <- lm(breaks ~ wool * tension, data = warpbreaks, 
    subset = -(26:38), 
    contrasts = list(wool = "contr.treatment", tension = "contr.treatment"))
zapsmall(warp.nb1 <- nonest.basis(warp.lm1))

warp.lm2 <- update(warp.lm1, 
    contrasts = list(wool = "contr.sum", tension = "contr.helmert"))
zapsmall(warp.nb2 <- nonest.basis(warp.lm2))

# These bases look different, but they both correctly identify the empty cell
wcells = with(warpbreaks, expand.grid(wool = levels(wool), tension = levels(tension)))
epredict(warp.lm1, newdata = wcells, nbasis = warp.nb1)
epredict(warp.lm2, newdata = wcells, nbasis = warp.nb2)