Generalized Linear Models with Extra Parameters

Description

Estimation of generalized linear models with extra parameters, e.g., parametric links, or families with additional parameters (such as negative binomial).

Usage

glmx(formula, data, subset, na.action, weights, offset,
  family = negative.binomial, xlink = "log", control = glmx.control(...),
  model = TRUE, y = TRUE, x = FALSE, ...)

glmx.fit(x, y, weights = NULL, offset = NULL,
  family = negative.binomial, xlink = "log", control = glmx.control())

Arguments

formula symbolic description of the model.
data, subset, na.action arguments controlling formula processing via model.frame.
weights optional numeric vector of case weights.
offset optional numeric vector(s) with an a priori known component to be included in the linear predictor.
family function that returns a “family” object, i.e., family(x) needs to be a “family” object when x is the numeric vector of extra parameters (by default assumed to be 1-dimensional).
xlink link object or a character that can be passed to make.link. It should link the extra parameters to real parameters.
control a list of control arguments as returned by glmx.control.
model, y, x logicals. If TRUE the corresponding components of the fit (model frame, response, model matrix) are returned. For glmx.fit, x should be a numeric regressor matrix and y should be the response vector.
control arguments.

Details

The function glmx is a convenience interface that estimates generalized linear models (GLMs) with extra parameters. Examples would be binary response models with parametric link functions or count regression using a negative binomial family (which has one additional parameter).

Hence, glmx needs a family argument which is a family-generating function depending on one numeric argument for the extra parameters. Then, either profile-likelihood methods can be used for optimizing the extra parameters or all parameters can be optimized jointly.

If the generated family contains a list element loglik.extra for the derivative of the log-likelihood with respect to the extra parameters (i.e., score/gradient contributions), then this is used in the optimization process. This should be a function(y, mu, extra) depending on the observed response y, the estimated mean mu, and the extra parameters.

Value

glmx returns an object of class “glmx”, i.e., a list with components as follows. glmx.fit returns an unclassed list with components up to converged.

coefficients a list with elements “glm” and “extra” containing the coefficients from the respective models,
residuals a vector of deviance residuals,
fitted.values a vector of fitted means,
optim list of optim outputs for maximizing the “profile” and “full” log-likelihood, respectively,
weights the weights used (if any),
offset the list of offset vectors used (if any),
n number of observations,
nobs number of observations with non-zero weights,
df number of estimated parameters,
loglik log-likelihood of the fitted model,
dispersion estimate of the dispersion parameter (if any),
vcov covariance matrix of all parameters in the model,
family a list with elements “glm” and “extra” where the former contains the “family” object at the optimal extra parameters and the latter the family-generating function,
xlink the link object for the extra parameters,
control control options used,
converged logical indicating successful convergence of optim,
call the original function call,
formula the formula,
terms the terms object for the model,
levels the levels of the categorical regressors,
contrasts the contrasts corresponding to levels,
model the full model frame (if model = TRUE),
y the response vector (if y = TRUE),
x the model matrix (if x = TRUE).

See Also

glmx.control, hetglm

Examples

library("glmx")

## artificial data from geometric regression
set.seed(1)
d <- data.frame(x = runif(200, -1, 1))
d$y <- rnbinom(200, mu = exp(0 + 3 * d$x), size = 1)

### negative binomial regression ###

## negative binomial regression via glmx
if(require("MASS")) {
m_nb1 <- glmx(y ~ x, data = d,
  family = negative.binomial, xlink = "log", xstart = 0)
summary(m_nb1)

## negative binomial regression via MASS::glm.nb
m_nb2 <- glm.nb(y ~ x, data = d)
summary(m_nb2)

## comparison
if(require("lmtest")) {
logLik(m_nb1)
logLik(m_nb2)
coeftest(m_nb1)
coeftest(m_nb2)
exp(coef(m_nb1, model = "extra"))
m_nb2$theta
exp(coef(m_nb1, model = "extra")) * sqrt(vcov(m_nb1, model = "extra"))
m_nb2$SE.theta
}}
[1] 0.251833
## if the score (or gradient) contribution of the extra parameters
## is supplied, then estimation can be speeded up:
negbin <- function(theta) {
  fam <- negative.binomial(theta)
  fam$loglik.extra <- function(y, mu, theta) digamma(y + theta) - digamma(theta) +
    log(theta) + 1 - log(mu + theta) - (y + theta)/(mu + theta)
  fam
}
m_nb3 <- glmx(y ~ x, data = d,
  family = negbin, xlink = "log", xstart = 0, profile = FALSE)
all.equal(coef(m_nb1), coef(m_nb3), tolerance = 1e-7)
[1] TRUE
### censored negative binomial hurdle regression (0 vs. > 0) ###

## negative binomial zero hurdle part via glmx
nbbin <- function(theta) binomial(link = nblogit(theta))
m_hnb1 <- glmx(factor(y > 0) ~ x, data = d,
  family = nbbin, xlink = "log", xstart = 0)
summary(m_hnb1)

Call:
glmx(formula = factor(y > 0) ~ x, data = d, family = nbbin, xlink = "log", 
    xstart = 0)

Deviance residuals:
   Min     1Q Median     3Q    Max 
0.0254 0.2368 0.5597 1.3687 7.2594 

Coefficients (binomial model with nblogit link):
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -0.1827     0.3417  -0.535  0.59275   
x             2.2709     0.7165   3.170  0.00153 **

Extra parameters (with log link):
           Estimate Std. Error z value Pr(>|z|)
log(theta)    1.412      2.345   0.602    0.547
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Log-likelihood: -95.02 on 3 Df
Dispersion: 1
Number of iterations in BFGS optimization: 9 (profile) 1 (full)
## negative binomial hurdle regression via pscl::hurdle
## (see only zero hurdle part)
if(require("pscl")) {
m_hnb2 <- hurdle(y ~ x, data = d, dist = "negbin", zero.dist = "negbin")
summary(m_hnb2)
}

Call:
hurdle(formula = y ~ x, data = d, dist = "negbin", zero.dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-1.0850 -0.6256 -0.3790  0.4535  4.4274 

Count model coefficients (truncated negbin with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.1178     0.2188  -0.538    0.590    
x             3.2573     0.3271   9.959   <2e-16 ***
Log(theta)    0.2066     0.2970   0.696    0.487    
Zero hurdle model coefficients (censored negbin with log link):
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -0.1827     0.3416  -0.535  0.59274   
x             2.2709     0.7165   3.170  0.00153 **
Log(theta)    1.4115     2.3451   0.602  0.54725   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Theta: count = 1.2295, zero = 4.1021
Number of iterations in BFGS optimization: 29 
Log-likelihood: -342.8 on 6 Df