Heteroscedastic Tobit Regression

Fitting tobit regression models with conditional heteroscedasticity.

htobit(formula, data, subset, na.action,
  model = TRUE, y = TRUE, x = FALSE,
  control = htobit_control(…), …)

htobit_fit(x, y, z = NULL, control)

htobit_control(maxit = 5000, start = NULL, grad = TRUE, hessian = TRUE, ...)

Arguments

formula	a formula expression of the form `y ~ x \| z` where `y` is the response and `x` and `z` are regressor variables for the location and the scale of the latent Gaussian distribution respectively.
data	an optional data frame containing the variables occurring in the formulas.
subset	an optional vector specifying a subset of observations to be used for fitting.
na.action	a function which indicates what should happen when the data contain `NA`s.
model	logical. If `TRUE` model frame is included as a component of the returned value.
x, y	for `htobit`: logical. If `TRUE` the model matrix and response vector used for fitting are returned as components of the returned value. For `htobit_fit`: `x` is a design matrix with regressors for the location and `y` is a vector of observations.
z	a design matrix with regressors for the scale.
…	arguments to be used to form the default `control` argument if it is not supplied directly.
control, maxit, start	a list of control parameters passed to `optim` .
grad	logical. Should gradients be used for optimization? If `TRUE`, the default `method` is `"BFGS"`. Otherwise `method = "Nelder-Mead"` is used.
hessian	logical or character. Should a numeric approximation of the (negative) Hessian matrix be computed? Either `FALSE` (or equivalently `"none"`) or `TRUE`. Alternatively, in the latter case, `hessian = "numDeriv"` could be specified to signal that the Hessian should be approximated by `hessian`. Another option is `hessian = "numDeriv"` so that `optim` is used for computing the Hessian.

Details

htobit fits tobit regression models with conditional heteroscedasticity using maximum likelihood estimation. The model assumes an underlying latent Gaussian variable

$$y_i^* \sim \mathcal{N}(\mu_i, \sigma_i^2)$$

which is only observed if positive and zero otherwise: $y_i = \max(0, y_i^*)$. The latent mean $\mu_i$ and scale $\sigma_i$ (latent standard deviation) are linked to two different linear predictors

$$\mu_i = x_i^\top \beta$$

$$\log(\sigma_i) = z_i^\top \gamma$$

where the regressor vectors $x_i$ and $z_i$ can be set up without restrictions, i.e., they can be identical, overlapping or completely different or just including an intercept, etc.

htobit_fit is the lower level function where the actual fitting takes place.

A set of standard extractor functions for fitted model objects is available for objects of class "htobit", including methods to the generic functions print, summary, coef, vcov, logLik, residuals, predict, terms, model.frame, model.matrix, update, estfun and bread (from the sandwich package), Boot (from the car package), and getSummary (from the memisc package, enabling mtable).

See predict.htobit and coef.htobit for more details on some methods with non-standard arguments.

This is a somewhat simpler reimplementation of crch (Messner, Mayr, Zeileis 2016), illustrating how to create a package from scratch. Compared to crch, htobit does not offer: other response distributions beyond Gaussian, truncated rather than censored responses, analytical Hessian, flexible link functions for the scale submodel, boosting rather than full maximum likelihood estimation, among further features.

Value

htobit returns an object of class "htobit", i.e., a list with components as follows. htobit_fit returns an unclassed list with components up to df.

coefficients

a list with elements "location" and "scale" containing the coefficients from the respective models,

counts

count of function and gradient evaluations from optim,

convergence

convergence code from optim,

message

optional further information from optim,

vcov

covariance matrix of all parameters in the model,

residuals

a vector of raw residuals (observed - fitted),

fitted.values

a list with elements "location" and "scale" containing the latent fitted means and standard deviations,

method

the method argument passed to the optim call,

nobs

number of observations,

number of estimated parameters,

call

the original function call,

formula

the original formula,

terms

a list with elements "location", "scale" and "full" containing the terms objects for the respective models,

levels

a list with elements "location", "scale" and "full" containing the levels of the categorical regressors,

contrasts

a list with elements "location" and "scale" containing the contrasts corresponding to levels from the respective models,

model

the full model frame (if model = TRUE),

the numeric response vector (if y = TRUE),

a list with elements "location" and "scale" containing the model matrices from the respective models (if x = TRUE).

References

Messner JW, Mayr GJ, Zeileis A (2016). Heteroscedastic Censored and Truncated Regression with crch. The R Journal, 8(1), 173--181. doi: 10.32614/RJ-2016-012 .

Examples

## data on alcohol and tobacco expenditures in Belgian households
data("AlcoholTobacco", package = "htobit")
AlcoholTobacco$persons <- with(AlcoholTobacco, adults + oldkids + youngkids)

## homoscedastic vs. heteroscedastic tobit model for budget share of alcohol
m0 <- htobit(alcohol ~ age + log(expenditure) + persons, data = AlcoholTobacco)
m1 <- update(m0, . ~ . | age + log(expenditure) + persons)

## comparison of the two models
AIC(m0, m1)
#>    df       AIC
#> m0  5 -9496.072
#> m1  8 -9824.433
BIC(m0, m1)
#>    df       BIC
#> m0  5 -9466.522
#> m1  8 -9777.154
if(require("lmtest")) {
lrtest(m0, m1)
}
#> Loading required package: lmtest
#> Loading required package: zoo
#> 
#> Attaching package: ‘zoo’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.Date, as.Date.numeric
#> Likelihood ratio test
#> 
#> Model 1: alcohol ~ age + log(expenditure) + persons
#> Model 2: alcohol ~ age + log(expenditure) + persons | age + log(expenditure) + 
#>     persons
#>   #Df LogLik Df  Chisq Pr(>Chisq)    
#> 1   5 4753.0                         
#> 2   8 4920.2  3 334.36  < 2.2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## comparison with crch
if(require("crch")) {
c1 <- crch(alcohol ~ age + log(expenditure) + persons | age + log(expenditure) + persons,
  data = AlcoholTobacco, left = 0)
cbind(coef(m1), coef(c1))
}
#> Loading required package: crch
#>                                  [,1]         [,2]
#> (Intercept)              -0.071762027 -0.071762027
#> age                       0.002381390  0.002381390
#> log(expenditure)          0.006243513  0.006243513
#> persons                  -0.001763290 -0.001763290
#> (scale)_(Intercept)       0.175709476  0.175709475
#> (scale)_age               0.064390807  0.064390807
#> (scale)_log(expenditure) -0.277921386 -0.277921386
#> (scale)_persons          -0.110616284 -0.110616284