REGRESSÃO LINEAR MÚLTIPLA

library(readxl)
library(stargazer)
library(tableHTML)
library(effects)

nome_arquivo <- 'wage1.xls'
if(nome_arquivo %in% dir(getwd())){
wage1_file <- read_excel(nome_arquivo)
print("ARQUIVO IMPORTADO")}else{ print("ARQUIVO NÃO DISPONÍVEL")}

## [1] "ARQUIVO IMPORTADO"

MQO = lm(wage ~ educ + exper ,data = wage1_file)

sg <- stargazer::stargazer(MQO ,type = 'html',style = 'all')

, , , , , , , , , , , , , , , , , , , , , ,


	Dependent variable:

	wage

educ	0.644^***
	(0.054)
	t = 11.974
	p = 0.000
exper	0.070^***
	(0.011)
	t = 6.385
	p = 0.000
Constant	-3.391^***
	(0.767)
	t = -4.423
	p = 0.00002

Observations	526
R²	0.225
Adjusted R²	0.222
Residual Std. Error	3.257 (df = 523)
F Statistic	75.990^*** (df = 2; 523) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

Observamos pela tabela que os parâmetros estimados individualmente possuem valor do teste t muito superior a 2 e o p-valor é bastante baixo (p < 0.05), sugerindo que as variáveis independentes selecionadas são significativas para explicar o salário
Adicionalmente, verificamos pela estatística F da mesma tabela que o conjunto das variáveis independentes são significativos para explicar o salário. Obersva-se que F > 10, sendo o p-valor muito reduzido, quase nulo.

INTERVALOS DE CONFIANÇA

confint(MQO,level=0.95)
print('')
confint(MQO,level=0.99)

##                   2.5 %      97.5 %
## (Intercept) -4.89646645 -1.88461261
## educ         0.53856950  0.74997466
## exper        0.04852972  0.09166107
## [1] ""
##                   0.5 %      99.5 %
## (Intercept) -5.37231400 -1.40876507
## educ         0.50516927  0.78337489
## exper        0.04171533  0.09847546

Calculando os intervalos de confinaça dos coeficientes indivudalmente, verificamos, conforme já havíamos concluído, que o valor nulo não pertence ao intervalo de confiança dos coeficientes, mesmo quando tomamos \(\alpha\) = 0.01.

TESTE F E TESTE LM

library(car)
library(lmtest)

TESTE F

print("Teste -- educação")

linearHypothesis(MQO,'educ')

print('###########################################')

print("Teste -- experiência")

linearHypothesis(MQO,'exper')

## [1] "Teste -- educação"
## Linear hypothesis test
## 
## Hypothesis:
## educ = 0
## 
## Model 1: restricted model
## Model 2: wage ~ educ + exper
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    524 7069.1                                  
## 2    523 5548.2  1      1521 143.38 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "###########################################"
## [1] "Teste -- experiência"
## Linear hypothesis test
## 
## Hypothesis:
## exper = 0
## 
## Model 1: restricted model
## Model 2: wage ~ educ + exper
## 
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1    524 5980.7                                 
## 2    523 5548.2  1    432.52 40.772 3.78e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TESTE LM

MQO_2 <- lm(wage ~ educ, data = wage1_file)
RESID_MQO_2 <- resid(MQO_2)
LMREG <- lm(RESID_MQO_2 ~ educ + exper, data = wage1_file)
R_QUADRADO <- summary(LMREG)$r.squared
TESTE_LM <- R_QUADRADO * nobs(LMREG)

paste('Teste LM: ',TESTE_LM)
paste('p-valor: ',1 - pchisq(TESTE_LM,1))

## [1] "Teste LM:  38.0402787457415"
## [1] "p-valor:  6.92991330986104e-10"

Verificamos no teste F que o p-valor para ambas as variáveis é bastante baixo, sugerindo que as variáveis são de fato significativas para explicar o salário, concordando com o resultado encontrado anteriormente

O teste LM tem valor alto, e, mais importante, um p-valor significativamente baixo, o que concorda com as conclusões do teste F, conforme era esperado.
Com estes resultados, podemos afirmar que ambas as variáveis devem ser mantidas no modelo de regressão.

REGRESSÃO MÚLTIPLA - REESCALONAMENTO

tabela <- stargazer(lm(wage ~ educ + exper,data = wage1_file),
                     lm(I(wage*8) ~ educ + exper,data = wage1_file),
                     lm(I(wage*8*30) ~ educ + exper,data = wage1_file),
                     lm(I(wage*8*30*365) ~ educ + exper,data = wage1_file),
                     type = 'html',
                     style = 'all',
                     title = 'Reescalonameto: Hora')

, , , , , , , , , , , , , , , , , , , , , , , ,

**Reescalonameto: Hora**

	Dependent variable:

	wage	I(wage * 8)	I(wage * 8 * 30)	I(wage * 8 * 30 * 365)
	(1)	(2)	(3)	(4)

educ	0.644^***	5.154^***	154.625^***	56,438.230^***
	(0.054)	(0.430)	(12.913)	(4,713.412)
	t = 11.974	t = 11.974	t = 11.974	t = 11.974
	p = 0.000	p = 0.000	p = 0.000	p = 0.000
exper	0.070^***	0.561^***	16.823^***	6,140.357^***
	(0.011)	(0.088)	(2.635)	(961.641)
	t = 6.385	t = 6.385	t = 6.385	t = 6.385
	p = 0.000	p = 0.000	p = 0.000	p = 0.000
Constant	-3.391^***	-27.124^***	-813.729^***	-297,011.300^***
	(0.767)	(6.133)	(183.976)	(67,151.190)
	t = -4.423	t = -4.423	t = -4.423	t = -4.423
	p = 0.00002	p = 0.00002	p = 0.00002	p = 0.00002

Observations	526	526	526	526
R²	0.225	0.225	0.225	0.225
Adjusted R²	0.222	0.222	0.222	0.222
Residual Std. Error (df = 523)	3.257	26.056	781.691	285,317.100
F Statistic (df = 2; 523)	75.990^*** (p = 0.000)	75.990^*** (p = 0.000)	75.990^*** (p = 0.000)	75.990^*** (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

Verifica-se que de fato os testes t e os p-valores não se alteram com o reescalonamento das variáveis dependentes

REGRESSÃO MÚLTIPLA - PADRONIZAÇÃO

tabela <- stargazer(lm(scale(wage)~ scale(exper) + scale(educ),data = wage1_file),
                     lm(scale(wage)~ 0 + scale(exper) + scale(educ),data = wage1_file),
                     type = 'html',
                     style = 'all',
                     title = 'Padronização - Sem restrição da constante')

, , , , , , , , , , , , , , , , , , , , , , , ,

**Padronização - Sem restrição da constante**

	Dependent variable:

	scale(wage)
	(1)	(2)

scale(exper)	0.258^***	0.258^***
	(0.040)	(0.040)
	t = 6.385	t = 6.391
	p = 0.000	p = 0.000
scale(educ)	0.483^***	0.483^***
	(0.040)	(0.040)
	t = 11.974	t = 11.985
	p = 0.000	p = 0.000
Constant	-0.000
	(0.038)
	t = -0.000
	p = 1.000

Observations	526	526
R²	0.225	0.225
Adjusted R²	0.222	0.222
Residual Std. Error	0.882 (df = 523)	0.881 (df = 524)
F Statistic	75.990^*** (df = 2; 523) (p = 0.000)	76.135^*** (df = 2; 524) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

Verificamos que neste caso em particular, a restrição da constante ao valor nulo faz pouca diferença na regressão realizada, mas observa-se que o efeito existe

REGRESSÃO MÚLTIPLA - LOGARITMO

tabela <- stargazer(lm(log(wage)~ exper + educ,data = wage1_file),
                     type = 'html',
                     style = 'all',
                     title = 'Padronização - Com restrição da constante')

, , , , , , , , , , , , , , , , , , , , , , ,

**Padronização - Com restrição da constante**

	Dependent variable:

	log(wage)

exper	0.010^***
	(0.002)
	t = 6.653
	p = 0.000
educ	0.098^***
	(0.008)
	t = 12.848
	p = 0.000
Constant	0.217^**
	(0.109)
	t = 1.997
	p = 0.047

Observations	526
R²	0.249
Adjusted R²	0.246
Residual Std. Error	0.461 (df = 523)
F Statistic	86.862^*** (df = 2; 523) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

REGRESSÃO MÚLTIPLA - POLINÔMIOS

tabela <- stargazer(lm(wage~ exper + educ + I(educ^2) + I(educ^3),data = wage1_file),
                     lm(wage~ exper + educ + I(educ^2),data = wage1_file),
                     type = 'html',
                     style = 'all',
                     title = 'Polinômios - Termo Quadrado, com e sem termo cúbico')

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

**Polinômios - Termo Quadrado, com e sem termo cúbico**

	Dependent variable:

	wage
	(1)	(2)

exper	0.065^***	0.066^***
	(0.011)	(0.011)
	t = 6.023	t = 6.064
	p = 0.000	p = 0.000
educ	0.493	-0.384
	(0.599)	(0.237)
	t = 0.823	t = -1.623
	p = 0.411	p = 0.106
I(educ2)	-0.053	0.044^***
	(0.061)	(0.010)
	t = -0.859	t = 4.459
	p = 0.391	p = 0.00002
I(educ3)	0.003
	(0.002)
	t = 1.593
	p = 0.112
Constant	0.335	2.379
	(1.970)	(1.497)
	t = 0.170	t = 1.589
	p = 0.865	p = 0.113

Observations	526	526
R²	0.257	0.254
Adjusted R²	0.252	0.249
Residual Std. Error	3.195 (df = 521)	3.200 (df = 522)
F Statistic	45.103^*** (df = 4; 521) (p = 0.000)	59.117^*** (df = 3; 522) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

Observa-se que, ao considerarmos o termo quadrado da varável educação (diferente da experiência, conforme a aula), o termo linear não apresenta significância, enquanto para o termo quadrado calcula-se p-valor bastante reduzido.
Considerando também o termo ao cubo em nova regressão, não podemos rejeitar a hipótese nula para nenhumcoeficiente associado à educação, seja linear, quadrático ou cúbico.

REGRESSÃO MÚLTIPLA COM INTERAÇÃO

tabela <- stargazer::stargazer(lm(wage~ exper * educ,data = wage1_file),
                     lm(wage~ exper * educ + I(educ^2),data = wage1_file),
                     lm(wage~ exper * educ + I(exper^2),data=wage1_file),
                     type = 'html',
                     style = 'all',
                     title = 'Termos de Interação')

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

**Termos de Interação**

	Dependent variable:

	wage
	(1)	(2)	(3)

exper	0.046	-0.136^***	0.339^***
	(0.043)	(0.051)	(0.066)
	t = 1.074	t = -2.664	t = 5.150
	p = 0.284	p = 0.008	p = 0.00000
educ	0.602^***	-1.398^***	0.687^***
	(0.090)	(0.343)	(0.089)
	t = 6.693	t = -4.080	t = 7.760
	p = 0.000	p = 0.0001	p = 0.000
I(educ2)		0.072^***
		(0.012)
		t = 6.034
		p = 0.000
I(exper2)			-0.005^***
			(0.001)
			t = -5.730
			p = 0.00000
exper:educ	0.002	0.017^***	-0.005
	(0.003)	(0.004)	(0.004)
	t = 0.591	t = 4.039	t = -1.293
	p = 0.555	p = 0.0001	p = 0.197
Constant	-2.860^**	10.457^***	-5.203^***
	(1.181)	(2.485)	(1.217)
	t = -2.421	t = 4.207	t = -4.274
	p = 0.016	p = 0.00004	p = 0.00003

Observations	526	526	526
R²	0.226	0.276	0.272
Adjusted R²	0.221	0.271	0.266
Residual Std. Error	3.259 (df = 522)	3.154 (df = 521)	3.164 (df = 521)
F Statistic	50.713^*** (df = 3; 522) (p = 0.000)	49.717^*** (df = 4; 521) (p = 0.000)	48.561^*** (df = 4; 521) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

PREVISÃO

MQO <- lm(I(wage*8*30*12) ~ educ + exper + I(exper^2),data = wage1_file)


N_PONTOS <- 10


x_educ <- numeric(N_PONTOS)
x_exper <- numeric(N_PONTOS)
x_fit <- numeric(N_PONTOS)
x_lower <- numeric(N_PONTOS)
x_upper <- numeric(N_PONTOS)


for (i in 1:N_PONTOS){
  x_educ[i] <- rnorm(1,mean = mean(wage1_file$educ),sd = sd(wage1_file$educ))
  x_exper[i] <- abs(rnorm(1,mean = mean(wage1_file$exper),sd = sd(wage1_file$exper)))

  df <- data.frame(educ = x_educ[i],exper = x_exper[i])
  
  x_predict <- predict(MQO,df,interval = 'confidence',level = 0.95)
  x_predict
  
  x_fit[i] <- x_predict[1]
  x_lower[i] <- x_predict[2]
  x_upper[i] <- x_predict[3]
  
}

df <- data.frame(x_educ,x_exper,x_fit,x_lower,x_upper)
  
colnames(df) <- c('educ','exper','fit','lower','upper')

tabela <- tableHTML(round(df,2),border=2,widths=rep(100,6))

Tabela de previsões com Intervalos de Confiança

	educ	exper	fit	lower	upper
1	12.21	8.39	15068.04	14092.05	16044.04
2	13.85	0.49	12703.69	11006.48	14400.9
3	13.17	15.44	19928.14	18832.59	21023.69
4	9.07	40.65	13588.81	11806.68	15370.95
5	10.02	12.15	13188.45	11831.23	14545.67
6	15.12	26.68	25672.43	24164.35	27180.51
7	12.69	42.75	19097.96	16997.85	21198.08
8	10.25	39.24	16021.58	14443.39	17599.76
9	11.58	12.03	15809.44	14725.88	16892.99
10	14.19	9.12	18856.9	17878.9	19834.91

for (i in 1:N_PONTOS){

  df <- data.frame(educ = x_educ[i],exper = x_exper[i])
  
  x_predict <- predict(MQO,df,interval = 'prediction',level = 0.95)
  
  x_fit[i] <- x_predict[1]
  x_lower[i] <- x_predict[2]
  x_upper[i] <- x_predict[3]
  

}

df <- data.frame(x_educ,x_exper,x_fit,x_lower,x_upper)
  
colnames(df) <- c('educ','exper','fit','lower','upper')

tabela <- tableHTML(round(df,2),border=2,widths=rep(100,6))

Tabela de previsões com Intervalos de Previsão

	educ	exper	fit	lower	upper
1	12.21	8.39	15068.04	-2871.58	33007.66
2	13.85	0.49	12703.69	-5289.58	30696.97
3	13.17	15.44	19928.14	1981.62	37874.66
4	9.07	40.65	13588.81	-4412.67	31590.3
5	10.02	12.15	13188.45	-4775.95	31152.85
6	15.12	26.68	25672.43	7696.01	43648.86
7	12.69	42.75	19097.96	1062.22	37133.7
8	10.25	39.24	16021.58	-1960.86	34004.02
9	11.58	12.03	15809.44	-2136.36	33755.23
10	14.19	9.12	18856.9	917.17	36796.64

plot(effect('exper',MQO))

plot(effect('educ',MQO))

VARIÁVEL DUMMY

library(readxl)
library(stargazer)
library(car)
library(lmtest)

nome_arquivo <- 'wage1.xls'
if(nome_arquivo %in% dir(getwd())){
wage1_file <- read_excel(nome_arquivo)
print("ARQUIVO IMPORTADO")}else{ print("ARQUIVO NÃO DISPONÍVEL")}

## [1] "ARQUIVO IMPORTADO"

vetor_logico <- logical(length(wage1_file$educ))
for (i in 1:length(vetor_logico)){
  vetor_logico[i] <- i%%2==0
}

var_dummy1 <- as.numeric(vetor_logico)

wage1_file$var_dummy1 <- var_dummy1

head(wage1_file)

## # A tibble: 6 x 4
##    wage  educ exper var_dummy1
##   <dbl> <dbl> <dbl>      <dbl>
## 1  3.1     11     2          0
## 2  3.24    12    22          1
## 3  3       11     2          0
## 4  6        8    44          1
## 5  5.3     12     7          0
## 6  8.75    16     9          1

VARIÁVEL FACTOR

vetor_factor <- numeric(length(wage1_file$educ))
for (i in 1:length(vetor_factor)){
  vetor_factor[i] <- sample(c(1,2,3),1)
}

wage1_file$var_factor1 <- as.factor(vetor_factor)

relevel(wage1_file$var_factor1,ref = 1)

head(wage1_file)

## # A tibble: 6 x 5
##    wage  educ exper var_dummy1 var_factor1
##   <dbl> <dbl> <dbl>      <dbl> <fct>      
## 1  3.1     11     2          0 3          
## 2  3.24    12    22          1 3          
## 3  3       11     2          0 1          
## 4  6        8    44          1 3          
## 5  5.3     12     7          0 3          
## 6  8.75    16     9          1 2

VARIÁVEL CATEGÓRICA - EXPER

#EXPER:
#NÍVEL 1:       EXPER < 15
#NÍVEL 2: 15 <= EXPER < 30
#NÍVEL 3: 30 <= EXPER

corte <- c(0,15,30,Inf)

wage1_file$exper_categ <- factor(cut(wage1_file$exper,corte))



#wage1_file$exper_categ <- factor(
#  ifelse(wage1_file$exper<15, 1,ifelse(wage1_file$exper<30,2,3)
#  ))

head(wage1_file)

## # A tibble: 6 x 6
##    wage  educ exper var_dummy1 var_factor1 exper_categ
##   <dbl> <dbl> <dbl>      <dbl> <fct>       <fct>      
## 1  3.1     11     2          0 3           (0,15]     
## 2  3.24    12    22          1 3           (15,30]    
## 3  3       11     2          0 1           (0,15]     
## 4  6        8    44          1 3           (30,Inf]   
## 5  5.3     12     7          0 3           (0,15]     
## 6  8.75    16     9          1 2           (0,15]

REGRESSÃO VARIÁVEL CATEGÓRICA

reg_comum <- lm(wage~exper + educ, data = wage1_file)
reg_factor <- lm(wage~exper_categ * (educ) , data = wage1_file)

tabela <- stargazer(
  reg_comum,
  reg_factor,
  column.labels = c('exper contínua','exper categórica'),
  type='html',
  style='all'
)

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,


	Dependent variable:

	wage
	exper contínua	exper categórica
	(1)	(2)

exper	0.070^***
	(0.011)
	t = 6.385
	p = 0.000
exper_categ(15,30]		-0.349
		(1.776)
		t = -0.197
		p = 0.845
exper_categ(30,Inf]		3.662^**
		(1.537)
		t = 2.383
		p = 0.018
educ	0.644^***	0.608^***
	(0.054)	(0.078)
	t = 11.974	t = 7.758
	p = 0.000	p = 0.000
exper_categ(15,30]:educ		0.187
		(0.135)
		t = 1.384
		p = 0.168
exper_categ(30,Inf]:educ		-0.190
		(0.127)
		t = -1.505
		p = 0.133
Constant	-3.391^***	-2.527^**
	(0.767)	(1.041)
	t = -4.423	t = -2.429
	p = 0.00002	p = 0.016

Observations	526	526
R²	0.225	0.234
Adjusted R²	0.222	0.226
Residual Std. Error	3.257 (df = 523)	3.249 (df = 520)
F Statistic	75.990^*** (df = 2; 523) (p = 0.000)	31.690^*** (df = 5; 520) (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

linearHypothesis(reg_factor,matchCoefs(reg_factor,'exper_categ'))

## Linear hypothesis test
## 
## Hypothesis:
## exper_categ(15,30] = 0
## exper_categ(30,Inf] = 0
## exper_categ(15,30]:educ = 0
## exper_categ(30,Inf]:educ = 0
## 
## Model 1: restricted model
## Model 2: wage ~ exper_categ * (educ)
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    524 5980.7                                  
## 2    520 5488.1  4    492.56 11.668 4.421e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparando a regressão comum com utilizando a variável exper categórica, percebe-se que “educ” mantêm seu nível de significância. No entanto, vemos que a variável de interação educ e exper não possuem significância. Por fim, observamos que o salário parece ser influenciado pela experiência ser maios que 30, mas o mesmo não se observa para o grupo entre 15 e 30 anos de experiência.

O teste F (F > 10, p-valor baixo) da regressão sugere que as variáveis são significativos em conjunto.

TESTE RESET

resettest(reg_factor)

## 
##  RESET test
## 
## data:  reg_factor
## RESET = 10.634, df1 = 2, df2 = 518, p-value = 2.978e-05

O baixo p-valor indica que provavelmente existe outra forma de organizar as variáveis que seja mais adequada para explicar a variável dependente

TESTE PARA MODELOS NÃO-ANINHADOS

encomptest(
  lm(wage~exper_categ * (educ) , data = wage1_file),
  lm(wage~exper * (log(educ + 1)) , data = wage1_file)
  )

## Encompassing test
## 
## Model 1: wage ~ exper_categ * (educ)
## Model 2: wage ~ exper * (log(educ + 1))
## Model E: wage ~ exper_categ + educ + exper + log(educ + 1) + exper_categ:educ + 
##     exper:log(educ + 1)
##           Res.Df Df      F    Pr(>F)    
## M1 vs. ME    517 -3 14.256 6.136e-09 ***
## M2 vs. ME    517 -5 19.803 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Este resultado indica que o modelo conjunto é mais adequado para explicar a variável dependente

HETEROSCEDASTICIDADE

coeftest(reg_comum)

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -3.390540   0.766566 -4.4230 1.185e-05 ***
## exper        0.070095   0.010978  6.3853 3.780e-10 ***
## educ         0.644272   0.053806 11.9740 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(reg_comum,vcoc=hccm)

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -3.390540   0.766566 -4.4230 1.185e-05 ***
## exper        0.070095   0.010978  6.3853 3.780e-10 ***
## educ         0.644272   0.053806 11.9740 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observamos que ambas as matrizes de variância e covariância são iguais, sugerindo que não há problemas de heteroscedasticidade nos nossos dados.

linearHypothesis(reg_comum,matchCoefs(reg_comum,'exper'))

## Linear hypothesis test
## 
## Hypothesis:
## exper = 0
## 
## Model 1: restricted model
## Model 2: wage ~ exper + educ
## 
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1    524 5980.7                                 
## 2    523 5548.2  1    432.52 40.772 3.78e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

linearHypothesis(reg_comum,matchCoefs(reg_comum,'exper'),vcov = hccm )

## Linear hypothesis test
## 
## Hypothesis:
## exper = 0
## 
## Model 1: restricted model
## Model 2: wage ~ exper + educ
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1    524                        
## 2    523  1 40.159 5.058e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Observamos que em ambos os casos rejeita-se a hipótese nula com o teste F. O teste de robustez alterou pouco o resultado, sugerindo novamente que a heteroscedasticidade não é um grande problema neste conjunto de dados

bptest(reg_comum)

## 
##  studentized Breusch-Pagan test
## 
## data:  reg_comum
## BP = 31.847, df = 2, p-value = 1.215e-07

O teste Breusch-Pagan sugere que podemos rejeitar a hipótese nula de homoscedasticidade no conjunto de dados (baixo p-valor, BP alto)

MÉTODO DE MÍNIMOS QUADRADOS GENERALIZADOS FACTÍVEIS

logresid2 <- log(resid(reg_comum)^2)
var_MQO <- lm(logresid2 ~ educ + exper, data = wage1_file)
peso <- 1/exp(fitted(var_MQO))

fgls <- lm(wage ~ educ + exper ,data = wage1_file, weights=peso)

tabela <- stargazer(
  reg_comum,
  fgls,
  column.labels=c('Regressão comum','FGLS'), 
  type='html',
  style='all'
)

, , , , , , , , , , , , , , , , , , , , , , , ,


	Dependent variable:

	wage
	Regressão comum	FGLS
	(1)	(2)

exper	0.070^***	0.071^***
	(0.011)	(0.010)
	t = 6.385	t = 7.415
	p = 0.000	p = 0.000
educ	0.644^***	0.326^***
	(0.054)	(0.033)
	t = 11.974	t = 9.786
	p = 0.000	p = 0.000
Constant	-3.391^***	0.308
	(0.767)	(0.435)
	t = -4.423	t = 0.709
	p = 0.00002	p = 0.479

Observations	526	526
R²	0.225	0.171
Adjusted R²	0.222	0.168
Residual Std. Error (df = 523)	3.257	2.006
F Statistic (df = 2; 523)	75.990^*** (p = 0.000)	53.883^*** (p = 0.000)

Note:	p<0.1; p<0.05; p<0.01

Verificamos que a regressão comum (wage ~ educ + exper sem pesos) e a forma FGLS apresenteram resultados e significância bastante semelhantes para a variável experiência, mas há grande diferença para a variável educação.

Capítulo 3

Caio_Donalisio

20/11/2020

REGRESSÃO LINEAR MÚLTIPLA

INTERVALOS DE CONFIANÇA

Calculando os intervalos de confinaça dos coeficientes indivudalmente, verificamos, conforme já havíamos concluído, que o valor nulo não pertence ao intervalo de confiança dos coeficientes, mesmo quando tomamos \(\alpha\) = 0.01.

TESTE F E TESTE LM

TESTE F

TESTE LM

REGRESSÃO MÚLTIPLA - REESCALONAMENTO

Verifica-se que de fato os testes t e os p-valores não se alteram com o reescalonamento das variáveis dependentes

REGRESSÃO MÚLTIPLA - PADRONIZAÇÃO

Verificamos que neste caso em particular, a restrição da constante ao valor nulo faz pouca diferença na regressão realizada, mas observa-se que o efeito existe

REGRESSÃO MÚLTIPLA - LOGARITMO

REGRESSÃO MÚLTIPLA - POLINÔMIOS

REGRESSÃO MÚLTIPLA COM INTERAÇÃO

PREVISÃO

Tabela de previsões com Intervalos de Confiança

Tabela de previsões com Intervalos de Previsão

VARIÁVEL DUMMY

VARIÁVEL FACTOR

VARIÁVEL CATEGÓRICA - EXPER

REGRESSÃO VARIÁVEL CATEGÓRICA

TESTE RESET

O baixo p-valor indica que provavelmente existe outra forma de organizar as variáveis que seja mais adequada para explicar a variável dependente

TESTE PARA MODELOS NÃO-ANINHADOS

Este resultado indica que o modelo conjunto é mais adequado para explicar a variável dependente

HETEROSCEDASTICIDADE

Observamos que ambas as matrizes de variância e covariância são iguais, sugerindo que não há problemas de heteroscedasticidade nos nossos dados.

Observamos que em ambos os casos rejeita-se a hipótese nula com o teste F. O teste de robustez alterou pouco o resultado, sugerindo novamente que a heteroscedasticidade não é um grande problema neste conjunto de dados

O teste Breusch-Pagan sugere que podemos rejeitar a hipótese nula de homoscedasticidade no conjunto de dados (baixo p-valor, BP alto)

MÉTODO DE MÍNIMOS QUADRADOS GENERALIZADOS FACTÍVEIS

Verificamos que a regressão comum (wage ~ educ + exper sem pesos) e a forma FGLS apresenteram resultados e significância bastante semelhantes para a variável experiência, mas há grande diferença para a variável educação.