
Question

Part 2 REGRESSION ANALYSIS

(a) Run a regression to determine the impact of the 2013 unemployment rate (UnempRate2013) on the per capita income (PerCapitaInc) in a county. What is the estimated slope? Explain what this number means in words in terms of the unemployment rate and in terms of per capita income. Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. For this first pass, use homoskedastic standard errors.

(b) Re-run the regression from part (a) but this time use heteroskedastic standard errors. Are your coefficients the same as in part (a)? Why? Are your standard errors (of your betas) the same as in part (a)? Why?
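
For parts (a) and (b), the comparison can be sketched outside Stata. The following is a minimal Python sketch, assuming NumPy and synthetic data (the arrays `unemp` and `income` are made up for illustration and are not the county dataset). It computes the OLS slope once, then a homoskedasticity-only standard error and an HC1 heteroskedasticity-robust standard error. The point estimate is identical in both cases because only the variance formula changes.

```python
import numpy as np

def ols_slope_with_ses(x, y):
    """Regress y on x (with intercept); return slope, classical SE, HC1 robust SE."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])      # intercept column plus regressor
    k = X.shape[1]                            # number of estimated coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                  # OLS coefficients (same for both SE types)
    resid = y - X @ beta

    # Homoskedasticity-only variance: s^2 * (X'X)^{-1}
    sigma2 = resid @ resid / (n - k)
    var_classical = sigma2 * XtX_inv

    # HC1 robust variance: sandwich estimator with an n/(n-k) correction
    meat = X.T @ (X * resid[:, None] ** 2)
    var_robust = (n / (n - k)) * XtX_inv @ meat @ XtX_inv

    return beta[1], np.sqrt(var_classical[1, 1]), np.sqrt(var_robust[1, 1])

# Synthetic, hypothetical data: the error variance grows with the unemployment rate
rng = np.random.default_rng(0)
unemp = rng.uniform(2.0, 15.0, size=500)
income = 50000.0 - 1500.0 * unemp + rng.normal(0.0, 500.0 * unemp)

slope, se_classical, se_robust = ols_slope_with_ses(unemp, income)
print(f"slope={slope:.1f}  classical SE={se_classical:.1f}  robust SE={se_robust:.1f}")
```

The two standard errors generally differ whenever the error variance depends on the regressor, while the slope does not change.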

(c) Run the same regression as in part (b) but now also include the following additional regressors: percentage of the population that is college-educated (Ed5CollegePlusPct), percentage of the population that is black (BlackNonHispanicPct2010), and percentage of the population that is Hispanic (HispanicPct2010). Now, what is the estimated impact of unemployment rate in 2013 on per capita income? Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors.

(d) Provide economic/econometric intuition as to why the estimated impact of the unemployment rate on per capita income changed between parts (b) and (c). Note that I am asking you to think about the context (and hence the "story" behind these data).

(e) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 found in Part 2(c). Write out your calculations. Clearly indicate how this confidence interval relates to whether UnempRate2013 is statistically significant or not in this context by relating your answer to your constructed confidence interval.
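
The calculation asked for in part (e) can be sketched as follows. This is a minimal Python sketch assuming the large-sample normal critical value 1.96; the coefficient and standard error passed in are hypothetical placeholders, not results from the county data. The helper names `confidence_interval` and `excludes_zero` are illustrative.

```python
def confidence_interval(beta_hat, se, critical_value=1.96):
    """Return (lower, upper) bounds of beta_hat +/- critical_value * se."""
    half_width = critical_value * se
    return beta_hat - half_width, beta_hat + half_width

def excludes_zero(interval):
    """True if zero lies outside the interval, i.e. significant at the matching level."""
    lower, upper = interval
    return lower > 0 or upper < 0

# Hypothetical numbers for illustration only
ci = confidence_interval(beta_hat=-800.0, se=100.0)
print(ci, excludes_zero(ci))
```

If the 95% interval excludes zero, the two-sided test of a zero slope rejects at the 5% level, and the reverse also holds.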

(f) You recall from Part 1 that both the means of per capita income and of unemployment rate in 2013 are quite different across metro and nonmetro areas. You therefore want to explore this in more detail. Run the regression from Part 2(c) using only metro areas in 2013 (i.e., Metro2013==1). [Hint: You need to restrict the data based on a criterion before running the regression.] Now, what is the estimated effect of the 2013 unemployment rate on per capita income? Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors.

(g) Now, run the regression from Part 2(c) using only non-metro areas in 2013 (Metro2013==0). [Hint: You need to restrict the data based on a criterion before running the regression.] Now, what is the estimated effect of the 2013 unemployment rate on per capita income? Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors.
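
The sample restriction in parts (f) and (g) amounts to filtering rows on the metro indicator before estimating. A minimal Python sketch, assuming NumPy and synthetic data (the arrays `metro`, `unemp`, and `income` are invented, and only one regressor is used to keep the sketch short):

```python
import numpy as np

def ols_coefs(X, y):
    """OLS coefficients with an intercept; X may be 1-D or 2-D."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

# Synthetic, hypothetical data with a different slope in each group
rng = np.random.default_rng(1)
n = 400
metro = rng.integers(0, 2, size=n)            # 1 = metro, 0 = non-metro
unemp = rng.uniform(2.0, 15.0, size=n)
income = np.where(metro == 1,
                  60000.0 - 2000.0 * unemp,
                  40000.0 - 500.0 * unemp) + rng.normal(0.0, 1000.0, size=n)

is_metro = metro == 1                         # boolean mask plays the role of the restriction
beta_metro = ols_coefs(unemp[is_metro], income[is_metro])
beta_nonmetro = ols_coefs(unemp[~is_metro], income[~is_metro])
print(beta_metro[1], beta_nonmetro[1])        # slope on unemp in each subsample
```

Estimating on each subsample separately lets every coefficient, including the intercept, differ between metro and non-metro counties.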

(h) What did you learn from the comparison between results in parts (f) and (g)? Explain your answer. Note that I again am asking you to think about the context (and hence the "story" behind these data).

(i) Return to the full sample. Now, run a regression to determine the impact of changing the percentage of the population which is college educated (Ed5CollegePlusPct) on the per capita income (PerCapitaInc) in a county. Include controls for the unemployment rate in 2013 (UnempRate2013), percentage of the population that is black (BlackNonHispanicPct2010), percentage of the population that is Hispanic (HispanicPct2010) and now also include a dummy variable for metro status (Metro2013). Now, what is the estimated impact of percentage with a college education on per capita income? Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors.

(j) It is quite common in econometrics to model income variables nonlinearly. Construct a new variable and call it "loginc" or whatever you prefer, where loginc = ln(PerCapitaInc). Provide summary statistics for this new variable. (Hint: Think back to how you constructed summary statistics in Part 1.)
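
The transformation in part (j) is a natural log applied element by element, followed by the usual summary statistics. A minimal Python sketch, assuming NumPy; the income values are hypothetical and stand in for the PerCapitaInc column:

```python
import numpy as np

# Hypothetical per capita incomes (dollars); not taken from the dataset
per_capita_inc = np.array([18500.0, 22340.0, 25110.0, 31875.0, 47200.0])

loginc = np.log(per_capita_inc)               # natural log, requires strictly positive income

summary = {
    "obs": loginc.size,
    "mean": loginc.mean(),
    "sd": loginc.std(ddof=1),                 # sample standard deviation
    "min": loginc.min(),
    "max": loginc.max(),
}
print(summary)
```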

(k) Now run a regression model with loginc as the dependent variable (and we are also going to start controlling for metro status in addition to the other controls). In other words, the control variables are unemployment rate in 2013 (UnempRate2013) as the main regressor, while also including the other regressors: percentage college educated (Ed5CollegePlusPct), percentage non-Hispanic black in 2010 (BlackNonHispanicPct2010), percentage Hispanic in 2010 (HispanicPct2010), and metro status in 2013 (Metro2013). Now, what is the estimated effect of UnempRate2013 in words? Also indicate if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard errors. [Careful not to leave out any variables in your regression specification in STATA]
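
With a logged dependent variable, as in part (k), a coefficient is usually read as an approximate percentage change in income per one-unit change in the regressor. A minimal Python sketch of that conversion; the coefficient value is hypothetical and the helper name `percent_effect` is illustrative:

```python
import math

def percent_effect(beta_hat, delta_x=1.0):
    """Percent change in Y for a delta_x change in X in a log-linear model.

    Returns (approximate, exact): 100*beta*dx and 100*(exp(beta*dx) - 1).
    """
    approx = 100.0 * beta_hat * delta_x
    exact = 100.0 * (math.exp(beta_hat * delta_x) - 1.0)
    return approx, exact

# Hypothetical coefficient on UnempRate2013, for illustration only
approx, exact = percent_effect(-0.02)
print(f"approx: {approx:.2f}%  exact: {exact:.2f}%")
```

For small coefficients the two numbers are close, which is why the approximation is common in write-ups.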

(l) What is the null hypothesis corresponding to the F-statistic as reported in the output for the regression in part (k)? What is the conclusion of the reported F-test? Explain (i.e., do you reject or fail to reject the stated null hypothesis above and how do you know this?).

(m) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 in Part 2(k). As usual, write out your calculations. Clearly indicate how this confidence interval relates to whether UnempRate2013 is statistically significant or not in this context by relating your answer to your constructed confidence interval.

(n) Discuss what the standard error of the regression (SER), R-squared and adjusted R-squared in part (k) are telling you in terms of the numbers that you have found. Using what you know about the difference between the two formulas, explain specifically why the R² and adjusted R² statistics are so similar in this case.
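
For reference when answering part (n), the three statistics are commonly written as below. This is a sketch in standard textbook notation, which is an assumption about the course's conventions: n is the number of observations, k the number of regressors excluding the intercept (five in part (k)), SSR the sum of squared residuals, and TSS the total sum of squares.

```latex
R^2 = 1 - \frac{SSR}{TSS}, \qquad
\bar{R}^2 = 1 - \frac{n-1}{\,n-k-1\,}\cdot\frac{SSR}{TSS}, \qquad
SER = \sqrt{\frac{SSR}{\,n-k-1\,}}
```

The only difference between the first two formulas is the factor (n-1)/(n-k-1), which is close to 1 when n is large relative to k.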

(o) Use an F-test to test the joint significance of the additional regressors: Ed5CollegePlusPct, BlackNonHispanicPct2010, HispanicPct2010, and Metro2013. Find this test statistic and clearly indicate the conclusions of the test.
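
The logic of the joint test in part (o) can be sketched by comparing a restricted and an unrestricted regression. The Python sketch below uses the homoskedasticity-only F formula on synthetic data, which is an assumption made for simplicity: the assignment asks for heteroskedastic standard errors, so the statistic reported by the statistical software would normally be a robust version and need not match this formula. All arrays and function names are illustrative.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from OLS of y on X (2-D) plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def f_statistic(X_restricted, X_unrestricted, y):
    """Homoskedasticity-only F for H0: coefficients on the extra regressors are all zero."""
    n = len(y)
    k_u = X_unrestricted.shape[1]             # regressors in the unrestricted model
    q = k_u - X_restricted.shape[1]           # number of restrictions tested
    ssr_r, ssr_u = ssr(X_restricted, y), ssr(X_unrestricted, y)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u - 1))

# Synthetic, hypothetical data: column 0 is the main regressor, columns 1-4 the additions
rng = np.random.default_rng(2)
n = 500
X_all = rng.normal(size=(n, 5))
y = 1.0 + 0.5 * X_all[:, 0] + 0.5 * X_all[:, 1] + rng.normal(size=n)

F = f_statistic(X_all[:, :1], X_all, y)       # tests the four added regressors jointly
print(F)
```

A value of F above the critical value for q = 4 restrictions leads to rejecting the null that the added regressors are jointly irrelevant.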

(p) If you had more time to study this question and/or more or different data, what would you suggest doing next? Propose additional variables to add and/or different specifications to try and give specific reasons why you are suggesting these. Answers will vary for this part of the problem.

Most Viewed Questions Of Econometrics

3 Exercise on Stata [25 pts]
To answer this question you are required to use the statistical software Stata. Make sure to create a do file with your code, an automated log file of your answers from that code, and write down in a separate document your answers. You are required to submit all three files (i.e., do file, log file, and Word document) before the due date. Load the dataset nbasal using the bcuse command. The dataset contains data on salaries of NBA players and individual player statistics.
1. What is the structure of the data? (Cross-section, time series, or panel data) [1 pt]
2. How many players are in the data? [1 pt]
3. How many of the players are centers? [1 pt]
4. What is the average years of experience of all players? [1 pt]
5. What percent of players are forwards? [1 pt]
6. Name all dummy variables in this dataset. [1 pt]
7. How many of the guards are not married? [1 pt]
8. What percentage of forwards are married? [1 pt]
9. Plot a histogram of wage. Does the wage variable look symmetrically distributed? Why is it the case? (Note that you don't need to paste your graph into your Word document.) [3 pts]
10. Find out the average salary by years of experience. [Hint: Combine summarize with the prefix bysort] [2 pts]
11. Generate a variable mean_sal_byexper which equals the average salary of players by years of experience, so that for a given player with 2 years of experience, the variable value will be the average salary for players with 2 years of experience. [Hint: Combine egen with the prefix bysort] [3 pts]
12. Produce a scatterplot of mean_sal_byexper against exper. Make sure exper is on the x-axis. [2 pts]
13. What is the correlation between wage and exper? [2 pts]
14. Create a discrete variable called position which equals 1 if a player is a guard, 2 if a player is a center, and 3 if a player is a forward. Label this variable as "Player's position". [2 pts]
15. Create a pie chart to illustrate the frequency distribution of the variable position. Make sure to include the command that indicates the percentages on each slice. Export the graph as a .PDF file. [Hint: You will need to use the plabel(_all percent) option in order to display the percentages. In your Word document only give the commands you use.] [3 pts]


1. For the linear model Y = Xβ + ε, define X and β as: (a) Find the least squares estimator of β. (b) Find the mean, variance, and distribution of the least squares estimator from part (a). (c) Find the estimator of σ². (d) Give the formula for the (1 − α)% confidence interval for β₁.


4 Empirical Application [25 pts]
To answer this question you are required to use the statistical software Stata. Make sure to create a do file with your code, an automated log file of your answers from that code, and write down in a separate document your answers. You are required to submit all three files (i.e., do file, log file, and Word document) before the due date. Load the same dataset nbasal using the bcuse command.
4.1 Simple versus multiple regression model [15 pts]
Estimate coefficients β₀ and β₁ using the OLS estimator of a model that relates the marriage status (marr_i) with the annual salary in thousands of USD (wage_i):
i. What is the value of β₀? How do you (quantitatively) interpret this value? [2 pts]
ii. What is the value of β₁? How do you (quantitatively) interpret this value? [2 pts]
iii. Answer i. and ii. defining the dependent variable as log(wage_i). [3 pts]
iv. From an economic point of view, is β₁ capturing a causal effect of x on y or purely a correlation? Why? [4 pts]
v. Run the following regression,
4.2 Multiple regression model [10 pts]
Estimate coefficients β₀, β₁, β₂, and β₃ using the OLS estimator of a model that relates the position of the player (guard_i, center_i, and forward_i) with the annual salary in thousands of USD (wage_i): wage_i = β₀ + β₁ guard_i + β₂ center_i + β₃ forward_i + u_i.
i. What is the value of β₁? How do you explain this result? Which assumption are we violating in this model? [5 pts]
ii. How can you solve the issue in i.? How can I know the average (predicted) wage of a guard, a center, and a forward? [5 pts]


Task 3: The linear regression model assumptions and diagnostic tests (30 marks) (a) Find out whether the variables are highly correlated with each other. (b) Is there serial correlation? (c) Use appropriate tests to find out if the error variance is heteroscedastic.


Task 1: Classical two-variable linear regression model (OLS) (30 marks) Collect data on gold prices and the Global Consumer Price Index for the period 2000 - 2022. Data is monthly. (a) Plot the scattergram of gold prices and Global CPI. (b) An investment is supposed to be a hedge against inflation if its price and/or rate of return at least keeps pace with inflation. To test this hypothesis, suppose you decide to fit the following model, assuming the scatterplot in (a) suggests that this is appropriate: Gold price_t = β₁ + β₂ CPI_t + u_t (c) Test for stationarity. (d) Estimate the above regression model. Obtain the estimates of the parameters, their standard errors, R², RSS, and ESS, etc. (e) Interpret the results. (f) Establish a 95% confidence interval for β₂ and 33. (g) How would you test the assumption of the normality of the error term? Show the tests you use.


2. For the linear model Y = Xβ + ε, define Y, X, β, and ε. (a) Find the least squares estimator of β. (b) Find the mean, variance, and distribution of your least squares estimator from part (a). (c) Find the estimator of σ². (d) Find the formulas for SST, SSM, and SSE for this particular model. (e) Give the ANOVA Table for this particular model. (f) Determine the appropriate null and alternative hypotheses for this particular model using the F-test statistic from the ANOVA Table.


Question 1 (5 marks)
A boutique beer brewery produces 2 types of beer, Dark-ale and Light-ale, daily with a total cost function: TC = 3QD + QD × QL + 4QL, where QD is the quantity of the Dark-ale beer (in kegs) and QL is the quantity of the Light-ale beer (in kegs). The prices that can be charged are determined by supply and demand forces and are influenced by the quantities of each type of beer according to the following equations: PD = 32 QD + QL for the price (in dollars per keg) of the Dark-ale beer and PL = 42+2QD - -QL for the price (in dollars per keg) of the Light-ale beer. The total revenue is given by the equation TR = PD × QD + PL × QL and the profit is given by the equation Profit = TR - TC. First, use a substitution of the price variables to express the profit in terms of QD and QL only. Using the method of Lagrange multipliers, find the maximum profit when total production (quantity) is restricted to 192 kegs. Note that QD or QL need not be whole numbers.

Question 2 (5 marks)
A farmer discovers that his land has been targeted as a chemical dumping ground with a chemical that is dangerous for growing any crops. It is known that the chemical concentration decays according to the exponential decay process. At the time of discovery, the concentration of the chemical was 15% of the original. One week later, the chemical content reduced to 14%. The police have two suspects, who were both in prison for 15 weeks each at different times for other offences, providing them with alibis (proof of innocence). Suspect A served his sentence ending 35 weeks before the time of the discovery and Suspect B was released from prison 40 weeks before the time of the discovery. Use the exponential decay model to determine whether any of the suspects are innocent.


2- Analysing cross-sectional data, we obtained the output shown below. wage refers to monthly wage in thousands of CZK, female is a dummy variable (woman = 1, man = 0), and exper refers to years of work experience. How would you interpret the estimated intercept and estimated coefficients for regressors female and wage?


1. (a) For each of the variables, what is the average, the standard deviation, the minimum value, and the maximum value in your data set? (b) What is the number of observations in your data set?
