Correlation and Regression (97 problems)



For

use a table to determine if the correlation coefficient from #9 is significant.


The table below lists the numbers of audience impressions (in hundreds of millions) listening to songs and the corresponding numbers of albums sold (in hundreds of thousands). The number of audience impressions is a count of the times people have heard the song. The table is based on data from USA Today. Does it appear that album sales are affected very strongly by the number of audience impressions?


If there is no correlation between a dependent variable (Y) and an independent variable (X), can one predict Y knowing X? What is the best guess for Y?


Assignment: (The topic of discussion is Logistic Regression)

Examine the results of Exercise 2 at the end of the chapter. (exercise 2 data attached).

Set up the 9-step hypothesis testing procedure for the analysis.

This is the format for the 9-step hypothesis I need to use:

Exercise 2

9-step Hypothesis Testing Procedure:

1. Evaluate the Data

2. Review Assumptions

3. State Hypotheses

4. Select the Test Statistic

5. Distribution of the Test Statistic

6. State the Decision Rule

7. Calculate the Test Statistic

8. Statistical Decision

9. Conclusion

Exercise 2:

Recode the depressed state of mind variable into a dichotomous variable with two groups: those who rated themselves as rarely depressed are scored 0, and those who rated themselves as sometimes to routinely depressed are scored 1. Using the new variable as the outcome measure, determine which of the following variables increase the odds of being depressed:

1. Smoking history: recoded into currently smoking = 1, and not currently smoking = 0.

2. Gender: Male = 0, Female = 1

3. Quality of life in the past month: recoded so that values 1 to 3 (sometimes to very unhappy) become 0, and 4 to 6 (sometimes to extremely happy) become 1.

4. Total score on the Inventory of Positive Psychological Attitudes scale (IPPA).

Enter recoded smoking history and gender in the first block and recoded quality of life and total IPPA score in the second block.

Results for Exercise 2:

A logistic regression was run to answer the research question (n=653). The results are contained in Exercise Figure 13-1. The variables were entered in two blocks. Smoking status and gender were entered in block 1, which was significant (p=.003) and accounted for 1.8 to 2.4 percent of the variance. The Hosmer and Lemeshow Test indicated a good fit (p=.808). Only smoking made a significant contribution (p=.001). Quality of life and total IPPA score were entered in block 2, which was significant (p=.000). The total model was significant (p=.000) and accounted for 34.3 to 45.7 percent of the variance. The model was a good fit (Hosmer and Lemeshow, chi square = 4.068, df = 8, p = .851). The sensitivity of the model in predicting depression was 72.3 percent. The specificity in predicting those who were not depressed was 80.5 percent. Three of the variables, smoking status (p=.032), quality of life (p=.000), and Total IPPA Score (p=.000), were significant predictors. The odds of being depressed were about 2 times higher for those who smoked. Higher quality of life was related to lower probability of depression. The IPPA scale is scored from 30 to 210; therefore, a 1-point increase in that score would be of no practical interest. To calculate the effect of a 30-point change in IPPA, multiply the b-weight for TOTAL by 30 (.057 x 30 = 1.71) and raise e (approximately 2.7183) to that power: e to the power of 1.71 = 5.53. So, for every 30-point increase in the IPPA score, the odds of being depressed go down (negative b-weight) by a factor of about 5.5.
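
The 30-point IPPA calculation in the results paragraph can be checked with a short script. This is only a sketch: the function name `odds_multiplier` is made up for illustration, and the b-weight (-.057) is taken from the TOTAL row of Exercise Figure 13-1.

```python
import math

def odds_multiplier(b_weight, change):
    """Factor by which the odds change when a predictor moves by `change` units."""
    return math.exp(b_weight * change)

b_total = -0.057                      # b-weight for TOTAL (IPPA score)
multiplier = odds_multiplier(b_total, 30)
fold_decrease = 1 / multiplier        # how many times smaller the odds become

print(f"odds multiplier for +30 points: {multiplier:.3f}")
print(f"fold decrease in odds:          {fold_decrease:.2f}")
```

A multiplier below 1 means the odds shrink; its reciprocal is the "times lower" figure quoted in the write-up.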

Exercise Figure 13-1

LOGISTIC REGRESSION

Dependent Variable Encoding

Original Value | Internal Value
Rarely | 0
Sometimes to routinely | 1

BLOCK 1: Method = Enter

Omnibus Tests of Model Coefficients

Step 1 | Chi Square | df | Sig.
Step | 11.848 | 2 | .003
Block | 11.848 | 2 | .003
Model | 11.848 | 2 | .003

Model Summary

Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
1 | 892.445 | .018 | .024

Hosmer and Lemeshow Test

Step | Chi Square | df | Sig.
1 | .059 | 1 | .808

Variables in the Equation

Step 1(a) | B | S.E. | Wald | df | Sig. | Exp(B)
SMOKEREC | .903 | .273 | 10.978 | 1 | .001 | 2.468
GENDER | .169 | .166 | 1.037 | 1 | .309 | 1.184
Constant | -.275 | .137 | 4.039 | 1 | .044 | .759

a Variable(s) entered on step 1: SMOKEREC, GENDER

Block 2: Method = Enter

Omnibus Tests of Model Coefficients

Step 1 | Chi Square | df | Sig.
Step | 262.275 | 2 | .000
Block | 262.275 | 2 | .000
Model | 274.123 | 4 | .000

Model Summary

Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square
1 | 630.170 | .343 | .457

Hosmer and Lemeshow Test

Step | Chi Square | df | Sig.
1 | 4.068 | 8 | .851

Classification Table (a)

Observed (Depression recoded) | Predicted: Rarely | Predicted: Sometimes to routinely | Percentage correct
Rarely | 273 | 66 | 80.5
Sometimes to routinely | 87 | 227 | 72.3
Overall percentage | | | 76.6

a The cut value is .500
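
The sensitivity, specificity, and overall percentage can be recomputed from the four cell counts in the classification table. The sketch below assumes the counts read as follows: of those observed as "rarely" depressed, 273 were predicted "rarely" and 66 "sometimes to routinely"; of those observed as "sometimes to routinely", 87 were predicted "rarely" and 227 "sometimes to routinely". That reading is an assumption, but it is the one that reproduces the printed percentages. The variable names are illustrative only.

```python
# Cell counts from the Block 2 classification table (cut value .500).
true_negative = 273    # observed rarely, predicted rarely
false_positive = 66    # observed rarely, predicted sometimes to routinely
false_negative = 87    # observed sometimes to routinely, predicted rarely
true_positive = 227    # observed sometimes to routinely, predicted the same

total = true_negative + false_positive + false_negative + true_positive

specificity = 100 * true_negative / (true_negative + false_positive)
sensitivity = 100 * true_positive / (true_positive + false_negative)
overall = 100 * (true_negative + true_positive) / total

print(f"n = {total}")
print(f"specificity = {specificity:.1f}%")
print(f"sensitivity = {sensitivity:.1f}%")
print(f"overall     = {overall:.1f}%")
```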

Variables in the Equation

Step 1(a) | B | S.E. | Wald | df | Sig. | Exp(B)
SMOKEREC | .771 | .359 | 4.621 | 1 | .032 | 2.162
GENDER | .225 | .206 | 1.187 | 1 | .276 | 1.252
QOLREC | -1.066 | .292 | 13.320 | 1 | .000 | .344
TOTAL | -.057 | .005 | 117.961 | 1 | .000 | .944
Constant | 9.433 | .832 | 128.681 | 1 | .000 | 12499.450

a Variable(s) entered on step 1: QOLREC, TOTAL


If the scatter plot shows a weak positive linear relationship, what will be the general change in y associated with a downward change in x? Show with an example.


The graph shows the relationship between the number of games won by a Chicago basketball team and the average salary (in millions) for a ten-year period. Use the scatter plot, the residuals plot, and part of the regression analysis to answer the question.

Dependent variable is: # of wins

R-squared= 65%

Variable | Coefficient

Constant | 302,045.37

Salary | 8.973

a) Is a linear model appropriate here?

b) Interpret the meaning of R-squared in this context.

c) What is the correlation between salary and average wins?
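
For part (c), a minimal sketch of the usual relationship between R-squared and the correlation coefficient, assuming this is a simple (one-predictor) linear regression: the magnitude of r is the square root of R-squared, and r takes the sign of the slope. The values are those reported in the output above.

```python
import math

r_squared = 0.65   # reported R-squared (65%)
slope = 8.973      # reported coefficient on Salary

# |r| = sqrt(R^2); copysign gives r the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.3f}")
```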


Listed below are heights (in inches) and weights (in pounds) for supermodels. Is there a correlation between height and weight? If there is a correlation, does it mean that there is a correlation between height and weight of all adult women?

Height | 71 | 70.5 | 71 | 72 | 70 | 70 | 66.5 | 70 | 71
Weight | 125 | 119 | 128 | 128 | 119 | 127 | 105 | 123 | 115
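
One way to approach the first question is to compute the linear correlation coefficient directly from the nine pairs. The sketch below uses the deviation-from-the-mean form of Pearson's r; the helper name `pearson_r` is illustrative, and the first row of values is assumed to be the heights.

```python
import math

heights = [71, 70.5, 71, 72, 70, 70, 66.5, 70, 71]        # inches
weights = [125, 119, 128, 128, 119, 127, 105, 123, 115]   # pounds

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

Whether r is significant still has to be judged against a critical value for n = 9, and a result for supermodels would not automatically carry over to all adult women, since the sample is not drawn from that population.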


A researcher demonstrated a correlation of .60 between teacher attire and student academic performance across 150 grade schools in her state. She concluded that encouraging teachers to be properly attired will increase academic performance. Do you agree or disagree with her conclusion? Support your answer.


To analyze their relative effects on pre-menopausal bone mass, investigators studied the impact of several variables on vertebral bone density (VBD). Subjects were 63 pre-menopausal patients from a busy New York OB-GYN practice between the ages of 19 and 40. Among the findings were a correlation between an estrogen score and VBD (r=0.44, p < 0.001) and between age at menarche and VBD (r= -0.30, p = 0.03). For this study, answer the following questions:

a. What is the independent variable(s)?
b. What is the dependent variable?
c. What is the sampled population?
d. What is the target population?
e. What are the appropriate null and alternative hypotheses?
f. Do you think the null hypothesis was rejected? Explain why or why not.
g. What was the more relevant objective of the study, prediction or establishing and measuring a relationship (association)?

h. Are the variables directly or inversely related?


A study was conducted to compare the average time spent in the lab each week versus the course grade for computer students. The results are recorded in the table below. Determine if there is correlation between the number of hours spent in the lab and the course grade.

Hours in Lab | 10 | 11 | 16 | 9 | 7 | 15 | 16 | 10
Grade (%) | 96 | 51 | 62 | 58 | 89 | 81 | 46 | 51
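
A possible check for this problem uses the raw-sums (computational) formula for r and compares |r| with a tabled critical value. The problem does not state a significance level, so the 0.05 level and the corresponding critical value of 0.707 for n = 8 are assumptions made here for illustration.

```python
import math

hours = [10, 11, 16, 9, 7, 15, 16, 10]
grades = [96, 51, 62, 58, 89, 81, 46, 51]

n = len(hours)
sum_x, sum_y = sum(hours), sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in grades)

# Computational formula for the linear correlation coefficient.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

R_CRITICAL = 0.707   # assumed: two-tailed, alpha = 0.05, n = 8
significant = abs(r) > R_CRITICAL

print(f"r = {r:.3f}, significant at 0.05: {significant}")
```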


The age and systolic BP for 6 randomly selected subjects are shown in the table below. Find the value of the Pearson Product Moment Correlation Coefficient (r).


Describe whether the given scatter plot shows little or no association, a positive association, a negative association, whether it is weak, moderate, or strong, and whether it is linear or not.


State Budget and Days Late. New York State has become notorious for approving the state budget after the annual deadline April 1. The amounts of the budget (in billions of dollars adjusted for inflation) and the number of days late are listed below with corresponding entries representing the same year. (The data are in order by row.)

Does it appear that the size of the budget affects the number of days that the budget is late? Use a 0.05 significance level.


The number of convertibles sold at a car dealership in a month and the average monthly temperature (in degrees Fahrenheit) are given in the table below:

Convertibles sold, x | 0 | 2 | 6 | 5 | 7 | 9
Average monthly temperature, y | 40 | 48 | 58 | 62 | 72 | 79

A. Find the sample correlation coefficient r.

B. Determine whether there is a positive linear correlation, a negative linear correlation, or no correlation between the variables, and explain the results, noting particularly which variable is the dependent, and which the independent, one.

C. Is there enough evidence to conclude that there is a linear correlation between the number of convertibles sold and the average monthly temperature?

Use α = 0.05 and show your work.
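
Parts A and C can be sketched together: compute r, convert it to a t statistic with n - 2 degrees of freedom, and compare it with the two-tailed critical value. The critical value 2.776 (df = 4, α = 0.05) is taken from a standard t table and is hard-coded here as an assumption rather than computed.

```python
import math

sold = [0, 2, 6, 5, 7, 9]                 # convertibles sold, x
temperature = [40, 48, 58, 62, 72, 79]    # average monthly temperature, y

n = len(sold)
mean_x = sum(sold) / n
mean_y = sum(temperature) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sold, temperature))
sxx = sum((x - mean_x) ** 2 for x in sold)
syy = sum((y - mean_y) ** 2 for y in temperature)

r = sxy / math.sqrt(sxx * syy)

# Test of H0: rho = 0 using t = r * sqrt((n - 2) / (1 - r^2)).
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

T_CRITICAL = 2.776   # assumed table value: two-tailed, alpha = 0.05, df = 4
reject_null = abs(t_stat) > T_CRITICAL

print(f"r = {r:.3f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```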

Data Set 25 in Appendix B includes pairs of data for the sunspot number and the number of US car sales for 21 different years. Excel was used to find that the value of the linear correlation coefficient is r = -0.284.
  1. Is there a significant linear correlation between the sunspot number and the number of US car sales? Explain.
  2. What proportion of the variation in the number of US car sales can be explained by the variation in the sunspot number?
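
Both questions follow from the reported r and n alone. The sketch below assumes a 0.05 significance level (the problem does not state one) and hard-codes the two-tailed critical t value for 19 degrees of freedom, 2.093, from a standard table.

```python
import math

r = -0.284   # reported linear correlation coefficient
n = 21       # number of years (pairs)

# Proportion of variation in car sales explained by sunspot number.
r_squared = r ** 2

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))

T_CRITICAL = 2.093   # assumed table value: two-tailed, alpha = 0.05, df = 19
significant = abs(t_stat) > T_CRITICAL

print(f"r^2 = {r_squared:.4f}")
print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```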

WidgeCorp is considering branching out into cold beverages. You have been asked to come up with a regression model to forecast monthly sales of cold beverages for the next year. Think about what variables you might include in your model.

Your manager asks for your thoughts.

Respond to anybody who asks you to explain your variable selection (as per the previous question).


A significant relationship was found to exist between a man’s weight “y” in pounds, his height “H” in inches, his waist size “W” in centimeters, and his cholesterol level “C” in milligrams. The multiple regression equation y = -199 + 2.55H + 2.18W – 0.00534C expresses this relationship. If a man has a height of 72 inches, a waist circumference of 105 cm, and a cholesterol level of 250 mg, what is the best predicted value of his weight?
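
The prediction is a direct substitution into the regression equation. As a quick sketch (the function name `predicted_weight` is made up for illustration; the coefficients are those given in the problem):

```python
def predicted_weight(height_in, waist_cm, cholesterol_mg):
    """Evaluate y = -199 + 2.55H + 2.18W - 0.00534C."""
    return -199 + 2.55 * height_in + 2.18 * waist_cm - 0.00534 * cholesterol_mg

weight = predicted_weight(height_in=72, waist_cm=105, cholesterol_mg=250)
print(f"predicted weight = {weight:.1f} lb")
```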


A mortgage department of a large bank is studying its recent loans. Of particular interest is how such factors as the value of the home (in thousands of dollars), education level of the head of the household, age of the head of the household, current monthly mortgage payment (in dollars), and gender of the head of the household (male = 1, female = 0) relate to the family income. Are these variables effective predictors of the income of the household? A random sample of 25 recent loans is obtained.

Income ($ thousands) | Value ($ thousands) | Years of Education | Age | Mortgage Payment | Gender

$190

14

53

$230

1

$40.30

121

15

49

370

1

39.6

161

14

44

397

1

40.8

161

14

39

181

1

40.3

179

14

53

378

0

40

99

14

46

304

0

38.1

114

15

42

285

1

40.4

202

14

49

551

0

40.7

184

13

37

370

0

40.8

90

14

43

135

0

37.1

181

14

48

332

1

39.9

143

15

54

217

1

40.4

132

14

44

490

0

38

127

14

37

220

0

39

153

14

50

270

1

40.6

145

14

50

279

1

40.3

174

15

52

329

1

40.1

177

15

47

274

0

41.7

188

15

49

433

1

40.1

153

15

53

333

1

40.6

150

16

58

148

0

40.4

173

13

42

390

1

40.9

163

14

46

142

1

40.1

150

15

50

343

0

38.5

139

14

45

373

0

a) Determine the regression equation.

b) What is the value of R2? Comment on the value.

c) Conduct a global hypothesis test to determine whether any of the independent variables are different from zero.

d) Conduct individual hypothesis tests to determine whether any of the independent variables can be dropped.

e) If variables are dropped, recompute the regression equation and R2.

The data in the table represent 10 Chevrolet Corsicas randomly selected from those sold by a national car rental agency. The selling prices y are in thousands of dollars, the mileages are in thousands of miles, the cruise-control variable is coded so that 1 represents a car with cruise control and 0 represents a car without cruise control, and the age of the car is in months.

The data below are from a textbook that is more than 10 years old.

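Mileage (thousands of miles) | 35 | 22 | 32 | 18 | 25 | 36 | 33 | 17 | 26 | 21
Cruise control (1 = yes, 0 = no) | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0
Age (months) | 22 | 19 | 19 | 21 | 18 | 20 | 19 | 18 | 15 | 24
Selling price y ($ thousands) | 7.0 | 8.5 | 7.0 | 8.9 | 7.6 | 7.4 | 7.3 | 8.6 | 7.9 | 8.0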

(a) Identify the multiple regression equation that expresses the selling price in terms of mileage, cruise control, and age.

Paste your STATDISK picture here or on a separate page.

Show the equation.

(b) Use the multiple regression equation to predict the selling price of a car with 30,000 mi, no cruise control, and an age of 16 months.

(c) Use the multiple regression equation to predict the selling price of a car with 27,000 mi, cruise control, and an age of 20 months.

(d) Use STATDISK to find the best prediction equation for selling price (SP). Identify that equation. Explain how you found it.

[Hint: Try SP with each of the three variables as in linear regression. Try SP with each of the three pairs of two variables. Look at all three variables on SP.]


The properties for Linear Correlation are..?
