STAT 371 Final Exam

1. Here is the proportion of black pigmentation on the nose (0 to 1) and the age (years) for each of 6 male lions that we will treat as a simple random sample:

proportion black   0.14   0.30   0.59   0.48   0.79   0.51
age                1.5    4.3    5.4    7.3    8.8    5.4

(a) Find the correlation, r, for these data.

(0) 0.049, (1) 0.118, (2) 0.131, (3) 0.302, (4) 0.390,

(5) 0.427, (6) 0.472, (7) 0.730, (8) 0.907, (9) 0.940

(b) Assuming a straight-line model, estimate the intercept ($\beta_0$) of the line

age = (intercept $\hat{\beta}_0$) + (slope $\hat{\beta}_1$) × (proportion black).

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.082, (4) 0.733,

(5) 5.140, (6) 6.928, (7) 10.073, (8) 12.884, (9) 14.940

(c) Assuming a straight-line model, estimate the slope ($\beta_1$) of the line

age = (intercept $\hat{\beta}_0$) + (slope $\hat{\beta}_1$) × (proportion black).

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.082, (4) 0.733,

(5) 5.140, (6) 6.928, (7) 10.073, (8) 12.884, (9) 14.940

(d) A larger sample of 32 lions led to this regression model:

age = 0.88 + 10.65×(proportion black).

What age is predicted by this model for a male lion whose nose is 40% black? (That is,

it has proportion 0.40 of black pigment.)

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.082, (4) 0.733,

(5) 5.140, (6) 6.928, (7) 10.073, (8) 12.884, (9) 14.940

(e) A larger sample of 32 lions led to this regression model:

age = 0.88 + 10.65×(proportion black).

Which correlation, r, is possible for the data that led to this model? Hint: You do not

have the data, so you cannot calculate r; but enough information is provided to answer

the question.

(0) −1, (1) −0.5, (2) 0, (3) 0.88, (4) 10.65

(f) Returning to the sample of 6 lions from this problem’s introduction, and supposing the population of ages is normally distributed, find the center of a 95% confidence interval for the population mean age.

(0) 0.23, (1) 0.47, (2) 2.01, (3) 2.51, (4) 2.64,

(5) 5.45, (6) 7.33, (7) 8.81, (8) 10.90, (9) 12.08

(g) Find the error margin of the same 95% confidence interval for the population mean age.

(0) 0.23, (1) 0.47, (2) 2.01, (3) 2.51, (4) 2.64,

(5) 5.45, (6) 7.33, (7) 8.81, (8) 10.90, (9) 12.08
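
As a rough cross-check for parts (a) through (g), the computations can be sketched in R. The variable names (`prop_black`, `age`, `fit`, and so on) are illustrative and not part of the exam, and the critical value comes from `qt()` rather than the printed t table, so the last digit may differ slightly from a hand calculation.

```r
# Lion data from the problem statement (variable names are illustrative)
prop_black <- c(0.14, 0.30, 0.59, 0.48, 0.79, 0.51)
age        <- c(1.5, 4.3, 5.4, 7.3, 8.8, 5.4)

# (a) sample correlation
r <- cor(prop_black, age)

# (b), (c) least-squares intercept and slope for age ~ proportion black
fit <- lm(age ~ prop_black)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])

# (d) prediction from the larger-sample model quoted in the problem
pred_40 <- 0.88 + 10.65 * 0.40

# (f), (g) 95% t interval for the population mean age: center and error margin
n <- length(age)
center <- mean(age)
margin <- qt(0.975, df = n - 1) * sd(age) / sqrt(n)

round(c(r = r, b0 = b0, b1 = b1, pred_40 = pred_40,
        center = center, margin = margin), 3)
```

The rounded values can be compared against the listed answer options. For part (e), note that the sign of the slope and the sign of r always agree, which the code does not need to compute.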

2. Consider the Wilcoxon rank sum test. Suppose two independent simple random samples of

sizes nX and nY are drawn from populations X and Y that have the same shape. Suppose

there are no ties in the data.

(a) Suppose nX = 2 and nY = 3. Find RXmin = the minimum possible rank sum for the X

sample.

(0) 0, (1) 1, (2) 2, (3) 3, (4) 4, (5) 5, (6) 6, (7) −1, (8) −2, (9) −3

(b) Suppose the X sample is 11, 14, and the Y sample is 13, 15, 12. Find RX = the rank

sum for the X sample.

(0) 0, (1) 1, (2) 2, (3) 3, (4) 4, (5) 5, (6) 6, (7) 7, (8) 8, (9) 9

(c) Find UX = the value of the test statistic based on the X sample.

(0) 0, (1) 1, (2) 2, (3) 3, (4) 4, (5) 5, (6) 6, (7) 7, (8) 8, (9) 9

(d) Suppose we are testing H0 : the two populations are identical vs. HA : the X population

is shifted relative to the Y population. Find the p-value corresponding to UX = 3.

(0) 1/10 = 0.1, (1) 2/10 = 0.2, (2) 3/10 = 0.3, (3) 4/10 = 0.4, (4) 5/10 = 0.5,

(5) 6/10 = 0.6, (6) 7/10 = 0.7, (7) 10/10 = 1.0, (8) 11/10 = 1.1, (9) 12/10 = 1.2

(e) Why is running this test at the usual level α = 0.05 unhelpful? (Hint: What is the

smallest possible p-value?)

(0) The smallest possible p-value is 0.0, but this is unlikely, so we will only rarely reject

H0.

(1) The smallest possible p-value is 0.1, which is greater than α = 0.05, so we can never

reject H0.

(2) The smallest possible p-value is 0.1, which is greater than α = 0.05, so we will always

reject HA.

(3) The smallest possible p-value is 0.2, which is greater than α = 0.05, so we can never

reject H0.

(4) The smallest possible p-value is 0.2, which is greater than α = 0.05, so we will always

reject HA.

(5) The smallest possible p-value is 0.3, which is greater than α = 0.05, so we can never

reject H0.

(6) The smallest possible p-value is 0.3, which is greater than α = 0.05, so we will always reject HA.

(8) The smallest possible p-value is 1.0, so we will always affirm H0.

(9) The smallest possible p-value is 1.2, so we will always affirm H0.
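
The rank-sum bookkeeping in this problem can be sketched in R by enumerating every way the X sample could occupy the pooled ranks. This is a minimal sketch, not the course's official procedure: the names `r_x`, `u_x`, `all_u`, and `p_two_sided` are made up for illustration, and the two-sided p-value is assumed to be twice the smaller tail probability, capped at 1.

```r
# Samples from part (b); no ties, as the problem assumes
x <- c(11, 14)
y <- c(13, 15, 12)

# Rank the pooled data, then sum the ranks belonging to the X sample
ranks <- rank(c(x, y))
r_x <- sum(ranks[seq_along(x)])

# Smallest possible rank sum for X: ranks 1, ..., n_X
r_x_min <- sum(seq_along(x))
u_x <- r_x - r_x_min

# Null distribution of U_X: each choice of n_X ranks out of n_X + n_Y is
# equally likely (choose(5, 2) = 10 arrangements here)
all_u <- combn(length(ranks), length(x), FUN = sum) - r_x_min

# ASSUMPTION: two-sided p-value = 2 * smaller tail probability, capped at 1
p_two_sided <- function(u) {
  min(1, 2 * min(mean(all_u <= u), mean(all_u >= u)))
}

c(r_x = r_x, u_x = u_x)
p_two_sided(3)                  # p-value for the value quoted in part (d)
min(sapply(0:6, p_two_sided))   # smallest attainable two-sided p-value
```

Enumeration is only practical because the samples are tiny; for larger samples `wilcox.test()` would be the usual tool.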

3. A simple random sample of three people is tested for strength in a one-hand bicep curl with a dumbbell. Each person is measured for right-handed maximum weight curled and left-handed maximum weight curled:

Person Right Left

Jack 40 36

Jill 31 28

Barry 50 45

(a) Suppose it is reasonable to assume normality. Find the observed value of the test statistic

for a test of whether there is a difference in population mean right and left arm strengths.

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.082, (4) 0.543,

(5) 4.273, (6) 6.928, (7) 10.073, (8) 12.884, (9) 14.940

(b) Find the observed value of the test statistic for a test of whether there is a difference in

population median right and left arm strengths.

(0) 0.000, (1) 0.023, (2) 0.543, (3) 1.000, (4) 1.538,

(5) 2.000, (6) 2.839, (7) 3.000, (8) 3.403, (9) 4.000
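
Because each person contributes both a right and a left measurement, both parts work with the within-person differences. The sketch below shows one way to compute the two statistics in R; the variable names are illustrative, and taking the sign-test statistic to be the count of positive differences is an assumption about the course's convention.

```r
# Paired measurements from the table (names are illustrative)
right <- c(Jack = 40, Jill = 31, Barry = 50)
left  <- c(Jack = 36, Jill = 28, Barry = 45)

# Work with the within-person differences
d <- right - left

# (a) paired t statistic for H0: mean difference = 0
t_obs <- mean(d) / (sd(d) / sqrt(length(d)))

# Cross-check against R's built-in paired t test
t_check <- unname(t.test(right, left, paired = TRUE)$statistic)

# (b) ASSUMPTION: sign-test statistic = number of positive differences
n_pos <- sum(d > 0)

round(c(t_obs = t_obs, t_check = t_check, n_pos = n_pos), 3)
```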

4. Each part of this question consists of a fragment of R code. Choose the best description, from

among these, of the output of that code fragment.

(0) Z, the test statistic for a Z test of H0 : µX = 0

(1) t, the test statistic for a Welch’s t test of H0 : µX − µY = 0 vs. HA : µX − µY ≠ 0

(2) p-value for a bootstrap test of H0 : µX − µY = 0 vs. HA : µX − µY ≠ 0

(3) center of a level 1 − α confidence interval for µX

(4) error margin of a level 1 − α confidence interval for µX

Here are the question parts.

(a) mean(x)

(b) (sum(t.hats < -abs(t.obs)) + sum(t.hats > abs(t.obs))) / B

(c) n = length(x); -qt(alpha/2, df=n-1) * sd(x) / sqrt(n)

(d) (mean(x) - 0) / (sigma / sqrt(length(x)))

(e) ((mean(x) - mean(y)) - 0) / sqrt(sd(x)^2 / length(x) + sd(y)^2 / length(y))

5. A study investigated the effect of exercise on attempted weight loss. It randomly selected

12 people from among a large company’s employees, and then randomly assigned 4 to do no

exercise program, 4 to do a mild walking program, and 4 to do an intensive bicycling program.

Here are the weight losses after six weeks of participants in the study (note that a negative

loss is a gain):

Exercise Weight loss (pounds)

None 1.5 −0.8 −0.3 0.0

Walk 0.7 1.7 3.0 1.3

Bike 4.5 4.0 3.7 2.7

Here is a partial ANOVA table for these data from R:

            Df   Sum Sq   Mean Sq   F value   Pr(>F)
program    (b)   26.432    13.216       (d)  0.00113
Residuals    9      (c)     0.835

(a) State a suitable null hypothesis for this study’s ANOVA.

(0) H0 : exercise level and weight loss are independent.

(1) H0 : weight loss depends on exercise level

(2) H0 : Mnone = Mwalk = Mbike, where Mnone is the population median weight loss for

employees in no exercise program, with similar definitions for the other two M’s.

(3) $H_0: \sigma^2_{\text{none}} = \sigma^2_{\text{walk}} = \sigma^2_{\text{bike}}$, where $\sigma^2_{\text{none}}$ is the population variance of weight losses for employees in no exercise program, with similar definitions for the other two $\sigma^2$’s.

(4) H0 : µnone = µwalk = µbike, where µnone is the population mean weight loss for

employees in no exercise program, with similar definitions for the other two µ’s.

(b) What is the value of the missing exercise program degrees of freedom?

(0) 0, (1) 1, (2) 2, (3) 3, (4) 4,

(5) 5, (6) 6, (7) 9, (8) 11, (9) 12,

(c) What is the value of the missing SSE?

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.093, (4) 0.543,

(5) 4.720, (6) 6.289, (7) 7.515, (8) 12.841, (9) 15.828

(d) What is the value of the missing F statistic?

(0) 0.049, (1) 0.118, (2) 0.023, (3) 0.093, (4) 0.543,

(5) 4.720, (6) 6.289, (7) 7.515, (8) 12.841, (9) 15.828

(e) Supposing the ANOVA assumptions are met, what conclusion do you draw in the context

of the problem? Choose the best answer.

(0) Do not reject H0. The data are not strong evidence exercise level and weight loss

are not independent.

(1) Reject H0. The data are strong evidence exercise level and weight loss are not

independent.

(2) Do not reject H0. The data are not strong evidence median weight loss is not the

same for all three populations.

(3) Reject H0. The data are strong evidence median weight loss is not the same for all

three populations.

(4) Do not reject H0. The data are not strong evidence weight loss variance is not the

same for all three populations.

(5) Reject H0. The data are strong evidence weight loss variance is not the same for all

three populations.

(6) Do not reject H0. The data are not strong evidence mean weight loss is not the

same for all three populations.

(7) Reject H0. The data are strong evidence mean weight loss is not the same for all

three populations.

(f) Consider an ANOVA test for a data set with three sample means: 11, 12, and 13. For

which three corresponding sample standard deviations would the p-value be smallest?

(Hint: No calculation is necessary.)

(0) 0.11, 0.12, 0.13

(1) 0.11, 0.11, 0.11

(2) 11, 12, 13

(3) 11, 11, 11
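
The missing entries of the partial ANOVA table in this problem follow from the relationships Mean Sq = Sum Sq / Df and F = (Mean Sq for program) / (Mean Sq for residuals). The sketch below first applies those identities to the printed entries and then refits the model from the raw data as a cross-check. Variable names are illustrative, and small differences in the last digit come from the rounding in the printed table.

```r
# From the printed table alone
k <- 3                        # number of exercise programs
df_program <- k - 1           # (b)
sse <- 9 * 0.835              # (c) = residual Df * residual Mean Sq
f_stat <- 13.216 / 0.835      # (d) = ratio of the two mean squares

# Cross-check by refitting the one-way ANOVA from the raw data
loss <- c(1.5, -0.8, -0.3, 0.0,    # None
          0.7,  1.7,  3.0, 1.3,    # Walk
          4.5,  4.0,  3.7, 2.7)    # Bike
program <- factor(rep(c("None", "Walk", "Bike"), each = 4))
tab <- summary(aov(loss ~ program))[[1]]

round(c(df_program = df_program, sse = sse, f_stat = f_stat), 3)
tab
```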

6. Consider a simple random sample of size n drawn from a population with (finite) mean µ and variance $\sigma^2$. As n increases, which of these random variables is or becomes approximately normally distributed?

For each part, answer either 0 for “No” or 1 for “Yes.”

(a) $\dfrac{P - \pi}{\sqrt{\pi(1-\pi)/n}}$, where $P = \dfrac{X}{n}$ is a sample proportion from a population with proportion π of successes

(b) $X_n$, the last point in the sample

(c) $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

(d) $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

(e) M, the sample median

(f) S, the sample standard deviation

(g) IQR, the sample interquartile range

7. A survey asked several thousand teens “What do you think are the chances you will be married

in the next ten years?” Here is a contingency table of the responses by biological sex:

                                 Female   Male   Total
Almost no chance                    119    103     222
Some chance, but probably not       150    171     321
A 50-50 chance                      447    512     959
A good chance                       735    710    1445
Almost certain                     1174    756    1930
Total                              2625   2252    4877

(a) Under H0 : “Biological sex and perceived chance of marriage are independent”, find the

approximate expected count of females who respond “Almost certain.”

(0) 102, (1) 108, (2) 891, (3) 1039, (4) 1174,

(5) 2084, (6) 3809, (7) 4771, (8) 12841, (9) 15828

(b) The expected count for males who respond “A 50-50 chance” is 442.8. The chi-square statistic is a sum of ten terms. The term in the chi-square statistic for males who respond “A 50-50 chance” is ______.

(0) 10.2, (1) 10.8, (2) 89.1, (3) 103.9, (4) 117.4,

(5) 208.4, (6) 380.9, (7) 477.1, (8) 1284, (9) 1583

(c) Find the degrees of freedom for the chi-square test for this contingency table.

(0) 0, (1) 1, (2) 2, (3) 3, (4) 4,

(5) 5, (6) 6, (7) 7, (8) 8, (9) 10

(d) Software gives a chi-square statistic of 69.8 for the whole table. Find the P-value.

Upper Tail Points for $\chi^2$ Distributions

        α (right-tail area)
df     .25    .10    .05    .01   .001
 1     1.3    2.7    3.8    6.6   10.8
 2     2.8    4.6    6.0    9.2   13.8
 3     4.1    6.3    7.8   11.3   16.3
 4     5.4    7.8    9.5   13.3   18.5
 5     6.6    9.2   11.1   15.1   20.5
 6     7.8   10.6   12.6   16.8   22.5
 7     9.0   12.0   14.1   18.5   24.3
 8    10.2   13.4   15.5   20.1   26.1
 9    11.4   14.7   16.9   21.7   27.9
10    12.5   16.0   18.3   23.2   29.6

(0) p-value < .001

(1) .001 < p-value < .01

(2) .01 < p-value < .05

(3) .05 < p-value < .10

(4) .10 < p-value < .25

(5) .25 < p-value
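
The pieces of this chi-square test for independence can be sketched in R as follows. Under independence, each expected count is (row total × column total) / grand total, and each term of the statistic is (observed − expected)² / expected. The matrix and variable names are illustrative, and `pchisq()` is used in place of the printed table.

```r
# Observed counts from the contingency table (Female, Male), without totals
counts <- matrix(c(119,  103,
                   150,  171,
                   447,  512,
                   735,  710,
                   1174, 756),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(
                   response = c("Almost no chance",
                                "Some chance, but probably not",
                                "A 50-50 chance",
                                "A good chance",
                                "Almost certain"),
                   sex = c("Female", "Male")))

test <- chisq.test(counts)

# (a) expected count under independence: row total * column total / grand total
exp_f_certain <- test$expected["Almost certain", "Female"]

# (b) one term of the statistic, using the expected count given in the problem
term_m_5050 <- (512 - 442.8)^2 / 442.8

# (c) degrees of freedom: (rows - 1) * (columns - 1)
df <- unname(test$parameter)

# (d) right-tail area beyond the quoted statistic of 69.8
p_value <- pchisq(69.8, df = df, lower.tail = FALSE)

round(c(exp_f_certain = exp_f_certain, term_m_5050 = term_m_5050, df = df), 1)
p_value
```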

8. Mark each statement as either

(0) false, or (1) true

(a) In a hypothesis test, the p-value is the probability that H0 is true.

(b) Increasing the confidence level from 90% to 95%, while keeping everything else the same,

increases a confidence interval’s margin of error.

(c) In a hypothesis test with significance level α = 0.05 and power 1 − β = 0.80, the

probability H0 is rejected when H0 is true is 0.80.

(d) A researcher who randomly samples 50 students at a Badger football game to test

whether the distribution of the class (freshman, sophomore, junior, senior, graduate)

of students at the game is the same as the distribution (from the registrar) across all

UW students should use a chi-squared test for independence.

(e) $z_{\alpha/2} < t_{n-1,\alpha/2}$ for each n > 2 and each α such that 0 < α < 1.

(f) In a test of H0 : µX − µY = 0 against HA : µX − µY ≠ 0 from two independent simple random samples from populations X and Y, if $\sigma^2_X$ and $\sigma^2_Y$ are known to be very different, then a two-sample t test using a pooled variance estimate is appropriate.

(g) Doubling the largest value in a large sample of positive measurements increases the

sample standard deviation.

(h) Doubling the largest value in a large sample of positive measurements increases the

interquartile range.

(i) A bootstrap test relies on many samples of size n from the population.

9. Consider these statistical methods:

(0) Z test for µ (for one sample or differences from paired data)

(1) t test for µ (for one sample or differences from paired data)

(2) bootstrap test for µ (for one sample or differences from paired data)

(3) 2-sample t test for µX − µY

(4) Welch’s t test for µX − µY

(5) bootstrap test for µX − µY

(6) ANOVA

(7) $\chi^2$ test for goodness of fit

(8) $\chi^2$ test for independence

(9) correlation and linear regression

For each question below, write the digit corresponding to the most appropriate method above. It is OK to use a method more than once.

(a) Fred interviews a random sample of 20 students in his Intro Botany class and asks

each student, “How many minutes did you study for the final exam?” and “How many

points did you earn on the exam?” He makes a scatter plot of (minutes, points) for

the 20 students and notices that the points are, more-or-less, along a line. What is the

relationship between minutes of study time and points earned?

(b) Lisa randomly selects 20 cars in the large UW parking “Lot 60.” For each car, she measures the front-left tire pressure and records the difference between her measurement and the pressure specified by the manufacturer. (Note: The specified pressure is not a measurement.) She makes a normal QQ plot of these differences and notices that the points are, more or less, along a line. For the Lot 60 population, is the average front-left tire under-inflated by more than 5 psi?

(c) Andre counts the number of each grade of evergreen tree seedling in the box of 50 he

bought from a nursery. The nursery advertised that it would select trees in his box

randomly from a population with 50% graded “best,” 30% graded “good,” and 20%

graded “satisfactory.” Are the counts evidence that his 50 seedlings weren’t randomly

taken from the promised population?

(d) Diamond randomly selects 10 students on each floor of her 4-floor dorm and asks the 40

students how much sleep they got the night before. Is there a difference in population

average sleep times across the 4 floors?

(e) Hao randomly selects 50 right-handed students from his dorm. He counts the number of

times each student can bounce a ping-pong ball on a paddle before it hits the ground.

He tests each student twice, once holding the paddle in the right hand and once in the

left. Is there a difference between the two population mean numbers of bounces?

For z = a.bc, look in row a.b and column .0c to find P(Z < z). For example, for z = 1.42, look in row 1.4 and column .02 to find P(Z < 1.42) = .9222 (on next page).

Cumulative $N(0, 1^2)$ Distribution, z ≤ 0

[Figure: normal density curve with z, 0, and the area α marked]

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-3.6 .0002 .0002 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001

-3.5 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002

-3.4 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0002

-3.3 .0005 .0005 .0005 .0004 .0004 .0004 .0004 .0004 .0004 .0003

-3.2 .0007 .0007 .0006 .0006 .0006 .0006 .0006 .0005 .0005 .0005

-3.1 .0010 .0009 .0009 .0009 .0008 .0008 .0008 .0008 .0007 .0007

-3.0 .0013 .0013 .0013 .0012 .0012 .0011 .0011 .0011 .0010 .0010

-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014

-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019

-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026

-2.6 .0047 .0045 .0044 .0043 .0041 .0040 .0039 .0038 .0037 .0036

-2.5 .0062 .0060 .0059 .0057 .0055 .0054 .0052 .0051 .0049 .0048

-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064

-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084

-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110

-2.1 .0179 .0174 .0170 .0166 .0162 .0158 .0154 .0150 .0146 .0143

-2.0 .0228 .0222 .0217 .0212 .0207 .0202 .0197 .0192 .0188 .0183

-1.9 .0287 .0281 .0274 .0268 .0262 .0256 .0250 .0244 .0239 .0233

-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294

-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367

-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455

-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559

-1.4 .0808 .0793 .0778 .0764 .0749 .0735 .0721 .0708 .0694 .0681

-1.3 .0968 .0951 .0934 .0918 .0901 .0885 .0869 .0853 .0838 .0823

-1.2 .1151 .1131 .1112 .1093 .1075 .1056 .1038 .1020 .1003 .0985

-1.1 .1357 .1335 .1314 .1292 .1271 .1251 .1230 .1210 .1190 .1170

-1.0 .1587 .1562 .1539 .1515 .1492 .1469 .1446 .1423 .1401 .1379

-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611

-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867

-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148

-0.6 .2743 .2709 .2676 .2643 .2611 .2578 .2546 .2514 .2483 .2451

-0.5 .3085 .3050 .3015 .2981 .2946 .2912 .2877 .2843 .2810 .2776

-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121

-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483

-0.2 .4207 .4168 .4129 .4090 .4052 .4013 .3974 .3936 .3897 .3859

-0.1 .4602 .4562 .4522 .4483 .4443 .4404 .4364 .4325 .4286 .4247

-0.0 .5000 .4960 .4920 .4880 .4840 .4801 .4761 .4721 .4681 .4641
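
The table lookup can be reproduced in R with `pnorm()`, which returns P(Z < z) for a standard normal Z. This is only a convenience sketch for checking table entries; the variable name `z` is illustrative.

```r
# P(Z < z) for a standard normal Z
z <- -1.42
round(pnorm(z), 4)      # compare with row -1.4, column .02 of the table

# By symmetry, P(Z < 1.42) = 1 - P(Z < -1.42)
round(pnorm(-z), 4)
round(1 - pnorm(z), 4)
```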

Upper Tail Points $t_{n-1,\alpha}$ for the Student’s $t_{n-1}$ Distributions

[Figure: density of $T \sim t_{n-1}$ with the right-tail area α and the point $t_{n-1,\alpha}$ marked]

degrees of freedom right-tail area α

n − 1 .25 .10 .05 .025 .01 .005 .001

1 1.000 3.078 6.314 12.706 31.821 63.657 318.309

2 .816 1.886 2.920 4.303 6.965 9.925 22.327

3 .765 1.638 2.353 3.182 4.541 5.841 10.215

4 .741 1.533 2.132 2.776 3.747 4.604 7.173

5 .727 1.476 2.015 2.571 3.365 4.032 5.893

6 .718 1.440 1.943 2.447 3.143 3.707 5.208

7 .711 1.415 1.895 2.365 2.998 3.499 4.785

8 .706 1.397 1.860 2.306 2.896 3.355 4.501

9 .703 1.383 1.833 2.262 2.821 3.250 4.297

10 .700 1.372 1.812 2.228 2.764 3.169 4.144

11 .697 1.363 1.796 2.201 2.718 3.106 4.025

12 .695 1.356 1.782 2.179 2.681 3.055 3.930

13 .694 1.350 1.771 2.160 2.650 3.012 3.852

14 .692 1.345 1.761 2.145 2.624 2.977 3.787

15 .691 1.341 1.753 2.131 2.602 2.947 3.733

16 .690 1.337 1.746 2.120 2.583 2.921 3.686

17 .689 1.333 1.740 2.110 2.567 2.898 3.646

18 .688 1.330 1.734 2.101 2.552 2.878 3.610

19 .688 1.328 1.729 2.093 2.539 2.861 3.579

20 .687 1.325 1.725 2.086 2.528 2.845 3.552

21 .686 1.323 1.721 2.080 2.518 2.831 3.527

22 .686 1.321 1.717 2.074 2.508 2.819 3.505

23 .685 1.319 1.714 2.069 2.500 2.807 3.485

24 .685 1.318 1.711 2.064 2.492 2.797 3.467

25 .684 1.316 1.708 2.060 2.485 2.787 3.450

26 .684 1.315 1.706 2.056 2.479 2.779 3.435

27 .684 1.314 1.703 2.052 2.473 2.771 3.421

28 .683 1.313 1.701 2.048 2.467 2.763 3.408

29 .683 1.311 1.699 2.045 2.462 2.756 3.396

30 .683 1.310 1.697 2.042 2.457 2.750 3.385

40 .681 1.303 1.684 2.021 2.423 2.704 3.307

60 .679 1.296 1.671 2.000 2.390 2.660 3.232

100 .677 1.290 1.660 1.984 2.364 2.626 3.174

∞ .674 1.282 1.645 1.960 2.326 2.576 3.090
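
Entries of this table can be reproduced in R with `qt()`. Because the table is indexed by right-tail area α, the matching quantile is `qt(1 - alpha, df)`. The sketch below rebuilds one row as a check; the vector name `alpha` and the choice of the df = 5 row are illustrative.

```r
# Right-tail areas used as the column headings of the table
alpha <- c(.25, .10, .05, .025, .01, .005, .001)

# Upper-tail points for 5 degrees of freedom (compare with the df = 5 row)
t_row5 <- round(qt(1 - alpha, df = 5), 3)
t_row5

# The last row of the table is the standard normal limit
z_row <- round(qnorm(1 - alpha), 3)
z_row
```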
