# Stats-Assignments

Note: The problem sets are 2-1, 3-2, 4-2, 5-2, and 6-2.

Each is worth 20, for a total of 100.

## 2-1 Problem Set: Summary Stats

### Instructions

This problem set assignment will involve activities designed to solidify the concepts learned in both Modules One and Two. Problems will be similar to those you will face on the quiz in Module Four and will include one or two real-world applications to prepare you to think like a biostatistician. Check the module resource list to see which videos on the StatCrunch Help channel will help with this assignment. For support on the concepts of descriptive statistics, variables and sampling, visit the suggested Khan Academy videos in the module resources list. To complete this assignment, review the Module Two Problem Set document.

**IHP 525 Module Two Problem Set 2-1**

1. **Five-City Project.** The Stanford Five-City Project is a comprehensive community health education study of five moderately sized Northern California towns. Multiple-risk factor intervention strategies were randomly applied to two of the communities. The other three cities served as controls. Outline the design of this study in schematic form.

2. **Campus survey.** A researcher conducts a survey to learn about the sexual behavior of college students on a particular campus. A list of the undergraduates at the university is used to select participants. The investigator sends out 500 surveys but only 136 are returned.

a. Consider how the low response rate could bias the results of this study.

b. Speculate on potential limitations in the quality of information the researcher will receive on questionnaires that are returned.

3. **Employee counseling.** An employer offers its employees a program that will provide up to four free psychological counseling sessions per calendar year. To evaluate satisfaction with this service, the counseling office mails questionnaires to every 10th employee who used the benefit in the prior year. There were 1000 employees who used the benefit. Therefore, 100 surveys were sent out. However, only 25 of the potential respondents completed and returned their questionnaire.

a. Describe the population for the study.

b. Describe the sample.

c. What concern is raised by the fact that only 25 of the 100 questionnaires were completed and returned?

d. Suppose all 100 questionnaires were completed and returned. Would this then represent an SRS? What type of sample would it be?

4. **Body weight expressed as a percentage of ideal.** Body weights of 18 diabetics expressed as a percentage of ideal (defined as body weight ÷ ideal body weight × 100) are shown here. Construct a stem-and-leaf plot of these data and interpret your findings.

107 119 99 114 120 104 88 114 124 116

101 121 152 100 125 114 95 117
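As a rough aid for exercise 4, the sketch below builds a stem-and-leaf display in plain Python. The helper name `stemplot` and its `leaf_unit` parameter are illustrative assumptions, not part of the assignment or of StatCrunch; stems are tens and leaves are units of the values listed above.

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1):
    """Return {stem: [leaves]}; hypothetical helper, not a StatCrunch function."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)      # express the value in leaf units
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf
        stems[stem].append(leaf)
    lo, hi = min(stems), max(stems)
    # Keep empty stems so gaps in the distribution stay visible.
    return {s: stems.get(s, []) for s in range(lo, hi + 1)}

# Body weights (% of ideal) copied from exercise 4.
weights = [107, 119, 99, 114, 120, 104, 88, 114, 124, 116,
           101, 121, 152, 100, 125, 114, 95, 117]

plot = stemplot(weights)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Calling `stemplot(data, leaf_unit=0.1)` should handle one-decimal data such as the values in exercise 6; the interpretation of shape, location, and spread is still left to the reader.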

5. **Air samples.** An environmental study looked at suspended particulate matter in air samples (µg/m³) at two different sites. Data are listed here. Construct side-by-side stemplots to compare the two sites.

Site 1: 68 22 36 32 42 24 28 38

Site 2: 36 38 39 40 36 34 33 32

6. **What would you report?** A small data set (n = 9) has the following values {3.5, 8.1, 7.4, 4.0, 0.7, 4.9, 8.4, 7.0, 5.5}. Plot the data as a stemplot and then report an appropriate measure of central location and spread for the data.

7. **Melanoma treatment.** A study by Morgan and coworkers used genetically modified white blood cells to treat patients with melanoma who had not responded to standard treatments. In patients in whom the cells were cultured ex vivo for an extended period of time (cohort 1), the cell doubling times were {8.7, 11.9, 10.0} days. In a second group of patients in whom the cells were cultured for a shorter period of time (cohort 2), the cell doubling times were {1.4, 1.0, 1.3, 1.0, 1.3, 2.0, 0.6, 0.8, 0.7, 0.9, 1.9} days. In a third group of patients (cohort 3), actively dividing cells were generated by performing a second rapid expansion via active cell transfer. Cell doubling times for cohort 3 were {0.9, 3.3, 1.2, 1.1} days. Data are available in Excel and SPSS formats on the companion website as file MORGAN2006.*. Here is where you would log in to the companion website: http://publichealth.jbpub.com/Gerstman2e/Login.aspx?ref=/Gerstman2e/default.aspx

a. Create side-by-side boxplots of these data.

b. In addition, calculate the mean and standard deviations within each group. Comment on your findings.
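For part b of exercise 7, one possible way to check hand calculations is Python's `statistics` module. This is only a sketch: the dictionary layout and variable names are assumptions, and `stdev` is the sample standard deviation (n − 1 in the denominator).

```python
from statistics import mean, stdev

# Cell doubling times (days) copied from exercise 7.
cohorts = {
    "cohort 1": [8.7, 11.9, 10.0],
    "cohort 2": [1.4, 1.0, 1.3, 1.0, 1.3, 2.0, 0.6, 0.8, 0.7, 0.9, 1.9],
    "cohort 3": [0.9, 3.3, 1.2, 1.1],
}

# Sample mean and sample standard deviation for each group.
summary = {name: (mean(x), stdev(x)) for name, x in cohorts.items()}

for name, (m, s) in summary.items():
    print(f"{name}: n={len(cohorts[name])}, mean={m:.2f}, sd={s:.2f}")
```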

## 3-2 Problem Set: Probability

### Instructions

This problem set will give you practice in solving problems relating to probability learned in this module. Problems will be similar to those you will face on the quiz in Module Four and will include one or two real-world applications to prepare you to think like a biostatistician. Check the videos in the module resource list to see which ones will help with this assignment. To complete this assignment, review the Module Three Problem Set document.

**IHP 525 Module Three Problem Set 3-2**

1. What is the probability of being born on:

a) February 28?

b) February 29?

c) February 28 or February 29?

2. A patient newly diagnosed with a serious ailment is told he has a 60% probability of surviving 5 or more years. Let us assume this statement is accurate. Explain the meaning of this statement to someone with no statistical background in terms he or she will understand.

3. A lottery offers a grand prize of $10 million. The probability of winning this grand prize is 1 in 55 million (about 1.8 × 10⁻⁸). There are no other prizes, so the probability of winning nothing = 1 – (1.8 × 10⁻⁸) = 0.999999982. The probability model is:

| Winnings (X) | 0 | $10 × 10⁶ |
| --- | --- | --- |
| P(X = xᵢ) | 0.999999982 | 1.8 × 10⁻⁸ |

a) What is the expected value of a lottery ticket?

b) Fifty-five million lottery tickets will be sold. How much does the proprietor of the lottery need to charge per ticket to make a profit?
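The expected value in exercise 3 is the probability-weighted sum of the outcomes. A minimal sketch follows; it assumes the exact probability 1/55,000,000 rather than the rounded 1.8 × 10⁻⁸, and the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: use the exact 1-in-55-million probability, not the rounded value.
p_win = Fraction(1, 55_000_000)

# Probability model: winnings (dollars) -> probability.
outcomes = {0: 1 - p_win, 10_000_000: p_win}
assert sum(outcomes.values()) == 1  # probabilities must sum to 1

# E(X) = sum of x * P(X = x) over all outcomes.
expected_value = sum(x * p for x, p in outcomes.items())
print(f"Expected value per ticket: ${float(expected_value):.4f}")
```

For part b, the same number can be read as the prize spread over all 55 million tickets, which is the break-even price the proprietor must exceed.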

4. Suppose a population has 26 members identified with the letters A through Z.

a) You select one individual at random from this population. What is the probability of selecting individual A?

b) Assume person A gets selected on an initial draw, you replace person A into the sampling frame, and then take a second random draw. What is the probability of drawing person A on the second draw?

c) Assume person A gets selected on the initial draw and you sample again without replacement. What is the probability of drawing person A on the second draw?

5. Let A represent cat ownership and B represent dog ownership. Suppose 35% of households in a population own cats, 30% own dogs, and 15% own both a cat and a dog. Suppose you know that a household owns a cat. What is the probability that it also owns a dog?

6. What is the complement of an event?

7. Accidents occur along a 5-mile stretch of highway at a uniform rate. The following “curve” depicts the probability density function for accidents along this stretch:

a) What is the probability that an accident occurred in the first mile along this stretch of highway?

b) What is the probability that an accident did *not* occur in the first mile?

c) What is the probability that an accident occurred between miles 2.5 and 4?

8. Suppose there were 4,065,014 births in a given year. Of those births, 2,081,287 were boys and 1,983,727 were girls.

a) If we randomly select two women from the population who then become pregnant, what is the probability both children will be boys?

b) If we randomly select two women from the population who then become pregnant, what is the probability that the first woman’s child will be a boy and the second woman’s child will be a boy?

c) If we randomly select two women from the population who then become pregnant, what is the probability that both children will be boys given that at least one child is a boy?

9. Explain the difference between mutually exclusive and independent events.

10. Suppose a screening test has a sensitivity of 0.80 and a false-positive rate of 0.02. The test is used on a population that has a disease prevalence of 0.007. Find the probability of having the disease given a positive test result.
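Exercise 10 is an application of Bayes' theorem. The sketch below is one way to lay out the arithmetic; the function name `positive_predictive_value` is an assumption chosen for illustration.

```python
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """P(disease | positive test) via Bayes' theorem (illustrative helper)."""
    true_pos = sensitivity * prevalence                    # P(T+ and D+)
    false_pos = false_positive_rate * (1 - prevalence)     # P(T+ and D-)
    return true_pos / (true_pos + false_pos)

# Values copied from exercise 10.
ppv = positive_predictive_value(
    sensitivity=0.80, false_positive_rate=0.02, prevalence=0.007
)
print(f"P(disease | positive) = {ppv:.4f}")
```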

## 4-2 Problem Set: Statistical Inference and Hypothesis Testing

### Instructions

This problem set will give you practice in solving problems relating to statistical inference and hypothesis testing learned in this module. Problems will be similar to those you will face on the quiz in Module Six and will include one or two real-world applications to prepare you to think like a biostatistician. To complete this assignment, review the Module Four Problem Set document.

**IHP 525 Module Four Problem Set 4-2**

1. **Pediatric asthma survey, n = 50.** Suppose that asthma affects 1 in 20 children in a population. You take an SRS of 50 children from this population. Can the normal approximation to the binomial be applied under these conditions? If not, what probability model can be used to describe the sampling variability of the number of asthmatics?

2. **Fill in the blanks.** A particular random sample of n observations can be used to calculate a sample proportion. The count of successes in the sample will vary according to a (a) ____________________ probability distribution with parameters n and (b) ____________________. When the sample is large, the number of successes will vary according to a normal distribution with μ = (c) ____________________ and σ = (d) ____________________. At the same time, the sampling distribution of the proportion in large samples will be Normally distributed with mean p and standard deviation equal to (e) ____________________.

3. **Misconceived hypotheses.** What is wrong with each of the following hypothesis statements?

a) H0: μ = 100 vs. Ha: μ ≠ 110

b) H0: x̅ = 100 vs. Ha: x̅ < 100

c) H0: p̂ = 0.50 vs. Ha: p̂ ≠ 0.50

4. **Patient satisfaction.** Scores derived from a patient satisfaction survey are Normally distributed with μ = 50 and σ = 7.5, with high scores indicating high satisfaction. An SRS of *n* = 36 is taken from this population.

a) What is the SE of *x̅* for these data?

b) We seek evidence against the hypothesis that a particular group of patients comes from a population in which μ = 50. Sketch the curve that describes the sampling distribution of *x̅* under the null hypothesis. Mark the horizontal axis with values that are ±1, ±2, and ±3 standard errors above and below the mean.

c) Suppose a sample of *n* = 36 finds an *x̅* of 48.8. Mark this finding on the horizontal axis of your sketch. Then calculate the zstat for the result. Does this observation provide strong evidence against H0?

5. **Patient satisfaction (sample mean of 48.8).** In Exercise 4, you tested H0: μ = 50 based on a sample of *n* = 36 showing sample mean = 48.8. The population had standard deviation σ = 7.5.

a) What is the one-sided alternative hypothesis for this test?

b) Calculate the z-statistic for the test.

c) Convert the zstat to a P-value, and interpret the results.
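The calculations in exercises 4 and 5 can be sketched with the standard library's `NormalDist`. The function name `one_sample_z` is illustrative, and the left-tail P-value assumes the one-sided alternative is Ha: μ < 50 (suggested by the sample mean falling below 50, but not stated in the exercise).

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Return (standard error, z statistic) for a one-sample z-test."""
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    return se, z

# Values copied from exercises 4 and 5.
se, z = one_sample_z(xbar=48.8, mu0=50, sigma=7.5, n=36)

# Assumption: the one-sided alternative is Ha: mu < 50, so use the left tail.
p_one_sided = NormalDist().cdf(z)
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"SE = {se:.3f}, z = {z:.2f}")
print(f"one-sided P = {p_one_sided:.4f}, two-sided P = {p_two_sided:.4f}")
```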

## 5-2 Problem Set: Confidence Intervals

### Instructions

This problem set will give you practice in solving problems relating to confidence intervals learned in this module. Problems will be similar to those you will face on the quiz in Module Six and will include one or two real-world applications to prepare you to think like a biostatistician. To complete this assignment, review the Module Five Problem Set document.

**IHP 525 Module Five Problem Set 5-2**

1. **Newborn weight.** A study takes an SRS from a population of full-term infants. The standard deviation of birth weights in this population is 2 pounds. Calculate 95% confidence intervals for μ for samples in which:

a) n = 81 and x̅ = 6.1 pounds

b) n = 36 and x̅ = 7.0 pounds

c) n = 9 and x̅ = 5.8 pounds

2. **SIDS.** A sample of 49 sudden infant death syndrome (SIDS) cases had a mean birth weight of 2998 g. Based on other births in the county, we will assume σ = 800 g. Calculate the 95% confidence interval for the mean birth weight of SIDS cases in the county. Interpret your results.
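When σ is treated as known, as in exercises 1 and 2, the interval is x̅ ± z·σ/√n. The sketch below applies that formula to the SIDS figures; `z_confidence_interval` is a hypothetical helper name, and the critical value comes from `NormalDist.inv_cdf` rather than a printed table, so it is about 1.96 but not rounded.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known (illustrative helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# SIDS data from exercise 2: n = 49, mean 2998 g, assumed sigma = 800 g.
lower, upper = z_confidence_interval(xbar=2998, sigma=800, n=49)
print(f"95% CI for mu: ({lower:.0f} g, {upper:.0f} g)")
```

The same function can be called with the n and x̅ values from exercise 1 (σ = 2 pounds) to see how the interval narrows as n grows.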

3. **Hemoglobin.** Hemoglobin levels in 11-year-old boys vary according to a Normal distribution with σ = 1.2 g/dL. (a) How large a sample is needed to estimate mean μ with 95% confidence so the margin of error is no greater than 0.5 g/dL? (b) How large a sample is needed to estimate μ with margin of error 0.5 g/dL with 99% confidence?
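The sample-size question in exercise 3 rearranges the margin-of-error formula m = z·σ/√n into n = (z·σ/m)², rounded up to the next whole subject. A minimal sketch, with `sample_size` as an assumed helper name:

```python
from math import ceil
from statistics import NormalDist

def sample_size(sigma, margin, confidence=0.95):
    """Smallest n with margin of error <= margin, sigma known (illustrative)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)  # always round up

# Values from exercise 3: sigma = 1.2 g/dL, margin of error 0.5 g/dL.
n_95 = sample_size(sigma=1.2, margin=0.5, confidence=0.95)
n_99 = sample_size(sigma=1.2, margin=0.5, confidence=0.99)
print(n_95, n_99)
```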

4. **P-value and confidence interval.** A two-sided test of H0: μ = 0 yields a P-value of 0.03. Will the 95% confidence interval for μ include 0 in its midst? Will the 99% confidence interval for μ include 0? Explain your reasoning in each instance.

5. **Critical values for a t-statistic.** The term critical value is often used to refer to the value of a test statistic that determines statistical significance at some fixed α level for a test. For example, ±1.96 are the critical values for a two-tailed z-test at α = 0.05.

a) In performing a t-test based on 21 observations, what are the critical values for a one-tailed test when α = 0.05? That is, what values of the tstat will give a one-sided P-value that is less than or equal to 0.05? What are the critical values for a two-tailed test at α = 0.05?

6. **Menstrual cycle length.** Menstrual cycle lengths (days) in an SRS of nine women are as follows: {31, 28, 26, 24, 29, 33, 25, 26, 28}. Use these data to test whether mean menstrual cycle length differs significantly from a lunar month. (A lunar month is 29.5 days.) Assume that population values vary according to a Normal distribution. Use a two-sided alternative. Show all hypothesis-testing steps.

7. **Menstrual cycle length.** Exercise 6 calculated the mean length of menstrual cycles in an SRS of n = 9 women. The data revealed x̅ = 27.78 days with standard deviation s = 2.906 days.

a) Calculate a 95% confidence interval for the mean menstrual cycle length.

b) Based on the confidence interval you just calculated, is the mean menstrual cycle length significantly different from 28.5 days at α = 0.05 (two sided)? Is it significantly different from μ = 30 days at the same α-level? Explain your reasoning. (Section 10.4 in your text considered the relationship between confidence intervals and significance tests. The same rules apply here.)
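For exercises 6 and 7, the t statistic and t-based interval can be sketched as below. The standard library has no t distribution, so the critical value 2.306 (df = 8, two-sided 95%) is hard-coded from a t table; that constant and the variable names are assumptions of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

# Cycle lengths (days) copied from exercise 6.
cycle_lengths = [31, 28, 26, 24, 29, 33, 25, 26, 28]

# Assumption: t critical value for df = 8, two-sided 95%, read from a t table.
T_CRIT_DF8_95 = 2.306

n = len(cycle_lengths)
xbar = mean(cycle_lengths)
s = stdev(cycle_lengths)          # sample standard deviation
se = s / sqrt(n)

# One-sample t statistic against the lunar month, mu0 = 29.5 days.
t_stat = (xbar - 29.5) / se

# 95% confidence interval for the mean cycle length.
ci = (xbar - T_CRIT_DF8_95 * se, xbar + T_CRIT_DF8_95 * se)

print(f"xbar = {xbar:.2f}, s = {s:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Comparing `abs(t_stat)` with the critical value, or checking whether a hypothesized mean falls inside `ci`, gives the same two-sided decision at α = 0.05.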

8. **Water fluoridation.** A study looked at the number of cavity-free children per 100 in 16 North American cities BEFORE and AFTER public water fluoridation projects. The table below lists the data. You will need to manually type the data into StatCrunch to use that tool to calculate the requested information.

a) Calculate delta values for each city. Then construct a stemplot of these differences. Interpret your plot.

b) What percentage of cities showed an improvement in their cavity-free rate?

c) Estimate the mean change with 95% confidence.

Problems retrieved from Gerstman, B. B. (2015). *Basic biostatistics: Statistics for public health practice* (2nd ed.). Burlington, MA: Jones and Bartlett. ISBN: 978-1-284-03601-5

## 6-2 Problem Set: Comparing Independent Means

### Instructions

This problem set will involve activities designed to solidify the concepts learned in this module. Problems will be similar to those on the quiz you will take in Module Eight and will include one or two real-world applications to prepare you to think like a biostatistician. To complete this assignment, review the Module Six Problem Set document.

**IHP 525 Module Six Problem Set 6-2**

9. A pharmacist reads that a 95% confidence interval for the average price of a particular prescription drug is $30.50 to $35.50. Asked to explain the meaning of this, the pharmacist says “95% of all pharmacies sell the drug for between $30.50 and $35.50.” Is the pharmacist correct? Explain your response.

10. Hemoglobin levels in 11-year-old boys vary according to a normal distribution with σ=1.2 g/dL.

a) How large a sample is needed to estimate µ with 95% confidence so the margin of error is no greater than 0.5 g/dL?

b) How large a sample is needed to estimate µ with margin of error 0.5 g/dL with 99% confidence?

11. A *t*-test calculates *tstat* = 6.60. Assuming the study had more than just a few observations, you do not need a *t* table or software utility to draw a conclusion about the test. What is this conclusion, and why is a look-up table unnecessary?

12. A researcher fails to find a significant difference in mean blood pressure in 36 matched pairs. The standard deviation of the difference was 5 mmHg. What was the power of the test to find a mean difference of 2.5 mmHg at α = 0.05 (two-sided)?
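One common approximation for the power in exercise 12 is 1 − β ≈ Φ(−z₁₋α/₂ + |Δ|√n / σ). The sketch below uses that normal approximation, which treats the standard deviation of the differences as known and ignores the negligible opposite tail; both simplifications, and the name `paired_power`, are assumptions here.

```python
from math import sqrt
from statistics import NormalDist

def paired_power(delta, sd_diff, n_pairs, alpha=0.05):
    """Approximate power of a two-sided paired test (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Shift of the test statistic under the alternative, in SE units.
    shift = abs(delta) * sqrt(n_pairs) / sd_diff
    return NormalDist().cdf(-z_crit + shift)

# Values from exercise 12: 36 pairs, SD of differences 5 mmHg, delta 2.5 mmHg.
power = paired_power(delta=2.5, sd_diff=5, n_pairs=36)
print(f"Approximate power: {power:.3f}")
```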

13. Would you use a one-sample, paired-sample, or independent-sample *t* procedure in the following situations?

a) A lab technician obtains a specimen of known concentration from a reference lab. He/she tests the specimen 10 times using an assay kit and compares the calculated mean to that of the known standard.

b) A different technician compares the concentration of 10 specimens using 2 different assay kits. Ten measurements are taken with each kit. Results are then compared.

14. In a study of maternal cigarette smoking and bone density in newborns, 77 infants of mothers who smoked had a mean bone mineral content of 0.098 g/cm³ (*s*₁ = 0.026 g/cm³). The 161 infants whose mothers did not smoke had a mean bone mineral content of 0.095 g/cm³ (*s*₂ = 0.025 g/cm³).

a) Calculate the 95% confidence interval for µ₁ – µ₂.

b) Based on the confidence interval you just calculated, would you reject or retain *H0*: µ₁ – µ₂ = 0 at α = 0.05?

15. A randomized, double-blind, placebo-controlled study evaluated the effect of the herbal remedy *Echinacea purpurea* in treating upper respiratory tract infections in 2- to 11-year-olds. Each time a child had an upper respiratory tract infection, treatment with either echinacea or a placebo was given for the duration of the illness. One of the outcomes studied was “severity of symptoms.” A severity scale based on four symptoms was monitored and recorded by the parents of subjects for each instance of upper respiratory infection. The peak severity of symptoms in the 337 cases treated with echinacea had a mean score of 6.0 (standard deviation 2.3). The peak severity of symptoms in the placebo group (n₂ = 370) had a mean score of 6.1 (standard deviation 2.4). Test the mean difference for significance. Discuss your findings.
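For the independent-samples comparison in exercise 15, a sketch of the unpooled (unequal-variance) standard error and test statistic is shown below. Because both groups are large, the P-value is approximated with the standard Normal distribution instead of a t distribution; that shortcut and the helper name `unpooled_se` are assumptions of this sketch, not the textbook's prescribed procedure.

```python
from math import sqrt
from statistics import NormalDist

def unpooled_se(s1, n1, s2, n2):
    """Standard error of the difference in means without pooling variances."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

# Summary statistics copied from exercise 15.
mean_echinacea, sd_echinacea, n_echinacea = 6.0, 2.3, 337
mean_placebo, sd_placebo, n_placebo = 6.1, 2.4, 370

se = unpooled_se(sd_echinacea, n_echinacea, sd_placebo, n_placebo)
t_stat = (mean_echinacea - mean_placebo) / se

# Assumption: with several hundred observations per group, the Normal
# distribution is a close stand-in for the t distribution.
p_approx = 2 * NormalDist().cdf(-abs(t_stat))

print(f"SE = {se:.4f}, t = {t_stat:.3f}, approximate two-sided P = {p_approx:.3f}")
```

The same `unpooled_se` call with the bone mineral figures from exercise 14 gives the standard error needed for the confidence interval there, once a t critical value is looked up.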

Problems retrieved from Gerstman, B. B. (2015). *Basic biostatistics: Statistics for public health practice* (2nd ed.). Burlington, MA: Jones and Bartlett. ISBN: 978-1-284-03601-5