MHA610 Introduction to Biostatistics Assignments and DQs
Course Guide
This course explores the application of fundamental statistical methods to the healthcare environment. Course content includes both descriptive and inferential methods including: data analysis, statistical estimation, regression analysis, analysis of variance, hypothesis testing, and analysis of longitudinal data.
Note: This course uses software that is not Mac OS compatible. Access to a Windows PC or a Windows-based platform is required.
Table of Contents
Course at a Glance
Course Description
This course explores the application of fundamental statistical methods to the healthcare environment. Course content includes both descriptive and inferential methods including: data analysis, statistical estimation, regression analysis, analysis of variance, hypothesis testing, and analysis of longitudinal data.
Course Design
The purpose of this course is to provide an introduction to statistics relating to health care research. Students will be introduced to various health data sources, and will proceed to analyze, assess, and evaluate these data using basic statistical concepts and methodology. Students will learn how to use statistical software in order to obtain and interpret descriptive and inferential statistical results.
Students will also learn basic principles of probability theory and how to draw conclusions from available data utilizing statistical tools, assessing whether observed statistics could occur by chance alone. Notions of probability are of fundamental importance, and students will utilize both frequentist and Bayesian probability concepts in their evaluations.
Statistics not only plays a crucial role in undertaking and interpreting research in the health sciences, but also arises in quotidian settings. For example, what does it mean to have a 20% chance of precipitation today? At the conclusion of this course, the students’ increased understanding of statistics and probability will empower them in their studies, their work, and their daily lives.
Prerequisites
There are no prerequisites for MHA610.
Course Learning Outcomes
Upon successful completion of this course, students will be able to
 Apply basic statistical principles for describing, analyzing, and interpreting health
 Apply statistical methods of estimation and hypothesis testing in biostatistics and
 Analyze relationships between quantitative variables using correlation and linear
 Evaluate health care delivery and services using epidemiological data and appropriate statistical
 Communicate the findings and implications from statistical analyses to health care
Course Materials
Required Text
Triola, M. M., & Triola M. F. (2006). Biostatistics for the biological and health sciences. Boston, MA: Pearson Education, Inc.
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
 The software and data sets for the course may be accessed through
Note: This course uses software that is compatible with both Mac and Windows-based platforms. In addition, Microsoft Excel and Word will be used extensively throughout the course.
Required Resources
Supplemental Materials
Koziol, J. (2014). [PDF]. College of Health, Ashford University: San Diego, CA.
Websites
Centers for Disease Control. (2014). . Retrieved from
Centers for Disease Control and Prevention. (2014). Retrieved from
(2012). Retrieved from byageandgender
Recommended Resources
Multimedia
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+births%29/0_zipwy4i7
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+births%29/0_ignho54w
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+brainsize%29/0_qhhcxu1d
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+crossover%29/0_srbv0wj1
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+HCCtest%29/0_4qwt7r8z
Koziol, J. (Producer). (2014). [Video file]. Retrieved from mortality%29/0_snyfubtn
Course Grading
Multiple measures of assessment are used in the course, allowing students opportunities to demonstrate their learning in more than one way and giving consideration to individual learning styles. Course components that will be assessed include:
Discussions
Each week students will participate in online discussions with classmates, which are related to the week’s readings. These discussions replace the interactive dialogue that occurs in the traditional classroom setting. Each week, students’ initial discussion posts are due by 11:59 p.m. (in the time zone in which each student resides) on Day 3 (Thursday). Students will have until 11:59 p.m. on Day 7 (the following Monday) to make the required minimum number of response posts to classmates. Discussions represent 26% of the overall course grade.
Quizzes
In Weeks Three and Six, students will demonstrate and reinforce their understanding of the week’s content by taking open-book quizzes. There is no time limit to complete a quiz, and each quiz can be taken two times. Each quiz must be completed in one sitting, by Day 6 of the week in which it is due. The questions are multiple choice and true/false. Each quiz is worth 8 percent, so quizzes represent 16% of the overall course grade.
Assignments
There are written assignments due in Weeks One through Five of this course. These assignments must reflect college-level writing. Assignments represent 40% of the overall course grade.
Final Project
The final assignment for this course is a Final Project. The purpose of the Final Project is for you to consolidate the learning achieved in the course by taking a new approach to the datasets that you have examined throughout the course. The Final Project represents 18% of the overall course grade.
Grading Percent Breakdown
Activity  Grading Percent 
Discussions  26 
Quizzes  16 
Assignments  40 
Final Project  18 
Total  100 
Week One
Course Content
To be completed during the first week of class
Overview
Activity  Due Date  Format  Grading Percent 
Post Your Introduction  Day 1  Discussion  2 
Hospital Data  Day 3 (1st post)  Discussion  4 
U.S. Mortality Rates Histogram  Day 7  Assignment  8 
Weekly Learning Outcomes
This week students will
 Calculate summary statistics from
 Create appropriate graphs and charts for nominal and ordinal
Introduction
During Week One, you will be introduced to quantitative (continuous) and qualitative (discrete or categorical) data. You will learn appropriate graphical techniques for displaying and summarizing both types of data. You will learn about descriptive statistics for location (e.g., mean, median) and scale (e.g., standard deviation, range), for reporting purposes. You will begin to learn fundamentals of probability theory.
Required Resources
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 1: Introduction
 After reading the chapter, review your grasp of the material in Chapter 1 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 1. Solutions to these problems are given at the end of the
 Chapter 2: Describing, Exploring, and Comparing Data
 After reading the chapter, review your grasp of the material in Chapter 2 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 2. Solutions to these problems are given at the end of the
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Materials
 Koziol, J. (2014). MHA610_Week 1_Discussion_Hospital data [Excel file].
 Koziol, J. (2014). MHA610_Week 1_Discussion_Hospital data [Statdisk file].
Website
World Life Expectancy. (2012). Retrieved from
 This website houses the data that will be used for the U.S. Mortality Rates histogram assignment for this week.
Recommended Resources
Multimedia
Koziol, J. (Producer). (2014). MHA610 Week 1 Assignment (Part 1) [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
 These screencasts help explain the Week One
Discussions
Participate in the following discussions:
 Post Your Introduction. 1st Post Due by Day 1. Post a brief introduction on the first day of class. Share any past experiences (academic or professional) that you have had with epidemiology, biostatistics, or health data analysis. What topic are you most interested in as it relates to epidemiology and biostatistics? Briefly explain why you are interested in this topic. Additionally, describe what you are looking forward to learning in this course.
Guided Response: Review several of your classmates’ posts. Welcome at least three of your peers to this course. What similarities in experience did you note between you and your classmate? Did your colleague’s description of his or her topic of interest differ from your own? If so, did the description spark your interest in that topic as well? If so, how? Introduction should be at least 250 words in APA format.
 Hospital Data. 1st Post Due by Day 3. The MHA610_Week 1_Discussion_Hospital Data Excel file and the MHA610_Week 1_Discussion_Hospital Data Statdisk file (both available in the classroom) contain basic demographic information on 250 patients admitted to a community hospital over a two-week period. The first row of the worksheet indicates the variable names:
Gender  Male (M) or female (F) 
Ethnicity  
SevIllnessCode  These are All Patient Refined Diagnosis Related Groups (APRDRG) categories of severity of illness, ranging from: 
SevIllnessDescr  Mild (Category 1) to extreme (Category 4) 
Age  In years 
Wt  Patient weight in kilograms 
Ht  Patient height in centimeters 
BMI  Patient body mass index (BMI), where BMI = wt/ht^2, with weight in kilograms and height in meters 
APRDRG  Denotes All Patient Refined Diagnosis Related Group, a widely used inpatient classification system. 
For this discussion, describe and summarize the demographic information on these patients. You may use tables or graphs (or both) for this purpose. Your goal is to convey to the reader an accurate snapshot of these patients. Support your response with correct scholarly sources. Your initial post must be at least 250-500 words.
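As a rough illustration of the kind of summary this discussion asks for, the sketch below computes BMI from weight and height and a few descriptive statistics. The patient rows are hypothetical and are not taken from the course data file; the dictionary keys simply mirror the variable names listed above, and the `bmi` helper is an illustrative name, not part of Statdisk or Excel.

```python
import statistics

def bmi(weight_kg, height_cm):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100.0  # the worksheet records height in centimeters
    return weight_kg / height_m ** 2

# Hypothetical patient rows for illustration only (not the course data).
patients = [
    {"Gender": "F", "Age": 34, "Wt": 62.0, "Ht": 165.0},
    {"Gender": "M", "Age": 58, "Wt": 88.5, "Ht": 178.0},
    {"Gender": "F", "Age": 71, "Wt": 70.2, "Ht": 158.0},
    {"Gender": "M", "Age": 45, "Wt": 95.0, "Ht": 183.0},
]

ages = [p["Age"] for p in patients]
bmis = [bmi(p["Wt"], p["Ht"]) for p in patients]

# Location and scale summaries for a quantitative variable.
summary = {
    "n": len(patients),
    "mean_age": statistics.mean(ages),
    "median_age": statistics.median(ages),
    "sd_age": statistics.stdev(ages),  # sample standard deviation
    "mean_bmi": statistics.mean(bmis),
}

# Frequency table for a categorical variable.
gender_counts = {}
for p in patients:
    gender_counts[p["Gender"]] = gender_counts.get(p["Gender"], 0) + 1

print(summary)
print(gender_counts)
```

Quantitative variables (Age, Wt, Ht, BMI) are summarized with location and scale statistics, while categorical variables (Gender, Ethnicity, severity codes) are summarized with counts.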
Guided Response: Respond to at least two of your peers by Day 7, 11:59 p.m. Review your colleague’s summary of the data. Did the method of presentation provide you with any new insights? If so, what are they? If not, what suggestions might you make to your colleague that could improve his or her representation of the data? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.
Assignment
U.S. Mortality Rates. Due by Day 7. Examining the burden of disease in the United States provides important information on which to base decisions about public health priorities.
To do this, we will utilize mortality data for the United States. In the first part of this assignment, you will download and examine mortality data for your home state.
Go to
 Choose your home state under the Choose State option (panel on left hand side)
 Select BOTH under the Choose gender option in the middle
 Scroll down to the bottom of the page, and read the fine print to learn for which year the mortality data have been
 Copy and paste the relevant mortality data into
 Drag your mouse over all of the Cause of Death rows, (50 rows), right click, and select Copy,
 Open Excel and paste your selection into Excel. You should have a spreadsheet with 50 rows and 19 columns (Columns A-S).
For the first part of the assignment, you will prepare a histogram of the leading causes of death (regardless of age) in your state. Follow the steps below in order to prepare your histogram:
 Sort the Data
 The numbers of deaths, all ages, are given in Column
 Select all the
 Then, select Data>Sort>Sort by Column C, Values, largest to smallest. (Make sure that my data has headers is not
 You now have the leading causes of death in your state in Column A (cause) and Column C (frequency).
 If you already know how to draw a histogram in Excel, proceed to do so with Columns A and C, making sure to truncate the data to the 30 leading
 If you do not know how to draw a histogram in Excel, here’s one method:
 Choose the Chart Wizard, chart type column, chart subtype clustered column (Step 1).
 Click Next for Step
 At Step 2, Click the Series button, which will open a new
 Click Add under Series,
 Enter Causes of Death in the Name box;
 Clear the Values box, then
 Drag your mouse over the 30 largest frequencies for the Values; and,
 Drag your mouse over the first 30 causes of death (Column A) for the Category (X) axis labels box.
 Click Next, and you’ll be brought to the Chart Options
 Add a suitable title (e.g., Leading Causes of Death, 2010, your state)
 Label the Y axis Frequency.
 Click Next, and place the histogram in a new
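For readers who want to check the sort-and-truncate step outside Excel, here is a minimal sketch. The cause names and counts are hypothetical placeholders, not real state data, and `leading_causes` is an illustrative helper name.

```python
def leading_causes(deaths_by_cause, top_n=30):
    """Return (cause, deaths) pairs sorted from most to fewest deaths, cut to top_n."""
    ranked = sorted(deaths_by_cause.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

# Hypothetical counts for illustration only (not real state data).
example_deaths = {
    "Cause A": 1200,
    "Cause B": 4500,
    "Cause C": 300,
    "Cause D": 2750,
}

for cause, deaths in leading_causes(example_deaths, top_n=3):
    # A crude text bar: one '#' per 250 deaths.
    print(f"{cause:<10}{deaths:>6}  {'#' * (deaths // 250)}")
```

This mirrors sorting Column C from largest to smallest and keeping only the leading rows before charting.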
You now have a histogram with the leading causes of death for your state. This presents one picture of the burden of disease in your state, but it isn’t the only picture. We shall now look at a different metric: years of life lost due to each cause.
To do this, we will assume that the average life span is 80 years, and we will calculate how many years of life are lost for each cause of death, according to the age at death.
Please note that the ages are in categories (0 – 14, 15 – 24, 25 – 34, …, 65 – 74, and 75+). For this exercise, we will assume that the average age of death is at the midpoint of each of these intervals (e.g., 7.5, 19.5, 29.5, …, 69.5, and 80 for the last age category, respectively). For example, an individual death in the 15 – 24 age group (midpoint 19.5) incurs 60.5 years of life lost (80 - 19.5 = 60.5).
To make this histogram, we will compute a new column of values, years of life lost for each cause of death. (This entails writing a simple formula in Excel for the calculation corresponding to the first row of data, then dragging the formula down that column. If you have never done this calculation before in Excel, consult the screencast for detailed instructions.)
 Go back to the original Excel spreadsheet that contained your
 Using the formula above, create a column that calculates the years of life lost.
 Now, sort the data by the years of life lost column, in descending order, before drawing a histogram of the results.
 Finally, create a histogram of the 30 leading causes of death, in decreasing order of years of life
 Do not forget to label the yaxis and provide a title for the
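The years-of-life-lost calculation can be sketched as follows. The life span of 80 and the midpoints 7.5, 19.5, 29.5, 69.5, and 80 come from the assignment text; the midpoints 39.5, 49.5, and 59.5 are an assumption that the intermediate age groups follow the same ten-year pattern. The death counts and function names are hypothetical.

```python
LIFE_SPAN = 80.0  # average life span assumed in the assignment

# Midpoints stated in the assignment: 7.5, 19.5, 29.5, ..., 69.5, and 80.
# The 39.5, 49.5, and 59.5 entries are ASSUMED from the ten-year pattern.
MIDPOINTS = {
    "0-14": 7.5, "15-24": 19.5, "25-34": 29.5, "35-44": 39.5,
    "45-54": 49.5, "55-64": 59.5, "65-74": 69.5, "75+": 80.0,
}

def years_of_life_lost(deaths_by_age):
    """Sum deaths * (80 - midpoint age) over the age groups for one cause."""
    return sum(
        count * (LIFE_SPAN - MIDPOINTS[group])
        for group, count in deaths_by_age.items()
    )

def rank_by_years_lost(table):
    """Return (cause, years lost) pairs in descending order of years lost."""
    totals = {cause: years_of_life_lost(ages) for cause, ages in table.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical counts: Cause A has far more deaths, but mostly at older ages.
example = {
    "Cause A": {"65-74": 20, "75+": 500},
    "Cause B": {"15-24": 10},
}
print(rank_by_years_lost(example))
```

In this made-up example the cause with fewer deaths ranks first by years of life lost, which is the kind of reordering the second histogram is meant to reveal.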
You now have two histograms representing the burden of disease in your state. The first histogram orders the causes of death in terms of overall mortality, and the second orders causes of death in terms of years of life lost.
Create a report of your findings that contains both of the histograms. The report should be at least 250-500 words supported by scholarly resources and in APA format. Assume that your task is to assess and prioritize public health needs in your state, and that you need to inform and persuade policy makers to improve the well-being of your state’s constituents. Describe which findings are most relevant for this.
You should also explain any methodological or data limitations that exist in either histogram. In particular, describe how your conclusions would be altered if you were to refine your findings by reanalyzing mortality rates based on gender and race in addition to age. The assignment should be at least 500 words in APA format supported by scholarly sources.
Week Two
Course Content
To be completed during the second week of class
Overview
Activity  Due Date  Format  Grading Percent 
Game of Chance  Day 3 (1st post)  Discussion  4 
Sex Ratios  Day 7  Assignment  8 
Weekly Learning Outcomes
This week students will
 Calculate probabilities of events using fundamental notions and rules of probability
 Apply the binomial distribution to discrete data
 Apply the Poisson distribution to discrete data
Introduction
In Week One, you were introduced to some fundamentals of probability. You will continue your exploration of probability theory in Week Two, including Bayes’ theorem for determination of posterior probabilities on the basis of prior and marginal probabilities. You will examine properties and parameterization of the two basic discrete probability distributions, the binomial and the Poisson. You will begin to use the binomial and Poisson distributions for inferential procedures.
Required Resources
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 3: Probability
 After reading the chapter, review your grasp of the material in Chapter 3 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 3. Solutions to these problems are given at the end of the
 Chapter 4: Discrete Probability Distributions
 After reading the chapter, review your grasp of the material in Chapter 4 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 4. Solutions to these problems are given at the end of the
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Material
Koziol, J. (2014). [PDF]. College of Health, Ashford University: San Diego, CA.
 This document provides an example that will be used in the Game of Chance discussion for this
Website
Centers for Disease Control. (2014). Retrieved from
 This website houses the data that will be used for the Sex Ratios assignment for this
Recommended Resources
Multimedia
Koziol, J. (Producer). (2014). MHA610 Week 2 Assignment (Part 1) [Video file]. Retrieved from
Koziol, J. (Producer). (2014). MHA610 Week 2 Assignment (Part 2) [Video file]. Retrieved from
 These screencasts help explain the Week Two
Discussion
Participate in the following discussion:
Game of Chance. 1st Post Due by Day 3. For this discussion, select a game of chance, explain it briefly if it is likely to be unfamiliar to your classmates, then calculate probabilities of various outcomes like winning or losing in this game. For example, you might choose your state lottery, a scratch card game, a card game like poker, or a dice game like Craps or Yahtzee, as your game of chance.
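As one possible illustration of such a calculation, the sketch below works out match probabilities for a hypothetical lottery in which a player picks 6 numbers from a pool of 49 and 6 numbers are drawn. The 6-of-49 rules are an assumption chosen for the example and may not match any particular state lottery; `match_probability` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def match_probability(k, picks=6, pool=49):
    """Probability of matching exactly k of the drawn numbers (hypergeometric)."""
    ways_to_match = comb(picks, k) * comb(pool - picks, picks - k)
    return Fraction(ways_to_match, comb(pool, picks))

# Exact probabilities for every possible number of matches.
for k in range(7):
    p = match_probability(k)
    print(f"match {k}: {p} (about {float(p):.8f})")
```

Using `Fraction` keeps the probabilities exact, so it is easy to confirm that the outcomes sum to 1 before converting to decimals for a write-up.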
As illustration, read a lottery analysis in
Guided Response: Respond to at least two of your classmates who chose a different game of chance than you by Day 7 at 11:59 p.m. Did your colleague provide enough explanation of the game to allow you to understand the analysis? Was the analysis provided by your classmate correct? If so, what optimal strategy for playing that particular game was described? If not, what suggestions would you make to your colleague to amend any issues?
Assignment
Sex Ratios. Due by Day 7. The normal male to female live birth sex ratio ranges from about 1.03 to 1.07. The sex ratio is defined as the ratio of male births to female births. You might expect boy and girl births to be equally likely, but in fact, baby boys are somewhat more common than baby girls.
Higher sex ratios are thought to reflect prenatal sex selection, especially among cultures where sons are prized more heavily than daughters. We will review sex ratios in the United States as a whole, as well as in individual states, to determine whether sex ratios vary significantly among various ethnic and racial groups.
To do this analysis, we will utilize natality data for the United States, provided by the Centers for Disease Control.
In the first part of the assignment, we will look at sex ratios for your home state, over the time period 1995 to 2002, by race. To obtain this information:
 Go to the
 Click on Births under the WONDER Online Databases to bring you to the Natality Information screen
 On this screen, click Natality for 1995-2002.
 On the following screen, click I Agree in order to agree to abide by the government rules for data use (primarily, concerning confidentiality).
 This will bring us to the Natality, 1995-2002 Request
 In the block Organize table layout, group results by year, followed by race, and then gender.
 In the block Select maternal residence, choose your state.
 You can leave blocks 3 through 6 at their default values (i.e., All).
 Click Send.
 A new screen will open, with data (births) tabulated by Year, Race, and Gender.
 Click Export, click Save, and a text file named Natality, 1995-2002.txt or something similar will be downloaded onto your computer.
We can now process the downloaded data in Excel.
 Load the text file into Excel. This will probably open the Text Import
 Accept the defaults, and you should have a spreadsheet with the natality data
 We will need to edit the data slightly before calculating sex ratios and drawing graphs of the sex ratios. To do this:
 Scroll down to the end of the spreadsheet, and delete the rows with the extraneous information about the dataset. (This starts on or about row )
 You may also delete the columns with headings Year Code, Race Code, and Gender Code since we will not be using them; however, this is not
 Next, sort the data, in order to delete some extraneous rows. Select the remaining columns, choose Data > Sort, then sort by Race in ascending
 Scroll down to the end of the worksheet, and delete all rows with blanks for Race.
 We will now add a new column to the worksheet for
 Go to the first blank column in the worksheet: this column should be immediately to the right of a column labeled Births.
 In the first row of this column, type Ratios.
 Now, we will calculate different proportions of births, using formulas in Excel. It is important to use Excel to do the calculation, because it will allow you to quickly complete all of the
 First, calculate the ratio of female births to total births for the American Indian race (female births/total births).
 Next, calculate the ratio of male births to total births for the American Indian race (male births/total births).
 Finally, calculate the ratio of male births to female births (male births/female births).
 If you don’t know how to do this calculation easily in Excel, please check out the screencast, which reviews
 Once you have completed the first three cells in the ratio column, you can select them and copy
 Select the remaining cells in the column and
 You have now completed calculating all of the ratios; however, you may wish to double-check to ensure that the formulas have adjusted for each
 Once you have the Ratio column filled out, select that column, then Copy.
 With the column still selected, click Paste Special and then Values. This will convert the formulas you entered to numbers, so they do not change when you do the next
 Select all the columns, then Data>Sort>Notes in ascending order. We will be graphing the sex ratios for the years 1995 to 2002, by
 Feel free to drop the two to four races that have the fewest numbers of births in your
 Draw a line chart with markers with the year along the X-axis (we are looking at 1995 through 2002) and sex ratio along the Y-axis (with sex ratios typically between 1 and 1.1, though this may vary in your state).
 If your version of Excel has the Chart Wizard:
 In step two of the Chart Wizard, choose the Series tab; in this window you’ll be adding all the information for the various
 Under category (X) axis labels, drag your mouse over the cells 1995, 1996…
 For values, drag your mouse over the eight successive sex ratios (1995 through 2002) for the particular racial group you chose; in the name box, enter the racial group; do this for each of the groups you want to display.
 Select Next when you have finished with all the racial groups, and you will be brought to the Chart Options
 Here, you can customize your graph, with a title and X and Y axis labels (i.e., your state births, year, and sex ratio respectively).
 Continue with Next, and finish the
 If your version of Excel does not have the Chart Wizard, you will need to do some reformatting of your data before you can create a line chart. It is good practice to create a new worksheet in order to preserve your original
 Your data should mimic the way you want your line chart to look. In this case, you want to create horizontal labels for each of the years (1995 through 2002) and vertical labels for each of the races. It should follow this format:
Year 1  Year 2  Year 3  
Race A  Ratio for Race A in Year 1  Ratio for Race A in Year 2  Ratio for Race A in Year 3 
Race B  Ratio for Race B in Year 1  Ratio for Race B in Year 2  Ratio for Race B in Year 3 
 After you have reformatted your data, select all of the data, then select Insert, then Line, then Line with Markers.
 You should now have a line chart with each race having its own line, the ratios on the Y-axis, and the years on the X-axis.
 You may wish to modify the Y-axis by right-clicking on it. Your upper and lower values on the axis should be just above and below your highest and lowest ratio
 In a Word document, paste the graph you created (or, alternatively, submit your Excel workbook along with the Word document) and describe your findings, making sure to:
 Summarize the sex ratios for each of the racial
 Explain whether the sex ratios are relatively constant through the 1995 to 2002 period for all of the racial groups or whether there are trends.
 Explain any racial groups that have noticeably higher or lower sex ratios than other
 Explain the conclusions you are drawing from your
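The three ratios computed in the Ratios column can be checked with a short sketch like the one below. The birth counts and the "Group A" label are hypothetical and only illustrate the arithmetic; `birth_ratios` is an illustrative helper name.

```python
def birth_ratios(male_births, female_births):
    """Return the female share, male share, and male-to-female sex ratio."""
    total = male_births + female_births
    return {
        "female_share": female_births / total,   # female births / total births
        "male_share": male_births / total,       # male births / total births
        "sex_ratio": male_births / female_births,  # male births / female births
    }

# Hypothetical rows: (year, race label, male births, female births).
rows = [
    (1995, "Group A", 5250, 5000),
    (1996, "Group A", 5100, 5000),
]

for year, race, males, females in rows:
    ratios = birth_ratios(males, females)
    print(year, race, round(ratios["sex_ratio"], 3))
```

Note that the sex ratio divides male births by female births, not by total births, so a value of 1.05 corresponds to a male share of about 0.512.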
In the second part of this assignment, you will undertake some formal statistical procedures with the natality data. We will repeat the previous steps, with some slight modifications.
 Return to the
 Click on Births under the WONDER Online Databases to get to the Natality Information
 Select Natality for 2007 – 2012.
 On the next screen, click I Agree in order to agree to abide by the government rules for data use (primarily, concerning confidentiality).
 This will bring us to the Natality, 2007-2012 Request
 In block Organize table layout, group results by race and then gender (not year).
 In block Select maternal residence, choose your state.
 You can leave block 3 at its default values (typically, All).
 In block Select birth characteristics, select All Years under Year, and 1st child born alive to mother under Live Birth Order.
 Blocks 5 and 6 can be left at their default
 Click Send. A new screen will open, with data (births) tabulated by race and
 Click Export, click Save, and a text file named Natality 2007-2012.txt (or something similar) will be downloaded onto your computer.
We have only four racial groups in this dataset: American Indians or Alaska Natives, Asian or Pacific Islanders, Black or African Americans, and Whites.
Using the normal approximation to the binomial distribution (without continuity correction), calculate z statistics for assessing whether the proportion of boys is .51 in each of the 4 racial groups: z = (x – np) / sqrt(npq), where n is the total number of births in a particular cohort, p = .51, q = 1 – p = .49, and x is the number of boy births.
Under the null hypothesis that the proportion of boys is .51, and under the normal approximation to the binomial distribution, the z statistics should have (approximately) standard normal distributions (mean 0, standard deviation 1). Do any of the z statistics suggest that the proportion of boy births in any particular racial group differs significantly from .51?
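A minimal sketch of the z statistic calculation follows, using only the Python standard library. The birth counts are hypothetical, and the two-sided p-value helper is an addition for illustration; the assignment itself only asks for the z statistics.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(boys, n, p=0.51):
    """z = (x - np) / sqrt(npq), normal approximation without continuity correction."""
    q = 1 - p
    return (boys - n * p) / sqrt(n * p * q)

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical cohort: 5,200 boys among 10,000 first births.
z = z_statistic(boys=5200, n=10000)
print(round(z, 3), round(two_sided_p_value(z), 4))
```

A z statistic well outside roughly plus or minus 2 would suggest that the observed proportion differs from .51 by more than chance alone would typically produce.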
Comment on your findings in your written report. Describe whether you think your results would change if we hadn’t limited consideration to the firstborn. This assignment should be at least 250-500 words in APA format supported by scholarly sources.
Week Three
Course Content
To be completed during the third week of class
Overview
Activity  Due Date  Format  Grading Percent 
Confidence Intervals  Day 3 (1st post)  Discussion  4 
Week Three Quiz  Day 6  Quiz  8 
Immune Responses  Day 7  Assignment  8 
Weekly Learning Outcomes
This week students will
 Apply the normal distribution to continuous data
 Explain the use of statistical estimators in practice.
 Construct confidence intervals for sample
Introduction
In Week Two, you examined two fundamental discrete probability distributions, the binomial and the Poisson. In this week, you will be introduced to the fundamental continuous probability distribution, the normal or Gaussian. You will learn how the normal distribution is parameterized by the mean and the variance, and how to undertake probability calculations based on the normal distribution. You will be introduced to the central limit theorem, and how it relates to the normal distribution.
You will also learn about sampling distributions (especially, the t distribution), and properties of estimators. Estimation is a key concept in statistics, and you will learn how to construct confidence intervals for sample estimators. You will learn about planning of experiments, for which sample size and power are fundamental notions.
Required Resources
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 5: Normal Probability
 After reading the chapter, review your grasp of the material in Chapter 5 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 5. Solutions to these problems are given at the end of the
 Chapter 6: Estimates and Sample Sizes with One
 Review your grasp of the material in Chapter 6 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 6. Solutions to these problems are given at the end of the
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Materials
 Koziol, J. (2014). MHA610_Week 3_Assignment_Data [Excel file].
 Koziol, J. (2014). MHA610_Week 3_Assignment_Data [Statdisk file].
Recommended Resources
Multimedia
Koziol, J. (Producer). (2014). MHA610 Week 3 Assignment (Part 1) [Video file]. Retrieved from
Koziol, J. (Producer). (2014). MHA610 Week 3 Assignment (Part 2) [Video file]. Retrieved from
 These screencasts help explain the Week Three
Discussion
Participate in the following discussion:
Confidence Intervals. 1st Post Due by Day 3. In this discussion, we will investigate confidence intervals for binomial probabilities. The discussion is in two parts.
 Return to the data you had generated in the second part of the Week Two assignment. You should have total numbers of firstborn boys and girls in your state between the years 2007 and 2012 separately by racial group: American Indians or Alaska Natives, Asian or Pacific Islanders, Black or African Americans, and Whites. For the first part of this discussion, construct and report the 95% confidence intervals for the proportions of firstborn boys, separately for each racial group. (Use the normal approximation to the binomial distribution.) Comment on the confidence intervals: can you infer from the confidence intervals that the proportions of firstborn boys differ among the racial groups? Explain what the widths of the confidence intervals tell
 Leading up to elections, you often hear results of polls of voters’ preferences, with statements such as: “This poll was taken from a random sample of 600 potential voters, and has an accuracy exceeding 96%.” You may want to interpret the accuracy statement in terms of “margin of error”, as explained in the text, Section 62. Remember, the width of a confidence interval is a measure of the precision of the estimate
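Both parts of this discussion rest on the same calculation, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the birth counts are made up, and the interval shown is the normal-approximation (Wald) interval rather than output from Statdisk.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def proportion_ci(successes, n, z=Z_95):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def worst_case_margin(n, z=Z_95):
    """Largest possible margin of error for a sample of size n (attained at p = 0.5)."""
    return z * math.sqrt(0.25 / n)

# Hypothetical counts, for illustration only: 5,120 boys among 10,000 firstborn births.
low, high = proportion_ci(5120, 10000)
print(f"95% CI for the proportion of boys: ({low:.4f}, {high:.4f})")

# A poll of 600 voters: worst-case margin of error at the 95% level.
print(f"Margin of error for n = 600: {worst_case_margin(600):.4f}")
```

With 600 respondents the worst-case margin of error is about 4 percentage points, which is one way to read the "accuracy exceeding 96%" statement. Note also that the interval narrows as the number of births grows, so smaller racial groups will tend to have wider intervals.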
Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Consider the 95% confidence intervals your colleague presented. Do all the intervals overlap with those you presented in your initial post? Did the inferences presented by your colleague match with yours? Compare the proportion of boy births in his or her state with those in your state. What statistically significant differences can you note? Do you concur with your colleague’s interpretation of the polling statement? What suggestions might you make to aid your colleague in evaluating this type of polling result? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.
Quiz
Week Three Quiz. Due by Day 6. Complete the 10-question quiz on the readings from Weeks One through Three. You may wish to review all of the odd-numbered questions from the text that you have completed in Weeks One, Two, and Three. There is no time limit to this quiz. You will have two attempts to take the quiz. If multiple attempts are made, eCollege will take the last grade earned, not the highest grade earned.
Assignment
Immune Responses. Due by Day 7. Background: Abnormal immune responses can trigger a range of autoimmune diseases, in which an individual’s immune system attacks normal tissues in the body. Well-known examples of autoimmune diseases are type 1 diabetes mellitus, lupus, and multiple sclerosis.
Ideally, one would like to harness the immune system to attack abnormal substances or tissues like cancer, while sparing the normal (unaffected) tissue. Many tumor cells produce antigens (proteins) that theoretically ought to trigger an immune response: that is, one’s immune system ought to recognize cancer cells as somehow foreign or abnormal, and thereafter eliminate these cells from the body. The field of cancer immunotherapy is actively pursuing this study.
Tumor antigens may also be useful for diagnostic tests; high levels of tumor antigens could be taken as markers or indicators of cancer. In this assignment, you will be examining levels of tumor-associated antigens (TAAs) as determined from immunoassays (i.e., biochemical tests that measure the concentrations of the tumor-associated antigens in serum samples).
 Download the Excel file MHA610_Week 3_Assignment_Data.xls (available in the classroom), and open it.
 The spreadsheet contains data on 250 individuals: 90 normal individuals from San Diego (the controls), and 160 individuals from Korea and China, all of whom were diagnosed with hepatocellular carcinoma (HCC).
 Serum samples were taken from the controls and from the cases at time of diagnosis of HCC. Levels of a panel of 12 tumor-associated antigens (TAAs) were assessed via immunoassays in all individuals.
 The levels are given in the columns with headings Ab14, HCC1, IMP1, KOC, MDM2, NPM1, P16, P53, P90, RaIA, and Survivin. (These are the designations of the 12 TAAs, all of which were thought to be potentially predictive of HCC.)
 The underlying question is whether we can effectively discriminate between the cases and controls on the basis of the levels of these TAAs. This is sometimes termed a classification problem in the statistics and biostatistics literature: we wish to classify individuals as normal or cancer patients on the basis of their TAA levels.
 We will examine these data in Statdisk. Use the MHA610_Week 3_Assignment_Data.CSV file (available in the classroom) to upload this information into Statdisk.
 If you choose the latter option, start Statdisk, then choose File>Open and select the .csv file you created (unless you changed the name, it ought to be csv).
 Check the box that specifies the data contains column titles or headers, select Comma separated for how the data are delimited, click Finish, and the dataset will have been successfully imported into Statdisk.
 NOTE: you may want to read through the remainder of the assignment first, before proceeding with this step. This may save you some work afterwards!
 Note that Statdisk operates on columns of data, and that both cases and controls are contained in each column of TAA levels. It will be necessary to separate the cases and controls for further analyses. This can be accomplished either by copying within Statdisk or by reverting to the original Excel workbook, copying in Excel, exporting as a .csv file, and then importing into Statdisk. (Don’t say you weren’t warned!)
 Explain whether you would characterize any or all of the TAA levels as approximately normally distributed for the controls and for the cases.
 Provide plots and statistics in support of your
 Explain whether any of the TAAs are useful for discriminating between the cases and controls.
 Provide plots and statistics in support of your
 All writing assignments should be at least 250-500 words in APA format supported by scholarly sources.
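For those who prefer to separate the cases and controls outside of Statdisk or Excel, the step can be sketched in Python. This is a sketch under loud assumptions: the column names "Group" and "TAA_1", the group labels, and every value below are hypothetical placeholders, not the contents of the assignment file. The skewness figure is only a rough numerical companion to the histograms and normal quantile plots you would normally inspect.

```python
import csv
import io
import statistics

# Tiny made-up CSV standing in for the real file; the column names "Group" and
# "TAA_1" and all values are hypothetical placeholders.
SAMPLE_CSV = """Group,TAA_1
control,0.10
control,0.12
control,0.09
control,0.11
case,0.20
case,0.35
case,0.18
case,0.90
"""

def split_by_group(rows, group_col, value_col):
    """Return a dict mapping each group label to its list of numeric values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return groups

def skewness(values):
    """Sample skewness (third standardized moment); near 0 for symmetric data."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
groups = split_by_group(rows, "Group", "TAA_1")
for label, values in groups.items():
    print(label, "n =", len(values),
          "mean =", round(statistics.fmean(values), 3),
          "median =", round(statistics.median(values), 3),
          "skewness =", round(skewness(values), 3))
```

A mean well above the median together with a large positive skewness, as in the made-up case values here, is a warning sign that a normal distribution may describe the data poorly.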
BONUS. In the above, we pooled all cases together. Summarize whether you think this is legitimate or whether the levels of any of the TAAs appear to differ significantly between the cases from China and the cases from Korea. Provide evidence in support of your conclusion.
Week Four
Course Content
To be completed during the fourth week of class
Overview
Activity  Due Date  Format  Grading Percent 
Exploring t-Tests and Confidence Intervals for Continuous Data  Day 3 (1st post)  Discussion  4 
A Crossover Clinical Trial  Day 7  Assignment  8 
Weekly Learning Outcomes
This week students will
 Explain general principles of hypothesis testing.
 Calculate z tests and t tests.
Introduction
This week, you will learn general principles of hypothesis testing, including type I and type II errors, significance level, and power. You will then be introduced to inferential statistics arising in hypothesis testing, notably z tests and t tests. You will learn how to undertake z tests and t tests for single populations and for comparisons of two populations. You will explain and give examples of their use.
Required Resource
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 7: Hypothesis Testing with One Sample
 After reading the chapter, review your grasp of the material in Chapter 7 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 7. Solutions to these problems are given at the end of the text.
 Chapter 8: Inferences from Two Samples
 After reading the chapter, review your grasp of the material in Chapter 8 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 8. Solutions to these problems are given at the end of the text.
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Materials
 Koziol, J. (2014). MHA610_Week 4_Assignment_Crossover_Trial_Data [Excel file].
 Koziol, J. (2014). MHA610_Week 4_Assignment_Crossover_Trial_Data [Statdisk file].
Recommended Resource
Multimedia
Koziol, J. (Producer). (2014). MHA610 Week 4 Assignment [Video file]. Retrieved from
 This screencast helps explain the Week Four assignment.
Discussion
Participate in the following discussion:
Exploring t-Tests and Confidence Intervals for Continuous Data. 1st Post Due by Day 3. In this discussion, we will investigate t-tests and confidence intervals for continuous data. To do this, we will revisit the TAA data that you studied in the Week Three assignment.
You may recall from the Week Three assignment that you have available data on 12 TAAs, from 90 normal individuals (controls) and 160 hepatocellular carcinoma patients (cases). These data are in the Excel file MHA610_Week 3_Assignment_data.xls (available in the classroom); the levels of the 12 TAAs are given in the columns with headings Ab14, HCC1, IMP1, KOC, MDM2, NPM1, P16, P53, P90, RaIA, and Survivin.
 First, randomly select three of the 12 TAAs for further
 Next, perform two-sample t-tests for comparing the levels of each of your three TAAs between the cases and the controls.
 Then, use the t-tests to order the TAAs in terms of relative ability to discriminate between the cases and controls, from best to worst discriminator. Is this ordering helpful if you want to select a subset of TAAs to discriminate between cases and controls? Assume for now that you can judge the relative merits of your three TAAs by the magnitudes of their respective two-sided p-values from the two-sample t-tests, so that your best discriminator is the TAA with the smallest p-value.
 Lastly, construct and report 95% confidence intervals for the mean level of your best TAA discriminator in the controls, the mean level of your best TAA discriminator in the cases, and the difference in mean levels (cases – controls). Discuss whether your confidence intervals are concordant with the t-tests.
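The comparison described in the steps above can be sketched as follows. This is an illustration only: it uses the unequal-variance (Welch) form of the two-sample t statistic, which is one reasonable choice and not necessarily the option you select in Statdisk; the data values are made up; and the default multiplier of 1.96 is the large-sample normal approximation, to be replaced by the appropriate t critical value for small samples.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic, its approximate degrees of freedom, and the SE."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / na + vb / nb)
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / se
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df, se

def mean_difference_ci(sample_a, sample_b, critical_value=1.96):
    """CI for mean(a) - mean(b); 1.96 is the large-sample (normal) 95% multiplier."""
    _, _, se = welch_t(sample_a, sample_b)
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    return diff - critical_value * se, diff + critical_value * se

# Made-up TAA levels, for illustration only.
cases = [2.1, 2.5, 1.9, 2.8, 2.4]
controls = [1.0, 1.2, 0.9, 1.1, 1.3]

t, df, se = welch_t(cases, controls)
low, high = mean_difference_ci(cases, controls)
print(f"t = {t:.3f}, df = {df:.2f}")
print(f"Approximate 95% CI for (cases - controls): ({low:.3f}, {high:.3f})")
```

A confidence interval for the difference that excludes zero points the same way as a two-sided test that rejects at the 5% level, which is the sense in which the intervals and the t-tests should be concordant.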
Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Do your t-tests and ordering coincide with those of your colleague? If not, why? Do you agree with your colleague’s assessment of the usefulness of the ordering to discriminate between cases and controls? Why? Did your best TAA discriminator agree with that of your colleague? If not, why not? Are your confidence intervals identical to those of your colleague? If not, can you determine where a mistake was made? All initial and peer postings should be at least 250-500 words in APA format and supported by scholarly sources.
Assignment
A Crossover Clinical Trial. Due by Day 7. Background: Randomized controlled trials are the gold standard for clinical research. Biostatisticians are heavily involved in such trials, from the planning stage (e.g., sample size and power considerations) through the analysis of findings (e.g., estimation of treatment effects). In this assignment, we will examine treatment outcomes in a two-treatment, two-period (two-by-two) crossover design.
In the two-by-two crossover design, subjects are randomly assigned to one of two groups. The first group initially receives treatment A in the first period of the trial followed by treatment B in the second period of the trial, and the other group initially receives treatment B in the first period of the trial followed by treatment A in the second period. The response, or primary endpoint of the trial, is measured at least twice in each patient, at the end of the first period and again at the end of the second period. Each patient is his or her own control for comparison of treatment A and treatment B.
Crossover designs are used when the treatments alleviate a condition, rather than effect a cure. After the response to the treatment administered in the first period is measured, there is a washout period in which any lingering effect of the treatment administered in the first period dissipates, and then the response to the second treatment is measured.
An advantage of a crossover design is increased precision afforded by comparison of both treatments on the same subject, compared to a parallel group clinical trial (in which patients are randomized onto different treatment arms). Disadvantages of crossover trials are complex statistical analyses of findings (typically, by complex analyses of variance), potential difficulties in separating the treatment effects from the time effect (patients may respond differently in the first period and the second period), and the carryover effect (the effect of the treatment given in the first period may not totally wash out, but may carry over onto the second period).
We will give a simple example of a two-by-two crossover trial, and undertake analyses of the trial results via t tests. The trial was meant to assess the efficacy of a new experimental therapy for interstitial cystitis (IC). Interstitial cystitis is a chronic bladder condition affecting primarily women; symptoms include bladder pressure and pain, urgency, and occasionally pelvic pain. The new experimental therapy was meant to reduce pain and urgency relative to standard therapy. A total of 24 patients were enrolled in the trial; trial results are given in the Excel workbook titled MHA610_Week 4_Assignment_Crossover_Trial_Data.xls (available in the classroom).
Open the workbook, and examine the worksheet. The first row contains column headings, and the next 24 rows represent the 24 patients entered into the trial. The group one patients received experimental therapy in the first period of the trial followed by standard therapy in the second period of the trial. The group two patients received standard therapy in the first period of the trial followed by experimental therapy in the second period.
The primary outcome of the trial was an area under the curve (AUC) calculation of relative pain and urgency the patient experienced following therapy: the smaller the AUC, the less severe the patient’s pain and urgency.
AUC_period1 denotes each patient’s AUC during the first period of the trial, and AUC_period2 denotes the patient’s AUC during the second period of the trial. The column headed Rx denotes the treatment each patient received during the first period of the trial.
 We will first test for carryover effects.
 The t test formulation for the test for carryover proceeds as follows: calculate the total (sum) of the AUC_period1 and AUC_period2 values for each patient in group one (12 patients) and separately for each patient in group two (12 patients).
 The test for carryover is the two-sample t test for assessing whether these AUC totals differ significantly between group one and group two under the assumption that the variances of the AUC totals in the two groups are equal.
 Calculate the sample means and standard deviations for the AUC totals for each group, and perform the two-sample t test. Analyze whether there is a significant carryover effect in this clinical trial.
 We will next test for treatment effects.
 The t test formulation for assessing treatment effects proceeds as follows:
 Calculate the difference of the AUC values for each patient in group one, that is, the 12 individual AUC_period1 – AUC_period2 values, and similarly calculate for each patient in group two.
 If there is no treatment effect, one would expect the AUC_period1 and AUC_period2 values to be similar, except perhaps for an offset due to period effects; we need to account for potential period effects when we compare the group one and group two AUC differences.
 It turns out that the t-test for a treatment effect is the two-sample t test for assessing whether these AUC_period1 – AUC_period2 differences differ significantly between group one and group two, under the assumption that the variances of the AUC differences are the same in the two groups.
 Calculate the sample means and standard deviations for the AUC differences as defined above in each group, and perform the two-sample t test. Analyze whether there is a significant treatment effect in this clinical trial.
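The two t tests described above share the same machinery and differ only in what is computed per patient (totals for carryover, differences for treatment). A minimal sketch, assuming made-up AUC values and only three patients per group for brevity; the function name pooled_t is hypothetical and the numbers are not the trial data.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(sample_a), len(sample_b)
    pooled_var = (
        (na - 1) * statistics.variance(sample_a)
        + (nb - 1) * statistics.variance(sample_b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / se
    return t, na + nb - 2

# Made-up (AUC_period1, AUC_period2) pairs, three patients per group for brevity.
group_one = [(10.0, 14.0), (12.0, 15.0), (11.0, 16.0)]
group_two = [(15.0, 11.0), (14.0, 10.0), (16.0, 13.0)]

# Carryover test: compare per-patient totals between the two groups.
totals_one = [p1 + p2 for p1, p2 in group_one]
totals_two = [p1 + p2 for p1, p2 in group_two]
carryover_t, df = pooled_t(totals_one, totals_two)

# Treatment test: compare per-patient period differences between the two groups.
diffs_one = [p1 - p2 for p1, p2 in group_one]
diffs_two = [p1 - p2 for p1, p2 in group_two]
treatment_t, _ = pooled_t(diffs_one, diffs_two)

print(f"carryover t = {carryover_t:.3f} on {df} df")
print(f"treatment t = {treatment_t:.3f} on {df} df")
```

With 12 patients per group, as in the assignment, each test has 22 degrees of freedom, and the two-sided 5% critical value of the t distribution is approximately 2.074.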
Here’s an informal explanation of this t test. Consider the following schematic representation of the two-by-two crossover trial.
Group  Period One  Period Two 
1. AB Sequence  Treatment A + Period One  Treatment B + Period Two 
2. BA Sequence  Treatment B + Period One  Treatment A + Period Two 
In this representation, Treatment A is the direct effect of treatment A on each patient’s response (AUC value) and similarly for Treatment B; Period One is the effect of period one on each patient’s response and similarly for Period Two. (We are assuming there are no carryover effects.)
Now, consider first the individuals in group one. During Period One, their responses (i.e., AUC_period1 values) are estimating effects due to treatment A and period one. During Period Two, their responses (i.e., AUC_period2 values) are estimating effects due to treatment B and period two. So when we take the average of the group one AUC_period1 – AUC_period2 values (let’s call this average x̄), we have a combined estimate of the effects (Treatment A – Treatment B) + (Period 1 – Period 2).
Next, consider the individuals in group two. When we take the average of the group two AUC_period1 – AUC_period2 values (let’s call this average ȳ), we have a combined estimate of the effects (Treatment B – Treatment A) + (Period 1 – Period 2).
Lastly, consider the random variable Z = x̄ – ȳ. This random variable depends solely on the treatment difference: it estimates 2 × (Treatment A – Treatment B), because the period effects (Period 1 – Period 2) cancel out. Under the null hypothesis of no treatment effects, (Treatment A – Treatment B) = 0, so the mean of Z should be zero. The two-sample t test for treatment effects outlined above is equivalent to the t test of whether the mean of Z equals zero. Note that since we have equal numbers of patients in group one and group two, there was no need to take sample means when we constructed our t test; but in general, with unequal sample sizes, you should work with sample means when performing the t tests.
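The cancellation of the period effects can be written out in symbols. The notation here is an assumption introduced for illustration: x̄ and ȳ denote the mean AUC_period1 – AUC_period2 differences in group one and group two, τ_A and τ_B denote the direct effects of treatments A and B, and π_1 and π_2 denote the effects of periods one and two, with no carryover.

```latex
\begin{align*}
E[\bar{x}] &= (\tau_A - \tau_B) + (\pi_1 - \pi_2) \\
E[\bar{y}] &= (\tau_B - \tau_A) + (\pi_1 - \pi_2) \\
E[\bar{x} - \bar{y}] &= 2(\tau_A - \tau_B)
\end{align*}
```

The period terms appear with the same sign in both group means, so they drop out of the difference, leaving a quantity that is zero exactly when the two treatments have the same effect.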
Briefly summarize your findings from this trial. Explain whether the new treatment appears promising in 500 words in APA format supported by scholarly sources.
BONUS. Graphical representations of the findings can be quite illuminating. As a bonus, you are asked to prepare graphical representation(s) of the data. For example, you might prepare a simple plot of mean responses (mean AUC values) for each treatment arm and for each period. Or, you could give patient profile plots of individual AUC values by period and treatment. Describe whether histograms, boxplots, or scatter plots would work with these data. If you assume that there are no significant carryover or period effects in this trial, explain how you would display the treatment effects in 250 words in APA format supported by scholarly sources.
Week Five
Course Content
To be completed during the fifth week of class
Overview
Activity  Due Date  Format  Grading Percent 
Graphs  Day 3 (1st post)  Discussion  4 
Brain Size and Intelligence  Day 7  Assignment  8 
Weekly Learning Outcomes
This week students will
 Distinguish between correlation and regression with multivariable
 Apply univariate and multiple regression analyses to
 Evaluate chi-square tests for goodness of fit with multinomial
 Evaluate chi-square tests for contingency table
Introduction
You will be introduced to the notions of correlation and regression: you will utilize these techniques, and then you will evaluate and interpret results of your analyses. You will learn that correlation does not imply causation; whereas in regression, the notions of dependent variable and independent variable connote more of a cause-and-effect association.
You will learn about goodness-of-fit tests for multinomial distributions, and chi-square tests for contingency tables. These are statistical tests for discrete data and are also meant to reinforce the concept that different statistical procedures need to be utilized in research, depending on what study questions are asked, and what data are available to you for analysis and assessment.
Required Resources
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 9: Correlation and Regression
 After reading the chapter, review your grasp of the material in Chapter 9 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 9. Solutions to these problems are given at the end of the text.
 Chapter 10: Multinomial Experiments and Contingency Tables
 After reading the chapter, review your grasp of the material in Chapter 10 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 10. Solutions to these problems are given at the end of the text.
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Materials
 Koziol, J. (2014). MHA610_Week 5_Discussion_regression_data [Excel file].
 Koziol, J. (2014). MHA610_Week 5_Discussion_regression_data [Statdisk file].
 Koziol, J. (2014). MHA610_Week 5_Assignment_Brain_Data [Excel file].
 Koziol, J. (2014). MHA610_Week 5_Assignment_Brain_Data [Statdisk file].
Recommended Resource
Multimedia
Koziol, J. (Producer). (2014). MHA610 Week 5 Assignment [Video file]. Retrieved from
 The video helps explain the Week Five assignment.
Discussion
Participate in the following discussion:
Graphs. 1st Post Due by Day 3. It is important to look at data in a graphical form. Patterns are the essence of data exploration, and the eye’s ability to discern forms and patterns makes visual display integral to the process. The visual display of quantitative information can help us see connections and relationships in the data, which are oftentimes difficult to detect in tables of numbers. We should look at data in a graphical form, and not rely solely on computational or statistical metrics.
In this discussion, we will explore graphs in linear regression. Our data are taken from a 1973 article by Frank Anscombe in The American Statistician, which discusses scatterplots in relation to regression analyses.
First, download the dataset MHA610_Week 5_Discussion_regression_data.xls (available in the classroom). This is a simple Excel workbook, with data on one sheet. There are eight columns of data, with headings X1, Y1, X2, Y2, X3, Y3, X4, Y4. Import the data into Statdisk using the MHA610_Week 5_Discussion_regression_data.csv file (available in the classroom), and perform the following analyses.
 Calculate the regressions of Y1 on X1, Y2 on X2, Y3 on X3, and Y4 on X4, and compare the results (summary statistics). Explain what, if anything, you find unusual about these
 Plot each set of data, along with the fitted regression line. Describe what the graphs tell you about the relationships between the X’s and the Y’s.
 Explain what lessons you draw from this
Place the summary statistics and the plots in a separate Word document and attach that document to your initial post. Address the questions in the body of your initial discussion post.
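If you would like to check the regression output from Statdisk or Excel by hand, the least-squares calculation can be sketched as follows. The function name fit_line is hypothetical and the five data points are made up; they are deliberately not the discussion data, so that the comparison across the four datasets remains yours to discover.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope, intercept, and Pearson correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Made-up values, not the assignment data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept, r = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
```

The summary statistics compress each dataset into three numbers; the residuals and the scatterplot retain the shape of the relationship, which is why the discussion asks for both.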
Guided Response: Respond to at least two of your peers by Day 7, 11:59PM. Do your summary statistics and plots agree with those of your colleague? If not, how and why do they differ? Did your colleague’s conclusions broaden your perspective on linear regression? All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.
Assignment
Brain Size and Intelligence. Due by Day 7. Background: Is brain size a measure of intelligence? Brain size tends to vary with body size: for example, sperm whales and elephants have brains up to five times as massive as human brains. So across species, brain size is not a perfect measure of intelligence. And within species, the underlying organization (complexity of connections) and molecular activity of the brain are likely to be more directly associated with intelligence than mere size.
In this assignment, we will investigate relationships between physiological measures of the brain and intelligence. Download and open the Excel workbook, MHA610_Week 5_Assignment_Brain_Data.xls (available in the classroom). The workbook contains data on 20 youths, in rows two through 21. Nine variables (the columns) were recorded on each individual; the column headings are given in row one. The column headings are as follows:
IQ the individual’s IQ
ORDER the birth order (1 = firstborn, 2 = not firstborn)
PAIR marker for genotype
SEX gender, 1 = male, 2 = female
CCSA corpus callosum surface area (in cm^2)
HC head circumference (in cm)
TOTSA total brain surface area (in cm^2)
TOTVOL total brain volume (in cm^3)
WEIGHT body weight (in kg)
The neuroanatomical measures CCSA, TOTSA, and TOTVOL were determined from magnetic resonance imaging (MRI) of the brains, followed by automated image analyses of the scans. The corpus callosum is a bundle of neural fibers beneath the cortex, connecting the left and right cerebral hemispheres of the brain; it is the communication highway between the two hemispheres. (The more lanes to the highway, the faster the traffic ought to flow.)
The following questions can be answered in Excel, StatDisk, or other statistics software you may have available.
 Examine all of the pairwise correlations among the physiological measures CCSA, HC, TOTSA, TOTVOL, and WEIGHT. Which two variables have the strongest correlation? Report the correlation, and plot the scattergram for these two variables. Also, report the correlation and plot the scattergram for the two variables that have the weakest correlation.
 Determine whether the physiological parameters CCSA, HC, TOTSA, TOTVOL, and WEIGHT are significant predictors of IQ. That is, run a sequence of univariate regressions, with IQ as the dependent variable, and the physiological parameters as the independent variables. Report the best univariate regression with statistics and a graph of the regression. Describe whether IQ can be accurately predicted from any of these brain measures individually or in combination.
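For readers working in other software, the search for the strongest and weakest pairwise correlations can be sketched as below. Everything here is illustrative: the variable names X, Y, and W and their values are made up, and strength is judged by the absolute value of the Pearson correlation, which is an assumption about how "strongest" and "weakest" should be read.

```python
import itertools
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Made-up columns with hypothetical names; replace with the workbook columns.
data = {
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [1.0, 3.0, 2.0, 5.0, 4.0],
    "W": [3.0, 1.0, 4.0, 1.0, 5.0],
}

# Correlation for every unordered pair of columns.
pairs = {
    (a, b): pearson_r(data[a], data[b])
    for a, b in itertools.combinations(data, 2)
}
strongest = max(pairs, key=lambda pair: abs(pairs[pair]))
weakest = min(pairs, key=lambda pair: abs(pairs[pair]))

for pair, r in pairs.items():
    print(pair, round(r, 4))
print("strongest:", strongest, "weakest:", weakest)
```

With five variables there are ten pairs to examine, so a loop of this kind saves some bookkeeping; the scattergrams still need to be drawn and inspected separately.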
BONUS. Power law distributions, that is, functional relationships between two variables in which one variable is roughly a power of the other, are often used to model physiological data. One of the oldest power laws, the square-cube law, was introduced by Galileo in the 1600s: empirically, the square-cube law states that as a shape grows in size, its volume grows faster than its surface area. We shall investigate the square-cube law with two variables from our dataset, CCSA and TOTVOL. If CCSA varies with some power of TOTVOL, for example, CCSA = k * (TOTVOL)^a (where k is an unknown constant and a is the unknown exponent), then a simple way of estimating the exponent is via linear regression: take log(CCSA) as the dependent variable and log(TOTVOL) as the independent variable; the fitted regression coefficient (slope) is an estimate of the exponent. (Do you see why this is true?) Perform this linear regression, and report your results. Describe whether the regression coefficient is significantly different from 2/3. (The 2/3 power law occurs often in nature.)
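The log-log regression can be sketched as follows. Taking logarithms of a power relationship y = k * x^a gives log(y) = log(k) + a * log(x), so the slope of the fitted line estimates the exponent. The data below are synthetic, generated from an exponent of 2/3 with small made-up multiplicative noise; they are not the brain data, and the function name loglog_fit is hypothetical.

```python
import math

def loglog_fit(xs, ys):
    """Regress log(y) on log(x); return slope, intercept, and slope standard error."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(lx, ly))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope

# Synthetic data: y = 2 * x^(2/3) with small made-up multiplicative noise.
xs = [8.0, 27.0, 64.0, 125.0, 216.0]
noise = [1.05, 0.97, 1.02, 0.98, 1.01]
ys = [2.0 * x ** (2.0 / 3.0) * e for x, e in zip(xs, noise)]

slope, intercept, se_slope = loglog_fit(xs, ys)
# Test statistic for H0: exponent = 2/3; compare with a t distribution on n - 2 df.
t_stat = (slope - 2.0 / 3.0) / se_slope
print(f"estimated exponent = {slope:.4f} (SE {se_slope:.4f}), t = {t_stat:.3f}")
```

Note that the null value in the test is 2/3 rather than the usual zero, so the t statistic reported by default in most software (which tests for a slope of zero) has to be recomputed as shown.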
Week Six
Course Content
To be completed during the sixth week of class
Overview
Activity  Due Date  Format  Grading Percent 
Health and Nutritional Status  Day 3 (1st post)  Discussion  4 
Week Six Quiz  Day 6  Quiz  8 
Final Project  Day 7  Assignment  18 
Weekly Learning Outcomes
This week students will
 Explain analysis of variance as a generalization of two-sample z and t tests.
 Apply one-way and two-way analyses of variance to
Introduction
This week, you will be introduced to a class of statistical procedures known collectively as analysis of variance (ANOVA). In its basic form, ANOVA is a statistical procedure for assessing whether the means of several groups are equal. So, for example, a one-way ANOVA is a straightforward generalization of t tests when you are presented with more than two groups of observations. Two-way and higher-order ANOVAs allow simultaneous inference on separate groupings and are qualitatively similar to multivariable linear regression in concept and aims.
Required Resources
Text
Triola, M.M., & Triola, M.F. (2006). Biostatistics for the Biological and Health Sciences. Boston, MA: Pearson Education, Inc.
 Chapter 11: Analysis of Variance
 After reading the chapter, review your grasp of the material in Chapter 11 by solving the odd-numbered questions in the Review Exercises and the Cumulative Review Exercises at the end of Chapter 11. Solutions to these problems are given at the end of the text.
Triola, M. M., & Triola M. F. (2006). [Student companion website].
Boston, MA: Pearson Education, Inc. Retrieved from:
Supplemental Materials
 Koziol, J. (2014). MHA610_Week 6_Discussion_NNYFS_workingdata [Statdisk file].
 Koziol, J. (2014). MHA610_Week 6_Discussion_NNYFS_workingdata [Excel file].
Websites
Centers for Disease Control and Prevention. (2014). Retrieved from
 This website houses data that is useful in the Health and Nutritional Status discussion for this week.
Centers for Disease Control and Prevention. (2014). Retrieved from
 This website will assist you with the final project for this
Recommended Resources
Multimedia
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+births%29/0_zipwy4i7
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+births%29/0_ignho54w
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+brainsize%29/0_qhhcxu1d
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+crossover%29/0_srbv0wj1
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
+HCCtest%29/0_4qwt7r8z
Koziol, J. (Producer). (2014). [Video file]. Retrieved from
Discussion
Participate in the following discussion:
Health and Nutritional Status. 1st Post Due by Day 3. Since 1971, the National Center for Health Statistics has been assessing the health and nutritional status of both children and adults in the United States, through periodic National Health and Nutrition Examination Surveys (NHANES). These surveys are an invaluable resource to epidemiological and public health research; the surveys can be used to determine the prevalence of major diseases and risk factors, to assess nutrition and health promotion, and to guide public health policy.
All initial and peer postings should be at least 250-500 words in APA format supported by scholarly sources.
In 2012, the NHANES National Youth Fitness Survey (NNYFS) was conducted in conjunction with NHANES to obtain physical activity and fitness levels of U.S. youths aged 3 through 15. Initial data from the NNYFS were released in 2013 and serve as the basis for this discussion problem.
Begin by downloading the Excel file MHA610_Week 6_Discussion_NNYFS_workingdata.xls (available in the classroom). This workbook was created by merging two datasets from the NNYFS: the and the For the purposes of this discussion, many variables were eliminated from the original datasets, as well as observations with missing data on height and weight. The Excel workbook thus consists of one worksheet, with 1576 rows (the first row contains headers, and the next 1575 rows are observed values for the participants), and 11 columns of variables. The columns in the Excel file are the following:
SEQN the respondent sequence number (index for all the files)
RIAGENDR gender of the participant, 1 = male, 2 = female
RIDRETH1 race/Hispanic origin:
 1 = Mexican American
 2 = other Hispanic
 3 = non-Hispanic white
 4 = non-Hispanic black
 5 = other
RIDEXAGY age in years at time of physical exam
INDHHIN2 annual household income, categorized
INDFMIN2 annual family income, categorized
INDFMPIR ratio of family income to poverty, 0 to 5
BMXWT weight, in kg
BMXHT height, in cm
BMXBMI body mass index (kg/m^2)
BMDBMIC BMI category:
 1 = underweight
 2 = normal weight
 3 = overweight
 4 = obese
 . = missing
For purposes of this discussion, you are asked to answer the three following questions:
 Does BMI vary significantly between boys and girls?
 Does BMI vary significantly among the racial/ethnic groups?
 Is there any trend to BMI with age?
Comments:
There are several ways to address these questions. For example, you might take BMXBMI as your outcome variable of interest: it is continuous, so you could then perform a two-sample t test for (1), a one-way analysis of variance for (2), and a simple regression analysis (with age as the predictor variable) for (3).
Alternatively, you might reduce the problem to consideration of binomial probabilities: for example, you could classify everyone as obese or not obese (or maybe, overweight/obese vs underweight/normal), then compare binomial outcomes for (1) and (2) (z tests with the normal approximation or contingency tables), and conduct a t test on ages for (3).
Neither approach is wrong—the key is interpreting your findings!
If you prefer to do the analyses in Statdisk, there is a file, MHA610_Week 6_Discussion_NNYFS_workingdata.csv (available in the classroom), ready to be read into Statdisk. (It’s the original Excel workbook, saved as csv.) No need to go through any additional steps, unless you wish to restructure the data in Excel.
Incidentally, the income variables are not needed for these questions, but as a bonus, you might want to investigate whether obesity is related to socioeconomic status (as reflected by family income).
Guided Response: Respond by Day 7, 11:59 PM, to at least two of your peers who chose a different method of analysis than you did. Did you arrive at the same conclusions as your colleague even though you chose different methods? If so, which method do you think is preferable and why? If not, which method do you believe produces more credible results and why? (You might consult the text to support your argument.) All initial and peer postings should be 250 to 500 words in APA format, supported by scholarly sources.
Quiz
Week Six Quiz. Due by Day 6. Complete this quiz on the readings from Weeks Four through Six. It may be helpful to review the odd-numbered questions from your text that you completed in Weeks Four, Five, and Six. There is no time limit for this quiz. You will have two attempts to take the quiz. If multiple attempts are made, eCollege will take the last grade earned, not the highest grade earned.
Final Project
Final Project. Due by Day 7. In this final assignment, we will revisit datasets that we have utilized in previous assignments, but with new objectives.
 In the Week One assignment, you looked at mortality in your particular state, with two different metrics: the first was numbers of deaths, and the second was years of life lost. For this question, return to the original dataset, but this time first pool all cancer causes of death together, so that cancer constitutes the only category for cause of death. Then, repeat your analyses from Week One. How do your conclusions change?
 In the Week Two assignment, you looked at sex ratios for births in your state.
 Take the data you have assembled from the second part of your Week Two assignment, namely, numbers of firstborn boy and girl births in your state between 2007 and 2012, separately by racial group (i.e., American Indians, Asians, Blacks, and Whites). Form a two-by-four contingency table from these data: the two row categories are female (girl) and male (boy), and the four column categories are the four racial groups. Calculate the chi-square statistic from this contingency table, and interpret the
 Return to the website, and obtain the numbers of births in your state between 2007 and 2012, by month. (Disregard gender, race, and birth order; you want all births.) Calculate a chi-square statistic to assess whether there is any seasonality to births. (Your null hypothesis is that births should be equally likely to occur in any of the 12 months. We are ignoring the varying lengths of the months to simplify calculations.) How would you interpret your findings? Explain in 500 words in APA format supported by scholarly sources.
BONUS: Give a graphical representation of your findings for this portion highlighting what you consider significant.
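The seasonality test described above is a chi-square goodness-of-fit test against equal expected counts in the 12 months. A minimal sketch follows; the monthly counts are made up, not taken from any state, and the function name is hypothetical. With 12 categories there are 11 degrees of freedom, and standard tables give 19.675 as the 0.05 critical value.

```python
def chi_square_uniform(counts):
    """Goodness-of-fit chi-square against equal expected counts, with its df."""
    expected = sum(counts) / len(counts)
    stat = sum((observed - expected) ** 2 / expected for observed in counts)
    return stat, len(counts) - 1


# Hypothetical births for January through December (illustration only).
monthly_births = [980, 940, 1000, 990, 1010, 1020,
                  1060, 1070, 1040, 1010, 950, 930]

CRITICAL_0_05_DF_11 = 19.675  # chi-square table value for df = 11, alpha = 0.05

stat, df = chi_square_uniform(monthly_births)
print("chi-square:", stat, "df:", df)
print("reject equal-months hypothesis at 0.05:", stat > CRITICAL_0_05_DF_11)
```

The two-by-four contingency table in the preceding item uses the same observed-versus-expected idea, with expected counts computed from row and column totals and (2 - 1) x (4 - 1) = 3 degrees of freedom.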
 In the Week Three assignment, you were given levels of tumor-associated antigens in a sample of 90 normal (non-cancer) individuals, and 160 hepatocellular carcinoma (HCC) patients. Here is a proposed diagnostic test for HCC:
 For each individual, calculate a numerical score:
 score = 3.95 + 10.7 * HCC1 – 4.14 * P16 + 13.95 * P53 + 28.92 * P90 + 6.48 * survivin
 (This equation was derived from logistic regression.)
 If this score is positive (i.e., > 0), diagnose this individual as an HCC patient; if this score is negative (i.e., < 0), diagnose this individual as normal (i.e., non-cancer).
 Apply this rule to the entire cohort of 250 individuals. Report the sensitivity of this rule, the specificity, the false positive rate, the false negative rate, and the overall accuracy. Do you think the score function provides a good diagnostic test for HCC?
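The rule and the requested measures can be sketched as follows. The coefficients are copied from the score equation exactly as printed above and should be checked against the classroom version. The function names, the antigen levels, and the eight (score, true status) pairs are hypothetical, and a score of exactly 0, which the rule does not cover, is treated here as a normal diagnosis by assumption.

```python
def hcc_score(hcc1, p16, p53, p90, survivin):
    """Score from the equation as printed in the assignment text."""
    return (3.95 + 10.7 * hcc1 - 4.14 * p16 + 13.95 * p53
            + 28.92 * p90 + 6.48 * survivin)


def diagnostic_metrics(scored):
    """Summarize (score, is_hcc) pairs; a positive score is a positive test."""
    tp = sum(1 for s, is_hcc in scored if s > 0 and is_hcc)
    fn = sum(1 for s, is_hcc in scored if s <= 0 and is_hcc)
    fp = sum(1 for s, is_hcc in scored if s > 0 and not is_hcc)
    tn = sum(1 for s, is_hcc in scored if s <= 0 and not is_hcc)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
        "accuracy": (tp + tn) / len(scored),
    }


# Made-up scores with true disease status (True = HCC), for illustration only.
toy = [(2.1, True), (0.4, True), (-0.3, True), (1.2, True),
       (-1.5, False), (-0.2, False), (0.7, False), (-2.0, False)]

metrics = diagnostic_metrics(toy)
print(metrics)
print(hcc_score(0.1, 0.2, 0.0, 0.0, 0.1))  # score for one hypothetical individual
```

For the actual cohort, each of the 250 individuals would contribute one pair built from `hcc_score(...)` and the known group (normal or HCC).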
 In the Week Four assignment, we considered a simple two-by-two crossover trial of a new experimental treatment for interstitial cystitis. We calculated t tests for carryover and treatment effects, but we have not yet considered period effects. It is unlikely that there are any period effects in this trial, but we may want to test this formally. If there were a period effect, then patient responses under either treatment would likely be systematically higher in one period than the other. (Here’s an analogy: Think of taking the same test twice. You would likely perform better on the test the second time, since you have learned from your experience of taking the first test.) Explain how you would devise a t test for assessing a period effect in this trial. (Hint: look at the explanation of the t test for treatment effects given in the Week Four assignment. There, we based the test on the random variable X – Y. Suppose we look instead at X + Y?)
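One way the hint could be read is sketched below. It rests on an assumption that is not stated in this section: that X and Y denote the mean within-patient period differences (period 1 minus period 2) in the two sequence groups. Under that reading, the treatment effect cancels in X + Y and the period effect cancels in X – Y, while both quantities share the same pooled standard error. The function name and the data are hypothetical.

```python
from statistics import mean, variance


def crossover_t(diffs_seq1, diffs_seq2, sign):
    """t statistic and df from period differences (period 1 minus period 2).

    sign = -1 contrasts the two sequence groups (treatment effect);
    sign = +1 adds them (period effect).
    """
    n1, n2 = len(diffs_seq1), len(diffs_seq2)
    pooled_var = ((n1 - 1) * variance(diffs_seq1)
                  + (n2 - 1) * variance(diffs_seq2)) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    estimate = mean(diffs_seq1) + sign * mean(diffs_seq2)
    return estimate / se, n1 + n2 - 2


# Made-up period differences for the two sequence groups (illustration only).
seq1 = [3.0, 1.0, 2.0, 2.0]      # patients who received the new treatment first
seq2 = [-2.0, -1.0, -3.0, -2.0]  # patients who received it second

t_treatment, df = crossover_t(seq1, seq2, sign=-1)
t_period, _ = crossover_t(seq1, seq2, sign=+1)
print("treatment t:", t_treatment, "period t:", t_period, "df:", df)
```

In this toy data the two groups have equal and opposite mean differences, so the period statistic is zero while the treatment statistic is large.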
 In the Week Five assignment, you investigated measures of brain size and intelligence in a sample of 20 youths. A potential shortcoming of your prior analyses is that you did not take into account all available information in the dataset, in particular, gender. Answer the following questions and explain your answers:
 Do any of the physiologic variables CCSA, HC, TOTSA, TOTVOL, and WEIGHT differ significantly between males and females?
 Do IQs differ significantly by gender?
 Undertake a paired analysis of IQs, in order to assess whether firstborns have higher IQs than nonfirstborns. In this regard, there are 10 pairs of related youths, as denoted by the variable PAIR.
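The paired analysis in the last item can be sketched as a one-sample t test on within-pair differences. The IQ values below are made up, and the function name is hypothetical; in the real dataset the pairs would be formed by matching youths on the variable PAIR.

```python
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and df for the mean of (first - second)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t, n - 1


# Hypothetical IQs for 10 pairs, listed in the same pair order (illustration only).
firstborn = [100, 105, 98, 110, 102, 95, 108, 101, 99, 104]
non_firstborn = [98, 104, 99, 107, 100, 96, 105, 100, 97, 103]

t, df = paired_t(firstborn, non_firstborn)
print("paired t:", t, "df:", df)
```

Because the question asks whether firstborns have higher IQs, a one-sided test is the natural reading; with 9 degrees of freedom the one-sided 0.05 critical value from t tables is 1.833.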
Completing the Final Project
The Final Project:
 Must include a title page with the following:
 Title of paper
 Student’s name
 Course name and number
 Instructor’s name
 Date submitted
 Must contain 5 sections, each starting on a new page; the section headings can be called Question 1, Question 2, Question 3, Question 4, Question 5
 Each section must have two subsections, with headings Results and Conclusions
 The Results subsections must include your analyses of that particular question. Your results may include figures, tables, and statistical analyses, laid out in a logical
 The Conclusions subsections must contain your inferences relative to that question based on your results, and any discussion points you wish to
 Length of the Results subsections must vary by question, but should encompass all of your relevant
 Length of the Conclusions subsections typically will not exceed one
 If you have used any external references (e.g., the text), you should include a separate reference page, formatted according to APA style as outlined in the Ashford Writing
Course Map
The course map illustrates the careful design of the course through which each learning outcome is supported by one or more specific learning activities in order to create integrity and pedagogical depth in the learning experience.
Learning Outcome  Week  Activity
1. Apply basic statistical principles for describing, analyzing, and interpreting health data.
Week 1  § U.S. Mortality Rates Assignment
Week 1  § Hospital Patient Data – Discussion
Week 2  § Sex Ratios Assignment
Week 2  § Games of Chance – Discussion
Week 3  § Immune Responses Assignment
Week 3  § Confidence Intervals – Discussion
Week 3  § Week Three Quiz
Week 4  § A Crossover Clinical Trial Assignment
Week 4  § t-tests and Confidence Intervals for Continuous Data – Discussion
Week 5  § Brain Size and Intelligence Assignment
Week 5  § Graphs with Linear Regression – Discussion
Week 6  § Final Project Assignment
Week 6  § Week Six Quiz
Week 6  § Health and Nutritional Status – Discussion
2. Apply statistical methods of estimation and hypothesis testing in biostatistics and epidemiology.
Week 2  § Sex Ratios Assignment
Week 2  § Games of Chance – Discussion
Week 3  § Confidence Intervals – Discussion
Week 3  § Immune Responses Assignment
Week 3  § Week Three Quiz
Week 4  § A Crossover Clinical Trial Assignment
Week 4  § t-tests and Confidence Intervals for Continuous Data – Discussion
Week 5  § Graphs with Linear Regression – Discussion
Week 6  § Final Project Assignment
Week 6  § Week Six Quiz
Week 6  § Health and Nutritional Status – Discussion
3. Analyze relationships between quantitative variables using correlation and linear regression.
Week 5  § Brain Size and Intelligence Assignment
Week 5  § Graphs with Linear Regression – Discussion
Week 6  § Final Project Assignment
Week 6  § Week Six Quiz
Week 6  § Health and Nutritional Status – Discussion
4. Evaluate health care delivery and services using epidemiological data and appropriate statistical methods.
Week 1  § U.S. Mortality Rates Assignment
Week 2  § Sex Ratios Assignment
Week 3  § Immune Responses Assignment
Week 4  § A Crossover Clinical Trial Assignment
Week 4  § t-tests and Confidence Intervals for Continuous Data – Discussion
Week 5  § Brain Size and Intelligence Assignment
Week 6  § Health and Nutritional Status – Discussion
Week 6  § Final Project Assignment
5. Communicate the findings and implications from statistical analyses to health care administration.
Week 1  § U.S. Mortality Rates Assignment
Week 1  § Hospital Patient Data – Discussion
Week 2  § Sex Ratios Assignment
Week 3  § Immune Responses Assignment
Week 3  § Confidence Intervals – Discussion
Week 4  § t-tests and Confidence Intervals for Continuous Data – Discussion
Week 4  § A Crossover Clinical Trial Assignment
Week 5  § Brain Size and Intelligence Assignment
Week 6  § Final Project Assignment
Week 6  § Health and Nutritional Status – Discussion
MHA610 Alignment Map
The alignment map illustrates the careful design of the course through the alignment of course learning outcomes with both program learning outcomes and professional standards, to ensure integrity and pedagogical depth in the learning experience.
Course Learning Outcome  Program Learning Outcome  Professional Standards (ACHE)
1. Apply basic statistical principles for describing, analyzing, and interpreting health data.  PLO8: Apply problem solving approaches in the resolution of health care services.  Knowledge of the Healthcare Environment: The understanding of the healthcare system and the environment in which healthcare managers and providers function.
2. Apply statistical methods of estimation and hypothesis testing in biostatistics and epidemiology.  PLO4: Utilize health care information technology and statistical reasoning in organizational planning and decision-making.  Business Skills and Knowledge: The ability to apply business principles, including systems thinking, to the healthcare environment.
3. Analyze relationships between quantitative variables using correlation and linear regression.  PLO4: Utilize health care information technology and statistical reasoning in organizational planning and decision-making.  Business Skills and Knowledge: The ability to apply business principles, including systems thinking, to the healthcare environment.
4. Evaluate health care services using epidemiological data and appropriate statistical methods.  PLO4: Utilize health care information technology and statistical reasoning in organizational planning and decision-making.  Knowledge of the Healthcare Environment: The understanding of the healthcare system and the environment in which healthcare managers and providers function.
5. Communicate the findings and implications from statistical analyses to health care administration.  PLO8: Apply problem solving approaches in the resolution of health care services.  Communication and Relationship Management: The ability to communicate clearly and concisely with internal and external customers, establish and maintain relationships, and facilitate constructive interactions with individuals and groups.