Due: July 2
You (and your partner, if you choose) have been hired as data analysts for a Major League Baseball team. The team is interested in analyzing data about various factors related to performance on the baseball field.
Using Excel’s random number generator, select one team and one month from the 2016 season. Note: you must use the given Excel template; otherwise, your team and month numbers will keep recalculating. To prevent this, check that Calculation Options (under the Formulas tab) is set to Manual.
To facilitate this process, the data has been grouped by month, though the team is ultimately interested in drawing statistical conclusions for all seasons. Using the link below, with your team’s initials in place of XXX, locate the appropriate batting game logs for the month.
Copy the appropriate month’s data into the Excel template and use the appropriate functions and tools to complete the following analyses. You may rearrange the data as needed to simplify your analyses.
Calculate the Mean, Median, and Mode for the following game information for your assigned month:
· 2B: Number of Doubles hit in each game
· HR: Number of Home Runs hit in each game
· BB: Number of Walks drawn in each game
· SO: Number of Strikeouts in each game
· GDP: Number of Double Plays grounded into in each game
· R: Number of Runs scored in each game
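The analyses themselves are to be completed in the Excel template (AVERAGE, MEDIAN, and MODE.SNGL or MODE.MULT are the usual worksheet functions). As an optional cross-check, the three measures can be sketched in Python. This is a minimal sketch only: the `runs` values and the helper name `central_tendency` are hypothetical and are not taken from any real game log.

```python
import statistics

# Hypothetical runs-per-game values for illustration only; not real game-log data.
runs = [3, 5, 2, 5, 7, 0, 4, 5, 1, 6]

def central_tendency(values):
    """Return the mean, the median, and every mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns all values tied for most frequent, in first-seen order
        "modes": statistics.multimode(values),
    }

summary = central_tendency(runs)
print(summary)  # mean 3.8, median 4.5, modes [5]
```

Reporting every mode is a deliberate choice here: a month of game data can easily have ties, and a single reported mode would hide that.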
Calculate measures of variability for each variable: Range, Standard Deviation, and Coefficient of Variation.
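For reference, the three variability measures can be sketched in Python as a cross-check on the Excel results. The sketch assumes the sample standard deviation (the n - 1 form, which corresponds to Excel's STDEV.S); the assignment does not say whether the sample or population form is intended, so confirm that before relying on it. The `strikeouts` values and the helper name `variability` are hypothetical.

```python
import statistics

# Hypothetical strikeouts-per-game values for illustration only.
strikeouts = [8, 10, 6, 12, 9, 7, 11, 9]

def variability(values):
    """Return range, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample form (divides by n - 1): an assumption
    return {
        "range": max(values) - min(values),
        "stdev": stdev,
        # CV = standard deviation / mean; undefined when the mean is zero
        "cv": stdev / mean if mean != 0 else None,
    }

spread = variability(strikeouts)
print(spread)  # range 6, stdev 2.0, cv about 0.222
```

Because the coefficient of variation is unit-free, it allows the spread of variables on different scales (for example, home runs versus strikeouts) to be compared directly.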
Create a frequency distribution and histogram (with appropriate classes/widths if necessary) displaying runs scored per game across the month.
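A frequency distribution with equal-width classes can be sketched in Python as follows, as a way to sanity-check the classes built in Excel. The class width of 2, the `runs` values, and the helper name `frequency_distribution` are all assumptions made for illustration; the appropriate classes depend on the month's data.

```python
from collections import Counter

# Hypothetical runs-per-game values for illustration only; not real game-log data.
runs = [3, 5, 2, 5, 7, 0, 4, 5, 1, 6, 11, 8]

def frequency_distribution(values, width=2):
    """Group non-negative integer counts into classes of equal width.

    Returns (lower, upper, frequency) tuples, including empty classes.
    """
    if width < 1:
        raise ValueError("class width must be at least 1")
    # Map each value to the lower bound of its class, then count per class.
    counts = Counter((v // width) * width for v in values)
    return [
        (low, low + width - 1, counts.get(low, 0))
        for low in range(0, max(values) + 1, width)
    ]

table = frequency_distribution(runs, width=2)
for low, high, freq in table:
    # A simple text histogram: one asterisk per game in the class.
    print(f"{low:>2}-{high:<2} | {'*' * freq} ({freq})")
```

Empty classes are kept in the table so that gaps in the histogram remain visible rather than being silently dropped.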
Calculate the correlation between the Dependent Variable – Number of Runs Scored – and each of the Independent Variables above.
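The correlation in question is the Pearson correlation coefficient, which is what Excel's CORREL function computes. A minimal Python sketch of the same calculation, usable as a cross-check, is shown below. All of the per-game values and the helper name `pearson_r` are hypothetical; only three of the independent variables are included to keep the example short.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-game values for illustration only; not real game-log data.
runs = [4, 2, 6, 3, 8, 1]
independent = {
    "HR": [1, 0, 2, 1, 3, 0],
    "BB": [3, 2, 4, 1, 5, 2],
    "SO": [9, 11, 7, 10, 6, 12],
}

correlations = {name: pearson_r(values, runs) for name, values in independent.items()}

# Rank by absolute value: strength is about magnitude, not sign.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Ranking by absolute value matters because a strongly negative correlation (for example, more strikeouts going with fewer runs) is still a strong relationship.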
Summarize your analyses in a report addressed to the manager of your team (you’ll have to look it up). Your report should include, but is not limited to…
· Your interpretation of the measures of Central Tendency and Variability. What do these values mean and which best describe the variables of interest?
· What can we learn from the frequency distribution and histogram?
· Which variable has the strongest relationship with the number of runs scored? Which has the weakest relationship? What is your interpretation of these relationships?
· What other variables would be valuable for drawing conclusions? Identify and analyze at least one variable beyond those given above. Why is this variable of interest? What conclusions can we draw from the statistical analysis of this additional variable?
· In light of all of these analyses, what specific recommendations do you have for the manager and your team?
Explanations/Rationale should be included for all analyses and conclusions. In addition to your report, you should include a copy of your Excel workbook including the completed analyses. Your report and Excel data file should be submitted to Blackboard. In addition, a printed copy of your report should be turned in during class. Reports should be double-spaced, 12-point Times New Roman.