MATH 1308 Signature Statistics Inquiry

Course Code: MATH 1308

University: The University Of Texas At Arlington

Country: United States

Questions:

Describe your data by explaining what variable or variables you are summarizing/displaying and why you chose the variable(s). Include in your description your reasons for how you decided to summarize and/or display your data in the way in which you did.

• Provide at least 3 different numerical summaries and/or displays of your data. Some examples of numerical summaries include mean, median, standard deviation, 5-number summary, correlation coefficient, and/or linear regression equation. Some examples of displays include pie chart, Pareto chart, frequency/relative frequency distribution, histogram, stem-and-leaf plot, and/or scatter plot.

• If you collect your own data, choose a topic of great interest to you personally. Your topic may address a social, political, health, educational (or etc.) issue of interest for a specified population of interest. Your topic should be something which can be tested by collecting a sample or doing an experiment. Do not limit your inquiry to such a narrowly defined population that anyone could collect a census to answer the question(s) definitively; the inquiry should lead you to make an inference about a population. Use a random sampling technique to collect data for an observational study, or design an experiment to test a claim. You may use public social media polls, conduct campus surveys, etc. Perhaps you will purchase randomly selected popular candy to determine a color frequency, or purchase and measure pinto beans after soaking in various solutions! Suppose you are interested in the social media habits of college students, so you may ask, “How many social media accounts do you have?” You may consider if there is a difference between males and females in terms of a preferred social media type. Or perhaps you want to know what percentage of college students utilized math tutoring. The ideas are endless!

Answer:

Data Description:

The body temperature data set from StatCrunch was selected for this statistical inquiry and analysis. The data set has three variables, one of which is categorical: Gender, with two categories (male and female). The other two variables are quantitative: body temperature, measured in degrees Fahrenheit, and heart rate, measured in beats per minute (bpm). The sample size is 130 (65 females and 65 males).

Research Proposal:

The first question of interest is whether body temperature in the given sample differs between males and females. The same question can be asked about heart rate. Finally, we can check whether there is any relationship between body temperature and heart rate.

Descriptive Statistics:

The following are the summary statistics for body temperature and heart rate, grouped by gender. The mean body temperatures of males and females look close (98.10 °F and 98.39 °F), so whether they differ needs to be confirmed by a test. For heart rate, the means are also close (73.37 bpm for males and 74.15 bpm for females), but the variances differ considerably (34.52 for males versus 65.69 for females).

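Summary statistics for Body Temp, grouped by Gender:

| Gender | n | Mean | Variance | Std. dev. | Std. err. | Median | Range | Min | Max | Q1 | Q3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Female | 65 | 98.393846 | 0.55277404 | 0.74348775 | 0.092218306 | 98.4 | 4.4 | 96.4 | 100.8 | 98 | 98.8 |
| Male | 65 | 98.104615 | 0.48825962 | 0.69875576 | 0.086669986 | 98.1 | 3.2 | 96.3 | 99.5 | 97.6 | 98.6 |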

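Summary statistics for Heart Rate, grouped by Gender:

| Gender | n | Mean | Variance | Std. dev. | Std. err. | Median | Range | Min | Max | Q1 | Q3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Female | 65 | 74.153846 | 65.694712 | 8.1052274 | 1.0053297 | 76 | 32 | 57 | 89 | 68 | 80 |
| Male | 65 | 73.369231 | 34.517788 | 5.8751841 | 0.7287269 | 73 | 28 | 58 | 86 | 70 | 78 |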
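
As a quick check on the tables above, the standard error column can be reproduced from the variance and sample size as sqrt(variance / n). The sketch below is illustrative only: the function name `standard_error` and the dictionary layout are assumptions, and the numbers are copied from the two summary tables. It also prints the ratio of the female to male heart rate variances, which is the gap noted in the discussion of the summary statistics.

```python
import math

# (variance, n) per variable and gender, copied from the summary tables.
groups = {
    ("Body Temp", "Female"): (0.55277404, 65),
    ("Body Temp", "Male"): (0.48825962, 65),
    ("Heart Rate", "Female"): (65.694712, 65),
    ("Heart Rate", "Male"): (34.517788, 65),
}


def standard_error(variance, n):
    """Standard error of the mean: sqrt(variance / n)."""
    return math.sqrt(variance / n)


for (variable, gender), (variance, n) in groups.items():
    print(f"{variable:10s} {gender:6s} std. err. = {standard_error(variance, n):.6f}")

# Ratio of the larger heart rate variance to the smaller one.
variance_ratio = groups[("Heart Rate", "Female")][0] / groups[("Heart Rate", "Male")][0]
print(f"Heart Rate variance ratio (female / male) = {variance_ratio:.3f}")
```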

Box plots, which give a compact graphical view of the summary statistics, are shown below.

The following is the histogram of Body Temperature. The skewness and kurtosis values indicate how far the distribution of the data departs from a normal distribution.

| Column | Skewness | Kurtosis |
| --- | --- | --- |
| Body Temp | -0.0044191312 | 0.7804574 |

The following diagram is the histogram of Heart Rate. Its departure from a normal distribution is measured by the skewness and kurtosis values given below.

Summary statistics:

| Column | Skewness | Kurtosis |
| --- | --- | --- |
| Heart Rate | -0.17835296 | -0.46302097 |
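
To make the two measures concrete, here is a minimal sketch of moment-based skewness and excess kurtosis. It is not the software's exact formula: statistical packages often apply small-sample adjustments, so their values can differ slightly from this version. The function name and the five-value sample are hypothetical and are not taken from the data set.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


# Hypothetical symmetric sample, used only to illustrate the calculation.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
skew, kurt = skewness_and_excess_kurtosis(sample)
print(f"skewness = {skew:.4f}, excess kurtosis = {kurt:.4f}")
```

A skewness near 0 indicates a roughly symmetric distribution, and under this definition an excess kurtosis near 0 indicates tails similar to those of a normal distribution.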

An easy way to assess the normality of the data is the Q-Q plot. The data points lie close to the reference line, which suggests that the data are approximately normally distributed.

A scatter plot of Body Temperature versus Heart Rate is displayed below. It shows a weak positive correlation between the variables.

Correlation between Body Temp and Heart Rate: 0.2536564 (p-value = 0.0036)

The pair plot above shows how Body Temp and Heart Rate are related within each gender.

Correlation between Body Temp and Heart Rate for females: 0.28693115 (p-value = 0.0205)

Correlation between Body Temp and Heart Rate for males: 0.19558938 (p-value = 0.1184)

Inferential Statistics:

The first test checks whether there is a significant correlation between Body Temp and Heart Rate.

Null Hypothesis: There is no correlation between Body Temp and Heart Rate.

Alternative Hypothesis: There is a correlation between Body Temp and Heart Rate.

We perform this test using the Pearson correlation test, which is based on a t statistic. The p-value of the test is 0.0036 < 0.05, so we can infer that there is a significant correlation between Body Temp and Heart Rate.

When the same test is conducted within each gender, the p-values are 0.021 (< 0.05) for females and 0.118 (> 0.05) for males, which means the correlation is significant among females and not significant among males.
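
The t statistic behind the Pearson correlation test can be computed from r and n alone as t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below applies this to the three correlations reported above; the function name is an assumption, and the p-values themselves come from the software, not from this code.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Correlations and sample sizes as reported in the text.
t_all = correlation_t_statistic(0.2536564, 130)
t_female = correlation_t_statistic(0.28693115, 65)
t_male = correlation_t_statistic(0.19558938, 65)

print(f"all:    t = {t_all:.4f} (df = 128)")
print(f"female: t = {t_female:.4f} (df = 63)")
print(f"male:   t = {t_male:.4f} (df = 63)")
```

For the full sample the result is about 2.967, which agrees with the t statistic of the slope in the regression output further down, as expected for a simple linear regression.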

In the second test, we want to know whether the mean values of Body Temp are significantly different between males and females.

Null Hypothesis (Ho): There is no difference in the mean body temperature of males and females.

Alternative Hypothesis (H1): There is a difference in the mean body temperature of males and females.

To test this hypothesis, we use the two-sample t-test. The output from the software follows. The Shapiro-Wilk test does not reject normality for either sample (p = 0.0902 and p = 0.8545), so the normality assumption is satisfied. Levene's test does not reject equality of variances (p = 0.8051), so the equal variance assumption is also satisfied.

The p-value of the t-test is 0.0239 < 0.05, which means we have enough evidence to reject Ho and conclude that there is a difference in the mean body temperature of males and females. In the sample, the female mean (98.39 °F) is about 0.29 °F higher than the male mean (98.10 °F).

Two sample T hypothesis test:
μ1 : Mean of Body Temp (Male)
μ2 : Mean of Body Temp (Female)
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
(without pooled variances)

Hypothesis test results:

| Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value |
| --- | --- | --- | --- | --- | --- |
| μ1 - μ2 | 0.28923077 | 0.12655395 | 127.5103 | 2.2854345 | 0.0239 |

Shapiro-Wilk normality test results:

| Sample | n | Stat | P-Value |
| --- | --- | --- | --- |
| var5 | 65 | 0.96797487 | 0.0902 |
| var8 | 65 | 0.98940716 | 0.8545 |

Homogeneity of Variance results:
Data stored in separate columns.
Levene's Test for Homogeneity of Variance

| Test Statistic | DF 1 | DF 2 | P-value |
| --- | --- | --- | --- |
| 0.061118127 | 1 | 128 | 0.8051 |

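
The standard error, t statistic and degrees of freedom of the unpooled (Welch) two-sample t-test can be recomputed from the group summary statistics. The sketch below assumes the Welch-Satterthwaite formula for the degrees of freedom, and the function name `welch_t_test` is hypothetical. With males as sample 1, the computed difference and t statistic are negative; their magnitudes match the reported output.

```python
import math


def welch_t_test(mean1, var1, n1, mean2, var2, n2):
    """Return (std. err., t statistic, Welch-Satterthwaite df) from summary stats."""
    a = var1 / n1
    b = var2 / n2
    se = math.sqrt(a + b)
    t = (mean1 - mean2) / se
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return se, t, df


# Body Temp summary statistics: sample 1 = male, sample 2 = female.
se, t, df = welch_t_test(98.104615, 0.48825962, 65,
                         98.393846, 0.55277404, 65)
print(f"std. err. = {se:.6f}, t = {t:.4f}, df = {df:.4f}")
```

The same function can be applied to the Heart Rate summary statistics to check the output of the next test.
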
In the third test, we want to know whether the mean values of Heart Rate are significantly different between males and females.

Null Hypothesis: There is no difference in the mean heart rate of males and females.

Alternative Hypothesis: There is a difference in the mean heart rate of males and females.

Again we use the two-sample t-test, and the output from the software is given below. The Shapiro-Wilk test does not reject normality for either sample (p = 0.1483 and p = 0.7912), so the normality assumption is satisfied. The p-value of the t-test is 0.5287 > 0.05, which means we fail to reject the null hypothesis. We conclude that there is no significant difference in the mean heart rates of males and females.

Two sample T hypothesis test:
μ1 : Mean of Heart Rate (Male)
μ2 : Mean of Heart Rate (Female)
μ1 - μ2 : Difference between two means
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0

Hypothesis test results:

| Difference | Sample Diff. | Std. Err. | DF | T-Stat | P-value |
| --- | --- | --- | --- | --- | --- |
| μ1 - μ2 | 0.78461538 | 1.2416645 | 116.70438 | 0.6319061 | 0.5287 |

Shapiro-Wilk normality test results:

| Sample | n | Stat | P-Value |
| --- | --- | --- | --- |
| var6 | 65 | 0.97206506 | 0.1483 |
| var9 | 65 | 0.98813546 | 0.7912 |

Regression Analysis:

Consider the model y = a + b*x + error, where y is body temperature and x is heart rate.

We want to test the significance of the model, so we are interested in two hypotheses:

Null: b = 0
Alternative: b ≠ 0

Null: a = 0
Alternative: a ≠ 0

The following is the output from the software.

Simple linear regression results:
Dependent Variable: Body Temp
Independent Variable: Heart Rate
Body Temp = 96.306754 + 0.026334549 Heart Rate
Sample size: 130
R (correlation coefficient) = 0.2536564
R-sq = 0.064341571
Estimate of error standard deviation: 0.71196889

Parameter estimates:

| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-value |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 96.306754 | 0.65770318 | ≠ 0 | 128 | 146.4289 | <0.0001 |
| Slope | 0.026334549 | 0.0088763359 | ≠ 0 | 128 | 2.9668265 | 0.0036 |

Analysis of variance table for regression model:

| Source | DF | SS | MS | F-stat | P-value |
| --- | --- | --- | --- | --- | --- |
| Model | 1 | 4.4617613 | 4.4617613 | 8.8020594 | 0.0036 |
| Error | 128 | 64.883162 | 0.5068997 | | |
| Total | 129 | 69.344923 | | | |

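From the output, we can see that both the constant (a) and the slope (b) are significantly different from zero (p < 0.0001 and p = 0.0036, respectively). The R-sq value of 0.064 shows, however, that heart rate explains only about 6.4% of the variation in body temperature. The estimated line is given above, and the residuals are approximately normally distributed. We can see from the graphs that the assumptions of simple linear regression are satisfied.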
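
The quantities in the regression output are linked by simple identities: F = MS(Model) / MS(Error), R-sq = SS(Model) / SS(Total), and the error standard deviation is sqrt(MS(Error)). The sketch below recomputes them from the sums of squares in the analysis of variance table and uses the fitted line for a prediction. The helper `predict_body_temp` and the example heart rate of 74 bpm are hypothetical, chosen only for illustration.

```python
import math

# Sums of squares and degrees of freedom from the analysis of variance table.
ss_model, ss_error = 4.4617613, 64.883162
df_model, df_error = 1, 128

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error
r_squared = ss_model / (ss_model + ss_error)
error_sd = math.sqrt(ms_error)


def predict_body_temp(heart_rate):
    """Fitted line from the output: Body Temp = 96.306754 + 0.026334549 * Heart Rate."""
    return 96.306754 + 0.026334549 * heart_rate


print(f"F = {f_stat:.4f}, R-sq = {r_squared:.6f}, error sd = {error_sd:.6f}")
print(f"sqrt(F) = {math.sqrt(f_stat):.4f} (compare with the slope T-Stat)")
print(f"Predicted body temp at 74 bpm: {predict_body_temp(74):.2f} F")
```

In a simple linear regression the square root of F equals the absolute t statistic of the slope, which is why both carry the same p-value of 0.0036.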
