ESM 250: Applied Statistical Concepts
Dr. Louis Rocconi
This handout demonstrates how to conduct the statistical analyses covered in ESM 250.
It is recommended that you use an R Project for each course or major assignment. This avoids having to set a working directory or write long file paths, and you can move your files anywhere: as long as the R Project .Rproj file is included, there should be no issues.
To create an R Project, simply go to File > New Project…. If you want to create a new directory, select New Directory. On the next screen, choose New Project, locate where you want to put your project, and click Create Project. If you have a directory already and you just want to make it into a project, click Existing Directory.
To access a folder within your R Project folder, you can use a simple relative path such as read.csv("data/data.csv"). To access a file outside of your R Project folder, you can use .. to move up one level: read.csv("../other_data/data.csv").
Packages are installed by typing install.packages("package name") in the R console. To replicate the examples in this document, you will need to install the following packages (Note: you only need to install packages one time):
install.packages("janitor")
install.packages("effectsize")
install.packages("ggplot2")
Loading a package can be done with the following code: library(package_name). It is recommended to load all packages at the top of your script file.
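For example, a script that replicates this handout might begin by loading the packages installed above:
# load all packages used in this handout
library(janitor)
library(effectsize)
library(ggplot2)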
You can load many different data types using R. In this class, we will mainly use data stored as a CSV file:
CSV files with read.csv from the base R package:
read.csv("name_of_file.csv")
read.csv("http://www.website.com/internet/file.csv")
Throughout this tutorial, we will use data from the High School and Beyond survey from the National Center for Education Statistics. A codebook for the data is provided in the following table.
Variable Name | Description | Values
---|---|---
id | identification number | Range: 1-200 |
female | gender | male, female |
race | race/ethnicity | white, african-amer, hispanic, asian |
ses | socioeconomic status | low, middle, high |
schtyp | school type | public, private |
prog | type of school program | general, vocational, academic |
read | Standardized measure of reading ability | Range: 28-76 |
write | Standardized measure of writing ability | Range: 31-67 |
math | Standardized measure of mathematics literacy | Range: 33-75 |
science | Standardized measure of science literacy | Range: 26-74 |
socst | Standardized measure of social studies literacy | Range: 26-71 |
The High School and Beyond data may be loaded in R using the following code:
dat <- read.csv("https://lrocconi.github.io//files//hsb2.csv")
Excel files with read_excel from the readxl package (you may need to modify function defaults for read_excel to work properly):
read_excel("name_of_file.xlsx")
Personally, I like to convert Excel files to a CSV file and load the CSV file into R.
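A minimal sketch of that convert-to-CSV workflow, assuming a hypothetical file named scores.xlsx:
# read the Excel file once, save a CSV copy, and work from the CSV
library(readxl)
xl_dat <- read_excel("scores.xlsx")   # hypothetical file name
write.csv(xl_dat, "scores.csv", row.names = FALSE)
dat_from_csv <- read.csv("scores.csv")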
The haven package reads data from other statistical software packages:
read_spss("name_of_file.sav") or read_sav reads SPSS data files
read_dta reads Stata data files
read_sas reads SAS data files
haven can also save a data set as an SPSS, Stata, or SAS data file using the write_sav, write_dta, and write_sas functions.
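For example, round-tripping an SPSS file might look like this (a sketch with hypothetical file names):
# read an SPSS data file, then save it back out for Stata users
library(haven)
survey <- read_sav("survey.sav")   # hypothetical file name
write_dta(survey, "survey.dta")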
R is an object-oriented programming language. That means that results are stored as objects that you can retrieve and manipulate later. As such, you save data and results as R objects. You can assign with either = or <-; however, <- is recommended.
Note: c() means “combine” and # allows you to make comments in your code.
# character vector
phrase <- "Good luck in Stats"
#numeric vector
answers <- c(6*5, 6^2, 5*sqrt(23))
# a data frame
dataframe <- data.frame(gender = c("male", "female", "female", "male"),
age = c(18, 19, 19, 20))
# assign data to an object named dat
dat <- read.csv("https://lrocconi.github.io//files//hsb2.csv")
# assign results from a t-test to t_test_results
t_test_results <- t.test(read ~ female, data=dat)
The $ operator is used to subset or name an element of a list. It is also used to refer to a specific variable in a dataframe.
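For example, dat$read pulls the read variable out of the dat dataframe as a vector:
# all 200 reading scores as a numeric vector
dat$read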
|>
In some of the code below you will see the pipe operator |>. It allows you to connect multiple commands together without the use of multiple parentheses. It stands for “and then”. You can easily connect multiple commands in a single pipe chain.
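For example, the following two calls are equivalent; the piped version reads as “take dat, and then show its first rows”:
# without the pipe
head(dat)
# with the pipe
dat |> head()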
Quickly view the column names (i.e., variables) in a dataframe with the names function:
names(dat)
## [1] "id" "female" "race" "ses" "schtyp" "prog" "read"
## [8] "write" "math" "science" "socst"
View the types of variables using str:
str(dat)
## 'data.frame': 200 obs. of 11 variables:
## $ id : int 70 121 86 141 172 113 50 11 84 48 ...
## $ female : chr "male" "female" "male" "male" ...
## $ race : chr "white" "white" "white" "white" ...
## $ ses : chr "low" "middle" "high" "high" ...
## $ schtyp : chr "public" "public" "public" "public" ...
## $ prog : chr "general" "vocation" "general" "vocation" ...
## $ read : int 57 68 44 63 47 44 50 34 63 57 ...
## $ write : int 52 59 33 44 52 52 59 46 57 55 ...
## $ math : int 41 53 54 47 57 51 42 45 54 52 ...
## $ science: int 47 63 58 53 53 63 53 39 58 50 ...
## $ socst : int 57 61 31 56 61 61 61 36 51 51 ...
Variable types: int indicates an integer variable and chr a character variable.
summary from Base R
For numeric variables, summary will return the quartiles, median, mean, min, max, and number of NAs. Use the $ dollar sign to refer to a specific variable.
summary(dat$read)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.00 44.00 50.00 52.23 60.00 76.00
describe from the psych package
You can load the psych package and use describe to get the mean, sd, median, and other statistics. Note: if you haven’t installed the psych package, run install.packages("psych") before running the code below.
psych::describe(dat)
## vars n mean sd median trimmed mad min max range skew kurtosis
## id 1 200 100.50 57.88 100.5 100.50 74.13 1 200 199 0.00 -1.22
## female* 2 200 1.46 0.50 1.0 1.44 0.00 1 2 1 0.18 -1.98
## race* 3 200 3.47 0.98 4.0 3.71 0.00 1 4 3 -1.68 1.35
## ses* 4 200 2.18 0.86 2.0 2.23 1.48 1 3 2 -0.36 -1.55
## schtyp* 5 200 1.84 0.37 2.0 1.93 0.00 1 2 1 -1.84 1.40
## prog* 6 200 1.73 0.84 1.0 1.66 0.00 1 3 2 0.55 -1.37
## read 7 200 52.23 10.25 50.0 52.03 10.38 28 76 48 0.19 -0.66
## write 8 200 52.77 9.48 54.0 53.36 11.86 31 67 36 -0.47 -0.78
## math 9 200 52.65 9.37 52.0 52.23 10.38 33 75 42 0.28 -0.69
## science 10 200 51.85 9.90 53.0 52.02 11.86 26 74 48 -0.19 -0.60
## socst 11 200 52.41 10.74 52.0 52.99 13.34 26 71 45 -0.38 -0.57
## se
## id 4.09
## female* 0.04
## race* 0.07
## ses* 0.06
## schtyp* 0.03
## prog* 0.06
## read 0.72
## write 0.67
## math 0.66
## science 0.70
## socst 0.76
Use describeBy(numeric_variable, group=grouping_variable) from the psych package if you want the descriptive statistics for individual groups:
psych::describeBy(dat$read, group=dat$prog)
##
## Descriptive statistics by group
## group: academic
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 105 56.16 9.59 55 56 11.86 34 76 42 0.1 -0.87 0.94
## ------------------------------------------------------------
## group: general
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 45 49.76 9.23 50 49.68 10.38 28 68 40 0.09 -0.61 1.38
## ------------------------------------------------------------
## group: vocation
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 50 46.2 8.91 47 45.52 7.41 31 68 37 0.56 -0.16 1.26
To compute specific descriptive statistics, you can call the function directly: mean for the mean, median for the median, sd for the standard deviation, IQR for the interquartile range, and mad for the median absolute deviation.
mean(dat$read)
## [1] 52.23
sd(dat$read)
## [1] 10.25294
IQR(dat$read)
## [1] 16
mad(dat$read)
## [1] 10.3782
Note: There is no built-in function to compute the mode of a nominal variable in R. To get the mode (i.e., the most frequently occurring value), you need to save out a frequency table and sort the frequency table.
tab <- table(dat$read) #number of occurrences for each unique value
sort(tab, decreasing = TRUE) # sort highest to lowest
##
## 47 50 63 52 57 42 44 55 68 60 65 39 34 73 36 37 41 43 45 71 76 28 31 35 46 48
## 27 18 16 14 14 13 13 13 11 9 9 8 6 5 3 2 2 2 2 2 2 1 1 1 1 1
## 53 54 61 66
## 1 1 1 1
So 47 is the mode, or the most frequently occurring value for read.
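If you would rather extract the mode programmatically than read it off the table, here is one small sketch using the tab object above:
# name of the most frequent value ("47" here)
names(sort(tab, decreasing = TRUE))[1]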
table in Base R
In Base R, you can use the table function to print a frequency table.
table(dat$prog)
##
## academic general vocation
## 105 45 50
However, notice that the table only gives you frequency counts. If you want the proportions in each category, you need to wrap the function in prop.table.
prop.table(table(dat$prog))
##
## academic general vocation
## 0.525 0.225 0.250
tabyl from the janitor package
The tabyl function from the janitor package creates much nicer frequency tables and automatically includes a column for proportions.
tabyl(dat$prog)
## dat$prog n percent
## academic 105 0.525
## general 45 0.225
## vocation 50 0.250
tabyl can also get crosstabs or frequencies by groups:
# janitor package
dat |>
tabyl(prog, female)
## prog female male
## academic 58 47
## general 24 21
## vocation 27 23
# Base R
table(dat$prog, dat$female)
##
## female male
## academic 58 47
## general 24 21
## vocation 27 23
Using the janitor
package, you can easily add row and
column totals like so:
dat |>
tabyl(prog, female) |>
adorn_totals(c("row", "col"))
## prog female male Total
## academic 58 47 105
## general 24 21 45
## vocation 27 23 50
## Total 109 91 200
Or use adorn_ commands to include percentages, rounding, etc.:
dat |>
tabyl(prog, female) |>
adorn_percentages() |>
adorn_rounding(digits = 3)
## prog female male
## academic 0.552 0.448
## general 0.533 0.467
## vocation 0.540 0.460
There are two main ways to make graphs in R: Base R or ggplot2. I will use Base R to create a quick graph to visualize my data and use ggplot2 when I want more customization options for my graph or when I want to create publication-quality graphs. Below I will show you both ways to create some common graphs.
In Base R, you can save your frequency table as an object and use the barplot function to graph the frequency distribution.
Let’s create a bar chart for the program type variable.
freq_prog <- table(dat$prog)
barplot(freq_prog)
You can add some customization to the chart using the following options.
freq_prog <- table(dat$prog)
barplot(freq_prog, col = "blue", main = "Students by Program Type", xlab = "Program Type", ylab = "Number of Students")
Let’s create the same graph using the ggplot2 package. The main function from ggplot2 is named ggplot. The basic syntax for ggplot2 is as follows:
ggplot(data = data_frame) + geom_bar(aes(x = variable_name, fill = variable_name))
ggplot(data = dat) +
geom_bar(aes(x=prog, fill = prog)) +
labs(title = "Students by Program Type", x = "Program Type", y = "Number of Students")
Here is how to create a side-by-side bar chart using Base R or ggplot.
# Base R
crosstab <- table(dat$female, dat$prog)
barplot(crosstab, beside = TRUE, col = c("pink", "lightblue"), legend = rownames(crosstab), main = "Students by Program Type and Gender")
# ggplot
ggplot(data=dat, aes(x = prog, fill = female)) +
geom_bar(position = "dodge") +
labs(title = "Students by Program Type and Gender", x = "Program Type", y = "Count") +
theme_minimal()
You can easily see frequencies by making a histogram with the hist command:
hist(dat$math)
Here is how to do the same thing using ggplot.
ggplot(data = dat) +
geom_histogram(aes(x = math), bins = nclass.Sturges(dat$math)) +
labs(title = "Histogram of Math Scores", x = "Math Score", y = "Number of Students")
The boxplot function in Base R can be used to create quick boxplots. You can also add a title and coloring using the “main” and “col” arguments.
boxplot(dat$math, main = "Math Scores", col = "lightblue")
Here’s how to create the same boxplot in ggplot:
ggplot(dat, aes(y = math)) +
geom_boxplot(fill = "lightblue") +
labs(title = "Math Scores", y = "Math Score") +
theme_minimal()
Here is how to create side-by-side boxplots in Base R and ggplot.
# Base R
boxplot(dat$math ~ dat$female, col = c("pink","blue"))
# ggplot
ggplot(dat, aes(x = female, y = math, fill = female)) +
geom_boxplot() +
scale_fill_manual(values = c("pink", "blue")) +
labs(x = "Female", y = "Math Score") +
theme_minimal()
To create a frequency polygon in Base R, first save out a histogram as an object. Then, plot the histogram’s midpoints against its counts with the option type = "l".
hist_data <- hist(dat$read) # this also draws the histogram
plot(hist_data$mids, hist_data$counts, type = "l", main = "Frequency Polygon of Reading Scores", xlab = "Reading Score", ylab = "Frequency")
lines(hist_data$mids, hist_data$counts, col = "blue") # redraw the polygon in blue
Here is the same frequency polygon in ggplot:
ggplot(dat, aes(x = read)) +
geom_freqpoly(binwidth = 5) +
labs(title = "Frequency Polygon of Reading Scores", x = "Reading Score", y = "Frequency") +
theme_minimal()
Use the plot function in Base R to create a scatterplot; then use the abline function to add a regression line to the plot.
plot(dat$read, dat$math)
abline(lm(math ~ read, data = dat), col = "blue")
In ggplot, create a scatterplot with a regression line using the following code:
ggplot(dat, aes(x = read, y = math)) +
geom_point() +
geom_smooth(method = lm)
## `geom_smooth()` using formula = 'y ~ x'
For additional information on colors in R:
- https://bookdown.org/hneth/ds4psy/D-3-apx-colors-basics.html
- https://r-graph-gallery.com/colors.html
For additional information on using ggplot2:
- https://rstudio.github.io/cheatsheets/html/data-visualization.html
- https://ggplot2.tidyverse.org/reference/
Use the cor function for most correlations (e.g., Pearson’s r, Kendall’s \(\tau\), Spearman’s \(\rho\)). Arguments include “method” and “use”. The method argument has options for Pearson, Kendall, and Spearman correlations. If your data have any missing values, you need to add the “use” argument to tell R how to deal with missing data. Type ?cor in the console for more information.
cor(dat$read, dat$math, method = "pearson")
## [1] 0.6622801
cor(dat$read, dat$math, method = "spearman")
## [1] 0.663126
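For instance, if read or math contained missing values, you could restrict the calculation to complete cases (a sketch; the hsb2 data have no missing values, so here the result is unchanged):
# correlation using only rows with no missing values on either variable
cor(dat$read, dat$math, method = "pearson", use = "complete.obs")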
To get a correlation matrix, you first need to subset your data to keep only the numeric variables, then run the cor function on this new object.
# save out numeric variables
dat_numeric <- dat[,c("read", "write", "science", "socst")]
cor(dat_numeric)
## read write science socst
## read 1.0000000 0.5967765 0.6301579 0.6214843
## write 0.5967765 1.0000000 0.5704416 0.6047932
## science 0.6301579 0.5704416 1.0000000 0.4651060
## socst 0.6214843 0.6047932 0.4651060 1.0000000
You can use cor.test
to get the correlation coefficient,
a p-value, and a confidence interval for the correlation:
cor.test(dat$read, dat$write,
method = "pearson",
use = "complete.obs")
##
## Pearson's product-moment correlation
##
## data: dat$read and dat$write
## t = 10.465, df = 198, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4993831 0.6792753
## sample estimates:
## cor
## 0.5967765
To get the regression line from R, use the lm function. Give it your dependent variable (i.e., the variable you want to predict) and your independent variable, separated by a tilde ~.
lm(math ~ read, data = dat)
##
## Call:
## lm(formula = math ~ read, data = dat)
##
## Coefficients:
## (Intercept) read
## 21.0382 0.6051
Thus, the regression equation would be \(\widehat{\text{Math}} = 21.0382 + 0.6051 \times \text{Read}\).
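For example, a student with a reading score of 50 has a predicted math score of 21.0382 + 0.6051 × 50 ≈ 51.29. A quick sketch of getting the same prediction from the fitted model:
# predicted math score when read = 50
fit <- lm(math ~ read, data = dat)
predict(fit, newdata = data.frame(read = 50))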
Let’s test the null hypothesis that the mean math achievement score in the population is 50. We can use the t.test function for a one-sample t-test, an independent samples t-test, and a paired-samples t-test. For the one-sample t-test, we give the function our variable of interest and our hypothesized population parameter in the argument mu.
t.test(dat$math, mu = 50)
##
## One Sample t-test
##
## data: dat$math
## t = 3.9928, df = 199, p-value = 9.184e-05
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 51.33868 53.95132
## sample estimates:
## mean of x
## 52.645
We can use the effectsize
package to compute effect
sizes for our statistical tests.
effectsize::cohens_d(dat$math, mu = 50)
## Cohen's d | 95% CI
## ------------------------
## 0.28 | [0.14, 0.42]
##
## - Deviation from a difference of 50.
We can use the t.test function to conduct an independent-samples t-test with one grouping variable (IV) that has 2 levels and one dependent variable (DV) that is continuous. For an independent-samples t-test, you can use: t.test(DV ~ IV, data=data)
To assess whether variances are similar across groups, we can use the
leveneTest(DV ~ IV, data=data)
function from the
car
package:
car::leveneTest(science ~ female,
data=dat)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 3.0015 0.08475 .
## 198
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results did not show a significant difference in variances (i.e., we meet the assumption).
t.test(science ~ female,
data=dat,
var.equal = TRUE)
##
## Two Sample t-test
##
## data: science by female
## t = -1.8124, df = 198, p-value = 0.07144
## alternative hypothesis: true difference in means between group female and group male is not equal to 0
## 95 percent confidence interval:
## -5.290207 0.223164
## sample estimates:
## mean in group female mean in group male
## 50.69725 53.23077
# Note that I had to specify `var.equal = TRUE` because FALSE is the default, so if we did not include this, it would default to a Welch's t-test
To compute effect sizes, we can use the effectsize
package.
# standardized mean difference
effectsize::cohens_d(science ~ female, data = dat)
## Cohen's d | 95% CI
## -------------------------
## -0.26 | [-0.54, 0.02]
##
## - Estimated using pooled SD.
# proportion of variance explained
effectsize::omega_squared(aov(science~female, data = dat))
## For one-way between subjects designs, partial omega squared is
## equivalent to omega squared. Returning omega squared.
## # Effect Size for ANOVA
##
## Parameter | Omega2 | 95% CI
## ---------------------------------
## female | 0.01 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
Since the High School and Beyond dataset does not contain any repeated-measures data, I will illustrate how to perform a paired-sample test with the following example and data.
A major oil company would like to improve its tarnished image following a large oil spill. Its marketing department develops a short television commercial and tests it on a sample of n = 7 subjects. People’s attitudes about the company are measured with a short questionnaire both before and after viewing the commercial, where higher values indicate better perceptions of the company. Participants’ perception scores range from 0 to 20 and are shown below. Do the data indicate that the commercial would improve the image? Use an alpha = .10.
Before | After
---|---
15 | 15 |
11 | 13 |
10 | 18 |
11 | 12 |
14 | 16 |
10 | 10 |
11 | 19 |
# enter data into R
paired_data <- data.frame(before = c(15, 11, 10, 11, 14, 10, 11),
after = c(15, 13, 18, 12, 16, 10, 19))
# Let's print the dataframe to ensure we have entered the data correctly
paired_data
## before after
## 1 15 15
## 2 11 13
## 3 10 18
## 4 11 12
## 5 14 16
## 6 10 10
## 7 11 19
# t-test
t.test(paired_data$after, paired_data$before, paired = TRUE, conf.level = .90)
##
## Paired t-test
##
## data: paired_data$after and paired_data$before
## t = 2.2601, df = 6, p-value = 0.06454
## alternative hypothesis: true mean difference is not equal to 0
## 90 percent confidence interval:
## 0.4206854 5.5793146
## sample estimates:
## mean difference
## 3
# effect size
effectsize::cohens_d(paired_data$after, paired_data$before, paired = TRUE, ci = .90)
## For paired samples, 'repeated_measures_d()' provides more options.
## Cohen's d | 90% CI
## ------------------------
## 0.85 | [0.09, 1.56]
Let’s use an example to illustrate the use of a chi-square goodness of fit test.
A researcher is interested in the ability of people to recall their dreams, so the researcher observed 80 participants as they slept overnight in a laboratory. As part of the study, the researcher woke participants during the REM stages of their sleep and asked participants whether or not they were dreaming. The data showing dream recall are below:
Did Recall Dream | Did Not Recall Dream | Unsure
---|---|---
58 | 12 | 10 |
Based on previous studies, the researcher expects 80% of participants to recall their dream, 10% to not recall their dream, and 10% to be unsure. Do these data follow the same distribution as previous research? Test this hypothesis using an alpha = .05.
# Enter data into R
dream <- c(58, 12, 10)
# add names
names(dream) <- c("Did recall", "Did not", "Unsure")
# Perform chi-square test
dream_results <- chisq.test(dream, p = c(.8, .1, .1))
# to view results
dream_results
##
## Chi-squared test for given probabilities
##
## data: dream
## X-squared = 3.0625, df = 2, p-value = 0.2163
# to view residuals
dream_results$stdres
## Did recall Did not Unsure
## -1.677051 1.490712 0.745356
Let’s examine whether males and females differ in the types of academic programs in which they are enrolled.
Recall that crosstabs can easily be displayed with tabyl as well as in Base R using table:
#Tidyverse and tabyl
dat |>
tabyl(prog, female)
## prog female male
## academic 58 47
## general 24 21
## vocation 27 23
#Base R
table(dat$prog, dat$female)
##
## female male
## academic 58 47
## general 24 21
## vocation 27 23
To calculate a chi-square statistic, we can call the chisq.test function from either the janitor package or Base R. Note that because Base R also has a chisq.test function, we will use janitor::chisq.test(). Both functions work the same way; only the data are entered differently. janitor’s version takes a crosstabs table (a tabyl), while the Base R version can take either a table or two vectors.
# Using tably from janitor
dat |>
tabyl(prog, female) |>
janitor::chisq.test()
##
## Pearson's Chi-squared test
##
## data: tabyl(dat, prog, female)
## X-squared = 0.052809, df = 2, p-value = 0.9739
#Base R
stats::chisq.test(dat$prog, dat$female)
##
## Pearson's Chi-squared test
##
## data: dat$prog and dat$female
## X-squared = 0.052809, df = 2, p-value = 0.9739
You can use aov from the stats package for a one-way between-subjects ANOVA, like this: aov(DV ~ group, data = data). We can then use summary.aov to get the results. Note: the aov function computes Type I Sums of Squares. If you want to reproduce the output you get from other statistical software packages such as SPSS and SAS, you’ll need to use the car package and the Anova function to compute Type III Sums of Squares. For balanced designs, these give the same results. See the following for more information on ANOVA Sums of Squares calculations: https://www.r-bloggers.com/2011/03/anova-%E2%80%93-type-iiiiii-ss-explained/
ses_analysis <- aov(read ~ ses, data = dat)
summary.aov(ses_analysis)
## Df Sum Sq Mean Sq F value Pr(>F)
## ses 2 1832 916.2 9.456 0.00012 ***
## Residuals 197 19087 96.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Type III Sums of Squares using the `Anova` function from the `car` package
(ses_analysis_2 <- car::Anova(lm(read ~ ses, data=dat), type="III"))
## Anova Table (Type III tests)
##
## Response: read
## Sum Sq Df F value Pr(>F)
## (Intercept) 185150 1 1910.962 < 2.2e-16 ***
## ses 1832 2 9.456 0.0001199 ***
## Residuals 19087 197
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Welch's ANOVA
oneway.test(read ~ ses, data = dat)
##
## One-way analysis of means (not assuming equal variances)
##
## data: read and ses
## F = 8.7397, num df = 2.00, denom df = 107.24, p-value = 0.0003046
# Levene's test
car::leveneTest(read ~ ses, data = dat)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 2.5538 0.08036 .
## 197
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can perform post-hoc tests using different functions. For example, a Tukey test:
TukeyHSD(ses_analysis)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = read ~ ses, data = dat)
##
## $ses
## diff lwr upr p adj
## low-high -8.223404 -12.7855387 -3.661270 0.0000948
## middle-high -4.921053 -8.7945768 -1.047528 0.0085237
## middle-low 3.302352 -0.8430804 7.447784 0.1468044
A pairwise t-test with a Benjamini-Hochberg adjustment (i.e., false discovery rate):
pairwise.t.test(dat$read, dat$ses,
p.adjust.method = "BH")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dat$read and dat$ses
##
## high low
## low 9.6e-05 -
## middle 0.0046 0.0614
##
## P value adjustment method: BH
To calculate an effect size, such as eta squared, partial eta squared, or omega squared, you can use the effectsize package:
effectsize::omega_squared(ses_analysis)
## # Effect Size for ANOVA
##
## Parameter | Omega2 | 95% CI
## ---------------------------------
## ses | 0.08 | [0.02, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
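The package also provides eta_squared, which works the same way if you prefer that effect size:
# eta squared for the same one-way ANOVA
effectsize::eta_squared(ses_analysis)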
If you want to save the data and objects that you have loaded, cleaned, and analyzed, you may do so in two ways:
You can select the data objects and save them to a .Rdata file using save:
save(dat, dat_numeric, paired_data, file="my_data.Rdata")
You may load this data back into your R environment using load:
load("my_data.Rdata")
You can also save the entire environment (workspace) using save.image:
save.image(file = "all_data.Rdata")
You can load it the same way as above.
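For example:
# restore the saved workspace
load("all_data.Rdata")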
You can also save data frames as CSV files with write.csv:
write.csv(dat_numeric, file="numeric_data.csv")
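Note that write.csv writes row names as an extra first column by default; add row.names = FALSE if you want to omit them:
# same file without the row-name column
write.csv(dat_numeric, file="numeric_data.csv", row.names = FALSE)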
You can save your code and output together by compiling your document. Navigate to File -> Compile Document and choose HTML, Word, or PDF. Note: if there are any errors in your code, it will not compile until you fix them. Reading the error messages can help.