Finding an outlier using Cook’s distance

A Cook’s distance greater than 1 is a common rule-of-thumb sign that a data point (or a level of a random factor) is having a disproportionate influence on your model and should be looked into. Note: I’m not normally a fan of removing data without a valid reason; for me, you need both a statistical and an experimental reason for removal.
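To make the number itself less of a black box, here is a minimal sketch of what a Cook’s distance measures: how far the estimated coefficients move when one observation is left out, scaled by their covariance. It is illustrative only and assumes an ordinary linear model on R’s built-in `cars` data rather than the mixed model used in this post; the names `fit`, `V_inv` and `manual_cook` are made up for the example.

```r
# Illustrative only: a plain lm() on the built-in `cars` data,
# not the mixed model fitted in this post.
fit   <- lm(dist ~ speed, data = cars)
beta  <- coef(fit)
V_inv <- solve(vcov(fit))  # inverse covariance matrix of the estimates
p     <- length(beta)      # number of estimated coefficients

# Refit with each row dropped in turn and measure how far the coefficients move
manual_cook <- sapply(seq_len(nrow(cars)), function(i) {
  d <- beta - coef(lm(dist ~ speed, data = cars[-i, ]))
  as.numeric(t(d) %*% V_inv %*% d) / p
})

# Compare against the built-in calculation for lm objects
all.equal(unname(manual_cook), unname(cooks.distance(fit)))
```

The leave-one-out loop is slow but makes the idea explicit: a large value means that dropping that single row changes the fitted coefficients a lot.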

# Loading data
example <- read.table(url("https://jackrrivers.com/wp-content/uploads/2018/04/ExampleCook.txt"), header = TRUE)
example$ID <- as.factor(example$ID)  # treat ID as a grouping factor, not a number

# Running a model
library(lme4)
lme1 <- lmer(Dependent ~ 1 + Factor1 * Factor2 + (1 | ID), data = example, na.action = na.omit)

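# Looking for large Cook's distances
library(influence.ME)
infl <- influence(lme1, obs = TRUE)  # refit the model leaving out one observation at a time
cooks.distance(infl)                 # one Cook's distance per observation
plot(infl, which = "cook")           # plot the distances for visual inspection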
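The code above checks individual observations. To check the levels of a random factor instead, the same function can be pointed at the grouping variable. The following is a rough sketch under stated assumptions: it uses lme4’s built-in `sleepstudy` data as a stand-in so that it runs without downloading anything, and the names `fit`, `infl_subject`, `cd_subject` and `flagged` are hypothetical.

```r
library(lme4)
library(influence.ME)

# Stand-in model on lme4's built-in sleepstudy data (an assumption for this sketch)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Leave out one whole Subject at a time instead of one observation;
# the package prefix makes it explicit which influence() is meant
infl_subject <- influence.ME::influence(fit, group = "Subject")

# One Cook's distance per level of the grouping factor
cd_subject <- cooks.distance(infl_subject)
cd_subject

# Positions of any levels above the rule-of-thumb cut-off of 1
flagged <- which(as.numeric(cd_subject) > 1)
flagged
```

For the model in this post, the equivalent call would presumably be `influence(lme1, group = "ID")`, followed by the same `cooks.distance()` and `plot()` steps.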
