The analysis below explores the performance of my friends during our 2019 Fantasy Football Draft. Since 2016 we have been drafting in a no-cheatsheet environment which means that each drafter must know who they are going to draft before the draft starts. As such the performance of each drafter could vary wildly and is thus an interesting phenomena to explore.
Note that I am using the ESPN Top 300 PPR rankings as a proxy for ground truth. Many of the rankings there are debatable but I think overall it gives a good rough estimate of each player’s value.
The source and data for this project are available at https://github.com/EvanOman/FantasyFootball2019. My contact info is available on my website: http://www.evanoman.com.
Let the fun begin!
First we load in the draft data. The data is recorded in the same matrix format as the draft board:
Just like 2016 and 2017, this data needs to be adjusted to get into the format we want (i.e. one row per draft selection). We will begin by loading the draft data.
draftData <- read.csv("./data/Draft2019.csv", stringsAsFactors = F)
print.data.frame(head(draftData), row.names=FALSE)
## Andrew Kuuby Mikey Joey William
## Saquon Barkley Adam Thielen Christian McCaffrey Ezekiel Elliott Alvin Kamara
## John Brown Brandin Cooks Todd Gurley II JuJu Smith-Schuster Julio Jones
## Dalvin Cook Baker Mayfield Patrick Mahomes Michael Thomas Mike Evans
## Tyler Lockett Melvin Gordon Chris Godwin Leonard Fournette Chris Carson
## Zach Ertz Devonta Freeman Amari Cooper Mike Williams T.Y. Hilton
## Aaron Rodgers Kyle Rudolph Julian Edelman Josh Gordon David Montgomery
## Evan Jacob Chris
## David Johnson DeAndre Hopkins Tyreek Hill
## Joe Mixon Davante Adams Odell Beckham Jr.
## Antonio Brown Travis Kelce Nick Chubb
## Keenan Allen James Conner LeVeon Bell
## Damien Williams Kerryon Johnson Evan Engram
## Robert Woods Stefon Diggs Cooper Kupp
As you can see, the data is organized by round with a column for each drafter. This format is often referred to as a “wide data-set.” For this analysis we want a “tall data set” so we will use the “melt” function to reorganize the data into one row per data point (ie one row per pick).
library("reshape2")
library("plyr")
# Add a "ROUND" Column
draftData$ROUND = 1:16
# Melt the data
draftData <- melt(draftData, id.vars = "ROUND")
# Drop the row names
rownames(draftData) <- NULL
# Order by round (this is a stable sort)
draftData <- draftData[order(draftData$ROUND), ]
# Rename some columns
draftData <- rename(draftData, c("variable" = "Drafter", "value" = "Player", "ROUND" = "Round"))
print.data.frame(head(draftData, n=10), row.names=FALSE)
## Round Drafter Player
## 1 Andrew Saquon Barkley
## 1 Kuuby Adam Thielen
## 1 Mikey Christian McCaffrey
## 1 Joey Ezekiel Elliott
## 1 William Alvin Kamara
## 1 Evan David Johnson
## 1 Jacob DeAndre Hopkins
## 1 Chris Tyreek Hill
## 2 Andrew John Brown
## 2 Kuuby Brandin Cooks
Now that we have the data in a tall form, we want to know what the actual number for each pick. This is complicated by the winding nature of the draft order (Chris had the 8th and 9th picks). To capture this behavior I have written the following “snake function” which creates a sequence which winds up to 8 and then counts down from 16, up to 24, etc.
# Creates the sequence: 1,2,3,4,5,6,7,8,16,15,14,13,12,11,10,9,17,18,19,20....
snakeFunction <- function(n, rowLen=8)
{
lapply(1:n, function(i){
groupIndicator <- ceiling(i / rowLen)
# Increasing group
if (groupIndicator %% 2 == 1)
{
i
}
# Decreasing group
else
{
rowLen*groupIndicator - ((i - 1) %% rowLen)
}
})
}
# Create the list for out draft (8 drafters, 16 rounds)
pickOrdering <- snakeFunction(8*16)
unlist(pickOrdering)
## [1] 1 2 3 4 5 6 7 8 16 15 14 13 12 11 10 9 17 18 19 20 21 22 23
## [24] 24 32 31 30 29 28 27 26 25 33 34 35 36 37 38 39 40 48 47 46 45 44 43
## [47] 42 41 49 50 51 52 53 54 55 56 64 63 62 61 60 59 58 57 65 66 67 68 69
## [70] 70 71 72 80 79 78 77 76 75 74 73 81 82 83 84 85 86 87 88 96 95 94 93
## [93] 92 91 90 89 97 98 99 100 101 102 103 104 112 111 110 109 108 107 106 105 113 114 115
## [116] 116 117 118 119 120 128 127 126 125 124 123 122 121
We can then use this pick ordering list to index the draft picks in our draft data frame.
# Add the pick numbers to the draft data
draftData$PickNumber <- pickOrdering
draftData <- as.data.frame(lapply(draftData, unlist))
# Order the draft data by pick number
draftData <- draftData[order(draftData$PickNumber),]
print.data.frame(head(draftData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber
## 1 Andrew Saquon Barkley 1
## 1 Kuuby Adam Thielen 2
## 1 Mikey Christian McCaffrey 3
## 1 Joey Ezekiel Elliott 4
## 1 William Alvin Kamara 5
## 1 Evan David Johnson 6
## 1 Jacob DeAndre Hopkins 7
## 1 Chris Tyreek Hill 8
## 2 Chris Odell Beckham Jr. 9
## 2 Jacob Davante Adams 10
Now that the draft data has been prepped, we can load in the rankings data:
rankings <- read.csv("./data/EspnPprRankings2019.csv", stringsAsFactors = F)
# Read player as string, not factor
rankings$Player <- as.character(rankings$Player)
# Add "Rank" column sequentially
rankings$Rank <- 1:nrow(rankings)
print.data.frame(head(rankings, n=10), row.names=FALSE)
## Player Team Pos Rank
## Saquon Barkley NYG RB 1
## Christian McCaffrey CAR RB 2
## Alvin Kamara NO RB 3
## David Johnson ARI RB 4
## Davante Adams GB WR 5
## DeAndre Hopkins HOU WR 6
## Ezekiel Elliott DAL RB 7
## Odell Beckham Jr. CLE WR 8
## Todd Gurley II LAR RB 9
## Julio Jones ATL WR 10
This dataset is already in the tall format we want so there will be no reformating needed. With all of the data prepared we can just join draftData
with rankings
by Player
. Note that this is a left join by default so the resulting dataframe will only contain rows from our draft dataframe. The ranking dataset only includes the top 300 players. Thus some players will not have a ranking so after the join these players will have a NA
rank. We will replace this with the worst possible ranking (nrow(rankings)
) plus 1
.
library("plyr")
# Join rankings data with draft data
joinedData <- join(draftData, rankings, by="Player")
# Some players will not have a ranking, give these players a worst ranking + 1
joinedData$Rank[is.na(joinedData$Rank)] <- nrow(rankings) + 1
print.data.frame(head(joinedData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber Team Pos Rank
## 1 Andrew Saquon Barkley 1 NYG RB 1
## 1 Kuuby Adam Thielen 2 MIN WR 26
## 1 Mikey Christian McCaffrey 3 CAR RB 2
## 1 Joey Ezekiel Elliott 4 DAL RB 7
## 1 William Alvin Kamara 5 NO RB 3
## 1 Evan David Johnson 6 ARI RB 4
## 1 Jacob DeAndre Hopkins 7 HOU WR 6
## 1 Chris Tyreek Hill 8 KC WR 16
## 2 Chris Odell Beckham Jr. 9 CLE WR 8
## 2 Jacob Davante Adams 10 GB WR 5
We now have a dataframe with each draft pick with the corresponding pick number and player ranking. We can then use the pick number and player ranking to measure the quality of each draft pick.
joinedData$RankDiff <- joinedData$Rank - joinedData$PickNumber
print.data.frame(head(joinedData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber Team Pos Rank RankDiff
## 1 Andrew Saquon Barkley 1 NYG RB 1 0
## 1 Kuuby Adam Thielen 2 MIN WR 26 24
## 1 Mikey Christian McCaffrey 3 CAR RB 2 -1
## 1 Joey Ezekiel Elliott 4 DAL RB 7 3
## 1 William Alvin Kamara 5 NO RB 3 -2
## 1 Evan David Johnson 6 ARI RB 4 -2
## 1 Jacob DeAndre Hopkins 7 HOU WR 6 -1
## 1 Chris Tyreek Hill 8 KC WR 16 8
## 2 Chris Odell Beckham Jr. 9 CLE WR 8 -1
## 2 Jacob Davante Adams 10 GB WR 5 -5
At this point all of the data has been prepped so we are ready to generate some plots and do a bit of analysis.
Probably the most natural measure of performance would be to look at the average difference between pick number and rank, across all rounds, organized by Drafter. We can calculate this using aggregate
.
avgDiffs <- aggregate(RankDiff ~ Drafter, joinedData, FUN=mean)
# Order factor by worst draft to best
avgDiffs <- within(avgDiffs, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))
# Order dataframe by worst draft to best
avgDiffs <- avgDiffs[order(avgDiffs$RankDiff, decreasing = FALSE),]
print.data.frame(avgDiffs, row.names=FALSE)
## Drafter RankDiff
## Jacob 3.6250
## Evan 4.5000
## Chris 8.4375
## Andrew 13.8125
## William 14.1250
## Joey 19.0625
## Mikey 46.4375
## Kuuby 54.5625
Thus we can see that Jacob had the best draft (on average he got players 3.625 positions over rank value) and Kuuby had the worst (on average, he got players 54.5625 positions over rank value). Here is a quick visual of the above dataframe.
library(ggplot2)
ggplot(avgDiffs, aes(x=Drafter, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")
Since kickers and defenses are generally ranked very low, and because I picked a bad kicker and defense, I was curious to see what these rankings look like without those position types:
noKorDst <- joinedData[!(joinedData$Pos %in% list("DST", "K")), ]
avgDiffs2 <- aggregate(RankDiff ~ Drafter, noKorDst, FUN=mean)
# Order factor by worst draft to best
avgDiffs2 <- within(avgDiffs2, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))
# Order dataframe by worst draft to best
avgDiffs2 <- avgDiffs2[order(avgDiffs2$RankDiff, decreasing = FALSE),]
print.data.frame(avgDiffs2, row.names=FALSE)
## Drafter RankDiff
## Evan -9.928571
## Jacob -2.000000
## William 2.428571
## Chris 5.714286
## Andrew 8.500000
## Joey 8.500000
## Mikey 47.066667
## Kuuby 47.416667
I was curious to see how undervalued/overvalued players were by position. We can use a similar methodology as above to figure this out.
posDiffs <- aggregate(RankDiff ~ Pos, joinedData, FUN=mean)
posDiffs <- within(posDiffs, Pos <- factor(Pos, levels=posDiffs[order(posDiffs$RankDiff), "Pos"]))
# Order dataframe by worst draft to best
posDiffs <- posDiffs[order(posDiffs$RankDiff, decreasing = FALSE),]
print.data.frame(posDiffs, row.names=FALSE)
## Pos RankDiff
## RB -2.473684
## WR 5.659091
## QB 38.785714
## TE 41.142857
## DST 68.333333
## K 69.750000
ggplot(posDiffs, aes(x=Pos, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")
Here we can see that RB were under valued by 2.4736842 positions and K were overvalued by 69.75 positions.
I was also curious to see how draft pick quality changed by round. In the next two figures I am plotting Pick #
vs Rank
with a diagonal line indicating a pick at the rank value. Thus all picks below this line are “good” and all of the above are “bad”. In the first figure I colored the picks by Drafter and in the second I am coloring by position.
# Scatter colored by drafter
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Drafter)) + geom_abline() + xlab("Pick #")
# Scatter colored by position
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Pos)) + geom_abline() + xlab("Pick #")
It is pretty clear that as time goes on, worse and worse picks made.
Finally I wanted to see how each drafter performed by round, the following figure accomplishes this (here any pick below the dashed line is “good” and anything above the dased line is “bad”).
# Scatter colored by position, faceted by drafer
ggplot(joinedData, aes(x=Round, y=RankDiff)) + geom_line() + facet_wrap(~Drafter, ncol=2) + geom_hline(yintercept = 0, linetype=2) + geom_point(aes(color=Pos)) + ylab("Difference Between Pick # and Ranking")
Here are some final take aways: