The analysis below explores the performance of my friends during our 2017 Fantasy Football Draft. Since last year we have been drafting in a no-cheatsheet environment which means that each drafter must know who they are going to draft before the draft starts. As such the performance of each drafter could vary wildly and is thus an interesting phenomena to explore.
Note that I am using the ESPN Top 300 PPR rankings as a proxy for ground truth. Many of the rankings there are debatable but I think overall it gives a good rough estimate of each player’s value.
The source and data for this project are available at https://github.com/EvanOman/FantasyFootball2017. My contact info is available on my website: http://www.evanoman.com.
Let the fun begin!
First we load in the spreadsheet I made from the draft board:
library("reshape2")
library("plyr")
draftData <- read.csv("./data/Draft2017.csv", stringsAsFactors = F)
print.data.frame(head(draftData), row.names=FALSE)
## ROUND Michael Andrew William Chris Joseph
## Franchise Amari Cooper Aaron Rodgers Le'Veon Bell David Johnson Antonio Brown
## 1 Ezekiel Elliott LeSean McCoy Michael Thomas A.J. Green Devonta Freeman
## 2 Marshawn Lynch Alshon Jeffery T.Y. Hilton Dez Bryant Drew Brees
## 3 Emmanuel Sanders Ty Montgomery Carlos Hyde Rob Gronkowski DeAndre Hopkins
## 4 Cam Newton Demaryius Thomas Melvin Gordon Keenan Allen Dalvin Cook
## 5 Ameer Abdullah Delanie Walker Golden Tate Davante Adams Travis Kelce
## Evan Kuuuuuuby Jacob
## Julio Jones Tom Brady Odell Beckham Jr.
## Jordy Nelson Christian McCaffrey Mike Evans
## DeMarco Murray Brandin Cooks Jordan Howard
## Doug Baldwin Kyle Rudolph Jay Ajayi
## Leonard Fournette Martavis Bryant Terrelle Pryor Sr.
## Lamar Miller Texans D/ST Greg Olsen
As you can see, the data is organized by round with a column for each drafter. This format is often referred to as a “wide data-set.” For this analysis we want a “tall data set” so we will use the “melt” function to reorganize the data into one row per data point (ie one row per pick). Note that we will first drop the Franchise
row (not part of the draft directly) and then melt the resulting dataset:
# Save the franchise row for later
franchisePlayers <- draftData[draftData$ROUND == "Franchise", ]
# Drop the franchise row
draftData <- subset(draftData, ROUND != "Franchise")
# Melt the data
draftData <- melt(draftData, id.vars = "ROUND")
# Drop the row names
rownames(draftData) <- NULL
# Make the round a number instead of a string
draftData$ROUND <- as.numeric(as.character(draftData$ROUND))
# Order by round (this is a stable sort)
draftData <- draftData[order(draftData$ROUND), ]
# Rename some columns
draftData <- rename(draftData, c("variable" = "Drafter", "value" = "Player", "ROUND" = "Round"))
print.data.frame(head(draftData, n=10), row.names=FALSE)
## Round Drafter Player
## 1 Michael Ezekiel Elliott
## 1 Andrew LeSean McCoy
## 1 William Michael Thomas
## 1 Chris A.J. Green
## 1 Joseph Devonta Freeman
## 1 Evan Jordy Nelson
## 1 Kuuuuuuby Christian McCaffrey
## 1 Jacob Mike Evans
## 2 Michael Marshawn Lynch
## 2 Andrew Alshon Jeffery
Now that we have the data in a tall form, we want to know the actual pick number for each row. This is complicated by the winding nature of the draft order (e.g. Jacob had both the 8th and 9th picks). To capture this behavior I have written the following “snake function” which creates a sequence which winds up to 8 and then counts down from 16, up to 24, etc.
# Creates the sequence: 1,2,3,4,5,6,7,8,16,15,14,13,12,11,10,9,17,18,19,20....
snakeFunction <- function(n, rowLen=8)
{
lapply(1:n, function(i){
groupIndicator <- ceiling(i / rowLen)
# Increasing group
if (groupIndicator %% 2 == 1)
{
i
}
# Decreasing group
else
{
rowLen*groupIndicator - ((i - 1) %% rowLen)
}
})
}
# Create the list for out draft (8 drafters, 16 rounds)
pickOrdering <- snakeFunction(8*15)
unlist(pickOrdering)
## [1] 1 2 3 4 5 6 7 8 16 15 14 13 12 11 10 9 17 18 19 20 21 22 23
## [24] 24 32 31 30 29 28 27 26 25 33 34 35 36 37 38 39 40 48 47 46 45 44 43
## [47] 42 41 49 50 51 52 53 54 55 56 64 63 62 61 60 59 58 57 65 66 67 68 69
## [70] 70 71 72 80 79 78 77 76 75 74 73 81 82 83 84 85 86 87 88 96 95 94 93
## [93] 92 91 90 89 97 98 99 100 101 102 103 104 112 111 110 109 108 107 106 105 113 114 115
## [116] 116 117 118 119 120
We can then use this pick ordering list to index the draft picks in our draft data frame.
# Add the pick numbers to the draft data
draftData$PickNumber <- pickOrdering
draftData <- as.data.frame(lapply(draftData, unlist))
# Order the draft data by pick number
draftData <- draftData[order(draftData$PickNumber),]
# Make the player a string instead of a factor
draftData$Player <- as.character(draftData$Player)
print.data.frame(head(draftData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber
## 1 Michael Ezekiel Elliott 1
## 1 Andrew LeSean McCoy 2
## 1 William Michael Thomas 3
## 1 Chris A.J. Green 4
## 1 Joseph Devonta Freeman 5
## 1 Evan Jordy Nelson 6
## 1 Kuuuuuuby Christian McCaffrey 7
## 1 Jacob Mike Evans 8
## 2 Jacob Jordan Howard 9
## 2 Kuuuuuuby Brandin Cooks 10
Now that the draft data has been prepped, we can load in the rankings data:
rankings <- read.csv("./data/EspnPprRankings2017.csv", stringsAsFactors = F)
# Read player as string, not factor
rankings$Player <- as.character(rankings$Player)
print.data.frame(head(rankings, n=10), row.names=FALSE)
## Rank Player Pos Team
## 1 David Johnson RB ARI
## 2 Le'Veon Bell RB PIT
## 3 Antonio Brown WR PIT
## 4 Julio Jones WR ATL
## 5 Odell Beckham Jr. WR NYG
## 6 LeSean McCoy RB BUF
## 7 Jordy Nelson WR GB
## 8 Mike Evans WR TB
## 9 A.J. Green WR CIN
## 10 Devonta Freeman RB ATL
This dataset is already in the tall format we want so there will be no reformating needed. However we will need to remove the Franchise
players from the rankings set and adjust the rankings of the remaining players. This can be accomplished by filtering out the players in franchisePlayers
and then reranking the remaining players in consecutive order.
# Remove franchise players
rankings <- rankings[!(rankings$Player %in% franchisePlayers), ]
# Apply consecutive ranking to remaining players
rankings$Rank <- 1:nrow(rankings)
print.data.frame(head(rankings, n=10), row.names=FALSE)
## Rank Player Pos Team
## 1 LeSean McCoy RB BUF
## 2 Jordy Nelson WR GB
## 3 Mike Evans WR TB
## 4 A.J. Green WR CIN
## 5 Devonta Freeman RB ATL
## 6 Michael Thomas WR NO
## 7 Melvin Gordon RB LAC
## 8 Jordan Howard RB CHI
## 9 DeMarco Murray RB TEN
## 10 Jay Ajayi RB MIA
Now that this final step has been performed we can just join draftData
with rankings
by Player
. Note that this is a left join by default so the resulting dataframe will only contain rows from our draft dataframe. The ranking dataset only includes the top 300 players. Thus some players will not have a ranking so after the join these players will have a NA
rank. We will replace this with the worst possible ranking (nrow(rankings)
) plus 1
.
# Join rankings data with draft data
joinedData <- join(draftData, rankings, by="Player")
# Some players will not have a ranking, give these players a worst ranking + 1
joinedData$Rank[is.na(joinedData$Rank)] <- nrow(rankings) + 1
print.data.frame(head(joinedData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber Rank Pos Team
## 1 Michael Ezekiel Elliott 1 24 RB DAL
## 1 Andrew LeSean McCoy 2 1 RB BUF
## 1 William Michael Thomas 3 6 WR NO
## 1 Chris A.J. Green 4 4 WR CIN
## 1 Joseph Devonta Freeman 5 5 RB ATL
## 1 Evan Jordy Nelson 6 2 WR GB
## 1 Kuuuuuuby Christian McCaffrey 7 28 RB CAR
## 1 Jacob Mike Evans 8 3 WR TB
## 2 Jacob Jordan Howard 9 8 RB CHI
## 2 Kuuuuuuby Brandin Cooks 10 12 WR NE
We now have a dataframe with each draft pick with the corresponding pick number and player ranking. We can then use the pick number and player ranking to measure the quality of each draft pick.
joinedData$RankDiff <- joinedData$Rank - joinedData$PickNumber
print.data.frame(head(joinedData, n=10), row.names=FALSE)
## Round Drafter Player PickNumber Rank Pos Team RankDiff
## 1 Michael Ezekiel Elliott 1 24 RB DAL 23
## 1 Andrew LeSean McCoy 2 1 RB BUF -1
## 1 William Michael Thomas 3 6 WR NO 3
## 1 Chris A.J. Green 4 4 WR CIN 0
## 1 Joseph Devonta Freeman 5 5 RB ATL 0
## 1 Evan Jordy Nelson 6 2 WR GB -4
## 1 Kuuuuuuby Christian McCaffrey 7 28 RB CAR 21
## 1 Jacob Mike Evans 8 3 WR TB -5
## 2 Jacob Jordan Howard 9 8 RB CHI -1
## 2 Kuuuuuuby Brandin Cooks 10 12 WR NE 2
At this point all of the data has been prepped so we are ready to generate some plots and do a bit of analysis.
Probably the most natural measure of performance would be to look at the average difference between pick number and rank, across all rounds, organized by Drafter. We can calculate this using aggregate
.
avgDiffs <- aggregate(RankDiff ~ Drafter, joinedData, FUN=mean)
# Order factor by worst draft to best
avgDiffs <- within(avgDiffs, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))
# Order dataframe by worst draft to best
avgDiffs <- avgDiffs[order(avgDiffs$RankDiff, decreasing = FALSE),]
print.data.frame(avgDiffs, row.names=FALSE)
## Drafter RankDiff
## Chris -8.933333
## Joseph 16.666667
## Jacob 18.133333
## Evan 18.400000
## Kuuuuuuby 30.400000
## Andrew 31.200000
## William 31.466667
## Michael 43.933333
Thus we can see that Chris had the best draft (on average he got players 8.9333333 positions under rank value) and Michael had the worst (on average, he got players 43.9333333 positions over rank value). Here is a quick visual of the above dataframe.
library(ggplot2)
ggplot(avgDiffs, aes(x=Drafter, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")
I was curious to see how undervalued/overvalued players were by position. We can use a similar methodology as above to figure this out.
posDiffs <- aggregate(RankDiff ~ Pos, joinedData, FUN=mean)
posDiffs <- within(posDiffs, Pos <- factor(Pos, levels=posDiffs[order(posDiffs$RankDiff), "Pos"]))
# Order dataframe by worst draft to best
posDiffs <- posDiffs[order(posDiffs$RankDiff, decreasing = FALSE),]
print.data.frame(posDiffs, row.names=FALSE)
## Pos RankDiff
## WR -0.97435897
## RB 0.05263158
## TE 32.58333333
## QB 40.27272727
## D/ST 44.50000000
## K 60.14285714
## DST 161.33333333
ggplot(posDiffs, aes(x=Pos, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")
Here we can see that WR were under valued by 0.974359 positions and DST were overvalued by 161.3333333 positions.
I was also curious to see how draft pick quality changed by round. In the next two figures I am plotting Pick #
vs Rank
with a diagonal line indicating a pick at the rank value. Thus all picks below this line are “good” and all of the above are “bad”. In the first figure I colored the picks by Drafter and in the second I am coloring by position.
# Scatter colored by drafter
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Drafter)) + geom_abline() + xlab("Pick #")
# Scatter colored by position
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Pos)) + geom_abline() + xlab("Pick #")
It is pretty clear that as time goes on, worse and worse picks made.
Finally I wanted to see how each drafter performed by round, the following figure accomplishes this (here any pick below the dashed line is “good” and anything above the dased line is “bad”).
# Scatter colored by position, faceted by drafer
ggplot(joinedData, aes(x=Round, y=RankDiff)) + geom_line() + facet_wrap(~Drafter, ncol=2) + geom_hline(yintercept = 0, linetype=2) + geom_point(aes(color=Pos)) + ylab("Difference Between Pick # and Ranking")
Here are some final take aways: