Introduction

The analysis below explores the performance of my friends during our 2019 Fantasy Football Draft. Since 2016 we have been drafting in a no-cheatsheet environment which means that each drafter must know who they are going to draft before the draft starts. As such the performance of each drafter could vary wildly and is thus an interesting phenomena to explore.

Note that I am using the ESPN Top 300 PPR rankings as a proxy for ground truth. Many of the rankings there are debatable but I think overall it gives a good rough estimate of each player’s value.

The source and data for this project are available at https://github.com/EvanOman/FantasyFootball2019. My contact info is available on my website: http://www.evanoman.com.

Let the fun begin!

Data Preparation

First we load in the draft data. The data is recorded in the same matrix format as the draft board:

2019 Draft Board

2019 Draft Board

Just like 2016 and 2017, this data needs to be adjusted to get into the format we want (i.e. one row per draft selection). We will begin by loading the draft data.

draftData <- read.csv("./data/Draft2019.csv", stringsAsFactors = F)
print.data.frame(head(draftData), row.names=FALSE)
##          Andrew           Kuuby               Mikey                Joey          William
##  Saquon Barkley    Adam Thielen Christian McCaffrey     Ezekiel Elliott     Alvin Kamara
##      John Brown   Brandin Cooks      Todd Gurley II JuJu Smith-Schuster      Julio Jones
##     Dalvin Cook  Baker Mayfield     Patrick Mahomes      Michael Thomas       Mike Evans
##   Tyler Lockett   Melvin Gordon        Chris Godwin   Leonard Fournette     Chris Carson
##       Zach Ertz Devonta Freeman        Amari Cooper       Mike Williams      T.Y. Hilton
##   Aaron Rodgers    Kyle Rudolph      Julian Edelman         Josh Gordon David Montgomery
##             Evan           Jacob             Chris
##    David Johnson DeAndre Hopkins       Tyreek Hill
##        Joe Mixon   Davante Adams Odell Beckham Jr.
##    Antonio Brown    Travis Kelce        Nick Chubb
##     Keenan Allen    James Conner       LeVeon Bell
##  Damien Williams Kerryon Johnson       Evan Engram
##     Robert Woods    Stefon Diggs       Cooper Kupp

As you can see, the data is organized by round with a column for each drafter. This format is often referred to as a “wide data-set.” For this analysis we want a “tall data set” so we will use the “melt” function to reorganize the data into one row per data point (ie one row per pick).

library("reshape2")
library("plyr")

# Add a "ROUND" Column
draftData$ROUND = 1:16

# Melt the data
draftData <- melt(draftData, id.vars = "ROUND")

# Drop the row names
rownames(draftData) <- NULL

# Order by round (this is a stable sort)
draftData <- draftData[order(draftData$ROUND), ]

# Rename some columns
draftData <- rename(draftData, c("variable" = "Drafter", "value" = "Player", "ROUND" = "Round"))

print.data.frame(head(draftData, n=10), row.names=FALSE)
##  Round Drafter              Player
##      1  Andrew      Saquon Barkley
##      1   Kuuby        Adam Thielen
##      1   Mikey Christian McCaffrey
##      1    Joey     Ezekiel Elliott
##      1 William        Alvin Kamara
##      1    Evan       David Johnson
##      1   Jacob     DeAndre Hopkins
##      1   Chris         Tyreek Hill
##      2  Andrew          John Brown
##      2   Kuuby       Brandin Cooks

Now that we have the data in a tall form, we want to know what the actual number for each pick. This is complicated by the winding nature of the draft order (Chris had the 8th and 9th picks). To capture this behavior I have written the following “snake function” which creates a sequence which winds up to 8 and then counts down from 16, up to 24, etc.

# Creates the sequence: 1,2,3,4,5,6,7,8,16,15,14,13,12,11,10,9,17,18,19,20....
snakeFunction <- function(n, rowLen=8)
{
  lapply(1:n, function(i){
    groupIndicator <- ceiling(i / rowLen)
    # Increasing group
    if (groupIndicator %% 2 == 1)
    {
      i
    }
    # Decreasing group
    else
    {
      rowLen*groupIndicator - ((i - 1) %% rowLen)
    }
  })
}

# Create the list for out draft (8 drafters, 16 rounds)
pickOrdering <- snakeFunction(8*16)

unlist(pickOrdering)
##   [1]   1   2   3   4   5   6   7   8  16  15  14  13  12  11  10   9  17  18  19  20  21  22  23
##  [24]  24  32  31  30  29  28  27  26  25  33  34  35  36  37  38  39  40  48  47  46  45  44  43
##  [47]  42  41  49  50  51  52  53  54  55  56  64  63  62  61  60  59  58  57  65  66  67  68  69
##  [70]  70  71  72  80  79  78  77  76  75  74  73  81  82  83  84  85  86  87  88  96  95  94  93
##  [93]  92  91  90  89  97  98  99 100 101 102 103 104 112 111 110 109 108 107 106 105 113 114 115
## [116] 116 117 118 119 120 128 127 126 125 124 123 122 121

We can then use this pick ordering list to index the draft picks in our draft data frame.

# Add the pick numbers to the draft data
draftData$PickNumber <- pickOrdering
draftData <- as.data.frame(lapply(draftData, unlist))

# Order the draft data by pick number
draftData <- draftData[order(draftData$PickNumber),]

print.data.frame(head(draftData, n=10), row.names=FALSE)
##  Round Drafter              Player PickNumber
##      1  Andrew      Saquon Barkley          1
##      1   Kuuby        Adam Thielen          2
##      1   Mikey Christian McCaffrey          3
##      1    Joey     Ezekiel Elliott          4
##      1 William        Alvin Kamara          5
##      1    Evan       David Johnson          6
##      1   Jacob     DeAndre Hopkins          7
##      1   Chris         Tyreek Hill          8
##      2   Chris   Odell Beckham Jr.          9
##      2   Jacob       Davante Adams         10

Now that the draft data has been prepped, we can load in the rankings data:

rankings <- read.csv("./data/EspnPprRankings2019.csv", stringsAsFactors = F)

# Read player as string, not factor
rankings$Player <- as.character(rankings$Player)

# Add "Rank" column sequentially
rankings$Rank <- 1:nrow(rankings)

print.data.frame(head(rankings, n=10), row.names=FALSE)
##               Player Team Pos Rank
##       Saquon Barkley  NYG  RB    1
##  Christian McCaffrey  CAR  RB    2
##         Alvin Kamara   NO  RB    3
##        David Johnson  ARI  RB    4
##        Davante Adams   GB  WR    5
##      DeAndre Hopkins  HOU  WR    6
##      Ezekiel Elliott  DAL  RB    7
##    Odell Beckham Jr.  CLE  WR    8
##       Todd Gurley II  LAR  RB    9
##          Julio Jones  ATL  WR   10

This dataset is already in the tall format we want so there will be no reformating needed. With all of the data prepared we can just join draftData with rankings by Player. Note that this is a left join by default so the resulting dataframe will only contain rows from our draft dataframe. The ranking dataset only includes the top 300 players. Thus some players will not have a ranking so after the join these players will have a NA rank. We will replace this with the worst possible ranking (nrow(rankings)) plus 1.

library("plyr")

# Join rankings data with draft data 
joinedData <- join(draftData, rankings, by="Player")

# Some players will not have a  ranking, give these players a worst ranking + 1
joinedData$Rank[is.na(joinedData$Rank)] <- nrow(rankings) + 1

print.data.frame(head(joinedData, n=10), row.names=FALSE)
##  Round Drafter              Player PickNumber Team Pos Rank
##      1  Andrew      Saquon Barkley          1  NYG  RB    1
##      1   Kuuby        Adam Thielen          2  MIN  WR   26
##      1   Mikey Christian McCaffrey          3  CAR  RB    2
##      1    Joey     Ezekiel Elliott          4  DAL  RB    7
##      1 William        Alvin Kamara          5   NO  RB    3
##      1    Evan       David Johnson          6  ARI  RB    4
##      1   Jacob     DeAndre Hopkins          7  HOU  WR    6
##      1   Chris         Tyreek Hill          8   KC  WR   16
##      2   Chris   Odell Beckham Jr.          9  CLE  WR    8
##      2   Jacob       Davante Adams         10   GB  WR    5

We now have a dataframe with each draft pick with the corresponding pick number and player ranking. We can then use the pick number and player ranking to measure the quality of each draft pick.

joinedData$RankDiff <- joinedData$Rank - joinedData$PickNumber

print.data.frame(head(joinedData, n=10), row.names=FALSE)
##  Round Drafter              Player PickNumber Team Pos Rank RankDiff
##      1  Andrew      Saquon Barkley          1  NYG  RB    1        0
##      1   Kuuby        Adam Thielen          2  MIN  WR   26       24
##      1   Mikey Christian McCaffrey          3  CAR  RB    2       -1
##      1    Joey     Ezekiel Elliott          4  DAL  RB    7        3
##      1 William        Alvin Kamara          5   NO  RB    3       -2
##      1    Evan       David Johnson          6  ARI  RB    4       -2
##      1   Jacob     DeAndre Hopkins          7  HOU  WR    6       -1
##      1   Chris         Tyreek Hill          8   KC  WR   16        8
##      2   Chris   Odell Beckham Jr.          9  CLE  WR    8       -1
##      2   Jacob       Davante Adams         10   GB  WR    5       -5

At this point all of the data has been prepped so we are ready to generate some plots and do a bit of analysis.

Figures and Results

Probably the most natural measure of performance would be to look at the average difference between pick number and rank, across all rounds, organized by Drafter. We can calculate this using aggregate.

avgDiffs <- aggregate(RankDiff ~ Drafter, joinedData, FUN=mean)

# Order factor by worst draft to best
avgDiffs <- within(avgDiffs, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))

# Order dataframe by worst draft to best
avgDiffs <- avgDiffs[order(avgDiffs$RankDiff, decreasing = FALSE),]

print.data.frame(avgDiffs, row.names=FALSE)
##  Drafter RankDiff
##    Jacob   3.6250
##     Evan   4.5000
##    Chris   8.4375
##   Andrew  13.8125
##  William  14.1250
##     Joey  19.0625
##    Mikey  46.4375
##    Kuuby  54.5625

Thus we can see that Jacob had the best draft (on average he got players 3.625 positions over rank value) and Kuuby had the worst (on average, he got players 54.5625 positions over rank value). Here is a quick visual of the above dataframe.

library(ggplot2)
ggplot(avgDiffs, aes(x=Drafter, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")

Since kickers and defenses are generally ranked very low, and because I picked a bad kicker and defense, I was curious to see what these rankings look like without those position types:

noKorDst <- joinedData[!(joinedData$Pos %in% list("DST", "K")), ]

avgDiffs2 <- aggregate(RankDiff ~ Drafter, noKorDst, FUN=mean)

# Order factor by worst draft to best
avgDiffs2 <- within(avgDiffs2, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))

# Order dataframe by worst draft to best
avgDiffs2 <- avgDiffs2[order(avgDiffs2$RankDiff, decreasing = FALSE),]

print.data.frame(avgDiffs2, row.names=FALSE)
##  Drafter  RankDiff
##     Evan -9.928571
##    Jacob -2.000000
##  William  2.428571
##    Chris  5.714286
##   Andrew  8.500000
##     Joey  8.500000
##    Mikey 47.066667
##    Kuuby 47.416667

I was curious to see how undervalued/overvalued players were by position. We can use a similar methodology as above to figure this out.

posDiffs <- aggregate(RankDiff ~ Pos, joinedData, FUN=mean)

posDiffs <- within(posDiffs, Pos <- factor(Pos, levels=posDiffs[order(posDiffs$RankDiff), "Pos"]))

# Order dataframe by worst draft to best
posDiffs <- posDiffs[order(posDiffs$RankDiff, decreasing = FALSE),]

print.data.frame(posDiffs, row.names=FALSE)
##  Pos  RankDiff
##   RB -2.473684
##   WR  5.659091
##   QB 38.785714
##   TE 41.142857
##  DST 68.333333
##    K 69.750000
ggplot(posDiffs, aes(x=Pos, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")

Here we can see that RB were under valued by 2.4736842 positions and K were overvalued by 69.75 positions.

I was also curious to see how draft pick quality changed by round. In the next two figures I am plotting Pick # vs Rank with a diagonal line indicating a pick at the rank value. Thus all picks below this line are “good” and all of the above are “bad”. In the first figure I colored the picks by Drafter and in the second I am coloring by position.

# Scatter colored by drafter 
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Drafter)) + geom_abline() + xlab("Pick #")

# Scatter colored by position 
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Pos)) + geom_abline() + xlab("Pick #")

It is pretty clear that as time goes on, worse and worse picks made.

Finally I wanted to see how each drafter performed by round, the following figure accomplishes this (here any pick below the dashed line is “good” and anything above the dased line is “bad”).

# Scatter colored by position, faceted by drafer
ggplot(joinedData, aes(x=Round, y=RankDiff)) + geom_line() + facet_wrap(~Drafter, ncol=2) + geom_hline(yintercept = 0, linetype=2) + geom_point(aes(color=Pos)) + ylab("Difference Between Pick # and Ranking")

Conclusion

Here are some final take aways: