Introduction

The analysis below explores the performance of my friends during our 2017 Fantasy Football Draft. Since last year we have been drafting in a no-cheatsheet environment which means that each drafter must know who they are going to draft before the draft starts. As such the performance of each drafter could vary wildly and is thus an interesting phenomena to explore.

Note that I am using the ESPN Top 300 PPR rankings as a proxy for ground truth. Many of the rankings there are debatable but I think overall it gives a good rough estimate of each player’s value.

The source and data for this project are available at https://github.com/EvanOman/FantasyFootball2017. My contact info is available on my website: http://www.evanoman.com.

Let the fun begin!

Data Preparation

First we load in the spreadsheet I made from the draft board:

library("reshape2")
library("plyr")
    
draftData <- read.csv("./data/Draft2017.csv", stringsAsFactors = F)
print.data.frame(head(draftData), row.names=FALSE)
##      ROUND          Michael           Andrew        William          Chris          Joseph
##  Franchise     Amari Cooper    Aaron Rodgers   Le'Veon Bell  David Johnson   Antonio Brown
##          1  Ezekiel Elliott     LeSean McCoy Michael Thomas     A.J. Green Devonta Freeman
##          2   Marshawn Lynch   Alshon Jeffery    T.Y. Hilton     Dez Bryant      Drew Brees
##          3 Emmanuel Sanders    Ty Montgomery    Carlos Hyde Rob Gronkowski DeAndre Hopkins
##          4       Cam Newton Demaryius Thomas  Melvin Gordon   Keenan Allen     Dalvin Cook
##          5   Ameer Abdullah   Delanie Walker    Golden Tate  Davante Adams    Travis Kelce
##               Evan           Kuuuuuuby              Jacob
##        Julio Jones           Tom Brady  Odell Beckham Jr.
##       Jordy Nelson Christian McCaffrey         Mike Evans
##     DeMarco Murray       Brandin Cooks      Jordan Howard
##       Doug Baldwin        Kyle Rudolph          Jay Ajayi
##  Leonard Fournette     Martavis Bryant Terrelle Pryor Sr.
##       Lamar Miller         Texans D/ST         Greg Olsen

As you can see, the data is organized by round with a column for each drafter. This format is often referred to as a “wide data-set.” For this analysis we want a “tall data set” so we will use the “melt” function to reorganize the data into one row per data point (ie one row per pick). Note that we will first drop the Franchise row (not part of the draft directly) and then melt the resulting dataset:

# Save the franchise row for later
franchisePlayers <- draftData[draftData$ROUND == "Franchise", ]

# Drop the franchise row
draftData <- subset(draftData, ROUND != "Franchise")

# Melt the data
draftData <- melt(draftData, id.vars = "ROUND")

# Drop the row names
rownames(draftData) <- NULL

# Make the round a number instead of a string
draftData$ROUND <- as.numeric(as.character(draftData$ROUND))

# Order by round (this is a stable sort)
draftData <- draftData[order(draftData$ROUND), ]

# Rename some columns
draftData <- rename(draftData, c("variable" = "Drafter", "value" = "Player", "ROUND" = "Round"))

print.data.frame(head(draftData, n=10), row.names=FALSE)
##  Round   Drafter              Player
##      1   Michael     Ezekiel Elliott
##      1    Andrew        LeSean McCoy
##      1   William      Michael Thomas
##      1     Chris          A.J. Green
##      1    Joseph     Devonta Freeman
##      1      Evan        Jordy Nelson
##      1 Kuuuuuuby Christian McCaffrey
##      1     Jacob          Mike Evans
##      2   Michael      Marshawn Lynch
##      2    Andrew      Alshon Jeffery

Now that we have the data in a tall form, we want to know the actual pick number for each row. This is complicated by the winding nature of the draft order (e.g. Jacob had both the 8th and 9th picks). To capture this behavior I have written the following “snake function” which creates a sequence which winds up to 8 and then counts down from 16, up to 24, etc.

# Creates the sequence: 1,2,3,4,5,6,7,8,16,15,14,13,12,11,10,9,17,18,19,20....
snakeFunction <- function(n, rowLen=8)
{
  lapply(1:n, function(i){
    groupIndicator <- ceiling(i / rowLen)
    # Increasing group
    if (groupIndicator %% 2 == 1)
    {
      i
    }
    # Decreasing group
    else
    {
      rowLen*groupIndicator - ((i - 1) %% rowLen)
    }
  })
}

# Create the list for out draft (8 drafters, 16 rounds)
pickOrdering <- snakeFunction(8*15)

unlist(pickOrdering)
##   [1]   1   2   3   4   5   6   7   8  16  15  14  13  12  11  10   9  17  18  19  20  21  22  23
##  [24]  24  32  31  30  29  28  27  26  25  33  34  35  36  37  38  39  40  48  47  46  45  44  43
##  [47]  42  41  49  50  51  52  53  54  55  56  64  63  62  61  60  59  58  57  65  66  67  68  69
##  [70]  70  71  72  80  79  78  77  76  75  74  73  81  82  83  84  85  86  87  88  96  95  94  93
##  [93]  92  91  90  89  97  98  99 100 101 102 103 104 112 111 110 109 108 107 106 105 113 114 115
## [116] 116 117 118 119 120

We can then use this pick ordering list to index the draft picks in our draft data frame.

# Add the pick numbers to the draft data
draftData$PickNumber <- pickOrdering
draftData <- as.data.frame(lapply(draftData, unlist))

# Order the draft data by pick number
draftData <- draftData[order(draftData$PickNumber),]

# Make the player a string instead of a factor
draftData$Player <- as.character(draftData$Player)

print.data.frame(head(draftData, n=10), row.names=FALSE)
##  Round   Drafter              Player PickNumber
##      1   Michael     Ezekiel Elliott          1
##      1    Andrew        LeSean McCoy          2
##      1   William      Michael Thomas          3
##      1     Chris          A.J. Green          4
##      1    Joseph     Devonta Freeman          5
##      1      Evan        Jordy Nelson          6
##      1 Kuuuuuuby Christian McCaffrey          7
##      1     Jacob          Mike Evans          8
##      2     Jacob       Jordan Howard          9
##      2 Kuuuuuuby       Brandin Cooks         10

Now that the draft data has been prepped, we can load in the rankings data:

rankings <- read.csv("./data/EspnPprRankings2017.csv", stringsAsFactors = F)

# Read player as string, not factor
rankings$Player <- as.character(rankings$Player)

print.data.frame(head(rankings, n=10), row.names=FALSE)
##  Rank            Player Pos Team
##     1     David Johnson  RB  ARI
##     2      Le'Veon Bell  RB  PIT
##     3     Antonio Brown  WR  PIT
##     4       Julio Jones  WR  ATL
##     5 Odell Beckham Jr.  WR  NYG
##     6      LeSean McCoy  RB  BUF
##     7      Jordy Nelson  WR   GB
##     8        Mike Evans  WR   TB
##     9        A.J. Green  WR  CIN
##    10   Devonta Freeman  RB  ATL

This dataset is already in the tall format we want so there will be no reformating needed. However we will need to remove the Franchise players from the rankings set and adjust the rankings of the remaining players. This can be accomplished by filtering out the players in franchisePlayers and then reranking the remaining players in consecutive order.

# Remove franchise players
rankings <- rankings[!(rankings$Player %in% franchisePlayers), ]

# Apply consecutive ranking to remaining players
rankings$Rank <- 1:nrow(rankings)

print.data.frame(head(rankings, n=10), row.names=FALSE)
##  Rank          Player Pos Team
##     1    LeSean McCoy  RB  BUF
##     2    Jordy Nelson  WR   GB
##     3      Mike Evans  WR   TB
##     4      A.J. Green  WR  CIN
##     5 Devonta Freeman  RB  ATL
##     6  Michael Thomas  WR   NO
##     7   Melvin Gordon  RB  LAC
##     8   Jordan Howard  RB  CHI
##     9  DeMarco Murray  RB  TEN
##    10       Jay Ajayi  RB  MIA

Now that this final step has been performed we can just join draftData with rankings by Player. Note that this is a left join by default so the resulting dataframe will only contain rows from our draft dataframe. The ranking dataset only includes the top 300 players. Thus some players will not have a ranking so after the join these players will have a NA rank. We will replace this with the worst possible ranking (nrow(rankings)) plus 1.

# Join rankings data with draft data 
joinedData <- join(draftData, rankings, by="Player")

# Some players will not have a  ranking, give these players a worst ranking + 1
joinedData$Rank[is.na(joinedData$Rank)] <- nrow(rankings) + 1

print.data.frame(head(joinedData, n=10), row.names=FALSE)
##  Round   Drafter              Player PickNumber Rank Pos Team
##      1   Michael     Ezekiel Elliott          1   24  RB  DAL
##      1    Andrew        LeSean McCoy          2    1  RB  BUF
##      1   William      Michael Thomas          3    6  WR   NO
##      1     Chris          A.J. Green          4    4  WR  CIN
##      1    Joseph     Devonta Freeman          5    5  RB  ATL
##      1      Evan        Jordy Nelson          6    2  WR   GB
##      1 Kuuuuuuby Christian McCaffrey          7   28  RB  CAR
##      1     Jacob          Mike Evans          8    3  WR   TB
##      2     Jacob       Jordan Howard          9    8  RB  CHI
##      2 Kuuuuuuby       Brandin Cooks         10   12  WR   NE

We now have a dataframe with each draft pick with the corresponding pick number and player ranking. We can then use the pick number and player ranking to measure the quality of each draft pick.

joinedData$RankDiff <- joinedData$Rank - joinedData$PickNumber

print.data.frame(head(joinedData, n=10), row.names=FALSE)
##  Round   Drafter              Player PickNumber Rank Pos Team RankDiff
##      1   Michael     Ezekiel Elliott          1   24  RB  DAL       23
##      1    Andrew        LeSean McCoy          2    1  RB  BUF       -1
##      1   William      Michael Thomas          3    6  WR   NO        3
##      1     Chris          A.J. Green          4    4  WR  CIN        0
##      1    Joseph     Devonta Freeman          5    5  RB  ATL        0
##      1      Evan        Jordy Nelson          6    2  WR   GB       -4
##      1 Kuuuuuuby Christian McCaffrey          7   28  RB  CAR       21
##      1     Jacob          Mike Evans          8    3  WR   TB       -5
##      2     Jacob       Jordan Howard          9    8  RB  CHI       -1
##      2 Kuuuuuuby       Brandin Cooks         10   12  WR   NE        2

At this point all of the data has been prepped so we are ready to generate some plots and do a bit of analysis.

Figures and Results

Probably the most natural measure of performance would be to look at the average difference between pick number and rank, across all rounds, organized by Drafter. We can calculate this using aggregate.

avgDiffs <- aggregate(RankDiff ~ Drafter, joinedData, FUN=mean)

# Order factor by worst draft to best
avgDiffs <- within(avgDiffs, Drafter <- factor(Drafter, levels=avgDiffs[order(avgDiffs$RankDiff), "Drafter"]))

# Order dataframe by worst draft to best
avgDiffs <- avgDiffs[order(avgDiffs$RankDiff, decreasing = FALSE),]

print.data.frame(avgDiffs, row.names=FALSE)
##    Drafter  RankDiff
##      Chris -8.933333
##     Joseph 16.666667
##      Jacob 18.133333
##       Evan 18.400000
##  Kuuuuuuby 30.400000
##     Andrew 31.200000
##    William 31.466667
##    Michael 43.933333

Thus we can see that Chris had the best draft (on average he got players 8.9333333 positions under rank value) and Michael had the worst (on average, he got players 43.9333333 positions over rank value). Here is a quick visual of the above dataframe.

library(ggplot2)
ggplot(avgDiffs, aes(x=Drafter, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")

I was curious to see how undervalued/overvalued players were by position. We can use a similar methodology as above to figure this out.

posDiffs <- aggregate(RankDiff ~ Pos, joinedData, FUN=mean)

posDiffs <- within(posDiffs, Pos <- factor(Pos, levels=posDiffs[order(posDiffs$RankDiff), "Pos"]))

# Order dataframe by worst draft to best
posDiffs <- posDiffs[order(posDiffs$RankDiff, decreasing = FALSE),]

print.data.frame(posDiffs, row.names=FALSE)
##   Pos     RankDiff
##    WR  -0.97435897
##    RB   0.05263158
##    TE  32.58333333
##    QB  40.27272727
##  D/ST  44.50000000
##     K  60.14285714
##   DST 161.33333333
ggplot(posDiffs, aes(x=Pos, y=RankDiff)) + geom_bar(stat = "identity") + ylab("Mean Difference Between Pick # and Ranking")

Here we can see that WR were under valued by 0.974359 positions and DST were overvalued by 161.3333333 positions.

I was also curious to see how draft pick quality changed by round. In the next two figures I am plotting Pick # vs Rank with a diagonal line indicating a pick at the rank value. Thus all picks below this line are “good” and all of the above are “bad”. In the first figure I colored the picks by Drafter and in the second I am coloring by position.

# Scatter colored by drafter 
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Drafter)) + geom_abline() + xlab("Pick #")

# Scatter colored by position 
ggplot(joinedData, aes(x=PickNumber,y=Rank)) + geom_point(aes(color=Pos)) + geom_abline() + xlab("Pick #")

It is pretty clear that as time goes on, worse and worse picks made.

Finally I wanted to see how each drafter performed by round, the following figure accomplishes this (here any pick below the dashed line is “good” and anything above the dased line is “bad”).

# Scatter colored by position, faceted by drafer
ggplot(joinedData, aes(x=Round, y=RankDiff)) + geom_line() + facet_wrap(~Drafter, ncol=2) + geom_hline(yintercept = 0, linetype=2) + geom_point(aes(color=Pos)) + ylab("Difference Between Pick # and Ranking")

Conclusion

Here are some final take aways: