22 Mar

Building a Scouting Algorithm Part 1

I recently read a two-part series, Building a Football Scouting Algorithm Part 1 & Part 2, by Richard Whittall. He described his goal this way: “The ultimate aim of this series is to provide a very simple model for smaller clubs with very limited resources to potentially improve their success rate in the summer transfer window.”

I have used the method described to build my Youth Soccer Capitalization Rate study and other projects. It is a useful method that can be applied across multiple disciplines. At Concordia I used a similar method to determine how to find the best Graduate Assistant. An example of using it in a job search is shown below.

In Part 1 of my mini series I will summarize the two parts that sparked my imagination. In Part 2 of my mini series I will demonstrate a simple algorithm for college coaches to use in recruiting.

Daniel Kahneman sums up what this means in Thinking, Fast and Slow:

The important conclusion from this research is that an algorithm that is constructed on the back of an envelope is often good enough to compete with an optimally weighted formula, and certainly good enough to outdo expert judgment.

Kahneman even suggests a practical example:

Suppose that you need to hire a sales representative for your firm. If you are serious about hiring the best possible person for the job, this is what you should do. First, select a few traits that are prerequisites for success in this position (technical proficiency, engaging personality, reliability, and so on). Don’t overdo it—six dimensions is a good number. The traits you choose should be as independent as possible from each other, and you should feel that you can assess them reliably by asking a few factual questions. Next, make a list of those questions for each trait and think about how you will score it, say on a 1-5 scale. You should have an idea of what you will call “very weak” or “very strong.”
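The hiring procedure in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the passage names three example traits and a 1-5 scale, while the candidate names, their scores, and the choice to add the scores up with equal weight are mine.

```python
# Example traits taken from the passage; a real list would have about six.
TRAITS = ["technical proficiency", "engaging personality", "reliability"]


def total_score(scores: dict) -> int:
    """Add up 1-5 trait scores with equal (unit) weights."""
    total = 0
    for trait in TRAITS:
        value = scores[trait]  # raises KeyError if a trait was not scored
        if not 1 <= value <= 5:
            raise ValueError(f"{trait} must be scored 1-5, got {value}")
        total += value
    return total


# Hypothetical candidates with made-up scores, for illustration only.
candidates = {
    "Candidate A": {"technical proficiency": 4, "engaging personality": 3, "reliability": 5},
    "Candidate B": {"technical proficiency": 5, "engaging personality": 2, "reliability": 3},
}

# Rank candidates from highest to lowest total.
ranked = sorted(candidates, key=lambda name: total_score(candidates[name]), reverse=True)
```

The point of the equal weighting is that nobody has to argue about which trait matters most before the interviews start.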

This all may sound very familiar to some of you so far—that’s because I wrote about using a ‘back-of-the-envelope’ unit-weighted algorithm for scouting footballers a few years ago for theScore (an article that sadly lives on only via the Wayback Machine).

The difference this time around is I would like to finally implement one, test it, and compare it to subjective judgments from myself and others, whether from fans or media pundits.

I’m going to do this over a series of posts.

  • First, I will propose a simple model with equally weighted variables, and I hope to keep it as simple as possible. The criteria for a successful transfer will likely involve a minimum percentage of injury-free playing minutes, say 70%.
  • Next, I will apply the model to players from last year’s Premier League transfer window to get a rough idea of how it works in practice.
  • Third, I will tweak the model and run it against this year’s Premier League window, and compare conclusions to my own and pundit/fan predictions.
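The success criterion in the first step could be checked with something like the sketch below. The function names are hypothetical, and how injury time is removed from the available minutes is left to whoever supplies the numbers; only the 70% threshold comes from the text.

```python
def minutes_share(minutes_played: float, minutes_available: float) -> float:
    """Share of available minutes a player actually played (0.0 to 1.0)."""
    if minutes_available <= 0:
        raise ValueError("minutes_available must be positive")
    return minutes_played / minutes_available


def is_successful_transfer(minutes_played: float, minutes_available: float,
                           threshold: float = 0.70) -> bool:
    """True if the player reached the minimum share of available minutes."""
    return minutes_share(minutes_played, minutes_available) >= threshold


# A Premier League season has 38 matches; at 90 minutes each that is
# 3420 available minutes (stoppage time ignored for simplicity).
SEASON_MINUTES = 38 * 90
```

For example, a player with 2,500 league minutes clears the 70% bar, while one with 2,000 does not.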

This method is simple and has proven to work. Why don’t college coaches use this method to find players (instead of waiting for Admissions to send them a list of interested students)? Why don’t Athletic Directors use this method to find Head Coaches (instead of ‘having a feeling’ during the hiring process)? How come Head Coaches don’t use this method to find assistant coaches (instead of just hiring friends they want to hang out with)?

In Part 2 of Whittall’s series he works through developing a sample algorithm and comparing it to the English Premier League transfer lists.

Whittall’s Scouting Algorithm will have 5 variables, and I’m proposing a 1-3 score for each. They are:

  • Age: There has already been voluminous work on this topic from many different respected analysts, most recently from the likes of Colin Trainor and Garry Gelade. Obviously, we can get more position-specific with these, but at this stage I think sticking to the findings of Simon Gleave makes sense: 3 points for peak age range, 2 for those <2 years outside the peak range, and 1 for those >2 years outside peak range.
  • Relative Quality of Previous Club: Meaning whether the player is coming from a team of lesser, equal or greater quality than your own. Obviously, measuring relative quality is difficult at the best of times, so this will either involve a measure of subjective opinion, or, even better, something like Lars Schiefler’s clubelo.com, the IFFHS world club rankings, or even the UEFA club coefficient rankings. I would go 3 points for a team of substantially greater quality, 2 for roughly equal, and 1 for inferior.
  • Non-Injury Playing Minutes at Previous Club: Essentially, how often did this player feature for their team the previous season? Or seasons? 70% or more of available playing minutes? 50-70%? Less than 30%? One can discount recovery from this measure, but including it could also make it a crude proxy for proneness to injury. This also balances nicely with our Relative Quality measure, for the reason that a very good player may not be able to break into the Barca first team, but a so-so player may also get a lot more minutes at a mediocre club.
  • History of Transfer Success at Your Club: As measured in percentage of available minutes played for recruits in their first season, perhaps. Remember the outside view! Though obviously the percentage of playing minutes for recruits will depend on a host of factors including many of the above, this can at least help measure some of your own club’s success at integrating new transfers. This will involve measuring your team’s record against the league average (and some work for yours truly).
  • Transfer Market Value: Though the transfer market valuations on transfermarkt.co.uk are fairly controversial, and while the market as a whole is often wildly inefficient, this is a decent proxy for the going perception of quality as measured in potential transfer fees. To make this relevant, points might be awarded based on whether the fee for a potential player is higher, roughly equal to, or lower than the highest market value player at your club.
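As a rough sketch, the five variables could be combined like this. Several details are my assumptions rather than Whittall's: the peak age range is a parameter (the 24-28 used in the example is a placeholder, not Gleave's finding), a player exactly 2 years outside the peak scores 2, any minutes share below 50% scores 1 (the text leaves 30-50% unspecified), and the last two variables are scored 3/2/1 for higher/equal/lower. All names are hypothetical.

```python
from dataclasses import dataclass

# Shared points for the three comparison-based variables (assumed 3/2/1).
COMPARISON_POINTS = {"higher": 3, "equal": 2, "lower": 1}


@dataclass
class Target:
    age: float
    previous_club_quality: str  # previous club vs. yours: "higher", "equal", "lower"
    minutes_share: float        # share of available minutes at previous club, 0.0-1.0
    club_transfer_record: str   # your club's record with recruits vs. league average
    market_value: str           # fee vs. your club's most valuable player


def age_points(age: float, peak_min: float, peak_max: float) -> int:
    """3 inside the peak range, 2 within 2 years of it, 1 otherwise."""
    if peak_min <= age <= peak_max:
        return 3
    distance = peak_min - age if age < peak_min else age - peak_max
    return 2 if distance <= 2 else 1


def minutes_points(share: float) -> int:
    """3 for 70% or more, 2 for 50-70%, 1 below 50% (assumed cut-off)."""
    if share >= 0.70:
        return 3
    if share >= 0.50:
        return 2
    return 1


def transfer_score(t: Target, peak_min: float, peak_max: float) -> int:
    """Equal-weighted total across the five variables (range 5-15)."""
    return (age_points(t.age, peak_min, peak_max)
            + COMPARISON_POINTS[t.previous_club_quality]
            + minutes_points(t.minutes_share)
            + COMPARISON_POINTS[t.club_transfer_record]
            + COMPARISON_POINTS[t.market_value])


# Hypothetical player, not a real transfer target.
example = Target(age=30, previous_club_quality="equal", minutes_share=0.62,
                 club_transfer_record="higher", market_value="lower")
score = transfer_score(example, peak_min=24, peak_max=28)
```

Under these assumptions a higher total would suggest a lower-risk signing, with 5 as the floor and 15 as the ceiling.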

So here we have five variables to consider before we’ve even evaluated how well a player kicks a ball. This could obviously be used in conjunction with traditional scouting methods, or as a kind of rough “filter”. Some of these involve a little creative work… teams closer to the base of the football pyramid won’t necessarily have a handy, evidence-based club rating system, or even a way to properly gauge transfer market value.

Also note that we’re not determining whether a potential transfer is “good” or not, but instead developing a scratchpad measure of risk. Clearly, a lower-priced older player who couldn’t break into the first team at an inferior club has to be one hell of a diamond in the rough to be considered worth picking up. Nevertheless, buying a high risk player isn’t bad per se, but buying a player without knowing either why they’re high risk or that they’re high risk at all is obviously not a good idea.

For now, however, this is what I’m going with. It’s simple enough to be put into use almost right away. Ideally next week we’ll apply it to a few Premier League transfers from last summer’s window. I can’t promise a linear regression to measure its effectiveness unless you’d like to chip in and help (again, even that is above my level of expertise), but we should still be able to get a sense of how predictive it was across a few cases.
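Short of a full regression, one simple way to get a sense of predictiveness is to correlate each player's model score with the share of minutes he went on to play. The sketch below uses a plain Pearson correlation, which is my substitution and not something Whittall proposes, and the numbers are placeholders chosen only to exercise the function, not real transfers.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder data: model scores and first-season minutes shares.
model_scores = [7, 9, 11, 13]
first_season_share = [0.2, 0.4, 0.6, 0.8]
r = pearson(model_scores, first_season_share)
```

A value near 1 would mean higher scores went with more playing time; with only a handful of transfers, any such number should be read as a hint rather than evidence.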