The S2 Cognition Test: C.J. Stroud and the Failures of Science Journalism
A year ago, I talked to a dozen scientists and football analytics experts about the validity of the S2 Cognition Test. The misreporting of the tool led to unjustified public embarrassment.
In April of last year, NFL draft media was overwhelmed with reports of a new evaluation tool that could change the draft landscape – the S2 Cognition test. The test, as reported by the media, was apparently a silver bullet for draft evaluation, especially at the quarterback position.
It reportedly helped identify eventual success story Brock Purdy, while elite quarterbacks such as Josh Allen, Patrick Mahomes, Drew Brees and Joe Burrow all aced it. This could be the key to identifying future Hall of Famers, it seemed.
But that’s not what the test purported to do. Draft media oversold the test, and the oversell burned its creators.
Over the course of the season, the success of rookie quarterback C.J. Stroud, who reportedly scored low on the measure, cast doubt on the S2 Cognition test. Social media posts would sometimes contain sly references to the test as Stroud notched accolades on his way to winning, presumably, Offensive Rookie of the Year.
But those off-the-cuff shots turned into something a bit more serious when the Wall Street Journal published a piece titled “They Created a Test to Identify Star QBs. How Did It Miss the Best One in Years?” by Andrew Beaton and Louise Radnofsky.
The text of the piece itself handles the question with some nuance and gives the company founders a voice, but the title and a viral quote diminished the test – and the company – in the eyes of the football-watching public.
In the wake of Stroud’s play this year, the S2 Cognition test has been ridiculed for whiffing on the best quarterback prospect to enter the NFL in years. That is unsurprisingly aggravating to the performance evaluation company’s two founders, a pair of cognitive neuroscientists named Brandon Ally and Scott Wylie. They say that Stroud’s result should never have leaked and that, as soon as his score came in, it was flagged as a potentially invalid and unreliable result.
To many commentators, this looked like post-hoc justification for the “failure” of the test. Former Houston Texans defensive lineman J.J. Watt said as much in a tweet that, as of the morning of January 21, had over 40,000 likes.
Watt’s tweet mischaracterizes the claims of the founders in some key ways. The first is that the founders never confirmed that Stroud scored poorly on the test. In fact, one of the founders, Brandon Ally, seemingly implied that Stroud actually hadn’t scored poorly at all in an interview on the Pat McAfee Show shortly after Bob McGinn leaked a bevy of quarterback scores at Go Long TD.
Ally argued that two of the eight leaked scores weren’t accurate and pointed out that one quarterback did not register a valid result because of the testing conditions.
It should be noted that it’s not obvious that the quarterback who had to retest is Stroud. In fact, there’s reason to believe it isn’t – Stroud did not attend an all-star game during the draft circuit. That doesn’t exclude him; he could have taken the test at an all-star game or similar event without actually participating in the game, but it’s not a smoking gun.
Nevertheless, it’s easy to believe that Ally is pushing back specifically against Stroud’s score being an accurate representation of what their testing found, given the consternation he shows at specific references to the prospect’s score.
The leaks McAfee and Ally are referring to come from that McGinn piece, and there were only eight quarterbacks with listed scores. Bryce Young has been taking the test for years – itself worth some discussion – and it’s likely that his score is accurate.
That leaves the two inaccurate scores among the remaining seven quarterbacks, though we have received multiple corroborating reports for Jake Haener, Will Levis and Anthony Richardson, reducing the likelihood that their scores were erroneous as well.
Many asked why they made the statement about the invalid result now instead of prior to the draft, but the answer is fairly simple: their contracts forbid them from revealing player scores. Still, in light of what happened here, they could have asked for a waiver from that clause, or waited until after the draft, when the information is significantly less valuable and there is less reason to keep it private.
In Watt’s tweet, there is a quote about the lack of success for low scorers. And while that references a genuine piece of reporting from McGinn, the piece attributes that quote to a team executive, not to the company.
Throughout the process, S2 has been modest in its public messaging about the cognition test, though reporters have not been. That much was made clear in my conversations with the founders nearly a year ago.
That has been repeated in their public appearances on podcasts like the Pat McAfee Show and the PFF NFL Show, where Ally told Sam Monson, “Our database right now is 3,600 NFL prospects, so it would be that three percent of college athletes that have declared for the draft and have gone through that process. When you get a percentile rank, it is compared to those players.”
“I think in the context of ‘hey, this is a low number,’ a 50 is an average NFL prospect,” Ally added. “That guy is getting drafted, he's going to be on a roster. Low numbers are, again, it's all contextual. It doesn't mean a player can't play. It just means he's going to have more areas where he's going to struggle with than others.”
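To make that distinction concrete, here is a minimal sketch of what a percentile rank against a reference sample means. The function, pool, and scores below are purely illustrative and are not S2’s actual methodology; the only point is that “a 50” is computed against a pool of draft prospects, not the general public.

```python
from bisect import bisect_left

def percentile_rank(score: float, reference_scores: list[float]) -> float:
    """Percentile of `score` within a reference sample.

    Illustrative only: S2's actual scoring is proprietary. The key idea is
    that the reference sample is the pool of NFL draft prospects, so a "50"
    means average *among prospects*, not among the general population.
    """
    ranked = sorted(reference_scores)
    below = bisect_left(ranked, score)   # how many reference scores fall below
    return 100.0 * below / len(ranked)

# Hypothetical example: made-up raw scores for a small prospect pool
prospect_pool = [40, 45, 50, 55, 60, 65, 70, 75]
print(percentile_rank(58, prospect_pool))  # 50.0 – average for this pool
```

In other words, even a below-average percentile still describes a player who made it into that narrow, pre-selected group in the first place, which is the context Ally is pointing to.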
They work directly with teams on how the data is intended to be used. “The feedback we're getting from our teams,” said Ally to Monson, “is that nobody is hanging ‘oh this guy had a high number, we're drafting him’ or ‘this guy had a low number, we're not drafting him’; it's more about context, about hey what are we asking this athlete to do.”
That approach is consistent with how the founders described the test in my conversations with them last year. But hold on – who are the co-founders, and why do they have the authority to create this kind of test?
Who Are S2 Cognition?
Brandon Ally and Scott Wylie are now assistant professors at the University of Louisville in the Department of Neurosurgery. Ally received his PhD in neuropsychology from the University of Southern Mississippi, while Wylie received his from Indiana University, and both have been active in clinical and research-oriented work in their post-doctoral academic careers, including stints at Harvard Medical School, the University of Virginia and Boston University, along with Vanderbilt and Louisville.
Ally and Wylie met while at Vanderbilt after a discussion about the NFL draft in 2014. Phrases like “he sees the whole field,” and “this player has a nose for the ball” piqued their interest – this was something that was in the arena of their scientific background.
“We're currently using the Wonderlic, which is a paper and pencil. It's timed, but it's not the same kind of timing we engage in when we cross the white line,” Wylie told me. “We said, ‘man, we've just had tools for the cognitive sciences that have been studied for decades, in many cases, that could really actually look at the capacity of these rapid processing decisions, both in terms of visual aspects, which a lot of people were starting to sniff around, but also the other aspects that are critically involved in the dynamic ways that athletes use their memory systems.’”
In truth, S2 has a very narrow focus. “I don’t want people lumping us in with the Wonderlic test,” Ally told me. That’s a fair concern. The Wonderlic is a reasoning test with some memorization elements, not unlike a very simple, non-tactile version of an IQ test.
It has not shown any significant capacity to predict NFL performance at any position.
Ally’s claim is that the S2 Cognition test maps more closely to actual football tasks on the field, which include making snap-judgment decisions on briefly presented visual information with conflicting cues. It also does not purport to test the intelligence or psychology of any athlete. It is primarily concerned with visual processing.
Their work with Alzheimer’s and Parkinson’s patients gave them the skill set to approach these problems as their patients also needed to be tested for cognitive function on a regular basis. But they encountered a problem – moving from a general population to one selected for elite traits meant needing more precise equipment.
And they needed to have answers to that problem right away.
Can S2 Cognition Do What It Claims To?