Uncertainty Rules as March Madness Begins
COVID-19 canceled the 2020 NCAA Division I basketball tournaments, and while March Madness has returned this spring, the ongoing pandemic has modified the matchups following a complicated 2021 season.
In a departure from its typical multi-city tournament venues, the NCAA 2021 men’s basketball championship will take place in Indiana with the men’s Final Four set for April 3 and 5, while the women’s tournament will be played in the San Antonio region with the women’s Final Four scheduled for April 2 and 4.
“Uncertainty typically is a friend to the underdog,” says sports analytics expert Konstantinos Pelechrinis, a Pitt faculty member who leads the Network Data Science Lab in the School of Computing and Information.
He predicts that could bring more upsets to this year’s tournament.
“Single-elimination games allow for more upsets and contribute to uncertainty for the final outcome. Add the COVID situation this season and you have even more uncertainty,” he said.
One huge element is whether a team would need to pull out of the tournament if players or personnel test positive for COVID. The NCAA’s no-contest rule dictates that if a team is unable to play due to medical reasons, their opponent advances.
Pelechrinis added that the move to a single-city format also may have an effect.
“Some of the most successful prediction models include as a variable the distance traveled to the site of play,” he said. “Teams that do not have to travel far might have some advantage, but how much is yet to be seen, given the pandemic circumstances as well.”
Traditionally the tournaments are played in four different locations. “Typically, some teams are in or near their actual location, particularly the top-4 seeds,” he said. This time around, a pure S-curve is being used for seeding.
The venue-related effects stand to be more pronounced in the women’s tournament. “Since 2015, the first two rounds were hosted by the higher-seeded team, giving them significant advantage playing at home court. Now that all rounds will take place in San Antonio, this home court advantage for the higher seeds in the earlier rounds is not there,” Pelechrinis said.
Another question looming large is whether the teams’ true quality has emerged this season. “What about teams that underperformed during the regular season due to issues with COVID, but their true power is much better? What about teams that played considerably fewer games than others?” Pelechrinis said.
“The vast majority of the conferences—including the Atlantic Coast Conference (ACC)—simply used the win percentage to seed the tournaments. A few conferences tried to adjust for the uneven schedule faced by teams, but not with very sophisticated approaches,” Pelechrinis observed.
Interestingly, this uncertainty is easier to quantify than the uncertainty of whether COVID will send a team home from the tournament, he said.
“This is a difficult problem to solve, but there are approaches one could take,” he said. “For example, in some of our past work we have explored various network approaches—similar to how Google ranks web pages—to ranking teams that have played an uneven schedule.”
Using this approach with the ACC as an example, Florida State and Georgia Tech emerged as the top two teams this season, while a win/loss percentage ranking yielded a different result with Georgia Tech falling to No. 4, Pelechrinis said. This was even more pronounced during the season, where win percentage had Georgia Tech ranked in the middle of the pack, while network rankings had them always ranked high.
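To illustrate the general idea of a network ranking, here is a minimal Python sketch of a PageRank-style calculation. It is not Pelechrinis's actual model: the team labels, the game results, the function name network_rank and the damping value of 0.85 are all assumptions made up for illustration. Each loss is treated as a "link" from the loser to the winner, so beating a highly ranked team is worth more than beating a weak one.

```python
from collections import defaultdict


def network_rank(games, damping=0.85, iterations=100):
    """Score teams from a list of (winner, loser) pairs.

    Hypothetical helper: a plain power-iteration PageRank in which
    every loss is an edge pointing from the loser to the winner.
    Returns a dict mapping team -> score; the scores sum to 1.
    """
    teams = sorted({team for game in games for team in game})
    n = len(teams)

    # out_edges[loser] lists every team that beat that loser
    out_edges = defaultdict(list)
    for winner, loser in games:
        out_edges[loser].append(winner)

    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        new_score = {team: (1.0 - damping) / n for team in teams}
        for team in teams:
            beaten_by = out_edges[team]
            if beaten_by:
                # A team passes its score to the teams that beat it
                share = damping * score[team] / len(beaten_by)
                for winner in beaten_by:
                    new_score[winner] += share
            else:
                # An undefeated team has no outgoing edges, so its
                # score is spread evenly over all teams
                share = damping * score[team] / n
                for other in teams:
                    new_score[other] += share
        score = new_score
    return score


# Made-up results for four hypothetical teams (winner, loser)
example_games = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
example_scores = network_rank(example_games)
for team in sorted(example_scores, key=example_scores.get, reverse=True):
    print(team, round(example_scores[team], 3))
```

Because a team's score depends on whom it beat and not only on how often it won, a ranking of this kind can behave differently from raw win percentage when teams have played uneven schedules.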
Off the court, what will all this mean for fans’ odds of filling out a perfect bracket?
“The odds are already very slim, approximately 1 in 120 billion, assuming you make educated guesses and are not simply flipping a coin,” he said. “A little more uncertainty will not shift these odds in any meaningful way.”
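For a sense of scale, the arithmetic behind such odds can be sketched in a few lines of Python. The sketch assumes a bracket of 63 games and, as a simplification, that every game is picked independently with the same accuracy; neither assumption comes from Pelechrinis, and the variable names are illustrative only.

```python
# Assumption: a bracket covers 63 games (64 teams, single elimination)
GAMES = 63

# Pure coin flipping: each game is a 50/50 guess
coin_flip_odds = 2 ** GAMES

# The article quotes roughly 1 in 120 billion for educated guesses
quoted_odds = 120e9

# If every pick were independent with the same accuracy p,
# then p ** GAMES == 1 / quoted_odds, so p is the 63rd root
implied_accuracy = quoted_odds ** (-1.0 / GAMES)

print(f"Coin-flip odds: 1 in {coin_flip_odds:,}")
print(f"Implied per-game accuracy: {implied_accuracy:.1%}")
```

Under these assumptions, the quoted figure corresponds to picking each game correctly about two times out of three, which is why a small change in uncertainty barely moves the overall odds.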
Learn sports analytics
Pelechrinis is offering a new introductory-level course at Pitt this fall, Decision-Making in Sports, for students interested in sports analytics and how sports can aid in the understanding of uncertainty and risk. The 3-credit Information Science elective is also being piloted in the University’s College in High School program.
“There has always been an interest in sports and in statistics and data,” Pelechrinis said. “With teams and leagues investing in artificial intelligence and machine learning technology, interest is growing and so are job opportunities.”
Students might pursue a career in the sports industry, but data analysis skills apply to almost any sector, Pelechrinis said. “It’s particularly important in the insurance, banking, medicine and engineering fields.”
Sports data wasn’t always thought worthy of scholarly pursuit, but it is now recognized as a testbed for numerous fields, he said, including decision-making, the behavioral sciences and spatiotemporal statistics, which takes into account where and when data was collected.
Looking at data analysis through the lens of sports analytics helps students more easily understand the concepts, he said. “Students can relate to something they understand. It is different to say ‘We want to predict the points scored by the Panthers in the next game using as our variables their field goal percentage and turnover percentage,’ rather than ‘Predict y using x1, x2 and x3.’ It keeps students interested. For me, it’s sports, but examples could come from music, movies or fashion. Whatever students can relate to,” he said.
“We focus first and foremost on understanding and quantifying uncertainty; interpreting probabilities and learning to embrace these principles in decision-making. Students will learn to build models for tasks like predicting matchups and evaluating performance and strategies,” said Pelechrinis.
Students interested in pursuing a career in data analysis may follow up with more advanced courses, but understanding data is important for everyone, regardless of their path.
“This is an introductory course that will enable students to understand what data and stats can or cannot tell us about a problem. They will learn to read ‘data stories’ in a critical way that will enable them to spot over-claims,” he said. “This is an important skill for anyone in today’s data-driven society; useful for instance in understanding data surrounding the U.S. elections, or pandemic-related statistics.”