Measuring What Matters: “Big Data,” Data Analytics, and Student Success

By Richard P. Keeling

Numbers matter. Quantitative data provide important information, from how many miles it is to our destination to what percentage chance there is of rain today; we use numbers to verify or more precisely describe something we sense or feel, like how hot or cold it is, and to understand key trends, such as the percentage decline in high school graduates in some regions of the country. Although “data” might include qualitative information (such as stories, observations, and eyewitness reports) in some contexts, “data,” as we use the term, almost always means numerical measurements. Numbers, in the form of binary code, make possible our computing and communications technologies.

Intensive, and sometimes rigorous, assessments based on numbers (“data analytics,” or “big data”) have become common tools in business, science, and higher education. Enrollment analyses and projections, comparative rankings, and measures of institutional performance against various criteria or standards have become central to the way colleges and universities—and the public—think about higher education. Sometimes we question numbers (“Our university should be ranked higher than that”), but, more often, numbers question us (“The data just don’t back up your statements about the effectiveness of that program”).

Numbers have critical importance in medicine; we don’t call heart rate, respiratory rate, blood pressure, and temperature “vital signs” for no reason. Laboratory data, most of it numerical, guide many clinical decisions. But medical students (I was one) learn quickly to interpret all the numbers used to characterize a patient’s condition carefully, and always in context. For example, an elevated level of blood urea nitrogen (BUN, a measure of kidney function) could mean serious kidney disease or simple dehydration; it might cause great alarm in some circumstances and far less concern in others. The numbers that appear on laboratory reports, even if completely accurate, paint only part of the patient’s picture. Many a patient has “died a Harvard death” with normal laboratory results, and a lot of people walk around with persistent laboratory abnormalities to which their bodies have accommodated. All of which means that you have to know the person, the patient, to understand the data. 

So it is in higher education: you have to know the person, the student, to understand the data. But in a collective rush to have and use numerical metrics to support strategy, decision-making, and resource allocation, institutions of higher education and their constituents, from legislatures and funders to alumni and trustees, have elevated data analytics and forgotten the person, the student. There are long-standing questions about the methods and value of institutional rankings, but those reservations have not (yet, at least) shaken the foundations of that industry. Strangely, questions about rankings have also not (yet) generated parallel questions about the use of “big data” for other purposes in higher education. Academic programs and credentials in data analytics have proliferated, and administrators whose main distinction is their possession of an armamentarium of data have risen in stature (and power) in many colleges and universities.

“…institutions of higher education and their constituents, from legislatures and funders to alumni and trustees, have elevated data analytics and forgotten the person, the student.”

Their tables, charts, and graphs use statistics (raw, refined, elegant, or manipulated) to present a view of the student by looking at measurements or markers for many, or all, students. Certain of these data enterprises in higher education have been, are, or may yet become useful; others only supplant more holistic (i.e., not just numerical) information and more thoughtful analysis with numbers upon numbers. In many universities, human observations and reflections are now trusted only if numbers—statistical representations—verify them. Tracking students’ progress toward various persistence, retention, and completion goals by monitoring some, or many, metrics may allow administrators to identify groups of students who are regarded as being, in one way or another, “at risk.” Why engage directly with students, when you can “know” them by tracking the digital indicators of their enrollment and performance? And how, other than by collecting and presenting all those metrics, can an institution compare itself with, compete with, or be ranked better than, its peers?

Why, and how, indeed. Mark Twain did not originate, but did popularize, this famous comment: “There are three kinds of lies: lies, damned lies, and statistics.” Data analytics, like any other variety of statistics about human beings and their behavior, works only on a population-level basis; it tells us little, if anything, about any individual student. That makes big data a clumsy instrument, because it depends on homogenizing students (i.e., lumping together all the students who have the same value on one particular metric—which is like lumping together all the students who grew up in Greenville, or who have brown eyes). Yes, it can reveal (though not always) large-scale characteristics and trends, but it inevitably (and intentionally) overlooks the separate, particular, unique person who sits across your desk needing assistance. That person, your student, may be meeting with you because some combination of metrics identified them, or the institution, as being at risk of some negative outcome—in which case an important human interaction may follow a quantitative warning flag. But other human interactions, or the lack thereof, might have been a better foundation for the conversation—and might also avoid giving the student the impression that the university is primarily watching out for its own interests (e.g., rates of this or that). Caring about students, engaging with them, and supporting their success is one thing; monitoring and tracking them to elevate a university’s statistical profile is another.

It is easy to monitor and track students in the aggregate; it is impossible to care about them that way. We can say we “care about students,” and mean it, in some abstract sense. But we can only really care about the students we know. The students we know are all marvelously different. Just as many patients will have the same result on a laboratory test like BUN, many students may have similar persistence or retention metrics—but each patient, and each student, has their own individual story, and we can’t understand the laboratory tests, or the persistence metrics, without knowing the patient, or the student, and the story. Knowing that a student is from Greenville, or has brown eyes, tells you remarkably little about that person.

Big data knows no student individually; it knows no stories. It therefore renders a picture of students that is disturbingly reminiscent of the one Malvina Reynolds wrote about more than 50 years ago, in a song that ended up, eventually, as the theme music for “Weeds”:

Little boxes on the hillside
Little boxes made of ticky tacky
Little boxes on the hillside
Little boxes all the same
There’s a green one and a pink one
And a blue one and a yellow one
And they’re all made out of ticky tacky
And they all look just the same

And the people in the houses
All went to the university
Where they were put in boxes
And they came out all the same…

There is something truly terrifying about “…All went to the university/Where they were put in boxes/And they came out all the same.” That—“…put in boxes/and they came out all the same”—is not higher learning. And that is also what is most distressing about big data when applied to students: it puts students into boxes and sees them as all the same. When big data drives institutional strategy, certain measures of organizational effectiveness, such as rates of persistence, may improve, at least for some groups of students—but those improvements do not include any measures of what students learned, what they became, or how their campus experience changed (ideally, transformed) them. From the big data perspective, metrics are students, and students are their metrics.

Big data is a tool that can be used along with other tools in the service of good ends. Data analytics has the potential to “do good” for students (e.g., by systematically identifying groups of students who need additional support), but it may also do harm by creating or advancing the view that tracking metrics is better (more efficient, more cost-effective, more successful) than engaging deeply and authentically with students as whole people. Why have a student affairs division, when you can “know” students by their numbers? Where that view prevails, there will be fewer, but far more stretched, advisors, counselors, and mentors working directly to know and assist students; in those places, statistical outcomes will have overcome human ones, and the institution’s fascination with rankings and comparative data will have scuttled its mission. Student success is most importantly measured by the development of knowledge, character, and values. Not one of those has ever been, or ever will be, accurately measured by big data.

This article was written for the Council for the Advancement of Standards in Higher Education (CAS) 2019 Annual Board Meeting. 

Keeling & Associates, LLC is a comprehensive higher education consulting firm that provides strategic planning, consultation, and executive search services to improve the quality and quantity of higher learning throughout institutions of higher education across North America.