Clever Mathematicians

Judgment about what data is analyzed, using which criteria, remains the critical difference between signal and noise.

The one and what I say about it make two; two and one make three. If we go on this way, then even the cleverest mathematician can’t tell where we’ll end, much less an ordinary person.

Sound familiar? I have often thought that this passage, written a few centuries before the start of the Common Era by the Chinese thinker Zhuangzi, nicely encapsulates certain things we are beginning to understand about information (or, as we are now encouraged to say, data) and how it is generated and processed. I was recently reminded of it while rereading Steven Bell’s excellent little article, “Promise and Problems of Big Data,” in the March 2013 edition of Library Journal, in which he reviews some of the issues that should give us pause about the use of big data.

Discussions about big data invariably focus, formulaically, on its volume, velocity, and variety. Lots of information generated at astonishing speed in an enormous variety of formats. There is simply no precedent for the amount of information we now have access to—or for the analytical skill required to draw meaningful conclusions from it. It all gets to sounding a bit bewildering and, because it involves technology, a bit seductive. How can I be a part of this?

For the academic librarian, big data represents a number of opportunities. As experts in data management and preservation, academic librarians can extend their service to the patron community in such areas as information literacy and data visualization. Closer to home, big data analysis can be used to enhance the value of the library itself by analyzing, for instance, transactional data and using it to create ever deeper altmetrics of library assets, usage patterns, patron information, and so forth. The abundance of information being generated makes for an almost irresistible target.

And therein lies the rub. For it is the nature of information to begin to reflect on itself. In this era of big data we are coming to understand that the generation of information is a geometric rather than an arithmetic process. Big data does not simply imply the processing of extremely large amounts of information; it also means applying analytical techniques to the information derived from the information derived from the analysis of the thing (asset, document, behavior) itself to create . . . more information. The two and the one make three. Without the imposition of critical judgment, the process will continue to spin off layer after layer of data, each reflecting on itself in ever-widening circles of abstraction.

There are costs to this process, and not just the physical costs of data processing, storage, and management but also the opportunity costs inherent in roads not taken because of time lost in the analysis of the wrong data or the use of the wrong analytical criteria. All of which is by way of saying that judgment about what data is analyzed, using which criteria, remains the critical difference between signal and noise. Most of us are not the clever mathematicians imagined by Zhuangzi, but modern technology has invited us to act as such. We need to know how and when to push the <End> button.

About the author:

Mark Cummings is the editor and publisher at Choice.