Finding Meaning in Data

 

The “Cyber” in Cyber-Physical Systems


The Cyber element of Cyber-Physical Systems is focused on the analysis of data. How can we find the true stories in data: what does the data really mean, and how can we use it?

My coursework included tasks looking at data collection and analysis, thinking deeply about how data can be used and, critically, how it should not be. How can we be clear that our data answers the question we are asking? And how can we present data in ways that tell powerful, actionable stories?

Case Study: Where Are People From?

For one assignment we were asked to look at open data sets provided by the Australian Capital Territory Open Data Portal and analyze them using Tableau, a data-analysis and visualization tool. In my analysis of data on the nation of origin for immigrants in the territory, I discovered that the data on offer from the ACT, combined with the way Tableau inherently interpreted that data, created a series of omissions. These omissions were related to political decisions around the names and recognition of nations.

At first, the error was clear: the software had checked the ACT’s dataset against an internal atlas of names. In the ACT’s data, the UK was listed together with the Isle of Man. Tableau couldn’t find “United Kingdom and Isle of Man,” and so it simply excluded the listing from the map, moving Iceland to the top position. A first look was enough to catch that obvious error. Closer inspection revealed that this sort of “bug” (or bias) would disproportionately affect smaller populations, whose exclusion was less noticeable. How many immigrants come from the “Gaza Strip,” for example, versus “Palestine”? Tableau maps one; the ACT identifies the other. The effect is that an entire population is invisible within the Tableau map.
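One way to guard against this kind of silent omission is to list the names that fail to match before anything is drawn. The following is a minimal sketch in Python, not a description of how Tableau works internally: `KNOWN_NAMES` is a hypothetical stand-in for a mapping tool's atlas, and the counts are invented for illustration rather than taken from the ACT data.

```python
# Hypothetical atlas of names a mapping tool is able to place on a map.
KNOWN_NAMES = {"United Kingdom", "Iceland", "Gaza Strip"}

# Illustrative rows only; the counts are invented, not ACT figures.
rows = [
    ("United Kingdom and Isle of Man", 120),
    ("Iceland", 15),
    ("Palestine", 4),
]


def split_by_match(rows, known_names):
    """Separate rows the atlas can place from rows it would silently drop."""
    mapped, dropped = [], []
    for name, count in rows:
        if name in known_names:
            mapped.append((name, count))
        else:
            dropped.append((name, count))
    return mapped, dropped


mapped, dropped = split_by_match(rows, KNOWN_NAMES)
print("On the map:", mapped)
print("Missing from the map:", dropped)
```

Printing the dropped rows makes the omission a visible decision that someone has to review, rather than a gap in the map.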

Key Learning

Data do not exist independently of the ideas, instruments, practices, contexts and knowledges used to generate, process and analyse them.
— Rob Kitchin (2014)

As a result of this exercise, I began thinking about what is made visible and invisible by data. When we look at a spreadsheet, we see slices of the world, rearranged into discrete facts and presented in isolation from their broader contexts. As the Kitchin quote explains, each presentation of data is the result of a series of decisions about how to focus data for the purposes of the presentation or the data set. “Data” as we define it does not exist in the world, and therefore is not “extracted”; it is built from observations and shaped into something useful.

It would be impossible to represent the full scope of human experiences through data sets. To me, excluding human emotion from data cuts off a crucial part of data’s usefulness. If we want to use data to make improvements to the world, how can we do that if we reduce the human experience to columns and rows? And how might we begin to bring data to life in an emotional way, while it still serves as a factual and true measure of the world?

I began thinking about ways data might be reconnected to the subjective, emotionally complex world we draw it from, and I began to think about sound. In our coursework, we looked at visualization and design — how to ensure data is communicated effectively, transparently, and honestly. I began to look at tools that can turn data into sound, and to consider the way data might be reconnected to human emotion.

Data in Sound: “The Invisible Breath of the Market: Q1 2020”

I suddenly realized that all a light curve is, is a table of numbers converted into a visual plot. So ... we translated those numbers into sound. I’m able to do physics at the level of an astronomer using only sound.
— Wanda Diaz Merced, TED Talk (2016)
[Image: TwoTone charts]

As a way to explore data and sound, I downloaded data sets from Yahoo! Finance, which offers a .csv file of historic opening bell stock prices (visualized in the image above, top row). I trimmed the data to look at the market exclusively from the first confirmed US outbreak of the coronavirus, 22 January 2020 (seen in row two). This data told a story, but that story was relatively straightforward. I then added a layer of sound that tracked confirmed cases of coronavirus from Wuhan, China, from the same date (row three). This added layers of complexity to the story.
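The trimming and layering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the process I actually followed: the column names (`Date`, `Open`, `Confirmed`) and every number in the sample text are invented placeholders, not real prices or case counts.

```python
import csv
import io
from datetime import date

# Invented sample rows standing in for the downloaded files.
PRICES_CSV = """Date,Open
2020-01-21,100.0
2020-01-22,101.0
2020-01-23,99.5
"""
CASES_CSV = """Date,Confirmed
2020-01-22,10
2020-01-23,14
"""

START = date(2020, 1, 22)  # first date kept in the piece


def read_series(text, value_column):
    """Return {date: float} for one column of a CSV that has a Date column."""
    reader = csv.DictReader(io.StringIO(text))
    return {date.fromisoformat(r["Date"]): float(r[value_column]) for r in reader}


def align(prices, cases, start):
    """Keep days on or after `start` and pair each price with that day's cases."""
    return [
        (day, prices[day], cases.get(day))  # None when no case count exists
        for day in sorted(prices)
        if day >= start
    ]


prices = read_series(PRICES_CSV, "Open")
cases = read_series(CASES_CSV, "Confirmed")
combined = align(prices, cases, START)
for day, price, count in combined:
    print(day.isoformat(), price, count)
```

Days with a price but no case count come through as `None` instead of being dropped, so gaps in one data set stay visible in the combined one.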

I brought that dataset into TwoTone, an online app that creates very simple MIDI scores out of .csv files. This produced a straightforward musical chart of notes rising in pitch as the numbers rose and falling in pitch as they decreased. This was suitable for presenting the data, but the tone was wrong: it sounded like a child playing with a keyboard. Tone was an essential element of this experiment: poorly executed “musicality” would be just as bad as poorly designed graphics. Simple synthesized instruments present a whimsical tone, while the data was describing something more tragic.
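The basic idea of pitch rising and falling with the numbers can be sketched as a linear mapping from values to MIDI note numbers. This is a generic illustration, not TwoTone's actual algorithm, which I have not inspected; the function name, the note range (48 to 84, three octaves) and the sample values are all assumptions.

```python
def to_midi_notes(values, low_note=48, high_note=84):
    """Scale each value linearly onto the MIDI note range [low_note, high_note]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no contour; hold it on the lowest note.
        return [low_note for _ in values]
    span = high_note - low_note
    return [low_note + round((v - lo) / (hi - lo) * span) for v in values]


# Invented values; higher numbers should come out as higher notes.
sample = [10.0, 12.0, 11.0, 14.0, 9.0]
notes = to_midi_notes(sample)
print(notes)
```

A mapping like this fixes only the pitch. Instrument, tempo and timbre are separate choices, which is where the question of tone comes in.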

So, I began post-processing the sonic data, much as one might rethink the colors or priorities chosen in a chart or visualization. Using two separate audio processing programs, PaulStretch and FL Studio, I slowed the original output down so that each “day” in the data roughly corresponded to the length of one deep breath. This took the piece from just under 2 minutes to nearly 14 minutes long, and created an eerie atmosphere that revealed something more than the spreadsheet.
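The arithmetic behind the slowdown is simple enough to write out. In this rough sketch the durations are rounded from the figures above (2 minutes and 14 minutes), and `N_POINTS` is a placeholder, since the exact number of data points in the score is not stated here.

```python
def stretch_factor(original_seconds, target_seconds):
    """How many times longer the stretched audio is than the original."""
    return target_seconds / original_seconds


def seconds_per_point(total_seconds, n_points):
    """Average time each data point occupies in the finished piece."""
    return total_seconds / n_points


ORIGINAL_SECONDS = 2 * 60   # "just under 2 minutes", rounded up
TARGET_SECONDS = 14 * 60    # "nearly 14 minutes", rounded up
N_POINTS = 60               # placeholder count of "days" in the data

factor = stretch_factor(ORIGINAL_SECONDS, TARGET_SECONDS)
per_day = seconds_per_point(TARGET_SECONDS, N_POINTS)
print(f"stretch factor: {factor:.1f}x, about {per_day:.0f} s per day")
```

With these rounded inputs the stretch is roughly sevenfold. The time per day depends entirely on the placeholder count, so it should be recomputed with the real number of rows.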

In the final score, the data becomes abstracted and, of course, requires a key. I think the key is discernible, but to be clear: the stock market is relatively repetitive in the beginning, only transforming at the very end (the visualization above is a handy reference). Meanwhile, a series of escalating, dissonant sounds emerges as case numbers rise in China, while the stock market continues unabated. It is only as US cases mount that we start to hear the market crash, and then an abrupt end as Q1 2020 comes to a close.

The result is a 14-minute-long sound piece. I hesitate to call it “music,” though it has elements of musicality. As I listened to it, I was struck by how haunting it was: a story of a distant, ignored threat finally striking. It raises questions about the response to the epidemic in China and the American response, and about the relationship between the markets and health. It also serves as a different experience of key data sets reflecting this unprecedented moment in history.

What’s Next?

While a 14-minute-long minimalist sound experiment may not be helpful in “communicating” data, it was proof of my concept that data could be analyzed through different lenses, rather than isolated from emotion or human experience. If the Tableau experience taught me that data could be presented in ways that exclude along political lines, my time creating the sonification piece taught me that data could be presented in inclusive ways: sound transcends language and numeracy. There is a time and a place for objective analysis, but we need to be aware of what data “means” to the environments, people, and places we collect it from. Analyzing data through unfamiliar lenses to reveal insights is a critical skill for understanding and interpreting it. As I move forward as a practitioner of the New Branch of Engineering, I aim to constantly explore novel but informative ways to understand the gravity and meaning of data, to help myself and others act with greater social awareness and human sensitivity.

Skills Mastered

  • Contextualizing data within its social and political settings.

  • Unique approaches to data visualization and analysis, such as sonification, and how they can be used to reveal patterns.

  • Interrogating “neutral” decisions in software and how they shape data, such as names in country lists.


Works Cited

Diaz Merced, Wanda. “How a Blind Astronomer Found Her Way to See the Stars.” Presentation at TED2016, Vancouver, Canada. Retrieved online via https://www.youtube.com/watch?v=-hY9QSdaReY [1 June 2020]

Kitchin, Rob. (2014). Conceptualising data. In Kitchin, R. The data revolution: Big data, open data, data infrastructures & their consequences (pp. 1-26). London: SAGE Publications Ltd. doi: 10.4135/9781473909472