Sonifying data, collaborating with an academic and holding Silicon Valley accountable

We talk a lot about visualizing data. But what if you’re a radio journalist? Reveal, an audio show by the Center for Investigative Reporting, uses sound to illustrate their findings. Sinduja Rangarajan from the Reveal podcast recalls one of her data-driven radio projects for us.

Some of the largest technology companies nestled in Silicon Valley have a track record of poor gender and racial representation amongst their workforce. A few companies, such as Google, Facebook and Apple, release official reports of who works in their companies every year. But a majority of companies don’t release any data.

In a series of data-driven investigations, our team at Reveal used a range of methods to pull aside the veil of secrecy around Silicon Valley’s poor diversity numbers. First, we asked 211 of the largest technology companies headquartered in Silicon Valley to release this data. Twenty-three responded with their numbers. Then, we found an academic at the Center for Employment Equity at the University of Massachusetts Amherst who had access to this data and agreed to give us anonymized numbers for 177 companies.

Six large tech companies had no female executives at all

Anonymized means that he took the names of the companies out of the data and jumbled the numbers in such a way that we can’t identify the companies, either. This is the most comprehensive data we have on diversity in Silicon Valley technology companies, and it happened because of an academic collaboration.

Even this anonymized data gave us a lot of insight into the scale and scope of what diversity looked like in Silicon Valley. For example, we learned that there were six large Silicon Valley technology companies that had no female executives at all in 2016. We learned more about the distributions of diversity and the averages for the industry. Reveal and the Center for Employment Equity co-published independent reports based on that data.

Screenshot of the spreadsheet used to calculate numbers to divide the singers into different groups.
Source: Sinduja Rangarajan/Reveal

We took this data and thought about how we could convert it into sound for our radio show. Our goal with sonifying the data was to present facts in an emotionally engaging and powerful way on the radio. Another reason to use sonification, especially in conjunction with data visualization, is to make your article more accessible to audiences with visual impairments.

The disparities in the data, especially at the executive level, were so high that we thought the data lent itself to sonification.

To turn data into sound, one of the most important things is that the story is clear. In this case, we knew that 73 percent of executives were White, 21 percent were Asian, only 3 percent were Latino and 1.4 percent were Black. We sat down to brainstorm about how to convert this data into sound. One idea that Reveal has implemented in the past is to convert data points into MIDI files and then run the files through a synthesizer.
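As a rough illustration of the MIDI idea, here is a minimal Python sketch of only the first step: scaling data values onto MIDI note numbers (0 to 127, where 60 is middle C). The function name `to_midi_notes`, the note range and the linear scaling are assumptions made for this example, not a description of how Reveal built its earlier pieces. Writing an actual MIDI file would additionally need a MIDI library, which is not shown here.

```python
def to_midi_notes(values, low_note=48, high_note=84):
    """Linearly scale data values onto MIDI note numbers.

    low_note and high_note are hypothetical defaults chosen for this
    sketch; any range within 0-127 would work.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: play everything on the middle of the range.
        return [(low_note + high_note) // 2] * len(values)
    span = high_note - low_note
    # Larger values map to higher pitches.
    return [round(low_note + (v - lo) / (hi - lo) * span) for v in values]


# Executive shares quoted above, in percent: White, Asian, Latino, Black.
shares = [73, 21, 3, 1.4]
print(to_midi_notes(shares))
```

A linear mapping keeps the largest value at the top of the range and the smallest at the bottom, so a listener hears the gap between groups as a gap in pitch.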

A choir directed by data

For this project, the point that we were trying to convey was the importance of diversity in a workplace. A powerful auditory image in that context was that of louder voices drowning out the rest. So we thought of using a choir of real voices to illustrate this point.

We called our friends, family and Reveal members to sing as a choir at the First Unitarian Church of Oakland on a weekday evening. We’d prepared an Excel spreadsheet that calculated percentages dynamically based on how many people would show up. For example, if 40 people showed up, 78 percent of them – or 31 people – would sing “White executives”, 8 people would sing “Asian executives”, 1 person would sing “Black executive” and 1 person would sing “Hispanic or Latino executive”. Our lead sound designer Jim Briggs directed the choir by assigning different notes to the different groups, which he then stacked into a chord.
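The same headcount calculation can be sketched in a few lines of Python. This is not the spreadsheet's actual formula, which is not shown in this article. It is a minimal sketch that takes the executive shares quoted earlier (73, 21, 3 and 1.4 percent), normalizes them, and uses largest-remainder rounding with an assumed floor of one singer per group so that every group can be heard. Because the rounding rule is an assumption, its group sizes will not necessarily match the spreadsheet's. The names `SHARES` and `assign_singers` are hypothetical.

```python
# Shares of executives quoted in the article, in percent.
SHARES = {
    "White executives": 73.0,
    "Asian executives": 21.0,
    "Hispanic or Latino executives": 3.0,
    "Black executives": 1.4,
}


def assign_singers(headcount, shares=SHARES, minimum=1):
    """Split `headcount` singers into groups proportional to `shares`.

    Assumption for this sketch: every group gets at least `minimum` singers.
    """
    if headcount < minimum * len(shares):
        raise ValueError("not enough singers to fill every group")
    total = sum(shares.values())
    # Ideal fractional group sizes, normalized so they add up to headcount.
    ideal = {g: headcount * s / total for g, s in shares.items()}
    # Round down, but never below the minimum.
    sizes = {g: max(minimum, int(v)) for g, v in ideal.items()}
    # Give leftover singers to the groups with the largest remainders.
    while sum(sizes.values()) < headcount:
        group = max(sizes, key=lambda k: ideal[k] - sizes[k])
        sizes[group] += 1
    # If the minimum pushed the total too high, shrink the most
    # over-represented group that is still above the minimum.
    while sum(sizes.values()) > headcount:
        group = max(
            (k for k in sizes if sizes[k] > minimum),
            key=lambda k: sizes[k] - ideal[k],
        )
        sizes[group] -= 1
    return sizes


for group, count in assign_singers(40).items():
    print(f"{count:>3} singers: {group}")
```

Recomputing the split from the headcount means it can be adjusted on the spot if more or fewer singers arrive than expected.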

More details on the sonification are here. I hope you enjoy them! You can explore our underlying data and distributions here.

To listen to the podcast, go to the Reveal page.


Sinduja Rangarajan

Sinduja is a data reporter at Reveal from the Center for Investigative Reporting. She joined Reveal as a Google News Lab Fellow in 2015. She has a bachelor’s degree in computer science from the University of Mumbai and a master's from the USC Annenberg School of Communication and Journalism.

Runs on:

How many stickers do you have on your laptop?
I did not know that it was a data journalist thing. I thought it was only me.

How many pie charts have you built?
Zero. I try not to use them.

How many times per week do you have to explain what "data journalism" is?
My newsroom folks know what I do. But with my family, I often avoid explaining what I do. Sometimes, I say I work in tech and then my family doesn't ask anything else.

How many items are on your desktop?
Who's counting? When they go out of my visual range, I know it's time to clean up.

Swear words per day?
I don't. I am a mom and I try to channel my anger by drinking tea and coffee. Healthy living y'all.

How big was your biggest data set?
6.5 million rows

© 2018 Journocode