In the story of "Murder in the arms of marriage", the data team at the award-winning InfoTimes
analyzed news reports on murder cases between wives and husbands in Egypt.
They worked on the project for ten months, and it helped them win the GEN Data Journalism Award this year.
Islam Salahuddin will describe for us how the team worked on the project from beginning to end, going through the details of the different steps.
InfoTimes is Egypt's only data-driven journalism agency and one of very few data-focused newsrooms in the Middle East. The team had started working on the project before I joined, nine months ago. I was responsible for it from roughly the middle of the process until the story was published.
The goal was to analyze the reasons, tools and frequency of murder and attempted murder crimes between wives and husbands in Egypt. The story idea came from a public debate that was arising at the time, especially on social media, about the increase in the frequency of such crimes. So we decided to settle the debate and explore what was behind it – with data.
News stories as a data source
First, we needed a dataset to analyze. In Egypt, we still do not have NGOs or other organizations that collect data about such crimes, or about almost anything else! Contacting the police departments or any governmental institution was not an option because, you know, transparency about data is not really their thing in our region. That's why we decided to depend on news stories as a source for data about the crimes that we aim to analyze.
To be clear, news stories are not an ideal source of data:
- News outlets do not cover each and every crime that happened in the geographic area and during the time period we were concerned with.
- Some news reports may not be fully accurate.
- Some news stories may lack fundamental information about the case.
This is in addition to the possibility of repetition and other issues.
How to work with shaky data
Since we had no better choice than to depend on news sources, though, we dealt with these issues directly:
- We tried to make it as clear as possible for readers that we were analyzing only a sample of the crimes that actually happened, that this sample was not representative, and that it depended on the news coverage – so it inherited the biases of that coverage.
- We tried to verify each crime that went through our analysis from multiple sources of news.
- We worked on completing the missing pieces of information from complementary news sources.
- After all of that – a practice which is even more important in data journalism – we disclosed our data source to readers, together with a short description of the data we gathered from it, as an introduction to the story.
Scraping: Extract the info you need from a website
We then had to decide which news website we would depend on for the coverage. The editor chose the site Youm7 because it is known for some of the most continuous and systematic coverage of crime. It is also the most-read news website in Egypt according to Alexa, and it is well known to Egyptian readers. In addition, its website structure was consistent and organized, so it was easy to scrape stories from it.

“Scraping” here means extracting the news stories from the website and recording them in a dataset, basically a table of rows and columns. Each row recorded one story, detailing, across the columns, its title, date, excerpt, category and URL. We scraped the crime section of the website using a piece of programming code, and so we had a dataset of news stories about the covered crimes that happened in Egypt during a certain time period. It still contained a lot of information that we did not need, in addition to being messy and incomplete with regard to the murder cases.

Afterwards, the team went through the news stories and classified each one based on whether it was about a murder or attempted murder between spouses, or about something else. We excluded all the news stories about other crimes that we were not concerned with – and found ourselves left with 222 cases to focus on.
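To make the scraping step concrete, here is a minimal Python sketch of how one listing page can be turned into dataset rows. The HTML structure, class names and URL below are hypothetical stand-ins, not Youm7's actual markup, and the sketch assumes the BeautifulSoup library is available:

```python
# Sketch of the scraping step: parse a (hypothetical) crime-section listing
# page into rows of title, date, excerpt, category and URL.
from bs4 import BeautifulSoup

# Stand-in for one downloaded listing page (real code would fetch it over HTTP).
SAMPLE_HTML = """
<div class="story-card">
  <h3><a href="https://www.youm7.com/story/12345">Example headline</a></h3>
  <span class="story-date">2018-01-15</span>
  <p>Short excerpt of the report...</p>
</div>
"""

def parse_listing(html, category="crime"):
    """Extract one dataset row per story card found on the page."""
    rows = []
    soup = BeautifulSoup(html, "html.parser")
    for card in soup.select("div.story-card"):
        link = card.select_one("h3 a")
        rows.append({
            "title": link.get_text(strip=True),
            "date": card.select_one("span.story-date").get_text(strip=True),
            "excerpt": card.select_one("p").get_text(strip=True),
            "category": category,
            "url": link["href"],
        })
    return rows

rows = parse_listing(SAMPLE_HTML)
print(rows)
```

A real scraper would loop over the section's paginated listing URLs and append each page's rows to one table, which is the dataset described above.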
How to make your data show the full picture
To be able to analyze it thoroughly, we had to extend the dataset with more details about each case. We added new columns, which were not originally present in the scraped dataset, to show: who was the murderer (the husband or the wife), who was the victim, the tool of the murder, the place of the crime, the age of the murderer and the age of the victim. This is when I joined the team, and this was the most time-consuming stage.
The dataset was still messy, because of initial mistakes in the scraped data plus human mistakes made by the team. It was my responsibility to clean the data. Typical errors include typos, the same word written with two different spellings or in different formats, duplicates and other problems. I did the data cleaning for this story using Google Spreadsheets. I could have used Microsoft Excel, but I often prefer the former for its online availability and simpler interface. Once the data was clean, I started my analysis, also in Google Spreadsheets, and found some results that could be interesting enough for readers to build the story on.
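The same cleaning steps can be done in any tool; as an illustration, here is a minimal Python sketch (the team actually worked in Google Spreadsheets, and the column names, spelling variants and canonical values below are hypothetical):

```python
# Sketch of typical cleaning steps: trim whitespace, unify variant spellings
# of the same value, and drop duplicate reports of the same case.

# Hypothetical map from variant spellings/formats to one canonical value.
CANONICAL_TOOL = {
    "knife": "knife",
    "a knife": "knife",
    "kitchen knife": "knife",
    "poison": "poison",
    "poisoning": "poison",
}

def clean_rows(rows):
    """Return rows with trimmed text, unified spellings, and duplicates removed."""
    seen_urls = set()
    cleaned = []
    for row in rows:
        # Trim stray whitespace in every text field.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Map variant spellings to one canonical form.
        tool = row.get("tool", "").lower()
        row["tool"] = CANONICAL_TOOL.get(tool, tool)
        # Drop duplicate coverage of the same case (same URL).
        if row["url"] in seen_urls:
            continue
        seen_urls.add(row["url"])
        cleaned.append(row)
    return cleaned

raw = [
    {"url": "u1", "tool": " Kitchen knife "},
    {"url": "u1", "tool": "knife"},       # duplicate report of the same case
    {"url": "u2", "tool": "Poisoning"},
]
print(clean_rows(raw))
```

In a spreadsheet, the equivalents are TRIM for whitespace, find-and-replace (or a lookup column) for variant spellings, and remove-duplicates on the URL column.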
After discussing these results with my editor, we started drafting, editing, co-editing, polishing and translating the story. While editing the text, I was also working with my editor to decide which data visualizations would best tell the story we had. I built the visualizations using Tableau Desktop. It is a relatively easy and quick tool for visually digging deep into a dataset and coming up with appealing, interactive visualizations that can be presented to the public online. The resulting visualizations are usually quite slow to load, though, which is a catastrophic issue for websites, especially journalistic ones. Tableau visualizations also usually do not look exactly the same across different browsers and devices, so you will have to compromise on some of your design choices most of the time. Anyway, we believed Tableau Desktop would do the job in our case, and, to a large extent, it did. The discussions between me and my editor at this stage covered choosing the chart types, colors, shapes and sizes. Almost all of the successful color choices in particular are my editor's (he's my ex-editor now, so I'm not flattering him! Swear! I mean, kind of!).
Bringing a new perspective to a live debate
We published the story in both Arabic and English: Arabic for the typical local readers, and English for local readers who prefer it, for non-Arab readers, and for global awards and competitions. The story, titled ‘Murder in the arms of marriage: Story of 222 cases’, was one of the stories that helped us win the GEN Data Journalism Awards 2018 as the best small data journalism team. Locally, it touched on a live debate with an approach the audience was not yet familiar with, so it sparked interest both in our analysis in particular and in data-driven journalism in general.
You can find the full project 👉 here.