How can satellite data enrich (data) journalism?

Ben Heubl is a data journalist who previously worked at the Financial Times and The Economist in London. Here, he explains how he uses satellite data in several of his investigative enterprise stories.

The truth is: We, as data journalists, cannot rely merely on clean, open spreadsheet data anymore. Tabular data remains an essential tool to validate, vindicate and confirm claims and stories. But if data journalists want to bring their A-game to the newsroom, alternative data may just offer an additional array of tasty investigative news stories.

One accessible alternative data source is satellite imagery. Satellite images have one big advantage: Although not completely free from potential errors or government interventions, they are supposed to be a source of unbiased information. Usually, images are available with a high spatial resolution and providers do collect them frequently. Also, the whole operation is usually fairly affordable.

At the very top of my list stood China as a guinea pig. Doubts about the accuracy of Chinese GDP figures culminated last year and particularly this year (especially as some Chinese local governments are hoping for perks by admitting fake data). As a result, I searched for ways to disprove official provincial open data via alternative sources. In many ways, problems with the data were suspected rather than thoroughly demonstrated. Generally, we knew about the habitual issuing of sketchy data – inflated by local statisticians who aimed at making their economies sound better than they really were.

I partnered with a company called SpaceKnow, who got me hooked on the topic of nightlights. Together, we collected high-resolution nightlight satellite imagery for Chinese provinces. Why nightlight? Images of nighttime light intensity can serve as proxies for headline economic growth, as numerous studies, like this one, suggest. Especially for emerging economies, such as those in Africa, this can be one suitable alternative source of information when conventional data operations fail to do the deed.
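To make the proxy idea a bit more concrete, here is a minimal sketch in base R of one common approach: regress the log of GDP on the log of summed radiance and read the slope as a rough elasticity. All figures below are invented for illustration, and the column names are my own assumptions; this is not the method or the data used in the stories described here.

```r
# Hypothetical, made-up figures for illustration only (not real data)
lights <- data.frame(
  year         = 2012:2017,
  radiance_sum = c(100, 108, 115, 125, 133, 144),  # invented summed radiance
  gdp          = c(50, 53, 56, 60, 63, 67)         # invented GDP, arbitrary units
)

# Log-log regression: the slope approximates the elasticity of GDP
# with respect to nighttime light
fit <- lm(log(gdp) ~ log(radiance_sum), data = lights)
elasticity <- unname(coef(fit)[2])

# Year-on-year growth in lights (log differences) and the GDP growth
# that the fitted elasticity would imply from lights alone
light_growth <- diff(log(lights$radiance_sum))
implied_gdp_growth <- elasticity * light_growth
```

With real data you would want many more observations and some care about sensor changes between years, so treat the fitted slope as a starting point rather than a finding.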

Nightlight analysis in R (@BenHeubl).

Nowadays, thanks to our newsroom, nightlight is becoming a more mainstream source of information, for investors and companies alike (since 2016, SpaceKnow also publishes the Africa Night Light Index on Bloomberg and the World Bank advocated for the use of such techniques – see, for instance, WB’s Nightlight India project).

What does nightlight analysis entail, apart from being very pretty?

Nightlights are satellite data. They are collected by light sensitive instruments mounted on satellites orbiting around the earth at night. Thus, they can capture artificial light – i.e. light that is not from the sun or other natural sources – generated on the surface of the earth.

In simpler terms, imagery is taken and alterations in light sources are observed and measured. There are some caveats to it (e.g. altitude for measurement matters, large cities might need ‘special’ treatment, and there is the issue of clouds). Nonetheless, the results for Chinese provinces were stunning and proved to tell a magnificent story in the end.

Remember: This was one of the first technical attempts by an Asian newsroom to add solid data-driven evidence of data meddling in the Chinese provinces. Instead of ending there, Nikkei’s news analysis also revealed the size of the gap between real and fake data.

Viz: How to present deviations found between officially issued Chinese GDP data compared to that of nightlight generated readings.
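A deviation like the one charted above can be thought of as a simple difference between the officially reported growth rate and the growth rate implied by nightlights. The sketch below uses placeholder province names and invented numbers, not the published figures, and the column names are assumptions of mine.

```r
# Placeholder values, invented for illustration (not the published figures)
growth <- data.frame(
  province        = c("Province A", "Province B", "Province C"),
  official_growth = c(8.0, 6.5, 7.2),  # officially reported GDP growth, %
  light_growth    = c(5.0, 6.0, 7.5)   # growth implied by nightlights, %
)

# Positive gap: the official figure exceeds what the lights suggest
growth$gap <- growth$official_growth - growth$light_growth

# Sort so the largest discrepancies come first
growth <- growth[order(growth$gap, decreasing = TRUE), ]
```

Sorting by the gap puts the provinces most worth a closer look at the top of the table.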

Given that the country faced solidifying allegations (here, news analysis by Bloomberg Economics), officials pledged to improve local data accuracy. Its national statistical office (National Bureau of Statistics of China) had just issued devastatingly downbeat economic reports, suggesting a slowdown unprecedented since the global financial crisis. Despite being painful to witness, those figures were probably more accurate than those issued some years earlier.

In another instance – this time for an FT print edition project – satellite nightlight images served as a confirmation that the lights really went out at North Korea’s Kaesong Industrial Zone (KIZ) – once a proud collaborative project between the two Koreas. There was evidence that it closed all its industrial activity on February 10th, 2016. This had not been 100% certain, as little is known about what really goes on in North Korea (I got hooked by another investigative satellite account by Colin Zwirko at NKNews).

A bit more sophisticated in implementation was an investigation into Chinese island building, supported by a machine learning algorithm that yielded evidence that China became more efficient in its operations.

Your very own satellite analysis

If you want to involve nightlight in your own data journalism analysis – potentially alongside conventional economic stats, e.g. for countries where public data is in a doubtful state – there are now a few convenient new ways to do this. If you are just looking for raster files, Planet Labs, a US-based private Earth imaging company, allows you to download updated satellite images via a free trial account. You can ‘mosaic’ them together and even get daily updates on the world of satellite images (pretty impressively, the firm claims that its shoebox-size satellites take a complete image of the world once every day). In a Medium post, an employee shares their view on the implications for newsrooms: "Satellite imagery is quickly becoming a near-real-time reporting source". Watch this pretty straightforward beginner's tutorial on how to work with satellite imagery in Python, filmed at SciPy 2018, the annual Scientific Computing with Python conference.

Another way* – of which I will present an example – is a solution relying on R. The authors of the package “Rnightlights” released its new version on CRAN in October this year. It helps to speedily extract raster and zonal statistics for countries from satellite nightlight rasters gathered and provided by NOAA.gov. Best of all, it’s utterly free. If you have ever hunted for boundary shapefiles, you might be familiar with GADM. The package also utilises these boundaries to crop satellite images into bite-sized geo-polygon shapes (like a cookie cutter, but way cooler).
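If "zonal statistics" sounds abstract, here is a toy illustration of the cookie-cutter idea in base R. It is a conceptual sketch only, not how Rnightlights is implemented: the "raster" is a small invented matrix and the zones are made-up labels, whereas the package works on real rasters and GADM polygons.

```r
# Toy 4 x 4 "raster" of radiance values (invented numbers)
radiance <- matrix(c(0, 1, 2, 0,
                     1, 5, 6, 1,
                     0, 4, 9, 2,
                     0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Zone grid of the same shape: which region each pixel belongs to;
# NA marks pixels that fall outside every boundary
zones <- matrix(c(NA,  "A", "A", NA,
                  "A", "A", "B", "B",
                  NA,  "A", "B", "B",
                  NA,  NA,  "B", NA), nrow = 4, byrow = TRUE)

# The "cookie cutter": keep only pixels inside a zone, then sum per zone
inside <- !is.na(zones)
zonal_sum <- tapply(radiance[inside], zones[inside], sum)
```

Each entry of `zonal_sum` is the total radiance that falls inside one boundary, which is the kind of per-region number the package returns for each month.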

If you have some familiarity with R, I encourage you to test the following***.

# Install (once) and load the required packages
install.packages("Rnightlights")
install.packages("easypackages")
install.packages("reshape2")
install.packages("lubridate")
install.packages("ggplot2")

library(easypackages)
libraries("reshape2", "lubridate", "ggplot2", "Rnightlights")

This will take some time, depending on your connection and the date range you provide (bear in mind that each month’s raster is about 500MB). Nightlight images are downloaded in a tif format on a monthly basis.

China_highestAdmLevelStats <- 
  getCtryNlData(ctryCode = "CHN", 
                admLevel = "highest",
                nlType = "VIIRS.M", 
                # pick current period
                nlPeriods = nlRange("201801", "201811"), 
                nlStats = list("sum",na.rm=TRUE),
                ignoreMissing=FALSE)  

# some tidying with a melt function provided by the reshape2 package,
# which later allows us to plot via ggplot2
China_highestAdmLevelStats <- 
  melt(China_highestAdmLevelStats,
       id.vars = grep("NL_", names(China_highestAdmLevelStats), 
                      invert=TRUE), 
       variable.name = "nlPeriod",
       value.name = "radiancesum")

# Get date from the nightlight col names
China_highestAdmLevelStats$nlPeriod <- 
  substr(China_highestAdmLevelStats$nlPeriod, 12, 17)

# format period as date
China_highestAdmLevelStats$nlPeriod <- 
  ymd(paste0(substr(China_highestAdmLevelStats$nlPeriod, 1, 4), 
             "-", substr(China_highestAdmLevelStats$nlPeriod, 5, 6), "-01"))

# Now plot the data and see how nightlight changed over time.
# GADM gives you several levels of boundaries
# (depending on how granular you need them).
# We will plot 2nd admin level sums for the year,
# as instructed by the author
ggplot(data = China_highestAdmLevelStats, 
       aes(x = nlPeriod, y = radiancesum, 
           color = China_highestAdmLevelStats[[2]])) +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
  geom_line() +
  geom_point() + 
  labs(color = names(China_highestAdmLevelStats)[2]) + 
  xlab("Month") + 
  ylab("Sum of Radiances") +
  ggtitle(paste0(names(China_highestAdmLevelStats)[2], 
                 " sum of radiances for CHN"))

# Big Thanks to Chris Njuguna for the code example on Africa
# (https://github.com/chrisvwn)
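Once the data is in long format, a natural next step is to look at the month-on-month change per region rather than the raw sums. The sketch below is one possible way to do that in base R. It uses a small invented stand-in for the melted data frame so that it runs without any download; `region` is a hypothetical name for the admin-level column, while `nlPeriod` and `radiancesum` follow the names used above.

```r
# Toy stand-in for the melted data frame (invented values);
# "region" is a hypothetical name for the admin-level column
nl <- data.frame(
  region      = rep(c("Region A", "Region B"), each = 3),
  nlPeriod    = rep(as.Date(c("2018-01-01", "2018-02-01", "2018-03-01")),
                    times = 2),
  radiancesum = c(100, 110, 99, 200, 190, 209)
)

# Order by region and month so that differences run forward in time
nl <- nl[order(nl$region, nl$nlPeriod), ]

# Percentage change from the previous month within each region;
# the first month of each region has no predecessor and stays NA
nl$pct_change <- ave(nl$radiancesum, nl$region,
                     FUN = function(x) c(NA, diff(x) / head(x, -1) * 100))
```

Sharp drops or jumps in `pct_change` are candidates for a closer look, though clouds and seasonal effects can produce them too.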
          


Despite the general lack of comprehensive data from China, there is data on China, even good data. More and more companies establish themselves as providers or collectors that edit or validate precious data points (e.g. Wind or CEIC, providers that are picky about which data they give to their paying users). There is a gap in the market for data journalists to include alternative data in their investigative data journalism process. It’s a niche that, in my experience, remains widely unexploited.

After all, against all the marketing by state officials, China remains largely bolted shut (in terms of available data, and in some other strange ways, too). Many times, data that does flow out to western news organizations is intercepted and checked by the Communist Party of China (see a story about why the Chinese customs administration suddenly stopped publishing detailed monthly open trade data for North Korea, or for Iran).

Even without programming magic, data-heavy investigations can benefit from just a raster and KML image analysis in QGIS – Image provided by Planet Labs, enriched with the lovely London tube network (red lined).

Another recent example I worked on with CoalSwarm, a research community that provides data on coal-fired power plants, used satellite imagery to affirm findings on Chinese coal-fired construction pipelines.

Satellite imagery provided the proof the researchers needed to establish claims against the Chinese government – that Chinese plants whose construction had allegedly been shelved were, in reality, still being built.

Xinfa Group appears to be violating central government orders and keeps building. A before and after shot provided by CoalSwarm. Planet Labs satellite images from September 2017 to February 2018 showing that the plant continued with both operation and development.

For more examples of satellite reporting to orientate yourself on, look at the work of Reuters Graphics. A favorite of mine involves the reporting on Uighur camps in Xinjiang. The reporting on Chinese island building is another great one, providing some stunning pictures, too. Planet Labs appears to be Reuters’ go-to source for satellite images.

*I am borrowing here from the package creator’s Github site.
**Some more background info on nightlight.
***For the complete documentation, check the vignette.



About

Ben Heubl

Ben Heubl is a data journalist originally from Munich (and yes, he owns a pair of Lederhosen), who worked previously at the Economist and the Financial Times in London. He is a massive fan of Caro-Kaffee and in his spare time he enjoys reading suspense and may soon even attempt to produce a work of fiction himself.

Runs on:

How many pie charts have you built?
I don’t bake.

How many times per week do you have to explain what "data journalism" is?
Three times last week, until my boss gets it too :-)

How often do people use you as IT-Support?
My wife does it all the time.

How many adapters do you have?
One, that connects to all/most countries.

Your funniest file name?
File_me_to_the_moon.r

© 2018 Journocode