Category Archives: Data Journalism

Telling hidden stories through data and charts.

Data is boring? Depends how you present it

When I told friends and family that my job was to tell stories with data, they often thought it was a boring job because of the word “data”. But when I showed them some of my work, they started to smile and told me, “This is very interesting!”

The secret lies in the art of personalizing and humanizing data. Facts and figures are no longer cold and heartless if you can build a bridge between them and the users. Here are two news apps that I created to serve as that bridge.

The first is a calculator that complements a hard news story on the unequal distribution of unpaid housework between men and women. The calculator helps fathers count the total time they spend on housework each day and compare it with fathers in other countries. Some fathers, like me, felt proud when we saw that our numbers were relatively high. But when the calculator compared them with the time spent by mothers in other countries, that’s when we saw and felt the inequality.

You can access the calculator here.

The second news app is also a calculator. Based on your demographics, it estimates whether you have a higher chance of being targeted by a hate crime in the US now than before the 9/11 attacks. It was published in conjunction with the 15th anniversary of the attacks.

You can access the news app here.

Both news apps try to make hard, cold data relevant to individuals, to let them see how the numbers affect their lives. Such an interactive approach not only makes the story more engaging, it also gives the story more impact. This is the power of data if used effectively.

Using Google Trends for news stories

Google Trends is a useful tool for journalists to gauge public sentiment, especially on current affairs or breaking news. It shows you the search traffic of phrases and topics on the Google search engine.

I usually use Google Trends to check the search traffic of relevant phrases after news breaks. The results help me and my team understand what people are interested in at that point in time. Sometimes the results themselves are the story.

After the Washington Post story about how the British were “frantically Googling what the EU is” right after voting for Brexit, I did more digging on Google Trends and found a strong sense of regret among the British: searches for phrases like “second referendum” and “second referendum petition” had spiked. I then did this quick story on the findings.

I have also been monitoring the search traffic of anti-Muslim phrases after every terrorist attack.

However, Google Trends has its problems, especially when it is used for journalism. Its results do not show absolute numbers, i.e. the number of users searching for a phrase, but only search traffic on a relative scale, compared with the other phrases and periods in the same chart. For example, you could see that the search traffic of a particular phrase has tripled, but you wouldn’t know whether it has grown from 100 to 300 users or from one million to three million.

My solution to this problem is to include a popular search phrase that has consistent traffic as a reference point. So far I have found the phrase “weather forecast” to be a good reference point (share with me if you have a better phrase!). You can see how I used it in the first chart in the post-Brexit story. It is also embedded below.

Two other important points when using Google Trends in a news story: always explain to your readers how it works, and don’t overinterpret the results.
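The reference-phrase idea can be sketched in a few lines of JavaScript. This is only an illustration under assumptions: both series are taken from the same Google Trends comparison (so they share one scale), the function name `relativeToReference` is hypothetical, and every number is a made-up placeholder rather than real search data.

```javascript
// Express a phrase's search interest as a multiple of a steady reference
// phrase from the same Google Trends comparison.
// All values below are made-up placeholders, not real Google Trends data.
function relativeToReference(phraseSeries, referenceSeries) {
  if (phraseSeries.length !== referenceSeries.length) {
    throw new Error("Both series must cover the same periods");
  }
  return phraseSeries.map(function (value, i) {
    var reference = referenceSeries[i];
    // A reference of 0 gives no usable baseline for that period.
    return reference > 0 ? value / reference : null;
  });
}

var secondReferendum = [2, 3, 60];  // hypothetical index values
var weatherForecast = [40, 40, 40]; // hypothetical steady reference
var ratios = relativeToReference(secondReferendum, weatherForecast);
// ratios: 0.05, 0.075, 1.5
```

Read this way, the last period says the phrase drew about one and a half times the interest of the reference phrase, which gives readers a rough sense of scale even without absolute numbers.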

My first data journalism award!

That’s right, the news application I created for PRI.org has won the Public Choice Award in the Data Journalism Award 2016!

Thanks to my colleagues, family, friends and readers who voted for the news application “What if the Syrian civil war happened in your country?”.

It was first shortlisted as one of the five finalists in the category “News Data App of the Year (Small Newsroom)” and later, together with the finalists in all 10 categories, opened for public voting. The finalist with the most votes receives the Public Choice Award. The awards, organized by the Global Editors Network, received 241 entries this year.

Here’s the idea behind the news app, which simulates the damage brought by the Syrian civil war if it were to happen in another country.

“How do you generate empathy with the victims of a war that has been repeatedly told through various formats and media in the past 5 years? Despite being the most disastrous humanitarian situation in the 21st century, most of us have become numb to the Syrian civil war. Numbers and infographics highlighting the cruelty of the crisis no longer move us. In conjunction with the fifth anniversary of the Syrian civil war and the growing anti-refugee sentiment in the US and Europe, this news app uses basic data to allow users to simulate the civil war in their own country, moving the war closer to their hearts. This is another attempt by PRI.org to combine personalization of data, interactivity and creativity to create empathy after the hugely successful news app “What if your hometown were hit by the Hiroshima atomic bomb?” last August.”

Cited and embedded by other news websites in both the US and Europe, it has been used over 37,000 times in less than one month by users from all over the world, generating much more traffic than an average PRI story.

Effective visualization with animated Google Maps

Google Maps is an easy and quick tool for making maps for storytelling. If you need a map with just one layer of information but don’t know much about map-making techniques, Google Maps is the perfect tool. (For multi-layer maps, I would use Carto.)

Like other tools developed by Google for journalists, Google Maps has clear and comprehensive documentation. The best part is that there are many code samples that you can copy and modify for your visualization.

After learning the basics, I started to try the animation feature of Google Maps and found it to be an effective tool for a quick illustration of geographical movement. I have used it several times in migrant and refugee stories to show the long journeys taken by refugees across multiple international borders.

In this story that compiles essays from five Syrian refugees on their journey fleeing from home, I made an animated map for each of them. The little yellow dot that moves on the map has a magical effect that makes the maps clearer and more engaging.

The moving yellow dot is even more important in this interactive photo story about the 6,600-mile trail taken by a Cuban family to reach the US. Without the animation, it would take readers more time to understand the map as the route is a little counter-intuitive.
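At its core, a moving dot like this comes down to working out where along the route the dot should be at each moment. Here is a minimal sketch of that calculation in plain JavaScript, without any map library calls. The function name `positionAlongRoute` is hypothetical, the coordinates are placeholders rather than a route from the stories, and it is not the code behind the published maps.

```javascript
// Return the dot's position after a fraction t (0 to 1) of the journey.
// Simplifications: each leg takes the same share of the animation, and
// latitude/longitude are interpolated linearly.
function positionAlongRoute(waypoints, t) {
  if (waypoints.length < 2) {
    throw new Error("A route needs at least two waypoints");
  }
  var clamped = Math.min(Math.max(t, 0), 1);
  var legs = waypoints.length - 1;
  var scaled = clamped * legs;
  // Index of the leg the dot is on; the last leg also covers t = 1.
  var leg = Math.min(Math.floor(scaled), legs - 1);
  var within = scaled - leg;
  var from = waypoints[leg];
  var to = waypoints[leg + 1];
  return {
    lat: from.lat + (to.lat - from.lat) * within,
    lng: from.lng + (to.lng - from.lng) * within
  };
}

// Placeholder coordinates, not an actual route from the stories.
var route = [
  { lat: 0, lng: 0 },
  { lat: 10, lng: 20 },
  { lat: 30, lng: 20 }
];
var midFirstLeg = positionAlongRoute(route, 0.25); // { lat: 5, lng: 10 }
```

In a browser, a timer would increase `t` a little on each tick and move the dot to the returned position. Giving every leg the same duration keeps the sketch short; weighting legs by distance would make the dot travel at a steadier speed.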

This atomic bomb news app was played almost 1 million times

I guess this would be the most viral online content I’ve ever created in my life.

The news app “What if your hometown were hit by the Hiroshima atomic bomb?”, which I created for PRI.org, was played almost one million times by users all over the world. (There’s a newer version with some improved features.) It has been cited and reproduced by over 30 websites in various languages (I’ve stopped tracking the number, but simple Googling can show you some of them).

I’ve been tracking the use of the app since it was published, and it has now clocked more than 850,000 plays. I even made an animated map to show when and where the bombs were dropped.

What’s the thought behind it? While researching for stories in conjunction with the 70th anniversary of the Hiroshima bombing, I came across some data on the damages caused by the atomic bomb. I could have done a straightforward story by listing a bunch of numbers about the death toll and other damages probably with some charts or infographics, but I asked myself a question: How can I use the information or data I have to help readers put themselves in the shoes of the victims? The answer led to the news app.

This news app also serves as a gateway to many other great stories done by my colleagues on the anniversary, collected in the series called Hiroshima Generations.

Sometimes crazy charts work really well

One way to grab readers’ attention and drive your message home is to have a “wow” factor in your story. Visuals are a good way to create such a dramatic effect.

Just before the 2015 Thanksgiving weekend, I knew that it was a good time to do a story about caring for refugees, but the challenge was how to make the story stand out from all the festive stories and information while making a point about the refugee crisis in the Middle East and Europe.

I came up with two ridiculous charts. One compares the cost of settling refugees in the US with how much Americans spend on Thanksgiving. The result is way beyond any expectation.

The second chart compares the death toll of the Paris terrorist attack in November 2015 with that of the Syrian civil war. The result is less dramatic but still shocking.

Check them out yourself.

Visualizing data with… music video

Inspired by Reveal’s data sonification (meaning representing data as non-speech sound) project on earthquakes in Oklahoma, I tried experimenting with a news app that uses audio to represent flood data.

Oh wait, not just audio, but the music video of Vanilla Ice’s rap single “Ice Ice Baby.”

I mashed the music video with data projecting how many floods some American cities will have 30 years from now due to climate change. The higher the number of floods, the faster the tempo of the music video.
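The “more floods, faster tempo” rule can be sketched as a simple linear mapping. This is an assumption-laden illustration, not the formula used in the app: the function name `floodsToPlaybackRate` is hypothetical, and the flood range and speed range are invented for the example.

```javascript
// Map a projected flood count to a video playback rate.
// The ranges are hypothetical; the original app's formula is not given here.
function floodsToPlaybackRate(floods, options) {
  var minFloods = options.minFloods;
  var maxFloods = options.maxFloods;
  var minRate = options.minRate;
  var maxRate = options.maxRate;
  if (maxFloods <= minFloods) {
    throw new Error("maxFloods must be greater than minFloods");
  }
  // Clamp so out-of-range inputs cannot produce extreme speeds.
  var clamped = Math.min(Math.max(floods, minFloods), maxFloods);
  var share = (clamped - minFloods) / (maxFloods - minFloods);
  return minRate + share * (maxRate - minRate);
}

var scale = { minFloods: 0, maxFloods: 400, minRate: 0.5, maxRate: 2 };
var rate = floodsToPlaybackRate(200, scale); // 1.25
```

In a browser, the result could be assigned to the `playbackRate` property of an HTML video element, so a city with more projected floods plays the video faster.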

Here’s an example of the number of floods that Washington DC will have in 2045 visualized by the music video.

It created some buzz on social media. One of the shout-outs came from the Union of Concerned Scientists, which compiled the flood data.

I embedded the news app below, but the full-width version on the PRI.org website looks better.

Enabling small newsrooms to produce data journalism

DataN is a program that lowers the barrier for smaller newsrooms to integrate data journalism into their daily operation.

UPDATE (Apr 17, 2015):  I have officially launched DataN as a service for newsrooms. Check out the official website and subscribe to the newsletter! 

This is a 3-month project that I did together with Washington DC-based Foreign Policy for my MA journalism program at New York University. It was presented at NYU Studio 20’s Open Studio Night on Dec 11, 2014.

Here’s a video of the presentation.

An informal ‘testimonial’ from Foreign Policy Homepage Editor Emma Carew Grovum, who worked together with me to develop the program.

If you want to know more about how I designed and ran the program, and the lessons I learned along the way, here’s a long article.

THE PROBLEM:

How can we lower the barrier for small newsrooms with limited resources to integrate data journalism into their daily reporting?

THE SOLUTION:

DataN: Big Data, Small Newsroom

THE BACKGROUND: 

Before I came to New York University to do my master’s, I spent 8 years working as a journalist at Malaysiakini, a small but independent news website in Malaysia that competes with all the media giants controlled and supported by the government. When I first joined in 2005, the editorial team had only about 30 members, and we published 25 to 30 stories daily in 3 languages (Malay, English, Chinese), whereas most of our government-backed competitors were operating with over 100 editorial staff. Despite limited manpower and financial resources, we became the most visited news website in 2008 and grew into a team of about 50 members. The lesson: With the right combination of vision, people and technology, small can be very powerful.

I came to New York with the goal of equipping myself with knowledge and skills that can help independent media organizations, in countries where the media is not free, to grow and empower their citizens. I want to answer this question: How do resource-challenged independent media organizations produce quality and impactful journalism in a difficult environment?

Although media in the US is relatively free, I observed a similar gap between big and small news outlets especially in producing data journalism, a powerful journalism specialty that many newsrooms are scrambling to master.

Currently, producing data journalism is very resource intensive. A big newsroom like The New York Times has over 1,200 editorial staff, and the graphics department that produces data components alone has about 40 members. A small newsroom like Foreign Policy has only 40 people in total, equal to just one department at the Times. There’s a huge gap in the industry. To narrow this gap, I developed DataN.

THE PROJECT: 

DataN is a program that lowers the barrier for smaller newsrooms to integrate data journalism into their daily operation. It helps them produce more credible, engaging and comprehensible journalism. I partnered with Foreign Policy, one of the world’s most credible publications on international politics and global affairs, to develop a data journalism program for newsrooms that don’t have the resources to build a specialized team. Together with Homepage Editor Emma Carew Grovum, I designed training modules, selected and customized tools, and conducted training sessions on the basics of data visualization for Foreign Policy journalists.

THE PROCESS:

I wanted to take a bottom-up, agile and user-centric approach. The design, content and delivery of the program should be based on the demands and feedback of the users – the journalists. I believed this approach would minimize disruption to the newsroom and resistance from the journalists. I started by asking the journalists about their views on data journalism, the importance of data in their work, their skill level, the support they needed and the challenges they faced. Most journalists agreed that data components help to tell their stories better, but they faced two major challenges: insufficient time and a lack of skills to visualize data.

With this feedback in mind, we designed and ran the program. The video demo below shows some of the things that we’ve built.

THE LESSONS:

1. Get ready for surprises

Almost nothing went as planned. Time is a universal problem for journalists, especially in a small newsroom where every journalist has to multitask and no one can take over another’s tasks. To make matters worse, at the beginning of this program the newsroom experienced an unforeseen infrastructure issue that required the commitment of most journalists to fix. After postponing the training sessions for three weeks, I realized that the issue would not be solved soon enough for me to complete the training program. It was impossible to gather everyone in the newsroom at the same time for training. I knew we had to be creative.

2. Flexible, customized training

We transformed our one-size-fits-all training, which was supposed to consist of three one-hour weekly sessions, into a single one-hour session customized for different journalists. We broke our training materials into small pieces that could be mixed and matched into different modules for different groups of journalists. There were modules for editors who work with contributors, for staff writers and for reporters, and an advanced module for producers who would produce most of the data components. The concise one-hour training sessions were conducted in small groups, sometimes one-to-one, both online and in person. Using this flexible approach, we managed to train 14 journalists, more than half of our target journalists.

The training materials include:

  • Charts and Graphs 101 – An Introduction
  • Data 101 – Working with spreadsheets
  • Data 101 – Analyzing and interviewing data
  • Which chart to use?
  • Documentation for different tools

Another advantage of giving training to a small group of journalists from the same section or beat is that it encourages conversation among the journalists. This is because the training can be tailored to give specific examples relevant to the group and the journalists can ask group-specific questions.

The training materials follow the same principle – save the user time. They are concise, written in layman’s terms, and sometimes illustrated using charts. To make the training more relevant, we selected several published Foreign Policy stories and made charts that complement the stories. We used them as examples of how charts can help to tell better stories.

3. Easy-to-use, customizable tools

Not only are journalists busy, most of them don’t like coding. Even if they are interested in learning, they can’t find the time. Teaching journalists how to code requires significant time and extraordinary commitment. This program aims to improve the data literacy of the newsroom and equip the journalists with basic data visualization skills. Coding is not one of them. Hence any data visualization tool that requires coding skills, or too much time to learn or use, was out of the question. We also wanted tools that could be customized to fit the house style and would be used repeatedly in future stories. Not every tool we needed met all the criteria. We picked two open source tools and one paid tool, and designed one ourselves. For each tool, we prepared documentation that includes guidelines on when to use the tool and step-by-step instructions. I have listed the tools that we used in the tools section below.

4. Integration into existing platform

We believed it was important to integrate the program into Foreign Policy’s existing in-house training platform. We didn’t want the journalists to see the program as something additional or something very different from previous in-house training. Hence we used the same language, tone, format and level of detail in our training materials as in previous training. All the training materials were also uploaded to an existing internal training website that serves as a one-stop resource center. Instead of opening different PDF files or websites, the journalists can visit the website to access the documentation or use the tools.

5. External consultant

Getting an external consultant like me to run the program has several advantages. It encourages higher commitment from journalists as they know the consultant, unlike their colleagues, will not be in the office every day. They have a sense of urgency. For a newsroom that tends to procrastinate on new initiatives, bringing in an outsider can be a good way to get things started.

It also helps small newsrooms with limited financial resources to save money, as it is a one-time investment. You don’t have to hire a permanent full-time staff member just to teach you how to do data journalism.

However, the external consultant needs the support of senior management and a committed partner from the newsroom. The partner should be someone who understands the newsroom’s operation and culture, and is passionate about data journalism and innovation. Emma was a great partner in my program with Foreign Policy. She helped to get the journalists on board and integrate the program into the existing training platform.

6. Get developers on board

We underestimated the role of developers in this program. Their involvement is crucial in selecting and installing tools because they know the most about the CMS and the backend infrastructure. During my program, the installation of customized tools was delayed as the developers had to deal with other unforeseen issues in the newsroom.

THE TOOLS:

1. Datawrapper

This is the main tool that the journalists will use to produce charts. It is easy and quick to use, and the embedded charts are interactive and responsive. It can be installed on our own server, and we can create a Foreign Policy theme with customized colors, fonts and logo. This saves the journalists time as they don’t have to customize the appearance manually.

2. Visual Investigative Scenarios (VIS)

This network visualization tool is still in beta. Although not customizable, it has seven themes to choose from, and one of them fits the Foreign Policy house style pretty well. There are other, more sophisticated and powerful network visualizers out there, but they require coding.

3. Venngage

This is a paid infographic tool that requires a monthly subscription. It is our alternative when Datawrapper cannot produce the data components we want, e.g. two charts with different units in the same graphic, or the use of customized icons or images in the chart. We only use it to create flat charts, not elaborate infographics.

4. In-house Sankey Diagram

I developed this customizable chart as an advanced tool for journalists who have basic coding skills (opening and editing code in a text editor). The journalists only need to format their data in a spreadsheet and paste it into the code to generate a new chart.
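To give a sense of what “paste your spreadsheet data into the code” can look like, here is a small sketch that turns pasted rows into the nodes-and-links structure that Sankey charts are typically drawn from. It is not the in-house tool’s actual code: the three-column layout (source, target, value, separated by tabs), the function name `parseSankeyRows` and the sample rows are all assumptions made for illustration.

```javascript
// Convert rows pasted from a spreadsheet (source<TAB>target<TAB>value)
// into a nodes-and-links structure. The column layout is an assumption.
function parseSankeyRows(pastedText) {
  var nodes = [];
  var indexByName = {};
  var links = [];

  // Return the index of a node, adding it the first time it is seen.
  function nodeIndex(name) {
    if (!Object.prototype.hasOwnProperty.call(indexByName, name)) {
      indexByName[name] = nodes.length;
      nodes.push({ name: name });
    }
    return indexByName[name];
  }

  pastedText.split(/\r?\n/).forEach(function (line) {
    if (line.trim() === "") {
      return; // skip blank lines
    }
    var cells = line.split("\t");
    var value = Number(cells[2]);
    if (cells.length < 3 || !isFinite(value)) {
      throw new Error("Expected source, target, value: " + line);
    }
    links.push({
      source: nodeIndex(cells[0].trim()),
      target: nodeIndex(cells[1].trim()),
      value: value
    });
  });

  return { nodes: nodes, links: links };
}

// Made-up sample data.
var sample = "Aid\tFood\t30\nAid\tShelter\t20\nFood\tRegion A\t30\n";
var graph = parseSankeyRows(sample);
```

Keeping the parsing step separate from the drawing code means a journalist only ever touches the pasted block of data, which fits the goal of asking for no more than basic text-editor skills.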

Apart from these, Foreign Policy had earlier added other data visualization tools to its toolbox, including Chartbuilder, TimelineJS, StoryMapJS, Tabula and other in-house tools.

THE RESULTS:

We conducted another round of surveys after the training. Of the 13 trained journalists who responded to the survey, almost half said the training met their expectations. Four said they still had unanswered questions after the training, and the other three felt that it was too basic. Those three journalists had worked with data visualization before the training. Another encouraging result was that 90% of them wanted more advanced training.

After the training, some of the journalists put the new skills to work almost immediately. Here are some examples [1, 2, 3, 4]. Personally I like this story very much as it used 6 charts to analyze the impact of Hong Kong’s 75-day pro-democracy protest. This is something Foreign Policy had been hoping to produce. (The charts are not interactive as there were some glitches with the server that hosts Foreign Policy’s Datawrapper tool.)

MOVING FORWARD:

This program is just the beginning of equipping the journalists with basic data literacy and visualization skills, and introducing them to the value of data journalism. From the feedback, we can see that a one-time training is never enough. The journalists need long-term support to further develop their skills.

There are several ways to proceed from here. The newsroom can continue to bring in an external consultant to provide specialized training for journalists who are more enthusiastic about data journalism, or those who have already mastered the basics, such as the three journalists who said the training was too basic. These journalists can form a small team in the newsroom that leads data journalism and transfers its skills to other colleagues through collaboration and in-house training. The newsroom can also connect the journalists with external resources and networks such as NICAR, and encourage them to attend conferences or boot camps. Post-training support is something I want to develop after graduation.

DataN is not just for Foreign Policy. When I designed the program, my goal was to build a model that can be applied to other small newsrooms, not only in the US but also in other parts of the world. I believe every newsroom should have its own DataN.

If you want to build a DataN for your newsroom or know more about the program, contact me. I would be more than happy to help. [kuangkeng@gmail.com]

Interactive time use calculator

This is the first quiz I hand-coded using JavaScript. Dubbed an interactive time use calculator, it was produced during my summer internship with NBC Local Media, but it was only published on Dec 18 as part of the year-end series. (Updated)

After you submit the time you spend every day on various activities like sleeping, working, surfing the Internet, shopping and exercising, the quiz compares your time use with that of other Americans and shows the result using charts.

The idea came after the 2014 American Time Use Survey was released in June. Instead of doing a story about how Americans spend their time (WSJ did the story), I wanted to make the survey more relevant and personally engaging for users. A good way would be to find out where you stand in the statistics, and a quiz should do the work.

To make the result easy to understand, I thought drawing a chart would be helpful. So I explored different ways to do that and settled on D3, a JavaScript library for data visualization. I tried Google Charts but failed to load it on the same page (it worked fine on a separate page). But D3 is actually more powerful because you can customize everything from the color to the mouseover tooltip. It took me several weeks to put everything together as there were different parts that I needed to figure out.

  1. I needed to learn how to build a quiz. None of the existing quiz tools or plugins met my requirement, because I needed a quiz that allowed me to use the result to draw a chart on the same page after the quiz is submitted. So I started with a simple HTML form.
  2. After building a simple HTML form, I needed something to validate the user input before the form is submitted. I didn’t want users to put in more than 24 in the hour box or more than 59 in the minute box. After testing several plugins, I managed to make jQuery Validation work.
  3. I figured out how to use the popular JavaScript library jQuery to collect user input, do the calculations, and select which result box to show after the form is submitted.
  4. After drawing the D3 charts, I learnt how to use the jQuery Tipsy plugin to add a mouseover tooltip to each bar in the charts.
  5. Finally, after everything was working, my editors wanted to allow users to share their personal results through social media. Twitter is a no-brainer, but Facebook is a disaster when you want to customize the shared content. Instead of going through all the complicated processes in the Facebook SDK for JavaScript, I found a workaround by creating 4 empty pages, each with a meta tag that states a different result. So now there are 4 different results for users to share on Facebook. (Update: the personalized results sharing function was not used by the editors in the published version, but you can still find it on my GitHub page.)
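The logic of steps 2 and 3 can be sketched in plain JavaScript. This is a simplified stand-in, not the quiz’s actual code: it does not use jQuery or the jQuery Validation plugin, the function names `validateEntry` and `compareWithAverage` are hypothetical, and the average is a placeholder rather than a survey figure. Only the limits (24 for hours, 59 for minutes) come from the description above.

```javascript
// Validate one hours/minutes entry using the limits described above:
// at most 24 in the hour box and at most 59 in the minute box.
function validateEntry(hours, minutes) {
  var errors = [];
  if (!Number.isInteger(hours) || hours < 0 || hours > 24) {
    errors.push("Hours must be a whole number from 0 to 24");
  }
  if (!Number.isInteger(minutes) || minutes < 0 || minutes > 59) {
    errors.push("Minutes must be a whole number from 0 to 59");
  }
  return errors;
}

// Compare a valid entry with an average, both in minutes per day.
function compareWithAverage(hours, minutes, averageMinutes) {
  var userMinutes = hours * 60 + minutes;
  return {
    userMinutes: userMinutes,
    difference: userMinutes - averageMinutes
  };
}

var sleepAverage = 480; // placeholder, not a survey figure
var errors = validateEntry(7, 30); // no errors for a valid entry
var result = compareWithAverage(7, 30, sleepAverage);
// result.userMinutes is 450, result.difference is -30
```

The sign of `difference` is what a result box or a chart label would be chosen from: negative means less time than the average, positive means more.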

Rather than only comparing user input with the average American, I produced a second chart that compares it with the user’s peer group (same age group, gender and employment status). I believe this gives users a better picture of where they stand in society.
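One simple way to support a peer-group comparison like this is a lookup table keyed by the three attributes. The sketch below assumes that approach; the group labels, the numbers and the names `peerAverages` and `peerAverage` are placeholders for illustration, not data from the American Time Use Survey or code from the quiz.

```javascript
// Look up a peer-group average by age group, gender and employment status.
// Group labels and numbers are placeholders, not American Time Use Survey data.
var peerAverages = {
  "25-34|female|employed": { sleep: 500, work: 450 },
  "25-34|male|employed": { sleep: 490, work: 480 }
};

function peerAverage(profile, activity) {
  var key = [profile.ageGroup, profile.gender, profile.employment].join("|");
  var group = peerAverages[key];
  if (!group || typeof group[activity] !== "number") {
    return null; // no data for this peer group or activity
  }
  return group[activity];
}

var me = { ageGroup: "25-34", gender: "male", employment: "employed" };
var peerSleep = peerAverage(me, "sleep"); // 490
```

Returning `null` for a missing group lets the page fall back to the national average instead of drawing an empty bar.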

The whole project proves a notion that many coders have been stressing – the best way to learn coding is to produce a project. Yes, that’s the truth!

You can access the code on my GitHub or try the quiz.

Thanks to my Studio 20 classmate Elle Zhu and the editors at NBC for helping to complete the quiz.

Interactive: San Diego Comic-Con By the Numbers

I produced this interactive chart + infographic on San Diego Comic-Con International 2014 (SDCC), the largest comic conference in the US, during my summer stint with NBC Local Media as a data intern.

It was selected by Tableau Public as the Viz of the Day.

The instruction from the editors was simple: find some information about SDCC and put together an infographic. But I wanted to make something more than that. I was curious about the history of the event and the journey it took to reach today’s popularity. So I started to collect data, but one important piece of data was missing: the ticket prices of previous conferences. I decided to build that data myself and went through a long and tedious online search, going from one website to another to find and verify the ticket prices of all the previous conferences since 2000. Together with other readily available data, I visualized it using Tableau Public, one of my favorite data visualization tools as it is free and user-friendly.

I also found some old SDCC logos on the official website and thought it would be fun to explore SDCC’s history through visuals. I used jCarousel, a jQuery plugin, to build my first photo carousel. Anyone with some basic HTML and CSS skills should be able to make it work.

Then I wrapped the two components in a box together with the infographic, which I made using the free online infographic tool Piktochart and Adobe Photoshop, uploaded it to my GitHub account and embedded it in the NBC website as an iframe.

Mission accomplished!

Visit the full story here.