The coolest thing about data

Perhaps the coolest thing about data is when it starts talking to you. Not literally, of course, but as a figure of speech. You’ve been working on a set of raw data, spent hours cleaning it, twisting it around and getting to know it. You’ve tried some things, found nothing, tried something else. And then suddenly it’s there: the story the data wants to tell. It’s fascinating, and I know that I, at least, can get very excited about unravelling the secrets of the data at hand.

And there really doesn’t need to be that much analysis behind it either; sometimes it’s just plain simple data that you haven’t looked at like that before. Like this past week, when we’ve had both the ice hockey world championships and the Eurovision Song Contest going on. Both are events that are covered by our newspaper, and both have the potential to attract lots of readers. Which they have done. But the thing that has surprised me this week is how different the two audiences behave. Where the ESC fans find our articles on social media and end up on our site mainly via Facebook, the hockey fans come directly to our site. This is very interesting and definitely needs to be looked into more in depth. It raises a million questions, first and foremost: How have I not seen this before? Is this the normal behaviour of these two groups of readers? Why do they behave like this? And how can we leverage this information?

Most of the time, however, the exciting feeling of a discovery, of data really talking to you, happens when you have a more complex analysis at hand. When you really start seeing patterns emerge from the data and feel the connection between the data and your daily business activities. I’m currently working on a bigger analysis of our online readers that I’m sure will reveal its inner self given some more time. Already I’ve found some interesting things, like a large group of people never visiting the front page. And by never, I really do mean never, not “a few times” or “seldom” – truly never. But more on that later, after I finish the analysis. (I know, I too hate these teasers – I’m sorry.)

I hope your data is speaking to you too, because that really is the coolest thing! :nerd_face:


Analysing the wording of the NPS question

NPS (Net Promoter Score) is a popular way to measure customer satisfaction. The score is supposed to correlate with growth, which of course appeals to management teams.

The idea is simple: you ask the customer how likely he or she is to recommend your product or service to others on a scale from 0 to 10. Then you calculate the score by subtracting the percentage of detractors (those answering 0 to 6) from the percentage of promoters (those answering 9 or 10). A positive score is supposed to indicate growth, a negative one decline.
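The calculation can be sketched in a few lines of Python (the example answers are made up, just to show the arithmetic):

```python
def nps(scores):
    """Net Promoter Score from a list of 0-10 answers:
    % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# 4 promoters, 3 passives (7-8), 3 detractors out of 10 answers:
print(nps([10, 9, 9, 10, 8, 7, 7, 6, 5, 3]))  # → 10.0
```

Note that the score ranges from −100 (all detractors) to +100 (all promoters), and the passives (7 and 8) only affect it by diluting the denominator.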

My employer is a news company publishing newspapers and sites mainly in Swedish (some Finnish too). Therefore we mainly use the key question in Swedish, i.e. Hur sannolikt skulle du rekommendera X till dina vänner? This wording, although an exact match to the original (How likely is it that you would recommend X to a friend?), seems a little clumsy in Swedish. We would prefer a more direct wording, i.e. Skulle du rekommendera X till dina vänner?, which translates to Would you recommend X to a friend? However, we were hesitant to change the wording without solid proof that it would not affect the answers.

So we decided to test it. We randomly showed our readers either the original key question or the modified one. The total number of answers was 1521. Then, using R and the wilcox.test() function, I analysed the answers and found no significant difference in the results whichever way we ask the question.
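R’s wilcox.test() on two independent samples is the Mann–Whitney U test, so the same analysis can be run in Python with SciPy. A minimal sketch, with made-up answer vectors standing in for the real 1521 survey answers (which are not reproduced here):

```python
import random
from scipy.stats import mannwhitneyu

random.seed(1)
# Hypothetical 0-10 answers for the two question wordings.
original_wording = [random.choice(range(11)) for _ in range(760)]
modified_wording = [random.choice(range(11)) for _ in range(761)]

# Two-sided Mann-Whitney U test, the equivalent of R's wilcox.test(x, y).
stat, p_value = mannwhitneyu(original_wording, modified_wording,
                             alternative="two-sided")
print(f"p = {p_value:.3f}")  # a large p-value = no evidence of a difference
```

A rank-based test like this is a sensible choice here, since the 0–10 answers are ordinal and typically far from normally distributed.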

There is some criticism out there about using the NPS, and I catch myself wondering every now and again whether people are getting too used to the scale for it to be accurate any more. Also, here in Finland there is a small risk that people confuse the scale with the 4–10 scale commonly used in schools, and therefore map their opinions onto years-old impressions of what is considered good and what is considered bad. I’d very much like to see some research on this.

Nevertheless, we are nowadays happily using the shorter version of the NPS key question, and have found no reason not to. Perhaps it could be shortened in other languages too?



The 2018 presidential election in Finland, some observations from a news analytics perspective

The 2018 presidential election in Finland was quite lame. The incumbent president, Sauli Niinistö, was a very strong candidate from the outset and was predicted to win in the first round, which he did. You can read more about the election, for instance, on Wikipedia.

Boring election or not, from an analytics perspective there is always something interesting to learn. So I dug into the data and tried to understand how the election had played out on our site (which is the largest Swedish-language news site in Finland).

We published a total of 275 articles about the presidential election of 2018. Fifteen of these were published as early as 2016, but the largest share (123) was published in January 2018.

Among the readers the interest in the election grew over time, which might not be that extraordinary (for Finnish circumstances at least). Here are the pageviews per article over time (as Google Analytics samples the data heavily, I used Supermetrics to retrieve the unsampled data, filtering on a custom dimension to get only the articles about the election):


Not much interesting going on there. So, I also took a look at the traffic coming in via social media. Twitter is big in certain circles, but not really that important a driver of traffic to our site. Facebook, on the other hand, is quite interesting.

Using Supermetrics again, and doing some manual(!) work too, I matched the Facebook post reach for a selection of our articles to the unsampled pageviews measured by Google Analytics. From this, it is apparent that approximately one in ten people reached on Facebook ended up reading the articles on our site. Or more, as we know that some of the social media traffic is dark.

The problem with traffic that originates from Facebook is that people tend to jump in, read one article and then jump out again. With the presidential election this was painfully clear: the average number of pageviews was down to 1.2 for sessions originating from Facebook. You can picture this as follows: four out of five people read only the one article that was linked on Facebook and then leave our site. One in five reads an additional article and then decides to leave. But nobody reads three or more articles. This is something to think about – we get a good amount of traffic on these articles from Facebook, but then we are not that good at keeping the readers on board. There’s certainly room for improvement.
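The nice thing about the 1.2 figure is that, under the assumption that nobody reads three or more articles, it pins down the split exactly. A quick sanity check:

```python
# If every Facebook session reads either one or two articles, the average
# fixes the shares: p * 1 + (1 - p) * 2 = 1.2  =>  p = 0.8
average_pageviews = 1.2
share_two = average_pageviews - 1   # share of sessions reading two articles
share_one = 1 - share_two           # share reading only the linked article
print(round(share_one, 1), round(share_two, 1))  # → 0.8 0.2
```

So "four out of five read one article, one in five reads two" is not an extra finding; it is the only split consistent with an average of 1.2 and a maximum of two.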

What about the content then? Which articles interested the readers? Well, with good metadata this is not that difficult an analysis. Looking at the articles split by the candidate they covered and the time of day the article was published:


(The legend of the graph is in Swedish – “Allmän artikel” means a general article, i.e. one that either covered many candidates or didn’t cover any candidate at all.)

Apart from telling us which candidates attracted the most pageviews, this also clearly shows how many articles were written about each candidate. A quite simple graph in itself – a scatter diagram coloured by the metadata – but it reveals a lot of information. There are several takeaways: at what time should we (not) publish, which candidates did our readers find interesting, should we have written more or less about one candidate or the other. When you plot these graphs for all the different kinds of metadata, you get quite an interesting story to tell the editors!
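The aggregation behind a graph like this is straightforward once the metadata is in place. A minimal sketch in plain Python – the rows here are invented for illustration, the real data would come from the Google Analytics / Supermetrics export:

```python
from collections import defaultdict

# Hypothetical article metadata: (candidate tag, hour published, pageviews).
articles = [
    ("Niinistö", 8, 12000),
    ("Niinistö", 14, 9500),
    ("Haavisto", 10, 7000),
    ("Allmän artikel", 17, 4300),
]

article_count = defaultdict(int)
total_pageviews = defaultdict(int)
for candidate, hour, views in articles:
    article_count[candidate] += 1
    total_pageviews[candidate] += views

for candidate in article_count:
    print(candidate, article_count[candidate], total_pageviews[candidate])
```

Each (hour, pageviews) pair becomes a point in the scatter plot, and the candidate tag decides the colour; the per-candidate totals above answer the "how much did we write, how much was it read" question directly.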

So even a boring election can be interesting when you look at the data. In fact, with data, nothing is ever boring 😉


A note about the graphs: the first graph in this post was made with Google Sheets’ chart function. It was an easy-to-use, and good enough, solution to tell the story of the pageviews. Why use something fancier? The second graph I drew in Tableau, as the visualisation options there are so much better than in other tools. I like using the optimal tool for each task: not overkilling easy stuff by importing it into Tableau, but also not settling for lesser quality when a more advanced tool offers a solution. If I needed to plot the same graphs over and over again, I would go with an R script to reduce the manual clicking and pointing.


The Road to Nowhere

I was just told that students who multitask during lectures perform up to a whole letter grade worse than their fellow students. Whether this is true or not, I’m pretty sure humans cannot concentrate fully on two things at the same time; our focus is split and our attention jumps back and forth.

In certain situations it certainly is worthwhile to devote your full attention to whatever you are doing. Students who want to perform well should pay attention to the lecturer rather than their laptops or mobiles. The same is true for our jobs: the result is often better when the person doing the job is paying attention to it, whether it be writing, cooking or taking care of sick people.

An interesting question is how multitasking affects our media consumption. There are studies on this as well. Consumption is certainly becoming more and more fragmented, which puts pressure on media companies to produce content that succeeds in keeping the attention of the audience.

I have to admit that almost every time I sit down on the sofa I bring my iPad along, because most TV shows are boring. So why not check Facebook or read email at the same time? At least I fool myself into believing I am more efficient this way. Still, I was shocked when a TV strategist told me that the attention span of today’s TV audience is six minutes. Six minutes! Every six minutes something really interesting has to happen on the screen or people zap away (or turn to their iPads). It is just crazy. How can we expect to relax or to learn something if our attention span is that short? At least I know I most often feel more stressed than relaxed after an hour of simultaneous TV and Facebook use. It’s a bit like eating a large bag of candy: it feels like a good idea at the beginning, but when it’s done you swear never to do it again. Until next time.

But there is at least one upside to surfing the web while watching TV. When you watch a TV show, you can easily enrich the experience by reading more about the topic at hand online, and this has become so much easier with the iPad. If I watch an old movie on TV I tend to look up the actors, the reception the movie got when it was released, who composed the music, which other films the actors have been involved in, and so on. You learn a lot! Take Vanilla Sky as an example: I had no idea that the name referred to the skies depicted by Monet until I read about it on Wikipedia.

I especially love enriching documentaries. The Finnish broadcasting company YLE just showed the four-part documentary Billy Connolly: Journey to the Edge of the World. Fantastic scenery and interesting people! I watched this together with my iPad, looked up the places Connolly visited on Google Maps, and read about the Inuit and about Pond Inlet, a place I didn’t know existed.

Pond Inlet by Michael Saunders

Simultaneous usage of media in a way that enriches the experience gives you so much more than just watching the documentary. At the same time, you have to be careful not to overdo it. It is quite easy to get carried away and forget all about the documentary or film you thought you were watching. Maybe we do need some twist in the storytelling every six minutes to stay focused?

Oh yes, I almost forgot, The Road to Nowhere:
Road to Nowhere by janers.sweeter

Poor research is a real burden for media

With long experience of research, my heart always cries when I come across poor research. Be it poorly designed or poorly presented, it’s such a waste of money! Sometimes I also get angry. Angry with the research institutes who sell fancy “truths” to gullible companies. Most of the time, however, there’s not much you can do about that, other than hope the public isn’t stupid enough to believe everything they hear. For instance, when some poll tells you that a certain political party has gained supporters at another party’s expense, when in fact the margins of error make any such conclusion null and void.
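To make the poll example concrete, here is a rough back-of-the-envelope check using the standard 95% margin of error for a proportion (the party share and sample size are hypothetical):

```python
import math

def margin_of_error(p, n):
    """Approximate 95% margin of error for a poll proportion p
    with n respondents: 1.96 * sqrt(p * (1 - p) / n)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: a party at 20% support, 1000 respondents.
moe = margin_of_error(0.20, 1000)
print(f"±{moe * 100:.1f} percentage points")  # → ±2.5
```

With a margin of around ±2.5 points, a reported one-point "gain" over the previous poll is well inside the noise, yet it is routinely presented as news.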

But sometimes, when this poor research lands close to my own turf, I feel the need to act.

Last Friday I spent all day tearing a research concept to pieces, comparing the results to the questionnaire and trying to make sense of it all. It’s a study that has already been done four times, and for the second and third rounds I was in the audience when the research institute presented the results. Both times I politely asked the researchers how they calculate certain key figures, but the answers never satisfied me. As the study was commissioned by our newspaper association and not our company, I decided to let it be; it was not my fight.

Then came the fourth round, using exactly the same concept, again with exactly the same dubious figures. So I sat down, once again, with the report and the questionnaire, pinpointed the problems with the study in a lengthy email and sent it to the people responsible for commissioning the study. I just hope it is well received and at least leads to a thorough discussion.

Poor research should be banned. Even though we have the Esomar professional standards, we are presented with way too much cr*p even from research institutes complying with them. The research institutes really should go the extra mile in assuring the quality of their concepts and services, because it isn’t easy to commission extensive surveys. (Esomar also has a guideline for commissioning research. Read it. And there are independent researchers out there who can help you with the commissioning. Use them.) There are so many factors to weigh in, ranging from the aim and the sample to the analysis and conclusions. If you aren’t a research professional yourself, you should be able to rely on the research institutes.

My personal favourites in the Esomar code are the following basic principle articles:

1a and 1b) “Market research shall be legal, honest, truthful and objective and be carried out in accordance with appropriate scientific principles.” “Researchers shall not act in any way that could bring discredit on the market research profession or lead to a loss of public confidence in it.” – This is something all researchers should take to heart. Sadly enough, many don’t. Just think about how often you stumble across crazy research and crazy conclusions – research that damages the reputation of market research, as people either laugh at it or simply don’t believe it.

4c) “Researchers shall on request allow the client to arrange for checks on the quality of data collection and data preparation.” – This article implies that the quality of your work should be impeccable. You should be ready, at any time, to let the customer audit your work. Customers ask for it way too seldom, though. When I worked at a research institute some years ago, I offered this option to sceptical customers – no institute has ever offered it to me.

Research on media in Finland is seldom good. Too much is lost in the margins of error, and too many conclusions are derived from studying means. The ambition to cover too much has resulted in monstrous surveys that serve nobody well. Thankfully, the print media audience measurements have been criticised publicly by more and more people, and some improvement is under way.

If we make decisions based on mediocre studies and information that cannot hold up to scrutiny, we won’t end up with winning products. As long as we measure a total audience and try to describe that mass of heterogeneous people as one entity, we fool ourselves and we fool the advertisers. We need more detailed information; we need to open our eyes to the multidimensional audience we have. Gone are the days when one product suited all and the audience could be treated as one. Thus we should also realise that the surveys we use to measure our audiences should be redesigned to fit the needs of today. Although we might lose some trends, and many grand old men and ladies will grunt in discontent, we need the change. The poor research of today is only hampering us, so let’s throw it out and bring in research that really benefits us!

Whom are we designing tablet apps for?

The tablet buyers of today are still early adopters – I’m sure we all agree on that. And at least in the US they are more often than not young, affluent men. But apart from that, what do we know about the users? What do we know about our media consumers in general?

It’s easy to treat all users as one entity, as one homogeneous group of people who all use their tablets in the same way. The recent Mintel study on tablets and eReaders tells us what tablet and eReader users do with their devices:

This is interesting reading as such, but what about different user profiles? Means and averages aren’t a very good basis for development actions. We need to know more about tablet user behaviour before starting our design process. Not all users use their tablets in the same way. This can be seen in the above diagrams, but the diagrams don’t tell us whether those who read RSS feeds also read blogs, or whether those who watch movies also read news. And so on.

We need to identify the different user groups and design for each group separately, or at least keep them all in mind when designing.

Just as we know from the print media business that some read their papers from cover to cover while others only skim through the newspaper, reading what they find interesting, we need to be aware that not all tablet users behave in the same way. Even though the market is still young, I’m sure different behavioural groups are emerging. Some people want breaking news fast, whereas others like to read thoroughly about the subject at hand. Some want news about politics, others about celebrities.

The technology now provides us with the tools to customise the experience. Using the same backend platform we can produce multiple experiences which can cater to the needs of different user groups. Why not do it?