AMR outliers or not?

I’m working on a data set with AMR for audio. AMR = Average Minute Rating, in essence how many listeners your content has had on average, each minute. You can think of it as a measure of your audience being spread out evenly over the content, from start to finish.

To be able to calculate your AMR you need to know the total number of minutes people have listened to your content and, of course, the length of the content. So if your audio content is ten minutes long and your analytics tell you that you have a total of 43 minutes of listening, that gives you an AMR of 4.3 (= on average, 4.3 people listened to the content for its entire duration).
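In code it is a one-liner. A minimal sketch in R (the function and argument names are mine, not from any particular analytics tool):

```r
# Minimal sketch of the AMR calculation described above:
# total listening minutes divided by the length of the content.
amr <- function(total_minutes_listened, content_length_minutes) {
  total_minutes_listened / content_length_minutes
}

amr(43, 10)
#> [1] 4.3
```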

My assumption, at least when it comes to well-established audio content like pods running for tens of episodes, is that the AMR is more or less the same for each episode. Or at least within the same ballpark.

However, at times your data might contain odd numbers: way too small or way too big. So are these outliers, or should you believe that there really were that few/many listeners at that particular time? Well, there’s no easy answer to that. You need to do some exploratory analysis and have a thorough look at your data.

First, especially if you run into this kind of data often, I would establish some kind of rule of thumb for what counts as normal variation in AMR in your case. For some content the variation might be small, and thus even smaller deviations from the “normal” should be singled out for further analysis. In other cases the AMR varies a lot, and then you should be more tolerant.
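If you want to make such a rule of thumb reproducible, one option is to flag episodes whose AMR falls far outside the typical range. A sketch in R using a median/IQR band; the tolerance k and the example values are made up for illustration, not a recommendation:

```r
# Illustrative sketch: flag episodes whose AMR is far from the median.
# The tolerance k is a choice, not a rule; make it wider for content
# whose AMR normally varies a lot.
flag_amr_outliers <- function(amr, k = 1.5) {
  med    <- median(amr, na.rm = TRUE)
  spread <- IQR(amr, na.rm = TRUE)
  (amr < med - k * spread) | (amr > med + k * spread)
}

# Made-up AMR values for ten episodes of the same pod
episodes <- data.frame(
  episode = 1:10,
  amr     = c(4.3, 4.1, 4.5, 4.2, 4.6, 4.4, 1.2, 4.5, 4.3, 9.8)
)
episodes$potential_outlier <- flag_amr_outliers(episodes$amr)
subset(episodes, potential_outlier)   # candidates for the detective work below
```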

Then, after identifying the potential outliers, you need to start playing detective. Can you find any explanation as to why the AMR is exceptionally high or low? On what date did you publish the content? Was it a holiday, when your audience had more time than usual to listen to the content, or did some special event occur that day that drew people away from it? Again, there is no single rule to apply; you need to judge for yourself.

Another thing to consider is the content: Was the topic especially interesting/boring? Did you have a celebrity as a guest on your pod, or did you lack one (if you usually have one)? Was the episode much longer/shorter than normal? Was it published in the same cycle, like the day of the week/month, as usual? Did you have technical difficulties recording that affected the quality? And so on, and so on…

It all boils down to knowing your content, examining it from as many different perspectives as possible, and then making a qualified judgement as to whether or not the AMR should be considered an outlier. Only then can you decide which values to filter out and which to keep.

When you are done with this, you can finally start the actual analysis of the data. As always, cleaning the data takes 80% of your time and the analysis 20% – or was it 90%–10%…?

 

P.S. Sometimes it helps to visualise – but not always:

Failed line graph of AMRs
Epic fail: Trying to plot a line graph of my AMRs using ggplot2. Well, it didn’t turn out quite as expected 😀
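For the record, here is roughly the kind of chart I was aiming for: a minimal ggplot2 sketch of AMR per episode (the values below are made up, just to show the shape of the chart):

```r
library(ggplot2)

# Made-up AMR values per episode, purely to illustrate the intended chart.
amr_per_episode <- data.frame(
  episode = 1:10,
  amr     = c(4.3, 4.1, 4.5, 4.2, 4.6, 4.4, 1.2, 4.5, 4.3, 4.7)
)

ggplot(amr_per_episode, aes(x = episode, y = amr)) +
  geom_line() +
  geom_point() +
  labs(x = "Episode", y = "AMR", title = "AMR per episode")
```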

 

 


Funny vizzes

Every now and then your visualisation tool might be a little too clever. It suggests some nice viz based on your data, but the viz makes absolutely no sense. Like the one below. The credits go to Google Sheets this time. I had a simple data set, just two columns of simple integers, that I wanted to plot in a line chart. Actually, I had already plotted seven of them that day. But come number eight, Google Sheets decided it was not an appropriate viz anymore. So it drew this for me:

Not much information in that one 😀 Perhaps this was Google’s way of telling me to take a break?

I just thought I’d share it with you since we all need a good laugh every now and then! And I just might share some other funny vizzes as they come along. Please comment and share your similar vizzes, I’m sure you have a bunch of them as well!

The 2018 presidential election in Finland, some observations from a news analytics perspective

The presidential elections of 2018 in Finland were quite lame. The incumbent president, Sauli Niinistö, was a very strong candidate from the outset and was predicted to win in the first round, which he did. You can read more about the elections, for instance, on Wikipedia.

Boring election or not, from an analytics perspective there is always something interesting to learn. So I dug into the data and tried to understand how the elections had played out on our site, hbl.fi (the largest Swedish-language news site in Finland).

We published a total of 275 articles about the presidential election of 2018. Fifteen of these were published as early as 2016, but the largest share (123) was published in January 2018.

Among the readers, interest in the elections grew over time, which might not be that extraordinary (for Finnish circumstances at least). Here are the pageviews per article over time (as Google Analytics samples the data heavily, I used Supermetrics to retrieve the unsampled data, filtering on a custom dimension to get only the articles about the election):

President_2018_per_day

Not much interesting going on there. So, I also took a look at the traffic coming in via social media. Twitter is big in certain circles, but not really that important a driver of traffic to our site. Facebook, on the other hand, is quite interesting.

Using Supermetrics again, and doing some manual(!) work too, I matched the Facebook post reach for a selection of our articles to the unsampled pageviews measured by Google Analytics. From this, it is apparent that approximately one in ten people reached on Facebook ended up reading our articles on our site. Or more, as we know that some of the social media traffic is dark.
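The matching itself is straightforward once you have the two exports side by side; the manual work was mostly about cleaning them up. A hypothetical sketch in R (file and column names are made up for illustration):

```r
# Hypothetical sketch: match Facebook post reach to unsampled pageviews
# per article and compute the share of reached people who read on site.
reach     <- read.csv("facebook_post_reach.csv")   # columns: url, reach
pageviews <- read.csv("article_pageviews.csv")     # columns: url, pageviews

matched <- merge(reach, pageviews, by = "url")
matched$read_share <- matched$pageviews / matched$reach

summary(matched$read_share)   # roughly 0.1 in the case described above
```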

The problem with traffic that originates from Facebook is that people tend to jump in, read one article and then jump out again. With the presidential elections this was painfully clear: the average number of pageviews per session was down to 1.2 for sessions originating from Facebook. You can picture it like this: four out of five people read only the one article that was linked on Facebook and then leave our site, one out of five reads an additional article and then decides to leave, and nobody reads three or more articles (that gives (4 × 1 + 1 × 2) / 5 = 1.2 pageviews per session). This is something to think about – we get a good amount of traffic on these articles from Facebook, but then we are not that good at keeping the readers on board. There’s certainly room for improvement.

What about the content then? Which articles interested the readers? Well, with good metadata this is not that difficult an analysis. Here are the articles split by the candidate they covered and the time of day each article was published:

President_2018_per_candidate

(The legend of the graph is in Swedish => “Allmän artikel” means a general article, i.e. one that either covered many candidates or didn’t cover any candidate at all.)

Apart from telling us which candidates attracted the most pageviews, this also clearly shows how many articles were written about each candidate. It is quite a simple graph in itself, a scatter plot coloured by the metadata, but it reveals a lot of information. There are several takeaways from this graph: at what time should we (not) publish, which candidates did our readers find interesting, should we have written more or less about one candidate or the other? When you plot these graphs for all the different kinds of metadata, you get quite an interesting story to tell the editors!

So even a boring election can be interesting when you look at the data. In fact, with data, nothing is ever boring 😉

 

A note about the graphs: The first graph in this post was made with Google Sheets’ chart function. It was an easy-to-use, and good enough, solution to tell the story of the pageviews. Why use something fancier? The second graph I drew in Tableau, as the visualisation options are so much better there than in other tools. I like using the optimal tool for the task: not overkilling easy stuff by importing it into Tableau, but also not settling for lesser quality when a more advanced tool offers a solution. If I needed to plot the same graphs over and over again, I would go with an R script to reduce the manual pointing and clicking.
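For the curious, the second graph could be reproduced along roughly these lines in ggplot2 (the file and column names below are made up; the real data sits in our analytics exports):

```r
library(ggplot2)

# Hypothetical structure: one row per article, with the hour it was
# published, its (unsampled) pageviews and the candidate it covered.
articles <- read.csv("election_articles.csv")
# expected columns: published_hour, pageviews, candidate

ggplot(articles, aes(x = published_hour, y = pageviews, colour = candidate)) +
  geom_point(alpha = 0.7) +
  labs(x = "Hour of publication", y = "Pageviews", colour = "Candidate")
```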

 

Headache while trying to filter on a map in Tableau :/

This week’s MakeoverMonday delivered a data set on the accessibility of buildings in Singapore. For each building there is an index for the accessibility level and of course information on where the building is situated, along with some information on that area (“subzone”). So I figured, why not plot each area on a map, and then by clicking an area you’d get a list of all the buildings in that area and their accessibility indices? Seems straightforward enough.

So I plotted the map, and let Tableau color the areas according to the average accessibility:

w50_singapore_averages.PNG

 

The darker the colour, the better the accessibility. Now I’d like the user to be able to click an area, for instance Alexandra Hill, and get the information about the buildings in this particular area. Like this:

w50_alexandrahill_table

But alas, this table is NOT shown when you click on the map; that action only shows one line per area, for some reason that is still unknown to me:

w50_alexandrahill_table_short

The entire list of buildings is shown only when you choose the area from a list on the side of the dashboard, but not when you click on the map. You can try it out on Tableau Public yourself.

I’ve tried different ways of filtering and different actions on the filters, but nada. I will, however, fix this! I want to understand why Tableau acts this way. I just need to dig into it some more. So instead of serving you a nice #mmonday blog post, I’ve shared some headache, but hey – this is not that uncommon when working with data after all 😉 Hang in there for the sequel!

 

Makeover Monday – Prices of curries

This week’s Makeover Monday was about visualising a data set gathered by the Financial Times. The data covers the pricing of curries at Wetherspoon pubs in the UK and Ireland. The original story covers several different aspects of the pricing – my simple makeover is by no means an attempt to do it better. Rather, it is an exercise for myself in using Tableau dashboards.

My makeover is posted on Tableau Public. It shows a map of the pubs, and when you click on a pub a stacked bar showing the pricing at that pub appears on the right.

w49_curries

A simple viz, but a nice exercise in combining maps and charts into an interactive dashboard.

 

A new acquaintance – Google Data Studio

For the past few months we’ve been building dashboards with Google’s Data Studio, a visualisation tool that can easily be connected to a multitude of data sources. We have uploaded most of our data to BigQuery to be able to query it easily (and with much better speed!) into a variety of dashboards.

BigQuery in combination with Google’s Data Studio is an easy way to implement the basic dashboards needed in a media house. Here are some examples of dashboards that we’ve built over the past months:

  • A live report on the NPS for our site, including open ended comments, shown on a screen at the news desk
  • A dashboard showing which articles generate the most registrations
  • Number of subscriptions sold, per type, date and area
  • A viz of the demographics of registered users (the screenshot below shows demo data):

Registered_demog

Data Studio is very easy to use and to set up to work with different data sources. You don’t even need to do any coding to access the data in BigQuery, but then again, the options for how to plot your data are limited. What you gain on the swings you lose on the roundabouts…
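That said, if you ever want to pull the same BigQuery data into R for more flexible plotting, the bigrquery package keeps the script short. A sketch, with placeholder project and table names:

```r
library(bigrquery)

# Placeholder project and table names; substitute your own.
# bq_auth() will prompt for Google credentials on first use.
project <- "my-gcp-project"
sql <- "
  SELECT publish_date, article_id, pageviews
  FROM `my-gcp-project.analytics.article_pageviews`
  WHERE publish_date >= '2018-01-01'
"

tb <- bq_project_query(project, sql)   # run the query in BigQuery
df <- bq_table_download(tb)            # pull the result into a data frame
head(df)
```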

The plot types are quite basic: simple time series, bar charts, pie charts, tables, etc. One nice feature, though, is the geo map that lets you visualise your data on a map:

Subs_geo

But non-US users will still have to wait for the zoom level to offer options other than the whole country for areas outside the US :/

When it comes to formatting your visualisation, however, Data Studio can by no means be compared to e.g. Tableau or even PowerPoint. Limited options for formatting margins etc. mean that making effective use of the space on your dashboard is difficult. And you can forget about formatting any of the nitty-gritty details of your charts.

Nevertheless, Data Studio makes it really easy to visualise your data and is a handy tool with a low learning curve. And it’s free. So why not try it out? And I’d love to hear your comments on it, so please pitch in in the comment section!

 

MakeoverMondays

A week ago on Thursday I attended a meeting of the Tableau User Group in Finland, #fintug. There the inspiring Eva Murray (@TriMyData), Tableau evangelist at Exasol, told us about the concept of MakeoverMonday and had us do last week’s challenge live, then and there.

I was paired up with Jaakko Wanhalinna (@JWanhalinna) from Solutive to redo the viz in only 43 minutes. We had a blast, and thanks to Jaakko’s good knowledge of Tableau we came up with this nice remake:

 

mmovermonday_w45

You can find the original at my profile on Tableau Public.

Despite some schedule constraints I decided to take on this week’s MakeoverMonday challenge as well. It’s about the city transport systems in 100 cities globally. The data provided covered only the names of the cities and an index for each city; the higher the index, the better. More information about the index can be found on the homepage of Arcadis, a design and consultancy firm for natural and built assets.

Here’s my viz on the data:

mmovermonday_w46.PNG

And the original is of course on Tableau Public.

MakeoverMonday is a fun way to experiment with Tableau and simultaneously learn about very diverse topics; I can highly recommend it! So there will be more of these, maybe not every Monday, but as often as I can squeeze them into my schedule!