The coolest thing about data

Perhaps the really really coolest thing about data is when it starts talking to you. Well, not literally, but as a figure of speech. When you’ve been working on a set of raw data, spent hours cleaning it, twisting it around and getting to know it. Tried some things, not found anything, tried something else. And then suddenly it’s there. The story the data wants to tell. It’s fascinating and I know that I, at least, can get very excited about unraveling the secrets of the data at hand.

And it really doesn’t need to be that much analysis behind it either, sometimes it’s just plain simple data that you haven’t looked at like that before. Like this past week when we’ve had both the icehockey world championships and the Eurovision Song Contest going on. Both of them events that are covered by our newspaper and both of them with potential to attract lots of readers. Which they have done. But the thing that has surprised me this week is how different the two audiences behave. Where the ESC-fans find our articles on social media and end up on our site mainly via Facebook, the hockey fans come directly to our site. This is very interesting and definitely needs to be looked into more in depth. It raises a million questions, the first and foremost: How have I not seen this before? Is this the normal behaviour of these two groups of readers? Why do they behave like this? And how can we leverage on this information?

Most of the times, however, the exciting feeling of a discovery and of data really talking to you, happens when you have a more complex analysis at hand. When you really start seeing patterns emerge from the data and feel the connection between the data and your daily business activities.  I’m currently working on a bigger analysis of our online readers that I’m sure will reveal it’s inner self  given some more time. Already I’ve found some interesting things, like a large group of people never visiting the front page. And by never, I really do mean never, not “a few times” or “seldom”, I truly mean never. But more on that later, after I finish with the analysis. (I know, I too hate these teasers – I’m sorry.)

I hope your data is speaking to you too, because that really is the coolest thing! :nerd_face:

Advertisements

Switching Supermetrics reports to a new user – some tips and tricks

Recently I was faced with the need to switch a bunch of Supermetrics reports (in Google Sheets) to another user. How this is done is perhaps not the most obvious thing, but not at all hard after you figure it out.

This is how you do it:

  1. Open the report and navigate to the sheet called SupermetricsQueries. (If you can’t see this sheet you can make it visible either via the All sheets -button at the lower left hand corner of your Google Sheets or via the add-on menu Supermetrics / Manage queries). On this sheet you’ll find a page with some instructions and a table with  information about the queries in this report.
  2. Delete the content in the column QueryID :
    supermetrics_QueryID
  3. Replace the content in the column Refresh with user account with the correct credentials. E.g. if we talk about Google Analytics this is the email of the account you want to use, if it is Facebok it is a long numerical id.
    supermetrics_RefreshWithUserAccount
  4. Navigate to the Supermetrics add-on menu and choose Refresh all.
  5. Be sure to check the results in the column Last status to ensure that all queries were updates as planned.
    supermetrics_LastStatus
  6. Then, check the data in the reports themselves.
  7. When you’re done I suggest you hide the sheet SupermetricsQueries sheet so that you (or someone you shared the report with) doesn’t alter the specs by mistake.
  8. Don’t forget to transfer the ownership of the file itself if needed!

 

This is pretty straight forward. Updating a bunch of reports I, however, made the following notes-to-self that I’d like to share with you:

  • Make sure that the account you are using Supermetrics with has credentials to all the data you want to query!
  • Before you start transferring your reports take some time to get acquainted with the content of the reports. Perhaps even make a safety copy of it so that you can be sure that the new credentials and queries are producing the data you expected.
  • When updating the report you probably will want to make some changes to some of the queries. I noticed that when updating many queries it might be easier to update them making changes to the specifications in the table on the SupermetricsQueries sheet instead of using the add-on. Just be careful while doing this!
  • NB! If the original report was scheduled to auto refresh or auto email with certain intervals, you will need to re-do the scheduling. So make sure you know who the recipients of the original report were before you switch the ownership!

Google Analytics and R for a news website

For a news site understanding the analytics is essential. The basic reporting provided by Google Analytics (GA) gives us good tools for monitoring the performance on a daily bases. Even the standard version of GA (which we use) offers a wide variety of reporting options which carries you a long way. However, when you have exhausted all these options and need more, you can either use some kind of tool like Supermetrics or then query the GA api directly. For the latter purpose, I’m using R.

Querying GA with R is a very powerful way to access the analytics data. Where GA only allows you to use two dimensions at the same time, using R you can query several dimensions and easily join different datasets to combine all your data into one large data set that you then can use for further analysis. Provided you know R of course – otherwise I suggest you use a tool like the above mentioned Supermetrics.

For querying GA with R I have used the package RGoogleAnalytics. There are other packages out there, but as for many other packages in R, this is the one I first stumbled upon and then continued using… And so far, I’m quite happy with it, so why change?!

Setting up R to work with GA is quite straight forward, you can find a nice post on it here.

Recently I needed to query GA for our main site’s (hbl.fi, a newssite about Finland in swedish) different measures such as sessions, users, pageviews but also some custom dimensions including author, publish date etc. The aim was to collate this data for last year and then run some analysis on it.

I started out querying the api for the basic information: date (for when the article was read), publish date (a custom dimension), page path, page title and pageviews. After this I queried several different custom dimension one by one and joined them in R with the first dataset. This is necessary as GA only returns rows where there are no NA:s. And as I know that our metadata sometimes is incomplete, this solution allows me to stitch together a dataset that is as complete as possible.

This is my basic query:

# Init() combines all the query parameters into a list that is passed as an argument to QueryBuilder()
query.list <- Init(start.date = "2017-01-01",
                  end.date = "2017-12-31",
                  dimensions = "ga:date,ga:pagePath,ga:pageTitle,ga:dimension13", 
                  metrics = "ga:sessions,ga:users,ga:pageviews,ga:avgTimeOnPage",
                  max.results = 99999,
                  sort = "ga:date",
                  table.id = "ga:110501343")

# Create the Query Builder object so that the query parameters are validated
ga.query <- QueryBuilder(query.list)

# Extract the data and store it in a data-frame
ga.data <- GetReportData(ga.query, token, split_daywise=T)

 

Note this in the Init()-function:

  • You can have a maximum of 7 dimensions and 10 metrics
  • The max.results can (according to my experience) be at the most 99,999 (at 100,000 you get an error).
  • table.id is called ViewID in your GA’s admin panel under View Settings
  • If you want to use a GA segment* in your query, add the following attribute: segments = “xxxx”

 

Note this in the GetReportData-function:

  • Use split_daywise = TRUE to minimize the sampling of GA.
  • If your data is sampled the output returns the percentage of sessions that were used for the query. Hence, if you get no message, the data is unsampled.

 

* Finding the segment id isn’t as easy as finding the table id. It isn’t visible from within Google Analytics (or at least I haven’t found it). The easiest way to do this is to use the query explorer tool provided by Google. This tool is actually meant to aid you in creating api query UPIs but comes in handy for finding the segment id. Just authorise the tool to access your GA account and select the proper view. Go to the segment drop down box and select the segment you want. This will show the segment id which is in format gaid::-2. Use this inside the quotes for the segments attribute.

 

The basic query returned 770,000 rows of data. The others returned between 250,000 and 490,000 rows. After doing some cleaning and joining these together (using dplyers join functions) I ended up with a dataset of 450,000 rows. Each containing the amount of readers per article per day, information on category, author and publish date as well as amount of sessions and pageviews for the given day. All ready for the actual analysing of the data!

 

Co-creating the future media business

Opening up the newsroom is a new trend that spins off the idea that co-creation is the key to loyal readers. I believe in it, or at least that co-creation is a necessity for part of the readers to become or remain loyal when some readers are loyal as is. But as Anette Novak says it so much better and has first hand experience of it, you’d better read about it from her, here.

 However, reading Anette’s blog post I came to think about something similar – the co-creation of the business itself. Or call it by its old-fashioned name; networking. But I’d like to find it a better term than networking. Networking has a sound of “mingling on an event and sharing fragments of ideas” to it. True networking is more than that. As a phenomenon it’s closer to mentoring than just the sharing of bits and pieces that networking often is. But mentoring implies that there is an apprentice and a master, someone old in the game and someone new. We don’t need that now – there is no grand old man for the new media landscape of tomorrow.

For the print media to overcome the challenges we are facing today we need true networking. Not all companies are fully staffed with the most brilliant minds and the most productive people. We are more or less stuck with the people we have. And too many in the print media industry are happy and content with the business as it has been the past thirty years or so. Some big media players might be able to re-staff with brilliance and productiveness but most are not. (And yes, I do know that we need the people who have the old kind of know how as well, I do.)

The simple fact is that we need each other. We are faced with new challenges and it is just plain stupid not to search for answers amongst colleagues in other media companies, academic media researchers and even people in other industries. Together we are smarter and more creative.

We need to get together and start networking for real. Networking in a more mentoring way. Networking and trusting each other. There are many things we can share and co-ponder without sharing business secrets. Instead of mingling on seminars and events and trying to pry the competitor on some secrets or scrabbling quick notes while some guru is talking about how their multinational company launched a new iPad app we should find people we can co-create our future with. Co-create the media business of tomorrow.

 In “reader co-creation” the reader gives input to the journalistic process. In co-creating the entire business I believe in long lasting networking with people you trust. People who are close enough to your business to understand it but far enough either in geographical terms, business wise or in mindset not to misuse the relation. Preferably people that challenge you with their chain of thought and whom you challenge with yours.

The media companies that will succeed will be those that are able to embrace the common know how of the industry, listen to the quiet messages of the readers and implement this knowledge in their own daily life. (Now substitute the word companies in the previous sentence for individuals, it is just as true). Although one additional thing is necessary for the companies to succeed: the retention of the skilled and the brilliant. There are enough challenges for the brilliant out there not to stay put if their brilliance is not appreciated.