Reading multiple csv files into R while skipping the first lines

Today I needed to read multiple csv files into R and merge them all into one table. I’ve done this before successfully using a nice and short piece of code:

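# List every file under the working directory, including subfolders
files <- list.files(".", full.names = TRUE, recursive = TRUE)
# Read each file into its own data frame
listcsv <- lapply(files, read.csv)
# Stack all the data frames into one
data <- do.call("rbind", listcsv)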

This code works nicely when you have csv files that are of identical structure and don’t include any extra rows at the top of your data.
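The "identical structure" requirement can be checked before binding. Here is a minimal sketch, where the two small data frames are made-up stand-ins for files already read with read.csv, and same_structure is just a name I picked for illustration:

```r
# Made-up example data standing in for the list returned by lapply(files, read.csv)
listcsv <- list(
  data.frame(id = 1:2, value = c(10, 20)),
  data.frame(id = 3:4, value = c(30, 40))
)

# TRUE only if every data frame has the same column names, in the same order,
# as the first one
same_structure <- all(vapply(
  listcsv,
  function(df) identical(names(df), names(listcsv[[1]])),
  logical(1)
))

# Only bind when the structures match; rbind would fail on mismatched names
if (same_structure) {
  data <- do.call(rbind, listcsv)
}
```

If the check comes back FALSE, it is probably easier to look at the offending file first than to debug the rbind error afterwards.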

Today, however, I’m stuck with csv files with an extra seven lines at the top of each file that I’d like to strip. Normally, skipping lines while reading in a csv is easy: just specify the argument skip. Like so:

file <- read.csv("filename", skip=7)

This would give me the file just nicely, as a data frame. However, the code above for reading the files one by one into a list and then binding them together into one data frame doesn’t work as such, because I need to get rid of the seven extra lines at the beginning of each file.

I could, of course, strip the seven lines from each file manually; I currently only have 13 files that I’d like to read in. But I’m sure that there will come a day when I have many more files, so why not do this properly right from the start?

After several trial-and-error approaches I resorted to Google and found a nice Stack Overflow thread on the subject.

I tried a couple of the options with no luck, always failing on some detail, like how to pass the arguments to read.csv. Eventually I tried this one:

data <- do.call(rbind, lapply(files, read.csv,
                              as.is = TRUE, skip = 7, header = FALSE))

And it works! In this format, passing the extra arguments (as.is, skip and header*) to read.csv works fine. And with my amount of data (only 55,000+ rows in the final data frame), it is also fast enough:
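The reason this form works is that lapply hands any extra named arguments on to the function it calls for each element. To try it without touching real data, here is a small self-contained sketch; the temporary folder, the file names and the file contents are all made up for the example:

```r
# Create a fresh temporary folder with two tiny example files (made-up content)
tmp_dir <- tempfile("csv_skip_demo")
dir.create(tmp_dir)

junk <- paste("extra line", 1:7)   # the seven lines we want to skip
writeLines(c(junk, "a,1", "b,2"), file.path(tmp_dir, "file1.csv"))
writeLines(c(junk, "c,3", "d,4"), file.path(tmp_dir, "file2.csv"))

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# as.is, skip and header are passed through lapply to read.csv for every file
data <- do.call(rbind, lapply(files, read.csv,
                              as.is = TRUE, skip = 7, header = FALSE))

str(data)
```

Since header = FALSE, the columns get the default names V1 and V2. Note that the pattern argument is an addition of mine, so that only csv files are picked up.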

   user  system elapsed 
   0.38    0.00    0.37 
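The three numbers above are in the format that R's system.time() prints. A minimal sketch of taking such a measurement, assuming that is the tool in use; the pieces list is made-up stand-in data, not my actual files:

```r
# Stand-in for the list of data frames returned by lapply(files, read.csv, ...)
pieces <- replicate(13, data.frame(x = runif(1000), y = runif(1000)),
                    simplify = FALSE)

# Wrap the expression to be timed; the assignment still happens as usual
timing <- system.time(
  combined <- do.call(rbind, pieces)
)

print(timing)   # user, system and elapsed time in seconds
```

The numbers will of course differ from machine to machine and from run to run.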

 

So now I’m all ready to start joining this data with some other data and get on with my data analysis!

 

* The as.is argument makes sure that strings are read in as character and not factors. The skip argument allows you to skip the specified number of rows. The header argument lets you specify whether the first row should be used as a header row or not.
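To see what the header argument changes, here is a small sketch that reads the same made-up text twice, using the text argument instead of a file:

```r
# Made-up csv content: one header row and two data rows
raw <- "name,score\nanna,1\nben,2"

# Default header = TRUE: the first row becomes the column names
with_header <- read.csv(text = raw, as.is = TRUE)

# header = FALSE: the first row is treated as data, columns get default names
without_header <- read.csv(text = raw, as.is = TRUE, header = FALSE)

names(with_header)      # "name" "score"
names(without_header)   # "V1" "V2"
nrow(without_header)    # 3, because the header row is now a data row
```

From R 4.0.0 onwards strings are read as character by default, so as.is = TRUE is mainly needed for this purpose on older versions.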


Headache while trying to filter on a map in Tableau :/

This week’s MakeoverMonday delivered a data set on the accessibility of buildings in Singapore. For each building there is an index for the accessibility level and of course information on where the building is situated, along with some information on that area (“subzone”). So I figured, why not plot each area on a map, so that clicking an area would give you a list of all the buildings in that area and their accessibility indices? Seems straightforward enough.

So I plotted the map, and let Tableau color the areas according to the average accessibility:

w50_singapore_averages.PNG

 

The darker the colour, the better the accessibility. Now I’d like the user to be able to click an area, for instance Alexandra Hill, and get the information about the buildings in this particular area. Like this:

w50_alexandrahill_table

But alas, this table is NOT shown when you click on the map. That action only shows one line per area, for some reason still unknown to me:

w50_alexandrahill_table_short

The entire list of buildings is shown only when you choose the area from a list on the side of the dashboard, but not when you click on the map. You can try it out on Tableau Public yourself.

I’ve tried different ways of filtering and different actions on the filters, but nada. I will, however, fix this! I want to understand why Tableau acts this way.  I just need to dig into it some more. So instead of serving you a nice #mmonday blog post, I shared some headache, but hey – this is not that uncommon when working with data after all 😉 Hang in there for the sequel!

 

Makeover Monday – Prices of curries

This week’s Makeover Monday was about visualising a data set gathered by the Financial Times. The data covers the pricing of curries at the Wetherspoon pubs in the UK and Ireland. The original story covers several different aspects of the pricing; my simple makeover is by no means an attempt to do it better. Rather, it is an exercise for myself in using Tableau dashboards.

My makeover is posted at Tableau Public. It shows a map of the pubs, and when you click on a pub a stacked bar showing the pricing for that pub appears on the right.

w49_curries

A simple viz, but a nice exercise in combining maps and charts into an interactive dashboard.