Wednesday, 25 January 2017

Why dark backgrounds in a Tableau viz should be avoided

One of the design choices that you have to make when creating a visualisation is what colours to use, and in particular do you go for a light foreground and a dark background or vice versa. On the face of it, this can be thought of as just an aesthetic design decision, but in fact, it has some real implications that need to be considered. 

There are many things that can determine the colours used in a viz. I talked about this in my #data15 talk on colour. Sometimes it's the theme of the data: if it's a 'dark' subject then that lends itself to a dark background and muted colours; if it's a fun, poppy viz then bright candy colours work well. 

We always want to make our data viz look awesome and beautiful, but first and foremost it must be an accurate representation of the data, and this is where we have to consider the correct use of dark backgrounds. 

The issue is that it's harder to see small, faint light marks on a dark background than it is to see dark marks on a light background. This is something that astronomers have known for years. 20 years ago (damn) I was an undergraduate studying for a degree in Astrophysics and spent a lot of time looking at images of star fields. More often than not you would be looking for faint galaxies or nebulae, and it was much more effective to look at a negative image. Check out the two images below; the only difference is that one is the negative of the other. 

See how much more detail you can see in the lower image? Imagine if you had data points that were there but faint, and they got missed.

Here's an example made in Tableau. Each row has 100 points, all sized in the same way; the only difference is that the colour gradient is reversed. Notice that the light-on-black points fade out a lot earlier. Try playing with the sliders for the reference line and see how much of a difference it makes.

Now I am not saying you shouldn't use a dark background, just that if you do, make sure you can see all the marks. 

Wednesday, 11 January 2017

Pseudo Conditional Formatting A Table in Tableau

Ever wanted to colour entire rows of a table in Tableau based on a condition? Well, you can, sorta. Check out the video below on how to do it, or read on for more info. 

In this example I have a table looking at Tableau data extracts, and I want to know which ones belong to archived projects so that I can turn off their updates and take a bit of strain off the server. 

This calculated field looks at the project that the extract belongs to and checks whether the word archived is contained within it. If it is, the field gets the value True; if not, False. This is then added to the Rows shelf and the header is turned off. This has the effect of putting all the archived extracts at the top of the page. I've also added the field to the Colour shelf to add a coloured mark.
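As a rough sketch, the calculation could look something like this, assuming the project name comes through in a field called [Project Name] (your field name may differ):

```
// True if the project name contains the word "archived" (case-insensitive)
CONTAINS(LOWER([Project Name]), "archived")
```

Because it returns a Boolean, dropping it on Rows groups the extracts by archived status, as described above.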

The problem is, the highlighted part is just the small mark; I would much rather have the entire row coloured. To do that we need to use row banding formatting. 

1. Remove the archived field from the colour shelf. 

2. Show the Archived field header on the row. 

3. Next, we want to get rid of the 'abc'. To do that we can use the blank trick that Emily Kund (@emily1852) presented in her TDT: create a calculated field that is just a single string of two double quotes, then put that new field onto the Label shelf. 
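The blank field really is nothing more than an empty string literal:

```
// The 'blank' trick: a calculated field whose value is an empty string
""
```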

4. Go to Format, Shading and select the row. 

5. Change the header to be the highlight colour, in this case, orange. That changes all the headers to be orange, except for the far right, where the row banding setting adds the grey band.

6. Next, we can change the row band to be white and move the level slider to the left. Each time we move it, notice how the coloured banding changes. When it's all the way over to the left we get just two bands. 
7. The reason we get the two bands is the Boolean True/False result of the archived calculation. We don't need that header anymore, so right-click on it and un-tick Show Header.

And there you go. A nice coloured band across the top of the table, clearly highlighting the issues, rather than just the mark. 

Be sure to check out the YouTube video above for more details and follow on Twitter at @Matt_Francis 

Friday, 16 December 2016

Automatically Parse Dates in any Tableau Version

Dates are just the worst data type you ever have to work with. There are about 1001 different ways to write a date, and different databases store them differently. Every piece of software has its own little quirks, and that's even before you take regional differences into account. 

Using dates in Tableau has been no exception. Everyone at some point has imported some data with dates in, and Tableau hasn't recognised them and refused to play along. So you change the data type to date, hoping it might work, and it never does. 

So instead you have to use the DATEPARSE function, which is fine but does require a little bit of coding and a little bit of trial and error. Wouldn't it be so much better if there was an automatic way of doing it? What if Tableau just worked with dates like you think it should?

Well, in 10.2 it does! Tableau have introduced automatic date parsing. Now if you load in a data set and a field isn't recognised as a date at first, you can try to convert it, and 10.2 does a much better job at recognising strange formats; the success rate is much, much higher. 

But what if you don't have 10.2 yet? What if you are on 9.2, for example, what then? Are you stuck with having to write DATEPARSE calculations yourself? Well, of course not. 

We can use a little trick to get all the goodness from 10.2 into any version of Tableau that has the DATEPARSE function. Watch how I do it below, or read on. 

First, load up your data with the date field that you want to parse. In this case it's got a day name, then the day number, the short month, and the full year, so we'd have to do a little bit of work to get that to parse. 

The next step is to make a duplicate of the original field that we want to parse.

Now here is the interesting bit. See how Tableau makes duplicates? That little = sign means it's created a calculated field, and if we edit this field we can see this: 

It's just a copy of the original. So.... now here comes the magic. 
Go back to your copy and change the data type to date 

Now if we drag that new field onto the sheet we see it functions as a normal date: we can roll it up and down, filter on it, and so on. But here is the trick: what happens if we now look at the calculation?

Boom! There is the full calculation that Tableau did in order to parse that field into the date format. And as it's just a calculation, you can copy and paste it into any Tableau version with the DATEPARSE function. 
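For a date string of the shape described above, something like "Wed 25 Jan 2017", the generated calculation would look roughly like this. This is illustrative only: the exact format symbols Tableau generates depend on your data, and [Date Field] stands in for your own field name.

```
// Illustrative sketch of the kind of calculation Tableau generates
// for a string like "Wed 25 Jan 2017"
DATEPARSE("EEE dd MMM yyyy", [Date Field])
```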

You need never write a date parse calculation ever again. 


Wednesday, 12 October 2016

Make Over Monday 40 - EU Transport Satisfaction

This week the makeover was on a satisfaction survey on EU transport. Rather than look at the breakdown of opinion, I looked at the change between 2012 and 2016 and created this arrow plot. I used red/green for good and bad changes, but importantly I double encoded this with the arrow heads pointing in the right direction too. 

Tuesday, 11 October 2016

Why are my Tableau server extracts failing?

This is the first in a planned series where I document my adventures in the exciting world of Tableau server. I've been using Tableau server ever since I first found Tableau, but for the first time, I am getting into the admin side of things. I might even end up going on training for it, now that's serious! 

Anyhow one of the first tasks I have been asked to look at is monitoring the server. Now Tableau comes with some dashboards to look at things like user activity, disk space and extracts. These do a pretty good job of saying what's currently going on, but you soon find out that you need a bit more information and that's the point of these posts, to walk through what I needed and how I did it. 

First up, Extracts

We use lots of extracts on our server for a number of reasons. A lot of our data is not from fast warehouses, in a lot of cases, it's from the live databases so we need a way to get the data and make sure that the user interaction in the dashboards is as fast as possible, so that's why we extract. When looking at the extracts I have got 3 tasks that I want to do

  1. Identify and fix all failing extracts, find those one-off errors and put measures in place to stop them happening again. Fix those that are constantly failing due to connection passwords changing etc. 
  2. Once all the failures have been addressed the next step is to look at usage, how many of these extracts are being used, what's the total number of monthly views , what is the size. Those old extracts can be turned off to reduce load and space on the server and remove duplicates
  3. Now we just have the extracts that are needed, they don't fail so the final step is to optimise them, reduce the complexity, increase performance, and tune them up. 
  4. Use the new v10 alerts function to tell users when their extracts have failed so that they can monitor it themselves. Self service FTW!

So the first little issue is over failing extracts. Tableau Server comes with a nice little dashboard that shows "How Have Extracts Performed on This Server", and it lets you select some filters and find extracts that have fallen over or that have taken a long time. The first job I have been asked to do is find out why these are failing and either fix them or at least know what's going on. 

So what I can see is that we have roughly 100 failures every day. Now, is that good or bad? Is that 100 extracts failing once, or one extract failing 100 times? Are they old or new, are they even being used any more? Like most things related to Tableau, one answer always leads to more questions. There is a separate tab to look at the amount of traffic to a data source, but that wouldn't list the ones that have failed. As this is a pretty specific query, it looks like I'll have to build my own dashboard to assist. And that's never a bad thing; it's always nice to roll up the sleeves and build something. 

I've connected to Tableau's internal PostgreSQL database and looked at the _backgroundjobs view, which lists all the jobs that have been running, including those extracts that have failed. Then by joining to _datasources I can build a dashboard that shows me all the jobs for the past month and whether they have passed or failed. I can see the total number of jobs and the number that have failed over the past 7 days, whether they are going up or down, and what proportion fail. This is important as a top-level metric, the numbers that the team leaders are going to want to see decreasing. 
But crucially, we need to know whether a failing extract is a one-off, something that has always failed, or something that has recently fallen over. That's the important thing to look at with any error. Problems like disk space might come and go as old data gets pruned, but changes to database names might be a recurring issue. Just getting a list of the current failures doesn't give enough information. I've added the grey bars using reference lines to show today in dark grey and the last 7 days in lighter grey, to tie it back to the overview bar charts at the top. This also helps to quickly see if the issues are recent or have been going on for days or weeks. 

Then, of course, the next step is: this extract is failing, but what is the error, who published it, when did it last run correctly, and is it even being used anymore? All this comes from the PostgreSQL database and is added to the tooltip.

So now I have a single dashboard where at a glance I can see if we have site-wide issues, local project issues, a recurring issue that needs intervention, or just a blip that will sort itself out. 

Once I have all the monitoring dashboards in place to check that things are not failing, I can then go on to the next step and look at usage, remove those that are not needed anymore, and then finally tune those that are. 

If these next steps go as well as the first, then I will have made a huge step in getting the server under my command. Muhahahah

Tuesday, 4 October 2016

The Rise and Fall of Global Peace

This week for MakeOverMonday it was the Global Peace Index. The original was the classic example of using a map to show changes over time, something that never works very well. When I started to look at the data I saw that while the overall global peace score hadn't improved significantly, there were countries, Syria and Libya among them, that had seen large rises. This meant that some countries must have also improved, and so my idea formed: to split the world in two and show both the countries that had declined and those that had improved. Once I had that idea, the black and white colour scheme followed. I used the maps as both a navigation tool and as an indication of the division in the world.

Monday, 12 September 2016

MakeOverMonday Week 37 - The Box That Contains The World

This week for MakeoverMonday the data is all around global shipping, looking at how many ships and containers are owned and run by the 100 largest container companies. As soon as I saw this data set I knew what I wanted to do. The treemap lent itself perfectly to being displayed as the containers on a ship. All I had to do was a quick Google search for a decent image and edit it slightly to make it a transparent overlay, and it was done. A little background on shipping containers and voila, The Box That Contains The World.

Monday, 22 August 2016

The Rise of Malaria in The Democratic Republic of Congo

The Democratic Republic of Congo is the 2nd largest country in Africa. In 2014, a quarter of African malaria deaths occurred in the DRC. The 10 countries that surround it accounted for another quarter. Since 2006 the surrounding countries' malaria deaths have been on the decline, but the DRC's have continued to rise, peaking at nearly 31,000 in 2013. Civil war prevents medical supplies reaching those that need them. Refugees fleeing conflict leave behind vital mosquito nets.

This viz looks at the issue. 

Wednesday, 10 August 2016

Celebrate the upcoming release of Tableau 10, get my Udemy course for $10

To celebrate the upcoming release of Tableau 10, I'm giving away my Udemy course on vizzing data with Tableau for just $10. See what I did there? Be sure to follow the link below to get it at the discounted price.

Tuesday, 2 August 2016

How to filter data when you don't have the data?

Filters in Tableau are great: they let you get rid of the data that you are not interested in and home in on the data that you are. They do however have one big flaw. You can only create a filter if you have the data. What do I mean by this? Well, let's look at an example. 

We want to create a sales dashboard per state. Our sales areas are divided into regions, and we want to create sales dashboards that each look at just one region at a time: Central, West, East and South.

So we could do that using a filter, right? Well, let's see what filter options we have. Dragging Region onto the filter shelf shows us this:

We only have 3 regions to choose from; it's not possible to select the West region. The reason is that filters only filter the values in our data. If a value isn't present, we cannot filter on it. We haven't yet had any sales for the West region, so it's not yet in our data, and we cannot create a filter on it. We do know, though, that we will be getting data for it, so how can we set up filtering for data that we haven't got yet?

So what do we do in this instance? Well, we can create the East, South and Central dashboards as those three regions are present in the data, but what about the West?

Well, we could use an exclude filter and select East, South and Central to be excluded, leaving any other region. Then when sales for the West arrive in the data, the filter will still work and our dashboard will show the West's data. 

But, and it's a big but, the sort that Sir Mix-a-Lot would like, are we certain no more regions will ever be added? What would happen if a North region got created and data started to be associated with it? Well, the East, South and Central dashboards would be fine: each includes only its own region, so any new regions would just be ignored. However, what happens to the West dashboard? That filter is only excluding Central, East and South; any other value of Region is welcome. This means that the North region's data will be included, silently. Exclude filters are only good if you don't care about extra data being added. Sometimes that's what you want, but it's better to include if possible. 

So, there's the problem: how can we create a filter for data that isn't yet in our data source? The answer, like most things, is parameters. 

We need to create a parameter and then tell Tableau to match it to the Regions and then add that to the filters. 

First, let's create a Region parameter.

Now we create a calculated field based on the value of the parameter. 
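A minimal version of that calculated field, assuming the parameter is called [Region Parameter] and the data has a [Region] field (your names may differ):

```
// True for rows whose region matches the current parameter selection
[Region Parameter] = [Region]
```

It returns True only for rows belonging to the selected region.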

Add this to the filter shelf

and then test it out.

We can now select a region and the filter works. When we select West, we get a blank sheet, which is correct: there is no data, so nothing is going to be shown. However, when the West's data starts to get added, this filter will become "active" and show only the West's data. 

Parameters are a great way to take control of your data viz; you can ensure that filters work how they should and that only the data that you want to be seen is seen. 

Thursday, 28 July 2016

40 Years of Executions, 50,000 Wasted Years

This viz was part of MakeOverMonday, looking at executions since 1979. I decided to look at the wasted years of the executed people, based on their life expectancy according to their race and state. In 40 years over 1400 people were executed, a total of nearly 50,000 wasted years that never were because of the crimes they committed. If the death penalty is meant to be the ultimate deterrent, then maybe looking at the data this way might help it act as one. This viz was inspired by the superb Gun Deaths viz by Periscopic.

Wednesday, 27 July 2016

Don't use network drives with Tableau, use Network Locations

We have users that need to use Excel workbooks as data sources in their Tableau dashboards, which are then published to Tableau Server. The problem lies with how Windows and Tableau use network path names. 
Most people will connect to a shared drive by mapping a network drive, like this:

And map the network drive to a letter, in this case Z. 

Now all that works fine; Tableau Desktop will happily connect and all is right in the world. But if you then publish that workbook to the server, the problem arises with that drive letter. Tableau Server has no idea what that Z drive is; it thinks it's a local drive on the server box somewhere, and so it cannot find the correct file. This makes for a sad panda.

Instead we need to make sure that Tableau uses the UNC path name. The UNC path in this example is \\fastnfs\tableau_data_sources, the same one we used when we mapped the drive to the letter Z. 

Even though from here it knows the UNC path..

You can see the path that the data source is using by right-clicking on its name in the data window and selecting Properties,

and if we click on the filename in the data connection window we can see it's defaulted back to using the drive letter. 

To fix this we need to edit that path to be the full UNC one.

And then when we publish, Tableau Server knows where to go and look for the file, assuming you have given the Tableau Services user access to the directory.

This all works fine, except it relies on someone remembering to manually edit the path name to the full UNC path rather than the drive letter. 

The good news is that there is an easy solution, and it's in how you set up your access to the shared network drive in the first place. Instead of mapping a network drive, you map a network location. The process is the same as mapping a drive, except it doesn't get a drive letter. 

Choose the custom network location and click Next.

Put in the UNC path for your shared drive,

give it a name,

click Finish and voila!

We now have a new network location, note the different icon. 

Now we create a data connection and use that instead of a lettered drive,

and then when we check the data source properties we see it's got the correct UNC path. We haven't typed a thing or had to edit anything. 

Hope this helps anyone using files on a network share as a data source. If you need any help setting this up, just give me a shout.