Tuesday, 4 February 2014

Boost your Viz performance with Extracts

Why Extracts are great, and why you should use them

   So I am going to lay my stall out early, i love extracts. I use them as much as possible. At work all of the workbooks i create and publish to our Tableau server use extracts. Well that's not entirely true, out of 127 views, I think 6 might look at a live data table. So, you might be asking, why does he do that? Well first, lets look at what data extracts are.

What is a data extract?



   A Tableau data extract, lets call them TDE is a cache of data stored locally to your tableau instance (desktop or server) that is optimised for fast queries by Tableau. It tends to be a sub-set of data that also allows you to query the data offline. In fact if you publish anything to Tableau Public, you have already made an TDE. 

OK I get that but why use them?

   Speed. Whenever Tableau updates a view or dashboard it has to go and query some data somewhere, apply some filters possibly and then bring back those results and display then for you in whatever style you have chosen. If your data source is a nicely structured datawarehouse running on some nice fancy platform like Terradata or Vertcia then you should get sub second responses and the views update pretty quickly. However sometimes you don't have that, maybe your data connection is slow and its feels a little sluggish to respond. Or, in the case with much of what i do, you are connecting to a database that is used as a transactional one, not designed for easy reporting. You connect to this using some custom sql that may take minutes to bring back the data. Clearly you don't want to wait a minute every time you change a filter.
 
Nothing looks worse to the end user than a sluggish interactive experience. You could have the best looking dashboard, clever Jedi calculated fields, carefully chosen colours and all the charts follow best practise. But, none of that matters to the viewer if when they try to interact nothing seems to happen for seconds. That will be the only thing they notice and remember.

Using an TDE ensures that the data source that Tableau is using is going to provide the fastest user experience possible, which after all what we really want to happen.

But you are looking at stale data, isn't that an issue?

   Not in my experience no. The number of times where someone really really needs to look at totally live data is in fact very small. Mostly people are looking for trends over the last month or year, so the fact that the latest data might be a few hours old isn't going to change the results. If the underlying data isn't changing that often, say twice a day, then you only need to refresh your data twice a day to keep up with it. 

An extract can be set up to be refreshed according to a schedule that you can set, eg daily, hourly, weekly, every 15 minutes. The period that you decide on depends on a couple of factors 
  • How often is new data arriving to your data source that the extract is built on? If its hourly then the extract should be done every 2 hours to ensure it gets the updated data. There is no point having extracts just refreshing when there is no new data. If your warehouse gets built overnight, then refresh once a day, when you know that the warehouse build is complete. 
  • Some extracts can take a long time to build, so you need to make sure that the time between extracts is not so short that it cannot update in time. 
So how do we make an extract?

  • First connect to your datasource 
  • Right click on the name of the datasource and select Extract Data

  • From the next dialogue we can extract just a subset of data based on a filter, aggregate values or only get a number of rows. For our purposes we want everything so click extract

  • We now give the extract a name and save it in our Datasource folder.

  • Tableau now connects to the datasource, grabs all the data and packages up into a TDE for us and saves it to disk. 
  • Now if we look at the data connection we see that the icon has changed to denote we are using an extract. 

  • Now, when we publish the workbook it Tableau will also upload the extract along side the workbook and that will be used for the queries. 
  • Before we upload we need to tell Tableau Server how often to refresh so click on Scheduling & Authentication 

  • Change the  authentication to embedded password if you are connecting to a database.

  • Select your refresh rate and click ok and then publish.

  • You now have a workbook linked to the TDE on Tableau server .
Here's two Jedi tips to make it even better.

Show what database you are connected to and how old the data is

As we are now using an extract its good to tell people how old the data is in the dashboard they are using so if we set the title of our worksheet, not the dashboard title to this.

Now when we view the worksheet we see this in the title

We can now see what datasource we are connected to and when it was last updated. This also works for live connections and i would encourage you to use it everywhere. 

Publish your extract to minimise server overheads. 

If you have two workbooks, lets call them A and B and the both connect to the same TDE and we publish them both then we have two copies of the TDE, say TDE1 and TDE2 on the server which both need to be refreshed ie

A-->TDE1             B-->TDE2

This is a waste of resources and also makes updating dashboards hard as if we want to modify the TDE we have to do it for both copies.

Instead we can publish the TDE to the server and then connect to it like it was any other data source. So that we then have one TDE to refresh and maintain.  ie

A--> TDE <--b class="goog-spellcheck-word" data-="" span="" style="background-color: yellow; background-position: initial initial; background-repeat: initial initial;">blogger
-escaped-div="">
           
  • To do this right click on the extract you want to publish and we get this dialog

  • Select the refresh period we want to use and the authentication to Embed Password and click publish
  • Now when we connect to a new datasource and select Tableau Server we see it in the list of available sources. 
  • Note the icon now changes to a little Tableau logo to tell us we are using an Extract on the Tableau Server. 


Now anyone accessing server can use the same extract, that will refresh its self, be the fastest connection to the data and is easily maintainable. 

Heres a little reminder of when you should use a TDE as opposed to connection live to the date

ScenarioConnect LiveUse Extract
Need up to the second reportingX
Data Is updated hourly, daily etcX
Data source is slow, takes seconds to updateX
Using a fast data warehouseX
Don't need instant updatesX

Hopefully this blog post has shown you why you should use extracts as much as possible with you own Tableau dashboards. Extracts are a great way to optimise the end user experience, which is a vital part of our work. A sluggish dashboard isn't going to make you look good, a snappy fast one will. Extracts allow you to gets around the problem of a slow datasource or a less than optimised database. Publishing extracts to Tableau server makes sharing a curated dataset amongst your users really easy. So you know what everyone is looking at the same few datasources so you know the data is consistent across the sites dashboards.

Matt Francis

Author & Editor

Tableau Zen Master, Social Ambassador, Wrangler of Data, Vizzer of Data

2 comments :

  1. Matt great post (and fun of course)! Not sure you can really make this statement: "Using an TDE ensures that the data source that Tableau is using is going to provide the fastest user experience possible"

    Extracts are great, but I'm finding that pre-aggregating your data before bringing it into Tableau is at times even better, cleaner (and can avoid all the Grand-Total, Sub-Total issues Jonathan has written about).

    Don't get me wrong, you are 99% right about extracts, but my main client want the 'illusion' of live data so the extract advantage has been taken off the table for me.

    Cheers,

    --Shawn

    ReplyDelete
  2. Will creating an extract and publishing it to tableau server and then building dashboards from that tableau extract online create a faster environment for the user on published dashboards?

    Thanks-Luke

    ReplyDelete

 
biz.