Why Extracts are great, and why you should use themSo I am going to lay my stall out early, i love extracts. I use them as much as possible. At work all of the workbooks i create and publish to our Tableau server use extracts. Well that's not entirely true, out of 127 views, I think 6 might look at a live data table. So, you might be asking, why does he do that? Well first, lets look at what data extracts are.
What is a data extract?
A Tableau data extract, lets call them TDE is a cache of data stored locally to your tableau instance (desktop or server) that is optimised for fast queries by Tableau. It tends to be a sub-set of data that also allows you to query the data offline. In fact if you publish anything to Tableau Public, you have already made an TDE.
OK I get that but why use them?
Using an TDE ensures that the data source that Tableau is using is going to provide the fastest user experience possible, which after all what we really want to happen.
But you are looking at stale data, isn't that an issue?
Not in my experience no. The number of times where someone really really needs to look at totally live data is in fact very small. Mostly people are looking for trends over the last month or year, so the fact that the latest data might be a few hours old isn't going to change the results. If the underlying data isn't changing that often, say twice a day, then you only need to refresh your data twice a day to keep up with it.
An extract can be set up to be refreshed according to a schedule that you can set, eg daily, hourly, weekly, every 15 minutes. The period that you decide on depends on a couple of factors
- How often is new data arriving to your data source that the extract is built on? If its hourly then the extract should be done every 2 hours to ensure it gets the updated data. There is no point having extracts just refreshing when there is no new data. If your warehouse gets built overnight, then refresh once a day, when you know that the warehouse build is complete.
- Some extracts can take a long time to build, so you need to make sure that the time between extracts is not so short that it cannot update in time.
So how do we make an extract?
- First connect to your datasource
- Right click on the name of the datasource and select Extract Data
- From the next dialogue we can extract just a subset of data based on a filter, aggregate values or only get a number of rows. For our purposes we want everything so click extract
- We now give the extract a name and save it in our Datasource folder.
- Tableau now connects to the datasource, grabs all the data and packages up into a TDE for us and saves it to disk.
- Now if we look at the data connection we see that the icon has changed to denote we are using an extract.
- Now, when we publish the workbook it Tableau will also upload the extract along side the workbook and that will be used for the queries.
- Before we upload we need to tell Tableau Server how often to refresh so click on Scheduling & Authentication
- Change the authentication to embedded password if you are connecting to a database.
- Select your refresh rate and click ok and then publish.
- You now have a workbook linked to the TDE on Tableau server .
Here's two Jedi tips to make it even better.
Show what database you are connected to and how old the data is
As we are now using an extract its good to tell people how old the data is in the dashboard they are using so if we set the title of our worksheet, not the dashboard title to this.
Now when we view the worksheet we see this in the title
We can now see what datasource we are connected to and when it was last updated. This also works for live connections and i would encourage you to use it everywhere.
Publish your extract to minimise server overheads.
If you have two workbooks, lets call them A and B and the both connect to the same TDE and we publish them both then we have two copies of the TDE, say TDE1 and TDE2 on the server which both need to be refreshed ie
This is a waste of resources and also makes updating dashboards hard as if we want to modify the TDE we have to do it for both copies.
Instead we can publish the TDE to the server and then connect to it like it was any other data source. So that we then have one TDE to refresh and maintain. ie
A--> TDE <--b class="goog-spellcheck-word" data-="" span="" style="background-color: yellow; background-position: initial initial; background-repeat: initial initial;">blogger--b>-escaped-div="">
- To do this right click on the extract you want to publish and we get this dialog
- Select the refresh period we want to use and the authentication to Embed Password and click publish
- Now when we connect to a new datasource and select Tableau Server we see it in the list of available sources.
- Note the icon now changes to a little Tableau logo to tell us we are using an Extract on the Tableau Server.
Now anyone accessing server can use the same extract, that will refresh its self, be the fastest connection to the data and is easily maintainable.
Heres a little reminder of when you should use a TDE as opposed to connection live to the date
|Scenario||Connect Live||Use Extract|
|Need up to the second reporting||X|
|Data Is updated hourly, daily etc||X|
|Data source is slow, takes seconds to update||X|
|Using a fast data warehouse||X|
|Don't need instant updates||X|
Hopefully this blog post has shown you why you should use extracts as much as possible with you own Tableau dashboards. Extracts are a great way to optimise the end user experience, which is a vital part of our work. A sluggish dashboard isn't going to make you look good, a snappy fast one will. Extracts allow you to gets around the problem of a slow datasource or a less than optimised database. Publishing extracts to Tableau server makes sharing a curated dataset amongst your users really easy. So you know what everyone is looking at the same few datasources so you know the data is consistent across the sites dashboards.