Tableau — Thanks to Metadata API

Gunarathinam M
2 min readMay 11, 2020

Do you deliver Tableau content with the multi-site environment? Did you came-across to perform impact analysis on Tableau content between Datasource and workbooks? Did you want to audit the usage of a specific data source field across all datasource? Then this article is for you.

We got the below list of challenges on the Tableau server of 240+ sites.

METADATA API
  1. Do Lineage Analysis — Get a list of published data sources and dependent workbooks across all sites?
  2. Get the list of workbooks with disconnected data sources across all sites?
  3. Find all the data source fields prefixed as ‘RLS’ across all data sources and sites?

All challenges are solved using Tableau Metadata API and a few lines of python scripts.

Note: This article gives the approach on how to get the data for analysis but not the exact program.

So, What is Metadata API?

As per official documentation — The Tableau Metadata API enables you to discover and query assets and metadata indexed by Tableau Catalog.

Before further reading, get some basic knowledge on GraphQL,Tableau REST API and basic python scripts on consuming the API(using requests module)

Tableau has an inbuilt GraphiQL tool to develop/test the GraphQL quickly, the same can be enabled in Tableau server.

All the below queries can be executed in GraphiQL tool, post enabling the Metadata API. Access the GraphiQL using the URL — https://<tableau-server>/metadata/graphiql/

#1 Do Lineage Analysis — Get a list of published data sources and dependent workbooks across all sites?

{
publishedDatasources {
name
projectName
downstreamWorkbooks {
name
projectName }
}
}

This query returns all the published data sources and referenced workbooks with the respective project name. Use python script to query all against all sites, parse the JSON response, and format it to CSV for better analysis.

Question #2 Get the list of workbooks with disconnected data sources across all sites?

{
workbooks {
name
projectName
embeddedDatasources {
name
upstreamDatasources {
name
}
upstreamDatabases {
name}
} } }

To get the workbooks with disconnected reference to the data sources, In python script, loop for all sites, in each site, and each workbook, check for the length of upstreamDatasources and upstreamDatabase equals to zero

Question #3 Find all the data source fields prefixed as ‘RLS’ across all sites?

{ workbooks {
projectName
name
sheets{
name
datasourceFields {
name
}}}
}

In Python script, loop for all sites, for each site, parse the JSON response, and check for the name under DatasoureFileds node should start with RLS, process all the nodes matching with this condition and load it to CSV for better Analysis.

References

https://help.tableau.com/current/api/metadata_api/en-us/index.html

https://tc19.tableau.com/learn/sessions/tableau-metadata-api-deep-dive

--

--

Gunarathinam M

Passionate about the Data Analytics and Business Intelligence stack. Technical Architect at CES IT