Distinct calculations in Looker Studio and Looker count, sum, or average each unique value once, so repeated rows do not inflate the results shown in your custom reports and dashboards.
Reports can connect to various data sources and update automatically as the underlying data changes. This is particularly useful for businesses with large datasets that need to be analyzed regularly.
Automating report creation reduces the need for manual data analysis, which saves time and supports better decision-making.
Calculations
In Looker Studio, calculations let you manipulate data and create new fields using arithmetic operators, string functions, and date functions.
You can use calculations to create new fields in your report, such as a total sales amount or an average rating. A well-chosen calculation condenses complex data into a single figure that is easier to understand.
Average
Calculating averages can be tricky, especially with denormalized datasets. In Looker, the average_distinct measure type handles this situation by averaging the nonrepeated values in a given field.
For example, consider a denormalized table with multiple rows for each order. If you simply use a type: average measure for the order_shipping column, the result is skewed, because an order's shipping cost is counted once for every row the order occupies rather than once per order.
To get an accurate result, you define how Looker identifies each unique entity using the sql_distinct_key parameter. For the calculation to be correct, every unique value of sql_distinct_key must have just one corresponding value in the measure's sql parameter.
For instance, in the example table, every row with an order_id of 1 has the same order_shipping of 10.00, and every row with an order_id of 2 has the same order_shipping of 20.00. This makes it possible to calculate the correct average.
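The difference between a plain average and a distinct average can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Looker generates SQL; the row counts (two item rows for order 1, three for order 2) and the helper names naive_average and average_distinct are assumptions made for the example.

```python
# Hypothetical denormalized rows: one row per order item, with the
# order-level shipping cost repeated on every item row.
rows = [
    {"order_item_id": 1, "order_id": 1, "order_shipping": 10.00},
    {"order_item_id": 2, "order_id": 1, "order_shipping": 10.00},
    {"order_item_id": 3, "order_id": 2, "order_shipping": 20.00},
    {"order_item_id": 4, "order_id": 2, "order_shipping": 20.00},
    {"order_item_id": 5, "order_id": 2, "order_shipping": 20.00},
]


def naive_average(rows, value="order_shipping"):
    """Average over every row, so repeated values are counted repeatedly."""
    values = [r[value] for r in rows]
    return sum(values) / len(values)


def average_distinct(rows, key="order_id", value="order_shipping"):
    """Average over one value per distinct key (the role of sql_distinct_key)."""
    per_key = {r[key]: r[value] for r in rows}
    return sum(per_key.values()) / len(per_key)


naive = naive_average(rows)        # (10 + 10 + 20 + 20 + 20) / 5
distinct = average_distinct(rows)  # (10 + 20) / 2
print(naive, distinct)             # 16.0 15.0
```

Because order 2 occupies more rows than order 1, the plain average drifts toward 20.00, while the per-order average stays at 15.00.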
The average_distinct type can also be formatted using the value_format or value_format_name parameters. This allows you to customize the presentation of the calculated average.
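Note that the median is a different statistic from the average: it is the middle value of the sorted data. When an even number of values leaves two in the middle, the median is their mean, for example (20 + 80)/2 = 50.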
Count
Count is a fundamental calculation in LookML. To count the number of unique values in a field, use the count_distinct measure type.
The sql parameter for type: count_distinct can take any valid SQL expression that results in a table column.
The count_distinct type uses SQL's COUNT(DISTINCT ...) aggregate, so repeated values are counted only once.
You can add a filter to a measure of type: count_distinct using the filters parameter. This narrows the count to a specific subset of the data.
For example, you can create a field number_of_unique_customers, which counts the number of unique customer IDs.
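As a rough sketch of what a filtered distinct count computes, the Python below counts unique customer IDs with and without a filter. The sample rows, the status column, and the count_distinct helper are hypothetical and stand in for the measure's sql and filters parameters; they are not Looker APIs.

```python
# Hypothetical order rows; customer_id repeats when a customer orders again.
orders = [
    {"order_id": 1, "customer_id": "c1", "status": "complete"},
    {"order_id": 2, "customer_id": "c2", "status": "complete"},
    {"order_id": 3, "customer_id": "c1", "status": "cancelled"},
    {"order_id": 4, "customer_id": "c3", "status": "cancelled"},
]


def count_distinct(rows, field, filters=None):
    """Count unique values of `field` among rows matching every filter."""
    filters = filters or {}
    matching = (
        r for r in rows
        if all(r[name] == wanted for name, wanted in filters.items())
    )
    return len({r[field] for r in matching})


number_of_unique_customers = count_distinct(orders, "customer_id")
unique_completed_customers = count_distinct(
    orders, "customer_id", {"status": "complete"}
)
print(number_of_unique_customers, unique_completed_customers)  # 3 2
```

Customer c1 appears on two orders but is counted once, and the filter is applied before the distinct count, which mirrors how a filter restricts the rows a measure sees.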
Database and Data
The distinct measure types described here belong to Looker, and whether you can use them depends on your database dialect. Looker supports the median_distinct and percentile_distinct types only in specific dialects.
The supported dialects for percentile_distinct are not listed here, so check the Looker documentation for your dialect rather than assuming they match those for median_distinct.
Here's a list of dialects that support median_distinct:
- Google BigQuery Legacy SQL
- Google BigQuery Standard SQL
- Google Cloud PostgreSQL
- Google Cloud SQL
- Greenplum
- MariaDB
- MySQL
- PostgreSQL 9.5+
To prepare data for a unique-pageviews count, you can use a Common Table Expression (CTE), written as a WITH ... AS ... clause. The CTE acts as a temporary named result set that holds the prepared rows before you calculate the final value.
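A minimal sketch of the CTE approach, run against an in-memory SQLite database from Python so it is self-contained. The pageviews table, its session_id and page_path columns, and the definition of a unique pageview as one view per session per page are all assumptions for illustration; your data source and SQL dialect may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (session_id TEXT, page_path TEXT)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?)",
    [
        ("s1", "/home"),
        ("s1", "/home"),      # repeat view of the same page in one session
        ("s1", "/pricing"),
        ("s2", "/home"),
        ("s2", "/pricing"),
        ("s2", "/pricing"),   # another repeat view
        ("s3", "/home"),
    ],
)

# The CTE first reduces the raw hits to one row per (session, page);
# the outer query then counts those rows per page.
query = """
WITH unique_views AS (
    SELECT DISTINCT session_id, page_path
    FROM pageviews
)
SELECT page_path, COUNT(*) AS unique_pageviews
FROM unique_views
GROUP BY page_path
ORDER BY page_path
"""
unique_pageviews = dict(conn.execute(query).fetchall())
conn.close()

print(unique_pageviews)  # {'/home': 3, '/pricing': 2}
```

Splitting the work into a deduplicating CTE and a counting query keeps each step readable, at the cost of a slightly longer statement than a single COUNT(DISTINCT ...) expression.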
Percentiles
Percentiles are a way to measure the middle or extreme values in a dataset, but Looker has a special consideration for fields involved in fanouts. Looker will attempt to use percentile_distinct instead of percentile if the field is involved in a fanout.
To use percentile_distinct, you should specify how Looker should identify each unique entity by using the sql_distinct_key parameter. This is especially important when dealing with joins that involve fanouts, like when each order maps to several order items.
The percentile_distinct type finds the percentile value using the distinct values in a given field, based on the unique values defined by the sql_distinct_key parameter. If the measure doesn't have a sql_distinct_key parameter, Looker tries to use the primary_key field.
Median
The median is the 50th percentile: the middle value of the sorted data. A plain median can be misleading when a join causes a fanout, where one entity maps to multiple rows. This is where the median_distinct type comes in handy.
The median_distinct type finds the median of the nonrepeated values in a given field, based on the unique values defined by the sql_distinct_key parameter. If the measure doesn't have a sql_distinct_key parameter, Looker tries to use the primary_key field.
This type is especially useful in situations like the one described in the Order Item and Order tables example, where there are multiple rows for each order. The median_distinct takes this into consideration and finds the median between the distinct values, giving you a more accurate result.
To get an accurate result, you need to define how Looker identifies each unique entity using the sql_distinct_key parameter. This ensures that every unique value of sql_distinct_key has just one corresponding value in the measure's sql parameter.
For example, in the Order Item and Order tables example, every row with an order_id of 1 has the same order_shipping of 10, every row with an order_id of 2 has the same order_shipping of 20, and so on. With one shipping value per order, the median can be calculated over orders rather than over rows.
Here's a summary of the key takeaways:
- The median_distinct type finds the median of the nonrepeated values in a given field.
- The sql_distinct_key parameter is used to identify unique entities.
- Every unique value of sql_distinct_key must have just one corresponding value in the measure's sql parameter.
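The takeaways above can be sketched in Python. The rows are hypothetical: orders 1 and 2 use the shipping values mentioned in the text, while order 3 and the number of item rows per order are assumptions added to make the fanout visible. The median_distinct helper illustrates the idea only and is not Looker's implementation.

```python
from statistics import median

# Hypothetical joined rows: (order_item_id, order_id, order_shipping).
# The shipping value repeats on every item row of the same order.
rows = [
    (1, 1, 10),
    (2, 1, 10),
    (3, 2, 20),
    (4, 3, 50),
    (5, 3, 50),
    (6, 3, 50),
]


def median_distinct(rows):
    """Median over one shipping value per distinct order_id."""
    per_order = {order_id: shipping for _, order_id, shipping in rows}
    return median(per_order.values())


naive_median = median(shipping for _, _, shipping in rows)
distinct_median = median_distinct(rows)

print(naive_median, distinct_median)  # 35.0 20
```

The plain median is pulled toward order 3 because that order occupies three rows, while the distinct median treats each order once and returns the middle of 10, 20, and 50.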
Database Dialects for Median
For Looker to support the median_distinct type in your project, your database dialect must also support it.
The latest release of Looker supports the median_distinct type in the following database dialects:
- Amazon Aurora MySQL
- Google BigQuery Legacy SQL
- Google BigQuery Standard SQL
- Google Cloud PostgreSQL
- Google Cloud SQL
- Greenplum
- MariaDB
- MySQL
- MySQL 8.0.12+
- PostgreSQL 9.5+
Note that median_distinct is a measure type, not a data type, and it only works when the database dialect supports it. If your dialect is not on this list, you will need an alternative method for calculating the median.
Percentile
A percentile is the value below which a given percentage of a dataset falls, such as the 25th or 75th percentile.
To use percentile, you need to consider a few things, especially if your field is involved in a fanout. Looker will try to use percentile_distinct instead, but only if it's available for the dialect.
Percentile_distinct is a specialized form of percentile that's perfect for fanouts. It uses the nonrepeated values in a field, based on the unique values defined by the sql_distinct_key parameter.
If the measure doesn't have a sql_distinct_key parameter, Looker will try to use the primary_key field instead. This is important to note, as it can affect the accuracy of your results.
The percentile_distinct type takes into account the distinct values in a field, like 10, 20, 50, 70, and 110. It's useful for finding the value at a specific percentile, like the 25th or 80th percentile.
To get an accurate result, you need to specify how Looker should identify each unique entity. This is done using the sql_distinct_key parameter, and every unique value of sql_distinct_key must have just one corresponding value in the measure's sql parameter.
For example, to find the value at the 90th percentile across orders rather than across order item rows, you can use the percentile_distinct type with the order ID as the distinct key.
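A small Python sketch of a distinct percentile, using the values 10, 20, 50, 70, and 110 mentioned earlier. The fanned-out rows and both helper functions are hypothetical. The linear interpolation used here is an assumption: the exact percentile method depends on the database dialect, so results from Looker may differ from this sketch.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    rank = (p / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def percentile_distinct(rows, p):
    """Percentile over one value per distinct key (the sql_distinct_key role)."""
    per_key = {key: value for key, value in rows}
    return percentile(per_key.values(), p)


# Hypothetical fanned-out rows: (order_id, order_shipping), repeated per item.
rows = [
    (1, 10), (1, 10),
    (2, 20),
    (3, 50), (3, 50), (3, 50),
    (4, 70),
    (5, 110), (5, 110),
]

p25 = percentile_distinct(rows, 25)
p50 = percentile_distinct(rows, 50)
p80 = percentile_distinct(rows, 80)
print(p25, p50, round(p80, 2))
```

Deduplicating by order ID before ranking is what keeps orders with many item rows from shifting the percentile.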
Frequently Asked Questions
How do you compare data in Looker Studio?
To compare data in Looker Studio, edit your report and select a chart, then choose a metric and adjust the comparison calculation in the Properties panel. This allows you to easily analyze and visualize differences between data sets.
How to remove duplicates in Looker Studio?
COUNT_DISTINCT does not delete duplicate rows, but it ignores duplicates when counting, so each unique value in a field is counted once. You can either use the COUNT_DISTINCT function or set a field's Aggregation type to Count Distinct in the data source.
How to count distinct in Looker Studio?
To count unique items in Looker Studio, use COUNT_DISTINCT or APPROX_COUNT_DISTINCT, or set a field's Aggregation type to Count Distinct, either in the data source or by editing the field's aggregation in a chart.
What is sum distinct in Looker?
Sum distinct in Looker totals the nonrepeated values in a field, so that rows repeated by a fanout are not added more than once. You specify a unique identifier with sql_distinct_key, and each unique key contributes its value to the sum once.
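The effect can be illustrated with a short Python sketch. The rows and the sum_distinct helper are hypothetical, and the code shows only the arithmetic, not the SQL that Looker generates.

```python
# Hypothetical fanned-out rows: (order_id, order_shipping), one per order item.
rows = [
    (1, 10.00), (1, 10.00),
    (2, 20.00), (2, 20.00), (2, 20.00),
]


def sum_distinct(rows):
    """Add each key's value once, however many rows the key occupies."""
    per_key = {key: value for key, value in rows}
    return sum(per_key.values())


naive_sum = sum(value for _, value in rows)
distinct_sum = sum_distinct(rows)
print(naive_sum, distinct_sum)  # 80.0 30.0
```

Keying on the order ID rather than on the shipping value matters: two different orders that happen to share the same shipping cost would still each be added once.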