Thursday, March 13, 2014

Cardinality Related With Performance

In SQL (Structured Query Language), the term cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicate values a column contains. Thus, a column with the lowest possible cardinality has the same value in every row. SQL databases use cardinality to help determine the optimal query plan for a given query.
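
As a rough illustration, the number of distinct values per column can be measured with `COUNT(DISTINCT ...)`. The sketch below uses Python's built-in `sqlite3` module; the `users` table and the helper name `column_cardinalities` are hypothetical, made up for this example. The `active` column holds the same value in every row, so it has the lowest possible cardinality.

```python
import sqlite3

def column_cardinalities(conn: sqlite3.Connection, table: str) -> dict:
    """Return the number of distinct values in each column of `table`.

    Hypothetical helper; `table` is interpolated into SQL, so pass only trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    result = {}
    for col in columns:
        # COUNT(DISTINCT ...) ignores NULLs, so an all-NULL column reports 0.
        (count,) = conn.execute(f"SELECT COUNT(DISTINCT {col}) FROM {table}").fetchone()
        result[col] = count
    return result

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT, country TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(i, f"user{i}@example.com", ["US", "DE", "IN"][i % 3], 1) for i in range(9)],
)
card = column_cardinalities(conn, "users")
print(card)  # {'id': 9, 'email': 9, 'country': 3, 'active': 1}
```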

High-cardinality refers to columns with values that are very uncommon or unique. High-cardinality column values are typically identification numbers, email addresses, or user names.

Normal-cardinality refers to columns with values that are somewhat uncommon. Normal-cardinality column values are typically names, street addresses, or vehicle types.

Low-cardinality refers to columns with few unique values. Low-cardinality column values are typically status flags, Boolean values, or major classifications such as gender.
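
The three categories above have no fixed boundaries. A minimal sketch, assuming the ratio of distinct values to total rows as the measure; the function and its thresholds (`high_ratio`, `low_ratio`, `low_max_distinct`) are hypothetical and purely illustrative, not a standard.

```python
def classify_cardinality(distinct: int, rows: int,
                         high_ratio: float = 0.9,
                         low_ratio: float = 0.01,
                         low_max_distinct: int = 10) -> str:
    """Label a column as 'high', 'normal', or 'low' cardinality.

    The thresholds are illustrative assumptions, not a standard.
    """
    if rows <= 0:
        raise ValueError("cannot classify an empty column")
    ratio = distinct / rows
    if ratio >= high_ratio:
        # Nearly every row has its own value (IDs, email addresses, user names).
        return "high"
    if ratio <= low_ratio or distinct <= low_max_distinct:
        # Only a handful of values repeat across rows (flags, Booleans).
        return "low"
    # Somewhere in between (names, street addresses, vehicle types).
    return "normal"

rows = 10_000
for name, distinct in [("user_id", 10_000), ("last_name", 3_200), ("is_active", 2)]:
    print(name, classify_cardinality(distinct, rows))
# user_id high
# last_name normal
# is_active low
```

The absolute `low_max_distinct` check keeps a two-valued flag classified as low even in a small table, where its ratio alone would look higher.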

In set theory, cardinality refers to the number of members in the set. When specifically applied to database theory, the cardinality of a table refers to the number of rows (or tuples) contained in a table.
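
A small sketch contrasting these meanings, assuming a hypothetical `orders` table: `len()` gives the size of a Python set, `COUNT(*)` gives the cardinality of a table (its row count), and `COUNT(DISTINCT ...)` gives the column-level notion used elsewhere in this post.

```python
import sqlite3

# Set theory: |S| counts members, and duplicates collapse.
colors = set(["red", "green", "blue", "red"])
print(len(colors))  # 3

# Table cardinality: the number of rows (hypothetical table for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("open",), ("shipped",), ("open",)])
(table_cardinality,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()

# Column cardinality: the number of distinct values in one column.
(status_distinct,) = conn.execute("SELECT COUNT(DISTINCT status) FROM orders").fetchone()
print(table_cardinality, status_distinct)  # 3 2
```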

In the context of databases, cardinality refers to the uniqueness of data values contained in a column. High cardinality means that a large percentage of the column's values are unique. Low cardinality means that the column contains many repeated values.

Taken from forums:

Rama Shankar: Cardinality and joins are complex topics that directly affect SQL performance.

1. What is the role of cardinality? For example, if I get my cardinality wrong but my join right, would a reporting query on an attribute/analytic view bring back wrong results? Or would there just be a performance impact?

Cardinality is used by the SQL engine to determine the optimal query plan for executing a query. Yes, the attribute/analytic view will bring back wrong results (in terms of the number of rows returned), and performance is affected as well. If the cardinality is wrong, then your join condition has to be fine-tuned. The performance of the join will suffer if the cardinality or the join is wrong.
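
The thread appears to be about SAP HANA modeling views, and the sketch below does not use HANA. It only illustrates, in plain SQL through Python's `sqlite3`, how an assumed cardinality that does not match the data changes the number of rows returned. The report assumes each sale matches at most one customer (n:1), but the hypothetical `customers` table contains a duplicate key, so the join fans out and the total is overstated. The helper `find_duplicate_keys` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO sales VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 75.0);
    -- Customer 10 appears twice, so the join is really n:m, not the assumed n:1.
    INSERT INTO customers VALUES (10, 'EU'), (10, 'EU'), (20, 'US');
""")

def find_duplicate_keys(conn, table, key):
    """Return key values that occur more than once in `table` (hypothetical helper).

    A non-empty result means an n:1 join on `key` will multiply rows.
    """
    return [k for (k,) in conn.execute(
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1")]

rows, total = conn.execute("""
    SELECT COUNT(*), SUM(s.amount)
    FROM sales s JOIN customers c ON c.customer_id = s.customer_id
""").fetchone()
print(rows, total)  # 5 375.0 instead of the expected 3 and 225.0
print(find_duplicate_keys(conn, "customers", "customer_id"))  # [10]
```

Checking the key for duplicates before relying on an n:1 assumption is one cheap way to catch this kind of mismatch.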

2. Does the SQL optimizer or the OLAP/join engine optimize the generated SQL query based on the cardinality rule?

Yes, the SQL engine uses the cardinality factor.
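
To see an engine collecting cardinality statistics, SQLite is a convenient (if different) example: its `ANALYZE` command stores per-index statistics in the `sqlite_stat1` table, which its query planner consults. In each `stat` string, the first number is the approximate row count and the following numbers are the approximate number of rows sharing each distinct key prefix. The table and index names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: email is unique per row, status has only two values.
conn.execute("CREATE TABLE accounts (email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(f"user{i}@example.com", "active" if i % 2 else "closed") for i in range(1000)],
)
conn.execute("CREATE INDEX idx_email ON accounts(email)")
conn.execute("CREATE INDEX idx_status ON accounts(status)")
conn.execute("ANALYZE")  # populates sqlite_stat1 with per-index statistics

stats = {}
for idx, stat in conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE idx IS NOT NULL"):
    # Keep only the numeric fields: row count, then avg rows per distinct key prefix.
    stats[idx] = [int(tok) for tok in stat.split() if tok.isdigit()]

for name in sorted(stats):
    print(name, stats[name])
```

For this data, `idx_email` should report about 1 row per key and `idx_status` several hundred, which is the kind of difference a planner weighs when deciding whether an index is worth using.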

Usman Butt: It is better to put low-cardinality columns last in an index and the highest-cardinality column first. This makes rows easier to select, so better performance can be gained. So I would say it is not a matter of losing performance; rather, there is room for more performance gain.
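
The ordering rule suggested above can be expressed as a small helper. This is a sketch of that poster's heuristic only, assuming rows are given as dictionaries; the function name, the `orders` table, and the sample columns are hypothetical.

```python
from collections.abc import Mapping, Sequence

def order_columns_by_cardinality(rows: Sequence[Mapping[str, object]],
                                 columns: Sequence[str]) -> list:
    """Return `columns` sorted from highest to lowest distinct-value count.

    Hypothetical helper mirroring the forum heuristic (most selective column first).
    """
    distinct = {col: len({row[col] for row in rows}) for col in columns}
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(columns, key=lambda col: distinct[col], reverse=True)

rows = [{"customer_id": i, "country": ["US", "DE"][i % 2], "is_active": True}
        for i in range(6)]
order = order_columns_by_cardinality(rows, ["is_active", "country", "customer_id"])
print(f"CREATE INDEX idx_orders ON orders ({', '.join(order)})")
# CREATE INDEX idx_orders ON orders (customer_id, country, is_active)
```

Real index design also depends on which columns your queries filter and sort on, which this sketch ignores.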
