Optimize SELECT statements

You can change the meaning of the query by adding parentheses to force evaluation of the

first. The following query finds only products under models 20 and 21 that are red.

Using parentheses, even when they aren’t required, can improve the readability of queries, and

reduce the chance of making a subtle mistake because of operator precedence. There is no

significant performance penalty in using parentheses. The following example is more readable

than the original example, although they are syntactically the same.

A

statement is non-procedural; it doesn’t state the exact steps that the database server

should use to retrieve the requested data. This means that the database server must analyze

the statement to determine the most efficient way to extract the requested data. This is

referred to as optimizing the

statement. The component that does this, is called the

Query Optimizer. The input to the Query Optimizer consists of the query, the database schema

(table and index definitions), and the database statistics. The output of the Query Optimizer is a

query execution plan, sometimes referred to as a query plan, or execution plan. The contents of

an execution plan are described in more detail later in this article.

The inputs and outputs of the Query Optimizer during optimization of a single

statement are illustrated in the following diagram:

The sequence in which the source tables are accessed.

The methods used to extract data from each table.

The methods used to compute calculations, and how to filter, aggregate, and sort data

from each table.

A

statement defines only the following:

The format of the result set. This is specified mostly in the select list. However, other

clauses such as

and

also affect the final form of the result set.

The tables that contain the source data. This is specified in the

clause.

How the tables are logically related for the purposes of the

statement. This is

defined in the join specifications, which might appear in the

clause or in an

clause following.

The conditions that the rows in the source tables must satisfy to qualify for the

statement. These are specified in the

and

clauses.

A query execution plan is a definition of the following:

Typically, there are many sequences in which the database server can access the base

tables to build the result set. For example, if the

statement references three tables,

the database server could first access

, use the data from

to extract

matching rows from

, and then use the data from

to extract data from. The other sequences in which the database server could access the tables are:

,

,

, or

,

,

, or

,

,

, or

,

,

Generally, there are different methods for accessing the data in each table. If only a few

rows with specific key values are required, the database server can use an index. If all the

rows in the table are required, the database server can ignore the indexes and perform a

table scan. If all the rows in a table are required but there is an index whose key columns

are in an

, performing an index scan instead of a table scan might save a

separate sort of the result set. If a table is very small, table scans might be the most

efficient method for almost all access to the table.

As data is accessed from tables, there are different methods to perform calculations over

data such as computing scalar values, and to aggregate and sort data as defined in the

query text, for example when using a

or

clause, and how to filter data,

for example when using a

or

clause.

The process of selecting one execution plan from potentially many possible plans is referred to

as optimization. The Query Optimizer is one of the most important components of the

Database Engine. While some overhead is used by the Query Optimizer to analyze the query

and select a plan, this overhead is typically saved several-fold when the Query Optimizer picks

an efficient execution plan. For example, two construction companies can be given identical

blueprints for a house. If one company spends a few days at the beginning to plan how they

will build the house, and the other company begins building without planning, the company

that takes the time to plan their project will probably finish first.

The SQL Server Query Optimizer is a cost-based optimizer. Each possible execution plan has an

associated cost in terms of the amount of computing resources used. The Query Optimizer

must analyze the possible plans and choose the one with the lowest estimated cost. Some

complex

statements have thousands of possible execution plans. In these cases, the

Query Optimizer doesn’t analyze all possible combinations. Instead, it uses complex algorithms

to find an execution plan that has a cost reasonably close to the minimum possible cost.

The SQL Server Query Optimizer doesn’t choose only the execution plan with the lowest

resource cost; it chooses the plan that returns results to the user with a reasonable cost in

resources and that returns the results the fastest. For example, processing a query in parallel

typically uses more resources than processing it serially, but completes the query faster. The

Query Optimizer will use a parallel execution plan to return results if the load on the

server won’t be adversely affected.

The SQL Server Query Optimizer relies on distribution statistics when it estimates the resource

costs of different methods for extracting information from a table or index. Distribution

statistics are kept for columns and indexes, and hold information on the density

of the

underlying data. This is used to indicate the selectivity of the values in a particular index or

column. For example, in a table representing cars, many cars have the same manufacturer, but

each car has a unique vehicle identification number (VIN). An index on the VIN is more

selective than an index on the manufacturer, because VIN has lower density than manufacturer.

If the index statistics aren’t current, the Query Optimizer might not make the best choice for

the current state of the table. For more information about densities, see

Statistics.

Density defines the distribution of unique values that exist in the data, or the average number

of duplicate values for a given column. As density decreases, selectivity of a value increases.

The SQL Server Query Optimizer is important because it enables the database server to adjust

dynamically to changing conditions in the database without requiring input from a

programmer or database administrator. This enables programmers to focus on describing the

final result of the query. They can trust that the SQL Server Query Optimizer will build an

efficient execution plan for the state of the database every time the statement is run.

1

1

Estimated Execution Plan

Actual Execution Plan

Live Query Statistics

OR

SELECT

SELECT

SELECT

SELECT
ProductID, ProductModelID
FROM
Production.Product
WHERE (ProductModelID = 20
OR
ProductModelID = 21)
AND
Color =
'Red'
;
GO
SELECT
ProductID, ProductModelID
FROM
Production.Product
WHERE
ProductModelID = 20
OR (ProductModelID = 21
AND
Color =
'Red'
);
GO

SELECT

ORDER BY
GROUP BY

FROM

SELECT

WHERE

ON

FROM

SELECT

WHERE

HAVING

SELECT

TableA

TableA

TableB

TableB

TableC

TableC

TableB

TableA

TableB

TableA

TableC

TableB

TableC

TableA

TableC

TableA

TableB

ORDER BY
GROUP BY
ORDER BY

WHERE

HAVING

SELECT