Dynamic SQL and Stored Procedures

In this section, we will look at the advantages of using stored procedures over sending SQL statements from the client. We will also look at what happens when you use dynamic SQL in a stored procedure, and show that you lose some of the advantages of stored procedures, whereas others are unaffected.

Caching Query Plans

Every query you run in SQL Server requires a query plan. When you run a query the first time, SQL Server builds a query plan for it – or as the terminology goes – it compiles the query. SQL Server saves the plan in cache, and next time you run the query, the plan is reused. The query plan stays in cache until it’s aged out because it has not been used for a while, or it is invalidated for some reason.
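
If you are curious about what is in the cache, you can peek at it with the plan-cache DMVs. This is only a sketch, assuming SQL 2005 or later; on earlier versions you would query syscacheobjects instead:

SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.text
FROM   sys.dm_exec_cached_plans cp
CROSS  APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER  BY cp.usecounts DESC

The usecounts column tells you how many times a cached plan has been reused, which is exactly the quantity we care about in this section.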

The reuse of cached query plans is very important for the performance of queries where the compilation time is on par with the execution time or exceeds it. If a query needs to run for four minutes, it does not matter much if the query is recompiled for an extra second each time. On the other hand, if the execution time of the query is 40 ms but it takes one second to compile the query, there is a huge gain from the cached plan, particularly if the query is executed repeatedly.

Up to SQL 6.5, the only plans that were put into the cache were plans for stored procedures. Loose batches of SQL were compiled each time. And since the query plan for dynamic SQL is not part of the stored procedure, that included dynamic SQL as well. Thus, in SQL 6.5, the use of dynamic SQL nullified this benefit of stored procedures.

Starting with SQL 7, SQL Server also caches the plans for bare statements sent from a client or generated through dynamic SQL. Say that you send this query from the client, or execute it with EXEC():

SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)
FROM   Orders O
JOIN   [Order Details] OD ON O.OrderID = OD.OrderID
WHERE  O.OrderDate BETWEEN '19980201' AND '19980228'
  AND  EXISTS (SELECT *
               FROM   [Order Details] OD2
               WHERE  O.OrderID = OD2.OrderID
                 AND  OD2.ProductID = 76)
GROUP  BY O.OrderID

The query returns the total order amount for the orders in February 1998 that contained the product Lakkalikööri. SQL Server will put the plan into the cache, and the next time you run this query, the plan will be reused. But only if it is the same query. Since the cache lookup is by a hash value computed from the query text, the cache is space- and case-sensitive. Thus, if you add a single space somewhere, the plan is not reused. More importantly, it is not unlikely that the next time you will want to run the query for a different product or a different period.
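
You can verify the space-sensitivity yourself: run the query above twice, the second time with an extra space somewhere, and then look in the cache. A sketch, assuming SQL 2005 or later; note that a query of this complexity is not subject to auto-parameterisation under the default settings:

-- After running the query twice with different spacing,
-- expect two rows here, one per textual variation.
SELECT cp.usecounts, st.text
FROM   sys.dm_exec_cached_plans cp
CROSS  APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE  st.text LIKE '%SUM(OD.UnitPrice * OD.Quantity)%'
  AND  st.text NOT LIKE '%dm_exec_cached_plans%'   -- Filter out this query itself.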

All this changes if you instead use sp_executesql to run your query with parameters:

DECLARE @sql nvarchar(2000)
SELECT @sql = 'SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)
               FROM   dbo.Orders O
               JOIN   dbo.[Order Details] OD ON O.OrderID = OD.OrderID
               WHERE  O.OrderDate BETWEEN @from AND @to
                 AND  EXISTS (SELECT *
                              FROM   dbo.[Order Details] OD2
                              WHERE  O.OrderID = OD2.OrderID
                                AND  OD2.ProductID = @prodid)
               GROUP  BY O.OrderID'
EXEC sp_executesql @sql, N'@from datetime, @to datetime, @prodid int',
                   '19980201', '19980228', 76

The principle for cache lookup is the same as for a non-parameterised query: SQL Server hashes the query text and looks up the hash value in the cache, still in a case- and space-sensitive fashion. But since the parameter values are not part of the query text, the same plan can be reused even when the input changes.

To make this really efficient, there is one more thing you need to observe. Do you see that I’ve prefixed all tables in the query with dbo? There is a very important reason for this. Users can have different default schemas, and up to SQL 2000, all users had a default schema equal to their username. Thus, if the default schema for user1 is user1, and this user runs a query that goes “SELECT … FROM Orders”, SQL Server must first check whether there is a table user1.Orders before it looks for dbo.Orders. Since user1.Orders could appear on the scene at any time, user1 cannot share a cache entry with a user with a different default schema. Yes, in SQL 2005 or later, it is perfectly possible that all users have dbo as their default schema, but it seems to be a bad idea to rely on it.
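
If you want to see this effect for yourself, here is a sketch of a demo for SQL 2005 or later; the user and schema names are made up, and you need a database with an Orders table, such as Northwind:

CREATE USER user1 WITHOUT LOGIN WITH DEFAULT_SCHEMA = schema1
CREATE USER user2 WITHOUT LOGIN WITH DEFAULT_SCHEMA = schema2
GRANT SELECT ON dbo.Orders TO user1, user2
go
EXECUTE AS USER = 'user1'
SELECT COUNT(*) FROM Orders   -- No dbo prefix; the plan is bound to schema1.
REVERT
go
EXECUTE AS USER = 'user2'
SELECT COUNT(*) FROM Orders   -- Same text, but user2 cannot reuse user1's plan.
REVERT

If you then look in the plan cache with the DMV query from earlier in this section, you should find separate cache entries for the statement, one per default schema.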

If you instead use stored procedures, it is not equally important to prefix tables with dbo. Microsoft still recommends that you do, but even if you don’t, users with different default schema can share the same query plan.

From what I have said here, it follows that if you use dynamic SQL with EXEC() you lose an important benefit of stored procedures, whereas with sp_executesql you don’t. At least in theory. It’s easy to forget that dbo prefix, and if you leave it out in just a single place in the query, you will get as many entries in the cache for the query as there are users running it. Recall also that the cache is space- and case-sensitive, so if you generate the same query in several places, you may inadvertently have different spacing or inconsistent use of case. And this is not restricted to the SQL statement; the parameter list is just as much part of the cache entry. Furthermore, since the cache lookup is by a hash value computed from the query text, I would assume that this is somewhat more expensive than looking up a stored procedure. In fact, under extreme circumstances, heavy use of dynamic SQL can lead to serious performance degradation. Some of my MVP colleagues have observed systems with lots of memory (> 20 GB) where the plan cache has been so filled with plans for SQL statements that there have been hash collisions galore, and the cache lookup alone could take several seconds. Presumably, the applications in question either did not use parameterised queries at all, or they failed to prefix tables with dbo.

So far, I’ve only talked about dynamic SQL in stored procedures. But in this regard there is very little difference from SQL statements sent from the client, or SQL statements generated in CLR procedures. The same rules apply: unparameterised statements are cached but with little probability of reuse, whereas parameterised queries can be as efficient as stored procedures if you remember to always prefix the tables with dbo. (And still with the caveat that the cache lookup is space- and case-sensitive.) Most client APIs implement parameterised queries by calling sp_executesql under the covers.

In the article on “SQL Injection article link”, I included an example of how to do parameterised queries with ADO. Here is an example for SqlClient:

cmd.CommandType = System.Data.CommandType.Text
cmd.CommandText = _
    " SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)" & _
    " FROM dbo.Orders O" & _
    " JOIN dbo.[Order Details] OD ON O.OrderID = OD.OrderID" & _
    " WHERE O.OrderDate BETWEEN @from AND @to" & _
    " AND EXISTS (SELECT *" & _
    " FROM dbo.[Order Details] OD2" & _
    " WHERE O.OrderID = OD2.OrderID" & _
    " AND OD2.ProductID = @prodid)" & _
    " GROUP BY O.OrderID"
cmd.Parameters.Add("@from", SqlDbType.DateTime)
cmd.Parameters("@from").Value = "1998-02-01"
cmd.Parameters.Add("@to", SqlDbType.DateTime)
cmd.Parameters("@to").Value = "1998-02-28"
cmd.Parameters.Add("@prodid", SqlDbType.Int)
cmd.Parameters("@prodid").Value = 76

In contrast to ADO, SqlClient uses names with @ for parameters. The syntax for defining parameters is similar to ADO, but not identical. This article is long enough, so I will not go into details on how the Parameters collection works; instead, I refer you to MSDN, where both SqlClient and ADO are documented in detail. Whatever client API you are using, please learn how to use parameterised commands with it. Yes, there is a tone of desperation in my voice. I don’t know how many posts I’ve seen on the newsgroups over the years where people build their SQL strings by interpolating the values from input fields into the SQL string, thereby degrading the performance of their application and, worst of all, opening their database to SQL injection.

… and just when you thought you were safe, I need to turn this upside down. Recall what I said at the beginning of this section: if the query is going to run for four minutes, one second extra for compilation is not a big deal. And if that recompilation slashes the execution time from forty minutes to four, there is a huge gain. Most queries benefit from cached parameterised plans, but not all do. Say that you have a query where the user can ask for data for some time span. If the user asks for a summary of a single day, there is a good non-clustered index that gives a sub-second response time. But if the request is for an entire year, the same index would be a disaster, and a table scan is better. Starting with SQL 2005, you can force a query to be recompiled each time it is executed by adding OPTION (RECOMPILE) to the end of the query, and thus you can still use sp_executesql to get the best protection against SQL injection.
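
As a sketch, this is what the pattern could look like applied to a simplified variant of the earlier query; the hint buys a fresh plan for each execution, while the query stays parameterised:

DECLARE @sql nvarchar(2000)
SELECT @sql = 'SELECT O.OrderID, SUM(OD.UnitPrice * OD.Quantity)
               FROM   dbo.Orders O
               JOIN   dbo.[Order Details] OD ON O.OrderID = OD.OrderID
               WHERE  O.OrderDate BETWEEN @from AND @to
               GROUP  BY O.OrderID
               OPTION (RECOMPILE)'   -- Compile anew for each set of parameter values.
EXEC sp_executesql @sql, N'@from datetime, @to datetime',
                   '19980101', '19981231'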

For the sake of completeness, I should mention that SQL Server is able to auto-parameterise queries. If you submit:

SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = N'ALFKI'

SQL Server may recast this as

SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @P1

so that the next time you submit BERGS instead of ALFKI, the query plan will be reused. Auto-parameterisation comes in two flavours: simple and forced. Simple is the default and is the only option on SQL 2000 and earlier. With simple parameterisation, auto-parameterisation happens only with very simple queries, and, it seems, with some inconsistency. With forced parameterisation, SQL Server parameterises all queries that come its way (with some exceptions documented in Books Online). Forced parameterisation is, in my opinion, mainly a setting to cover up for poorly designed third-party applications that use unparameterised dynamic SQL. For your own development you should not rely on any form of auto-parameterisation. (But in the situation where you really want a new query plan each time, you may have to verify that auto-parameterisation doesn’t happen when you don’t want it to.)
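
If you are unsure which setting a database is running with, you can check the catalog (a sketch, assuming SQL 2005 or later):

SELECT name, is_parameterization_forced
FROM   sys.databases
WHERE  name = 'Northwind'
-- 0 = simple parameterisation (the default), 1 = forced.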

They say seeing is believing. Here is a demo that you can try yourself. First create this database:

CREATE DATABASE many_sps
go
USE many_sps
go
DECLARE @sql nvarchar(4000),
        @x   int
SELECT @x = 200
WHILE @x > 0
BEGIN
   SELECT @sql = 'CREATE PROCEDURE abc_' + ltrim(str(@x)) +
                 '_sp @orderid int AS
      SELECT O.OrderID, O.OrderDate, O.CustomerID, C.CompanyName,
             Prodcnt = OD.cnt, Totalsum = OD.total
      FROM   Northwind..Orders O
      JOIN   Northwind..Customers C ON O.CustomerID = C.CustomerID
      JOIN   (SELECT OrderID, cnt = COUNT(*), total = SUM(Quantity * UnitPrice)
              FROM   Northwind..[Order Details]
              GROUP  BY OrderID) AS OD ON OD.OrderID = O.OrderID
      WHERE  O.OrderID = @orderid'
   EXEC(@sql)
   SELECT @x = @x - 1
END

Now, navigate down to the list of stored procedures in Management Studio. Select all the procedures, and then from the context menu select to script them as CREATE to a new query window. How long this takes depends on your hardware, but on my machine it took 90 seconds, and at the same time SQL Server grabbed over 250 MB of memory. If you use the Profiler to see what Mgmt Studio is up to, you will see that for each procedure, Mgmt Studio emits a couple of queries with the procedure name embedded. That is, no parameterised statements. Once scripting is complete, issue this command:

ALTER DATABASE many_sps SET PARAMETERIZATION FORCED

and redo the operation. On my machine, scripting now completed in five seconds! This demonstrates that the difference between parameterised and unparameterised queries can be dramatic. (And that Microsoft cannot use their own products properly.) If you run SQL Server on your local machine, you can see this from one more angle: stop and restart SQL Server before each of the two scripting operations, and then use Task Manager to see how much physical memory SQL Server uses in the two cases. That difference lies entirely in the plan cache.

This particular issue has been addressed in SQL Server Management Studio.

Reducing Network Traffic

Another advantage of stored procedures over SQL sent from the client is that fewer bytes travel the network. Rather than sending a 50-line query over the network, you only need to pass the name of a stored procedure and a few parameters. This gets more significant if the computation requires several queries, possibly with logic in between. If all logic is outside the database, this could mean that data has to travel up to the client, only to travel back in the next moment. With stored procedures you can use temp tables to hold intermediate results, as in the sketch below. (You can use temp tables from outer layers as well, although it may require some careful use of your client API.)
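
Here is a sketch of the pattern; the procedure name and the business rule are made up for illustration:

CREATE PROCEDURE get_frequent_customers AS
   -- The intermediate result stays on the server, rather than
   -- travelling to the client and back between the two steps.
   CREATE TABLE #frequent (CustomerID nchar(5) NOT NULL PRIMARY KEY)

   INSERT #frequent (CustomerID)
      SELECT CustomerID
      FROM   dbo.Orders
      WHERE  CustomerID IS NOT NULL
      GROUP  BY CustomerID
      HAVING COUNT(*) >= 10

   -- The second step works from the temp table, all in one round trip.
   SELECT C.CustomerID, C.CompanyName
   FROM   dbo.Customers C
   JOIN   #frequent f ON f.CustomerID = C.CustomerID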

In this case, the dividing line goes between sending SQL from the client and running stored procedures. Whether the stored procedures use static SQL only or invoke dynamic SQL does not matter, nor does it matter if it is a CLR procedure. You still get the gains of reduced network traffic.

Encapsulating Logic

This is not a question of security or performance, but one of good programming practice and modularizing your code. By using stored procedures, you don’t have to bog down your client code with the construction of SQL statements. Then again, it depends a little on what you put into those stored procedures. Myself, I am of the school that business logic should be where the data is, and in this case there is no dispute that you should use stored procedures to encapsulate your logic.

But there are also people who like to see the database as an unintelligent container of data, and who prefer to have the business logic elsewhere. In this case, the arguments for using stored procedures for encapsulation may not be equally compelling. You could just as well employ careful programming practices in your client language and send SQL strings.

None of this changes if you use dynamic SQL in your stored procedures. The stored procedure is still a container for some piece of logic, and how it looks on the inside does not matter. I’m here assuming that most of your procedures use static SQL only. If all your stored procedures generate dynamic SQL, then in this regard you are probably better off doing it all in client code. Then again, sometimes there is no other application than Query Analyzer or SQL Server Management Studio. (Typically this would be tasks that are run by an admin.) In this case, the only container of logic available is stored procedures, and it’s immaterial whether they use dynamic SQL or not.

Ease of Writing SQL Code

One distinct advantage of writing stored T-SQL procedures is that you get a syntax check directly. With dynamic SQL, a trivial syntax error may not show up until run time. Even if you test your code carefully, there may be some query, or some variation of a query, that is only run in odd cases and not covered in your test suite.

It must be admitted that the strength of this argument is somewhat reduced by the fact that T-SQL is not too industrious about reporting semantic errors. Because of deferred name resolution, SQL Server will not examine queries in stored procedures where one or more tables are missing, be that because of misspellings or because they are temp tables created within the procedure. Nevertheless, SQL Server reports sufficiently many errors for this to be a very important reason to use stored procedures.
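
For instance, this procedure creates without complaint, although the table name is misspelled; the error does not surface until someone runs it:

CREATE PROCEDURE get_order @orderid int AS
   SELECT OrderID, OrderDate
   FROM   dbo.Ordres            -- Misspelling of dbo.Orders; no error yet.
   WHERE  OrderID = @orderid
go
EXEC get_order 11000
-- Only at run time: Invalid object name 'dbo.Ordres'.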

Another side of this coin is that when you write dynamic SQL, you embed the SQL code into strings, which makes programming far more complex. Your SQL code is a string delimited by single quotes ('), this string may itself include strings, and to include a single quote in such a string you need to double it. You can easily get lost in a maze of quotes if you don’t watch out. (In the article “Good Coding Practices and Tips for Dynamic SQL article link”, we will look a little closer at how to deal with this problem.) The most commonly used client languages with T-SQL – Visual Basic, C#, C++, VBScript – all use the double quote (") as their string delimiter, so dynamic SQL in client code or CLR stored procedures is less prone to that particular problem. Then again, in VB you don’t have multi-line strings, so at the end of each line you must have a double quote, an ampersand and an underscore for continuation. That does not serve to make coding easier either. You are relieved of all this hassle if you use stored procedures with static SQL only.
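
To give you a taste of the problem, here is what a simple one-line query looks like once it has been embedded in a T-SQL string; every quote inside the string must be doubled:

DECLARE @sql nvarchar(200)
SELECT @sql = 'SELECT OrderID FROM dbo.Orders WHERE CustomerID = N''ALFKI'''
PRINT @sql
-- Prints: SELECT OrderID FROM dbo.Orders WHERE CustomerID = N'ALFKI'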

Addressing Bugs and Problems

Somewhat surprisingly, one of the strongest arguments for stored procedures today may be that they permit you to quickly address bugs and performance problems in the application.

Say that you generate SQL statements in your application, and that there is an error in one of them. Or that a query simply performs unbearably slowly. To fix it, you need to build a new executable or DLL, which is likely to contain other code that has also changed since the module was shipped. This means that before the fix can be put into production, the module must go through QA and testing.

On the other hand, if the problem is in a stored procedure, and the fix is trivial, you may be able to deploy a fix into production within an hour after the problem was reported.

This difference is even more emphasized if you are an ISV and you ship a product that the customer is supposed to administer himself. If your application uses stored procedures, a DBA may be able to address problems directly without opening a support case. For instance, if a procedure runs unacceptably slowly, he may be able to fix that by adding an index hint. In contrast, with an application that generates SQL in the client, his hands will be tied. Of course, as an ISV you may not want your customers to poke around in your code, even less to change it. You may also prefer to ship your procedures WITH ENCRYPTION to protect your intellectual property, but this is best controlled through license agreements. (If you encrypt your procedures, the DBA can still change them, if he can find a way to decrypt them.)
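
As a sketch of what such a fix could look like – the procedure and index names here are made up for illustration:

ALTER PROCEDURE get_orders_for_period @from datetime, @to datetime AS
   SELECT OrderID, CustomerID, OrderDate
   FROM   dbo.Orders WITH (INDEX (orderdate_ix))   -- Hypothetical index hint added by the DBA.
   WHERE  OrderDate BETWEEN @from AND @to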

In this case, it does not matter whether the stored procedure uses static SQL only, or if it also uses dynamic SQL. For CLR procedures, it depends on how many objects you have in your assemblies. If you have one assembly per object, installing a new version of a CLR procedure is as simple as replacing a T-SQL procedure.