Use extended events to show your actual I/O patterns

I have spent a little too much of my life worrying about the performance of I/O subsystems and SQL Server. I worry slightly less now that SSDs are more widely available, but I still worry!

I/O patterns are pretty well understood, and tools like CrystalDiskMark let you test a subsystem using random and sequential reads and writes of various sizes. Run one of these tools, put your feet up and you know what performance you will get; job done.

If you don't want to put your feet up, you can investigate what your actual I/O patterns are. To do this I like to use extended events, and there are two events I am interested in:

  • sqlserver.file_read_completed
  • sqlserver.file_write_completed

When you use these you get some really interesting values, notably:

  • mode – this is really cool as it tells you whether the individual I/O was completed using the Scatter/Gather API or whether it was Contiguous. I believe that Scatter/Gather is used by the buffer pool to read pages and Contiguous is used to write log files – but I have no actual evidence of that!
  • duration – the time the IO took – this is really useful in its own right
  • database_id – the database
  • file_id – the data file id – we can use this to group requests to individual files
  • offset – the location of the IO
  • size – the size of the IO

By simply gathering the data and using database_id, file_id, offset and size, we can easily see three important things for each database and file (bear in mind that if you are profiling an application you may want to filter to just the database you are interested in):

  • Size
  • Type – Read or write
  • Random or Sequential

Random or sequential we need to work out for ourselves, but it is just a simple calculation: an I/O is sequential if its offset is exactly the previous operation's offset + size against the same file, otherwise it is random.
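
For example, once the event data has been loaded into a table (here a made-up #io_events table holding the timestamp, database_id, file_id, offset and size columns that the query later in this post returns), something along these lines flags each operation as sequential or random:

-- an I/O is sequential if it starts exactly where the previous I/O against the
-- same database file finished (previous offset + previous size), otherwise random
select database_id,
       file_id,
       offset,
       size,
       case
            when offset = lag(offset + size) over (partition by database_id, file_id
                                                   order by timestamp)
            then 'Sequential'
            else 'Random'
       end as io_type
from #io_events;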

How do we get the data?

To set up an extended events session, either use the wizard and add file_read_completed and file_write_completed, or use this script:

CREATE EVENT SESSION [IO Patterns] ON SERVER
ADD EVENT sqlserver.file_read_completed(
    ACTION(sqlserver.sql_text)),
ADD EVENT sqlserver.file_write_completed(
    ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\SqlTrace\IOPatterns.xel')
WITH (MAX_MEMORY=4096 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF)

I like to include sql_text; I realise it isn't for everyone, but it makes me feel really good to know exactly what caused a specific I/O!

If you use the wizard, find these two events:

from the new extended events session choose read and write completed

 

However you choose to create the session, start it and generate some I/O. If you are testing, you may need to run DBCC DROPCLEANBUFFERS to force some physical I/O; SQL Server won't read from disk unless it really needs to!

When you have generated your I/O you will need a query to get the data; something along the lines of this should work nicely (if slowly!):

declare @file nvarchar(255)
select @file = N'c:\SqlTrace\IOPatterns*.xel'

select
    event_xml.value('(./@timestamp)[1]', 'datetime2') as timestamp,
    event_xml.value('(./@name)[1]', 'varchar(1000)') as event_name,
    event_xml.value('(./data[@name="database_id"]/value)[1]', 'int') as database_id,
    event_xml.value('(./data[@name="file_id"]/value)[1]', 'int') as file_id,
    event_xml.value('(./data[@name="offset"]/value)[1]', 'bigint') as offset,
    event_xml.value('(./data[@name="size"]/value)[1]', 'int') as size,
    event_xml.value('(./data[@name="mode"]/value)[1]', 'int') as mode,
    event_xml.value('(./data[@name="duration"]/value)[1]', 'int') as duration,
    event_xml.value('(./action[@name="sql_text"]/value)[1]', 'varchar(4000)') as sql_text
from (select cast(event_data as xml) as xml_event_data
      from sys.fn_xe_file_target_read_file(@file, null, null, null)) as event_table
cross apply xml_event_data.nodes('//event') n (event_xml)

This generates output like:

output from extended events file_read_completed and file_write_completed


This works best for me when I am spec'ing out a new system for an existing application, normally when the old hardware is end of life and being replaced, and I can get a trace of an actual production system and see what its I/O characteristics are.

Alternatives

The main alternative is to use procmon; this is how I used to do it, but now that we can tie I/O requests to actual queries you can forget about procmon in my mind!

The other thing that tells you about I/O is the dynamic management function sys.dm_io_virtual_file_stats. It tells you a lot, but it doesn't help me worry less about I/O patterns!
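
If you do want a quick look at what it gives you, the cumulative per-file numbers are a simple select away:

-- cumulative reads, writes, bytes and stall times for every database file
select database_id,
       file_id,
       num_of_reads,
       num_of_bytes_read,
       io_stall_read_ms,
       num_of_writes,
       num_of_bytes_written,
       io_stall_write_ms
from sys.dm_io_virtual_file_stats(null, null);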

SSDT & tSQLt – a serious double act!

Using SSDT to develop T-SQL code is really nice; although in my opinion it isn't as nice as developing C# yet, it is a great improvement on using SQL Server Management Studio alone. In this day and age all production code should be unit and integration tested, and tSQLt is a great solution to help unit test T-SQL code. The two technologies together make an excellent pair and give us a great way to develop quickly and with the minimum number of bugs, if your testing is done correctly.

I find that there are often parts of a database project which naturally feel like they should be done in Visual Studio SSDT, such as creating tables and stored procedures, and then some parts which work better in SSMS, particularly running tSQLt unit tests with the Redgate SQL Test tool. This doesn't need to be the case: it is possible to both write and maintain a database schema completely from within Visual Studio which, I think, is one of the best development environments available.

In this post I am going to talk about how I structure my Visual Studio solutions and how the different parts interact to create a modern development workflow.

The major components in the projects I create are:

  • Database Schema project
  • Database tSQLt unit tests project, either one for all tests and the tSQLt framework or multiple depending on the size of the project
  • NUnit test runner to run the individual tests.

The NUnit test runner is not strictly necessary, but I find it means that I can write and run tests from within Visual Studio (I would highly recommend ReSharper to help here) and the tests can be run the same way on the continuous integration server. There is a tSQLt plugin for TeamCity, but there is an NUnit test runner for pretty much every build system out there.

Database Schema project

This is simple: create a SQL Server project and use it to write your schema and T-SQL code!

Database tSQLt unit tests project

We then need somewhere to put our tests. I like to keep them separate from the production code as it means you can validate the production code and run things like the T-SQL Smells tool (http://dataidol.com/davebally/2014/01/28/tsql-smells-in-SSDT-the-monster-lives/) against it without worrying about test code. To be clear, I am not saying the test code should be poorly written, just that we can ensure stricter controls over production code and ensure a separation boundary.

To get the tSQLt objects into an SSDT project I can deploy, I initially deployed them to an empty database and then used the schema compare in SSDT to bring the objects into an SSDT project:

tsqlt in ssdt project

NUnit test runner

You need to have some way to run your tests; some people include a post-deploy script which calls the unit tests, but I prefer to spend the extra time writing wrapper methods using NUnit so that I can run the tests as a whole, as a class or individually. I guess I really prefer using NUnit because I have used it a lot in my C# development and the ReSharper add-in for running unit tests is really nice.

To help use NUnit I typically write a test helper in C# which creates a SQL connection, calls tSQLt.Run with the name of the test to run and makes sure I read in all InfoMessages. When you run a tSQLt test you get some messages printed to the output window, including how many tests ran successfully, which you need to check; rather than relying on there being 0 errors, there must also be 1 success. You may also get an exception, which will be caught by NUnit and used to report that particular test as a failure.

The helper class looks like:

TestGateway.RunTest Helper
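
The screenshot shows the real thing; a rough sketch of such a helper, with an assumed connection string and an assumed check on the tSQLt output, looks something like this:

using System;
using System.Data.SqlClient;
using System.Text;

public static class TestGateway
{
    // assumed connection string - point it at wherever the tests are deployed
    private const string ConnectionString =
        "Server=(local);Database=UnitTestDatabase;Integrated Security=SSPI;";

    public static void RunTest(string testName)
    {
        var messages = new StringBuilder();

        using (var connection = new SqlConnection(ConnectionString))
        {
            // capture everything tSQLt prints, including the test case summary
            connection.InfoMessage += (sender, args) => messages.AppendLine(args.Message);
            connection.Open();

            using (var command = connection.CreateCommand())
            {
                command.CommandText = "EXEC tSQLt.Run @TestName;";
                command.Parameters.AddWithValue("@TestName", testName);
                command.CommandTimeout = 0;

                // a failing tSQLt test raises an error, which surfaces here as a
                // SqlException and is reported by NUnit as a failed test
                command.ExecuteNonQuery();
            }
        }

        // make sure tSQLt actually ran and printed its summary rather than
        // relying only on the absence of errors
        var output = messages.ToString();
        if (!output.Contains("Test Case Summary"))
        {
            throw new Exception("tSQLt did not report a test case summary:" + Environment.NewLine + output);
        }

        Console.WriteLine(output);
    }
}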

Those are the major components. Now to add a test, I just need to create a new schema that will be used to store the tests. So that tSQLt knows to treat a schema as a test class, you need to add the extended property:

EXECUTE sp_addextendedproperty @name = N'tSQLt.TestClass', @value = 1, @level0type = N'SCHEMA', @level0name = N'HumanResourcesTests';

You will need to change the level0name to the name of your test schema.

If the unit tests and main schema are in different databases then I will also need to add a database reference and use the fully qualified name to get to the other database; this is really trivial, means you get IntelliSense in SSDT, and you can still run the procs in the same database when deployed.

I am then free to add a test. The new test should be in the schema of the test class, in this case HumanResourcesTests, and the test name should begin with the word "test":

 

tsqlt test
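
The test in the screenshot belongs to the demo project, but a minimal tSQLt test in that schema might look something like this (the procedure name and expected value are made up for illustration):

CREATE PROCEDURE [HumanResourcesTests].[test GetEmployeeCount returns zero when Employee is empty]
AS
BEGIN
    -- arrange: swap the real table for an empty fake so the test is isolated
    EXEC tSQLt.FakeTable @TableName = N'HumanResources.Employee';

    -- act: call the (hypothetical) procedure under test
    DECLARE @actual int;
    EXEC @actual = HumanResources.GetEmployeeCount;

    -- assert
    EXEC tSQLt.AssertEquals @Expected = 0, @Actual = @actual;
END;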

 

Finally I add the NUnit test so that the test helper can be called to run the actual test:

nunit test to run tSQLt test
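
A matching NUnit wrapper, using the TestGateway helper and the hypothetical test name from the sketch above, is just:

using NUnit.Framework;

[TestFixture]
public class HumanResourcesTests
{
    [Test]
    public void GetEmployeeCount_returns_zero_when_Employee_is_empty()
    {
        // delegate to the helper, which calls tSQLt.Run and fails this NUnit
        // test if tSQLt reports an error
        TestGateway.RunTest("[HumanResourcesTests].[test GetEmployeeCount returns zero when Employee is empty]");
    }
}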

 

Running Tests

To mock or fake a table, tSQLt currently only supports tables in the same database, so you need to deploy all of your code into the same database. Ideally you should create a new database, insert any needed data and fake any tables you will interact with, then drop it afterwards or have some system to tidy up old databases.

To run the tests you need to deploy the main schema, then deploy the tSQLt project and your tests. When they have been deployed you can run the NUnit tests; hopefully you are using TDD to write your T-SQL code, so the test will fail first, you then write the code to make it pass, re-deploy, re-run the tests and you should see little green ticks:

green tests

Including these tests in a continuous build process is quite simple: you just need to get your build system to deploy the dacpac files created by the SSDT projects and then use whatever runs your NUnit tests to run these tests. Because you are using NUnit, any failures and messages will work just as if the tests were pure C# code.

I am sure there are lots of other ways of working with SSDT and tSQLt, but this is how I use them and it feels really good to write clean, unit-tested T-SQL in a modern development environment (IntelliSense, find references etc.).

I have put the sample project from this demo on GitHub, feel free to grab it:

https://github.com/GoEddie/SSDT-tSQLt-Example

 

Get told about explicit & implicit conversions in SQL Server 2012+ using extended events

SQL Server 2012 introduced a new extended event which tells you when you get a plan with a conversion in it that causes scans instead of seeks, so that you know when a plan is used that could have been better if the data types were correct. This is a massive performance issue: it essentially makes a filter or join non-sargable, causing a table or index scan rather than a seek. The issues with implicit conversions have been covered many times before so I won't go over them here (see http://sqlperformance.com/2013/04/t-sql-queries/implicit-conversion-costs, https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/ or even http://lmgtfy.com/?q=performance+issues+with+implicit+conversions).

The new extended event is called sqlserver.plan_affecting_convert and fires every time there is a conversion in a plan that stops SQL Server from doing a seek. There are a number of different causes of plan_affecting_convert, so check the ConvertIssue attribute for "Seek Plan" for this particular issue.

To demo this, I created a table with a varchar column and a unique index on that column so a seek can be performed, but then searched the column using an nvarchar value. That means every single value in the column needs to be converted from varchar to nvarchar before the comparison can take place; the plan looks like this and there is a warning to say that it isn't very good:

Query and Execution Plan with Seek Warning
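
A repro along those lines (the table and column names are made up) looks something like this:

CREATE TABLE dbo.ConvertDemo
(
    SomeCode varchar(20) NOT NULL
);

CREATE UNIQUE INDEX IX_ConvertDemo_SomeCode ON dbo.ConvertDemo (SomeCode);

-- searching the varchar column with an nvarchar value forces the column to be
-- converted, which can turn the seek into a scan and fires plan_affecting_convert
DECLARE @code nvarchar(20) = N'ABC123';
SELECT SomeCode
FROM dbo.ConvertDemo
WHERE SomeCode = @code;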

Now, if you are writing some T-SQL and you see this then you can change it there and then, but if the code is already loose on the server then, before this event existed, you needed to do things like search the plan cache for implicit conversions; there are good examples of this here:

 http://sqlblog.com/blogs/jonathan_kehayias/archive/2010/01/08/finding-implicit-column-conversions-in-the-plan-cache.aspx

and

http://www.brentozar.com/blitz/implicit-conversion/

Now, with this extended event, you can sit and wait and get notified when it happens, which lets you be much more proactive in finding performance issues on your SQL Server. To create the event session, either use the wizard or do something like:

 

CREATE EVENT SESSION [plans_with_conversions] ON SERVER
ADD EVENT sqlserver.plan_affecting_convert(
ACTION(package0.callstack,sqlos.cpu_id,sqlserver.client_app_name,sqlserver.client_connection_id,sqlserver.database_name,sqlserver.plan_handle,sqlserver.server_principal_name,sqlserver.sql_text,sqlserver.tsql_frame,sqlserver.tsql_stack))
WITH (STARTUP_STATE=OFF)

 

You will then get alerts that look like:

 

Convert Warning Extended Event

From this you have the text that caused the issue and the plan handle, so you can track down the query that is being so mean to your CPU (and probably your I/O subsystem) and make it play nice!

Happy Extended Eventing!

Querying the DacFx API – Getting Column Type Information

If you are using the DacFx API, it is easy to get a table: you just create a TSqlModel object containing your model and call GetObject or GetObjects to get to your table's TSqlObject. To get to the list of columns there are a couple of ways you can do it. You can call the table's GetChildren, which returns all the children of the table, and in amongst that lot are one or more Columns:

foreach (var child in table.GetChildren())
{
    Console.WriteLine(child.Name + " : " + child.ObjectType.Name);
}

This will print out a list of all the children of the table; the Name property is the column name and ObjectType is the type of object. We want columns, so we need to check that ObjectType is the same as ModelSchema.Column. As an aside, the ModelSchema class is very useful as it is a way to query for specific types or to know what type of TSqlObject you have. If you need to find out how something works, go to ModelSchema to get started!
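
For example, filtering the children down to just the columns looks something like this (a small sketch rather than code from the demo project):

foreach (var child in table.GetChildren())
{
    // only interested in columns, so compare the type against ModelSchema.Column
    if (child.ObjectType == ModelSchema.Column)
    {
        Console.WriteLine("Column: " + child.Name);
    }
}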

The second way is to query the API using GetReferencedRelationshipInstances, which lets you specify the type of objects that you want. In this case we just want the Columns relationships, and because we have a Table object we use the Table.Columns ModelRelationshipClass, which returns all the Columns that this table directly references:

foreach (var child in table.GetReferencedRelationshipInstances(Table.Columns))

When we query for relationships, we get a list of relationships (who would have guessed); a relationship looks something like this:

  • FromObject – the table we asked for the referenced columns from
  • Object – the TSqlObject for the column
  • ObjectName – the name of the column
  • Relationship – describes how they are related; in this case it is actually the Table.Columns class (get used to seeing things go round and round in circles with this API!)

Now we have the columns, we can query each one to find the actual data type. To do this we query the Column for its referenced objects; this time it is not a related object like a column to a table, but just something that is referenced by the column, so we use Column.DataType, which gets us an IEnumerable containing the one data type that the column is:

var types = child.Object.GetReferenced(Column.DataType);

The Name property of the type gives the column type, such as varchar or tinyint, and the object has some properties you can query to find out whether the column is nullable or a max data type, and the length, scale and precision:

var type = child.Object.GetReferenced(Column.DataType).FirstOrDefault();
var isNullable = type.GetProperty<bool?>(DataType.UddtNullable);
var length = type.GetProperty<int?>(DataType.UddtLength);

I guess the important thing to know here is that you should always know what type of TSqlObject you have. As long as you query the API using the GetReferenced* and GetProperty methods and use the ModelSchema class to get the top level objects, it is easy to get to the properties you need; so in this example, using ModelSchema.Table, then Table.Columns, then Column.DataType and finally DataType.XX, you can easily get the column type.

In my example I enumerate all the objects; it can be simplified (I am not sure I mean simplified!) using LINQ and something like:

var dataType = table.First()
    .GetReferencedRelationshipInstances(Table.Columns)
    .First(p => p.ObjectName.ToString() == "[dbo].[AWBuildVersion].[SystemInformationID]")
    .Object.GetReferenced(Column.DataType)
    .First()
    .Name;

(In case you recognise the table – Yes I do use a dacpac for AdventureWorks for testing!)

As always there is a demo project on GitHub, https://github.com/GoEddie/DacFxApi – this one is the ShowTableColumns example.

 

Enjoy!

Dacpac Explorer

I have been playing around with the DacFx API a lot recently and am finishing off the week with a final demo, the Dacpac Explorer:

https://github.com/GoEddie/DacpacExplorer/raw/master/download/DacPacExplorer.zip

The GitHub repo is here.

What this does is open a dacpac file and show an explorer-like interface to let you see what is inside the dacpac without having to manually extract it and then go searching for what you want. I will write a further blog on how it uses the DacFx API, but I have tried where possible to use the model to make displaying objects as easy as possible, so for example to show all the properties of an object I use:

foreach (var property in item.ObjectType.Properties.OrderBy(p => p.Name))
{
    var val = property.GetValue<object>(item);
    if (val == null)
    {
        val = "NULL";
    }

    Console.WriteLine("Property Name: {0} Value: {1}", property.Name, val.ToString());
}

item in this case is of course our good old friend TSqlObject.

One extra thing that Dacpac Explorer does is give you the option to validate the model when it opens it; this will then give you a list of any warnings or errors (and of course if there are no errors then it lets you know that!). To use the demo app you just choose the dacpac file and choose open; if you want to validate the model then tick the checkbox:

browse
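
Under the covers the validation is just a call to the model's Validate method; a minimal sketch of listing the results (Dacpac Explorer presents them in its own UI) is:

var messages = model.Validate();
foreach (var message in messages)
{
    // the DacFx message objects format nicely via ToString()
    Console.WriteLine(message.ToString());
}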

When you then open the dacpac you get a nice (bear in mind I am a developer and data guy, not a designer!) explorer-like interface for viewing the dacpac contents; I have used the AdventureWorks schema as an example:

explore

This is just a small example of what we can do with the DacFx API. I will write some blogs about how it all works, as there were some things that weren't immediately obvious to me, like getting a column's data type or the columns that belong to an index, but once you see a simple example it is pretty straightforward.

 

Have you used the DacFx API to write a tool around SQL Server? Share it! Have a request? Share it!

 

Counting logical lines of T-Sql code in a DacPac

There are two general ways of counting lines of code: the first is basically just to count the lines in each file that makes up a project, the second is to count the logical statements as defined by the language.

Counting lines of T-SQL code is typically quite hard as, personally, I am interested in things like stored procedures and functions and their statements as a metric for lines of code, rather than including columns and tables which, for me, are separate.
I thought this would be a great example of how to use the DacFx API and the T-SQL ScriptDom together to create a tool that I, and hopefully the other person who has visited this blog, would find useful!

What this tool does is count the logical lines of code in any stored procedure, function (scalar or table-valued) and DML trigger. If anyone thinks of other items of code, it is open source, so feel free to check it out, add them in and submit a pull request; it is really trivial to add other types of code.

To count the number of statements, I use the T-SQL visitor class (TSqlFragmentVisitor) to break each code file into statements.

Feel free to download the tool from:

https://raw.githubusercontent.com/GoEddie/DacFxApi/master/Download/CountLinesOfSqlCodeInDacPac.zip

The source code is in my DacFx Api GitHub repo:

https://github.com/GoEddie/DacFxApi

How does this work?

The first step is to open the dacpac file:

var model = new TSqlModel(fileName, DacSchemaModelStorageType.File);

I then need to query the model for the Objects of the specific type I am looking for:

model.GetObjects(DacQueryScopes.Default, Procedure.TypeClass)
model.GetObjects(DacQueryScopes.Default, ScalarFunction.TypeClass)

Each of these returns an IEnumerable<TSqlObject>, which I pass to a helper function to get the name and the script:

var name = codeUnit.Name.ToString();

var script = "";
if (codeUnit.TryGetScript(out script))
{
    return script;
}

Once I have the names and contents of all the scripts I am interested in, I need to parse the scripts and break them into statements. This is made really simple using the TSqlFragmentVisitor class.

The way the TSqlFragmentVisitor class works is that you create a class that inherits from it:

public class SqlVisitor : TSqlFragmentVisitor
{
}

There is then a set of methods you can override, each of which is called every time a statement of that type is encountered; so for example if you use:

public override void Visit(TSqlStatement node)
{
}

This will be called for every statement; it is what I use in this tool as I don't care what type of statement it is, as long as it is a statement.

There are other overridable methods, such as:

public override void ExplicitVisit(SelectStatement node)

This is only called for select statements so it really is powerful and so simple to use!
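
Putting those pieces together, a stripped-down version of the visitor and the parsing code might look something like this (the parser version and the error handling are simplifications rather than the tool's exact code):

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

public class StatementCountingVisitor : TSqlFragmentVisitor
{
    public int StatementCount { get; private set; }

    // called once for every statement in the script, whatever its type
    public override void Visit(TSqlStatement node)
    {
        StatementCount++;
    }
}

public static class StatementCounter
{
    public static int CountStatements(string script)
    {
        // pick the parser that matches your compatibility level; TSql120Parser is just an example
        var parser = new TSql120Parser(true);

        IList<ParseError> errors;
        TSqlFragment fragment;
        using (var reader = new StringReader(script))
        {
            fragment = parser.Parse(reader, out errors);
        }

        if (errors != null && errors.Count > 0)
        {
            throw new Exception("Failed to parse script: " + errors[0].Message);
        }

        var visitor = new StatementCountingVisitor();
        fragment.Accept(visitor);
        return visitor.StatementCount;
    }
}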

Once I have the statement count, I tally up the counts for all the code units and end up with the total logical lines of code for just the code units in a dacpac file!

This is obviously really basic; if you have an example where the count is incorrect, send it to me and I will fix the tool, or fix it yourself and send a pull request on GitHub!

Happy DacFx + TransactSql ScriptDom'ing!

 

P.S. To see how many lines are in each code unit, use the /detailed switch!

DacFx API – use it, write better tools for SQL Server!

The DacFx API is an API for querying the object model that both dacpac files and SSDT use. It is a giant leap forward in the available tooling for SQL Server, and hopefully there will be a raft of great tools to make T-SQL development better.

I really enjoy writing C# code. I typically use Visual Studio with ReSharper and add NUnit and Moq to my test projects, and this setup makes developing and testing really simple and removes a lot of the boring repetition you sometimes get. I also love developing T-SQL, but the development environment is nowhere near as smooth.

When I first started looking at the DacFx API, I was a little confused as to how it worked, even after reading the introductory blog (http://blogs.msdn.com/b/ssdt/archive/2013/12/23/dacfx-public-model-tutorial.aspx). When I finally realised what the implications were of the painless-sounding sentence "The public model API is loosely typed: the TSqlModel contains loosely typed TSqlObjects that represent all the elements in your schema.", my heart sank a little, as it felt very similar to every object being returned as a System.Object and then having to cast it to the correct type (you don't, by the way, you leave it as the TSqlObject).

What this means is that all objects in the model (a table, schema, procedure etc.) are the same type, a TSqlObject. To get any information, such as which columns belong to which table or index, or whether an index is clustered or not, you need to query the objects and ask for the information you want. In reality what you have with the DacFx API is a queryable set of classes; you just have to know what you have and what you want, and although it is a different paradigm to pretty much every API I have ever seen, it is fairly straightforward. I have put up an example application which opens a dacpac file and dumps out a list of indexes, which gives an overview of how to use the API to get the information you want.

To open a dacpac file and get a TSqlModel object which is the entry point to the API you just need to call:

var model = new TSqlModel(filePath, DacSchemaModelStorageType.File);

When you have the model you can then query it for the objects you would like; if you want all the indexes in the model just do:

model.GetObjects(DacQueryScopes.Default, Index.TypeClass)

GetObjects searches for and returns a list of specific object types; the query scope means all objects, but you can limit it to just system objects or just user objects etc. Finally, the TypeClass causes a list of just indexes to be returned.

The way the TypeClass works is that every object, such as a table, index or column, has a TypeClass, and the static class that holds the TypeClass, such as Index or Table, also has a number of properties which allow you to access the actual property values you want. If this sounds a little strange, I think it is, but it is fairly straightforward with an example! We will call the class that holds the TypeClass and the other property classes the container class, as that is what MSDN seems to call them. If you retrieve a list of all the indexes, and you have a TSqlObject that you got by using Index.TypeClass (as above), then you can query the index for its properties. The Index container class has, amongst others, these properties:

  • Index.TypeClass
  • Index.AllowPageLocks
  • Index.Clustered

You can then use these properties to get the value you want from your index TSqlObject instance, so if you do:

foreach (TSqlObject index in model.GetObjects(DacQueryScopes.Default, Index.TypeClass))
{
    var isClustered = index.GetProperty<bool?>(Index.Clustered);
}

You will get a nullable bool saying whether the index is clustered or not. If you do not know the type of the property you want, you can call the non-generic version, index.GetProperty(Index.Clustered). The example at https://github.com/GoEddie/DacFxApi/tree/master/DumpIndexProperties (repo: https://github.com/GoEddie/DacFxApi/) shows some extra information, like how to query the relationships to get the columns and included columns that are on the index (in, on, used by??) and how to generate the T-SQL script to deploy the index.
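
As a taster of what the DumpIndexProperties example covers, querying an index's columns and scripting it out might look roughly like this (a sketch, not the exact code from the repo):

// columns covered by the index, via the Index.Columns relationship
foreach (var columnReference in index.GetReferencedRelationshipInstances(Index.Columns))
{
    Console.WriteLine("Column: " + columnReference.ObjectName);
}

// the T-SQL that would create the index
string createIndexScript;
if (index.TryGetScript(out createIndexScript))
{
    Console.WriteLine(createIndexScript);
}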

I strongly believe that as SQL Server developers we have the ability to really help move our tooling forward, and if you throw the T-SQL ScriptDom into the mix, the next few years are going to be exciting times for us SQL devs!

Setting breakpoints in SSMS

Setting breakpoints in SSMS is difficult and a bit of a pain. If you use the object explorer to open a procedure or function and set a breakpoint, SSMS will happily show it as being set, but when you debug the code you get nothing. The reason is that what you need to set the breakpoint on is a temporary file which only SSMS knows about. The file you have set the breakpoint in, although it looks right, is like creating a C# file on your desktop with the same text as a code file in a solution, setting a breakpoint, running your solution and then wondering why the debugger didn't stop in the right place.

This shows a breakpoint which looks like it is set but it is not actually set:

breakpoint will never be hit :(

Breakpoint will not be hit; notice the name of the file, SQLQuery1.sql.

There are a couple of ways to set the breakpoint in the actual file. The first is to debug your code and step into (F11) the place where you want the breakpoint; when you do that, SSMS opens the code file for you, but instead of using a name like SQLQuery1.sql it opens an mssql:// path:

breakpoint will always be set :)

If you look at the name of the tab, you will see the file name is the cryptic "mssql://WIN-UJM5DI37BMD/test/?/=261575970".

The file name may well look cryptic but it is quite simple: it is the name of the instance, then the database, in this case "test", followed by the object id of the item you want the breakpoint set in.
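
If you want to work out that last part yourself, the object id is just the id of the procedure or function in that database, for example (the procedure name here is made up):

-- the object id used at the end of the mssql:// breakpoint file name
SELECT OBJECT_ID(N'dbo.MyProcedure') AS object_id;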

If we look at the breakpoints window, we can see the same file name and can also do some cool stuff like set conditional breakpoints or jump to the breakpoint, which takes you to the file where you can of course set more breakpoints. To get to the breakpoints window, go to Debug –> Windows –> Breakpoints. You need to have at least one active script window open; I am not sure why, you just do.

breakpoint window

 

Setting your own breakpoints

To set your own breakpoints, either get SSMS to show you the debuggable version of the code or add the breakpoint manually. I normally get SSMS to set one, then copy it and modify it to point at the object and location I want. To set one manually, in the breakpoints window choose New –> Break at Function and enter the file name, line and column (I know, more pain!). If you try to set a breakpoint at an invalid location you will get a failure message.

 

Use the strange naming to set a breakpoint

I hope this makes debugging in SSMS easier. If anyone has any other tips, please share!