Sql Server Batch Processor – so critical but ignored and sidelined for so long!

The Sql Server Batch Processor

Little is publicly known about the batch parser in Sql Server. In contrast, cool things like page structures and GAMs have been pretty widely documented, from the Inside Sql Server series of books by Kalen Delaney to the blogs of Sql Server insider Paul Randal. The most the batch parser normally gets is a line saying that it parses the query to confirm it is valid; this article from 1999 (http://sqlmag.com/t-sql/inside-sql-server-parse-compile-and-optimize) is typical in its thriftiness about the batch processor:

“I won’t go into detail about the query-parsing process. In short, the parser checks for correct syntax, including the correct spelling and use of keywords”

In this post I am going to explore how the parser actually works and what components are involved in converting the query that is sent to Sql Server into something that can be acted upon.

Overview of the process

Essentially, a string is passed to Sql Server. The string could say anything, such as “exec stored_procedure” or “create procedure hello_there”; the command could be valid or invalid (“select top banana from shop”); it might be very short or very long; it might contain one statement or many. Sometimes everything seems good but then a table doesn’t actually exist, so it isn’t very good after all. The command could be a DDL command, it could be a DML command, or the two could be mixed together. There are also commands which are valid at one point but invalid at another: Hekaton adds options such as “NATIVE_COMPILATION” which are valid on a Hekaton stored procedure but not valid normally.

The parsing process is about taking these input strings, whatever they might contain, and converting them into something that Sql can actually use if they are valid, or rejecting them if they are invalid. The general idea for things like this is to parse as quickly as possible, get everything you need and, if the input is invalid, throw it away as quickly as possible; Sql has a hard enough time doing everything it is meant to do without wasting cycles on queries it cannot service.

The whole point of the parsing process is to take a string and turn it into a normalized tree. A normalized tree is a set of operations that tell Sql Server exactly what to do; rather than:

“SELECT blah from TableBlah Where Column = 123”

Sql needs exact instructions such as:

  • constant @value of type int is 123
  • read from table “TableBlah”
  • filter to include rows where Column = @value
  • return column Blah

If we look at the specific instructions, two things have happened here: firstly the text has been converted into steps, and secondly the parts that can change, such as the value, have been replaced with parameters, i.e. the query has been normalized.
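As a rough illustration, the normalized tree for the query above could be represented as a list of steps with the literal pulled out as a parameter. This is purely my own sketch; the real internal structures in sqllang.dll are undocumented and certainly differ:

```python
# Hypothetical sketch of a normalized query tree; step names and the
# Step class are my own invention, not Sql Server's actual structures.
from dataclasses import dataclass, field

@dataclass
class Step:
    op: str                      # e.g. "CONSTANT", "SCAN", "FILTER", "PROJECT"
    args: dict = field(default_factory=dict)

def normalize(table, column, predicate_column, literal):
    """Turn a simple SELECT into steps, replacing the literal with a parameter."""
    return [
        Step("CONSTANT", {"name": "@value", "type": "int", "value": literal}),
        Step("SCAN",     {"table": table}),
        Step("FILTER",   {"column": predicate_column, "equals": "@value"}),
        Step("PROJECT",  {"columns": [column]}),
    ]

tree = normalize("TableBlah", "blah", "Column", 123)
print([s.op for s in tree])  # → ['CONSTANT', 'SCAN', 'FILTER', 'PROJECT']
```

The key point is that the literal 123 now lives only in the CONSTANT step, so two queries that differ only in the value produce the same shape of tree.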

What are the inputs to the parsing?

  • String to be parsed
    • This is passed in from the client, as should be fairly obvious
  • Parsing options
    • Whether quoted identifiers are enabled or not; settings like these really affect how the parser works
  • Compatibility Mode
    • Different keywords and commands are allowed depending on the compatibility mode
  • Server properties
    • Some commands, such as Hekaton, are only valid on certain instance types

The catalogue of object names and the security permissions, although used later in the process, are not part of query parsing and normalization; they are used when the actual work is done.

What is the output from the parsing?

The output is a tree of operations, such as a create table statement or a select statement. Each statement in the tree is valid; in some cases, such as a create database statement without a database name, the statement is invalid and the parser wouldn’t be able to add it to the tree.

The size and complexity of the tree is entirely down to the size and complexity of the query that is parsed. Benjamin Nevarez has some interesting trace flags in his post (http://www.benjaminnevarez.com/2012/04/more-undocumented-query-optimizer-trace-flags/) which show the logical and physical trees used as part of the compilation process.

A look in detail at the parsing

The parser uses CParser from sqllang.dll, which seems to have two roles: firstly it retrieves the actual characters from the string, and secondly it stores the tree, adding an operation by appending a new step to the tree.

The batch parser uses a model based on yacc and lex; the internal functions are yyparse and yylex. yyparse is particularly long and was probably generated using yacc, or what was once yacc but is now, perhaps, heavily customized.

What are yacc and lex?

Aside from sounding like a weird cow-like animal that is particularly hard to shear and a character from Superman, yacc and lex are basically the most widely used tools for making compilers understand the text they are given. It is often easy to forget that Sql Server is, amongst other things, a compiler, but it is! Sql has to do something with the text that it receives, and yacc and lex are the answer.

Lex breaks the text into tokens; it has a set of tokens it is built to find, so for example it will know that “select”, “insert”, “with”, “for” and “1980” are all tokens and that each has a specific type.

Yacc understands when and where certain tokens can be used. For example, “NATIVE_COMPILATION” is a token that lex can parse and return to yacc as a valid token, but it is up to yacc to decide whether it is valid after a “SELECT” or only valid after a stored procedure definition on an instance of Sql Server where Hekaton is available (i.e. 2014 Enterprise edition).

Typically with yacc, you have a grammar definition that specifies the tokens and where they are valid, and you use yacc to compile that grammar into actual source code, which is essentially a giant state machine that parses queries into a tree. To give some idea of the size involved, I believe that the main switch statement in Sql 2014 has 3534 entries in it; this is far too large to be managed sensibly by developers, so I really hope that the query parsing team don’t have to maintain it manually!

The process for decoding a string is that yyparse, which is the state machine built by taking the T-Sql grammar file and compiling it with yacc, keeps calling yylex, which returns the type of each token it has found, until it reaches the end of the string, at which point it returns -1.

While yyparse retrieves valid tokens it takes action according to the type of each token; so, for example, if the token returned by yylex is a SELECT token then yyparse adds a step into the tree held by the CParser to do a select, filling in all the values needed, such as whether there is a table and what the table name is.

What makes the whole process so quick is firstly that it is basically one big while loop wrapping a case statement, using only a minimal number of instructions to parse the text, and secondly that it makes only a single pass through the query. It isn’t pretty and probably isn’t very easy to write unit tests for, but it is fast and does the job. It certainly has to be, as it is used every time any text is sent to Sql Server to execute, whether valid or invalid.
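To make the yylex/yyparse relationship concrete, here is a toy sketch in Python. This is my own illustration, not Sql Server’s code; the real generated parser is vastly larger and table-driven, but the shape is the same: one loop that keeps asking the lexer for the next token until it gets -1, and a dispatch on the token type that grows the tree:

```python
import re

# A toy yylex: yields (token_type, text) pairs, then -1 at end of input.
TOKENS = [("SELECT", r"select\b"), ("FROM", r"from\b"),
          ("NUMBER", r"\d+"), ("IDENT", r"[A-Za-z_]\w*"), ("WS", r"\s+")]

def yylex(text):
    pos = 0
    while pos < len(text):
        for ttype, pattern in TOKENS:
            m = re.match(pattern, text[pos:], re.IGNORECASE)
            if m:
                pos += m.end()
                if ttype != "WS":          # whitespace is skipped, not returned
                    yield ttype, m.group()
                break
        else:
            raise SyntaxError(f"bad character at position {pos}")
    yield -1, None                          # end of input, like the real yylex

# A toy yyparse: one loop around a dispatch on token type, building a "tree".
def yyparse(text):
    tree = []
    for ttype, value in yylex(text):
        if ttype == -1:
            break                           # end of the batch
        elif ttype == "SELECT":
            tree.append(("SELECT_STEP", []))
        elif ttype == "IDENT" and tree:
            tree[-1][1].append(value)       # attach names to the current step
    return tree

print(yyparse("select blah from TableBlah"))  # → [('SELECT_STEP', ['blah', 'TableBlah'])]
```

A real grammar obviously distinguishes the select list from the FROM clause, handles precedence and so on, but the single-pass while-loop-over-a-switch structure is what makes this style of parser so fast.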

How was this info gathered

Although I would love to have access to the Sql Server source code, I don’t; this was gathered using a debugger (windbg / cdb), the public symbols and a real thirst for understanding how Sql Server parses the strange queries it is sent so accurately, quickly and consistently. The books on Sql Server internals seem to leave out the parser, which is a shame as it is so core to everything that Sql does; I guess because it is so reliable there is no need to bother documenting what it does or how it works.

T-Sql Code Header Comments

In this post I want to talk about the headers that are included in most stored procedures, because I often see requirements for writing T-Sql code that include something along the lines of “all procedures MUST be well commented and adorned with a template that lists each change”. Sample procedures include templates such as:

Sql procedure recommended template 1



Sql procedure recommended template 2


These are all taken from stack overflow: http://stackoverflow.com/questions/1085672/sql-comment-header-examples

These templates all impart a sense of professionalism, as if care has been taken to write correct comments, and we all know that dutifully ensuring comments are correct shows how well the database code has been written; except that is rarely the case. The quality of comments bears no relationship to the quality of code. Code is compiled by a compiler, so incorrect code can be detected; incorrect comments are ignored by the compiler just as much as correct comments.

There is one section in particular that is a problem: the change history. If you use source control you have a history of changes which can be validated and checked; if instead you rely on the comments then you are relying completely on developers updating the comments correctly, which is unlikely, either due to time constraints (“I’ll fix this now and update the comments later!”) or potentially malice. If the version history could possibly be incorrect then it can’t be trusted, and if it can’t be trusted then it is irrelevant and a waste of time.

Most templates include the name of the stored procedure. In case you don’t know how to find it, the name of each stored procedure is at the top of the script, after the word proc or procedure; it is sometimes surrounded by [ and ] or quotes. If the name is in the comments then it might be after something like “Name:”, “Nombre:”, “Moniker:”, “Namesake:”, “Stage Name:” or something more exotic:





So if we use source control (which we should) then, taking the above, we have a verifiable history of who the author is (they checked in the code) and what the create date is (the date it was checked in), and the change history, including both comments and actual diffs, can also be taken from source control.
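To labour the point that the procedure name is already machine-readable from the script itself, here is a throwaway sketch (my own illustration, handling only the common cases) that pulls the name straight out of the DDL:

```python
import re

def proc_name(script):
    """Pull the procedure name that follows CREATE PROC/PROCEDURE."""
    m = re.search(r"create\s+proc(?:edure)?\s+([\w\[\]\.]+)", script, re.IGNORECASE)
    return re.sub(r"[\[\]]", "", m.group(1)) if m else None

print(proc_name("create procedure [dbo].[get_users] as select 1"))  # → dbo.get_users
```

If a regex one-liner can find the name, a human reading the script certainly can, so repeating it in a comment block adds nothing.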

The above code could have been written as:

re-written with no header



Not everything is useless. There is a real case for showing example usages, especially when there are multiple parameters, as it is good to see what can be passed in, but I think that is about it. You may also wish to include a copyright banner amongst the header comments to keep lawyers happy; it can be happily ignored by everyone else.

I have come up with a list of common things I see in headers and how they should be handled:

Template | Replace with | Why?
Name | Script header | It is already in an obvious place; if the editor has colouring (everything other than notepad) then it is nicely highlighted
Author | Source control history | It is verified and easily retrievable
Create Date | Source control history | It is verified and easily retrievable
Change History | Source control history | It is verified and easily retrievable
Description | The stored procedure name | If the name of the stored procedure does not accurately describe what it does then rename it!

In another post I would like to talk about comments within stored procedures but I think that is enough for now!

Unit testing constraints in Sql Server

I generally break my unit tests for T-Sql into two broad categories: the first is the schema, things like foreign keys and defaults, and the second is the actual code, stored procedures and functions (triggers as well, but I try my hardest not to write triggers on account of them being so awful).

When I look at what tests to write for constraints, I ask myself the question, as I do any time I write anything, “what is the point of this, do I need to write it?”, and I normally come back with the answer (it is a lot of fun in my head, it is like a regular party with everyone super excited about sql server) “if there is no point in writing the tests, is there any point in the original code?”. I then think about whether the constraint is necessary and adds anything; if it does then it stays and I write tests for it.

I write tests for the schema because a schema is code. It might be ddl instead of dml, but you had better believe it is possible to get something wrong, or for someone to delete something by mistake, and most importantly I or someone else will want to refactor it later, and knowing what will break when a change is made is extremely important.

Sample Schema:

sample schema to demonstrate testing constraints with tSQLt


To write the actual tests I use tSQLt as my preferred framework; I fake the child table and the reference table and use tSQLt.ApplyConstraint to test the actual constraint. If we take this simple schema, two tables and a foreign key, and look at how we test the constraint, what do we care about?

– I care whether the key is defined correctly so will accept values in the child table that are in the parent table
– I care whether the key stops incorrect values going into the child table

I don’t really care if the foreign key allows all the possible values, that is Sql Server’s job; I just want to ensure it does what I have told it to do.

To write tests, I create a new test class:

exec tSQLt.NewTestClass 'constraint on users.users gender_id'

I could have one class per table or one per database but there isn’t much of an overhead having lots of classes and it helps to keep them separate and organised so I create one class per constraint.

Once I have the test class, I create the SetUp routine that will be used by my test “methods” (procedures). The SetUp procedure is run before each test; it means you don’t have to keep doing the same thing, so it is less repetitive and less error prone. Remember kids, no one is testing the tests!

create procedure [constraint on users.users].[SetUp]
as
begin
    exec tSQLt.FakeTable 'gender', 'users'
    exec tSQLt.FakeTable 'users', 'users'

    insert into users.gender(gender_id, gender_description)
    select 1, 'Male'

    exec tSQLt.ApplyConstraint 'users', 'FK_users_gender_id_genders', 'users'
end

In this case, I fake out the two tables involved in the relationship, and in the parent table I add a single row. I could add every possible row and then check they all work, but I believe that is overkill; while it might be straightforward for gender, it is a little more involved for, say, every product that has ever been sold at a major supermarket or retailer. With the data added I then ensure that the constraint is applied and ready to allow good data in and keep bad data out.

I then have my “happy path” test which tests that I can add data to the child table:

create procedure [constraint on users.users].[test allows male]
as
begin
    insert into users.users(user_id, gender_id)
    select 100, 1
end

I typically always expect this to work, but if I have set up the foreign key to point to another column by accident then the insert will fail, as in my test SetUp I only inserted a valid value into the gender_id column of the genders table.

Once I have the good path that should work if I have set everything up correctly, I then look to the unhappy path, that is what happens when something goes wrong – if I try to insert data that is incorrect, does it fail? When it fails, does it fail in the way I expect?

create procedure [constraint on users.users].[test disallows unknown values]
as
begin
    exec tSQLt.ExpectException @ExpectedMessagePattern = '%INSERT%conflicted%FOREIGN KEY%'

    insert into users.users(user_id, gender_id)
    select 100, 94898
end

So here all I am doing is saying that I expect to get an exception somewhere in this procedure, and when I do, the error must match the pattern “%INSERT%conflicted%FOREIGN KEY%”; if it doesn’t then the test will fail. I then do the insert with a value I know not to exist (remember I am faking both tables in SetUp, so I know exactly what values are in each table).

Finally I need to run the tests using either “exec tSQLt.RunAll” or “exec tSQLt.Run 'constraint on users.users'”.

The tests will run and either pass or fail. To do TDD properly you should create and run the tests first; the first test should always complete, so it is hard to get the red before the green, but the second test should fail before the foreign key is put in, and once the key is added the test will succeed as there will then be an error message.

I personally think that this finds the right balance between writing tests for the sake of it (1 test for every possible value in the reference table!) and writing enough to know that the constraint is in place and it is doing what it should do – irrespective of whether the values in the table are actually correct.

Like most things in life and in IT, the amount and type of tests depends on the type of application and how it is being used; a test database for me to play around with sql doesn’t really need unit testing, but a database that is critical to a medical EMR application should be well tested.

The code that goes along with this is:



Modern T-SQL Development (#MTSQLD)

I was watching this TechEd (I have always loved that name!) video from 2011 about the new way of doing database development, in a modern way – Sql Server Data Tools or SSDT:


It got me thinking how far we have come in the last 3 years and I really feel strongly that as T-Sql developers we have tools and practices we can follow to make our code maintainable, reliable and enjoyable!

We can build systems which are well tested, which brings a host of well-defined benefits such as:

  • Ability to refactor T-Sql code & data schema
  • Confidence that T-Sql code is valid and works as expected

We can concentrate on designing schemas and writing code, rather than writing scripts to upgrade schemas when we deploy, using:

  • DacPacs + SqlPackage.exe
  • Redgate Sql compare
  • ApexSql Diff

We can ensure that our code is safe, and we can delete unused code, because we have source control; my preference is Git but anything will do, heck, even the much derided Visual SourceSafe would be an improvement over most T-Sql code projects!

We can refactor code and schemas using SSDT to rename and move objects, we can make changes and build a project and get instant feedback about whether we have broken anything.

We can improve the speed of our development using intellisense, and if we are lucky enough to have ReSharper we can jump straight to objects we know the name of (ctrl+t with ReSharper shortcuts).

We can effectively find out what references each object, find unused objects, and so on.

We can build a continuous integration pipeline: we can check code in and have a build system like TeamCity check out the code, build the project, deploy the project and tests, and then run the tests. There are a few separate parts to set up here but it is easily doable.

To some extent we can debug T-Sql, this is an area that definitely needs improving but at least we can step through and see the values of variables – until we can visualise what is in tables I think T-Sql debugging will always be lacking.

I am really passionate about modern T-Sql development. Please take a look at how you develop T-Sql code and think about whether you are using modern techniques, and whether your life would be any better if you started employing some modern T-Sql development techniques.


In looking around for other resources on modern T-Sql development I was unable to find much, so I thought it would be great to start a LinkedIn group to collate and share best practices; if you are interested please head over to the new “Modern T-Sql Development” group on LinkedIn:



Use extended events to show your actual io patterns

I have spent a little too much of my life worried about the performance of IO subsystems and sql server; I worry slightly less now SSDs are more available, but I still worry!

I/O patterns are pretty well understood, tools like CrystalDiskMark let you test a subsystem using random and sequential reads and writes of various sizes. Run one of these tools and then put your feet up, you know what performance you will get, job done.

If you don’t want to put your feet up, you can investigate what your actual IO patterns are. To do this I like to use extended events, and there are two events I am interested in:

  • sqlserver.file_read_completed
  • sqlserver.file_write_completed

When you use these you get some really interesting values, notably:

  • mode – this is really cool as it tells you whether the individual IO was completed using the Scatter/Gather API or whether it was Contiguous; I believe that Scatter/Gather is used by the buffer pool to read pages and Contiguous is used to write log files – but I have no actual evidence of that!
  • duration – the time the IO took – this is really useful in its own right
  • database_id – the database
  • file_id – the data file id – we can use this to group requests to individual files
  • offset – the location of the IO
  • size – the size of the IO

By simply gathering the data and using database_id, file_id, offset and size we can easily see, for each database and file, the three important things (bear in mind that if you are profiling an application you may want to filter to just the database you are interested in):

  • Size
  • Type – Read or write
  • Random or Sequential

Random or sequential we need to work out for ourselves, but it is just a simple calculation: an operation is sequential if its offset equals the previous operation’s offset + size on the same file, otherwise it is random.
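That calculation can be sketched like this (my own illustration, assuming the events have been extracted as (file_id, offset, size) tuples ordered by time):

```python
def classify_io(events):
    """Label each (file_id, offset, size) I/O as sequential or random.

    An operation is sequential if it starts exactly where the previous
    operation on the same file finished (previous offset + size).
    """
    next_expected = {}   # file_id -> offset where a sequential IO would start
    labels = []
    for file_id, offset, size in events:
        if next_expected.get(file_id) == offset:
            labels.append("sequential")
        else:
            labels.append("random")
        next_expected[file_id] = offset + size
    return labels

# Two 8KB reads back to back, then a jump, then another contiguous read.
ios = [(1, 0, 8192), (1, 8192, 8192), (1, 65536, 8192), (1, 73728, 8192)]
print(classify_io(ios))  # → ['random', 'sequential', 'random', 'sequential']
```

Group the labels by file and size and you have your IO profile: how much of the workload is small random reads versus large sequential writes, which is exactly what you need when sizing a disk subsystem.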

How do we get the data?

To set up an extended events session, either use the wizard and add file_read_completed and file_write_completed, or use a script like this:

CREATE EVENT SESSION [IOPatterns] ON SERVER
ADD EVENT sqlserver.file_read_completed(
    ACTION(sqlserver.sql_text)),
ADD EVENT sqlserver.file_write_completed(
    ACTION(sqlserver.sql_text))
ADD TARGET package0.event_file(SET filename=N'C:\SqlTrace\IOPatterns.xel')

I like to include sql_text, I realise it isn’t for everyone but it makes me feel really really good to know exactly what caused a specific IO!

If you use the wizard, find these two events:

from the new extended events session choose read and write completed


However you choose to create the session, start it and generate some IO; if you are testing you may need to use “DBCC DROPCLEANBUFFERS” to cause some IO, as SQL won’t read from disk unless it really needs to!

When you have generated your IO then you will need a query to get the data, something along the lines of this should work nicely (if slowly!):

declare @file nvarchar(255)
select @file = N'c:\SqlTrace\IOPatterns*.xel'

select
    event_xml.value('(./@timestamp)[1]', 'datetime2') as [timestamp],
    event_xml.value('(./@name)[1]', 'varchar(1000)') as event_name,
    event_xml.value('(./data[@name="database_id"]/value)[1]', 'int') as database_id,
    event_xml.value('(./data[@name="file_id"]/value)[1]', 'int') as file_id,
    event_xml.value('(./data[@name="offset"]/value)[1]', 'bigint') as offset,
    event_xml.value('(./data[@name="size"]/value)[1]', 'int') as size,
    event_xml.value('(./data[@name="mode"]/value)[1]', 'int') as mode,
    event_xml.value('(./data[@name="duration"]/value)[1]', 'int') as duration,
    event_xml.value('(./action[@name="sql_text"]/value)[1]', 'varchar(4000)') as sql_text
from (select cast(event_data as xml) as xml_event_data
      from sys.fn_xe_file_target_read_file(@file, null, null, null)) as event_table
cross apply xml_event_data.nodes('//event') n (event_xml)

This generates output like:

output from extended events file_read_completed and file_write_completed



This works best for me when I am spec’ing out a new system for an existing application, normally when the old hardware is end of life and is being replaced, and I can get a trace of an actual production system and see what its IO characteristics are.


The alternative is to use procmon; this is how I used to do it, but now that we can tie IO requests to actual queries you can forget about procmon in my mind!

The other thing that tells you about IO is the dynamic management function sys.dm_io_virtual_file_stats; it tells you a lot, but it doesn’t help me worry less about IO patterns!

SSDT & tSQLt – a serious double act!

Using SSDT to develop T-SQL code is really nice; although in my opinion it isn’t as nice as developing C# yet, it is a great improvement on using Sql Management Studio alone. In this day and age all production code should be unit and integration tested, and tSQLt is a great solution for unit testing T-SQL code. The two technologies together really make an excellent pair and give us a great way to develop quickly and with the minimum number of bugs, if your testing is done correctly.

I find that there are often parts of a database project which naturally feel like they should be done in Visual Studio SSDT such as creating tables and stored procedures and then some parts which work better in SSMS, particularly running tSQLt unit tests with the Redgate Sql Test tool. This doesn’t need to be the case and it is possible to both write and maintain a database schema completely from within Visual Studio which, I think, is one of the best development environments available.

In this post I am going to talk about how I structure my visual studio solutions and how the different parts interact to create a modern development workflow.

The major components in the projects I create are:

  • Database Schema project
  • Database tSQLt unit tests project, either one for all tests and the tSQLt framework or multiple depending on the size of the project
  • NUnit test runner to run the individual tests.

The NUnit test runner is not strictly necessary but I find it means that I can write and run tests from within visual studio (I would highly recommend resharper to help here) and the tests can be run the same way on the Continuous Integration server. There is a tSQLt plugin for team city but there is a NUnit test runner for pretty much every single build system out there.

Database Schema project

This is simple: create a sql server project and use it to write your schema and T-SQL code!

Database tSQLt unit tests project

We then need somewhere we can put our tests, I like to keep them separate from the production code as it means you can validate the production code and run things like the T-SQL Smells tool (http://dataidol.com/davebally/2014/01/28/tsql-smells-in-SSDT-the-monster-lives/) against them without worrying about test code. To be clear I am not saying the test code should be poorly written, just that we can ensure stricter controls over production code and ensure a separation boundary.

To get the tSQLt objects into an SSDT project I can deploy, I initially deployed them to an empty database and then used the schema compare in SSDT to bring the objects into an SSDT project:

tsqlt in ssdt project

NUnit test runner

You need to have some way to run your tests, some people include a post-deploy script which calls the unit tests but I prefer to spend the extra time writing wrapper methods using NUnit so that I can run the tests either as a whole, as a class or individually. I guess that I really prefer using NUnit because I have used it a lot in my c# development and the resharper add-in for running unit tests is really nice.

To help use NUnit I typically write a test helper in C# which creates a Sql connection, calls tSQLt.Run with the name of the test to run, and ensures I read in all “InfoMessages”. When you run a tSQLt test you get some messages printed to the output window, including how many tests ran successfully; you need to check this rather than relying on there being 0 errors, as there must also be at least 1 success. You may also get an exception, which will be caught by NUnit and used to report that particular test as a failure.

The helper class looks like:

TestGateway.RunTest Helper
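The message-checking part of the helper can be sketched like this (my own illustration, in Python rather than the C# of the actual helper; the summary line shown is the typical tSQLt output format, which may vary between versions):

```python
import re

def tsqlt_passed(info_messages):
    """Decide whether a tSQLt run passed from its printed messages.

    Relying on "0 failed" alone is not enough: if nothing ran at all there
    are also 0 failures, so we additionally require at least one success.
    """
    text = "\n".join(info_messages)
    m = re.search(r"(\d+) test case\(s\) executed, (\d+) succeeded, (\d+) failed", text)
    if not m:
        return False                       # no summary line: treat as a failure
    executed, succeeded, failed = map(int, m.groups())
    return failed == 0 and succeeded >= 1

msgs = ["Test Case Summary: 2 test case(s) executed, 2 succeeded, 0 failed, 0 errored."]
print(tsqlt_passed(msgs))  # → True
```

The same "at least one success and zero failures" rule is what the C# helper enforces before letting NUnit mark the test green.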

Those are the major components. Now, to add a test, I just need to create a new schema that will be used to store the tests. So that tSQLt knows to treat a schema as a test class, you need to add the extended property:

EXECUTE sp_addextendedproperty @name = N'tSQLt.TestClass', @value = 1, @level0type = N'SCHEMA', @level0name = N'HumanResourcesTests';

You will need to change the level0name to the name of your test schema.

If the unit tests and main schema are in different databases then I also need to add a database reference and use the fully qualified name to get to the other database; this is really trivial, and it means you get intellisense in SSDT and can run the procs in the same database when deployed.

I am then free to add a test, the new test should be in the schema of the test class, in this case HumanResourcesTests. The test name should also begin with the word “test”:


tsqlt test


Finally I add the NUnit test so that the test helper can be called to run the actual test:

nunit test to run tSQLt test


Running Tests

To mock or fake a table, tSQLt currently only supports tables in the same database, so you need to deploy all of your code into the same database. You should ideally create a new database, insert any needed data, fake any tables you will interact with, and then destroy the database afterwards or have some system to tidy up old databases.

To run the tests you deploy the main schema, then deploy the tSQLt project and your tests. When they have been deployed you can run the NUnit tests; hopefully you are using TDD to write your T-SQL code, so the test will fail first, then you write the code to make it pass, re-deploy, re-run the tests, and you should see little green ticks:

green tests

Including these tests in a continuous build process is quite simple: you just need to get your build system to take the dacpac files created by the SSDT projects, deploy those, and then use whatever runs your NUnit tests to run these tests. Because you are using NUnit to run the tests, any failures and messages will work as if the tests were pure c# code.

I am sure there are lots of other ways of working with SSDT and tSQLt, but this is how I use them, and it feels really good to write clean, unit tested T-SQL in a modern development environment (intellisense, find references etc).

I have put the sample project in this demo on github, feel free to grab it:



Get told about explicit & implicit conversions in Sql 2012+ using extended events

Sql Server 2012 introduced a new extended event warning which tells you when you get a plan with a conversion in it that causes scans instead of seeks, so that you know when a plan is used that could have been better if the data types were correct. This is a massive performance issue and will essentially make a filter or join non-sargable, causing a table or index scan rather than a seek. The issue of implicit conversions has been covered many times before so I won’t go over it here (see http://sqlperformance.com/2013/04/t-sql-queries/implicit-conversion-costs, https://www.sqlskills.com/blogs/jonathan/implicit-conversions-that-cause-index-scans/ or even http://lmgtfy.com/?q=performance+issues+with+implicit+conversions).

The new extended event is called “sqlserver.plan_affecting_convert” and fires every time there is a conversion in a plan that stops sql from doing a seek. There are a number of different plan-affecting converts, so check the ConvertIssue attribute for “Seek Plan” for this particular issue.

To demo this, I have created a table with a varchar column and a unique index on that column so a seek can be performed; however, I use an nvarchar to search the column, which means that every single value in the column needs to be converted from varchar to nvarchar before the comparison can take place. The plan looks like this, and there is a warning to say that it isn’t very good:

Query and Execution Plan with Seek Warning

Now if you are writing some T-Sql and you see this then you can change it there and then, but if the code is already loose on the server, before this event you needed to do things like search the plan cache for implicit conversions; there is a good example of this here:




Now with this extended event you can sit and wait and get notified when it happens, which lets you be much more pro-active in finding performance issues on your Sql Server. To create the event session, either use the wizard or do something like:


CREATE EVENT SESSION [plans_with_conversions] ON SERVER
ADD EVENT sqlserver.plan_affecting_convert
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [plans_with_conversions] ON SERVER STATE = START;

You will then get alerts that look like:


Convert Warning Extended Event

From this you have the text that caused the issue and the plan handle, so you can track down the query that is being so mean to your CPU (and probably I/O subsystem) and make it play nice!

Happy Extended Eventing!

Querying the DacFx API – Getting Column Type Information

If you are using the DacFx API, it is easy to get a table: you just create a TSqlModel object with your model in and call GetObject or GetObjects to get to your TSqlObject table. Then, to get the list of columns, there are a couple of ways you can do it. You can call the table’s GetChildren, which returns all the children of the table, and in amongst that lot are one or more Columns:

foreach (var child in table.GetChildren())
    Console.WriteLine(child.Name + " : " + child.ObjectType.Name);

This will print out a list of all the children of the table; the Name property is the column name and the ObjectType is the type of object. We want columns, so we need to check that ObjectType is the same as ModelSchema.Column. As an aside, the ModelSchema class is very useful as it is a way to query for specific types or to know what type of TSqlObject it is you have. If you need to find out how something works, go to the ModelSchema to get started!
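A minimal sketch of that check, assuming table is a TSqlObject as above:

```csharp
foreach (var child in table.GetChildren())
{
    // only interested in the columns - skip constraints, indexes etc.
    if (child.ObjectType == ModelSchema.Column)
    {
        Console.WriteLine("Column: " + child.Name);
    }
}
```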

The second way is to query the API using GetReferencedRelationshipInstances, which lets you specify the type of objects that you want. In this case we just want the Columns relationships, and because we have a Table object we use the Table.Columns ModelRelationshipClass, which returns all the Columns that this table directly references:

foreach (var child in table.GetReferencedRelationshipInstances(Table.Columns))
    Console.WriteLine(child.ObjectName);

When we query for relationships, we get a list of relationships (who would have guessed), a relationship looks something like this:

FromObject – this is the table we asked for the referenced columns from

Object – this is the TSqlObject for the column

ObjectName – the name of the column

Relationship – describes how they are related, in this case it is actually the Table.Columns class (get used to seeing things go round and round in circles with this api!)
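Putting those four properties together, dumping each relationship can be sketched like this (using the table object from above):

```csharp
foreach (var rel in table.GetReferencedRelationshipInstances(Table.Columns))
{
    Console.WriteLine("From:         " + rel.FromObject.Name); // the table
    Console.WriteLine("Column:       " + rel.ObjectName);      // the column name
    Console.WriteLine("Relationship: " + rel.Relationship);    // Table.Columns again
}
```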

Now we have the columns, we can query each one to find the actual data type. To do this we query the Column for its referenced objects; this time it is not a related object, like a column to a table, but just something that is referenced by the column, so we use Column.DataType, which gets us an IEnumerable containing the one datatype that the column is:

var types = child.Object.GetReferenced(Column.DataType);

The Name property of the type gives the column type such as varchar or tinyint, and then the object has some properties you can query to find out whether the column is nullable or a max datatype, and the length, scale and precision:

var type = child.Object.GetReferenced(Column.DataType).FirstOrDefault();
var isNullable = type.GetProperty<bool?>(DataType.UddtNullable);
var length = type.GetProperty<int?>(DataType.UddtLength);

I guess the important thing to know here is that you should always know what type of TSqlObject you have. As long as you query the API using the GetReferenced* and GetProperty methods, and you use the ModelSchema class to get the top level objects, it is easy to get to the properties you need. So in this example, using ModelSchema.Table, then Table.Columns, then Column.DataType and finally DataType.XX, you can easily get the column type.

In my example I enumerate all the objects; it can be simplified (I am not sure I mean simplified!) using LINQ and something like:

var dataType = table.GetReferencedRelationshipInstances(Table.Columns)
    .First(p => p.ObjectName.ToString() == "[dbo].[AWBuildVersion].[SystemInformationID]")
    .Object.GetReferenced(Column.DataType)
    .First().Name;

(In case you recognise the table – Yes I do use a dacpac for AdventureWorks for testing!)

As always there is a demo project in github https://github.com/GoEddie/DacFxApi – this example is the ShowTableColumns example.



Dacpac Explorer

I have been playing around with the DacFx API a lot recently and am finishing off the week with a final demo, the dacpac explorer:


Github repo is here

What this does is open a dacpac file and show an explorer-like interface to let you see what is inside the dacpac without having to manually extract it and then go searching for what you want. I will write a further blog on how it uses the DacFx API, but I have tried where possible to use the model to make displaying objects as easy as possible, so for example to show all the properties of an object I use:

foreach (var property in item.ObjectType.Properties.OrderBy(p => p.Name))
{
    var val = property.GetValue<object>(item);
    if (val == null)
        val = "NULL";

    Console.WriteLine("Property Name: {0} Value: {1}", property.Name, val);
}


item in this case is of course our good old friend TSqlObject.

One extra thing that the Dacpac Explorer does is give you the option to validate the model when it opens it; this will then give you a list of any warnings or errors (of course if there are no errors then it lets you know that!). To use the demo app you just choose the dacpac file and choose open; if you want to validate the model then tick the checkbox:


When you then open the dacpac you get a nice (bear in mind I am a developer and data guy and not a designer!) explorer-like interface for viewing the dacpac contents. I have used the AdventureWorks schema as an example:




This is just a small example of what we can do with the DacFx API. I will write some blogs about how it all works, as there are some things where it wasn’t immediately obvious to me how to do them, like getting a column’s data type or the columns that belong to an index, but once you see a simple example it is pretty straightforward.


Have you used the DacFx Api to write a tool around Sql Server? Share it! Have a request? Share it!