Over the many years as a Database Administrator, you learned how to squeeze the best out of the hardware you had because it took a long time and a big budget to get new faster servers.
Virtual machines came along and obfuscated that hurdle, I say obfuscate because you still had hardware behind it that the Virtualization Platform had to worry about, and they had to do a thing to squeeze the most out of it.
With the Cloud, this has made it a simple issue to be solved with a click of a button. This idea of just throw more CPU at the problem becomes the easy answer, but not always the right one. Unless you are using Platform-As-A Service platform like Azure SQL Database or Azure SQL Managed Instance then you are running SQL Server on a virtual machine in the cloud and if it was installed and configured by someone that has not lived through the same/similar experiences as I have then there might be something missed.
I will be stepping through a SQL Server 2019 template on Windows Server 2019. Note: I am using a Developer Edition which is a free license edition. In the drop-down, there are other license editions which will be reflected in the price, but you can see there is a BYOL (Bring Your Own License) image template as well.
Anyone familiar with SQL Server knows that typically your highest IO database is TempDB. You may say “I don’t use temporary tables”, but that is irrelevant because there are many implicit uses and internal uses of tempdb. I will not get into all of them here but just know that TempDB optimization it important.
In Azure, all Virtual Machines have a drive included called “Temporary Storage” and, there is a test doc in the root in capital letters DATALOSS_WARNING_README. So why am I interested in this storage? As you can see in the image below, that disk is on Solid State disk local to the host the machine is running on. If something happens to that host, the VM fails over to a different host but, everything on that disk is lost. Why should you have it then? Low latency. If you are writing to the machine something that doesn’t need protection, but you want it as fast as possible, then this is a place for that workload.
Low latency for TempDB is ideal because it will write as fast as possible, and since the tempdb is recreated every time SQL Server service starts this means it will be recreated if it switches host. Just make sure that the tempdb files are in a folder and there is a start-up script to make sure those folders exist and the SQL Server Service account has access to it.
Now that we have tempdb on a fast low latency disk, we can look at optimizing parallel operations with multiple CPUs. It is very unlikely, especially in production, that you will only have one CPU unless you are testing. There are many different views on how many files and how to configure the files, but based on past presentations by Microsoft own Bob Ward (legendary) you should have one tempdb file per CPU presented to the OS up until 8, after that, you should do additional analysis to see if more files would be warranted.
Looking at the Azure template this holds true:
I should say for this purpose, I chose a 2 CPU VM, so the template automatically created the two files.
If you installed it with SQL Server 2016 or higher, likely it automatically configured this. However, this should be changed if you scaled up after the fact. For example, if I change the SKU to a 4 CPU machine, then I should go and add two more files of the same size as the other data files.
Typically, we would add trace flags 1117 and 1118 to the start parameters:
- T1117 – When growing a data file grows all files at the same time so, they remain the same size, reducing allocation contention points.
- T1118 – When doing allocations for user tables, always allocate full extents. Reducing contention of mixed extent allocations
Since SQL Server 2016, this was unnecessary for TempDB as that database will always assume this behavior. For more information go to https://bobsql.com/sql-2016-it-just-runs-faster-t1117-and-t1118-changes-for-tempdb-and-user-databases/
If we look at the templated VM, we can see these trace flags are in the startup parameters. This is likely one of two reasons: either they just left it in there from previous images, or it is to cover user databases with multiple files in the same filegroup. It doesn’t hurt to have it in there either way.
We already talked about the Temporary Storage but, what about the other disk? This deals with your user databases, not your system databases.
I will say this about the master, msdb, and model: The template shows it on the OS drive with the installation. Typically, these are low IO databases, but if IO does increase then move them with user databases.
There should be at least one for data and one for log and they should be SSD not HDD. For important workloads put on Premium SSD.
There are higher levels out there but obviously more expensive, what you should be doing is watching your response time on your databases.
This is likely where people will see high CPU because if the system cannot write to disk quickly enough, it will spool, and the CPU rises as it keeps the data queued to be written to disk.
In the cloud, storage is cheap compared to more CPU.
Data and log files should be separated. Some people will dispute this because with SSDs we don’t need to concern ourselves with random IO versus sequential IO, which was one of the reasons to separate them.
The thing is that IO is occurring all the time and you could have IO occurring to data files and logfiles at the same time which if you put on the same disk could cause contention.
This also circles back to the disk types, perhaps you want to keep costs down and you notice only one of your databases does enough IO to warrant faster disk-based on the response time of the files, then you can put that file on its own lower latency disk.
Here is one that most people miss. When you set up those separate disk, you should set the Allocation Unit Size to 64KB.
As you can see in the templated VM, that is exactly what they did. Why is this important?
SQL Server does many IO operations in extents, which are 8 pages in the database. Each page in SQL Server is 8KB in size, so an extent is 64KB. SQL Server will be doing things like read-ahead reads in 64KB chunks. By setting the allocation unit size to 64KB, you can reduce the number of read operations in the file system.
Unfortunately, the only way to fix this is by formatting the disk!
There are a few OS configurations that I typically look for and, we hope to see it here.
This is a massively overlooked configuration and one that is easy to fix. By default, the Windows OS sets this to Balanced which, will automatically dial down the speed of the CPU to save power. The problem is that a CPU constantly changing its frequency can cause timing issues with SQL Server. This should be set to High Performance so that the CPU doesn’t change its speed. I was pleased to see this already changed in the templated VM.
Local Policy Settings
There are four settings that the SQL Server Service account should have access to:
- Adjust memory quotas for a process
- Lock pages in memory
- Log on as a service
- Perform volume maintenance task
Now in this case the template uses NT Service\MSSQLServer account to run under, but if you use a domain service account then you will need all of these.
For performance you will need:
Lock Pages in Memory – This is so SQL Server memory space is not pushed out by another process.
Perform Volume Maintenance task – This is so when you create data files, or they grow/autogrow it will just allocate the space without zeroing out the file. Log files must zero out the file. This is called instant file initialization.
SQL SERVER SETTINGS
Set a maximum memory leaving enough space for the OS to function depends on your VM size but, this is what I would do:
I would also try to set the minimum and maximum the same. Now the template doesn’t do this for us so, we should set it in the SQL Server Properties.
There are several ways to handle backups. I would recommend integrating with Azure backup. To do so, you just have to add the service account to the SQL instance after the fact, but whatever you do, make sure your backups are on a separate disk either on the VM, a different storage account, or using something like Azure backup. Logical disk corruption can still occur.
here are third-party apps out there that can interfere with performance and I would say that you should not mix this with your environment, but I will talk about one that likely will be on there.
If you have a virus scanner installed, be sure to add exclusions for the following file extensions
- *.mdf,*.ndf – typical file extensions for datafiles
- *.ldf – typical file extension for log files
- *.bak, *.trn – typical backup file extensions if you backup locally.
- SQL Server folder – perhaps a bit overkill but worthwhile
- Failover clustering directories if you enabled failover clustering.
If you are building a new SQL Server in Azure, I strongly suggest you start with the templated solutions or look at using PaaS services so that the underlying platform is optimally configured for SQL Server workloads and you are not just throwing more money at more CPUs.
If you have existing servers out there, go and check these settings and make plans to change them, especially the Power Setting, go do that right now!