<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Paul S. Randal</title>
	<atom:link href="https://www.sqlskills.com/blogs/paul/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.sqlskills.com/blogs/paul/</link>
	<description>In Recovery...</description>
	<lastBuildDate>Wed, 25 Mar 2026 22:34:14 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>SQL101: Top Ten SQL Server Performance Tuning Best Practices</title>
		<link>https://www.sqlskills.com/blogs/paul/sql101-top-ten-sql-server-performance-tuning-best-practices/</link>
					<comments>https://www.sqlskills.com/blogs/paul/sql101-top-ten-sql-server-performance-tuning-best-practices/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Wed, 25 Mar 2026 20:11:58 +0000</pubDate>
				<category><![CDATA[Performance Tuning]]></category>
		<category><![CDATA[SQL101]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5370</guid>

					<description><![CDATA[<p>There are a huge number of best practices around SQL Server performance tuning – I could easily write a whole book on the topic, especially when you consider the number of different database settings, server settings, coding practices, wait types, and so on that can affect performance. For this post I decided to step back [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-top-ten-sql-server-performance-tuning-best-practices/">SQL101: Top Ten SQL Server Performance Tuning Best Practices</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;">There are a huge number of best practices around SQL Server performance tuning – I could easily write a whole book on the topic, especially when you consider the number of different database settings, server settings, coding practices, wait types, and so on that can affect performance. For this post I decided to step back a bit from a list of specifics and list some general recommendations for how to <em>approach</em> performance tuning so that you maximize effort and minimize distractions.</p>
<h3 style="text-align: justify;">1) Don’t Assume the Symptom Is the Root Cause</h3>
<p style="text-align: justify;">Many DBAs and developers tend towards what I call ‘knee-jerk performance troubleshooting’, where a minimal amount of analysis and investigation is performed and the assumption is made that the most prevalent symptom of poor performance must be the root cause. When this happens, and effort is made to try to address the supposed root cause, it can lead to a lot of wasted time, and frustration that the mitigation efforts don’t help the situation.</p>
<p style="text-align: justify;">My favorite example of this, and a problem I’m sure you’ve all had, is when  average disk latency is high. The classic knee-jerk reaction is that it must be the I/O subsystem that has a problem, so the company spends money on a better I/O subsystem and the problem goes away for a little while and then comes back again, because the problem is not the hardware itself, but something happening within SQL Server.</p>
<p style="text-align: justify;">For a case like this, it’s generally better to take a mental step back and ask why is SQL Server overloading the I/O subsystem or more precisely, why is SQL Server doing so many physical reads. There are many reasons this could be happening, such as (but not limited to):</p>
<ul style="text-align: justify;">
<li>An inefficient query plan doing a large, parallel table scan instead of using a nonclustered index, because of something like a missing index, an implicit conversion, or out-of-date statistics</li>
<li>External memory pressure from the OS on the buffer pool (meaning there isn’t enough space to hold the usual ‘working set’ of database pages)</li>
</ul>
<p style="text-align: justify;">It always pays to do some investigation instead of jumping to a quick conclusion on the root cause.</p>
<h3 style="text-align: justify;">2) Determine the Scope of the Problem</h3>
<p style="text-align: justify;">It’s extremely important to figure out what the scope of the problem is, as that determines how you’ll go about investigating the problem, what metrics to gather, and what scripts and tools to use. For instance, being asked to investigate stored procedure XYZ which takes twice as long to run as it usually does is very different from being asked to tune all long-running stored procedures.</p>
<p style="text-align: justify;">Stored procedure metrics can be obtained by running the query in Management Studio, and noting duration, CPU, and IO statistics.  That information can also be obtained from the plan cache, and you can also leverage the plan cache when you need to find the longest-running stored procedures.  The following query, adapted from the popular set of DMV scripts <a href="https://glennsqlperformance.com/resources/">here</a>, lists the slowest 25 procedures, based on average duration:</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
SELECT
    TOP (25) &#x5B;p].&#x5B;name] AS &#x5B;SP Name],
    &#x5B;eps].&#x5B;min_elapsed_time],
    &#x5B;eps].&#x5B;total_elapsed_time] / &#x5B;eps].&#x5B;execution_count] AS &#x5B;avg_elapsed_time],
    &#x5B;eps].&#x5B;max_elapsed_time],
    &#x5B;eps].&#x5B;last_elapsed_time],
    &#x5B;eps].&#x5B;total_elapsed_time],
    &#x5B;eps].&#x5B;execution_count],
    ISNULL (&#x5B;eps].&#x5B;execution_count] /
        DATEDIFF (MINUTE, &#x5B;eps].&#x5B;cached_time], GETDATE ()), 0) AS &#x5B;Executions/Minute],
    FORMAT (&#x5B;eps].&#x5B;last_execution_time],
        &#039;yyyy-MM-dd HH:mm:ss&#039;, &#039;en-US&#039;) AS &#x5B;Last Execution Time],
    FORMAT (&#x5B;eps].&#x5B;cached_time],
        &#039;yyyy-MM-dd HH:mm:ss&#039;, &#039;en-US&#039;) AS &#x5B;Plan Cached Time]
    -- ,&#x5B;qp].&#x5B;query_plan] AS &#x5B;Query Plan] -- Uncomment if you want the query plan
FROM sys.procedures AS &#x5B;p] WITH (NOLOCK)
INNER JOIN sys.dm_exec_procedure_stats AS &#x5B;eps] WITH (NOLOCK)
    ON &#x5B;p].&#x5B;object_id] = &#x5B;eps].&#x5B;object_id]
CROSS APPLY sys.dm_exec_query_plan (&#x5B;eps].&#x5B;plan_handle]) AS &#x5B;qp]
WHERE
    &#x5B;eps].&#x5B;database_id] = DB_ID ()
    AND DATEDIFF (MINUTE, &#x5B;eps].&#x5B;cached_time], GETDATE()) &gt; 0
ORDER BY &#x5B;avg_elapsed_time] DESC
OPTION (RECOMPILE);
</pre>
<p style="text-align: justify;">There are also tools like the <a href="https://www.sentryone.com/products/features/top-sql">Top SQL functionality</a> in SolarWinds SQL Sentry that can help identify highest impact and highest resource using queries.</p>
<h3 style="text-align: justify;">3) Define the Goal for Success</h3>
<p style="text-align: justify;">Once you have the scope of the problem, the next step is to determine the goal of the performance tuning effort, so you know when you&#8217;ve achieved success and can move on to another task. Don’t allow the goal to be something undefined and open-ended like ‘stored procedure XYZ needs to be faster’, it needs to be well-defined such as ‘stored procedure XYZ needs to run at the speed it did before, i.e. at 50% of the current elapsed time’.</p>
<p style="text-align: justify;">Sometimes the investigation will be a bit more involved if the scope is wider, requiring capturing metrics and information over time before any analysis and mitigation can start. For instance, one of the first consulting clients I worked with had a somewhat open-ended goal for me which was, paraphrasing, ‘tempdb runs out of space once a week, and we need it not to do that’ without any idea why. The investigation involved me setting up two SQL Agent jobs; one every 10 seconds to look for large uses of tempdb and log information to a table, and another once an hour to email me any results from the previous hour. The general code I wrote to find space-hogs in tempdb is below:</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
-- InternalMB/Pages: worktables (cursors, spools), workfiles (hash joins), sorts
-- UserMB/Pages: everything else
--
SELECT
    GETDATE () AS &#x5B;Date],
    &#x5B;tsu].&#x5B;session_id] AS &#x5B;SessionID],
    &#x5B;tsu].&#x5B;exec_context_id] AS &#x5B;ExecContextID], -- anything over 0 means parallelism
    (&#x5B;tsu].&#x5B;user_objects_alloc_page_count] -
        &#x5B;tsu].&#x5B;user_objects_dealloc_page_count]) AS &#x5B;UserPages],
    ROUND (CONVERT (FLOAT, (&#x5B;tsu].&#x5B;user_objects_alloc_page_count] -
        &#x5B;tsu].&#x5B;user_objects_dealloc_page_count]) * 8) / 1024.0, 2) AS &#x5B;UserMB],
    (&#x5B;tsu].&#x5B;internal_objects_alloc_page_count] -
        &#x5B;tsu].&#x5B;internal_objects_dealloc_page_count]) AS &#x5B;InternalPages],
    ROUND (CONVERT (FLOAT, (&#x5B;tsu].&#x5B;internal_objects_alloc_page_count] -
        &#x5B;tsu].&#x5B;internal_objects_dealloc_page_count]) * 8) / 1024.0, 2) AS &#x5B;InternalMB],
    &#x5B;er].&#x5B;plan_handle] AS &#x5B;Plan],
    &#x5B;est].&#x5B;text] AS &#x5B;Text]
FROM sys.dm_db_task_space_usage &#x5B;tsu]
JOIN sys.dm_exec_requests &#x5B;er]
    ON &#x5B;er].&#x5B;session_id] = &#x5B;tsu].&#x5B;session_id]
CROSS APPLY sys.dm_exec_sql_text (&#x5B;er].&#x5B;sql_handle]) &#x5B;est]
/*
WHERE
    -- Optionally, filter by a size limit
    -- E.g., the 16384 is 128MB in 8KB pages
    ((&#x5B;user_objects_alloc_page_count] - &#x5B;user_objects_dealloc_page_count]) +
        (&#x5B;internal_objects_alloc_page_count] - &#x5B;internal_objects_dealloc_page_count])) &gt;= 16384
*/
ORDER BY
    ((&#x5B;user_objects_alloc_page_count] - &#x5B;user_objects_dealloc_page_count]) +
        (&#x5B;internal_objects_alloc_page_count] - &#x5B;internal_objects_dealloc_page_count])) DESC;
</pre>
<h3>4) Understand the Limitations</h3>
<p style="text-align: justify;">Before you start proposing or making changes, it’s important to know if there are any things you simply cannot do. Here are some examples:</p>
<ul style="text-align: justify;">
<li>If the application is written by a vendor, you’re not going to be able to make code changes to improve performance</li>
<li>If the application is written by a vendor, you might not even be able to add or change indexes without voiding the vendor’s support agreement</li>
<li>You might not be able to change a setting like MAXDOP or parameter sniffing for the whole server, which may mean using an <a href="https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-database-scoped-configuration-transact-sql">ALTER DATABASE SCOPED CONFIGURATION</a> option for just one database</li>
</ul>
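<p style="text-align: justify;">As an example of that last point, the database-scoped equivalents of those server-wide settings can be changed as follows (the MAXDOP value here is purely illustrative):</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
-- Limit parallelism for just this database, leaving the server-wide setting alone
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

-- Disable parameter sniffing for just this database
ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;
</pre>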
<p style="text-align: justify;">Even if you can change code, there may be a lengthy testing process which prevents a change from being immediately implemented, so you may need to pursue alternative solutions (potentially short-term) to quickly fix the problem.</p>
<h3 style="text-align: justify;">5) Change One Thing at a Time</h3>
<p style="text-align: justify;">One of the most confusing things to do when performance tuning is to make multiple changes at the same time, as then you won’t know which change had an effect, or whether multiple changes cancelled each other out. Always change one thing at a time and keep a note of what you changed and what effect it had, if any. Also, if a change doesn’t have any effect then revert the change so that it doesn’t become a complication if the workload evolves at a later date.</p>
<h3>6) Do Not Test in Production</h3>
<p style="text-align: justify;">One of the worst things to do when performance tuning is to make changes directly in production, as that can lead to dire consequences for the workload and business if a change create a huge negative effect. This means you need a separate test/QA environment that can be used to evaluate changes under production workload conditions, or as close to it as possible. And that leads nicely into the next point…</p>
<h3 style="text-align: justify;">7) Understand How Test Compares to Production</h3>
<p style="text-align: justify;">If your test system doesn’t compare to production then you may not see the same change in performance in production as you do in test. Classic examples of this include:</p>
<ul style="text-align: justify;">
<li>A production system with a certain number of CPUs (e.g. four 8-core processors) and a lower-powered test system to save money (e.g. four quad-core processors)</li>
<li>Along the same lines, a test system having a lot less memory than production, a different NUMA configuration, or a lower-rated storage subsystem</li>
<li>Test system only having a subset of the production data to test with</li>
<li>Test system not being able to simulate the production workload</li>
</ul>
<p style="text-align: justify;">All of these things can result in the test system producing different query plans, or the workload in test having very different characteristics than in production. This means you’ll be performance tuning for a different workload and environment and the efficacy of the changes may not translate to the production environment.</p>
<h3>8) Understand the Implications of the Change</h3>
<p style="text-align: justify;">After you’ve determined what the necessary change is, you need to consider what wider effect, if any, making that change will have. For example, if you need to change MAXDOP or the cost threshold for parallelism, that will flush the plan cache, and you might run the risk of parameter-sensitive queries recompiling with sub-optimal plans.</p>
<p style="text-align: justify;">Other changes might be more environmental, like offloading parts of a query workload to a readable secondary in an availability group. That can lead to index fragmentation issues on the primary database, which can be a performance problem of their own (as I described in <a href="https://sqlperformance.com/2015/03/sql-indexes/unexpected-fragmentation">this SQLPerformance.com post</a>).</p>
<p style="text-align: justify;">You don’t want to solve one performance problem and end up with an unexpected different problem to then have to solve.</p>
<h3>9) Create a Rollback Plan</h3>
<p style="text-align: justify;">It’s very important that you have a complete log of what’s been changed and have the ability to revert the changes if something goes wrong. This means preserving original copies of all code and schema and ideally having a script you can run to quickly roll back the changes.</p>
<p style="text-align: justify;">If this would be hard to do, and would really entail restoring the database from backups, one thing to consider is creating a database snapshot of the database and keeping it around for a few days. A database snapshot automatically keeps a pre-change copy of all changed data file pages since the time the database snapshot was created and allows you to effectively put the database back to that time with a one-line T-SQL command (internally SQL Server does this by pushing the pre-change pages back into the real database – called ‘reverting the database to the database snapshot’).</p>
<h3 style="text-align: justify;">10) Remove Diagnostic Elements from Production</h3>
<p style="text-align: justify;">Once you’ve finished the investigation and reached the performance tuning goal, make sure you remove all of the diagnostics that you implemented to help with the investigation, as they could cause performance problems themselves if left in place, especially Extended Event sessions as they can become ‘silent killers’ that use up a lot of CPU resources with no other clue that they are the problem.</p>
<p style="text-align: justify;">You can see which Extended Event sessions are running using the following code:</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
SELECT
    &#x5B;ses].&#x5B;name] AS &#x5B;Session Name],
    CASE
        WHEN &#x5B;xs].&#x5B;address] IS NOT NULL THEN &#039;Running&#039;
        ELSE &#039;Stopped&#039;
    END AS &#x5B;State],
    &#x5B;xs].&#x5B;create_time] AS &#x5B;Start Time]
FROM sys.server_event_sessions AS &#x5B;ses]
LEFT OUTER JOIN sys.dm_xe_sessions AS &#x5B;xs]
    ON &#x5B;ses].&#x5B;name] = &#x5B;xs].&#x5B;name]
ORDER BY &#x5B;State], &#x5B;Start Time];
</pre>
<p style="text-align: justify;">And if you&#8217;re on SQL Server 2025 and using Extended Events, there&#8217;s a new MAX_DURATION option you can use to ensure a diagnostic session stops running after a certain amount of time.</p>
<h2>Summary</h2>
<p>You should always take a step-by-step approach to performance tuning rather than jumping right in and changing things haphazardly in production, and I hope this post has provided you with a simple framework you can put into practice. There’s a lot of code out there to help you with various investigations, plus free tools like <a href="https://www.solarwinds.com/free-tools/plan-explorer" target="_blank" rel="noopener">Plan Explorer</a> – I can’t recommend this enough! Happy tuning!</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-top-ten-sql-server-performance-tuning-best-practices/">SQL101: Top Ten SQL Server Performance Tuning Best Practices</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/sql101-top-ten-sql-server-performance-tuning-best-practices/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
		<item>
		<title>The Curious Case of&#8230; no buffer pool memory and no OS memory available</title>
		<link>https://www.sqlskills.com/blogs/paul/the-curious-case-of-no-buffer-pool-memory-and-no-os-memory-available/</link>
					<comments>https://www.sqlskills.com/blogs/paul/the-curious-case-of-no-buffer-pool-memory-and-no-os-memory-available/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Fri, 13 Mar 2026 20:01:07 +0000</pubDate>
				<category><![CDATA[The Curious Case of...]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5366</guid>

					<description><![CDATA[<p>Jonathan had a client issue recently where SQL Server&#8217;s buffer pool had been forced down to a ridiculously small size, only a few hundred MB, but the OS also showed basically no free memory. Page Life Expectancy was zero! What was going on? From investigating SQL Server&#8217;s memory usage, the memory manager showed that target [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-no-buffer-pool-memory-and-no-os-memory-available/">The Curious Case of&#8230; no buffer pool memory and no OS memory available</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;">Jonathan had a client issue recently where SQL Server&#8217;s buffer pool had been forced down to a ridiculously small size, only a few hundred MB, but the OS also showed basically no free memory. Page Life Expectancy was zero! What was going on?</p>
<p style="text-align: justify;">From investigating SQL Server&#8217;s memory usage, the memory manager showed that target and total memory were the same, at only 1.2GB, and lock pages in memory was correctly set. The next thing to check was for ballooning in VMware &#8211; a common cause of memory issues &#8211; but this wasn&#8217;t the problem either.</p>
<p style="text-align: justify;">Next, querying <em>sys.dm_os_sys_memory</em> showed almost 100GB in kernel non-paged pool memory. Jonathan immediately guessed that it was a kernel driver memory leak &#8211; based on some deliberate-error-inducing coding he&#8217;d done of kernel drivers &#8211; so the next step was to see what filter drivers were on the system.</p>
<p style="text-align: justify;">You can do this using the command fltmc on any Windows machine. For instance, in a command prompt using &#8216;Run as administrator&#8217; on a Windows 10 laptop, it shows:</p>
<pre>C:\WINDOWS\system32&gt;fltmc

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
bindflt                                 1       409800         0
UCPD                                   11       385250.5       0
WdFilter                               11       328010         0
storqosflt                              0       244000         0
wcifs                                   1       189900         0
dbx                                     5       186500         0
CldFlt                                  5       180451         0
FileCrypt                               0       141100         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
RsFx0800                                1        41008.00      0
Wof                                     8        40700         0
FileInfo                               11        40500         0</pre>
<p style="text-align: justify;">And then you can investigate each driver using the <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/allocated-altitudes" target="_blank" rel="noopener">Allocated Filter Altitudes</a> list from Microsoft (as they assign a position in the filter driver stack to each filter driver), starting with those with the highest altitudes. From there, you can prove what&#8217;s taking up memory using the PoolMon tool from the Windows Driver Development Kit (see <a href="https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/using-poolmon-to-find-a-kernel-mode-memory-leak?source=recommendations" target="_blank" rel="noopener">Use PoolMon to Find a Kernel-Mode Memory Leak</a>).</p>
<p style="text-align: justify;">Turns out in the client&#8217;s case it was a driver from an old tool that hadn&#8217;t been un-installed &#8211; problem solved!</p>
<p style="text-align: justify;">Bottom line: don&#8217;t have the mindset that what manifests as a problem in SQL Server is always a SQL Server problem, as sometimes the issue is environmental and SQL Server is the unwitting victim of the root cause!</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-no-buffer-pool-memory-and-no-os-memory-available/">The Curious Case of&#8230; no buffer pool memory and no OS memory available</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/the-curious-case-of-no-buffer-pool-memory-and-no-os-memory-available/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>The SQL Server Transaction Log, Part 3: The Circular Nature of the Log</title>
		<link>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-3-the-circular-nature-of-the-log/</link>
					<comments>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-3-the-circular-nature-of-the-log/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Mon, 09 Mar 2026 20:02:18 +0000</pubDate>
				<category><![CDATA[Transaction Log]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5360</guid>

					<description><![CDATA[<p>(This post first appeared on SQLperformance.com four years ago as part of a blog series, before that website was mothballed later in 2022 and the series was curtailed. Reposted here with permission, with a few tweaks.) In the second part of this series, I described the structural hierarchy of the transaction log. As this post is chiefly [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-3-the-circular-nature-of-the-log/">The SQL Server Transaction Log, Part 3: The Circular Nature of the Log</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;"><em>(This post first appeared on SQLperformance.com four years ago as part of a blog series, before that website was mothballed later in 2022 and the series was curtailed. Reposted here with permission, with a few tweaks.)</em></p>
<p style="text-align: justify;">In the <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/" target="_blank" rel="noopener">second part</a> of this series, I described the structural hierarchy of the transaction log. As this post is chiefly concerned with the Virtual Log Files (VLFs) I described, I recommend you read the second part before continuing.</p>
<p style="text-align: justify;">When all is well, the transaction log will endlessly loop, reusing the existing VLFs. This behavior is what I call the <em>circular nature of the log</em>. Sometimes, however, something will happen to prevent this, and the transaction log grows and grows, adding more and more VLFs. In this post, I’ll explain how all this works, or sometimes doesn’t.</p>
<h2 style="text-align: justify;">VLFs and Log Truncation</h2>
<p style="text-align: justify;">All VLFs have a header structure containing metadata about the VLF. One of the most important fields in the structure is the status of the VLF, and the values we’re interested in are zero, meaning the VLF is <em>inactive</em>, and two, meaning the VLF is <em>active</em>. It’s important because an inactive VLF can be reused, but an active one cannot. Note that a VLF is wholly active or wholly inactive.</p>
<p style="text-align: justify;">A VLF will remain active while required log records are in it, so it can’t be reused and overwritten (I’ll cover log records themselves next time). Examples of reasons why log records may be required include:</p>
<ul style="text-align: justify;">
<li>There’s a long-running transaction the log records are part of, so they cannot be released until the transaction has committed or has finished rolling back</li>
<li>A log backup hasn’t yet backed up those log records</li>
<li>That portion of the log has not yet been processed by the Log Reader Agent for transactional replication or Change Data Capture</li>
<li>That portion of the log hasn’t yet been sent to an asynchronous database mirror or availability group replica</li>
</ul>
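<p style="text-align: justify;">You can see which of these reasons (if any) is currently preventing log truncation for a database by checking the <em>log_reuse_wait_desc</em> column in <em>sys.databases</em> – values like LOG_BACKUP, ACTIVE_TRANSACTION, REPLICATION, and AVAILABILITY_REPLICA map directly to the list above:</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
-- What, if anything, is holding up log truncation?
SELECT
    name,
    recovery_model_desc,
    log_reuse_wait_desc
FROM sys.databases;
</pre>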
<p style="text-align: justify;">It’s important to note that if there are no reasons for a VLF to remain active, it won’t switch to being inactive again until a process called <em>log truncation</em> occurs – more on this below.</p>
<p style="text-align: justify;">Using a simple hypothetical transaction log with only five VLFs and VLF sequence numbers starting at 1 (remember from last time that in reality, they never do), when the transaction log is created, VFL 1 is immediately marked as active, as there always has to be at least one active VLF in the transaction log—the VLF where log blocks are currently being written to. Our example scenario is shown in Figure 1 below.</p>
<p style="text-align: justify;"><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1.png"><img fetchpriority="high" decoding="async" class="alignnone wp-image-5361 size-large" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1-1024x142.png" alt="" width="640" height="89" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1-1024x142.png 1024w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1-300x42.png 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1-768x107.png 768w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure1.png 1030w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p style="text-align: justify;"><em class="caption">(Figure 1: Hypothetical, brand-new transaction log with 5 VLFs, sequence numbers 1 through 5 (my image))</em></p>
<p style="text-align: justify;">As more log records are created, and more log blocks are written to the transaction log, VLF 1 fills up, so VLF 2 has to become active for more log blocks to be written to, as shown in Figure 2 below.</p>
<p style="text-align: justify;"><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2.png"><img decoding="async" class="alignnone wp-image-5362 size-large" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2-1024x219.png" alt="" width="640" height="137" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2-1024x219.png 1024w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2-300x64.png 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2-768x164.png 768w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure2.png 1030w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p style="text-align: justify;"><em class="caption">(Figure 2: Activity moves through the transaction log (my image))</em></p>
<p style="text-align: justify;">SQL Server tracks the start of the oldest uncommitted (active) transaction, and this LSN is persisted on disk every time a checkpoint operation occurs. The LSN of the most recent log record written to the transaction log is also tracked, but it’s only tracked in memory as there’s no way to persist it disk without running into various race conditions. That doesn’t matter as it’s only used during crash recovery, and SQL Server can work out the LSN of the “end” of the transaction log during crash recovery. Checkpoints and crash recovery are topics for future posts in the series.</p>
<p style="text-align: justify;">Eventually, VLF 2 will fill up, and VLF 3 will become active, and so on. The crux of the circular nature of the transaction log is that earlier VLFs in the transaction log become inactive so they can be reused. This is done by a process called <em>log truncation</em>, which is also commonly called <em>log clearing</em>. Unfortunately, both of these terms are terrible misnomers because nothing is actually truncated or cleared.</p>
<p style="text-align: justify;">Log truncation is simply the process of examining all the VLFs in the transaction log and determining which active VLFs can now be marked as inactive again, as none of their contents are still required by SQL Server. When log truncation is performed, there’s no guarantee any active VLFs can be made inactive—it entirely depends on what’s happening with the database.</p>
<p style="text-align: justify;">There are two common misconceptions about log truncation:</p>
<ol style="text-align: justify;">
<li>The transaction log gets smaller (the “truncation” misconception). No, it doesn’t – there’s no size change from log truncation. The only thing capable of making the transaction log smaller is an explicit DBCC SHRINKFILE.</li>
<li>The inactive VLFs are zeroed out in some way (the “clearing” misconception). No – nothing is written to the VLF when it’s made inactive except for a few fields in the VLF header.</li>
</ol>
<p style="text-align: justify;">Figure 3 below shows our transaction log where VLFs 3 and 4 are active, and log truncation was able to mark VLFs 1 and 2 inactive.</p>
<p style="text-align: justify;"><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3.png"><img decoding="async" class="alignnone wp-image-5363 size-large" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3-1024x219.png" alt="" width="640" height="137" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3-1024x219.png 1024w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3-300x64.png 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3-768x165.png 768w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure3.png 1031w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p style="text-align: justify;"><em class="caption">(Figure 3: Log truncation marks earlier VLFs as inactive (my image))</em></p>
<p style="text-align: justify;">When log truncation occurs depends on which recovery model is in use for the database:</p>
<ul style="text-align: justify;">
<li>Simple model: log truncation occurs when a checkpoint operation completes</li>
<li>Full model or bulk-logged model: log truncation occurs when a log backup completes (as long as there isn’t a concurrent full or differential backup running, in which case log truncation is deferred until the data backup completes)</li>
</ul>
<p style="text-align: justify;">There are no exceptions to this.</p>
<h2 style="text-align: justify;">Circular Nature of the Log</h2>
<p style="text-align: justify;">To avoid the transaction log having to grow, log truncation must be able to mark VLFs inactive. The first physical VLF in the log must be inactive for the transaction log to have its circular nature.</p>
<p style="text-align: justify;">Consider Figure 4 below, which shows VLFs 4 and 5 are in use and log truncation has marked VLFs 1 through 3 as inactive. More log records are generated, more log blocks are written into VLF 5, and eventually, it fills up.</p>
<p style="text-align: justify;"><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4.png"><img loading="lazy" decoding="async" class="alignnone wp-image-5364 size-large" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4-1024x214.png" alt="" width="640" height="134" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4-1024x214.png 1024w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4-300x63.png 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4-768x160.png 768w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure4.png 1034w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p style="text-align: justify;"><em class="caption">(Figure 4: Activity fills up the highest physical VLF in the transaction log (my image))</em></p>
<p style="text-align: justify;">At this point, the log manager for the database looks at the status of the first physical VLF in the transaction log, which in our example is VLF 1, with sequence number 1. VLF 1 is inactive, so the transaction log can wrap around and begin filling again from the start. The log manager changes the first VLF to active and increases its sequence number to be one higher than the current highest VLF sequence number. So it becomes VLF 6, and logging continues with log block being written into that VLF. This is the circular nature of the log, as shown below in Figure 5.</p>
<p style="text-align: justify;"><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5.png"><img loading="lazy" decoding="async" class="alignnone wp-image-5365 size-large" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5-1024x251.png" alt="" width="640" height="157" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5-1024x251.png 1024w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5-300x73.png 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5-768x188.png 768w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/03/Figure5.png 1030w" sizes="(max-width: 640px) 100vw, 640px" /></a></p>
<p style="text-align: justify;"><em class="caption">(Figure 5: The circular nature of the transaction log and VLF reuse (my image))</em></p>
<h2 style="text-align: justify;">When Things Go Wrong</h2>
<p style="text-align: justify;">When the first physical VLF in the transaction log isn’t inactive, the transaction log cannot wrap around, so it will grow (as long as it’s configured to do so and there is sufficient disk space). This often happens because there’s something preventing log truncation from deactivating VLFs. If you find the transaction log for a database is growing, you can query SQL Server to find out if there’s a log truncation problem using this simple code below:</p>
<pre lang="tsql">SELECT [log_reuse_wait_desc]
FROM [master].[sys].[databases]
WHERE [name] = N'MyDatabase';
</pre>
<p style="text-align: justify;">If log truncation was able to deactivate one or more VLFs, then the result will be <em>NOTHING. Otherwise</em>, you’ll be given a reason why log truncation couldn’t deactivate any VLFs. There is a long list of possible reasons described <a href="https://learn.microsoft.com/en-us/sql/relational-databases/logs/the-transaction-log-sql-server?view=sql-server-ver17#factors-that-can-delay-log-truncation" target="_blank" rel="noopener">here</a> in the section <em>Factors that can delay log truncation.</em></p>
<p style="text-align: justify;">It’s important to understand the semantics of what the result is: it’s the reason log truncation couldn’t do anything <em>the last time it tried to run</em>. For instance, the result might be <em>ACTIVE_BACKUP_OR_RESTORE,</em> but you know that that long-running full backup has finished. This just means that the last time log truncation was attempted, the backup was still running.</p>
<p style="text-align: justify;">In my experience, the most common reason for log truncation being prevented is <em>LOG_BACKUP</em>; i.e., go perform a log backup! But there’s also an interesting, weird behavior with<em> LOG_BACKUP</em>. If you continually see the result <em>LOG_BACKUP</em> but you know log backups are happening successfully, it’s because there is very little activity in the database and the current VLF is the same as it was the last time a log backup was performed. So, <em>LOG_BACKUP</em> means “go perform a log backup” or “all of the log records backed up are from the current VLF, so it couldn’t be deactivated.” When the latter happens, it can be confusing.</p>
<h2 style="text-align: justify;">Circling Back&#8230;</h2>
<p style="text-align: justify;">Maintaining the circular nature of the transaction log is very important to avoid costly log growths and the need to take corrective action. Usually, this means ensuring log backups are happening regularly to facilitate log truncation and sizing the transaction log to be able to hold any large, long-running operations like index rebuilds or ETL operations without log growth occurring.</p>
<p style="text-align: justify;">In the next part of the series, I’ll cover log records, how they work, and some interesting examples.</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-3-the-circular-nature-of-the-log/">The SQL Server Transaction Log, Part 3: The Circular Nature of the Log</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-3-the-circular-nature-of-the-log/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>SQL101: Indexing Strategies for SQL Server Performance</title>
		<link>https://www.sqlskills.com/blogs/paul/sql101-indexing-strategies-for-sql-server-performance/</link>
					<comments>https://www.sqlskills.com/blogs/paul/sql101-indexing-strategies-for-sql-server-performance/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Wed, 04 Mar 2026 19:50:17 +0000</pubDate>
				<category><![CDATA[Indexes From Every Angle]]></category>
		<category><![CDATA[SQL101]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5358</guid>

					<description><![CDATA[<p>One of the easiest ways to increase query performance in SQL Server is to make sure that it can quickly access the requested data, and this is done as efficiently as possible. In SQL Server, using one or more indexes can be exactly the fix you need. In fact, indexes are so important that SQL [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-indexing-strategies-for-sql-server-performance/">SQL101: Indexing Strategies for SQL Server Performance</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;">One of the easiest ways to increase query performance in SQL Server is to make sure that it can quickly access the requested data, and this is done as efficiently as possible. In SQL Server, using one or more indexes can be exactly the fix you need. In fact, indexes are so important that SQL Server can even warn you when it figures out that there’s an index missing that would benefit a query. This high-level post will explain what indexes are, why they’re so important, and a bit of the both the art and science of various indexing strategies.</p>
<h2 style="text-align: justify;">What Are Indexes?</h2>
<p style="text-align: justify;">An index is simply a way of organizing data. SQL Server supports a variety of index types (see <a href="https://docs.microsoft.com/en-us/sql/relational-databases/indexes/indexes?view=sql-server-ver15">here</a> for details) but this post will consider only the two most common ones, which are useful in a variety of ways and for a wide number of workloads: clustered and nonclustered non-columnstore indexes.</p>
<p style="text-align: justify;">A table without a clustered index is called a heap, where the data rows in the table are unordered. If there are no indexes on the heap that means finding a particular data value in the table requires reading all the data rows in the table (called a table scan). That is obviously very inefficient, and becomes more so the larger the table grows.</p>
<p style="text-align: justify;">A clustered index on a table arranges all the data rows in the table into a sorted order and places a navigational “tree” with the organized data so that it is easily navigated. The table is no longer a heap; it’s a clustered table. The order is defined by the clustered index key, which is comprised of one or more columns from the table. The structure of a clustered index is known as a B-tree, and this basic data structure allows a specific data row to be located (called a “seek”) based on the clustered index key value, without having to scan the whole table.</p>
<p style="text-align: justify;">A good example of a clustered index is a table that stores the details of a company’s employees, where the table has a clustered index using the Employee ID as the key. All the rows in the table are stored in the clustered index in order of Employee ID, so finding the details of a particular employee using their Employee ID is very efficient.</p>
<p style="text-align: justify;">A clustered index only allows efficient location of data rows based on the clustered index key. If it is necessary to be able to find data rows quickly using a different key value, then one or more additional indexes must be created, otherwise a table scan is required. For nonclustered indexes, each index row contains the nonclustered index key value and a locator for the corresponding data row (this is the data row’s physical location for a heap or the data row’s clustered index key for a clustered index).</p>
<p style="text-align: justify;">Continuing the Employee table example, if someone wants to find the details of a particular employee and only knows the employee’s name, a nonclustered index could be created with a composite key of the LastName, FirstName, and MiddleInitial table columns. That would allow the Employee ID for an employee to be found, and then retrieve all the employee’s details from the corresponding data row in the clustered index.</p>
<h2 style="text-align: justify;">Why Are Indexes So Important?</h2>
<p style="text-align: justify;">As you have no doubt gathered, the primary use of indexes is to allow the efficient retrieval of data from a table without having to perform a table scan. By limiting the amount of data that has to be accessed and then processed, there are a lot of benefits to overall workload performance:</p>
<ul style="text-align: justify;">
<li>Minimal amount of data has to be read from disk. This prevents undue pressure on the I/O subsystem from many queries reading inefficient or larger amounts of data, and help to prevent ‘churn’ in the buffer pool (the in-memory cache of data file pages) by not forcing data already in memory to be dropped from memory to make space for data be read from disk. In some cases, no data will have to be read from disk, if the required data is already in memory.</li>
<li>Minimal amount of data has to take up space in the buffer pool. This means more of the ‘working set’ of the workload can be held in memory, further reducing the need for physical reads.</li>
<li>Any reduction in the amount of physical reads that a query must perform will lead to a drop in execution time.</li>
<li>Any reduction in the amount of data that flows through the query plan will lead to a drop in execution time.</li>
</ul>
<p style="text-align: justify;">As well as indexes, there are other things that can help produce the benefits above, including:</p>
<ul style="text-align: justify;">
<li>Using proper join conditions.</li>
<li>Using search arguments to further narrow the data required.</li>
<li>Avoiding coding practices that force a table scan to be used, such as in advertently causing implicit conversions.</li>
<li>Making sure statistics are maintained correctly, so the query optimizer can choose the best processing strategies and indexes.</li>
<li>Taking into account the execution method of a query where a cached plan has been used, resulting in parameter sensitivity problems.</li>
</ul>
<p style="text-align: justify;">But these are all topics for future posts!</p>
<h2 style="text-align: justify;">The Art and Science of Indexing</h2>
<p style="text-align: justify;">There are two parts to index tuning a workload – there’s both an art and a science. The science is that for any query there is always a perfect index, but the art is realizing that index may not be in the best interests of the overall database or server workload and figuring out what the best overall solution is for your server takes analyzing the server’s workload and priorities.</p>
<p style="text-align: justify;">Clustered index key choice is more of a science than an art, and is a whole discussion by itself, but we usually say that a clustered index key should have multiple properties (in no particular order):</p>
<ol style="text-align: justify;">
<li>The clustered index key is the data row locator that is included in every index row in every nonclustered index. This means the narrower it is, the less space it will take up overall and that will help with data size.</li>
<li>Fixed-width. A clustered index key should be narrow but also use a fixed-width data type. When a variable-width data type is used then the data row and all nonclustered index rows will incur additional overhead.</li>
<li>If the clustered index key is not unique, then a special, hidden ‘uniquifier’ column is added to the clustered index key for all non-unique data rows, making the clustered index key up to four bytes longer for those rows.</li>
<li>If a clustered index key value changes, the data row must be deleted and reinserted internally, and all nonclustered index records containing that data row locator must be updated.</li>
<li>Ever-increasing. This property helps to prevent index fragmentation from occurring in the clustered index.</li>
<li>Non-nullable. The clustered index key should be unique by definition (see #3, above) so it implies that it cannot allow NULL values. In some SQL Server versions and in some structures, a nullable column would incur more overhead than a non-nullable column. Ideally, none of the columns that make up the clustered index key would allow NULL values.</li>
</ol>
<p style="text-align: justify;">As a generalization and because you can only have one clustered index, it’s usually nonclustered indexes (and multiple of them) that help queries run more efficiently.</p>
<p style="text-align: justify;">The science of constructing the best nonclustered index for a query involves:</p>
<ul style="text-align: justify;">
<li>Understanding the search arguments being used and the type of query (as there are different indexing strategies, for instance, when search arguments use AND or OR clauses, when aggregates are involved, and for different join types). The search arguments are basically which table columns are necessary to identify the required data rows. These will likely be part of the nonclustered index keys.</li>
<li>Understanding the ‘selectivity’ of the data in each of these key columns. This will dictate the order of the columns in the index key, with the most selective predicates leading the key definition.</li>
<li>Understanding the SELECT list for the query. Any of these columns may be candidates for being included in the index as non-key columns to avoid the query having to go to the data row to retrieve them (also known as “covering” a query).</li>
</ul>
<p style="text-align: justify;">And there’s also SQL Server’s missing indexes functionality that will recommend the best index for a query (it focuses on just the science of “query tuning” but not the art of “server tuning”).</p>
<p style="text-align: justify;">The art then becomes taking that index and figuring out whether and how it can be consolidated with other existing or also recommended indexes, so the table doesn’t become over-indexed.</p>
<p style="text-align: justify;">As a simple example, let’s say that a table has ten int columns named col1 through col10.</p>
<p style="text-align: justify;">The first query to index is <em>SELECT col2, col3 FROM table WHERE col6 = value.</em> A nonclustered index on col6 would avoid a table scan, but would require the query to go to the data row to get the values for col2 and col3. A more efficient nonclustered index would have col6 as the key and include col2 and col3 as non-key columns. This is called a covering index, because the index row has all the columns necessary for the index and removes the need to use the clustered index as well to get the additionally requested columns.</p>
<p style="text-align: justify;">The second query to index is <em>SELECT col4 FROM table WHERE col6 = value</em>. The science tells us that a nonclustered index on col6 that includes col4 is likely the best index for the query. But then there are two nonclustered indexes keyed on col6, each including different non-key columns. This is where the art comes in, as the best index for the overall workload is likely a single nonclustered index on col6 that includes col2, col3, and col4. Now you have one index with more uses and fewer overall indexes on the table.</p>
<p style="text-align: justify;">And the art can continue through multiple iterations.</p>
<p style="text-align: justify;">Let’s say a third query is created that is <em>SELECT col4, col5 from table where col6 = value AND col2 = value</em>. The science may say that the best nonclustered index is on (col6, col2) if col6 is more selective than col2, and including col4 and col5 as non-key columns. The art then has us look at consolidation and end up with a single nonclustered index on (col6, col2) that includes col3, col4, and col5. This satisfies all three queries with a single nonclustered index instead of three, so it takes up less space overall at the expense of being less efficient for each query than the individual “perfect” nonclustered indexes would be. However, there’s an added benefit to this consolidation – the fewer nonclustered indexes there are, there less amount of index maintenance needs to be done when a data row is inserted, deleted, or updated.</p>
<p style="text-align: justify;">Obviously, there’s point where you may over-consolidate as well, and that’s where experience in indexing design helps hone your art, so you’re not under-indexing, over-indexing, or over-consolidating.</p>
<h2 style="text-align: justify;">Summary</h2>
<p style="text-align: justify;">There’s a lot more to the art and science of designing an indexing strategy than can be covered in a post such as this but hopefully you now understand why having a good indexing strategy is so important. A deeper primer on indexing is Kimberly L. Tripp’s 7-hour Pluralsight course <a href="https://app.pluralsight.com/library/courses/sqlserver-indexing-for-performance/table-of-contents"><em>SQL Server: Indexing for Performance</em></a>.</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-indexing-strategies-for-sql-server-performance/">SQL101: Indexing Strategies for SQL Server Performance</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/sql101-indexing-strategies-for-sql-server-performance/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
		<item>
		<title>March Madness: $699 Blackbelt Bundle!</title>
		<link>https://www.sqlskills.com/blogs/paul/march-madness-699-blackbelt-bundle/</link>
					<comments>https://www.sqlskills.com/blogs/paul/march-madness-699-blackbelt-bundle/#respond</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Sun, 01 Mar 2026 22:18:13 +0000</pubDate>
				<category><![CDATA[Career]]></category>
		<category><![CDATA[Training]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5355</guid>

					<description><![CDATA[<p>Spring is already in the air here in Redmond so time for March Madness! Hundreds of dollars lower-than-ever prices on our signature 158-hour Blackbelt training bundle: US$699 for one-year access (or to upgrade) &#8211; $200 lower than ever before! &#8211; US$1,299 for Lifetime access (no expiration) &#8211; $300 lower than ever before! Both prices include [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/march-madness-699-blackbelt-bundle/">March Madness: $699 Blackbelt Bundle!</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-5180 size-thumbnail" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg" alt="" width="150" height="150" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg 150w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-300x300.jpg 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg 600w" sizes="(max-width: 150px) 100vw, 150px" /></a><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-5180 size-thumbnail" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg" alt="" width="150" height="150" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg 150w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-300x300.jpg 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg 600w" sizes="(max-width: 150px) 100vw, 150px" /></a><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-5180 size-thumbnail" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg" alt="" width="150" height="150" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg 150w, 
https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-300x300.jpg 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg 600w" sizes="(max-width: 150px) 100vw, 150px" /></a><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-5180 size-thumbnail" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg" alt="" width="150" height="150" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg 150w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-300x300.jpg 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg 600w" sizes="(max-width: 150px) 100vw, 150px" /></a><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-5180 size-thumbnail" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg" alt="" width="150" height="150" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-150x150.jpg 150w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars-300x300.jpg 300w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2021/01/blackbelt-sqlserver2019-7stars.jpg 600w" sizes="(max-width: 150px) 100vw, 150px" /></a></p>
<p style="text-align: justify;">Spring is already in the air here in Redmond so time for March Madness!</p>
<p style="text-align: justify;">Hundreds of dollars lower-than-ever prices on our signature 158-hour Blackbelt training bundle:</p>
<ul>
<li>US$699 for <a href="https://www.sqlskills.com/product/sqlskills-blackbelt-bundle/" target="_blank" rel="noopener">one-year access</a> (or to upgrade) &#8211; $200 lower than ever before!</li>
<li>&#8211; US$1,299 for <a href="https://www.sqlskills.com/product/bbbct-lifetime/" target="_blank" rel="noopener">Lifetime access</a> (no expiration) &#8211; $300 lower than ever before!</li>
</ul>
<p style="text-align: justify;">Both prices include the 2022 updates and Q&amp;A sessions.</p>
<p style="text-align: justify;">If you have a one-year subscription and would like to upgrade it to Lifetime, purchase a second one-year and I’ll remove the expiration dates and make sure you get the 2022 Q&amp;As/Updates recordings with Lifetime access for free. It doesn’t matter if your original subscription has expired!</p>
<p style="text-align: justify;">Lifetime subscribers will also get the upcoming re-record of IECAG (clustering and AGs) for SQL Server 2025 for free, and Kimberly’s upcoming IESP (stored proc performance) for free.</p>
<p style="text-align: justify;">See our <a href="https://www.sqlskills.com/shop/" target="_blank" rel="noopener">shop</a> for details and <a href="mailto:paul@sqlskills.com?subject=December%20sale%20question%20from%20blog" target="_blank" rel="noopener">let me know</a> of any questions.</p>
<p style="text-align: justify;">Enjoy!</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/march-madness-699-blackbelt-bundle/">March Madness: $699 Blackbelt Bundle!</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/march-madness-699-blackbelt-bundle/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>The Curious Case of&#8230; finding long IAM chains</title>
		<link>https://www.sqlskills.com/blogs/paul/the-curious-case-of-finding-long-iam-chains/</link>
					<comments>https://www.sqlskills.com/blogs/paul/the-curious-case-of-finding-long-iam-chains/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Wed, 25 Feb 2026 02:36:27 +0000</pubDate>
				<category><![CDATA[Inside the Storage Engine]]></category>
		<category><![CDATA[The Curious Case of...]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5352</guid>

					<description><![CDATA[<p>In the previous Curious Case I described an issue Jonathan had at a client with very long IAM chains, and the circumstances leading to it. The question was how to prove that some allocation units had IAM chain lengths way out of proportion to the amount of data in the allocation unit, without tediously walking [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-finding-long-iam-chains/">The Curious Case of&#8230; finding long IAM chains</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;">In the <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/" target="_blank" rel="noopener">previous Curious Case</a> I described an issue Jonathan had at a client with very long IAM chains, and the circumstances leading to it.</p>
<p style="text-align: justify;">The question was how to prove that some allocation units had IAM chain lengths way out of proportion to the amount of data in the allocation unit, without tediously walking through each IAM chain, starting with the first IAM page (whose ID is always stored in <em>sys.allocation_units</em> internal table).</p>
<p style="text-align: justify;">The answer was to do exactly that, but remove the tedium by writing some nifty code to do it, making use of the <a href="https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-page-info-transact-sql?view=sql-server-ver17" target="_blank" rel="noopener"><em>sys.dm_db_page_info</em></a> DMF that was added in SQL Server 2019 instead of having to use <em>DBCC PAGE</em> with the results <em>INSERT &#8230; EXEC</em>&#8216;d into a table.</p>
<p style="text-align: justify;">(DMF? Yes, Dynamic Management Function. Remember &#8211; they&#8217;re all DMOs &#8211; Dynamic Management Objects &#8211; and either views or functions &#8211; DMVs or DMFs. DMVs just look up information but DMFs have to do some work. They&#8217;re just collectively called DMVs for simplicity.)</p>
<p style="text-align: justify;">Specifically, the answer was for Jonathan to write the nifty code :-) and here it is. Give it a whirl and let me know if you find any indexes with massive IAM chains compared to the number of data or index pages.</p>
<pre class="brush: sql; title: ; toolbar: true; wrap-lines: true; notranslate">
;WITH IAM_PAGES AS
(
    SELECT
        1 AS &#x5B;IAM_Page_Ordinal],
        P.&#x5B;object_id],
        P.&#x5B;index_id],
        P.&#x5B;partition_number],
        IAU.&#x5B;total_pages],
        IAU.&#x5B;used_pages],
        IAU.&#x5B;data_pages],
        IAM_Page.&#x5B;file_id],
        IAM_Page.&#x5B;page_id],
        &#x5B;pfs_page_id],
        &#x5B;gam_page_id],
        &#x5B;sgam_page_id],
        &#x5B;next_page_file_id],
        &#x5B;next_page_page_id],
        &#x5B;is_iam_page]
    FROM sys.partitions P
    INNER JOIN sys.system_internals_allocation_units AS IAU
        ON P.&#x5B;hobt_id] = IAU.&#x5B;container_id]
    OUTER APPLY sys.fn_PageResCracker (IAU.&#x5B;first_iam_page]) AS IAM_Page
    OUTER APPLY sys.dm_db_page_info (
            DB_ID (), IAM_Page.&#x5B;file_id], IAM_Page.&#x5B;page_id], &#039;DETAILED&#039;) AS Page_Info
        WHERE IAM_Page.&#x5B;page_id] &lt;&gt; 0 AND OBJECT_SCHEMA_NAME (P.&#x5B;object_id]) &lt;&gt; N&#039;sys&#039;
UNION ALL
    SELECT           
        &#x5B;IAM_Page_Ordinal] + 1,
        IAMP.&#x5B;object_id],
        IAMP.&#x5B;index_id],
        IAMP.&#x5B;partition_number],
        IAMP.&#x5B;total_pages],
        IAMP.&#x5B;used_pages],
        IAMP.&#x5B;data_pages],
        Page_Info.&#x5B;file_id],
        Page_Info.&#x5B;page_id],
        Page_Info.&#x5B;pfs_page_id],
        Page_Info.&#x5B;gam_page_id],
        Page_Info.&#x5B;sgam_page_id],
        Page_Info.&#x5B;next_page_file_id],
        Page_Info.&#x5B;next_page_page_id],
        Page_Info.&#x5B;is_iam_page]
    FROM IAM_PAGES AS IAMP
    OUTER APPLY sys.dm_db_page_info (
            DB_ID (), IAMP.&#x5B;next_page_file_id], IAMP.&#x5B;next_page_page_id], &#039;DETAILED&#039;) AS Page_Info
        WHERE IAMP.&#x5B;next_page_page_id] &lt;&gt; 0
),
IAM_Counts AS
(
    SELECT
        &#x5B;object_id],
        &#x5B;index_id],
        &#x5B;partition_number],
        &#x5B;total_pages],
        &#x5B;used_pages],
        &#x5B;data_pages],
        COUNT (*) AS &#x5B;IAM_Page_Count]
    FROM IAM_PAGES
    GROUP BY &#x5B;object_id], &#x5B;index_id], &#x5B;partition_number],
        &#x5B;total_pages], &#x5B;used_pages], &#x5B;data_pages]
)
SELECT * FROM IAM_Counts
WHERE &#x5B;data_pages] &lt; &#x5B;IAM_Page_Count]
--  AND &#x5B;object_id] = OBJECT_ID (&#039;Schema.TableName&#039;)
OPTION (MAXRECURSION 0);
GO
</pre>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-finding-long-iam-chains/">The Curious Case of&#8230; finding long IAM chains</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/the-curious-case-of-finding-long-iam-chains/feed/</wfw:commentRss>
			<slash:comments>5</slash:comments>
		
		
			</item>
		<item>
		<title>SQL101: Introduction to SQL Server Transactions</title>
		<link>https://www.sqlskills.com/blogs/paul/sql101-introduction-to-sql-server-transactions/</link>
					<comments>https://www.sqlskills.com/blogs/paul/sql101-introduction-to-sql-server-transactions/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Fri, 20 Feb 2026 22:30:38 +0000</pubDate>
				<category><![CDATA[SQL101]]></category>
		<category><![CDATA[Transaction Log]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5351</guid>

					<description><![CDATA[<p>(The original version of this post first appeared on the now-deleted SentryOne blog at the start of 2022.) One of the most fundamental concepts in any relational database management system (RDBMS), such as SQL Server, is the transaction. During my consulting career, I&#8217;ve seen many instances of performance problems caused by developers not understanding how [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-introduction-to-sql-server-transactions/">SQL101: Introduction to SQL Server Transactions</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><em>(The original version of this post first appeared on the now-deleted SentryOne blog at the start of 2022.)</em></p>
<p style="text-align: justify;">One of the most fundamental concepts in any relational database management system (RDBMS), such as SQL Server, is the transaction. During my consulting career, I&#8217;ve seen many instances of performance problems caused by developers not understanding how transactions work in SQL Server, so in this tutorial, I’ll explain what transactions are and why they’re necessary, plus some details of how they work in SQL Server. There are nuances to some of this when Accelerated Database Recovery (ADR) is in use &#8211; topics for future articles.</p>
<h2>What is a Transaction?</h2>
<p style="text-align: justify;">A transaction is a <em>unit of work</em> in the database. Every transaction has a defined starting point and a defined ending point. The ending point may be the transaction <em>committed</em> (i.e. completed successfully) or transaction finished <em>rolling back</em> (i.e. did not complete successfully), and I’ll discuss the meaning of those terms a little later.</p>
<p>The basic syntax for transactions is as follows:</p>
<ul>
<li style="text-align: justify;">BEGIN TRANSACTION (or BEGIN TRAN) starts the transaction.</li>
<li style="text-align: justify;">COMMIT TRANSACTION (or COMMIT TRAN) ends the transaction successfully.</li>
<li style="text-align: justify;">ROLLBACK TRANSACTION (or ROLLBACK TRAN) causes the transaction to end unsuccessfully so all operations performed in the transaction are reversed.</li>
</ul>
<p>You can also specify a transaction name but this is not required and I don&#8217;t see them used very often.</p>
<p style="text-align: justify;">It is important to think about what a <em>unit of work</em> is. It means all changes to the database within the confines of the transaction, typically DML data modification operations like insert statements, update statements, and delete statements. This might be a single T-SQL statement or multiple statements, depending on the kind of transaction being used. If it’s a single statement, it doesn’t necessarily mean a single change. Consider a table with 1,000 rows and someone runs an UPDATE statement with a WHERE clause. This is a single statement but will cause a change to all the rows in the table, at least 1,000 changes to the database within the transaction.</p>
<p style="text-align: justify;">Even if the UPDATE statement only operates on a single row in the table, there are still at least two changes to the database; the update of the row itself on a data file page and the update of the differential bitmap page to mark that portion of the database as changed so the next differential backup will back up the extent that the page is part of. There are plenty more examples I&#8217;ve seen over the years of where a single statement can cause many changes to the database depending on data types like varchar, table formats, whether nonclustered indexes exist, and so on.</p>
<h2>Why Are Transactions Necessary?</h2>
<p style="text-align: justify;">Transactions are part of how SQL Server implements the ACID properties of a database (Atomicity, Consistency, Isolation, and Durability), along with mechanisms like locking and logging.</p>
<p style="text-align: justify;">A transaction guarantees that its unit of work is either wholly present in the database or wholly not present. This is the atomicity in the ACID properties, and I’ll explain how this is done later. This means transactions are very useful for SQL Server developers to control whether a set of operations (e.g. implementing some business logic) completely succeeds or does not succeed at all, so there are no partially-executed sets of operations that would leave the database inconsistent from a business perspective.</p>
<p style="text-align: justify;">A classic example is moving money from a checking account to a deposit account. This involves a debit from the checking account and a credit to the deposit account. This must be implemented as a single transaction so that if the debit succeeds and the credit fails, the transaction as a whole fails and the debit is reversed when the transaction rolls back.</p>
<p style="text-align: justify;">In all cases, changes to the database are performed under locks held by the transaction and these locks are not released until the transaction ends. Using the default <em>isolation level</em>, which is called <em>read committed</em>, other transactions will not be able to see these changes until the transaction has committed (ended), hence the name of the isolation level. This is the isolation in the ACID properties.</p>
<p style="text-align: justify;">For instance, a change to a row will involve the row being exclusively locked by the transaction. Another transaction that wants to read the row will usually require a share lock on the row, and so will also be blocked. This behavior can be changed if the reading transaction changes to the <em>read uncommitted</em> isolation level (or uses the NOLOCK option on the SELECT statement), that doesn’t require share locks for reading rows, but introduces the possibility of anomalies occurring.</p>
<h2>Types of Transactions in SQL Server</h2>
<p style="text-align: justify;">There are three basic types of transactions in SQL Server:</p>
<ol style="text-align: justify;">
<li><em>Explicit</em> transactions, as the name suggests, must be explicitly started with a BEGIN TRANSACTION statement and explicitly ended with either a COMMIT TRANSACTION statement or a ROLLBACK TRANSACTION statement. In other words, the SQL Server developer controls when the unit of work is committed or not.</li>
<li><em>Autocommit</em> transactions are where the developer does not control the starting and ending points of the transaction. Each T-SQL statement is its own transaction that SQL Server begins and commits automatically under the covers. There is no concept of being able to make a change to a SQL Server database without a transaction being started, as SQL Server must have the ability to roll back the change if something goes wrong.</li>
<li><em>Implicit</em> transactions are when a transaction is automatically started by SQL Server as soon as a change is made to the database, but remains active until it is explicitly ended. At that point a new transaction is automatically started. This behavior is not the default and must be specifically enabled using a SET IMPLICIT_TRANSACTIONS statement, which is not normally done except to allow behavior compatibility with another RDBMS where this is the default behavior. I&#8217;ve seen this be a problem when developers don&#8217;t realize implicit transactions are enabled and don&#8217;t think they need to explicitly commit the transaction. More on that in the &#8216;common mistakes&#8217; section below.</li>
</ol>
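<p style="text-align: justify;">As a sketch of that implicit-transaction pitfall (the table and column names are hypothetical):</p>
<pre>SET IMPLICIT_TRANSACTIONS ON;

-- This statement silently starts a transaction...
UPDATE dbo.MyTable SET [LastName] = N'Randal' WHERE [RowID] = 1;

-- ...so @@TRANCOUNT is now 1, even though no BEGIN TRANSACTION was issued
SELECT @@TRANCOUNT AS [Open_Transaction_Count];

-- Without an explicit COMMIT, the transaction (and its locks) stays open
COMMIT TRANSACTION;
SET IMPLICIT_TRANSACTIONS OFF;</pre>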
<p style="text-align: justify;">With all three of these transaction types, if SQL Server encounters a problem the entire transaction will automatically roll back.</p>
<p style="text-align: justify;">There are also two more advanced kinds of transactions in SQL Server, which are beyond the scope of this article:</p>
<ol>
<li style="text-align: justify;"><em>Batch-scoped</em> transactions, which are only used during Multiple Active Result Sets sessions.</li>
<li style="text-align: justify;"><em>Distributed</em> transactions, which are used when a local transaction needs to coordinate with multiple SQL Server instances, for instance to run stored procedures with business logic on different servers. This is done using a Distributed Transaction Coordinator, or by the service itself on Azure Managed Instance.</li>
</ol>
<h2>How Does Commit Work in SQL Server?</h2>
<p>Consider a very simple example of an explicit transaction that inserts a record into a table, using the code:</p>
<pre>BEGIN TRANSACTION;
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (1, 'Paul', 'Randal');
COMMIT TRANSACTION;</pre>
<p style="text-align: justify;">The insert statement causes some locks to be acquired, which provides the isolation portion of the ACID properties of a database. These will only be released once the transaction has committed. When the COMMIT TRANSACTION statement is executed, I know that the insert is now durable. How does this actually happen?</p>
<p style="text-align: justify;">All changes to a database are <em>logged</em>. Simply put, this means that when a change is made to a data file page, a description of the change is generated, called a <em>log record</em> and entered into the database transaction log. Also when a transaction begins, there is a log record generated and another when a transaction commits. This means our simple explicit transaction will have three log records in the transaction log, all with the same transaction ID, one for each of the three statements executed. In fact, if I’d used an autocommit transaction instead of an explicit transaction (executing just the insert statement), SQL Server would have automatically started and committed the transaction and there would still be three log records in the transaction log for the transaction. One interesting fact you might not know is that SQL Server usually names transactions that it starts; in this case it would have been named simply &#8216;INSERT&#8217;.</p>
<p style="text-align: justify;">When a transaction commits, SQL Server has to make sure that all the log records for the transaction are in the transaction log on disk and not just in memory, so in the event of a crash, the transaction can be replayed, guaranteeing its durability. It does this by making sure that all the transaction log in memory up to the log record for the COMMIT TRANSACTION is flushed to disk before the commit is acknowledged back to the user or application. The sequence of operations when a commit occurs is:</p>
<ul style="text-align: justify;">
<li>Make sure the log is flushed to disk</li>
<li>If there is a synchronous database mirror or synchronous availability group replica, make sure the log is also written to disk for their log files on the remote servers</li>
<li>Release the locks the transaction is holding</li>
<li>Acknowledge the commit has happened</li>
</ul>
<p style="text-align: justify;">There is no need to also flush the changed data file pages to disk at this point, as the transaction has already been made durable by making sure the description of all the changes are on disk. The data file pages will be written out later by a checkpoint operation – a topic for a future article.</p>
<h2>How Does Rollback Work in SQL Server?</h2>
<p style="text-align: justify;">When a transaction must be rolled back, all operations that were part of the transaction must be essentially reversed so none of the data modifications from the transaction are present in the database. This is done using the transaction log, as the log records for a transaction are linked together in reverse order and this allows the transaction’s changes to be undone in reverse order.</p>
<p>Consider another simple example:</p>
<pre>BEGIN TRANSACTION;
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (1, 'Paul', 'Randal');
INSERT INTO
     MyDatabase.dbo.MyTable
VALUES (2, 'Kimberly', 'Tripp');</pre>
<p style="text-align: justify;">At this point there are three log records for the transaction. If I then decide to execute a rollback command, SQL Server does the following:</p>
<ol style="text-align: justify;">
<li>Find the most recent log record for the ‘forward’ part of the transaction, work out what operation will undo the change described by the log record, perform the operation, and generate a log record.</li>
<li>Find the previous log record, pointed to by the ‘previous log record’ LSN.</li>
<li>Repeat until the begin log record is reached. At this point the rollback has been completed, so another log record is generated that indicates that the transaction has successfully aborted.</li>
</ol>
<p style="text-align: justify;">This will generate three more log records for my example. As you can see, rolling back a transaction takes a lot of work under the covers.</p>
<p style="text-align: justify;">It is also possible to define a <em>savepoint</em> using the SAVE TRANSACTION statement and roll back to that named point in the transaction rather than rolling the entire transaction back.</p>
<h2>Common Mistakes That Can Cause Transaction Log Problems</h2>
<p>The first mistake is to forget to commit a transaction. This means everything that subsequently happens on that connection is part of the same transaction. As more changes are made, more log records are generated and more transaction log space is required. The space used to store log records from earlier in the transaction cannot be reused (i.e. allowing the log to <em>truncate</em>), as those log records must remain in case the transaction rolls back (and they’re needed for the mechanism I described above). The transaction log will likely grow… and grow… and grow, until someone finally commits the long-running transaction and allows the log to be brought back under control.</p>
<p>The second mistake is to inadvertently execute some code that does a lot more work than you thought, for instance performing an update on a very large table (e.g. a billion rows) and forgetting a WHERE clause. For every row that’s updated, there’s at least one log record generated so there will be at least a billion log records generated for the transaction and that will likely cause explosive transaction log growth. A DBA that doesn’t know how rollback works might be tempted to immediately cancel the update. But a knowledgeable DBA will know that rolling back a very long-running transaction will generate at least the same number of log records as have already been generated, taking a lot more time, and may decide that the prudent course of action is to let the update complete.</p>
<p>If you have a transaction log that is seemingly growing out of control, you can see why by running this code:</p>
<pre>SELECT
     [log_reuse_wait_desc]
FROM [master].[sys].[databases]
WHERE
    [name] = N'MyDatabase';</pre>
<p>If one of these two mistake scenarios is the culprit, the output will look like this:</p>
<pre>log_reuse_wait_desc
-------------------
ACTIVE_TRANSACTION</pre>
<p>If not, you can read about the other possible values and what they mean in the Microsoft documentation <a href="https://docs.microsoft.com/en-us/sql/relational-databases/logs/the-transaction-log-sql-server?view=sql-server-ver15">here</a>, in the section <em>Factors that can delay log truncation</em>.</p>
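<p>To track down the transaction that&#8217;s holding up log truncation, one option is <em>DBCC OPENTRAN</em>, which reports the oldest active transaction in the database, or you can query the transaction DMVs:</p>
<pre>-- Report the oldest active transaction, its start time, and its session
DBCC OPENTRAN (N'MyDatabase');

-- Or list all active transactions in the database, with log space used
SELECT
    ST.[session_id],
    DT.[transaction_id],
    DT.[database_transaction_begin_time],
    DT.[database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions AS DT
INNER JOIN sys.dm_tran_session_transactions AS ST
    ON DT.[transaction_id] = ST.[transaction_id]
WHERE DT.[database_id] = DB_ID (N'MyDatabase');</pre>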
<h2>Importance of Understanding Transactions in SQL Server</h2>
<p>Not only is it important to understand what transactions are and to design your code so it appropriately implements your business logic, it is also important to understand some of the internals I’ve described, as mistakes can cause problems for database administrators. In my experience, database administrators often need to know how transactions work, and what mistakes can be made, so they can troubleshoot issues around the transaction log.</p>
<p>There are many more facets to using transactions, such as specifying isolation levels and designing efficient code, but I hope this initial primer has given you a good grounding in why transactions are needed and how they work. I know it&#8217;s a bit of a cliche, but with SQL Server, it&#8217;s definitely a case of the more you know, the further you&#8217;ll go!</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/sql101-introduction-to-sql-server-transactions/">SQL101: Introduction to SQL Server Transactions</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/sql101-introduction-to-sql-server-transactions/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>The SQL Server Transaction Log, Part 2: Log Architecture</title>
		<link>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/</link>
					<comments>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Thu, 19 Feb 2026 22:38:14 +0000</pubDate>
				<category><![CDATA[Transaction Log]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5347</guid>

					<description><![CDATA[<p>(This post first appeared on SQLperformance.com four years ago as part of a blog series, before that website was mothballed later in 2022 and the series was curtailed. Reposted here with permission, with a few tweaks.) In the first part of this series I introduced basic terminology around logging so I recommend you read that [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/">The SQL Server Transaction Log, Part 2: Log Architecture</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;"><em>(This post first appeared on SQLperformance.com four years ago as part of a blog series, before that website was mothballed later in 2022 and the series was curtailed. Reposted here with permission, with a few tweaks.)</em></p>
<p style="text-align: justify;">In the <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-1-logging-basics/" target="_blank" rel="noopener">first par</a>t of this series I introduced basic terminology around logging so I recommend you read that before continuing with this post. Everything else I’ll cover in the series requires knowing some of the architecture of the transaction log, so that’s what I’m going to discuss this time. Even if you’re not going to follow the series, some of the concepts I’m going to explain below are worth knowing for everyday tasks that DBAs handle in production.</p>
<p style="text-align: justify;">Note: as I progress through this series and talk about aspects of the log, there are often little edge-cases or weird behaviors in niche circumstances that have been added or changed over the years. I&#8217;ll ignore those unless I specifically want to call them out, otherwise the posts would be riddled with rat-holes and mazes of twisty-turny little passages (yes, I loved &#8217;80s text-based adventure games :-) that would distract from the main things to learn about in each post.</p>
<h2>Structural Hierarchy</h2>
<p>The transaction log is internally organized using a three-level hierarchy as shown in figure 1 below.</p>
<p><a href="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/02/log1.png"><img loading="lazy" decoding="async" class="alignnone wp-image-5348 size-full" src="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/02/log1.png" alt="" width="704" height="431" srcset="https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/02/log1.png 704w, https://www.sqlskills.com/blogs/paul/wp-content/uploads/2026/02/log1-300x184.png 300w" sizes="(max-width: 704px) 100vw, 704px" /></a></p>
<p><em>(Figure 1: The three-level structural hierarchy of the transaction log (my image))</em></p>
<p>The transaction log contains a number of virtual log files, which contain log blocks, which store the actual log records.</p>
<h2>Virtual Log Files</h2>
<p style="text-align: justify;">The transaction log is split up into sections called <em>virtual log files</em>, commonly just called <em>VLFs</em>. This is done to make managing operations in the transaction log easier for the log manager in SQL Server. You can’t specify how many VLFs are created by SQL Server when the database is first created or the log file automatically grows, but you can influence it. The algorithm for how many VLFs are created is as follows:</p>
<ul style="text-align: justify;">
<li>Log file size less than 64MB: create 4 VLFs, each roughly 1/4 of the total size</li>
<li>Log file size from 64MB to 1GB: create 8 VLFs, each roughly 1/8 of the total size</li>
<li>Log file size greater than 1GB: create 16 VLFs, each roughly 1/16 of the total size</li>
</ul>
<p style="text-align: justify;">Prior to SQL Server 2014, when the log file auto grows, the number of new VLFs added to the end of the log file is determined by the algorithm above, based on the auto-grow size. However, using that algorithm, if the auto-grow size is small, and the log file undergoes many auto-growths, that can lead to a very large number of small VLFs (called<em> VLF fragmentation</em>) that can be a big performance issue for some operations (see <a href="https://sqlperformance.com/2013/02/system-configuration/transaction-log-configuration">here</a>).</p>
<p style="text-align: justify;">Due to this problem, in SQL Server 2014 the algorithm changed for auto-growth of the log file. If the auto-grow size is less than 1/8 of the total log file size, only 1 new VLF is created, otherwise the old algorithm is used. This drastically reduces the number of VLFs for a log file that has undergone a large amount of auto-growth and I explained an example of the difference in <a href="https://www.sqlskills.com/blogs/paul/important-change-vlf-creation-algorithm-sql-server-2014/">this blog post</a>.</p>
<p style="text-align: justify;">Each VLF has a <em>sequence number</em> that uniquely identifies it, and is used in a variety of places, which I’ll explain below and in future posts. You would think that the sequence numbers would start at 1 for a brand new database, but that is not the case.</p>
<p style="text-align: justify;">On a SQL Server 2019 instance, I created a new database, without specifying any file sizes, and then checked the VLFs using the code below:</p>
<pre>CREATE DATABASE NewDB;
GO

SELECT
    [file_id],
    [vlf_begin_offset],
    [vlf_size_mb],
    [vlf_sequence_number]
FROM
    sys.dm_db_log_info (DB_ID (N'NewDB'));
GO</pre>
<p style="text-align: justify;">Note that the <em>sys.dm_db_log_info</em> DMV was added in SQL Server 2016 SP2. Before that (and also today, because it still exists) you can use the undocumented <em>DBCC LOGINFO</em> command, but you can’t give it a select list – just do <em>DBCC LOGINFO (N’NewDB’)</em> and the VLF sequence numbers are in the <em>FSeqNo</em> column of the result set.</p>
<p>Anyway, the results from querying<em> sys.dm_db_log_info </em>were:</p>
<pre>file_id vlf_begin_offset vlf_size_mb vlf_sequence_number
------- ---------------- ----------- -------------------
2       8192             1.93        37
2       2039808          1.93        0
2       4071424          1.93        0
2       6103040          2.17        0</pre>
<p style="text-align: justify;">Note that the first VLF starts at offset 8,192 bytes into the log file. This is because all database files, including the transaction log, have a file header page that takes up the first 8KB, and stores various metadata about the file.</p>
<p style="text-align: justify;">So why does SQL Server pick 37 and not 1 for the first VLF sequence number? The answer is that it finds the highest VLF sequence number in the <em>model</em> database and then for any new database, the transaction log’s first VLF uses that number plus 1 for its sequence number. I don’t know why that algorithm was chosen back in the mists of time, but it’s been that way since at least SQL Server 7.0.</p>
<p>To prove it, I ran this code:</p>
<pre>SELECT
    MAX ([vlf_sequence_number]) AS [Max_VLF_SeqNo]
FROM
    sys.dm_db_log_info (DB_ID (N'model'));
GO</pre>
<p>And the results were:</p>
<pre>Max_VLF_SeqNo
-------------
36</pre>
<p>So there you have it.</p>
<p style="text-align: justify;">There’s more to discuss about VLFs and how they’re used, but for now it’s enough to know that each VLF has a sequence number, which increases by one for each VLF.</p>
<h2 style="text-align: justify;">Log Blocks</h2>
<p style="text-align: justify;">Each VLF contains a small metadata header and the rest of the space is filled with log blocks. Each log block starts out at 512 bytes and will grow in 512-byte increments to a maximum size of 60KB, at which point it must be written to disk. A log block might be written to disk before it reaches its maximum size if one of the following occurs:</p>
<ul style="text-align: justify;">
<li>A transaction commits, and delayed durability is not being used for this transaction, so the log block must be written to disk so the transaction is durable</li>
<li>Delayed durability is in use and the background ‘flush the current log block to disk’ 1ms timer task fires</li>
<li>A data file page is being written to disk by a checkpoint or the lazy writer, and there are one or more log records in the current log block that affect the page that’s about to be written (remember write-ahead logging must be guaranteed)</li>
</ul>
<p style="text-align: justify;">You can consider a log block as something like a variable-sized page that stores log records in the order that they’re created by transactions changing the database. There isn’t a log block for each transaction; the log records for multiple concurrent transactions can be intermingled in a log block. You might think this would present difficulties for operations that need to find all the log records for a single transaction, but it doesn’t, as I’ll explain when I cover how transaction rollbacks work in a later post.</p>
<p style="text-align: justify;">Furthermore, when a log block is written to disk, it’s entirely possible that it contains log records from uncommitted transactions. This also is not a problem because of the way crash recovery works – which is a good few posts in the series future.</p>
<h2>Log Sequence Numbers</h2>
<p style="text-align: justify;">Log blocks have an ID within a VLF, starting at 1 and increasing by 1 for each new log block in the VLF. Log records also have an ID within a log block, starting at 1 and increasing by 1 for each new log record in the log block. So all three elements in the structural hierarchy of the transaction log have an ID and they are pulled together into a tripartite identifier called a <em>log sequence number</em>, more commonly referred to simply as an <em>LSN</em>.</p>
<p style="text-align: justify;">An LSN is defined as &lt;VLF sequence number&gt;:&lt;log block ID&gt;:&lt;log record ID&gt; (4 bytes: 4 bytes: 2 bytes) and uniquely identifies a single log record. It’s an ever-increasing identifier, because the VLF sequence numbers increase forever.</p>
<h2 style="text-align: justify;">Groundwork Done!</h2>
<p style="text-align: justify;">While VLFs are important to know about, in my opinion the LSN is the most important concept to understand around SQL Server’s implementation of logging as LSNs are the cornerstone on which transaction rollback and crash recovery are built, and LSNs will crop up again and again as I progress through the series. In the next post I’ll cover log truncation and the circular nature of the transaction log, which is all to do with VLFs and how they get reused.</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/">The SQL Server Transaction Log, Part 2: Log Architecture</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/the-sql-server-transaction-log-part-2-log-architecture/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Achievement Unlocked: One Year Sober</title>
		<link>https://www.sqlskills.com/blogs/paul/achievement-unlocked-one-year-sober/</link>
					<comments>https://www.sqlskills.com/blogs/paul/achievement-unlocked-one-year-sober/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Tue, 17 Feb 2026 21:32:10 +0000</pubDate>
				<category><![CDATA[Personal]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5346</guid>

					<description><![CDATA[<p>(This is from the Insider Newsletter that I sent out earlier today&#8230;) Back in early 2023 I wrote a long blog post about my struggles with alcohol and how I’d decided to stop. I made it a few months, and then several times on and off after that, in the classic struggle to actually stop [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/achievement-unlocked-one-year-sober/">Achievement Unlocked: One Year Sober</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>(This is from the Insider Newsletter that I sent out earlier today&#8230;)</p>
<p style="text-align: justify;">Back in early 2023 I wrote a <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-alcoholism/">long blog post</a> about my struggles with alcohol and how I’d decided to stop. I made it a few months, and then several times on and off after that, in the classic struggle to actually stop and stay stopped, until February last year, when it finally stuck.</p>
<p style="text-align: justify;">This Friday (2/20/26) I’ll be sober for a year.</p>
<p style="text-align: justify;">If you struggle yourself, or know anyone who struggles with alcohol problems, you’ll know that a full trip around the sun sober is a major milestone and a significant mental achievement.</p>
<p style="text-align: justify;">Why am I telling you this? I’m not bragging or looking for affirmation. I was public here about my struggle back in 2023 because I wanted people in the SQL community who were struggling to know that they’re not alone and it’s nothing to be ashamed of (and boy, was I surprised by the number of ‘me too!’ emails). And now I’m being public about how I failed to stop a few times but persevered and eventually hit on the combination of things that let me stop and stay stopped.</p>
<p style="text-align: justify;">Perseverance was the key. Eventually I figured out:</p>
<ul style="text-align: justify;">
<li>A. A. just wasn’t for me. I’m not a ‘group chat/extemporaneous sharing’ person. It was great back in 2023 for the structure it provided over the first couple of months, but later I found myself not looking forward to meetings.</li>
<li>I needed substitutes. I found Athletic IPA, which tastes just like the real thing but is non-alcoholic, and (along with Heineken Zero) is pretty ubiquitous in bars and restaurants. Chardonnay used to be my go-to, but non-alcoholic wine&#8230; &lt;yuck&gt; I also found Free Spirits ‘bourbon’, so I can have my late-night ‘clinky drink’ while I’m reading. And often around 5-6pm I’ll go for a drive for a bit, as that was my usual ‘start time’.</li>
</ul>
<p style="text-align: justify;">So far, so good – this is working for me nicely!</p>
<p style="text-align: justify;">If you’re struggling to meet a goal, don’t give up. The horribly clichéd proverb ‘if at first you don’t succeed, try, try again’ is very true. And if you’re struggling to give up alcohol, or any other addiction, don’t lose hope. You can do it. And there are people who care that you do.</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/achievement-unlocked-one-year-sober/">Achievement Unlocked: One Year Sober</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/achievement-unlocked-one-year-sober/feed/</wfw:commentRss>
			<slash:comments>24</slash:comments>
		
		
			</item>
		<item>
		<title>The Curious Case of… occasional query failure on a tiny table</title>
		<link>https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/</link>
					<comments>https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/#comments</comments>
		
		<dc:creator><![CDATA[Paul Randal]]></dc:creator>
		<pubDate>Tue, 17 Feb 2026 03:16:11 +0000</pubDate>
				<category><![CDATA[Indexes From Every Angle]]></category>
		<category><![CDATA[Inside the Storage Engine]]></category>
		<category><![CDATA[The Curious Case of...]]></category>
		<guid isPermaLink="false">https://www.sqlskills.com/blogs/paul/?p=5344</guid>

					<description><![CDATA[<p>This is a case that happened on a client system last year: occasionally a common query on a tiny table appeared to &#8216;hang&#8217; and had to be killed and re-run. What&#8217;s going on? The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, [&#8230;]</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/">The Curious Case of… occasional query failure on a tiny table</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p style="text-align: justify;">This is a case that happened on a client system last year: occasionally a common query on a tiny table appeared to &#8216;hang&#8217; and had to be killed and re-run. What&#8217;s going on?</p>
<p style="text-align: justify;">The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, and the query usually ran in a few seconds, but occasionally the query would &#8216;hang&#8217; and would either be killed or take tens of minutes to run. Troubleshooting instrumentation when the issue happened showed no out-of-the-ordinary waits occurring, no pressure on the server, and the query plan generated when the query took a long time was essentially the same.</p>
<p style="text-align: justify;">The only noticeable thing was that when the problem occurred, a column statistics update happened as part of query compilation – but with such a tiny table, how could that be the root cause of the issue? The calculated disk space for the row size and count worked out to be about 250MB, yet even with a statistics sample rate of only 4%, extended events showed an auto_stats event taking close to an hour!</p>
<p style="text-align: justify;">Further investigation showed that although the table only had a few hundred MB of data in it, it was taking up more than 25GB of disk space! Jonathan and I were stumped as to how that could be the case. True, there had been bugs in earlier versions of SQL Server, such as with LOB data types where only one page from each dedicated extent (of 8 pages) was actually used, but none were currently known, and even something like that wouldn&#8217;t account for a table taking up more than 100x more disk space than data in the table.</p>
<p style="text-align: justify;">Even more curious, all the space was in a nonclustered index. How could *that* be possible?</p>
<p style="text-align: justify;">I suggested looking at which pages were actually allocated to the table. Bingo! 25GB of IAM pages (one-per-4GB allocation bitmaps) were allocated to that single nonclustered index.</p>
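<p style="text-align: justify;">A query along the following lines can expose that kind of IAM-page bloat. This is just a sketch (not the script we used), using the undocumented <em>sys.dm_db_database_page_allocations</em> DMF (SQL Server 2012 and later, unsupported), and the table name is a placeholder:</p>

```sql
-- Sketch: count IAM pages per index and partition for a single table.
-- sys.dm_db_database_page_allocations is undocumented/unsupported, and
-- dbo.MyTable is a hypothetical name - substitute the real table.
SELECT
    index_id,
    partition_id,
    COUNT (*) AS [IAM pages]
FROM sys.dm_db_database_page_allocations (
    DB_ID (), OBJECT_ID (N'dbo.MyTable'), NULL, NULL, 'LIMITED')
WHERE is_iam_page = 1
GROUP BY index_id, partition_id
ORDER BY [IAM pages] DESC;
```

<p style="text-align: justify;">In 'LIMITED' mode the DMF only processes allocation pages rather than reading every page in the table, so it’s the cheaper way to pull out allocation information like this.</p>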
<p style="text-align: justify;">Further investigation showed that the nonclustered index had thousands of partitions, each with a handful of index pages and around 1,500 IAM pages. The nonclustered index leading key was a ROWVERSION column and the churn rate on the table was very high, so essentially the schema and usage pattern were creating a long IAM chain for each partition, with hardly any data.</p>
<p style="text-align: justify;">In a nutshell, this meant a statistics update on a column covered by the nonclustered index would have needed to read and process 25GB of IAM pages, looking for allocated extents to then process the index records from to produce the statistic.</p>
<p style="text-align: justify;">Solution? The initial super-quick fix was to drop and recreate the nonclustered index to remove the long IAM chains (rather than having an index rebuild have to go through the same, laborious task of reading all the IAM pages!), and then to implement regular index maintenance to prevent the IAM chains from becoming long in the first place.</p>
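<p style="text-align: justify;">As a rough sketch of that quick fix (all object names here are hypothetical – your index definition and partition scheme will differ):</p>

```sql
-- Sketch with hypothetical names: drop and recreate the bloated index
-- rather than rebuilding it, as a rebuild would have to wade through the
-- same long IAM chains that caused the problem in the first place.
DROP INDEX IX_MyTable_RowVer ON dbo.MyTable;
GO

CREATE NONCLUSTERED INDEX IX_MyTable_RowVer
ON dbo.MyTable (RowVer)
ON ps_MyPartitionScheme (PartitionCol);  -- same partition scheme as before
GO
```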
<p>PS I’ll blog the script we used to prove it next week.</p>
<p>The post <a href="https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/">The Curious Case of… occasional query failure on a tiny table</a> appeared first on <a href="https://www.sqlskills.com/blogs/paul">Paul S. Randal</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.sqlskills.com/blogs/paul/the-curious-case-of-occasional-query-failure-on-a-tiny-table/feed/</wfw:commentRss>
			<slash:comments>3</slash:comments>
		
		
			</item>
	</channel>
</rss>
