“SQL Server Is Slow”, Part 4 of 4
Wrapping up the series by digging into the guts of SQL Server

Parts 1, 2 and 3 got you to the (SQL) engine room. Now we use community-trusted tools to find what’s going on, fix it safely, and hopefully keep it from coming back.
This post leans heavily on the First Responder Kit from Brent Ozar, sp_WhoIsActive from Adam Machanic, and other community tools. They took what Microsoft provides and made it better: hundreds, if not thousands, of hours of work to make FREE things for you to use.
This is the most complex newsletter I’ve ever sent. Your experiences may differ, and my code samples might have some bugs or formatting issues from pasting here. Test first.
The Fork in the Road
After ruling out all the previous items from parts 1, 2, and 3, you’ll probably land in one of two branches:
Branch A: Obvious & Fixable Now - A misconfiguration, a runaway query, or an ugly blocking chain.
Branch B: Systemic & Chronic - Indexing issues, bad query plans, stats/CE/database compatibility changes, or a new workload exposing weaknesses.
First touch (rank evidence before touching anything)
Run these back-to-back to see “right now” and “what changed”:
-- 60s wait/CPU/IO snapshot with expert details
EXEC sp_BlitzFirst @Seconds = 60, @ExpertMode = 1;
-- Active requests + blockers + plan handles (save to table if you’ll compare)
EXEC sp_WhoIsActive
@get_plans = 1,
@get_additional_info = 1,
@find_block_leaders = 1;
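If you plan to compare before and after, capture the output to a table using sp_WhoIsActive’s @return_schema/@destination_table parameters. A sketch; the table name DBA.dbo.WhoIsActive_Log is just an example, and the parameter set on the logging call must match the schema-generation call:

```sql
-- One-time setup: generate and create a matching destination table
DECLARE @schema VARCHAR(MAX);
EXEC sp_WhoIsActive @get_plans = 1, @find_block_leaders = 1,
     @return_schema = 1, @schema = @schema OUTPUT;
SET @schema = REPLACE(@schema, '<table_name>', 'DBA.dbo.WhoIsActive_Log');
EXEC (@schema);

-- Repeatable: log a snapshot to the table instead of the grid
EXEC sp_WhoIsActive @get_plans = 1, @find_block_leaders = 1,
     @destination_table = 'DBA.dbo.WhoIsActive_Log';
```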
If the server feels resource constrained (CPU/memory/tempdb/log), confirm with Erik Darling’s:
EXEC sp_PressureDetector;
Check Query Store (if enabled) for regressed queries over the last few hours. This is database level. Database Z that nobody thinks about could be sitting in the corner making life miserable for Database A.
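If Query Store is enabled, a rough sketch for surfacing the heaviest queries of the last few hours follows. Run it in each suspect database; the 4-hour window and TOP 20 are assumptions to tune:

```sql
-- Top queries by total CPU over the last 4 hours, from Query Store
SELECT TOP 20
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions)                   AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time_us
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_runtime_stats_interval i
    ON rs.runtime_stats_interval_id = i.runtime_stats_interval_id
JOIN sys.query_store_plan p        ON rs.plan_id = p.plan_id
JOIN sys.query_store_query q       ON p.query_id = q.query_id
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
WHERE i.start_time >= DATEADD(HOUR, -4, SYSUTCDATETIME())
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_cpu_time_us DESC;
```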
Branch A: Obvious & Fixable Now (surgical)
A1) Confirm the bad actor (copy/paste or screenshot what you find)
sp_WhoIsActive: highest CPU/reads/duration, blocking leaders, tempdb usage.
You may get lucky and see an obvious long-running query blocking everything or hogging CPU.
sp_BlitzFirst (0s mode) for a quick wait profile on demand:
EXEC sp_BlitzFirst @Seconds = 0, @ExpertMode = 1;
A2) Get permission & document the plan
Who’s affected, what’s the risk, what’s the rollback.
A3) Take the action (one change at a time)
Kill a true runaway query (after sign-off):
KILL <session_id>;
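Before killing anything, confirm what the session is actually doing, and remember that KILL triggers a rollback that can take as long as the work already done. A sketch; session id 53 is a placeholder:

```sql
-- Inspect the session before you kill it
SELECT session_id, status, command, wait_type, blocking_session_id,
       percent_complete, total_elapsed_time
FROM sys.dm_exec_requests
WHERE session_id = 53;   -- placeholder session id

KILL 53;                 -- after sign-off
KILL 53 WITH STATUSONLY; -- then check rollback progress
```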
Fix low-hanging config already validated in earlier parts: MAXDOP, Cost Threshold, auto-close/auto-shrink OFF, compatibility/CE sanity.
-- Remove the specific plan from the cache (sample)
DBCC FREEPROCCACHE (0x060006001ECA270EC0215D05000000000000000000000000);
UPDATE STATISTICS dbo.TableName WITH FULLSCAN; -- or sampled if huge
This removes the targeted plan from the cache, and the stats update invalidates plans that reference the table, so newly compiled plans will at least have fresh statistics on that table.
A4) Re-measure immediately
EXEC sp_BlitzFirst @Seconds = 30, @ExpertMode = 1;
Goal: waits normalize, blockers drop, and users confirm latency relief.
Branch B: Systemic & Chronic (rank → fix → re-measure)
B1) Rank the biggest fish
EXEC sp_BlitzFirst @Seconds = 0, @ExpertMode = 1; -- overall pain profile
EXEC sp_BlitzCache @SortOrder = 'cpu'; -- try 'reads', 'avg cpu', 'memory grant', 'avg duration', etc.
EXEC sp_PressureDetector; -- resource pressure confirmation
Use Query Store timelines to spot regressions and plan churn.
B2) Indexing reality check
EXEC sp_BlitzIndex @DatabaseName = 'YourDB', @Mode = 0; -- database-wide health
Cross-check with native DMVs when you need specifics:
Top impact missing indexes (advisory; validate!)
EXEC sp_BlitzIndex @Mode = 3, @GetAllDatabases = 1, @BringThePain = 1; -- missing index requests, all databases
Unused/rarely used indexes (drop/consolidate after monitoring)
-- Unused Index Script
-- Original author: Pinal Dave
-- Edit the final WHERE clause to suit your situation
-- Note: index usage stats reset at service restart, so judge "unused" over a long uptime window
CREATE TABLE #Unused (
    [Database]      varchar(1000),
    [Table]         varchar(500),
    [IndexName]     varchar(500),
    [IndexID]       bigint,
    [UserSeek]      bigint,
    [UserScans]     bigint,
    [UserLookups]   bigint,
    [UserUpdates]   bigint,
    [TableRows]     bigint,
    [Impact]        bigint,
    [DropStatement] varchar(1500)
);

EXEC sp_MSforeachdb
'USE [?];
INSERT #Unused
SELECT
      DB_NAME(DB_ID())            AS [Database]
    , o.name                      AS [Table]
    , i.name                      AS IndexName
    , i.index_id                  AS IndexID
    , dm_ius.user_seeks           AS UserSeek
    , dm_ius.user_scans           AS UserScans
    , dm_ius.user_lookups         AS UserLookups
    , dm_ius.user_updates         AS UserUpdates
    , p.TableRows
    , dm_ius.user_updates * p.TableRows AS Impact
    , ''DROP INDEX '' + QUOTENAME(i.name)
        + '' ON '' + QUOTENAME(DB_NAME(DB_ID())) + ''.''
        + QUOTENAME(s.name) + ''.''
        + QUOTENAME(OBJECT_NAME(dm_ius.object_id)) AS DropStatement
FROM sys.dm_db_index_usage_stats dm_ius
INNER JOIN sys.indexes i
    ON i.index_id = dm_ius.index_id
   AND dm_ius.object_id = i.object_id
INNER JOIN sys.objects o ON dm_ius.object_id = o.object_id
INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) AS TableRows, p.index_id, p.object_id
            FROM sys.partitions p
            GROUP BY p.index_id, p.object_id) p
    ON p.index_id = dm_ius.index_id
   AND dm_ius.object_id = p.object_id
WHERE OBJECTPROPERTY(dm_ius.object_id, ''IsUserTable'') = 1
  AND dm_ius.database_id = DB_ID()
  AND i.type_desc = ''NONCLUSTERED''
  AND i.is_primary_key = 0
--AND i.is_unique_constraint = 0
--AND o.name IN (''CloverSummary'')
';

SELECT *
FROM #Unused
WHERE 1 = 1
--AND [IndexName] LIKE '%_DDBA'
--AND [IndexName] IN ('')
--AND [Database] NOT IN ('msdb', 'tempdb')
--AND [Database] IN ('StackOverflow')
--AND UserSeek + UserScans + UserLookups < 1
--AND [Table] IN ('')
ORDER BY [Database] ASC, UserSeek + UserScans + UserLookups, Impact DESC;

DROP TABLE #Unused;
B3) Bad Queries (identify → inspect plan → fix → handoff)
When is a query “bad”? Probably when it ranks in your top 10 by total CPU/reads/duration in the target window, has high average cost with high execution counts, triggers spills or big memory grants, blocks others, or has simply regressed in Query Store.
Using the @SortOrder parameter, also check 'memory grant' and 'avg duration'.
Find them fast:
EXEC sp_BlitzCache @SortOrder = 'cpu'; -- or 'reads', 'avg cpu', 'avg duration', etc.
EXEC sp_WhoIsActive @get_plans = 1;
Classic face-palm: Key Lookup from “one more column”
What happens: a nonclustered index satisfies the seek but doesn’t cover a column recently added to the SELECT, so the engine performs one lookup per row. This shows up in the query plan as a ‘Key Lookup’ (clustered table) or ‘RID Lookup’ (heap) operator.
Why it hurts: on hot paths, that lookup loop can hammer CPU/IO and tank a busy server.
Fix choices: INCLUDE the missing column(s) on the used index, reshape keys to cover, or trim the SELECT if it isn’t needed.
Quick hunt for plans with lookups:
-- Showplan XML is namespaced; without this declaration exist() never matches
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP 50
    DB_NAME(qp.dbid) AS dbname,
    (qs.total_logical_reads * 1.0 / NULLIF(qs.execution_count, 0)) AS avg_reads,
    qs.execution_count, qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qp.query_plan.exist('//IndexScan[@Lookup="1"]') = 1
   OR qp.query_plan.exist('//RelOp[@LogicalOp="RID Lookup"]') = 1
ORDER BY avg_reads DESC;
Typical covering fix (example):
CREATE NONCLUSTERED INDEX IX_Sales_OrderDate_CustomerID
ON dbo.Sales (OrderDate, CustomerID)
INCLUDE (TotalDue); -- add the “new SELECT column” here
Other common suspects: implicit conversions on join/filter columns, parameter sniffing, spills from oversized sorts/hashes, scalar UDFs/multi-stmt TVFs, RBAR triggers.
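Implicit conversions, for example, leave a PlanAffectingConvert warning in the plan XML, so you can hunt them in the cache much like the lookup query above. A sketch; the 'Seek Plan' filter targets conversions that block index seeks:

```sql
-- Cached plans with seek-blocking implicit conversion warnings
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP 50
    DB_NAME(qp.dbid) AS dbname,
    qs.execution_count,
    qt.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qp.query_plan.exist('//Warnings/PlanAffectingConvert[@ConvertIssue="Seek Plan"]') = 1
ORDER BY qs.execution_count DESC;
```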
Developer handoff package (keep it short and useful):
Evidence: normalized query text + sample parameters, actual plan (.sqlplan or XML), metrics window (total/avg CPU/reads/duration/executions), warnings (spills, lookups, implicit conversions).
Hypothesis & options: e.g., new column caused Key Lookup on PK_MyTable_ID → options: covering INCLUDE, index reshape/filtered index, query/ORM change, or plan stability tactic.
Safety: size/impact estimate, rollback (drop index/unforce plan), and success criteria (Query Store deltas, BlitzFirst/Cache snapshots).
B4) Plan stability
Verify statistics are being updated “often enough” and at the right sample size
Look for plan cache bloat and churn
Parameter sniffing fixes: targeted OPTION (RECOMPILE), OPTIMIZE FOR cases, or Query Store forced plan (monitor; keep rollback).
Memory grants/spills: better indexes (narrow keys + good includes), stats refresh, and watch row-goal operators.
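The Query Store route looks roughly like this; the query_id/plan_id values come from your own sys.query_store_* views, and the numbers below are placeholders:

```sql
-- Force the known-good plan for a parameter-sensitive query (ids are placeholders)
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 101;

-- Rollback: unforce it if behavior regresses
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 101;

-- Targeted statement-level alternatives:
-- ... OPTION (RECOMPILE);
-- ... OPTION (OPTIMIZE FOR (@CustomerID = 12345));
```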
B5) Stats/CE/compat sanity
Ensure AUTO_UPDATE_STATISTICS is on (and consider ASYNC).
Recent compat level or CE changes? Compare before/after in Query Store.
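To see when stats were last updated and at what sample rate, sys.dm_db_stats_properties is handy. A sketch for the current database; the database name in the ALTER is a placeholder:

```sql
-- Stats freshness and sampling in the current database
SELECT OBJECT_NAME(s.object_id) AS table_name, s.name AS stats_name,
       sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM sys.stats s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;

-- Async auto-update so queries don't wait on a stats refresh
ALTER DATABASE [YourDB] SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```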
B6) Parallelism & CPU policy
Validate MAXDOP and Cost Threshold against core count + workload mix.
Use BlitzCache to spot skewed exchanges or thrashy parallel plans.
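Checking and changing both is an sp_configure exercise. The values below are illustrative, not recommendations; pick yours from core count, NUMA layout, and workload mix:

```sql
-- Current values
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';
EXEC sp_configure 'cost threshold for parallelism';

-- Example change (values are examples only)
EXEC sp_configure 'max degree of parallelism', 8;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```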
B7) Tempdb & log health
sp_PressureDetector will flag contention/pressure; confirm with file IO stats:
SELECT DB_NAME(vfs.database_id) AS dbname, mf.type_desc, mf.physical_name,
vfs.num_of_reads, vfs.num_of_writes,
vfs.io_stall_read_ms, vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL,NULL) vfs
JOIN sys.master_files mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY (vfs.io_stall_read_ms + vfs.io_stall_write_ms) DESC;
Right-size VLFs (all active databases), ensure multiple equally sized tempdb data files, and watch version store if RCSI/SI is on.
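Quick checks for both; sys.dm_db_log_info requires SQL Server 2016 SP2 or later:

```sql
-- VLF count per database (high hundreds+ usually means badly sized log growth)
SELECT d.name, COUNT(*) AS vlf_count
FROM sys.databases d
CROSS APPLY sys.dm_db_log_info(d.database_id) li
GROUP BY d.name
ORDER BY vlf_count DESC;

-- Tempdb data files: should be multiple files, all the same size
SELECT name, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';
```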
B8) New workload exposure
Correlate Query Store/BlitzFirst windows with code deploys, new ORM queries, reporting jobs, ETL shifts, or seasonal peaks. Fix that pattern first (indexing, parameterization, caching, schedule changes).
Change Control & Safety Net
One change at a time → measure → document → keep/rollback.
Save “before/after” artifacts: sp_WhoIsActive snapshots (to table), BlitzFirst output, BlitzCache exports, Query Store screenshots/DMVs.
Always include who/when/why and expected KPI movement.
Assume someone will want a post-mortem. I wrote one this morning for a client outage we fixed.
What “Good” Looks Like
Waits shift away from bottleneck classes to benign background categories.
Top 10 statements show reduced CPU/reads; fewer/shorter blocking chains.
Tempdb/log growth stabilize; fewer off-hours alerts.
Users say “fast enough,” matched to your baseline and SLAs.
Keep It from Coming Back
Maintain baselines (perf counters, waits, file IO, Query Store top queries).
Align index & stats maintenance to your workload.
Add ‘deploy gates’ for schema/index/compat changes with pre/post metrics.
Consider lightweight Extended Event sessions for spills, long-running, parameter-sensitive queries.
Review Query Store regressions and any forced plan safety periodically.
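A lightweight Extended Events session for spill warnings might look like this. The session name and file target are assumptions; test the overhead on your own box before leaving it running:

```sql
-- Minimal XE session capturing sort/hash spill warnings (name/target are examples)
CREATE EVENT SESSION [TrackSpills] ON SERVER
ADD EVENT sqlserver.sort_warning,
ADD EVENT sqlserver.hash_warning
ADD TARGET package0.event_file (SET filename = N'TrackSpills.xel', max_file_size = 50)
WITH (MAX_DISPATCH_LATENCY = 30 SECONDS, STARTUP_STATE = OFF);

ALTER EVENT SESSION [TrackSpills] ON SERVER STATE = START;
```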
The Bottom Line:
Slow SQL isn’t mysterious; it’s measurable. Rank the pain, fix the biggest offender or the pattern behind it, and prove the result with before/after metrics. Keep notes as you go, and be methodical.
Above all...be calm. Everyone else can panic while you be the hero.
First Month of Pocket DBA® Free!
SQL tidBITs:
Unused indexes are a killer. Backhoes cutting internet connections suck. 30 millisecond timeout is WAY different than 30 seconds. Non-SQL Server apps hogging the CPU can be seen in Task Manager.
All of these are things I’ve seen when someone yelled “SQL Server is slow!” at me.
Be calm, follow your process, be the hero.
Link Party:
Security Advisories
September 2025 Patch Tuesday fixes 81 flaws including two zero-days — Patch available; Microsoft addressed zero-day flaws affecting SQL Server and SMB.
SQL / Data Vendors & Platform News
What’s New in SQL Server 2025 Preview (RC1): AI, encryption, replication changes — New engine features, enhancements, and preview improvements.
Breaking changes in SQL Server 2025 Preview: linked servers, replication, log shipping — Important migration warnings if upgrading to 2025 Preview.
Release Notes for SQL Server Management Studio 21 (v21.5.14) — Details on bug fixes, UI changes, and connection dialog enhancements.
How to Measure Replication Latency in SQL Server AlwaysOn Synchronous Availability Groups — SQLServerCentral guide for monitoring hidden delays in HA setups.
Tech Vendor Moves
NVIDIA Stock Surges on $100B OpenAI and $5B Intel Deals: Driving Sustainable AI Computing
Oracle eyes $20 billion AI cloud computing deal with Meta — If finalized, Oracle would supply Meta with large-scale computing power for AI workloads.
PostgreSQL 18 Released — New version includes improved I/O subsystem, uuidv7(), OAuth2 support, virtual generated columns, and easier major version upgrades.