Tuesday, March 6, 2012

Large table cleanup with minimal locks


Problem

Your SQL Server production database must be available 7x24 but it contains millions of obsolete rows which hurt performance.  How do you delete these rows without the lock contention and business impact of a standard delete?  In this tip we look at how you can delete massive number of rows in small minimal lock batches.

Solution

Quick OLTP is critical to a business’s success so DBAs need to mindful of data contention caused by application cleanup jobs.  One approach is to select a huge number of keys in one pass then break the cleanup into small batches with commits and sleep windows to minimize contention.  I recommend enabling the cleanup in a SQL job and run at low system usage time.
This is accomplished with the T-SQL code below.  You would customize the definition of obsolete rows (I call deadheads) to your shop. In this case a sale without money and no contact is obsolete.  Historical cleanup is a common use of this routine.  This version works with multiple tables in this case a parent table and its two child tables.  The column SalesID relates the tables.
Two layers of temporary tables are used: the outer table to contain all the obsolete primary keys for this run; the inner table to contain a subset of the same.  The number of keys placed in the temporary tables is best determined by altering the number of rows until the highest rows deleted per minute is reached.  I started with 1,000 rows in the outer table and 100 in the inner table.  I slowly scaled up to 500,000 rows in the outer table and 5,000 rows in the inner table.  25,316 deletions per minute was the peak value.  Beyond 500,000 rows in the outer table the deletion per minute value declined on my server.
Since the application tables are defined with primary keys using the identity property the rows are inserted with ever increasing primary key values making it easy to pick up the oldest rows via SELECT top.  Your case may be based upon a date column so alter the SELECT top 500000 accordingly.
The code loops through the 500,000 row outer table loading 5,000 keys values into the inner table.  Those 5,000 rows are then deleted from the application tables and outer table. The last step is to truncating the inner table.  This loop continues until all 500,000 outer table rows are deleted.  I find keeping the inner table small (5000) minimizes lock contention but you will need to adjust this number to your shop.
If desired use the WAITFOR command to pause processing and allow other applications to process without contending with cleanup.
At any point the job can be cancelled resulting in a brief rollback only for the last inner table set of keys. Committed deletes are not rolled back.
The job can be restarted from the top without issue and can be run as many times a day as needed.  I run this job two times a night on tables needing a quick cleanup then pull back to once a week to maintain optimal performance.

The Code

USE [msdb]
GO
/****** Object:  StoredProcedure [dbo].[usp_ObsoleteRowCleanup]    Script Date: 12/18/2011 14:00:38 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

/********************************************************************

  File Name:    usp_ ObsoleteRowCleanup.sql
  Applies to:   SQL Server 2005/8
  Purpose:      To cleanup obsolete rows.
  Prerequisite: None.
  Assumption:   Removal of obsolete rows improves performance
  Parameters:   None
  Returns:      0 = Succesful, 1 = Errors.
  Author:       Edward A. Polley
  Spec
  -- Definition of deadhead.
  Select COUNT(*) from SalesLead L, Transaction T
  WHERE L.SalesID = T.SalesID
  AND   L.ContactId IS NULL
  AND   T.MoneyIn = 0  

********************************************************************/
CREATE PROCEDURE [dbo].[usp_ ObsoleteRowCleanup]
--Will delete first 500000 deadheads
--Delete is from child tables: SalesZip and Transaction then parent SalesLead
-- 
AS

SET NOCOUNT ON
SET QUOTED_IDENTIFIER OFF

DECLARE
       @DeadHeadCnt int,     
       @Toprow int

BEGIN
      PRINT 'Building Outer Temp Tbl ' 
      PRINT getdate()
           
      CREATE TABLE #OuterTemp(SalesID bigint) -- hold all deadhead keys
      CREATE INDEX IX_OuterTemp_SalesId on #OuterTemp (SalesID) 
   
      CREATE TABLE #InnerTemp(SalesID bigint) -- hold subset of deadhead keys
      CREATE INDEX IX_InnerTemp_SalesID on #InnerTemp (SalesID)   

      INSERT INTO #OuterTemp                  -- select first 500,000 keys
        Select TOP 500000 L.SalesID from YourDB.dbo.SalesLead L (Nolock),
                                       YourDB.dbo.Transaction (Nolock) T
        WHERE L.SalesID = T.SalesID
        AND   L.LeadContactId IS NULL -–substitute your condition
        AND   T.MoneyIn = 0  
        OPTION (MAXDOP 8);          -- allow parallel processing                    
    -- mainline
      Select @DeadHeadCnt = count(*)From #OuterTemp
      PRINT 'Number of Dead Heads'
      PRINT @DeadHeadCnt

      While @DeadHeadCnt > 0
      BEGIN
            PRINT 'Begins deletes ' 
            PRINT getdate()
            PRINT 'Top row '
            SELECT @toprow = min(SalesID) From #OuterTemp   -- ever incr key
            PRINT @toprow
           
            BEGIN TRANSACTION
              INSERT INTO #InnerTemp      -- experiment with this value
                  Select TOP 5000 SalesID from #OuterTemp
            
  -- Delete from application tables by processing small intertemp
              DELETE from YourDB.dbo.SalesZIP where SalesID in
                  (Select SalesID from #InnerTemp (nolock) );  --innertemp
                 
              DELETE from YourDB.dbo.Transaction where SalesID in
                  (Select SalesID from #InnerTemp (nolock) ) ;
           
              DELETE from YourDB.dbo.SalesLead WHERE SalesID in
               (Select  SalesID from #InnerTemp (nolock) );

             --- Delete from outer table         
              DELETE from #OuterTemp WHERE SalesID in
               (Select  SalesID from #InnerTemp (nolock) )  ;

              Select @DeadHeadCnt = count(*)From #OuterTemp -- reset loop cnt
              TRUNCATE TABLE #InnerTemp   -- cleanup working table

              COMMIT 

            --WAITFOR DELAY '00:00:01'   -- adjust as needed
        
            PRINT 'End Deletes' 
      END  
-- Destroy all temporary tables.
IF OBJECT_ID('tempdb..#OuterTemp') IS NOT NULL
      DROP TABLE #OuterTemp

IF OBJECT_ID('tempdb..#INNERTEMP') IS NOT NULL
      DROP TABLE #InnerTemp
END
GO

Source:
http://www.sqlservercentral.com/articles/T-SQL/87284/

No comments:

Post a Comment