Monday, January 2, 2012

SQL Server Database Growth and Autogrowth Settings

Essentials of SQL OS Memory allocation and management

The Windows Operating System runs every process, including the SQL Server process, in its own dedicated area of virtual memory, known as the Virtual Address Space (VAS). The VAS is divided into two regions; kernel mode (or system) space and user mode (or application) space. The kernel mode VAS is used by the OS, for mapping various system data structures such as the file cache, Paged and Non-Page pools. The user mode VAS is used to map memory for the currently-executing application process i.e. SQL Server.

Under default configuration, all SQL Server's memory allocations are made in the user mode VAS, using calls to the VirtualAlloc() Windows API function. Any memory allocated by VirtualAlloc() is pageable, meaning that the Windows OS can force this memory to be paged to disk, in response to memory pressure. Memory allocated byVirtualAlloc doesn't have to be physically present but Windows ensures that the amount of memory committed by SQL Server will be less than, or equal to, the installed physical memory, plus page file capacity.

When the Lock Pages in Memory permission is assigned to the SQL Server service account, then memory allocations for the buffer pool are made using calls to a function in the AWE API calledAllocateUserPhysicalPages().All of the memory that is allocated using the AllocateUserPhysicalPages() API are considered locked, i.e. non-pageable, and must be backed by physical memory on the server.

As a general rule, SQL Server will use as much memory as you can give it, and it will not release the memory that it has allocated under normal operations, unless the Windows Server OS sets the memory low resource notification flag. A component of the SQLOS called the Resource Monitor monitors the QueryMemoryResourceNotificationWindows Server API and when Windows sets the low memory resource notification, the SQLOS will respond by sweeping its caches internally to reduce the process working set, and release memory back to Windows.

At this point any memory that is pageable, i.e. memory allocated via VirtualAlloc(), may be paged to disk in order to free up more memory for the OS. Conversely, any memory allocated using AllocateUserPhysicalPages() is locked and cannot be paged. In cases where large amounts of memory are locked, it can limit how much memory the Windows OS can reclaim under pressure, and this could lead to system instability.
Nevertheless, the Windows OS will do what it can to reduce memory consumption and any of the SQLOS structures that aren't locked, for example the thread stacks, and any other non-buffer pool memory, can still be paged out. This can cause problem performance issues, but it will rarely impact SQL Server performance in the same way as having 50 GB+ of buffer pool paged to disk, immediately.

The potential problem with using Lock Pages in Memory is that if the SQLOS can't respond quickly enough to a low memory notification, to release the memory that Windows needs, it can cause Out of Memory (OOM) errors in Windows, and instability. The way to avoid this, as we'll discuss a little alter, is to make appropriate configuration changes to prevent Windows from experiencing memory pressure in the first place.

A Brief History of Locked Pages in Memory

Let me start by saying that this whole debate of whether or not to use LPIM is framed entirely within the context of a 64-bit environment. However, in order to understand fully why this topic has caused so much confusion over the years, it's worth briefly revisiting the bad old days of 32-bit.

If you are running 32-bit SQL Server, and need access to more than 2 GB of user mode VAS, then you have to use Lock Pages in Memory; there is no debate there. You must configure the OS to use Physical Address Extensions (PAE), enable Address Windowing Extensions (AWE), and then assign the Lock Pages in Memory permission to the SQL Server account so that it can allocate AWE memory, via calls to AllocateUserPhysicalPages(). So, in a typical 32-bit server using LPIM you'd have, in addition to the 2 GB of pageable user mode VAS, a separate AWE-mapped area, up to 64 GB (the PAE pointer was 36-bit), of non-pageable memory. This AWE-mapped area is for exclusive use by the data cache portion of the buffer pool. The rest of the buffer pool (mainly the plan cache), and other non-buffer pool allocations are still mapped within the 2 GB of user mode VAS.

However, the advent of a 64-bit SQL Server process completely changed the dynamics of memory allocation by the SQLOS. In place of the default 2 GB user mode VAS, for a 32-bit process, a 64-bit process has access to up to 8 TB of user mode VAS out-of-the-box, without any need for further configuration changes! 64-bit users now have a potentially-vast amount of memory for the buffer pool, but all of which is allocated via VirtualAlloc, and backed by user mode VAS, and so is pageable i.e. in the absence of LPIM, the memory allocated for the data and plan cache is pageable.

In 64-bit SQL Server, the SQL Server account still requires the Lock Pages in Memory permission in order to be able to allocate locked pages, via AllocateUserPhysicalPages(), but there are a couple of big differences:
  • The underlying reliance on AWE-mapped memory is removed. You do not need AWE in 64-bit SQL Server; the awe enabled sp_configure option has no meaning. The continued use of the same AWE API function is purely to ensure that the allocated pages are locked.
  • Memory allocated via AllocateUserPhysicalPages() can be used for both the data cache and plan cache. In 64-bit SQL Server, the plan cache is no longer allocated separately (it now uses stolen pages from the buffer pool)
The crux of the problem is that, under a number of conditions, the Windows Server OS may trigger hard working set trims of the running processes, forcing large amounts of memory allocated by SQL Server to be paged out to disk, and leading to performance degradation in the SQL Server environment. Some of the specific scenarios where this can occur have been documented by the Product Support Services group at Microsoft in KB 918483 (http://support.microsoft.com/kb/918483).

This was a particular problem for early 64-bit environments – SQL Server 2005 on Windows Server 2003 – where the OS was aggressive in its requests to trim SQL Server's working set, in response to memory pressure. The problem was greatly exacerbated by the fact that in early 64-bit SQL Server Lock Pages in Memory was an Enterprise-only feature. If you were running Standard Edition, there was nothing you could do to prevent these working set trims.

So, whereas 32-bit users had a relatively small area of pageable memory (the 2 GB of user mode VAS) and then a bigger area of locked memory for the data cache, which was protected from hard trims, early Standard edition 64-bit users had no defense against the OS hard trimming the most significant portion of the SQL Server working set, in response to memory pressure.

If you have a SQL Server with 64GB RAM and 52GB of that is allocated to the buffer pool, these hard trims have a significant performance impact on the server operation, since the entire purpose of the buffer pool is to minimize disk access by caching frequently, or at least recently used pages in memory where the access time is significantly faster than it would by retrieving the pages from disk.

I can only guess at the number of product support cases that were created as a result of this issue, and it took a lot of pressure from the community, and specifically from the MVPs, to have LPIM added to Standard Edition. Finally, the pressure paid off and Bob Ward announced, first at PASS Europe 2009 and then on his blog that SQL Server Standard Edition would finally include the option to use Lock Pages in Memory. This change was released in May 2009 with CU4 for SQL Server 2005 Service Pack 3 and CU2 for SQL Server 2008 Service Pack 1 (http://support.microsoft.com/kb/970070). If you were on Standard Edition, you needed to apply the appropriate cumulative update to be able to enable Trace Flag 845 to make use of Lock Pages in Memory for the buffer pool.
The situation stabilized, and subsequently, in Windows Server 2008, changes made to the memory manager (also documented in the previously-referenced KB article) greatly reduced the problem of hard working set trims for SQL Server. This prompted Microsoft to announce, soon after the release of Windows Server 2008, that Lock Pages in Memory was no longer required.

This brings us more or less back to the current situation, and the advice from the Brent, Glenn and others that if you're running SQL Server 2005 on Windows Server 2003, you need LPIM; if you're running Windows Server 2008, or later, you don't, at least not as a default.

Why Lock Pages in Memory should be a default configuration

I'll present my reasons for the continued use of LPIM, as a default choice, even on Windows Server 2008 and Windows Server 2008 R2, in terms of the major counter-arguments:
  • Improvements in memory management mean it's no longer required – my experience suggests otherwise
  • On 64-bit, use of LPIM can cause OS instability during memory pressure, as it limits the memory Windows can rapidly reclaim through paging – these issues can be avoided by careful configuration of SQL Server memory settings

Hard trims still happen on recent Windows Server versions

It's certainly true that changes made to Windows Server 2008 memory manager make the problem much less drastic than it was under Windows Server 2003. However, some of the problems listed in the KB article still occur under Windows Server 2008 and Windows Server 2008 R2 and can still result in hard trims of the working set and to serious problems for SQL Server.

I've worked with numerous clients, who were Windows Server 2008 and Windows Server 2008 R2 and were nevertheless suffering from performance problems that had their root cause in hard working set trims issued at the behest of Windows.

In some cases, these trims weren't actually being caused by memory pressure on the system; in the worst case the server had 64 GB RAM installed in it, and at the point that the hard trims were being triggered by Windows Server 2008 R2, the server had over 48 GB of available memory!

What is really insidious about this particular case is that because less than 50% of the SQL Server process memory was getting trimmed, no notifications about the trim were being logged in the SQL Server error log! The only way to track down this problem was to monitor paging, via the Performance Monitor counters for the Process object (as documented in http://support.microsoft.com/kb/918483).

Over time, I have engaged with countless customers where this has proven to be the cause of their performance issues with SQL Server, and as a result I have reverted to the stance that Lock Pages in Memory should be used as a default configuration if you are running SQL Server on a 64-bit instance of SQL, with more than 16-32 GB RAM, regardless of the version and edition of Windows Server OS that you are running.

If you are suffering performance problems related to memory trims, then it's very likely that enabling Lock Page in Memory will help. Of course, if you can find out the underlying cause of the working set trim and stop it happening then this is even better. Unfortunately, tracking down the cause of the sorts of problems detailed in the KB article can take a long time, even with the help of Microsoft Customer Support Services, and until you do, you will continue to have performance problems with SQL Server. I would rather prevent issues from occurring in an environment than wait to find out if it might occur and then try to react to performance issues after the fact.

Preventing Problems with Lock Pages in Memory

With Lock Pages in Memory, as with everything, there is no such thing as a free lunch and while it can help prevent potential problems associated with hard working set trims of SQL Server, it can also lead to out-of-memory conditions for the Windows Server OS, if appropriate configuration of the overall system has not been made to prevent the OS from getting into memory pressure. Even though the SQLOS is designed to monitor for low memory notifications from the Windows OS through the QueryMemoryResourceNotification API, it is possible that under load, the SQLOS won't be able to respond quickly enough to a low memory condition, and the Windows OS could become unstable as a result.

As a best practice, even when you're not using Lock Pages in Memory and certainly before enabling it, you need to set an appropriate value for the 'max server memory' sp_configure option, in order to limit the amount of memory that SQL Server allocates for its buffer pool and to leave enough memory available for the Windows Server OS, and other applications running on the server, to be able to operate without triggering memory pressure on the server. These "other applications" include anti-virus software, Integration Services, and any multi-page allocations by SQL Server that occur outside of the buffer pool.

Unfortunately, there is no hard and fast rule that determines what the optimal value for 'max server memory' will be, for a given instance of SQL Server. The best recommendation I can make would be to set this value artificially low and then gradually fine tune the value, based on monitoring of the Memory\Available Mbytes performance counter in Windows, till you reach the optimum value for the server.

I tend to start out by reserving 1-2 GB RAM for the OS, and then an additional 1GB for each 4 GB of RAM installed from 4-16 GB, and then 1 GB for every 8 GB RAM installed above 16 GB RAM. I then monitor the Memory\Available Mbytes counter over time to determine peak memory usage is for the system. Memory in excess of what's required to support this peak memory usage can be added to the 'max server memory' option.

At a more technical level, you can perform the necessary calculations for the size of the SQL Server Thread Stack, estimate the usage of memory from multi-page allocators in SQL Server, such as SQLCLR, add in the additional memory requirements for the SQL Server process, and then for each of the applications or services that are running, to try to arrive at a reasonable the starting value for 'max server memory'. My personal experience has been that trying to select a value in this manner tends to result in setting the value too high, and it isn’t always clear that this is the case until you have a problem.

Considerations for Virtual environments

Brent, in his previously-referenced blog post, gives the basis of a compelling argument against the use of LPIM, for SQL Servers running in a virtual environment. Having SQL Server running on Virtual machines certainly does pose an interesting problem regarding use of Lock Pages in Memory, since the potential for memory overcommit exists, depending on the hypervisor being used. Memory overcommit is a scenario where the memory allocated to the virtual machines running on the host exceeds the total amount of physical RAM available in the server.

When memory overcommit occurs, one of the first ways that the hypervisor reacts is to make use of a special driver, known as a balloon driver, which is installed in the VM as part of the VM tools. In essence, the hypervisor sets the balloon driver the task of reducing memory consumption in the VM to a target level, and the balloon driver responds by acquiring memory in the VM. This 'ballooning' activity creates memory pressure in the VM, which in turn prompts the guest OS to reduce the physical memory usage of processes in the VM. The released memory is then available to the hypervisor for allocation to the other VMs running on the host, as necessary to prevent memory pressure at the hypervisor level. Ultimately, if not enough memory can be released, the hypervisor will also begin hard paging VM memory to disk, which can have a huge impact on performance.

The situation is even more complex when SQL Server is using Lock Pages in Memory, since the ballooning causes memory pressure in the VM, but the guest OS is limited in the amount of memory it can free up by paging to disk. SQLOS will still respond to the memory pressure by reducing its memory usage internally but it may fail to respond quickly enough, resulting in an OOM condition for Windows OS running in the guest, just as if a physical machine ran out of memory.

There are a number of ways to deal with the balloon driver issues associated with VMs, the worst of which is to disable the balloon driver entirely for the SQL Server VM. The balloon driver exists to allow the hypervisor to manage memory pressure in the best way possible for overall performance of the VMs running on the host, so it should not be disabled as a general rule.

Instead, if the VM has a condition such that Lock Pages in Memory is needed to prevent hard paging of the SQL Server working set, or the VM needs to be guaranteed to have a minimum set of memory resources, then a reservation should be configured for the VM so that the hypervisor only balloons its memory as a last resort. In a situation like this, it would be best to set the 'min server memory' sp_configure option so that the SQL Server only reduces its memory down to the minimum level required for appropriate application performance.

Summary

So there you have it, Lock Pages in Memory warts and all; the history of it usage, the confusions surrounding its use, and why I believe it's still a good default configuration for 64-bit SQL Server instances, even when running Windows Server 2008 and Windows Server 2008R2.

In my experience, the same problems with hard working set trims, which plagued 64-bit SQL Server instances running on Windows 2003, can and do still occur even under Windows Server 2008 and Windows Server 2008R2, and the potential dangers of using Locked Pages in Memory can be minimized by accurate configuration of 'max server memory', and careful memory usage monitoring.
Ultimately the decision is yours, but hopefully this article will at least help make that decision better-informed. On this topic, I take the same stance as the great Benjamin Franklin: "An ounce of prevention is worth more than a pound of cure."

Source:

No comments:

Post a Comment