
Ask the Performance Team

Thoughts from the Commercial Technical Support Windows Server Performance Team

Task Scheduler Changes in Windows Vista and Windows Server 2008 – Part One

Tue, 06/24/2008 - 11:00

Today we are looking at some of the new changes and additions to the Task Scheduler service in Windows Vista and Server 2008.  As an overview, the Task Scheduler service provides controlled, unattended management of task execution, launched either on a schedule or in response to events or system state changes.  If you have worked with Task Scheduler in the past, you will find the changes fairly significant.  So, with that said, let’s dive right in … starting with the User Interface:

As you see above, Task Scheduler has now been integrated into the MMC as a new snap-in.  Say goodbye to the stand-alone Scheduled Tasks window in Control Panel, and hello to your one-stop shop for everything related to the Task Scheduler.  Within this window, you are presented with the Task Status and Active Tasks sections.  These sections allow you to quickly view the status of your tasks and see which ones are currently active.  There are quite a few changes, so to keep our post brief, we’re only going to cover Triggers, Conditions and Settings in this post – beginning with Triggers:

The ability to trigger a task based on any event captured in the event log is one of the most powerful new features of the Windows Vista / Server 2008 Task Scheduler.  This new capability allows administrators to send an e-mail or launch a program automatically when a given event occurs.  And it can be used to automatically notify a support professional when a critical event—for example, a potential hard drive failure—occurs on a client machine.  It also enables more complex scenarios, such as chasing down an intermittent problem that tends to manifest overnight.  Task Scheduler can be configured to notify an administrator by e-mail that a problem has occurred.  An administrator can also use Task Scheduler to automatically launch a program to collect more data when the error occurs.

Setting up tasks to launch when events occur is easy with the new Task Scheduler Wizard in Windows Vista / Server 2008.  An administrator can simply select the event in the Event Viewer to be used as a trigger and, with one click, launch the Task Scheduler Wizard to set up the task.  The seamless integration between the Task Scheduler user interface and the Event Viewer allows an event-triggered task to be created with just five clicks.  In addition to events, the Task Scheduler in Windows Vista / Server 2008 supports a number of other new types of triggers, including triggers that launch tasks at machine idle, startup, or logon.  A number of additional triggers allow administrators to set up tasks to launch when the session state changes, including on Terminal Server connect and disconnect and workstation lock and unlock.  Task Scheduler still allows tasks to be triggered based on time and date, and provides easy management of regularly scheduled tasks.
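If you prefer the command line, the updated SCHTASKS.EXE in Windows Vista / Server 2008 can also create event-triggered tasks.  Here is a minimal sketch – the task name, script path, event channel and event query below are purely hypothetical examples, so substitute values that match the event you actually care about:

rem Run a (hypothetical) data collection script whenever event ID 7 from the "disk" source appears in the System log
schtasks /create /tn "NotifyOnDiskError" /tr "C:\Scripts\collect-data.cmd" /sc ONEVENT /ec System /mo "*[System[Provider[@Name='disk'] and (EventID=7)]]"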

In the new Task Scheduler, triggers can be further customized to fine tune when tasks will launch and how often they will run. You can add a delay to a trigger, or set up a task to repeat at regular intervals after the trigger has occurred.  Administrators can also set limits on tasks, indicating that the task must stop running after a given period of time.  Activation and expiration dates can also be specified.

In addition to specifying Triggers, a number of conditions can be defined for each task.  Conditions are used to restrict a task to run only if the machine is in a given state.  For example, you can launch a program when an event occurs only if the network is available, launch an action at a specific time only if the machine is idle, or launch an action at logon only if the computer is not operating in battery mode.  In Windows Vista / Server 2008, administrators can define conditions based on the idle state of the computer, the power source of the computer (AC versus batteries), network connectivity, and the power state of the computer ("ON" versus in a sleep state).  Perhaps most importantly, a task can be configured to awaken the computer from hibernation or standby to run a task.

Administrators can use settings to instruct Task Scheduler what actions to take if a task fails to run correctly. In case the task fails, administrators can indicate how many times to retry it. If the computer is not powered on when a task is scheduled, an administrator can use settings to ensure that the task will run as soon as the machine is available. An administrator can also define a maximum execution time for a task, ensuring that the task will time out if it runs too long.

With that, it’s time to wrap up this post.  In our next post we will cover Flexible Actions and Triggers, Security and Reliability.

- Blake Morrison


Working with Very Large Print Jobs

Fri, 06/20/2008 - 11:00

There are situations where you need to print very large documents containing high-resolution graphics, text and images.  With high-end cameras flourishing in the market, image sizes are growing larger and larger.  Additionally, image editing applications present endless opportunities to enhance and modify images to your heart's content.  Due to the amount of information stored in images like this, the final spool job can sometimes reach multiple gigabytes in size.  There are some issues that appear when we print extremely large print jobs – our focus today will be on those issues, as well as some solutions.  Let’s get started by having a look at the issues first.

  1. When a print job reaches 3.99 GB in size, the counter for the job size resets to 0 and it starts growing again.
  2. While printing the job, the printer prints the initial data and then suddenly spits out the paper as if the print job is over.  On restarting the print job, it starts printing again from the beginning.

To keep the scope of our discussion within reason, the environment in our example is Windows XP, Windows Vista or Windows Server 2003 x64 clients and Windows Server 2003 or Windows Server 2008 x64 as the print server.  To begin with, it is often thought that a print job cannot grow over 4 GB in size, but this is not true.  The spool file (.spl file) which gets created can actually grow to well over 4 GB in size.  So the obvious question is: why do very large print jobs fail to print as expected?  There are two reasons for this behavior:

The first issue is the print job size counter reaching 3.99 GB, resetting to zero and starting to grow again.  This is a known, benign display issue and has no effect on the actual printing of the job.  The UI only displays 32-bit sizes and wraps larger values; internally, print job sizes are kept as 64-bit values.

The second issue is why the print job itself actually fails.  First, it is essential to know which application is being used to generate the print job - is it a 32-bit application or a native 64-bit application?  We normally see this issue when a 32-bit application is printing to a 64-bit server.  Here is what happens: when the application is printing, there are two ways the job may be programmatically created, as we can see in the diagram below (we also discussed several aspects of printing in our post on Basic Printing Architecture last year):

  1. Printing via GDI
  2. Printing directly through the Print Spooler (winspool.drv) bypassing GDI

The Graphics Device Interface (GDI) enables applications to use graphics and formatted text on both the video display and the printer.  Microsoft Windows based applications do not access the graphics hardware directly.  Instead, the GDI interacts with device drivers on behalf of applications. The GDI can be used in all Windows-based applications.  When a print job is created via the GDI interface, there is a limitation of 4 GB per page.  If a single page is over 4 GB in size, it will not print properly.  If a job is made up of multiple pages, but no single page is over 4 GB in size, you should not have a problem.  So, what is the solution for printing large documents?

  • In the case of a single job for instance, you can select the option to 'Print directly to the printer' on the Advanced tab under the printer properties.  However, it would not be recommended to configure this as the default setting, since that basically defeats the purpose of having a print server.
  • The application you are using may allow you to resize or spread out the images so that a single page will not be over 4 GB in size.  The problem with this, of course, is that you don't know which pages will exceed that size until you try to print and it fails.
  • There is another way to make this work - if you are the developer of the application in question.  You can use certain API's to facilitate large print jobs.  The application can generate a printer-ready PDL of any size and complexity and use the AddJob, ScheduleJob, StartDocPrinter, WritePrinter and EndDocPrinter API's to spool the raw PDL.  PDL stands for Page Description Language, and is basically the format by which a page is put together into a print job.  You can think of it as sort of the most basic format of a print job.  PCL and PostScript for instance are forms of PDL.

Winspool.drv is the client interface into the spooler.  It exports the functions that make up the spooler's Win32 API, and provides RPC stubs for accessing the server.  The OpenPrinter, StartDocPrinter, StartPagePrinter, WritePrinter, EndPagePrinter, and EndDocPrinter functions mentioned above are all provided by winspool.drv.  The functions in winspool.drv are mainly RPC stubs to the local spooler service (Spoolsv.exe).  By using these APIs to create the job, the spooler is able to bypass GDI and send the PDL directly to the printer via Winspool.  Here is how a print job would be created with the help of these APIs:

  1. Application calls OpenPrinter to get a handle to a printer from the Spooler.
  2. Application calls StartDocPrinter to notify the Spooler that a print job is started.  If successful, a job id is returned to represent the job created by the Spooler.
  3. Application calls StartPagePrinter, WritePrinter, and EndPagePrinter repeatedly to define pages and write data to the printer.
  4. Application calls EndDocPrinter to end the print job.

In your code, the sequence may look similar to the following:

OpenPrinter()
StartDocPrinter()
StartPagePrinter()   (this starts a new page)
WritePrinter()       (this writes data to the page)
EndPagePrinter()     (this ends the page; repeat the last three calls until all pages are done)
EndDocPrinter()
ClosePrinter()

And with that, we’ve reached the end of this post.  Hopefully this information helps you understand some of the challenges involved with very large print jobs.

- Ashish Sangave


To DEP or not to DEP …

Tue, 06/17/2008 - 11:00

In my previous posting on Access Violations, I briefly mentioned Data Execution Prevention (DEP).  I have recently had the opportunity to work on a couple of customer issues that caused me to dig a bit deeper into the workings of DEP, so I figured that I would pass this knowledge on.  To begin with, some quick background on DEP.  Data Execution Prevention, or DEP, is Microsoft's software implementation that takes advantage of hardware NX or XD support.  NX stands for No Execute and XD stands for Execute Disable; both refer to the processor's ability to mark physical memory locations with a flag indicating whether or not the data in that location should be executable.  NX is AMD's implementation and XD is Intel's, but they are basically the same thing.  This software support requires that the Windows PAE kernel be installed, but this should happen automatically, so you don't have to set the /PAE switch in your Boot.ini.  What all of this means is that with DEP, the operating system has the ability to block certain code from executing on the system.  DEP was first introduced with Windows XP Service Pack 2 and has been included in every Microsoft OS and service pack since then.

With hardware enforced DEP, all memory pages are automatically marked as non-executable unless they are explicitly allocated for executable code.  This flag is set on a per-page basis via a bit in the page table entry (PTE) for that page.  If something tries to execute code from a memory region that is marked as non-executable, the hardware passes an exception to DEP within Windows to let it know that this is happening.  DEP then raises an exception in the executing code, which fails with an access violation that should look pretty much like the following:

In the past, this was not enforced and code could execute from basically anywhere.  This allowed virus and malware writers to exploit a buffer overflow, spew a string of executable code into an unprotected data region, and then execute it from that location uncontested.  Those of you who remember the outbreaks of Blaster and Sasser will recognize this – those worms are prime examples of this sort of exploit.  By combining processor NX or XD support with Windows OS support, this type of vulnerability should be largely mitigated.

Sometimes an innocent application will trigger DEP simply due to faulty coding.  We often see this with older applications or things like shareware.  It is usually not intentional and never caused a problem in the old days, but now that security is paramount, inefficient (and sometimes sloppy!) memory management can cause some serious issues.  The right answer of course is for the application vendor to rewrite the portion of the app that is triggering DEP, but that is not likely in the case of older or shareware applications.  In this case, you can exempt the application from DEP monitoring so that DEP ignores it.  As long as you trust the application in question and know it is not really doing anything malicious, exempting it from DEP should not be a problem.  Here is what the GUI looks like:

You can add a program to the exemption list by simply clicking Add and browsing to the .EXE file in question.  However, there are a couple of other ways to disable DEP for a specific application beyond using the GUI.  The first is by changing the Application Compatibility settings for the application in the registry.  To do this, browse to the following key in the registry:  HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers.  For each application for which you want to disable DEP, you create a string value where the name of the value is the full path to the executable.  You would then set the value data to “DisableNXShowUI” as shown below.
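From a command prompt, the same value can be created with REG.EXE – a quick sketch, where the application path used here is purely hypothetical and should be replaced with the full path to your own executable:

reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AppCompatFlags\Layers" /v "C:\Program Files\LegacyApp\LegacyApp.exe" /t REG_SZ /d "DisableNXShowUI" /f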

If you have several applications for which you want to disable DEP across your environment, it may be worthwhile to use the Application Compatibility Toolkit to deploy a custom Compatibility Database (see the TechNet article on Resolving Application Compatibility Issues with Compatibility Administrator for more details).

Turning our attention back to the boot.ini for a second before we wrap up, you may have noticed an entry in your Boot.ini saying Optout or Optin, like this:

[boot loader] timeout=30 default=multi(0)disk(0)rdisk(0)partition(2)\WINDOWS [operating systems] multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="Windows Server 2003, Standard" /fastdetect /noexecute=optout

The 'noexecute' value dictates what state DEP operates under. There are four different entries possible:

  • Optin:  DEP is enabled for Windows system binaries and any application that 'opts in'.  By default, only Windows binaries will be covered.  This is the value set if you choose the option 'Turn on DEP for essential Windows programs and services only' listed in the screenshot above.
  • Optout:  DEP is enabled for all processes, not just Windows binaries.  An application or process can be 'opted out' on an individual basis.  The Application Compatibility Toolkit can be used to create shims that opt out applications, which can then be deployed on your network.  This option is set if you choose 'Turn on DEP for all programs and services except those I select', as in the screenshot above.
  • AlwaysOn:  DEP is on for all processes, period. You cannot exempt processes from DEP monitoring, and any Application Compatibility Toolkit shims do not apply.
  • AlwaysOff:  Totally disables DEP regardless of hardware support. In addition, the PAE kernel will not be installed unless /PAE is put in the boot.ini.

Please note that these last two values must be set manually.
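On Windows XP and Windows Server 2003, you change the mode by editing the /noexecute switch in Boot.ini directly (or through MSCONFIG).  Windows Vista and Windows Server 2008 no longer use Boot.ini; the equivalent setting lives in the BCD store and can be changed with BCDEDIT.  As a sketch (run from an elevated command prompt; a reboot is required for the change to take effect):

bcdedit /set {current} nx OptOut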

With that, we’ve come to the end of this post.  Hopefully you find this information useful!

- Tim Newton


“Nothing Changed in Our Environment”

Fri, 06/13/2008 - 11:00

When customers call us with issues – in particular application or program failures – one of the first questions that we ask is, “What changed in the environment?”  More often than not, the answer is, “Nothing”.  In some cases, that may be true; however, in the majority of cases there has been some change of which the system administrator we are working with is unaware.  Tim Newton discussed some aspects of program crashes in his recent post, Access Violation?  How dare you …, but let’s go ahead and recap some of them.  The most common cause of an application crash is a program trying to read or write memory that is not allocated for reading or writing by the application – a general protection fault.  Some other causes are listed below:

  • Attempting to execute privileged or invalid instructions
  • Unforeseen circumstances or poor code writing that results in the program executing an endless loop
  • Attempting to perform I/O operations on hardware devices that it does not have permission to access
  • Passing invalid arguments to system calls
  • Attempting to access other system resources that the application does not have permission to access

At this point, let’s digress a little bit and introduce a couple of quirky terms that we use to discuss “bugs”.

Heisenbug: The Heisenbug takes its name from the Heisenberg Uncertainty Principle.  A Heisenbug is a bug that disappears or alters its characteristics when it is observed.  The most common example of a Heisenbug is being unable to reproduce a problem when running a program in debug mode.  In debug mode, memory is often cleaned before the program starts.  Variables may be forced onto stack locations as opposed to being kept in registers.  Another reason that you may see a Heisenbug in debug mode is that debuggers commonly provide watches or other user interfaces that cause code (such as property accessors) to be executed, which in turn may alter the state of the program.

Bohrbug:  The Bohrbug takes its name from the Bohr Atomic Model.  A Bohrbug is a bug that manifests reliably under a well-defined (but possibly unknown) set of conditions.  Thus, in contrast with Heisenbugs, a Bohrbug does not disappear or alter its characteristics when it is researched.  These include the easiest bugs to fix (where the nature of the problem is obvious), but also bugs that are hard to find and fix and remain in the software during the operational phase.

Most of the application issues that we deal with are Bohrbugs, although we often encounter Heisenbugs when dealing with applications that exhibit Heap Corruption.  In some cases, enabling Pageheap on an application causes the problem to no longer occur.  OK, getting back to our original discussion, let’s take a look at a couple of common scenarios:

Scenario One: The Spooler Service is crashing on a print cluster that has been online “since forever” (yes, that’s actually how some administrators may describe their problem to us!) until today and no changes have been made.  From the administrator’s perspective nothing has changed in the environment.  By this, the administrator usually means that the drivers are still the same, and there have been no recent updates to the OS.  However, there are some variables to consider:

  • The problem may be caused by a specific driver which has an inherent bug with respect to the number of Print Devices using it.  The issue suddenly begins to manifest as the number of print devices and / or users has increased beyond a critical point
  • A bug related to an input data pattern may be invoked because a new application elsewhere in the environment is passing data to the driver that it is unable to interpret
  • The Spool folder hasn’t been excluded from Real-time Antivirus Scanning (or was excluded previously but for some reason is no longer excluded).  A recent Pattern or Engine update may be causing corruption of spooled data
  • There may be an inherent bug in the printer driver that is related to size of the print job that it can accept
  • There may be a printer driver related to a Network Printer that does not handle network issues gracefully.  A network issue may be invoking some fault within the driver

As you can see, from the Print Server administrator’s perspective, nothing in fact has changed.  However, subtle changes in related system or external conditions are causing a problem.  With that, let’s take a look at our second scenario …

Scenario Two: The server is experiencing a hang.  It has been running fine since the day it was brought online, and all of a sudden the server is experiencing issues.  The last server maintenance was performed a couple of months ago, but beginning yesterday morning, the server keeps locking up.  So what’s going on?

In many enterprises, IT departments are somewhat autonomous.  A single server may have components that are managed by several different teams.  For example, Antivirus and Anti-Spyware software are managed by the Security team, the Storage team is responsible for the SAN environment, Host Bus Adapters (HBA’s) and related firmware.  Meanwhile, the Windows team is responsible for the Server Operating System, including the overall system configuration and performance.  With this type of division and ownership, it can become problematic for all the teams to stay in sync.  This is not an indictment of any of the teams, it is an unavoidable by-product of decentralization.  So what might be going on in this scenario?

  • The Security team may have pushed out a new Antivirus pattern update to equip the systems to defend themselves against some high risk security threat in the wild.  This pattern update might have a bug related to high server workload.  This might manifest as memory depletion (Paged / NonPaged Pool depletion for example)
  • An Antivirus Pattern update was released which has a conflict with the OS component but surfaces only under certain conditions – for example, in a scenario where there is excessive realtime scanning being performed as the result of a large number of users who have their “My Music” folder redirected to their Network Drive
  • An update to the firmware and drivers on a SQL server was performed by the storage team.  The new Multipath I/O (MPIO) driver may have a bug which manifests when the I/O activity reaches a certain threshold.  Since the update, the server ran fine for almost a month.  However, at month-end processing there are now heavy SQL queries and reporting being performed.  This results in additional stress on the disk subsystem – causing the inherent bug to surface and affect the production environment
  • Although it may be rare, the problem could be caused by a hardware component that has developed a fault over time.  This problem results in “bit-flips", which might cause a fault in the driver based on the code logic of the driver.  The end result is a system hang or crash
  • The backup agent hasn’t been updated for quite some time.  However, there may be a bug related to pool memory leaks under certain circumstances (such as the size of backup data being pulled from the server).  Over time the utilization of the server has increased.  The bug surfaces under these conditions and causes exhaustion of pool memory – resulting in the server hang
  • Normal usage of the server may put the server beyond a critical point in terms of what the hardware and software is able to handle.  The most common example of this is a file or application server.  Over time, as a result of normal business growth the workload of the server may have reached the point where the operating system or application is simply unable to keep up.  At this point, it would be time to consider scaling the environment or server(s) to address the problem

Again, based on the scenario above, there are some fairly innocuous changes that, at the time of implementation, did not result in issues.  However, over time or under certain conditions, problems do surface – but, “Nothing changed in the environment” …

With that, it’s time to bring this post to a close.  Thanks for stopping by!  By the way, you can find more information on the quirky terms Heisenbug and Bohrbug as well as other similar terms on the Wikipedia page devoted to Unusual Software Bugs.

- Pushkar Prasad


EDIT (6/23): Added Wikipedia link to article

The Basics of Page Faults

Tue, 06/10/2008 - 11:00

In our last post, we talked about Pages and Page Tables.  Today, we’re going to take a look at one of the most common problems when dealing with virtual memory – the Page Fault.  A page fault occurs when a program requests an address on a page that is not in the current set of memory resident pages.  What happens when a page fault occurs is that the thread that experienced the page fault is put into a Wait state while the operating system finds the specific page on disk and restores it to physical memory.

When a thread attempts to reference a nonresident memory page, a hardware interrupt occurs that halts the executing program.  The instruction that referenced the page fails and generates an addressing exception that generates an interrupt.  An Interrupt Service Routine gains control at this point and determines that the address is valid, but that the page is not resident.  The OS then locates a copy of the desired page in the page file, and copies the page from disk into a free page in RAM.  Once the copy has completed successfully, the OS allows the program thread to continue on.  One quick note here – if the program accesses an invalid memory location due to a logic error, an addressing exception similar to a page fault occurs.  The same hardware interrupt is raised, and it is up to the Memory Manager’s Interrupt Service Routine that gets control to distinguish between the two situations.

It is also important to distinguish between hard page faults and soft page faults.  Hard page faults occur when the page is not located in physical memory or a memory-mapped file created by the process (the situation we discussed above).  The performance of applications will suffer when there is insufficient RAM and excessive hard page faults occur.  It is imperative that hard page faults are resolved in a timely fashion so that the process of resolving the fault does not unnecessarily delay the program’s execution.  On the other hand, a soft page fault occurs when the page is resident elsewhere in memory.  For example, the page may be in the working set of another process.  Soft page faults may also occur when the page is in a transitional state because it has been removed from the working sets of the processes that were using it, or it is resident as the result of a prefetch operation.

We also need to quickly discuss the role of the system file cache and cache faults.  The system file cache uses Virtual Memory Manager functions to manage application file data.  The system file cache maps open files into a portion of the system virtual address range and uses the process working set memory management mechanisms to keep the most active portions of current files resident in physical memory.  Cache faults are a type of page fault that occur when a program references a section of an open file that is not currently resident in physical memory.  Cache faults are resolved by reading the appropriate file data from disk, or in the case of a remotely stored file – accessing it across the network.  On many file servers, the system file cache is one of the leading consumers of virtual and physical memory.

Finally, when investigating page fault issues, it is important to understand whether the page faults are hard faults or soft faults.  The page fault counters in Performance Monitor do not distinguish between hard and soft faults, so you have to do a little bit of work to determine the number of hard faults.  To track paging, you should use the following counters: Memory\Page Faults/sec, Memory\Cache Faults/sec and Memory\Page Reads/sec.  The first two counters track the working sets and the file system cache; the Page Reads counter allows you to track hard page faults.  If you have a high rate of page faults combined with a high rate of page reads (which also show up in the Disk counters), then the system likely has insufficient RAM.
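If you just want to watch these counters quickly from the command line rather than building a full Performance Monitor log, TYPEPERF.EXE can sample them on the fly – a minimal sketch, where the five-second interval and 120-sample count are arbitrary choices:

typeperf "\Memory\Page Faults/sec" "\Memory\Cache Faults/sec" "\Memory\Page Reads/sec" -si 5 -sc 120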

OK, that will do it for this post.  Until next time …


- CC Hameed


Pages and Page Tables – An Overview

Fri, 06/06/2008 - 11:00

Over the course of our posts on Memory Management and Architecture, we have made several references to Page Tables and Page Table Entries (PTE’s).  Today we’re going to dig into Pages and Page Tables.  If you are new to Memory Management, or need a quick refresher on the basics, I strongly recommend reviewing our Memory Management 101, Demystifying /3GB and x86 Virtual Address Space posts first.

When a program is first loaded, the logical memory address range of the application is divided into fixed size units called pages.  As each page is referenced by a program, it is mapped to a physical page that resides in physical memory.  The mapping is dynamic which ensures that logical addresses that are frequently referenced reside in physical memory.  Remember that each individual process that is launched is allocated its own virtual address space and application program threads are only permitted to directly access the virtual memory locations that are associated with their parent process’ address space.  This is where Page Tables come into play.

Page Tables are built for each process address space.  The Page Table maps logical virtual addresses for a process to physical memory locations.  The location for a set of Page Tables for a process is passed to the processor hardware during a context switch.  The processor refers to the Page Tables to perform virtual to physical address translation as the process threads are executed.  At this point, there are a few terms to become familiar with when dealing with Pages and Page Tables:

  • Working Set Pages:  The active pages of a process currently backed by RAM (also known as Resident Pages)
  • NonResident Pages:  Virtual memory addresses that are allocated, but not backed by RAM
  • Committed Pages: Pages that have Page Table Entries.  Committed Pages may be either resident or nonresident

As we mentioned above, Virtual Memory Manager ensures that logical addresses that are frequently referenced reside in physical memory.  It does so through the use of a Least Recently Used (LRU) page replacement policy.  The VMM also attempts to maintain a pool of free or available pages to ensure that page faults (which we will cover in our next post) are resolved rapidly.  When the virtual pages of active processes overflow the size of RAM, the Memory Manager tries to identify pages that are older or inactive that are candidates to be flushed from physical memory and stored on disk.  A copy of inactive virtual memory pages is held in the paging file.  The operating system checks to see if a page that it temporarily removed from the process working set has been modified since the last time that it was stored in the page file.  If the copy in the page file is current, there is no need to re-copy the contents to disk before removing them from physical memory.

All this seems fairly straightforward – and if the Memory Manager is successful in keeping the active pages of processes in RAM then the Memory Manager’s operations do not affect the user experience.  However, if there is insufficient physical memory to hold the active pages of running processes, then the system will exhibit performance degradation.

With that, we’re going to wrap up this post.  In our next post, we’ll discuss Page Faults.  Until next time …

- CC Hameed


Access Violation? How dare you ...

Tue, 06/03/2008 - 11:00

I am sure we have all seen access violations occur since we took ownership of our first x86 PC's.  The infamous "Bluescreen", application crashes – it doesn't really matter; access violations are all over the place.  For any of you that remember the good old Windows 9x days, a General Protection Fault and an Invalid Page Fault are basically the same thing (and a segmentation fault too).  To many people, the phrase 'access violation' is synonymous with "crash".  But what exactly is an access violation?

To put it simply, an access violation occurs any time an area of memory is accessed that the program doesn't have access to.  This can be due to bad code, faulty RAM or even a bad device driver. It really doesn't matter who the culprit is, the root issue is basically the same.  For instance, memory location zero is reserved for the operating system, so any application that tries to access this address will crash with an access violation.  The problem with this is that it is very easy to end up with a value of zero.  If you set a pointer and initialize the value to NULL (which is 0), then try to access it, you will crash in this fashion.  We call this a NULL Pointer and it is very common. The error you will receive should be similar to the following:

Unhandled exception at 0x00032b15 in Application.exe: 0xC0000005: Access violation reading location 0x00000000

This states that the instruction at address 0x00032b15 in Application.exe attempted to read address 0x00000000.  The code 0xC0000005 is the code for an access violation, so expect to see this quite a bit.  In a memory or user dump, you may see it referred to as STATUS_ACCESS_VIOLATION.  This type of error can occur when either reading or writing, so it is pretty common.  Below is an example of how this may look in a bugcheck dump, obtained by simply running "!analyze -v".  In this case, it was due to a driver fault causing an access violation.

You will also get an access violation if a program triggers Data Execution Prevention (DEP).  This is a feature that uses both hardware and software to minimize the threat of malicious code like viruses.  How this works is that memory locations can be marked as being used either for executable code or for data.  Viruses commonly dump their payload into a data location and then execute it from there (as in a buffer overflow scenario).  This is exactly what DEP is designed to prevent.  If something tries to execute code from a data location, DEP will trigger an access violation to protect the system.  The reason this is important to us is that some applications do the same thing simply because the application's programmer did not quite follow the rules.  For instance, if an application dynamically generates code, such as in a Just-In-Time scenario, and does not explicitly mark the code as executable, it will run into the Wall of DEP (OK, I couldn't resist the pun).

I hope this helps explain some of the common causes of access violations.  See you next time.

- Tim Newton


Two Minute Drill: Overview of SMB 2.0

Fri, 05/30/2008 - 11:00

The Server Message Block (SMB) protocol is the file sharing protocol used by default on Windows-based computers.  Although file sharing and network protocols are primarily supported by our Networking team, it is important to understand how SMB works given its importance to network activities.  SMB 2.0 was introduced in Windows Vista and Windows Server 2008.  SMB 1.0 was designed for early Windows network operating systems such as Microsoft LAN Manager and Windows for Workgroups.  SMB 2.0 is designed for the needs of the next generation of file servers.  Both Windows Server 2008 and Windows Vista support SMB 1.0 and SMB 2.0.

There are several enhancements in SMB 2.0, including:

  • Sending multiple SMB commands in the same packet which reduces the number of packets sent between a client and server
  • Larger buffer sizes
  • Increased scalability, including an increase in the number of concurrent open file handles on the server and the number of shares that a server can share out
  • Support for Durable Handles that can withstand short network problems
  • Support of Symbolic Links

The version of SMB used for file sharing is determined during the SMB session negotiation.  If both the client and server support SMB 2.0, then SMB 2.0 is selected during the initial negotiation.  Otherwise, SMB 1.0 is used, preserving backwards compatibility.  The table below shows the version of SMB that will be used in different client / server scenarios:

Client                         Server                         SMB Version
Windows Server 2008 / Vista    Windows Server 2008 / Vista    SMB 2.0
Windows Server 2008 / Vista    Windows 2000, XP, 2003         SMB 1.0
Windows 2000, XP, 2003         Windows Server 2008 / Vista    SMB 1.0
Windows 2000, XP, 2003         Windows 2000, XP, 2003         SMB 1.0

Both SMB 1.0 and 2.0 are enabled by default on Windows Vista and Windows Server 2008.  In some testing and troubleshooting scenarios it may be necessary to disable either SMB 1.0 or SMB 2.0.  However, it should be noted that this is not a recommended practice.  To disable SMB 1.0 for Windows Vista or Windows Server 2008 systems that are the “client” systems (accessing the network resources), run the following commands:

sc config lanmanworkstation depend= bowser/mrxsmb20/nsi
sc config mrxsmb10 start= disabled

To disable SMB 1.0 on a Windows Vista or Windows Server 2008 system that is acting as the “server” system (hosting the network resources), a registry modification is required.  Navigate to the HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters key.  If there is no REG_DWORD value named Smb1, you will need to create it.  This value does not exist by default.  Once the value is created, set the value to 0 to disable SMB 1.0 or 1 to enable SMB 1.0.

Finally, to disable SMB 2.0 on Windows Vista or Windows Server 2008 systems that are acting as the “server”, navigate to the registry key listed above.  Instead of creating the Smb1 REG_DWORD value, you would create a REG_DWORD value called Smb2.  Set the value to 0 to disable SMB 2.0 and 1 to enable SMB 2.0.
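For reference, here is roughly what those registry changes look like from the command line – a sketch only, and remember that disabling either SMB version is not a recommended practice outside of testing and troubleshooting:

rem Disable SMB 1.0 on the "server" side (use /d 1 to re-enable)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v Smb1 /t REG_DWORD /d 0 /f

rem Disable SMB 2.0 on the "server" side (use /d 1 to re-enable)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v Smb2 /t REG_DWORD /d 0 /f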

And with that, we have reached the end of our Two Minute Drill on SMB 2.0.  Until next time …

- CC Hameed


Two Minute Drill: Find /3GB without using boot.ini

Tue, 05/27/2008 - 11:00

We've talked a lot about the /3GB switch and its effect on system resources in previous posts.  Today we are going to discuss how to determine whether or not /3GB is enabled on a 32-bit system without looking at the boot.ini file or using MSCONFIG.EXE.  Finding this out is not as difficult as you might think – there are actually multiple ways to get at the information.  We are going to look at three of them: examining the registry, using PSTAT.EXE, and looking at a memory dump file.  So, without further delay, let’s start with the simplest of the three methods – finding the information in the registry.

To find the information in the Registry, all you have to do is look in the HKLM\SYSTEM\CurrentControlSet\Control key, and examine the SystemStartOptions value.  Below is the value from a Windows XP system that I have configured with /3GB.

 

As you can see, the ‘/’ character is removed from the string in the Registry, but the options themselves are determined easily enough.  With this in mind, here’s a quick tip for Systems Administrators who might need to find this information for multiple systems – use a simple script or batch file to query this value in the registry on all your machines and write the output to a text file.  Remember that you will need to be able to access the registry remotely for this to work!
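A minimal batch file sketch along those lines is shown below – the server list and output file names (servers.txt and startoptions.txt) are hypothetical, and remote registry access must be allowed on each target machine:

@echo off
rem Query the boot options of every server listed (one name per line) in servers.txt
for /f %%s in (servers.txt) do (
    echo ===== %%s ===== >> startoptions.txt
    reg query "\\%%s\HKLM\SYSTEM\CurrentControlSet\Control" /v SystemStartOptions >> startoptions.txt
)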

Let’s now take a look at the second method of finding out if /3GB is enabled – by using PSTAT.EXE.  PSTAT.EXE is part of the Resource Kit Utilities for Windows 2000 and can be downloaded from the Microsoft web site.  Run PSTAT.EXE and redirect the output to a text file:
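The commands themselves are simple enough – something along these lines, where the output file name is arbitrary:

pstat > pstat_output.txt
findstr /i /c:"hal.dll" pstat_output.txt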

When you examine the output file, search for HAL.DLL (the Hardware Abstraction Layer DLL).  Below is the output from my Windows XP SP3 system:

ModuleName   Load Addr
------------------------
hal.dll      E0B82000

The key piece of information here is the Address at which the module is loaded.  In our post on the x86 Virtual Address Space we noted that the System Space (Kernel Mode) memory range on a 32-bit system ranged from 0x80000000 to 0xFFFFFFFF on a system without /3GB and 0xC0000000 to 0xFFFFFFFF on a system with /3GB enabled.

Memory Address ranges without /3GB Memory Address ranges with /3GB

As you can see from the diagram above, the Kernel and Executive, HAL and Boot Drivers load between Addresses 0x80000000 and 0xBFFFFFFF on a system that does not have /3GB configured.  So, looking at the address where HAL.DLL is loaded, we can see that the module is loaded at Address 0xE0B82000.  Since this address is outside of the range where the module would load if the system was not configured with /3GB we can deduce that /3GB is configured on this system.

Finally, let’s look at determining whether or not /3GB is in use by examining a memory dump file.  I generated a manual dump on my XP machine with and without /3GB enabled.  Let’s first take a look at the dump with /3GB enabled.  Believe it or not, you really don’t have to do any work to determine if /3GB is enabled beyond loading the memory dump file into the debugger!  Below is the output from the debugger when I opened the dump file:

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\WINDOWS\3GBMEMORY.DMP]
Kernel Complete Dump File: Full address space is available

Symbol search path is: SRV*C:\SYMBOLS*http://msdl.microsoft.com/downloads/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp.080413-2111
Kernel base = 0xe0ba3000 PsLoadedModuleList = 0xe0c29720
Debug session time: Thu May 15 09:33:21.044 2008 (GMT-5)
System Uptime: 1 days 2:14:13.500

The important piece of information here is the Kernel base.  As you can see, the address is 0xE0BA3000 (the Kernel base line in the output above).  Remember that if /3GB is not configured, the Kernel loads between 0x80000000 and 0xBFFFFFFF – since we are loading at 0xE0BA3000, we can deduce that /3GB is configured.  Before we wrap up, let’s take a look at a dump from the same machine when /3GB is not configured.

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\WINDOWS\NO3GBMEMORY.DMP]
Kernel Complete Dump File: Full address space is available

Symbol search path is: SRV*C:\SYMBOLS*http://msdl.microsoft.com/downloads/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp.080413-2111
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055d720
Debug session time: Thu May 15 12:58:35.741 2008 (GMT-5)
System Uptime: 0 days 1:54:45.750

As we can see in this output, the Kernel Base is at 0x804D7000 – inside the range for the Kernel on a system without /3GB.

So there you have it – three different ways to find out whether or not a system is configured with the /3GB switch using different tools.  That brings us to the end of this Two Minute Drill.  Until next time …

- CC Hameed


DST: Upcoming Changes for Morocco and Pakistan

Fri, 05/23/2008 - 11:00

On June 1, 2008 there will be two new Daylight Saving Time changes that go into effect.  Pakistan and Morocco plan to introduce Daylight Saving time.  The governments of the two nations recently announced the change as part of their energy savings plans.  Although the changes go into effect on the same day in both countries, please note that they will have different end dates, as outlined below:

Pakistan:  DST begins on June 1st.  The clocks will move forward from 12:00:59 AM to 1:01:00 AM.  The UTC Offset will change from +5 hours to +6 hours for Pakistan.  DST will end at 12:00:59 AM on Sunday, August 31.  At this time, the clocks will roll back to 11:01:00 PM on Saturday, August 30.  The UTC Offset will change from +6 hours to +5 hours for Pakistan.

Morocco:  DST will begin on Saturday, May 31 at 11:59:59 PM when the clocks will move forward to 1:00:00 AM on Sunday, June 1.  This will result in the UTC Offset for Morocco changing from 0 to +1 hour.  DST in Morocco ends on Saturday, September 27, at 11:59:59 PM.  At this time, the clocks will roll back to 11:00:00 PM on Saturday, September 27, and the UTC offset will change from +1 hour to 0.

From Microsoft’s perspective, due to the short notice provided for these changes, Windows will not be creating one-off hotfix packages to accommodate the changes.  The plan is to include these updates in the next Windows DST cumulative package, which is scheduled for release later in the summer.  To provide a more immediate solution, we will be updating Microsoft KB Article 914387 with the changes for Morocco and Pakistan.  The changes for this article should be in place by the end of this week.  Additional information will also be available on the Daylight Saving Time Help and Support Center.

OK, that’s it for this post – please remember to test any changes in an isolated environment before implementing them in your live production environment.  Until next time …

- CC Hameed


Two Minute Drill: RELOG.EXE

Tue, 05/20/2008 - 11:00

Following on from our last Two Minute Drill, today's topic is the RELOG.EXE utility.  RELOG.EXE creates new performance logs from data in existing performance logs by changing the sampling rate and / or converting the file format.  RELOG.EXE is not a new tool - it is however one of those tools that most administrators are not aware of.  Although RELOG.EXE is a fairly simple tool, it is incredibly powerful.  Let's look at the built-in help file for RELOG.EXE:

RELOG <filename [filename ...]> [options]

Parameters:
  <filename [filename ...]>     Performance file to relog.

Option                            Description
-?                                Display context sensitive help
-a                                Append output to the existing binary file
-c <path>                         Counters to filter from the input log
-cf <filename>                    File listing performance counters from the input log.  The default is all counters in the original log file
-f <CSV | TSV | BIN | SQL>        Output file format
-t <value>                        Only write every nth record into the output file
-o                                Output file path or SQL database
-b <M/d/yyyy h:mm:ss [AM | PM]>   Begin time for the first record to write into the output file
-e <M/d/yyyy h:mm:ss [AM | PM]>   End time for the last record to write into the output file
-config <filename>                Settings file containing command options
-q                                List performance counters in the input file
-y                                Answer yes to all questions without prompting

Now, let's look at some common scenarios:

Scenario 1: Converting an existing Performance Monitor Log

Although most administrators are comfortable using the .BLG file format and reviewing Performance data within the Performance Monitor tool, there are some advantages to reviewing the data in a different format such as a Comma-Separated Value file (.CSV).  The process to convert a .BLG to a .CSV is straightforward using RELOG.EXE:

relog logfile.blg -f csv -o logfile.csv

Scenario 2: Filtering a Performance Monitor Log by Performance Counter

In our last Two Minute Drill we showed you how to capture a baseline performance monitor log.  We also provided a couple of sample commands that we use in our troubleshooting to capture performance data.  However, once we get those performance logs, filtering through them can sometimes be very time consuming - especially in instances where the system is extremely active.  Oftentimes, it is useful to have both the raw data as well as a filtered subset that only shows a couple of counters.  Using RELOG.EXE we can do just that - in this example, we are going to separate out just the Private Bytes counter for all processes:

relog originalfile.blg -c "\Process(*)\Private Bytes" -o filteredfile.blg
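One quick tip: if you are not sure of the exact counter paths contained in a log before filtering it, the -q option listed above will enumerate them for you, for example:

relog originalfile.blg -q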

Scenario 3: Filtering a Performance Monitor Log by Time

The last scenario we are going to look at is extracting a subset of performance data from a Performance Monitor log based on time.  This is especially useful when you have a large data sample where there are multiple instances of an issue that occurred during the time that the performance data was captured.  Using RELOG.EXE with the -b and -e options we can pull out a subset of this data and write it to a separate file - I am going to use a sample of the baseline file I created earlier:

relog baseline.log.blg -b "5/6/2008 8:00:00 AM" -e "5/6/2008 8:34:00 AM" -o filteredcapture.blg

As you can see there are fewer samples in the filteredcapture.blg file.  This particular type of filtering is extremely useful when you want to send a subset of performance data to other systems administrators (or even Microsoft Support!)

And that's it for our post on RELOG.EXE.  Until next time ...

- CC Hameed


Troubleshooting Server Hangs – Part Four

Fri, 05/16/2008 - 11:00

Welcome to Part Four of our Server Hang troubleshooting series.  Today we are going to discuss PTE depletion and Low Physical Memory conditions and how those two issues can lead to server hangs.  In our post on the /3GB switch we mentioned that in general, a system should always have around 10,000 free System PTE’s.  Although we normally see PTE depletion issues on systems using the /3GB switch, that does not necessarily mean that using the /3GB switch is going to cause issues – what we said was that the /3GB switch is intended to be used in very specific instances.  Tuning the memory further by using the USERVA switch in conjunction with the /3GB switch can often stave off PTE depletion issues.  The problem with PTE depletion is that there are no entries logged in the Event Viewer that indicate that there is a resource issue.  This is where using Performance Monitor to determine whether a system is experiencing PTE depletion comes into play.  However, Performance Monitor may not identify why PTE’s are being depleted.  In instances where a process has a continually rising handle count that mirrors the rate of PTE depletion, it is fairly straightforward to identify the culprit.  However, more often than not we have to turn to a complete dump file to analyze the problem.

Below is what we might see in a dump file in a scenario where we have PTE depletion when we use the !vm command to get an overview of Virtual Memory Usage:

*** Virtual Memory Usage ***
    Physical Memory:    2072331   ( 8289324 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   2095104Kb  Free Space:   2073360Kb
      Minimum:   2095104Kb  Maximum:      4190208Kb
    Available Pages:    1635635   ( 6542540 Kb)
    ResAvail Pages:     1641633   ( 6566532 Kb)
    Locked IO Pages:       2840   (   11360 Kb)
    Free System PTEs:      1097   (    4388 Kb)
    ******* 1143093 system PTE allocations have failed ******
    Free NP PTEs:         14833   (   59332 Kb)
    Free Special NP:          0   (       0 Kb)
    Modified Pages:         328   (    1312 Kb)
    Modified PF Pages:      328   (    1312 Kb)
    NonPagedPool Usage:   11407   (   45628 Kb)
    NonPagedPool Max:     32767   (  131068 Kb)
    PagedPool 0 Usage:    11733   (   46932 Kb)
    PagedPool 1 Usage:      855   (    3420 Kb)
    PagedPool 2 Usage:      862   (    3448 Kb)
    PagedPool 3 Usage:      868   (    3472 Kb)
    PagedPool 4 Usage:      849   (    3396 Kb)
    PagedPool Usage:      15167   (   60668 Kb)
    PagedPool Maximum:    40960   (  163840 Kb)
    Shared Commit:         3128   (   12512 Kb)
    Special Pool:             0   (       0 Kb)
    Shared Process:       25976   (  103904 Kb)
    PagedPool Commit:     15197   (   60788 Kb)
    Driver Commit:         1427   (    5708 Kb)
    Committed pages:     432175   ( 1728700 Kb)
    Commit limit:       2562551   (10250204 Kb)

In this particular instance we can clearly see that we have a low PTE condition.  In looking at the Virtual Memory Usage summary, we can see that the server is most likely using the /3GB switch, since the NonPaged Pool Maximum is only 130MB.  In this scenario we would want to investigate using the USERVA switch to fine tune the memory and recover some more PTE’s.  If USERVA is already in place and set to 2800, then it is time to think about scaling the environment to spread the server load.  For more granular troubleshooting, where we suspect a PTE leak that we cannot explain using Performance Monitor data, we can modify the registry to enable us to track down the PTE leak.  The registry value that we need to add to the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management key is as follows:

Value Name: TrackPtes
Value Type: REG_DWORD
Value Data: 1
Radix: Hex
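From a command prompt, the same value can be added with a one-liner along these lines (a sketch only):

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v TrackPtes /t REG_DWORD /d 1 /f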

Once we implement this registry modification we need to reboot the system to enable the PTE Tracking.  Once PTE Tracking is in place, we would need to capture a new memory dump the next time the issue occurs and analyze that dump to identify the cause of the leak.

To wrap up our post, we are going to take a quick look at a dump file of a server that is experiencing a low physical memory condition.  Below is the output of the !vm command (with a couple of comments that we’ve added in):

3: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:     851843   ( 3407372 Kb)   <----- Server has 3.4 GB physical RAM
    Page File: \??\C:\pagefile.sys
      Current:   3072000Kb  Free Space:   2377472Kb
      Minimum:   3072000Kb  Maximum:      3072000Kb
    Page File: \??\D:\pagefile.sys
      Current:   4193280Kb  Free Space:   3502716Kb
      Minimum:   4193280Kb  Maximum:      4193280Kb
    Page File: \??\E:\pagefile.sys
      Current:   4193280Kb  Free Space:   3506192Kb
      Minimum:   4193280Kb  Maximum:      4193280Kb
    Page File: \??\F:\pagefile.sys
      Current:   4193280Kb  Free Space:   3454596Kb
      Minimum:   4193280Kb  Maximum:      4193280Kb
    Page File: \??\G:\pagefile.sys
      Current:   4193280Kb  Free Space:   3459764Kb
      Minimum:   4193280Kb  Maximum:      4193280Kb
    Available Pages:       1198   (    4792 Kb)   <----- Almost no free physical memory
    ResAvail Pages:      795226   ( 3180904 Kb)
    Modified Pages:         787   (    3148 Kb)
    NonPagedPool Usage:    6211   (   24844 Kb)
    NonPagedPool Max:     37761   (  151044 Kb)
    PagedPool 0 Usage:    11824   (   47296 Kb)
    PagedPool 1 Usage:      895   (    3580 Kb)
    PagedPool 2 Usage:      881   (    3524 Kb)
    PagedPool 3 Usage:      916   (    3664 Kb)
    PagedPool 4 Usage:      886   (    3544 Kb)
    PagedPool Usage:      15402   (   61608 Kb)
    PagedPool Maximum:    65536   (  262144 Kb)
    Shared Commit:       771713   ( 3086852 Kb)
    Special Pool:             0   (       0 Kb)
    Free System PTEs:      7214   (   28856 Kb)
    Shared Process:        7200   (   28800 Kb)
    PagedPool Commit:     15402   (   61608 Kb)
    Driver Commit:         1140   (    4560 Kb)
    Committed pages:    2161007   ( 8644028 Kb)   <----- Total committed pages is 8.6 GB.  This amount is far larger than physical RAM, so paging will be high
    Commit limit:       5777995   (23111980 Kb)
    Total Private:      1363369   ( 5453476 Kb)

In this particular instance, the server simply did not have enough memory to keep up with the demands of the processes and the OS.  Paged and NonPaged Pool resources are not experiencing any issues.  The number of available PTE’s is somewhat lower than our target of 10,000.  However, if you recall from our earlier posts, if a server is under load, the number of Free PTE’s may drop below 10,000 temporarily.  In this case, as a result of the low memory condition on this server there were several threads in a WAIT state – which caused the server to hang. The solution for this particular issue was to add more physical memory to the server to ease the low physical memory condition.

And with that, we come to the end of this post.  Hopefully you’ve found the information in our last few posts useful.

- Sakthi Ganesh


Two Minute Drill: LOGMAN.EXE

Tue, 05/13/2008 - 11:00

Today we are continuing on with our Two Minute Drill series.  Our topic in this post is one that we discuss quite frequently with customers - namely the automation of creating Performance Monitor and Trace Logs.  Most administrators are comfortable creating local and remote Performance Monitor logs using the Performance Monitor MMC and the GUI tools.  However, there are some extremely powerful command line utilities that can be used to configure and capture Performance data.  Today we will be discussing the LOGMAN.EXE utility.  So without further ado ...

The LOGMAN.EXE utility can be used to create and manage Event Trace Session and Performance logs.  Many functions of Performance Monitor are supported and can be invoked using this command line utility.  Before we look at some examples of how to configure Performance logs using this utility, let's quickly cover some of the syntax.  Running LOGMAN /? from a command prompt brings up the first level of context sensitive help:

Basic Usage:  LOGMAN [create | query | start | stop | delete | update | import | export] [options].  The verb specified determines what action is being performed:

  • CREATE - Create a new data collector
  • QUERY - Query data collector properties.  All data collectors are listed if no specific name is provided
  • START - Start an existing data collector
  • STOP - Stop an existing data collector
  • DELETE - Delete an existing data collector
  • UPDATE - Update the properties of an existing data collector
  • IMPORT - Import a data collector set from an XML file
  • EXPORT - Export a data collector set to an XML file

Running LOGMAN <verb> /? brings up context sensitive help for the verb specified.  There are also some options to be aware of:

  • -? - Display context sensitive help
  • -s <computer> - Perform the command on the specified remote system
  • -ets - Send the command directly to an Event Tracing Session without saving or scheduling

So now that we have our basic commands, let's take a look at how we can use LOGMAN.EXE for one of our most common scenarios - capturing baseline Performance data for a system.  We've discussed the importance of capturing baseline server performance data in several previous posts.  In our example, we are going to capture a binary circular performance monitor log that has a maximum size of 500MB.  The reason we are going to use a binary circular log is that we can record the data continuously to the same log file, overwriting previous records with new data once the log file reaches its maximum size.  Since this will be a baseline performance log that will be constantly running, we want to ensure that we can capture a significant data sample, and not have the log file being overwritten in such a short timeframe that useful data is lost.  Put another way, we want to set our capture interval up so that we do not overwrite our data too quickly.  For the purposes of this example, we'll set up our log to capture data every two hours.  We want to save our data to a log file, so we will need to specify a log file location.  Given that we want to capture baseline data, there is a good possibility that we want to use the same settings on multiple servers so we'll need to ensure that we can repeat this process with a minimum of administrative fuss ...

So, to recap, we are going to capture our baseline performance log that is:

  • a binary circular log that will be a maximum of 500MB in size
  • configured with a capture interval of two hours
  • saved to a file location
  • configured with standard counters so that we can capture consistent baseline data across multiple servers if needed

The one piece of this equation that we have not specified is which counters we need to capture.  One of the key reasons to use LOGMAN.EXE is that we can specify which counters we want to capture in a standard configuration file and then use that configuration file to set up the capture across multiple servers.  Creating the configuration file is fairly simple - we are going to create a .CONFIG file that enumerates the counters that we want to capture, one per line.  An example is shown below:

"\Memory\Available MBytes"
"\Memory\Pool Nonpaged Bytes"
"\Memory\Pool Paged Bytes"
"\PhysicalDisk(*)\Current Disk Queue Length"
"\PhysicalDisk(*)\Disk Reads/sec"
"\PhysicalDisk(*)\Disk Read Bytes/sec"
"\PhysicalDisk(*)\Disk Writes/sec"
"\PhysicalDisk(*)\Disk Write Bytes/sec"
"\Process(*)\% Processor Time"
"\Process(*)\Private Bytes"
"\Process(*)\Virtual Bytes"

These are some fairly standard Performance Counters.  Let's save this file as Baseline.config in a folder on one of our file servers.  Now we have all of the pieces that we need to configure and capture our baseline.

logman create counter BASELINE -f bincirc -max 500 -si 02:00:00 --v -o "e:\perflogs\SERVERBASELINE" -cf "\\<FILESERVER>\Baseline\Baseline.config"

Let's quickly examine the different elements of this command:
  • logman create counter BASELINE: This creates the BASELINE Data Collector on the local machine
  • -f bincirc -max 500 -si 02:00:00: This piece of the command specifies that we are creating a Binary Circular file, sets the Maximum Log file size to 500MB, and sets the Capture Interval (in hh:mm:ss format) to two hours
  • --v -o "e:\perflogs\SERVERBASELINE": In this part of the command, we turn off the versioning information, and set the Output Location and Filename.  The Performance Monitor log will be created with a .BLG extension
  • -cf \\<FILESERVER>\Baseline\Baseline.config: Finally, we point the LOGMAN utility to the location of our standard counter configuration file

Once we run this command, we can run LOGMAN.EXE and use the QUERY verb to ensure that our Data Collector has been created:
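For example, the following commands (a rough sketch; the exact columns in the output vary by operating system version) list all of the Data Collectors on the machine and then just our new BASELINE collector:

logman query
logman query BASELINE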

The last thing we need to do is start our Data Collector set.  There are a couple of options here - the first is to run LOGMAN.EXE START BASELINE from the command line, which launches the Data Collector immediately.  However, the Data Collector will not start again automatically when the system is rebooted.  The second option is to create a startup script that runs the same command, so that you capture your performance data from the time that the server starts - a minimal example is shown below.
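Here is a minimal sketch of such a startup script (the file name StartBaseline.cmd is just an example); it could be assigned as a computer startup script through Group Policy or run from a scheduled task that triggers at system start:

@echo off
rem StartBaseline.cmd - start the BASELINE Data Collector after the system boots
logman start BASELINE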

Before we wrap up our post, here is another common scenario.  You can create a Data Collector set on a full installation of Windows Server 2008 or Windows Vista.  Then export that Data Collector Set configuration to an XML Template.  You can then use the LOGMAN.EXE command with the IMPORT verb to import that Data Collector set configuration on a Windows Server 2008 Server Core system, then use the LOGMAN.EXE command with the START verb to start the Data Collector Set.  The commands are below:

  • LOGMAN IMPORT -n <Data Collector Set Name> -xml <XML template that you exported>:  This creates the Data Collector Set with whatever name you pass to the -n parameter
  • LOGMAN START <Data Collector Set Name>: Starts the Data Collection process.
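For example, assuming that we exported our baseline configuration to a file named Baseline.xml on the same file share used earlier (both names are placeholders for this sketch), the commands on the Server Core machine would look something like this:

logman import -n BASELINE -xml "\\<FILESERVER>\Baseline\Baseline.xml"
logman start BASELINE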

Finally, here are two more sample commands where we use LOGMAN.EXE for gathering Performance Monitor data for troubleshooting:

High CPU Issue

logman.exe create counter High-CPU-Perf-Log -f bincirc -v mmddhhmm -max 250 -c "\LogicalDisk(*)\*" "\Memory\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Process(*)\*" "\Redirector\*" "\Server\*" "\System\*" "\Thread(*)\*"   -si 00:00:05

In this example, we have a capture interval of five seconds, with a Maximum Log size of 250MB.  The Performance Counters that we are capturing are fairly generic.

Generic Performance Monitor Logging

logman.exe create counter Perf-Counter-Log -f bincirc -v mmddhhmm -max 250 -c "\LogicalDisk(*)\*" "\Memory\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Process(*)\*" "\Redirector\*" "\Server\*" "\System\*"  -si 00:05:00

In this example, we are using a five minute capture interval - the rest of the parameters are fairly straightforward.  Remember that in both of these cases, you will need to use LOGMAN.EXE with the START verb and specify the name of the Data Collector Set to begin the capture, as shown below.  These samples work on all Windows Operating Systems from Windows XP onwards.
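For example, using the Data Collector names created above:

logman start High-CPU-Perf-Log
logman start Perf-Counter-Log

When you have gathered enough data, stop the collection in the same way with the STOP verb, for example logman stop High-CPU-Perf-Log.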

And with that, we come to the end of this Two Minute drill.  Until next time ...

- CC Hameed


Troubleshooting Server Hangs – Part Three

Fri, 05/09/2008 - 11:00

In our last post on Server Hangs, we discussed using the Debugging Tools to examine a dump file to analyze pool depletion.  Today we are going to look at using our troubleshooting tools to examine a server hang caused by a handle leak.  Issues where a process holds an abnormally large number of handles are very common and result in kernel memory depletion.  A quick way to find the number of handles for each process is to check Task Manager on the Processes tab.  You may have to add the Handles column from View > Select Columns.  Generally, if a process has more than 10,000 handles, then we probably want to take a look at what is going on.  That does not necessarily mean that it is the offending process, just a suspect.  There are instances where the process belongs to a database or some other memory-intensive application; the most common example is the STORE.EXE process for Exchange Server, which routinely has well over 10,000 handles.  On the other hand, if our Print Spooler process has 10,000 (or more) handles then we most likely have an issue.
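If you prefer the command line, the same information is exposed through the Handle Count performance counter.  As a rough sketch, the following takes a single sample of the handle count for every running process (sort or filter the output as needed):

typeperf "\Process(*)\Handle Count" -sc 1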

Once we know there is a handle leak in a particular process, we can dump out all the handles and figure out why it is leaking.  If we want to find out from a dump if there is a process that has an abnormally large number of handles, we first have to list out all the processes and then examine the number of handles being used by the processes.  To list out all the processes that are running on the box using the Debugging Tools, we use the !process 0 0 command.  This will give us an output similar to what we see below:

0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8a5295f0  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 0acc0020  ObjectTable: e1002e68  HandleCount: 1056.
    Image: System

PROCESS 897e6c00  SessionId: none  Cid: 04fc    Peb: 7ffd4000  ParentCid: 0004
    DirBase: 0acc0040  ObjectTable: e1648628  HandleCount: 21.
    Image: smss.exe

PROCESS 89a26da0  SessionId: 0  Cid: 052c    Peb: 7ffdf000  ParentCid: 04fc
    DirBase: 0acc0060  ObjectTable: e37a7f68  HandleCount: 691.
    Image: csrss.exe

PROCESS 890f0da0  SessionId: 0  Cid: 0548    Peb: 7ffde000  ParentCid: 04fc
    DirBase: 0acc0080  ObjectTable: e1551138  HandleCount: 986.
    Image: winlogon.exe

PROCESS 89a345a0  SessionId: 0  Cid: 0574    Peb: 7ffd9000  ParentCid: 0548
    DirBase: 0acc00a0  ObjectTable: e11d8258  HandleCount: 396.
    Image: services.exe

The important piece of information here is the HandleCount.  For the purposes of this post, let’s assume that there is a problem with SMSS.EXE and that there is an unusually high HandleCount.  To view all of the handles for the process, the first thing we need to do is switch to the context of the process and then dump out all of the handles as shown below.  The relevant commands are:

  • .process -p -r <processaddress> – this switches us to the context of the process
  • !handle – this dumps out all of the handles
0: kd> .process -p -r 897e6c00
Implicit process is now 897e6c00

0: kd> !handle
processor number 0, process 897e6c00
PROCESS 897e6c00  SessionId: none  Cid: 04fc    Peb: 7ffd4000  ParentCid: 0004
    DirBase: 0acc0040  ObjectTable: e1648628  HandleCount: 21.
    Image: smss.exe

Handle table at e1674000 with 21 Entries in use

0004: Object: e1009568  GrantedAccess: 000f0003 Entry: e1674008
Object: e1009568  Type: (8a5258b8) KeyedEvent
    ObjectHeader: e1009550 (old version)
        HandleCount: 53  PointerCount: 54
        Directory Object: e10030a8  Name: CritSecOutOfMemoryEvent

0008: Object: 8910b370  GrantedAccess: 00100020 (Inherit) Entry: e1674010
Object: 8910b370  Type: (8a54c730) File
    ObjectHeader: 8910b358 (old version)
        HandleCount: 1  PointerCount: 1
        Directory Object: 00000000  Name: \WINDOWS {HarddiskVolume1}

000c: Object: e1af9828  GrantedAccess: 001f0001 Entry: e1674018
Object: e1af9828  Type: (8a512ae0) Port
    ObjectHeader: e1af9810 (old version)
        HandleCount: 1  PointerCount: 12
        Directory Object: e1002388  Name: SmApiPort

At this point we can continue to dig into the handles to determine if there is something amiss.  More often than not, this would be an issue for which systems administrators would be contacting Microsoft Support.  However, by using this method you can quickly determine whether the problem lies with a third-party component and engage that vendor directly.  Being able to provide them with a dump file that shows that their component is consuming an excessive number of handles can assist them in providing you with a quicker resolution.

That’s it for today.  In our next post on Server Hangs, we’ll look at how a lack of Available System PTEs can cause server hangs.

- Sakthi Ganesh


Troubleshooting Server Hangs – Part Two

Tue, 05/06/2008 - 11:00

Several months ago, we wrote a post on Troubleshooting Server Hangs.  At the end of that post, we provided some basic steps to follow with respect to server hangs.  The last step in the list was following the steps in KB Article 244139 to prepare the system to capture a complete memory dump for analysis.  Now that you have the memory dump, what exactly are you supposed to do with it?  That will be the topic of today’s post – more specifically, dealing with server hangs due to resource depletion.  We discussed various aspects of resource depletion including Paged and NonPaged Pool depletion and System PTEs.  Today we’re going to look at Pool Resource depletion, and how to use the Debugging Tools to troubleshoot the issue.

If the server is experiencing a NonPaged Pool (NPP) memory leak or a Paged Pool (PP) memory leak, you are most likely to see the following event IDs (respectively) in the System Event Log:

Type: Error
Date: <date>  Time: <time>
Event ID: 2019
Source: Srv
User: N/A
Computer: <ComputerName>
Details: The server was unable to allocate from the system nonpaged pool because the pool was empty.

Type: Error
Date: <date>  Time: <time>
Event ID: 2020
Source: Srv
User: N/A
Computer: <ComputerName>
Details: The server was unable to allocate from the system Paged pool because the pool was empty.

Let’s load up our memory dump file in the Windows Debugging Tool (WINDBG.EXE).  If you have never set up the Debugging Tools and configured the symbols, you can find instructions on the Debugging Tools for Windows Overview page.  Once we have our dump file loaded, type !vm at the prompt to display the Virtual Memory Usage for the system.  The output will be similar to what is below:

kd> !vm

*** Virtual Memory Usage ***
  Physical Memory:    917085 (  3668340 Kb)
  Page File: \??\C:\pagefile.sys
    Current:  4193280 Kb  Free Space:  4174504 Kb
    Minimum:  4193280 Kb  Maximum:     4193280 Kb
  Page File: \??\D:\pagefile.sys
    Current:  4193280 Kb  Free Space:  4168192 Kb
    Minimum:  4193280 Kb  Maximum:     4193280 Kb
  Available Pages:    777529 (  3110116 Kb)
  ResAvail Pages:     864727 (  3458908 Kb)
  Locked IO Pages:       237 (      948 Kb)
  Free System PTEs:    17450 (    69800 Kb)
  Free NP PTEs:          952 (     3808 Kb)
  Free Special NP:         0 (        0 Kb)
  Modified Pages:         90 (      360 Kb)
  Modified PF Pages:      81 (      324 Kb)
  NonPagedPool Usage:  30294 (   121176 Kb)
  NonPagedPool Max:    32640 (   130560 Kb)
  ********** Excessive NonPaged Pool Usage *****
  PagedPool 0 Usage:    4960 (    19840 Kb)
  PagedPool 1 Usage:     642 (     2568 Kb)
  PagedPool 2 Usage:     646 (     2584 Kb)
  PagedPool 3 Usage:     648 (     2592 Kb)
  PagedPool 4 Usage:     653 (     2612 Kb)
  PagedPool Usage:      7549 (    30196 Kb)
  PagedPool Maximum:   62464 (   249856 Kb)
  Shared Commit:        3140 (    12560 Kb)
  Special Pool:            0 (        0 Kb)
  Shared Process:       5468 (    21872 Kb)
  PagedPool Commit:     7551 (    30204 Kb)
  Driver Commit:        1766 (     7064 Kb)
  Committed pages:    124039 (   496156 Kb)
  Commit limit:      2978421 ( 11913684 Kb)

As you can see, this command provides details about the usage of Paged and NonPaged Pool Memory, Free System PTE’s and Available Physical Memory.  As we can see from the output above, this system is suffering from excessive NonPaged Pool usage.  There is a maximum of 128MB of NonPaged Pool available and 121MB of this NonPaged Pool is in use:

NonPagedPool Usage:  30294 (   121176 Kb)
NonPagedPool Max:    32640 (   130560 Kb)

Our next step is to determine what is consuming the NonPaged Pool.  Within the debugger, there is a very useful command called !poolused.  We use this command to find the Pool Tag that is consuming our NonPaged Pool.  The !poolused 2 command will list out NonPaged Pool consumption, and !poolused 4 lists the Paged Pool consumption.  A quick note here; the output from the !poolused commands could be very lengthy as they will list all of the tags in use.  To limit the display to the Top 10 consumers, we can use the /t10 switch:  !poolused /t10 2.

0: kd> !poolused 2
   Sorting by NonPaged Pool Consumed

  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 R100        3  9437184        15   695744  UNKNOWN pooltag 'R100', please update pooltag.txt
 MmCm       34  3068448         0        0  Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 LSwi        1  2584576         0        0  initial work context
 TCPt       28  1456464         0        0  TCP/IP network protocol , Binary: TCP
 File     7990  1222608         0        0  File objects
 Pool        3  1134592         0        0  Pool tables, etc.
 Thre     1460   911040         0        0  Thread objects , Binary: nt!ps
 Devi      337   656352         0        0  Device objects
 Even    12505   606096         0        0  Event objects
 naFF      300   511720         0        0  UNKNOWN pooltag 'naFF', please update pooltag.txt

Once the tag is identified we can use the steps that we outlined in our previous post, An Introduction to Pool Tags, to identify which driver is using that tag.  If the driver is out of date, then we can update it.  However, there may be some instances where we have the latest version of the driver, and we will need to engage the software vendor directly for additional assistance.
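As a quick refresher from that post, one way to find which driver binary contains a given tag is to search the drivers directory for the literal string (using the R100 tag from the output above as an example):

cd /d %SystemRoot%\System32\drivers
findstr /m /l R100 *.sys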

That brings us to the end of this post – in Part Three, we will discuss using Task Manager and the Debugging Tools to troubleshoot Handle Leaks which may be causing Server Hangs.

- Sakthi Ganesh


Internet Explorer 8 - First Look

Fri, 03/14/2008 - 11:00

Last week, at the MIX conference in Las Vegas, the Internet Explorer team made several announcements regarding IE8, the first of which was that a Developer Beta (emphasis on the Developer) is now available.  You can download the beta from the IE8 Beta Site.  The beta is available today for Windows Vista (“Gold” and SP1), Windows Server 2008, Windows Server 2003 SP2, and Windows XP SP2 and SP3, both in 32- and 64-bit versions.  We will release the developer beta in German and Simplified Chinese shortly.

There were seven other developer-oriented areas of discussion that were covered at MIX by the IE Team.  For those of you not familiar with the MIX conference, MIX is an opportunity for technical, creative and business strategists to engage Microsoft in a conversation about the future of the web.  You can find out more about MIX '08 by clicking on the MIX logo on the right, including viewing the MIX sessions and keynotes.  So what were the seven developer-focused areas?

  1. Our goal is to deliver complete CSS 2.1 support in the final IE8 product
  2. Microsoft has contributed over 700 test cases to the W3C CSS working group
  3. Delivery of better scripting performance
  4. Support for HTML5
  5. Delivery of the first installment of built-in developer tools
  6. A better way for Web Services to integrate into the user's workflow
  7. A better way for Web Services to enable their users to keep an eye on interesting parts of a web page within the browser with "WebSlices"

The items above do not represent everything that will be in the final product by any means.  The folks over at the IE Blog are going to be keeping us all up to date with what is going on in the IE8 world.  However, here are some quick tidbits:

Internet Explorer 8 and the ACID2 test: IE8 Beta 1 passes the official ACID2 test.  However, there are a number of copies of this test posted at various Internet locations and IE8 is failing the test at the copy sites due to the cross domain security checks performed for ActiveX controls

Activities and WebSlices in Internet Explorer 8: There are two new features in IE8, Activities and WebSlices.  With Activities you can access your services from any web page.  For example, let's say I want to map the address for Microsoft.  I can highlight the address from the "Contact Us" page on the Microsoft.com website (http://support.microsoft.com/contactus/?WS=mscorp) and select the option to Map with Live Maps (as shown below) which will open up a new tab and map the address selected.

So what are WebSlices?  WebSlices allow you to subscribe to a portion of a web page to get updates and view the changes without having to go back to the site.  If a web site supports WebSlice, you will see a new icon in the IE Command Bar:

Clicking on the button adds the WebSlice to the Favorites bar.  IE then checks for updates on a schedule.  When IE finds an update, the item on the Favorites bar bolds.  You can click on the item to view the details.  eBay has an IE8 site up and running (http://ie8.ebay.com), and you can also try out WebSlices on StumbleUpon and Facebook.

We also mentioned improved scripting - the folks over at the JScript Blog have written a post regarding this.  There's a lot more information regarding the IE8 Developer Beta - check out the posts over at the IE Blog.

As you can see, there are lots of new features and some very cool functionality in IE8!  Until next time ...

- CC Hameed


Disk Fragmentation and System Performance

Fri, 03/14/2008 - 11:00

When addressing system performance issues, a key element that is often overlooked is Disk Fragmentation.  Even on a brand new system with plenty of RAM and high-end processors, the hard disk may be a performance bottleneck.  It takes time to load large data files into memory - issues become particularly noticeable when dealing with movies, video clips, database files or .ISO image files which may easily be several gigabytes in size.  On a freshly formatted disk, these files load fairly quickly.  Over time, however, you may start to notice performance degradation caused by disk fragmentation.

We touched on disk fragmentation when we were discussing the Page File a couple of months ago, but we never really got into the nuts and bolts of it.  To understand disk fragmentation though, you need to understand the basic structure of hard disks.  When you format a hard disk, the formatting process divides the disk into sectors, each of which contains space for 512 bytes of data.  The file system then combines groups of sectors into clusters.  A cluster is the smallest unit of space available for holding a single file - or part of a file.  On NTFS disks, the default cluster size is determined by the size of the volume, as shown below (this information is also available in Microsoft KB 314878).  It is possible to change the cluster size when formatting a disk; however, this may cause additional performance issues.

Drive Size (Logical Volume)     Cluster Size         Sectors
512MB or less                   512 bytes            1
513MB - 1,024MB (1GB)           1,024 bytes (1KB)    2
1,025MB - 2,048MB (2GB)         2,048 bytes (2KB)    4
2,049MB +                       4,096 bytes (4KB)    8

Using the information above, if you were to take a 100MB video file, the file would be divided into roughly 25,000 pieces.  If you save this 100MB file onto a freshly formatted disk, the information would be written in contiguous clusters.  Since all of the clusters holding the data for this file are physically adjacent to each other, the mechanical components of the hard disk work very efficiently, pulling the data in one operation.  In addition, the hard disk's cache and the Windows disk cache can anticipate data requests and fetch data from nearby clusters.  This data can then be retrieved by an application from cached memory which is faster than retrieving the information from the disk itself. 

Seems pretty straightforward, right?  The problem is that the hard disks don't stay neatly organized for very long.  Whenever you add data to an existing file, the file system has to allocate more clusters for storage.  Typically, these clusters wind up being in a different physical location on the disk.  As you delete files, you create gaps in the arrangement of the contiguously stored files.  As you save new files (and this is especially true for large files), the file system uses up all of these bits of free space - resulting in the new files being scattered all over the disk in noncontiguous pieces.  And thus we end up with fragmented disks and system performance issues because the disk heads have to spend time moving from cluster to cluster before they can read or write the data.

Enter Disk Defragmenter.  This utility physically rearranges the files so that they are stored (as much as possible) in physically contiguous clusters.  In addition to the consolidation of files and folders, the Defragmenter utility also consolidates free space - meaning that it is less likely for new files to be fragmented when you save them.  For operating systems prior to Windows Vista, you had to manually run the utility or schedule automatic defragmentation via a scheduled task.  On Windows Vista, Disk Defragmenter runs as a low-priority background task that is automatically run on a weekly basis without requiring user intervention.  On Windows Server 2008, which uses the same Disk Defragmenter, the automatic defragmentation is not enabled by default.  Also, the color-coded display that was part of earlier versions of the utility has been retired (believe it or not, more than a few people have asked about that!).  Aside from the GUI version of the tool, you can also use a command-line version that enables some more granular control over the process.  The utility name is DEFRAG.EXE and it requires administrative privileges to run.  The basic operation of the utility involves passing it a drive letter, for example: defrag.exe c: would perform a defragmentation of the C: drive.  You can also specify other options through the use of command-line switches:

  • -c: Defragments all volumes on the system.  You can use this switch without needing to specify a drive letter or mount point
  • -a: Performs an analysis of the selected drive and provides a summary report
  • -r: Performs a partial defragmentation by consolidating only file fragments that are less than 64MB in size.  This is the default setting
  • -w: Performs a full defragmentation by consolidating all file fragments regardless of size
  • -f: Forces defragmentation of the volume even if the amount of free space is lower than normally required.  When running this, be aware that it can result in slow system performance while the defragmentation is occurring
  • -v: Displays verbose reports.  When used in combination with the -a switch, only the analysis report is displayed.  When used alone, both the analysis and defragmentation reports are shown
  • -i: Runs the defragmentation in the background and only if the system is idle
  • -b: Optimizes boot files and applications, but leaves the rest of the drive untouched

So now that we've covered what disk fragmentation is and how to address it, there are some caveats.  You must have at least 15 percent free space on the disk volume before Disk Defragmenter can completely defragment the volume.  If you have less free space, then a partial defragmentation will occur (unless you force the defragmentation with the -f switch).  Also, you cannot defragment a volume that has been marked by the OS as possibly containing errors.  This is where you would need to use the CHKDSK.EXE utility to ensure that there are no underlying disk issues (a short command sequence illustrating this follows the list below).  Some other things to look out for:

  • Empty the Recycle Bin before defragmenting.  Disk Defragmenter does not defragment the Recycle Bin
  • As we discussed in our Page File post, if you want to defragment the page file, you need to zero it out first and then defragment the disk
  • By default, fragments that are greater than 64MB in size are ignored by Disk Defragmenter.  Fragments of this size (which already contain at least 16,000 contiguous clusters) have a negligible impact on performance
  • Disk Defragmenter will not defragment files that are in use.  For best results, shut down all running programs, or log off and log back in as an administrative account before defragmenting the disk
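For example, on a volume you suspect is badly fragmented (drive C: is used here purely for illustration), a reasonable sequence is to check the disk first, analyze, and only then commit to a full defragmentation pass:

chkdsk c:
defrag c: -a -v
defrag c: -v

The analysis pass tells you how fragmented the volume actually is, so you can decide whether the full defragmentation is worth the I/O before you run it.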

And with that, it's time to wrap up this post.  Until next time ...

- CC Hameed


Group Policy Logging on Windows Vista

Tue, 03/11/2008 - 19:00

Although the bulk of Group Policy Processing and Troubleshooting is handled by our Directory Services team, we often collaborate on these issues - mainly when the issue relates to a user logging in and not being presented with their desktop environment as they would expect.  Instead they are simply presented with a blank background (usually blue!) with no icons.  It's not the dreaded "Blue Screen of Death" - it's a blue screen of, well ... nothing.  Usually we will troubleshoot this by turning on debug logging for Group Policy and capturing a Userenv.log to figure out whether the basic shell (explorer.exe) is even being called.

However, in Windows Vista, the Group Policy engine no longer records information in the userenv.log.  Instead, detailed logging of Group Policies can be located using Event Viewer.  The log for group policy processing can be found in the Event Viewer under Applications and Services Logs\Microsoft\Windows\Group Policy\Operational - a sample is shown below.
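If you prefer the command line, the same channel can be queried with WEVTUTIL.EXE.  As a rough sketch, the following returns the ten most recent Group Policy operational events in text form (the channel name shown is how the log is registered on Windows Vista):

wevtutil qe Microsoft-Windows-GroupPolicy/Operational /c:10 /rd:true /f:text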

As you can see, each of the policy processing events that occur on the client are logged in this event viewer channel.  This is an administrator-friendly replacement for the userenv.log.  When looking at these events in the event viewer, there are some event ranges to be aware of:

Range          Meaning
4000 - 4299    Scenario Start Events
5000 - 5299    Corresponding Success Scenario End Events (scenario start event + 1000)
5300 - 5999    Informational Events
6000 - 6299    Corresponding Warning Scenario End Events (scenario start event + 2000)
6300 - 6999    Warning Events (corresponding informational event + 1000)
7000 - 7299    Corresponding Error Scenario End Events (scenario start event + 3000)
7300 - 7999    Error Events (corresponding informational event + 2000)
8000 - 8999    Policy Scenario Success Events

Administrative events relating to Group Policy are still logged in the System Event Log, similar to pre-Windows Vista platforms.  The difference is that the event source for the event is now Group Policy instead of USERENV.  In Windows Vista, the Group Policy script processing errors are also now logged through the same mechanism as the rest of the Group Policy errors.

And that brings us to the end of this quick post on Group Policy Logging on Windows Vista.  Until next time ...


- CC Hameed


EDIT:

3/11: Removed last paragraph (applied to server, not client OS), added additional Technet links and re-published article

Key Principles of Security

Fri, 03/07/2008 - 12:00

OK, so today's post isn't really something "Performance" related, but nevertheless, I think we can all safely agree that this is something that all administrators should be aware of.  During our Windows Vista and Windows Server 2008 posts we've been talking about "reducing the attack surface" and other security enhancements.  So today we're going to go over some security concepts at a very high level.  If you have read through the Windows 2003 Resource Kit or the Windows Security Resource Kit, then this information will be quite familiar to you.

The basic skill in securing your environment is to understand the big picture.  In other words, not only how to secure your computers and networks, but also what your limitations might be.  We've all heard of the principle of least privilege.  If an application or user has privileges beyond what they really require to perform their tasks, then the potential exists for an attacker to take advantage of that fact to compromise your environment.  In the past, many domain administrators only had one account that they used for everything - reading email, administering the domain, writing documentation and so on.  So if that administrator's account was somehow used to launch an attack, the attack was carried out with all of the domain administrator's privileges - often to devastating effect.  Many environments now separate the accounts based on the work being done.  For reading email and other day-to-day work, a domain administrator would have a normal user account.  However, they would have a second account that they would use for administrative tasks.  By separating the roles, you reduce the risk of widespread compromise.

Another key phrase that we're used to hearing is "Defense in Depth".  What does this mean?  If you use the analogy of the onion, then each layer that you peel away gets you closer to your critical asset(s).  At each layer you should protect your assets as if that was the outermost layer.  The net result is an aggregated security model.  The most common example of this is when dealing with email - incoming mail is filtered by the server for spam and malware, as well as on the client when email attachments are scanned before they are opened.

We mentioned the "Attack Surface" in the first paragraph.  What exactly does that mean?  If you think about it, an attacker only needs to know about a single vulnerability in your environment.  As the administrator, you have to know about all of your potential weaknesses - your attack surface.  The smaller the attack surface, the fewer potential targets for an attacker to exploit.  Reducing the attack surface takes a number of forms, such as limiting access to a machine, not installing unnecessary software, and disabling unneeded services.  One of the offerings in the Windows Server 2008 family, Server Core, dramatically reduces the attack surface by providing a minimal environment to run specific server roles.  We discussed this in an earlier post, called "Getting Started with Server Core."

One of the keys to security in an environment is the design.  Security should be an integral component of network and infrastructure design - the old adage, "an ounce of prevention is worth a pound of cure" is perhaps the best way to express this.  Beyond the initial design however, the actual deployment and ongoing maintenance of the environment have a major impact on security.  One example of where you may run into problems is if you attempt to secure a database application after it is implemented.  The very real risk in this scenario is that the application may not work after you secure it - and oftentimes, the pressure to maintain the application availability will trump the need to secure the application - or at least push the task of securing the application lower on the priority list.

So before we wrap up, there are a couple of very good articles to refer you to that discuss some of the principles we've talked about above.  Both of them were written by Scott Culp of the Microsoft Security Response Center.  The first article discusses "The 10 Immutable Laws of Security".  Very briefly, the 10 laws are:

  1. If a bad guy can persuade you to run his program on your computer, it's not your computer anymore
  2. If a bad guy can alter the operating system on your computer, it's not your computer anymore
  3. If a bad guy has unrestricted physical access to your computer, it's not your computer anymore
  4. If you allow a bad guy to upload programs to your website, it's not your website anymore
  5. Weak passwords trump strong security
  6. A computer is only as secure as the administrator is trustworthy
  7. Encrypted data is only as secure as the decryption key
  8. An out of date virus scanner is only marginally better than no virus scanner at all
  9. Absolute anonymity is not practical, in real life or on the web
  10. Technology is not a panacea

Scott's other article is titled "The 10 Immutable Laws of Security Administration" - and is a listing of ten basic laws regarding the nature of security.

Well, that's it for this post.  This was a little departure from what we normally cover, but hopefully you found this information useful!  Until next time ...


- CC Hameed
