Monday, March 5, 2012

Working with Perfmon CSV logs in Powershell


A few months back, I was working on some performance issues which required me to go through huge PerfMon logs. The files were around 200MB in size, and since, manually doing this was time consuming I wrote a couple of PowerShell one-liners which can get this information. I thought, hey this is great; why not share my experiences working with CSV files in PowerShell with the folks here.
While I was trying to automate this solution, I broke down the problem into 3 different components and kept the approach straight forward:
  1. Read the CSV file to get the headers and see which columns we would be interested in.
  2. Select a range of values, determined by the time slots we are interested in.
  3. Summarize the results using Measure-Object cmdlet.
Now, the details:

1. How to read only the header line of a CSV file and get the list of columns which we are interested in:

#read the header            
(Get-Content .\Perf.csv|Select-Object -First 1).Split(",")
What is happening here is that we are using the Get-Content cmdlet and the Select-Object cmdlet to return the first row of a CSV file and then splitting the result using the delimiter comma(“,”).
Digging a little deeper you will find that the Get-Content cmdlet, returns the content of an item at a given path. The cmdlet works on each line and returns an object for each line read. This works in our favor since the header of a well formed CSV file is always on the first line.
Also, the Select-Object cmdlet accepts some interesting arguments like FirstLast and Skip. Each of these arguments is self explanatory and tremendously helpful when working with large log files.
Let us say for the sake of example that you want to read the last 10 lines of a file then, you would do something like:
#read the last 10 lines of a file.            
Get-Content .\Perf.csv|Select-Object -Last 10
If you want to skip the first line in a file and then read the next 10 lines then, you would do something like:
#skips the first line and then reads the next 10 lines            
Get-Content .\Perf.csv|Select-Object -Skip 1 -First 10            
Getting back to reading the header of the CSV file once, we have the first line all we have to do is Split the line on comma (",") which is the delimiter of choice in CSV files.

(Get-Content .\perf.csv | Select-Object -First 1).Split(",")            
            
"(PDH-CSV 4.0) (Eastern Standard Time)(300)"            
"\\computer\LogicalDisk(C:)\Disk Reads/sec"            
"\\computer\LogicalDisk(C:)\Disk Writes/sec"            
"\\computer\LogicalDisk(C:)\Disk Transfers/sec"            

2. How to read each line in a CSV file and select only the range of values we are interested in

Now that we have the columns we are interested in, the next logical thing to do is use them to access what is inside the file. But, we are not always interested in selecting all the data. We may be interested in, let’s say only what happened between 9:34 AM and 10:06 AM. So, we can use these boundaries of interest to restrict the selection of data as shown below.
If you look at the code, we are using the Where-Object cmdlet to pipe only the values that fall within our range.
#First define the boundaries of the range we are interested in.            
$startTime = Get-Date "05/07/2011 09:34:51.421" -Format G            
$endTime = Get-Date "05/07/2011 10:06:51.421" -Format G             
            
#Next get the data.            
Import-Csv .\perf.csv| Where-Object {            
    ((Get-Date $_.'(PDH-CSV 4.0) (Central Daylight Time)(300)') -gt $startTime) -and              
    ((Get-Date $_.'(PDH-CSV 4.0) (Central Daylight Time)(300)') -lt $endTime)            
    }| Select-Object '(PDH-CSV 4.0) (Central Daylight Time)(300)',            
        '\\computer\LogicalDisk(C:)\Disk Reads/sec',`
        '\\computer\LogicalDisk(C:)\Disk Writes/sec',`
        '\\computer\LogicalDisk(C:)\Disk Transfers/sec'
Note:
When working with large CSV files, instead of piping the output of Import-Csv cmdlet is a little faster and does not consume a whole lot of memory.
As we saw in the step above, it is easy to select only a few values to look at from a CSV file. This definitely, helps when you have to work through a bunch of counters and get the averages or minimum and maximum values for each counter.  

3. You want to get the min, max and average values of the selected columns just, throw in a Measure-Object at the end

Let us look at how we can measure the conters in the CSV files. The code is very much similar to what we saw above. We add the Measure-Object cmdlet to the code from step2. and try to calculate various properties.
 The code first.
#First define the boundaries of the range we are interested in.            
$startTime = Get-Date "05/07/2011 09:34:51.421" -Format G            
$endTime = Get-Date "05/07/2011 10:06:51.421" -Format G             
            
#Next get the data.            
Import-Csv .\perf.csv| Where-Object {            
    ((Get-Date $_.'(PDH-CSV 4.0) (Central Daylight Time)(300)') -gt $startTime) -and              
    ((Get-Date $_.'(PDH-CSV 4.0) (Central Daylight Time)(300)') -lt $endTime)            
    }| Select-Object '(PDH-CSV 4.0) (Central Daylight Time)(300)',            
        '\\computer\LogicalDisk(C:)\Disk Reads/sec',`
        '\\computer\LogicalDisk(C:)\Disk Writes/sec',`
        '\\computer\LogicalDisk(C:)\Disk Transfers/sec'|`
       Measure-Object '\\computer\LogicalDisk(C:)\Disk Reads/sec',`
        '\\computer\LogicalDisk(C:)\Disk Writes/sec',`
        '\\computer\LogicalDisk(C:)\Disk Transfers/sec' -Average -Minimum -Maximum
Since, I did not have any sample csv files hand. I created a small example CSV file to work with. In the below code block you will see that we have selected a column of interest from the CSV file and got the average of the values in the column without actually having to go through the whole CSV.
As you can see the, output is in the form of a list.You can actually make the output more readable if you use Format-Table cmdlet.
Import-Csv .\perf.csv |            
Select-Object             
    '\\computer\LogicalDisk(C:)\Avg. Disk sec/Read'|             
Measure-Object '\\computer\LogicalDisk(C:)\Avg. Disk sec/Read'  -Average            
#output         
Count    : 20            
Average  : 0.200153773650827            
Sum      :             
Maximum  :             
Minimum  :             
Property : \\computer\LogicalDisk(C:)\Avg. Disk sec/Read            

Conclusion

I used this approach of munging through CSV logs to get quick details about individual counters or a bunch of counters when troubleshooting performance issues. The reason why this is helpful is because you can concentrate only on a subset of counters that maybe helpful to you instead of going through all the counters that are being collected in your work place.
The solution I have described here may not be very flexible in its approach but, if you can create a here-string with variables for machine name and instance name and have a list of commonly looked at counters; then it does offer a quick and dirty way to summarize big logs.
Also, I usually add a Format-Table at the end so that the data is a little more readable.

Source:

No comments:

Post a Comment