Friday, September 12, 2008

How to log Site Usage Data for Audit Purpose on Sharepoint 2007

My company users come up with this task when we started exploring the Site Audit feature on Sharepoint. Sharepoint has Built-in Site Auditing features by Microsoft which site administrator can turn it on, on the sharepoint site and monitoring user's activities such as creating post, editing, deleting etc. But it's simply too much information when it is turn on and retrieving of data can be quite tedious on the live server with alot of users activities going on. All we wanted to do is just monitoring the user access on some of the sites. So we tried out Site Usage Analysis features, which is another alternative by Microsoft to monitor the Site Usage for audit. But Site Usage mainly focused on showing which pages are being access by which user for site administrator to monitor the usage. Everything seems fine. But we hit 1 problem. We don't like the default presentation by Microsoft. So we tried to use Sharepoint API to retrieve the Site Usage. There's 1 great article here.

http://www.15seconds.com/issue/050623.htm

But this doesn't meet our requirement. The weaknest or rather Microsoft conservative way of implementing API is that, Sharepoint is only caching 1 month data inside database and you can only retrieve the 1 month usage data using Sharepoint API provided by Microsoft. But for our audit requirement we need more than 1 month. We wanted to keep about 1 yr site usage data for audit.

I've made some consultation with Microsoft Product Support team and we decided to parse the site usage data from the Sharepoint 12 hives log file and put it inside the database. This is the first time we decided to venture beyond what's provided by Microsoft API.

I'm creating a small console program. What's this console program did is, parsing 1 day of Site Usage Data directly from Sharepoint Log and save it inside the database. And we schedule to run it everyday at 1:30 Am. Sharepoint is always writing usage analysis to the log at 1:00 am on our central administration site. So we run our own Site Usage Parser at 1:30 am -- giving sharepoint enough time to finish writing the Site Usage Log to 12 hive.

And my console program is designed to read from 1 of the table called SiteCollections. It is the table where I defined which sites and subsites we wanted to log the user access. We designed this way because logging the whole of Sharepoint Server will make our database occupied alot of space. We designed to selectively log only those sites we wanted to monitor.

Hope you all fine my program useful. This will help you store the Site Usage inside database from Sharepoint log. And you will be able to store data selectively from the sites or subsites you want to monitor. As for front end presentation of how you want to show the usage to your users or auditor, I will leave it to your creativity. =)

Database Design

SiteCollection (Table)

Columns - URL(String) 'This is for specifying which site or subsites you want to parse the site usage. If you don't specify any site URL nothing will be parse from log file.

SiteUsageLog(Table) 'This is the main table where you will log all the parsed data from log file

Columns - LoginDate(DateTime) 'Store the date and time user log in
User_Name(String) 'domain\user
URL 'Site URL e.g http://yourmosssite/sites/yoursubsite
Referral 'Referral URL
Folder 'Which folder user access
Page 'Which page e.g default.aspx



MainConsole.vb

Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Data.SqlClient
Imports Microsoft.SharePoint
Imports System.Collections
Imports System.Data
Imports System.IO
Imports System.Runtime.InteropServices
Imports System.ComponentModel
Imports Microsoft.Win32
Imports System.Diagnostics


Namespace setDB4Log
Module MainConsole
Dim writer As StreamWriter
Dim logger As StreamWriter
Public sysLog As String = ""
'Public tracelog As String = ""
Dim dir As String = "D:\Log Files\SiteUsage\"
Dim logFile As String = dir & "SiteUsageConsole " & Now().ToString("yyyyMMdd_hhmmss") & ".log"
'Dim traceFile As String = dir & "TraceLog " & Now().ToString("yyyyMMdd_hhmmss") & ".log"
Sub Main(ByVal args As String())
'Connect String
'tracelog += Now().ToString("yyyyMMdd_hhmmss") & vbCrLf
Dim strConneectionSting As String = "Data Source=SQL2005;Initial Catalog=WSS_Usage_Log;User Id=User;Password=Password;"
Dim myconnection As New SqlConnection(strConneectionSting)

'Get log data for sites that are defined in SiteCollections table. Any other sites usage data won't be processed.
Dim strCmd As String = "SELECT URL from SiteCollections"
Dim adapter As New SqlDataAdapter(strCmd, myconnection)
Dim dtUrl As New DataTable()
adapter.Fill(dtUrl)


Dim parser As New Parser()
'fill in any url which belongs to your web application
Dim url As String = "http:\\yourmosssite"
Dim urlitem As String
Dim LogDir As String
Dim webappid As String
Dim [date] As String = ""
If args.Length = 0 Then
[date] = DateTime.Now.AddDays(-1).ToString("yyyy-MM-dd")
Else
For Each backDate As String In args
'To run back date logs
[date] = DateAndTime.DateValue(backDate).ToString("yyyy-MM-dd")
Next
End If
Console.WriteLine("Parsing Site Usage Data for " & [date])
'get yesterday
Dim insertRow As String
'Dim seekUrlList As String

'Delete data from database if the data is already exist in selected date
Dim deleterow As String = "DELETE from Site_Usage_Log where LoginDate >='" + [Date] + "' And LoginDate<'"+ [Date]+"'" 'Dim subUrl As String Dim insertCmd As New SqlCommand() Dim deleteCmd As New SqlCommand() Dim seekUrlCmd As New SqlCommand() Try 'get log file location Dim logRegkey As RegistryKey = Registry.LocalMachine logRegkey = logRegkey.OpenSubKey("SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\WSS") Dim val As Object = logRegkey.GetValue("LogDir") LogDir = val.ToString() Dim web As String Dim folder As String Dim item As String Dim doc As String Dim index As Integer 'Dim num As Integer Dim site As New SPSite(url) webappid = site.WebApplication.Id.ToString() LogDir = LogDir + webappid + "\" + [Date] + "\00.log" 'open the connection myconnection.Open() insertCmd.Connection = myconnection deleteCmd.Connection = myconnection seekUrlCmd.Connection = myconnection deleteCmd.CommandText = deleterow deleteCmd.ExecuteNonQuery() 'delete the overage date Dim ds As DataSet = parser.GetLogDataSet(LogDir) 'fill the database with the parsed data For Each row As DataRow In ds.Tables(0).Rows doc = row("Document").ToString() index = doc.LastIndexOf("/") web = row("Web").ToString() If (index < item =" doc" folder = "" folder =" web" item =" doc.Substring(index"> "") Then
urlitem = row("SiteUrl") + "/" + web
Else
urlitem = row("SiteUrl").ToString()
End If
'tracelog += urlitem & vbCrLf
'Console.WriteLine(urlitem)
For Each UrlRow As DataRow In dtUrl.Rows
If urlitem.ToLower().Contains(UrlRow(0).ToString().ToLower()) Then
If row("UserName").ToString() <> "sharepoint\system" Then
insertRow = "insert into Site_Usage_Log(LoginDate,User_Name,URL,Referral,Folder,Page) Values('" + [date] + " " + row("TimeStamp").ToString().Trim() + "','" + row("UserName").ToString().Trim() + "','" + urlitem + "','" + row("Referral").ToString().Trim() + "','" + folder.Trim() + "','" + item.Trim() + "')"
'tracelog += insertRow & vbCrLf
Console.WriteLine(insertRow)
insertCmd.CommandText = insertRow
insertCmd.ExecuteNonQuery()
End If
Exit For
End If
Next

Next
Catch ex As Exception
Console.WriteLine(ex.Message)
sysLog += ex.Message & vbCrLf
Finally
myconnection.Close()
End Try
If sysLog = "" Then
sysLog += "The log for " & [Date] & " is successfully created."
End If
Try
If Directory.Exists(dir) = False Then
Directory.CreateDirectory(dir)
End If
Catch ex As Exception
Console.WriteLine(ex.Message)
End Try

writer = New StreamWriter(logFile)
writer.Write(sysLog)
writer.Close()
End Sub
End Module
End Namespace

Parser.vb -- This code does most of the parsing logic

Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Data.SqlClient
Imports Microsoft.SharePoint
Imports System.Collections
Imports System.Data
Imports System.IO
Imports System.Runtime.InteropServices
Imports System.ComponentModel
Imports Microsoft.Win32
Imports System.Diagnostics
Namespace setDB4Log
Public Class Parser
Private Const FILEHEADERLENGTH As Integer = 300
Private Const RECORDHEADERLENGTH As Integer = 50

Private _logRecords As New List(Of LogRecord)()
Private Sub ReadLogFile(ByVal logFilePath As String)
_logRecords.Clear()

' Open a stream to the log file
Using s As Stream = File.OpenRead(logFilePath)
Dim br As New BinaryReader(s)

' Read the file header
Dim fileHeaderData As Byte() = br.ReadBytes(FILEHEADERLENGTH)

While s.Position < s.Length
' Read the record header array
Dim recordHeader As Byte() = br.ReadBytes(RECORDHEADERLENGTH)
Dim headerData As New LogRecordHeader(recordHeader)

' Get the length of the current record and read the array
Dim recordLength As Integer = headerData.RecordLength()
Dim recordData As Byte() = br.ReadBytes(recordLength)
If recordData(0) = 0 Then
Exit While
End If
_logRecords.Add(New LogRecord(headerData, recordData))
s.Flush()
End While
s.Flush()
br.Close()
s.Close()
End Using
End Sub

Public Function GetLogDataSet(ByVal logFilePath As String) As DataSet
ReadLogFile(logFilePath)

Dim dsSTSLogs As New DataSet("dsSTSLogs")
Dim dtSTSLogs As New DataTable("dtSTSLogs")

Dim dcTimeStamp As New DataColumn("TimeStamp")
'DataColumn dcSiteGuid = new DataColumn("SiteGUID");
Dim dcSiteUrl As New DataColumn("SiteUrl")
Dim dcWeb As New DataColumn("Web")
Dim dcDocument As New DataColumn("Document")
Dim dcUserName As New DataColumn("UserName")
'DataColumn dcQueryString = new DataColumn("QueryString");
Dim dcReferral As New DataColumn("Referral")
'DataColumn dcUserAgent = new DataColumn("UserAgent");
'DataColumn dcCommand = new DataColumn("Command");
'DataColumn dcHttpStatus = new DataColumn("HttpStatus");
'DataColumn dcBytesSent = new DataColumn("BytesSent");

dtSTSLogs.Columns.Add(dcTimeStamp)
'dtSTSLogs.Columns.Add(dcSiteGuid);
dtSTSLogs.Columns.Add(dcSiteUrl)
dtSTSLogs.Columns.Add(dcWeb)
dtSTSLogs.Columns.Add(dcDocument)
dtSTSLogs.Columns.Add(dcUserName)
'dtSTSLogs.Columns.Add(dcQueryString);
dtSTSLogs.Columns.Add(dcReferral)
'dtSTSLogs.Columns.Add(dcUserAgent);
'dtSTSLogs.Columns.Add(dcCommand);
'dtSTSLogs.Columns.Add(dcHttpStatus);
'dtSTSLogs.Columns.Add(dcBytesSent);

dsSTSLogs.Tables.Add(dtSTSLogs)

For Each record As LogRecord In _logRecords
Dim drRecord As DataRow = dtSTSLogs.NewRow()

drRecord("TimeStamp") = record.TimeStamp
'drRecord["SiteGUID"] = record.SiteGUID;
drRecord("SiteUrl") = record.SiteUrl
drRecord("Web") = record.Web
drRecord("Document") = record.Document
drRecord("UserName") = record.UserName
'drRecord["QueryString"] = record.QueryString;
drRecord("Referral") = record.Referral
'drRecord["UserAgent"] = record.UserAgent;
'drRecord["Command"] = record.Command;
'drRecord["HttpStatus"] = record.Status;
'drRecord["BytesSent"] = record.BytesSent;

dtSTSLogs.Rows.Add(drRecord)
Next

Return dsSTSLogs
End Function
Public Sub ConvertToCSVFile(ByVal logFilePath As String)
Dim defaultPath As String = Path.Combine(Path.GetDirectoryName(logFilePath), Path.GetFileNameWithoutExtension(logFilePath)) + ".csv"

ConvertToCSVFile(logFilePath, defaultPath)
End Sub

Public Sub ConvertToCSVFile(ByVal logFilePath As String, ByVal csvFilePath As String)
ReadLogFile(logFilePath)

Dim recordLineFormat As String = "{0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11}"

Using sw As StreamWriter = File.CreateText(csvFilePath)
For Each record As LogRecord In _logRecords
sw.WriteLine([String].Format(recordLineFormat, record.TimeStamp, record.SiteGUID, record.SiteUrl, record.Web, record.Document, _
record.UserName, record.QueryString, record.Referral, record.UserAgent, record.Command, record.BytesSent, _
record.Status))
Next

sw.Close()
End Using
End Sub
End Class
Public Class LogRecordHeader
Public Const STANDARD_GUID_LENGTH As Integer = 36
Public Const STANDARD_TIMESTAMP_LENGTH As Integer = 8

Private _recordHeader As Byte()

Private _siteUrlLength As Integer
Public ReadOnly Property SiteUrlLength() As Integer
Get
Return _siteUrlLength
End Get
End Property

Private _webLength As Integer
Public ReadOnly Property WebLength() As Integer
Get
Return _webLength
End Get
End Property

Private _docLength As Integer
Public ReadOnly Property DocLength() As Integer
Get
Return _docLength
End Get
End Property

Private _userNameLength As Integer
Public ReadOnly Property UserNameLength() As Integer
Get
Return _userNameLength
End Get
End Property

Private _bytesSent As Long
Public ReadOnly Property BytesSent() As Long
Get
Return _bytesSent
End Get
End Property

Private _httpStatus As Integer
Public ReadOnly Property HttpStatus() As Integer
Get
Return _httpStatus
End Get
End Property

Private _queryStringLength As Integer
Public ReadOnly Property QueryStringLength() As Integer
Get
Return _queryStringLength
End Get
End Property

Private _referralLength As Integer
Public ReadOnly Property ReferralLength() As Integer
Get
Return _referralLength
End Get
End Property

Private _userAgentLength As Integer
Public ReadOnly Property UserAgentLength() As Integer
Get
Return _userAgentLength
End Get
End Property

Private _commandLength As Integer
Public ReadOnly Property CommandLength() As Integer
Get
Return _commandLength
End Get
End Property

Public Sub New(ByVal recordHeader As Byte())
Me._recordHeader = recordHeader
Me._siteUrlLength = BitConverter.ToUInt16(recordHeader, SITEURL_OFFSET)
Me._webLength = BitConverter.ToUInt16(recordHeader, WEB_OFFSET)
Me._docLength = BitConverter.ToUInt16(recordHeader, DOC_OFFSET)
Me._bytesSent = BitConverter.ToUInt32(recordHeader, BYTESSENT_OFFSET)
Me._httpStatus = BitConverter.ToUInt16(recordHeader, HTTPSTATUS_OFFSET)
Me._userNameLength = BitConverter.ToUInt16(recordHeader, USERNAME_OFFSET)
Me._queryStringLength = BitConverter.ToUInt16(recordHeader, QUERYSTRING_OFFSET)
Me._referralLength = BitConverter.ToUInt16(recordHeader, REFERRAL_OFFSET)
Me._userAgentLength = BitConverter.ToUInt16(recordHeader, USERAGENT_OFFSET)
Me._commandLength = BitConverter.ToUInt16(recordHeader, COMMAND_OFFSET)
End Sub

Private Const SITEURL_OFFSET As Integer = 12
Private Const WEB_OFFSET As Integer = 14
Private Const DOC_OFFSET As Integer = 16
Private Const BYTESSENT_OFFSET As Integer = 20
Private Const HTTPSTATUS_OFFSET As Integer = 24
Private Const USERNAME_OFFSET As Integer = 26
Private Const QUERYSTRING_OFFSET As Integer = 28
Private Const REFERRAL_OFFSET As Integer = 30
Private Const USERAGENT_OFFSET As Integer = 32
Private Const COMMAND_OFFSET As Integer = 34

Public Function RecordLength() As Integer
Return STANDARD_GUID_LENGTH + 1 + STANDARD_TIMESTAMP_LENGTH + 1 + SiteUrlLength + 1 + WebLength + 1 + DocLength + 1 + UserNameLength + 1 + QueryStringLength + 1 + ReferralLength + 1 + UserAgentLength + 1 + STANDARD_GUID_LENGTH + 1 + CommandLength + 1
End Function
End Class

Public Class LogRecord
Private _siteGuid As String
Public ReadOnly Property SiteGUID() As String
Get
Return _siteGuid
End Get
End Property

Private _timeStamp As String
Public ReadOnly Property TimeStamp() As String
Get
Return _timeStamp
End Get
End Property

Private _siteUrl As String
Public ReadOnly Property SiteUrl() As String
Get
Return _siteUrl
End Get
End Property

Private _web As String
Public ReadOnly Property Web() As String
Get
Return _web
End Get
End Property

Private _document As String
Public ReadOnly Property Document() As String
Get
Return _document
End Get
End Property

Private _userName As String
Public ReadOnly Property UserName() As String
Get
Return _userName
End Get
End Property

Private _queryString As String
Public ReadOnly Property QueryString() As String
Get
Return _queryString
End Get
End Property

Private _referral As String
Public ReadOnly Property Referral() As String
Get
Return _referral
End Get
End Property

Private _userAgent As String
Public ReadOnly Property UserAgent() As String
Get
Return _userAgent
End Get
End Property

Private _command As String
Public ReadOnly Property Command() As String
Get
Return _command
End Get
End Property

Private _status As Integer
Public ReadOnly Property Status() As Integer
Get
Return _status
End Get
End Property

Private _bytesSent As Long
Public ReadOnly Property BytesSent() As Long
Get
Return _bytesSent
End Get
End Property

Public Sub New(ByVal headerData As LogRecordHeader, ByVal recordData As Byte())
Dim timeStampOffset As Integer = LogRecordHeader.STANDARD_GUID_LENGTH + 1
Dim siteUrlOffset As Integer = timeStampOffset + LogRecordHeader.STANDARD_TIMESTAMP_LENGTH + 1
Dim webOffset As Integer = siteUrlOffset + headerData.SiteUrlLength + 1
Dim documentOffset As Integer = webOffset + headerData.WebLength + 1
Dim userNameOffset As Integer = documentOffset + headerData.DocLength + 1
Dim queryStringOffset As Integer = userNameOffset + headerData.UserNameLength + 1
Dim referralOffset As Integer = queryStringOffset + headerData.QueryStringLength + 1
Dim userAgentOffset As Integer = referralOffset + headerData.ReferralLength + 1
Dim commandOffset As Integer = userAgentOffset + headerData.UserAgentLength + LogRecordHeader.STANDARD_GUID_LENGTH + 2

Dim enc As Encoding = UTF8Encoding.[Default]

_siteGuid = enc.GetString(recordData, 0, LogRecordHeader.STANDARD_GUID_LENGTH)
_timeStamp = enc.GetString(recordData, timeStampOffset, LogRecordHeader.STANDARD_TIMESTAMP_LENGTH)
_siteUrl = enc.GetString(recordData, siteUrlOffset, headerData.SiteUrlLength)
_web = enc.GetString(recordData, webOffset, headerData.WebLength)
_document = enc.GetString(recordData, documentOffset, headerData.DocLength)
_userName = enc.GetString(recordData, userNameOffset, headerData.UserNameLength)
_queryString = enc.GetString(recordData, queryStringOffset, headerData.QueryStringLength)
_referral = enc.GetString(recordData, referralOffset, headerData.ReferralLength)
_userAgent = enc.GetString(recordData, userAgentOffset, headerData.UserAgentLength)
_command = enc.GetString(recordData, commandOffset, headerData.CommandLength)
_status = headerData.HttpStatus
_bytesSent = headerData.BytesSent
End Sub
End Class
End Namespace

No comments: