Event Log Monitoring

Hi All,

Thought I would ask here before looking on the internet,

Are there any specific event log IDs that are considered best practice to monitor?

At the moment my monitoring policy contains just critical events, but I was wondering what everyone else is monitoring on this front.

Thanks,

Critical events:

Event - Level Critical Error Equal

Security:

Event - ID 642 Equal
Event - ID 4738 Equal
Event - ID 624 Equal
Event - ID 4720 Equal
Event - ID 644 Equal
Event - ID 4740 Equal

642 or 4738 - password reset (user account changed), 624 or 4720 - new user account, 644 or 4740 - account locked out after multiple wrong password entries
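As a rough illustration of the list above, the ID pairs can be kept in one small lookup table so that the older and newer IDs resolve to the same meaning. This is only a sketch: `SECURITY_EVENTS` and `describe_event` are names made up for this example, and the descriptions paraphrase the post rather than quoting Windows.

```python
# Sketch only: names are hypothetical, descriptions paraphrase the post above.
# For these three events, the newer ID is the older ID + 4096.
LEGACY_OFFSET = 4096

SECURITY_EVENTS = {
    4738: "user account changed (e.g. password reset)",
    4720: "new user account created",
    4740: "account locked out after repeated wrong passwords",
}


def describe_event(event_id):
    """Return a description for a monitored ID, or None if it is not monitored."""
    if event_id in SECURITY_EVENTS:
        return SECURITY_EVENTS[event_id]
    # Older systems log the same events under the ID minus 4096.
    return SECURITY_EVENTS.get(event_id + LEGACY_OFFSET)
```

With this, `describe_event(642)` and `describe_event(4738)` return the same text, so a single rule can cover both generations of Windows.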

Performance:

CPU Usage More than 75% for 60 min
RAM Usage More than 90% for 60 min
Network Usage More than 5% for 50 min

Network is set to 5% because most customers have slow internet connections, while the monitor measures usage against the local network speed.
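The "more than X% for N min" conditions above can be read as a sustained-threshold check. The sketch below assumes one usage sample per minute and a made-up function name, `sustained_breach`. How the monitoring product actually evaluates these conditions is not stated in this thread.

```python
def sustained_breach(samples, threshold, minutes):
    """Return True if the last `minutes` samples all exceed `threshold`.

    `samples` is a list of usage percentages, one per minute, oldest first
    (the one-per-minute sampling rate is an assumption of this sketch).
    """
    if len(samples) < minutes:
        return False  # not enough history yet to decide
    return all(value > threshold for value in samples[-minutes:])
```

For example, `sustained_breach(cpu_samples, 75, 60)` would mirror the CPU condition. A single dip below the threshold inside the window keeps it from firing, which is what filters out short spikes.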

HDD:

Free Space Left on System Drive Less than 7%
Event - ID 51 Equal
Event - ID 50 Equal
Event - ID 55 Equal

Event 51 sometimes means the HDD has bad sectors, so an HDD surface scan is needed.
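For the free-space condition in the list above, a stand-alone check could look roughly like this. It is a sketch that assumes Python 3.3 or later (for `shutil.disk_usage`). The function names and the default of 7% are only illustrative, and the built-in monitor may calculate the value differently.

```python
import shutil


def free_percent(total_bytes, free_bytes):
    """Free space as a percentage of the total drive size."""
    return 100.0 * free_bytes / total_bytes


def system_drive_low(path, limit_percent=7.0):
    """Return True if the drive holding `path` has less free space than the limit."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return free_percent(usage.total, usage.free) < limit_percent
```

On a Windows machine the call would typically be `system_drive_low("C:\\")`, assuming C: is the system drive.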

“Looking forward” :slight_smile: and waiting for SMART monitoring.

Thanks @Sergey that is a fantastic list :smiley:

How about yours? :slight_smile:

My list is pretty basic, which is the main reason I asked the question:

Space less than 5% on System Drive
And Critical Error events

I get critical errors every day… And what should I do with them all? :slight_smile:

I am starting to think about free disk space monitoring and waiting for SMART sensors.

I have a number of devices reporting false positives for critical events: when I check the computers, there aren’t any in the logs. I have an open support job, but I am moving, so one of the affected computers that I normally have access to is not really accessible at the moment. But yes, the SMART sensor would be really good to have. I have been looking at using a third-party tool and a script for checking SMART status.

Take a look at the SIV tool. It can gather all the information about a PC and produce a formatted text log file. I used it with Zabbix earlier.

Thanks, I have found the website and might have a look at it.
I have also started going through some Event ID websites to look for codes that are worth monitoring. There doesn’t seem to be an easy-to-find set of best practices.

The disk error event is mostly not useful. When a disk error occurs, only in 3-5% of cases does it really mean a disk surface error. I don’t understand what Windows means when it registers this error.

WAITING FOR FULLY FUNCTIONAL HDD SMART MONITORING FOR THE ITSM MODULE

Hi @Sergey

I’m also waiting for this, but in the meantime, with the new custom Python script monitoring option, I’m using this SMART script, which runs every 5 minutes. Check whether it helps you.

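import os

# alert() is not defined in this script; it is assumed to be supplied
# by the custom monitoring script template.
command = "wmic diskdrive get Caption, Status"
out = os.popen(command).read()
if "Pred Fail" in out:
    alert(1)  # at least one drive predicts failure: raise the alert
    print(out)
else:
    alert(0)  # no drive reports "Pred Fail": no alert
    print(out)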
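If it helps to see which drive is affected, the same `wmic` output can be split into caption and status pairs. The sketch below assumes that `wmic` prints fixed-width columns with `Status` as the last one, which is worth confirming on a real machine. The function names are made up for this example. It slices by column position because the status `Pred Fail` itself contains a space.

```python
def parse_wmic_table(text):
    """Split `wmic diskdrive get Caption, Status` output into (caption, status) pairs.

    Assumes fixed-width columns with "Status" as the last column.
    Raises ValueError if the header has no "Status" column.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    status_col = lines[0].index("Status")  # offset where the Status column starts
    drives = []
    for line in lines[1:]:
        caption = line[:status_col].strip()
        status = line[status_col:].strip()
        drives.append((caption, status))
    return drives


def failing_drives(text):
    """Return the captions of all drives whose status is not "OK"."""
    return [caption for caption, status in parse_wmic_table(text) if status != "OK"]
```

Passing the `out` string from the script above to `failing_drives(out)` would then give a list of drive names to include in the alert message.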

Hello @Sergey ,

You can also use the script below to get the SMART status of the disk drive and raise an error if the status is not OK:

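import os

command = "wmic diskdrive get Status"
out = os.popen(command).readlines()

# Print the raw output so it shows up in the logs
for j in out:
    print(j)

# Skip the "Status" header and blank lines, then check every drive,
# ignoring the padding spaces and line endings that wmic adds
statuses = [line.strip() for line in out[1:] if line.strip()]
if any(status != 'OK' for status in statuses):
    raise Exception('Disk SMART status changed from "OK"')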

Please tell us if this answers your question or if you need further assistance.

You can also refer to this link: ITarian Forum - ITarian Forum

20170309-Get-SMART–status-of-disk-drive-and-Raise-error-if-status-not-OK.json (602 Bytes)

@Marveltec Thank you for sharing your new custom python script monitoring option. We appreciate your help.

Hi @Cristina

Just checking: will that script work as it is with the new custom script monitoring? I mean, is there no need to add alert(1) or anything else?

Hello @Marveltec ,

You still need to add an alert if you want to be notified. If no alert is set, the result of this script can only be viewed in Logs.
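One possible way to combine the status check with an alert is sketched below. It assumes the platform supplies an `alert` function that takes 1 or 0, as in the first script in this thread. Here `alert` is passed in as a parameter so the logic can be tried without the platform, and both function names are hypothetical.

```python
def smart_alert_code(lines):
    """Return 1 if any drive status is not "OK", otherwise 0.

    `lines` is the output of `wmic diskdrive get Status`, header line first.
    """
    statuses = [line.strip() for line in lines[1:] if line.strip()]
    return 1 if any(status != "OK" for status in statuses) else 0


def run_check(lines, alert):
    """Compute the alert code, pass it to `alert`, and return it.

    `alert` stands in for whatever alert function the platform provides.
    """
    code = smart_alert_code(lines)
    alert(code)
    return code
```

Inside a monitoring script this would be called as `run_check(os.popen(command).readlines(), alert)`.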