NetVigil Contact Info for Devices in Server Farm
The Network Operations Center (NOC) needs this information for every machine in the CIT server farm.
Administrative Contact (System Owner):
- Is responsible for application-level filesystems and application tests.
- Should be familiar with all applications that are running on the device.
- Will be contacted by the CIT Systems Administration group or the NOC regarding any hardware problems, security issues or maintenance scheduling.
- Is responsible for providing an up-to-date on-call list for the application responsibilities described under "Technical Contact (On-Call)."
- Will ultimately be responsible for the device if attempts to contact other personnel using on-call lists fail.
Technical Contact (On-Call):
- Is responsible for being familiar with all customer-owned applications running on this device.
- Will be contacted when a problem is reported to the NOC or the systems administrators with any application on the device that is not covered by a NetVigil application test or is not in alarm in NetVigil.
System Administrators:
The CIT system administrators of a device are responsible for system-level filesystems and for other hardware-related tests on the device, such as CPU utilization, network connectivity/utilization, and memory utilization.
NOC Response Hours:
The system owner designates the hours during which the NOC is expected to respond to critical NetVigil alarms, or to reports from campus of service outages, pertaining to this device.
- If this device houses a critical service, the NOC Response Hours should be designated as 24 x 7 (24 hours a day, 7 days a week).
- If this device houses a service that is driven by Cornell business hours, the NOC Response Hours may be designated as 8 am - 5 pm, Monday through Friday.
During the designated NOC Response Hours, the Network Operations Center (NOC) will contact the appropriate on-call list for any NetVigil alarm that goes into a critical state. If an alarm goes into a critical state outside the designated hours, the NOC will suppress the test (thereby acknowledging that it is in a critical state) and, if the alarm is still critical at the beginning of the next designated response hours, will contact the appropriate on-call list at that time.
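
The response rule above can be sketched in a few lines of Python. This is only an illustration of the decision the NOC makes, not how NetVigil itself is configured. The names ResponseHours, noc_action, CONTACT_NOW and SUPPRESS_AND_DEFER are hypothetical, and treating 5 pm as still inside business hours is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time

# Hypothetical labels for the two outcomes described in the text.
CONTACT_NOW = "contact on-call list"
SUPPRESS_AND_DEFER = "suppress test; contact on-call list at next response hours if still critical"


@dataclass(frozen=True)
class ResponseHours:
    """Hypothetical model of a device's designated NOC Response Hours."""
    days: frozenset  # weekday numbers, Monday = 0 ... Sunday = 6
    start: time      # first covered moment of each day
    end: time        # last covered moment of each day (inclusive, an assumption)

    def covers(self, moment: datetime) -> bool:
        """Return True if the moment falls inside the designated hours."""
        return moment.weekday() in self.days and self.start <= moment.time() <= self.end


# The two designations mentioned in the text.
TWENTY_FOUR_BY_SEVEN = ResponseHours(frozenset(range(7)), time.min, time.max)
BUSINESS_HOURS = ResponseHours(frozenset(range(5)), time(8, 0), time(17, 0))


def noc_action(critical_at: datetime, hours: ResponseHours) -> str:
    """Decide what the NOC does when an alarm turns critical at critical_at."""
    if hours.covers(critical_at):
        return CONTACT_NOW
    return SUPPRESS_AND_DEFER
```

For example, an alarm that turns critical on a Saturday morning for a device with business-hours coverage would be suppressed and followed up on Monday at 8 am if it is still critical, while the same alarm on a 24 x 7 device would lead to an immediate call.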
Description:
A description of the overall purpose of the device and any applications that are housed on the device.
Impact of System or Service Outages:
Specify what impact this device or service has for the Cornell campus. Will an outage or problem with the device or applications running on this device affect a campus-wide audience, a particular department, just CIT, or will it have no effect?
Announcement Text:
Provide the NOC with detailed information about your service or application, worded so that it can be used when posting notifications if the device or service becomes unavailable or requires maintenance.
- If your device or service has a university-wide impact, the NOC posts notifications to the net-announce-l mailing list and Network Status Page.
- If your device or service has a CIT-only impact, the NOC posts notifications to the cit-all-l mailing list.
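
The two posting rules above amount to a small lookup from impact level to announcement channels. The sketch below is illustrative only. The function name notification_channels and the impact labels "university-wide" and "cit-only" are hypothetical, and because the text does not say where other impact levels are announced, the function returns an empty list for them.

```python
from typing import Dict, List

# Channels named in the text, keyed by hypothetical impact labels.
CHANNELS_BY_IMPACT: Dict[str, List[str]] = {
    "university-wide": ["net-announce-l mailing list", "Network Status Page"],
    "cit-only": ["cit-all-l mailing list"],
}


def notification_channels(impact: str) -> List[str]:
    """Return where the NOC posts notifications for the given impact level.

    Impact levels not covered by the text yield an empty list (an assumption).
    """
    return list(CHANNELS_BY_IMPACT.get(impact, []))
```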
