
File exercises-tickets-cacti-nagios-smokeping.txt, 14.6 KB (added by admin, 8 years ago)
Network Monitoring and Management

Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to set up each
of these programs to send alerts to the RT (Request Tracker) ticketing
system to generate tickets.


Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

        $ sudo editor /etc/aliases

The file /etc/aliases should contain the following two lines:


net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
net:        "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"


If these lines are not in /etc/aliases, add them. When you are done, save
the file and exit. Then tell the MTA (Mail Transfer Agent) that there are
new aliases to be used:

        $ sudo newaliases
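If you prefer to script this check, the sketch below appends each alias only when it is missing, so it is safe to run more than once. It is an illustration, not part of the RT package: the function name ensure_rt_aliases is hypothetical, and it assumes the queue name "net" and the URL http://localhost/rt/ used above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RT): add the two rt-mailgate
# aliases used in this exercise to an aliases file, skipping any that exist.
# Assumes queue "net" and RT at http://localhost/rt/, as in the lines above.
ensure_rt_aliases() {
    file="$1"

    # Append the comment alias only if no line starts with "net-comment:"
    if ! grep -q '^net-comment:' "$file"; then
        printf '%s\n' 'net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"' >> "$file"
    fi

    # "^net:" does not match "net-comment:", so the two checks are independent
    if ! grep -q '^net:' "$file"; then
        printf '%s\n' 'net:        "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"' >> "$file"
    fi
}
```

Run it as root against /etc/aliases, then run "sudo newaliases" as above. A quick end-to-end test is to send a message to the queue with the same mail command that the Nagios commands use later in these exercises:

        $ echo "Test ticket from the command line" | mail -s "Test ticket" net@localhost

If rt-mailgate is working, a new ticket should appear in the net queue in RT.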


2. Configure Cacti to send emails to net@localhost to generate tickets in RT
----------------------------------------------------------------------------

This is the hardest of the three. If you have not installed the Plugin Architecture for
Cacti, then skip this exercise.

You can see how this works by logging in to the Cacti instance running on the noc
box, which has the Cacti Plugin Architecture installed along with the two plugins
"Settings" and "Threshold".

To see how Cacti can generate a ticket, first go to:

        http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then do the following:

        * Click on the Console tab (upper-left)
        * Click on "Settings" (lower-left)
        * Click on the "Mail / DNS" tab (upper-right)
        * Verify that the fields for email are properly filled in:
                - Test Email            (sysadm@localhost or net@localhost)
                - Mail Services         (PHP Mail() Function)
                - From Email Address    (cacti@localhost)
                - From Name             (Cacti System Monitor)
                - SMTP Hostname         (localhost)
                - SMTP Port             (25)

Now we need to create a threshold that we'll use to trigger an email that, in turn, will
create a ticket in RT:

        * Click on "Thresholds" (middle-left)
        * Click on the "Add" option (upper-right)
        * Select a Host (localhost, for example)
        * Select a Graph (Processes)
        * Select the Data Source (proc)

Now you will be presented with a detailed screen where you can specify what should
happen if the threshold is reached. Verify or do the following:

        * Threshold Name:       Something Descriptive
        * Threshold Enabled is checked
        * Threshold Type:       High / Low Values (for Processes)
        * High Threshold:       50 (most hosts run more than 50 processes, so this will trip)
        * Breach Duration:      5 minutes (this will give us a ticket in 5 to 10 minutes)
        * Data Type:            Exact Value
        * Re-Alert Cycle:       Never
        * Extra Alert Emails:   net@localhost,sysadm@localhost

Within 5 to 10 minutes this will send an email to net@localhost, which will create a
new ticket in RT. In addition, an email will go to sysadm@localhost. You can view that
email as sysadm by doing:

        $ mutt -f /var/mail/sysadm
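If you would rather not open mutt, a quick non-interactive check is to list the Subject: headers in the mailbox. This is a rough sketch: the function name recent_subjects is hypothetical, and because it simply greps for lines beginning with "Subject:", a message body line that starts the same way would also be shown.

```shell
#!/bin/sh
# Hypothetical helper: print the Subject: lines of the last N messages
# in an mbox-format mailbox such as /var/mail/sysadm (default N = 5).
recent_subjects() {
    mbox="$1"
    count="${2:-5}"
    # Each message header block has a Subject: line; keep only the newest ones
    grep '^Subject:' "$mbox" | tail -n "$count"
}
```

For example:

        $ recent_subjects /var/mail/sysadm 3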

You can create many types of thresholds that, when tripped, will result in
ticket creation. Feel free to experiment with the Cacti instance on the noc box
to create new thresholds. You can check whether they are working by logging in to
the noc box's instance of Request Tracker (RT) at:

        http://noc.ws.nsrc.org/rt/

Log in with the username "sysadm" and the class password.


3. Configure Smokeping
----------------------

In the file:

        /etc/smokeping/config.d/Alerts

you can tell Smokeping where alerts should be sent. Edit the file:

        $ sudo vi /etc/smokeping/config.d/Alerts

Update the top of the file to be:

        *** Alerts ***
        to = net@localhost
        from = smokealert@localhost

At the end of the file, add another alert like this:

        +anydelay
        type = rtt
        # in milliseconds
        pattern = >1
        comment = Just for testing

Now save the file and exit.

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement has "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to create
alerts based on your observed baseline; for example, an alert that fires if
your DNS servers' delay suddenly goes from under 10 ms to over 100 ms.
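As an illustration of such a baseline alert, the fragment below follows the pattern syntax described in the Smokeping documentation, where each comma-separated condition is compared with one of the most recent samples, oldest first, so the last condition applies to the newest sample. The alert name "dnsjump" and the thresholds are only examples; adjust them to the baseline you actually observe before relying on it.

        +dnsjump
        type = rtt
        # in milliseconds: three samples under 10 ms, then three over 100 ms
        pattern = <10,<10,<10,>100,>100,>100
        comment = DNS delay jumped from under 10 ms to over 100 ms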

Next, make sure this test alert is enabled for some of your Targets.
You can turn on alerts either for a probe in the
/etc/smokeping/config.d/Probes file, or for individual entries in
the Targets file.

In our case, let's edit the Targets file and turn on alerts for our
DNS latency checks.

        $ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

        +DNS
        probe = DNS
                ...

Then find the entry for our local DNS server:

        ++LocalDNS1
        menu = 10.10.0.241
        title = DNS Delay for local DNS Server s1
        host = s1

Add the following alerts line after the "host = s1" line:

        alerts = anydelay
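Alternatively, because Smokeping normally passes variables set in a Targets section down to the subsections beneath it, setting alerts on the +DNS section should enable the test alert for every DNS target in that section at once. A sketch, assuming the section layout shown above:

        +DNS
        probe = DNS
        alerts = anydelay
                ...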

Save and exit the file, then restart Smokeping:

        $ sudo service smokeping restart

Now check RT to see if you have received anything from Smokeping. It may take up to 5 minutes
for a new ticket to appear.


Note: if you have not already configured the DNS latency checks for Smokeping, you may need to
edit the file /etc/smokeping/config.d/Probes and add an entry for DNS like this:

        $ sudo vi /etc/smokeping/config.d/Probes

At the bottom of the file, add:

        + DNS
        binary = /usr/bin/dig
        pings = 5
        step = 180
        lookup = www.nsrc.org

Save and exit the file, then restart Smokeping:

        $ sudo service smokeping restart
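Before waiting on Smokeping, you can check by hand that the DNS server answers and see roughly what delay to expect; this also gives you a first baseline for alert patterns. dig prints a ";; Query time: N msec" line in its statistics section, and the hypothetical helper below extracts that number. It assumes dig's default output format and that 10.10.0.241 (the menu address of LocalDNS1 above) is the server you want to test.

```shell
#!/bin/sh
# Hypothetical helper: read dig output on stdin and print the query time
# in milliseconds, taken from dig's ";; Query time: N msec" statistics line.
dig_query_time() {
    sed -n 's/^;; Query time: \([0-9][0-9]*\) msec.*$/\1/p'
}
```

For example:

        $ dig @10.10.0.241 www.nsrc.org | dig_query_time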


4. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically
create tickets requires a few steps:

* Create a proper contact entry for Nagios in
  /etc/nagios3/conf.d/contacts_nagios2.cfg

* Create the proper commands in Nagios to send email to the rt-mailgate
  interface. The commands are defined in /etc/nagios3/commands.cfg

The next two items should already be done if you have finished
the RT exercises.

* Install the rt-mailgate software and configure it in the
  /etc/aliases file of the MTA you are using.

* Configure the appropriate queues in RT to receive emails
  passed to them from Nagios via the rt-mailgate software.


5. Configure a Contact in Nagios
---------------------------------

   - Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg

   # vi /etc/nagios3/conf.d/contacts_nagios2.cfg

   - In this file we will first add a new contact name under
     the default root contact entry. The new contact should
     look like this:

define contact{
        contact_name                    net
        alias                           RT Alert Queue
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c
        host_notification_options       d
        service_notification_commands   notify-service-ticket-by-email
        host_notification_commands      notify-host-ticket-by-email
        email                           net@localhost
        }

   - _DO NOT_ remove the "root" contact_name entry! The new entry goes
     below the "root" contact.

   - The service_notification_options value of "c" means notify only once a
     service is considered "critical" by Nagios (i.e. down). The
     host_notification_options value of "d" means down. Because only "c"
     and "d" are specified, notifications will not be sent for other
     states, or for recoveries.

   - Note the email address in use, "net@localhost". This is important,
     as this address was previously defined in /etc/aliases for RT.

   - Now we must create a Contact Group that contains this contact.
     We will call this group "tickets". Do this at the end of the file:

define contactgroup{
        contactgroup_name       tickets
        alias                   email to ticket system for RT
        members                 net,root
        }

   - You could leave "root" out as a member, but we have kept it so that
     another user receives the email, which helps with troubleshooting if
     there are issues.

   - Now that your contact has been created, you need to create the commands
     referenced in the contact definition above: "notify-service-ticket-by-email"
     and "notify-host-ticket-by-email".


6. Update Nagios Commands
-------------------------

   - To create the notify-service-ticket-by-email and notify-host-ticket-by-email
     commands, we need to edit the file /etc/nagios3/commands.cfg.

   # vi /etc/nagios3/commands.cfg

   - This file already contains two command definitions that we are using,
     notify-host-by-email and notify-service-by-email. We are going to add two
     new commands.

   - We _strongly_ suggest that you COPY and PASTE the text below. It is almost
     impossible to type it without errors.

   - Put these two new entries _BELOW_ the current notify-host-by-email and
     notify-service-by-email command entries. Do not remove the old ones.

   - NOTE: Each command_line below is a single line with no line breaks. Be aware of
     this, as COPY and PASTE between some editors and environments may insert line breaks.

################################################################
# Additional commands created for network management workshop #
################################################################

# 'notify-host-ticket-by-email' command definition
define command{
        command_name    notify-host-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
        }

# 'notify-service-ticket-by-email' command definition
define command{
        command_name    notify-service-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
        }
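To test the mail path without waiting for a real outage, you can run the same printf | mail pipeline by hand. The sketch below rebuilds the body of notify-host-ticket-by-email, with ordinary shell arguments standing in for the Nagios macros ($NOTIFICATIONTYPE$, $HOSTNAME$, $HOSTSTATE$, $HOSTADDRESS$, $HOSTOUTPUT$, $LONGDATETIME$). The function name host_ticket_body and the sample values are hypothetical; Nagios itself does not use this function.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the notify-host-ticket-by-email message body.
# Arguments replace the Nagios macros, in this order:
#   $1 NOTIFICATIONTYPE  $2 HOSTNAME    $3 HOSTSTATE
#   $4 HOSTADDRESS       $5 HOSTOUTPUT  $6 LONGDATETIME
host_ticket_body() {
    # %b expands the \n sequences, just as in the command_line above
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n\nDate/Time: $6\n"
}
```

For example, to open a test ticket that looks like one Nagios would create:

        $ host_ticket_body PROBLEM s2 DOWN 10.10.0.242 "Manual test" "$(date)" | \
              mail -s "** PROBLEM Host Alert: s2 is DOWN **" net@localhost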


7. Choose a Service to Monitor with RT Tickets
----------------------------------------------


   - The final step is to tell Nagios that you wish to notify the contact group "tickets"
     for a particular service. If you look in /etc/nagios3/conf.d/generic-service_nagios2.cfg,
     the default contact_groups is "admins". To override this for a service, edit the file
     /etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups entry to one of the
     service definitions.

   - To send email that generates tickets in RT if HTTP goes down on a box, edit the
     HTTP service check so that it looks like this:

# check that web services are running
define service {
        hostgroup_name                  http-servers
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
        contact_groups                  tickets
}

     Note the additional item, "contact_groups". You can do this for other
     entries as well if you wish.

   - When you are done, save the file and exit.

   - Now restart Nagios to verify that your changes are correct.

   # /etc/init.d/nagios3 stop
   # /etc/init.d/nagios3 start


8. Generate RT Tickets for Hosts
--------------------------------

   - To do this, you must either specify "contact_groups tickets" in individual host
     definitions, or update the template for all hosts and change its default
     contact_groups entry to tickets. This template is in the file
     /etc/nagios3/conf.d/generic-host_nagios2.cfg.

   - If you wish to do this, go ahead. Tickets will be generated if a host goes down
     and you have specified "tickets" as the contact_groups for that host.
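For example, to have tickets opened only when s2 goes down, without changing the template for every host, you could add a contact_groups line to that one host definition. This sketch reuses the s2 entry from the next section and assumes the "tickets" contact group defined earlier:

define host {
    use             generic-host
    host_name       s2
    alias           s2
    address         10.10.0.242
    parents         sw
    contact_groups  tickets
}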

9. See Nagios Tickets in RT
---------------------------

To verify that your changes have worked, we can monitor HTTP on one of our
servers that is not running a web server. Let's pick the second Mac Mini in our
class, the box known as "s2.ws.nsrc.org" (see the network diagram for details).

If you do not have an entry for this machine, add one to the file where your PCs
are defined. If this is a file called pcs.cfg, you would do:

        # vi /etc/nagios3/conf.d/pcs.cfg

In this file, add (or verify that you have) an entry that looks like this:

define host {
    use         generic-host
    host_name   s2
    alias       s2
    address     10.10.0.242
    parents     sw
}

Save and exit the file.

Now edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s2 to the hostgroup
for HTTP service checks:

        # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it looks like this:


# A list of your web servers
define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,s2
        }


_REMEMBER_ that the "members" line must not contain any line breaks. Notice that "s2"
has been added at the end of the line.
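Because the members list is long and must stay on one line, it can be easier to generate it than to type it. The hypothetical helper below prints localhost, then pc1 through pcN, then s2, as a single comma-separated line. It assumes your PCs are numbered consecutively; if some numbers are missing in your class (the list above skips pc27), edit the output by hand.

```shell
#!/bin/sh
# Hypothetical helper: print "localhost,pc1,...,pcN,s2" on a single line,
# suitable for pasting after "members" in the http-servers hostgroup.
http_members() {
    n="$1"
    list="localhost"
    i=1
    while [ "$i" -le "$n" ]; do
        list="$list,pc$i"
        i=$((i + 1))
    done
    printf '%s,s2\n' "$list"
}
```

For example:

        $ http_members 28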

Now save the file, exit, and restart Nagios:

        # service nagios3 stop
        # service nagios3 start


   - It will take a while (up to 10 minutes) for Nagios to report that HTTP is
     "critical", but once that happens, a new ticket generated by Nagios should
     appear in the net queue of your RT instance.

   - To see it, go to http://pcX.ws.nsrc.org/rt/ and log in with the username "sysadmin"
     and the password you chose when you created the RT sysadmin account. The new
     ticket should appear in the "10 newest unowned tickets" box on the main page
     you see after logging in to RT.