File exercises-tickets-cacti-nagios-smokeping.txt, 14.6 KB (added by admin, 8 years ago)

Network Monitoring and Management

Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to set up each
of these programs to send alerts to the RT (Request Tracker) ticketing
system to generate tickets.


Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

$ sudo editor /etc/aliases

In the file /etc/aliases you should have the following two lines:

net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
net: "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"

If these lines are not in /etc/aliases, add them. When you are done, save the
file and exit. Then tell the MTA (Mail Transfer Agent) that there are new
aliases to be used:

$ sudo newaliases
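
If you want to double-check these entries, the shell function below reads an aliases file and prints the RT queue and action that an alias passes to rt-mailgate. It is a hypothetical helper written for this exercise, not part of RT or the MTA, and it assumes the one-line format shown above (alias name, colon, then the quoted pipe to rt-mailgate).

```shell
#!/bin/sh
# rt_alias_route ALIAS [FILE]
# Hypothetical helper (not part of RT): print the RT queue and action that
# ALIAS pipes to rt-mailgate, e.g. "queue=net action=correspond".
# Assumes one-line entries such as:
#   net: "|/usr/bin/rt-mailgate --queue net --action correspond --url ..."
rt_alias_route() {
    alias_name=$1
    file=${2:-/etc/aliases}
    awk -v want="$alias_name" '
        # First field must be exactly "ALIAS:" and the line must call rt-mailgate
        $1 == (want ":") && /rt-mailgate/ {
            queue = ""; action = ""
            for (i = 2; i < NF; i++) {
                if ($i == "--queue")  queue  = $(i + 1)
                if ($i == "--action") action = $(i + 1)
            }
            print "queue=" queue " action=" action
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }' "$file"
}
```

With the two lines above in place, rt_alias_route net should print queue=net action=correspond, and rt_alias_route net-comment should print queue=net action=comment. A non-zero exit status means the alias was not found.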


2. Configure Cacti to send emails to net@localhost to generate tickets in RT
----------------------------------------------------------------------------

This is the hardest of the three. If you have not installed the Plugin Architecture for
Cacti, then skip this exercise.

You can see how this works by logging in to the Cacti instance running on the NOC
box, as it has the Cacti Plugin Architecture installed along with the two plugins
"Settings" and "Threshold".

To see how Cacti can generate a ticket, first go to:

http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then do the following:

* Click on the Console tab (upper-left)
* Click on "Settings" (lower-left)
* Click on the "Mail / DNS" tab (upper-right)
* Verify that the fields for email are properly filled in:
  - Test Email (sysadm@localhost or net@localhost)
  - Mail Services (PHP Mail() Function)
  - From Email Address (cacti@localhost)
  - From Name (Cacti System Monitor)
  - SMTP Hostname (localhost)
  - SMTP Port (25)

Now we need to create a threshold that we'll use to trigger an email that, in turn, will
create a ticket in RT:

* Click on "Thresholds" (middle-left)
* Click on the "Add" option (upper-right)
* Select a Host (localhost, for example)
* Select a Graph (Processes)
* Select the Data Source (proc)

Now you will be presented with a detailed screen where you can specify what should
happen if the threshold is reached. Verify or do the following:

* Threshold Name: Something Descriptive
* Threshold Enabled is checked
* Threshold Type: High / Low Values (for Processes)
* High Threshold: 50 (this will cause the threshold to trip)
* Breach Duration: 5 minutes (this will give us a ticket in 5 to 10 minutes)
* Data Type: Exact Value
* Re-Alert Cycle: Never
* Extra Alert Emails: net@localhost,sysadm@localhost

This will send an email to net@localhost within 5 to 10 minutes, which will create a
new ticket in RT. In addition, an email will go to sysadm@localhost. You can view this
email as sysadm by doing:

$ mutt -f /var/mail/sysadm

You can create many types of threshold states that can be tripped, each of which will
result in ticket creation. Feel free to experiment with the Cacti instance on the NOC
to create new thresholds. You can see whether they are working by logging in to the
NOC instance of Request Tracker (RT) at:

http://noc.ws.nsrc.org/rt/

Log in with the username "sysadm" and the class password.
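
The High Threshold of 50 only trips if the machine really runs more than 50 processes. As a rough check from the shell, the sketch below counts processes with ps and compares a value against a high threshold. It is only an approximation of what the Processes graph plots (Cacti's data source may count slightly differently), it assumes a strict "greater than" comparison, and it does not model the Breach Duration. Both function names are hypothetical.

```shell
#!/bin/sh
# process_count: approximate number of processes on this machine.
# "ps -e" lists every process and tail drops the header line. The count
# may include the short-lived ps pipeline itself, so it can be off by a few.
process_count() {
    ps -e | tail -n +2 | wc -l | tr -d ' '
}

# over_high_threshold VALUE HIGH: succeed if VALUE is strictly greater
# than HIGH (assumed comparison for a "High / Low Values" threshold).
over_high_threshold() {
    [ "$1" -gt "$2" ]
}

# Example:
#   n=$(process_count)
#   over_high_threshold "$n" 50 && echo "$n processes: the threshold of 50 would trip"
```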


3. Configure Smokeping
----------------------

In the file:

/etc/smokeping/config.d/Alerts

you can tell Smokeping where alert outputs should go. Edit the file:

$ sudo vi /etc/smokeping/config.d/Alerts

and update the top of the file to be:

*** Alerts ***
to = net@localhost
from = smokealert@localhost

At the end of the file, add another alert like this:

+anydelay
type = rtt
# in milliseconds
pattern = >1
comment = Just for testing

Now save the file and exit.

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement has "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to base
your alerts on your observed baseline: for example, an alert that fires if
your DNS servers' delay suddenly goes from under 10 ms to over 100 ms.
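
A Smokeping alert pattern is a comma-separated list of comparisons matched against the most recent measurements, with the last comparison applied to the newest value. The shell function below is a simplified model of that matching, useful for trying out patterns such as ">1" or a baseline-style "<10,<10,>100" before putting them in the Alerts file. It only understands "<" and ">" with plain numbers, while Smokeping's real pattern syntax supports more, so treat it as an illustration rather than Smokeping's implementation. The function name is hypothetical.

```shell
#!/bin/sh
# pattern_matches PATTERN VALUE...
# Hypothetical helper: simplified model of a Smokeping rtt alert pattern.
# VALUEs are RTTs in milliseconds, oldest first. The last comparison in
# PATTERN is checked against the newest value, the one before it against
# the previous value, and so on. Only "<N" and ">N" are understood.
# Exit status: 0 = match, 1 = no match (or too few values), 2 = bad pattern.
pattern_matches() {
    pattern=$1
    shift
    printf '%s\n' "$@" | awk -v pat="$pattern" '
        NF { v[++count] = $1 + 0 }            # collect non-empty values
        END {
            gsub(/[ \t]/, "", pat)
            n = split(pat, p, ",")
            if (count < n) exit 1              # not enough history yet
            for (i = 1; i <= n; i++) {
                val = v[count - n + i]         # align last test with newest value
                op  = substr(p[i], 1, 1)
                lim = substr(p[i], 2) + 0
                if (op == ">") { if (!(val > lim)) exit 1 }
                else if (op == "<") { if (!(val < lim)) exit 1 }
                else exit 2                    # unsupported comparison
            }
            exit 0
        }'
}
```

For instance, pattern_matches "<10,<10,>100" 5 8 9 150 succeeds, because the last three values (8, 9 and 150) satisfy the three comparisons in order, while pattern_matches ">1" 2.3 0.5 fails because the newest value is below 1 ms.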

Next, be sure you have this test alert defined for some of your Targets.
You can turn on alerts either by defining alerts for a probe in
the /etc/smokeping/config.d/Probes file, or in individual Targets
entries.

In our case, let's edit the Targets file and turn on alerts for our
DNS Latency checks.

$ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

+DNS
probe = DNS
...

And find the entry for our local DNS server:

++LocalDNS1
menu = 10.10.0.241
title = DNS Delay for local DNS Server s1
host = s1

Add the following alerts line after the "host = s1" line:

alerts = anydelay

Save and exit from the file, then restart Smokeping:

$ sudo service smokeping restart

Now check RT to see if you have received anything from Smokeping. It may take up to 5 minutes
for a new ticket to appear.


Note - If you have not already configured the DNS Latency checks for Smokeping, you may need to
edit the file /etc/smokeping/config.d/Probes and add an entry for DNS like this:

$ sudo vi /etc/smokeping/config.d/Probes

At the bottom of the file, add:

+ DNS
binary = /usr/bin/dig
pings = 5
step = 180
lookup = www.nsrc.org

Save and exit from the file and restart Smokeping:

$ sudo service smokeping restart
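
Because the DNS probe uses dig (the binary setting above) to time lookups, it can help to look at typical query times by hand before choosing a baseline-based alert pattern. dig prints a line such as ";; Query time: 3 msec" near the end of its output; the hypothetical helper below extracts that number. The server s1 and the name www.nsrc.org in the commented example are the ones used above; adjust them to your setup.

```shell
#!/bin/sh
# dig_query_time: read dig output on stdin and print the N from its
# ";; Query time: N msec" line (milliseconds). Exit 1 if no such line.
dig_query_time() {
    awk '/^;; Query time:/ { print $4; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Example (needs dig and network access): five lookups against server s1,
# roughly like one round of the probe above with "pings = 5".
#   for i in 1 2 3 4 5; do dig @s1 www.nsrc.org | dig_query_time; done
```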


4. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically
create tickets requires a few steps:

* Create a proper contact entry for Nagios in
  /etc/nagios3/conf.d/contacts_nagios2.cfg

* Create the proper commands in Nagios to use the rt-mailgate
  interface. The commands are defined in /etc/nagios3/commands.cfg

These next two items should already be done in RT if you have
finished the RT exercises:

* Install the rt-mailgate software and configure it properly
  in your /etc/aliases file for the MTA in use.

* Configure the appropriate queues in RT to receive emails
  passed from Nagios via the rt-mailgate software.


5. Configure a Contact in Nagios
---------------------------------

- Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg

# vi /etc/nagios3/conf.d/contacts_nagios2.cfg

- In this file we will first add a new contact under
  the default root contact entry. The new contact should
  look like this:

define contact{
    contact_name                    net
    alias                           RT Alert Queue
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    c
    host_notification_options       d
    service_notification_commands   notify-service-ticket-by-email
    host_notification_commands      notify-host-ticket-by-email
    email                           net@localhost
}

- _DO NOT_ remove the "root" contact_name entry! This new entry goes
  below the "root" contact.

- The service_notification_options value of "c" means Nagios only notifies
  once a service is considered "critical" (i.e. down). The
  host_notification_options value of "d" means down. By specifying only
  "c" and "d", notifications will not be sent for any other states.

- Note the email address in use, "net@localhost". This is important,
  as this address was previously defined for RT.

- Now we must create a Contact Group that contains this contact.
  We will call this group "tickets". Do this at the end of the file:

define contactgroup{
    contactgroup_name   tickets
    alias               email to ticket system for RT
    members             net,root
}

- You could leave off "root" as a member, but we've left it in so
  that another user receives the email, which helps us troubleshoot if
  there are issues.

- Now that your contact has been created, you need to create the commands
  that were referenced in the contact definition above:
  "notify-service-ticket-by-email" and "notify-host-ticket-by-email".
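
The notification options on the net contact act as a filter: each state Nagios reports maps to a one-letter option (for example w, u, c, r for services and d, u, r for hosts), and the contact is notified only if that letter is in its list. The function below is a simplified sketch of that filter for experimenting with option lists. It ignores notification periods, flapping, downtime and the other conditions Nagios also checks, and its name and interface are hypothetical.

```shell
#!/bin/sh
# would_notify KIND OPTIONS STATE
# Hypothetical sketch of the Nagios notification-option filter.
#   KIND    = service | host
#   OPTIONS = comma-separated letters, e.g. "c" or "w,u,c,r"
#   STATE   = e.g. CRITICAL, WARNING, UNKNOWN, OK, DOWN, UNREACHABLE, UP
# Succeeds if the letter for STATE appears in OPTIONS.
would_notify() {
    case "$1:$3" in
        service:WARNING)   letter=w ;;
        service:UNKNOWN)   letter=u ;;
        service:CRITICAL)  letter=c ;;
        service:OK)        letter=r ;;   # recovery notification
        host:DOWN)         letter=d ;;
        host:UNREACHABLE)  letter=u ;;
        host:UP)           letter=r ;;   # recovery notification
        *) return 2 ;;                   # unknown kind/state combination
    esac
    case ",$2," in
        *",$letter,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the net contact's settings, would_notify service c CRITICAL succeeds, while would_notify service c OK fails: with no "r" in the list, the net contact gets no recovery notification when the service comes back.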


6. Update Nagios Commands
-------------------------

- To create the notify-service-ticket-by-email and notify-host-ticket-by-email
  commands, we need to edit the file /etc/nagios3/commands.cfg.

# vi /etc/nagios3/commands.cfg

- In this file you already have two command definitions that we are using, called
  notify-host-by-email and notify-service-by-email. We are going to add two
  new commands.

- We _strongly_ suggest that you COPY and PASTE the text below. It is almost impossible
  to type it without errors.

- Put these two new entries _BELOW_ the current notify-host-by-email and
  notify-service-by-email command entries. Do not remove the old ones.

- NOTE: Each command_line below is a single line with no line breaks. Be aware of
  this, as COPY and PASTE between some editors and environments may insert line breaks.

################################################################
# Additional commands created for network management workshop #
################################################################

# 'notify-host-ticket-by-email' command definition
define command{
    command_name    notify-host-ticket-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}

# 'notify-service-ticket-by-email' command definition
define command{
    command_name    notify-service-ticket-by-email
    command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
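
To see what a ticket email will look like, or to test the mail-to-RT path without waiting for Nagios, you can run essentially the same printf by hand with sample values in place of the Nagios macros. The function below is a hypothetical helper that mirrors the body of notify-host-ticket-by-email; the sample values in the example are made up.

```shell
#!/bin/sh
# host_ticket_body TYPE HOST STATE ADDRESS OUTPUT DATETIME
# Hypothetical helper: print the same body as notify-host-ticket-by-email,
# with the positional arguments standing in for the Nagios macros
# $NOTIFICATIONTYPE$ $HOSTNAME$ $HOSTSTATE$ $HOSTADDRESS$ $HOSTOUTPUT$ $LONGDATETIME$
host_ticket_body() {
    printf '%b' "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n\nDate/Time: $6\n"
}

# Example (sends a test message into the net queue; the values are made up):
#   host_ticket_body PROBLEM s2 DOWN 10.10.0.242 "test only" "$(date)" |
#       mail -s "** PROBLEM Host Alert: s2 is DOWN **" net@localhost
```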


7. Choose a Service to Monitor with RT Tickets
----------------------------------------------

- The final step is to tell Nagios that you wish to notify the contact group "tickets"
  for a particular service. If you look in /etc/nagios3/conf.d/generic-service_nagios2.cfg,
  the default contact_groups is "admins". To override this for a service, edit the file
  /etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups entry to one of the
  service definitions.

- To send email that generates tickets in RT if HTTP goes down on a box, you would edit
  the HTTP service check so that it looks like this:

# check that web services are running
define service {
    hostgroup_name          http-servers
    service_description     HTTP
    check_command           check_http
    use                     generic-service
    notification_interval   0 ; set > 0 if you want to be renotified
    contact_groups          tickets
}

Note the additional item that we now have, "contact_groups". You can do this for other
entries as well if you wish.

- When you are done, save the file and exit.

- Now restart Nagios to verify that your changes are correct.

# /etc/init.d/nagios3 stop
# /etc/init.d/nagios3 start
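
Restarting is a blunt way to check the configuration: if it contains an error, Nagios will fail to start again. Nagios can also verify its configuration without restarting, with nagios3 -v /etc/nagios3/nagios.cfg on this Debian/Ubuntu packaging. The hypothetical helper below runs that check first and restarts only if it passes. The binary path, config path and restart command are assumptions that you can override with the NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD variables.

```shell
#!/bin/sh
# safe_nagios_restart: hypothetical helper. Run the Nagios configuration
# check ("-v") and restart the daemon only if the check succeeds.
safe_nagios_restart() {
    nagios_bin=${NAGIOS_BIN:-/usr/sbin/nagios3}
    nagios_cfg=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
    if "$nagios_bin" -v "$nagios_cfg" >/dev/null 2>&1; then
        echo "Nagios configuration OK, restarting"
        # Left unquoted on purpose so the command string splits into words.
        ${RESTART_CMD:-service nagios3 restart}
    else
        echo "Configuration check FAILED; see: $nagios_bin -v $nagios_cfg" >&2
        return 1
    fi
}

# Example (as root):
#   safe_nagios_restart
```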


8. Generate RT Tickets for Hosts
--------------------------------

- To do this, you must either specify "contact_groups tickets" for individual host
  definitions, or update the template file for all hosts and change the
  default contact_groups entry to tickets. This file is generic-host_nagios2.cfg.

- If you wish to do this, go ahead. Tickets will be generated if a host goes down
  and you have specified the contact_groups for that host as "tickets".

9. See Nagios Tickets in RT
---------------------------

To verify that your changes have worked, we can monitor HTTP on one of our servers
that is not running a web server. Let's pick the second Mac Mini in our class, the
box known as "s2.ws.nsrc.org" (see the network diagram for details).

If you do not have an entry for this machine, add one to the file where your PCs
are defined. If this is a file called pcs.cfg you would do:

# vi /etc/nagios3/conf.d/pcs.cfg

In this file add (or verify you have) an entry that looks like this:

define host {
    use         generic-host
    host_name   s2
    alias       s2
    address     10.10.0.242
    parents     sw
}

Save and exit from the file.

Now edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s2 to the
hostgroup for HTTP service checks:

# vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it looks like this:

# A list of your web servers
define hostgroup {
    hostgroup_name  http-servers
    alias           HTTP servers
    members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,s2
}

_REMEMBER_ that the line with all the "members" must not have any line breaks. Notice that "s2"
has been added at the end of the line.

Now save the file, exit, and restart Nagios:

# service nagios3 stop
# service nagios3 start
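
Because an accidentally wrapped members line is easy to miss, you may want to confirm that s2 really ended up in the http-servers hostgroup. The awk-based helper below is a hypothetical sketch: it finds the named hostgroup in a file and checks whether a host appears on its members line. It assumes the one-line, comma-separated members format shown above (no spaces after the commas). Hosts on a wrapped continuation line are not seen, which is exactly the mistake it is meant to catch.

```shell
#!/bin/sh
# hostgroup_has_member FILE GROUP HOST
# Hypothetical helper: succeed if HOST is listed on the "members" line of the
# "define hostgroup" block whose hostgroup_name is GROUP.
hostgroup_has_member() {
    awk -v grp="$2" -v host="$3" '
        /^[ \t]*define[ \t]+hostgroup/ { inblk = 1; name = ""; members = "" }
        inblk && $1 == "hostgroup_name" { name = $2 }
        inblk && $1 == "members"        { members = $2 }
        inblk && index($0, "}") {
            if (name == grp) {
                n = split(members, m, ",")
                for (i = 1; i <= n; i++)
                    if (m[i] == host) found = 1
            }
            inblk = 0
        }
        END { exit (found ? 0 : 1) }' "$1"
}

# Example:
#   hostgroup_has_member /etc/nagios3/conf.d/hostgroups_nagios2.cfg http-servers s2 && echo "s2 is in http-servers"
```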


- It will take a while (up to 10 minutes) for Nagios to report that HTTP is
  "critical", but once that happens, a new ticket generated by Nagios should
  appear in the net queue of your RT instance.
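
The "up to 10 minutes" follows from how Nagios confirms a problem: it only notifies once the service reaches a HARD state. The first failed check can come up to one check interval after the failure, and the service is then re-checked (max_check_attempts - 1) more times, one retry interval apart. The sketch below does that arithmetic. The values in the example (5-minute checks, 1-minute retries, 4 attempts) are assumed to match the stock generic-service template, so check your own generic-service_nagios2.cfg; check latency and execution time are ignored.

```shell
#!/bin/sh
# worst_case_minutes CHECK_INTERVAL RETRY_INTERVAL MAX_ATTEMPTS
# Hypothetical helper: rough upper bound (in minutes) between a service
# failing and Nagios reaching a HARD state and sending the notification:
# one normal check interval + (MAX_ATTEMPTS - 1) retry intervals.
worst_case_minutes() {
    echo $(( $1 + ($3 - 1) * $2 ))
}

# Example with assumed stock template values: 5-minute checks,
# 1-minute retries, 4 attempts.
worst_case_minutes 5 1 4
```

This prints 8, which fits within the "up to 10 minutes" mentioned above.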

- To see it, go to http://pcX.ws.nsrc.org/rt/ and log in with the username "sysadmin"
  and the password you chose when you created the RT sysadmin account. The new
  ticket should appear in the "10 newest unowned tickets" box on the main page
  that RT shows after you log in.