Network Monitoring and Management

Cacti, Nagios and Smokeping Ticket Creation with Request Tracker
----------------------------------------------------------------

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user, not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific prompts (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

At this point in the week you should have Cacti, Nagios and Smokeping
installed on your PCs. These exercises show you how to configure each
of these programs to send alerts to the RT (Request Tracker) ticketing
system so that they generate tickets.


Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the sysadm user.

1. Verify that you have configured rt-mailgate to work with your MTA
---------------------------------------------------------------------

Open the file /etc/aliases:

    $ sudo editor /etc/aliases

In the file /etc/aliases you should have the following two lines:

    net-comment: "|/usr/bin/rt-mailgate --queue net --action comment --url http://localhost/rt/"
    net: "|/usr/bin/rt-mailgate --queue net --action correspond --url http://localhost/rt/"

If these lines are not in /etc/aliases, add them. When you are done, save
the file and exit. Then tell the MTA (Mail Transfer Agent) that there are
new aliases to be used:

    $ sudo newaliases
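
To confirm that both aliases are present without reading through the whole
file, a small check like the one below can help. It is a sketch: the function
name is hypothetical, and it only looks for lines that start with "net:" or
"net-comment:" and pipe into /usr/bin/rt-mailgate, as in the lines above.

```shell
# Hypothetical helper: report whether an aliases file pipes both the
# "net" and "net-comment" addresses into /usr/bin/rt-mailgate.
# Usage: check_rt_aliases [/path/to/aliases]   (defaults to /etc/aliases)
check_rt_aliases() {
    file=${1:-/etc/aliases}
    status=0
    for name in net net-comment; do
        # Alias name at the start of a line, then a quoted pipe to rt-mailgate.
        if grep -q "^${name}:[[:space:]]*\"[|]/usr/bin/rt-mailgate" "$file"; then
            echo "OK: ${name}"
        else
            echo "MISSING: ${name}"
            status=1
        fi
    done
    return $status
}
```

For example, "check_rt_aliases /etc/aliases" prints "OK: net" and
"OK: net-comment" when both lines are in place, and returns a non-zero
status if either one is missing.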


2. Configure Smokeping
----------------------

In the file /etc/smokeping/config.d/Alerts you can tell Smokeping where
alerts should be sent. Edit the file:

    $ sudo vi /etc/smokeping/config.d/Alerts

Update the top of the file so that it reads:

    *** Alerts ***
    to = net@localhost
    from = smokealert@localhost

At the end of the file, add another alert like this:

    +anydelay
    type = rtt
    # in milliseconds
    pattern = >1
    comment = Just for testing

Be sure that all text is flush left in the file.

Now save the file and exit.
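
Because the entries must be flush left, a quick way to catch stray
indentation after a copy and paste is to search the file for lines that
begin with whitespace. The helper below is a hypothetical convenience, not
part of Smokeping; it prints any such lines together with their line numbers.

```shell
# Hypothetical helper: list lines that start with whitespace followed by text,
# since the Smokeping entries in this exercise must be flush left.
# Prints "line-number:text" for each offending line; returns 1 if any exist.
find_indented_lines() {
    if grep -n '^[[:space:]]\{1,\}[^[:space:]]' "$1"; then
        return 1
    fi
    return 0
}
```

For example, "find_indented_lines /etc/smokeping/config.d/Alerts" prints
nothing when every line is flush left.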

Notice the pattern in this alert. It means that an alert will be triggered
as soon as a sample measurement shows "ANY" delay, that is, more than one
millisecond. This is just for testing. In reality, you will want to create
alerts based on your observed baseline: for example, an alert for when your
DNS servers' delay suddenly goes from under 10 ms to over 100 ms.
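
A baseline-based alert for the DNS example above might look like the sketch
below. It assumes Smokeping's pattern syntax, in which a comma-separated list
of comparisons is matched against the most recent samples, with the rightmost
entry being the newest. The alert name "dnsjump" and the exact thresholds are
illustrative only; tune them to what you actually observe.

```
+dnsjump
type = rtt
# in milliseconds: three samples under 10 ms, then three over 100 ms
pattern = <10,<10,<10,>100,>100,>100
comment = DNS latency jumped from under 10 ms to over 100 ms
```

To use it, list the alert name on a target's alerts line, for example
"alerts = anydelay,dnsjump".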

Next, be sure you have this test alert defined for some of your Targets.
You can turn on alerts either by defining them for a probe in the
/etc/smokeping/config.d/Probes file, or for individual entries in the
Targets file.

In our case, let's edit the Targets file and turn on alerts for our
DNS Latency checks:

    $ sudo vi /etc/smokeping/config.d/Targets

Find (or add if necessary) the following section in the file:

    +DNS
    probe = DNS
    ...

Now let's add an entry for a global DNS server that responds recursively:

    ++GoogleA
    menu = 8.8.8.8
    title = DNS Latency for google-public-dns-a.google.com
    host = google-public-dns-a.google.com
    alerts = anydelay

Notice the line that says "alerts = anydelay".

In summary, you should have the following section near the bottom of your
Targets file:

    +DNS
    probe = DNS
    menu = DNS Latency
    title = DNS Latency Probes

    ++GoogleA
    menu = 8.8.8.8
    title = DNS Latency for google-public-dns-a.google.com
    host = google-public-dns-a.google.com
    alerts = anydelay

(Items should be flush left in the file.)

Save and exit from the file, then restart Smokeping:

    $ sudo service smokeping restart

Now check RT to see if you have received anything from Smokeping. It may
take up to 5 minutes for a new ticket to appear.
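
If no ticket appears, it can help to test the mail-to-RT path on its own,
independently of Smokeping. The sketch below sends a hand-made message to
net@localhost using /usr/bin/mail, the same mail program the Nagios commands
later in this exercise use. The function name and the MAILER override are
hypothetical conveniences, not part of RT.

```shell
# Hypothetical helper: send a test message to the RT "net" queue alias.
# MAILER defaults to /usr/bin/mail; point it at another command to dry-run.
# Usage: send_test_ticket [recipient]   (defaults to net@localhost)
send_test_ticket() {
    recipient=${1:-net@localhost}
    subject="RT mailgate test from $(uname -n)"
    printf '%s\n' "This is a test message to check that mail to ${recipient} creates an RT ticket." |
        "${MAILER:-/usr/bin/mail}" -s "$subject" "$recipient"
}
```

If "send_test_ticket" produces a ticket in the net queue but Smokeping does
not, the problem is most likely in the Smokeping configuration rather than
in the MTA or rt-mailgate.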

NOTE: If you have not already configured DNS latency checks in Smokeping,
you may need to edit the file /etc/smokeping/config.d/Probes and add an
entry for DNS like this:

    $ sudo vi /etc/smokeping/config.d/Probes

At the bottom of the file add:

    + DNS
    binary = /usr/bin/dig
    pings = 5
    step = 180
    lookup = www.nsrc.org

Save and exit from the file and restart Smokeping:

    $ sudo service smokeping restart
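
The DNS probe measures latency with dig. To get a rough idea of the values
Smokeping will record, you can run a similar lookup by hand (for example
"dig @8.8.8.8 www.nsrc.org") and read the ";; Query time:" line that dig
prints. The hypothetical helper below extracts that value, in milliseconds,
from dig output on standard input.

```shell
# Hypothetical helper: print the "Query time" (in msec) reported by dig.
# Usage: dig @8.8.8.8 www.nsrc.org | dig_query_time
dig_query_time() {
    # dig prints a summary line such as ";; Query time: 23 msec";
    # the number is the fourth whitespace-separated field.
    awk '/^;; Query time:/ { print $4 }'
}
```

Running it a few times gives you a baseline for choosing thresholds like the
"under 10 ms / over 100 ms" example above.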


3. Nagios and Request Tracker Ticket Creation
----------------------------------------------

Configuring RT and Nagios so that alerts from Nagios automatically
create tickets requires a few steps:

* Create a proper contact entry for Nagios in
  /etc/nagios3/conf.d/contacts_nagios2.cfg

* Create the proper commands in Nagios to use the rt-mailgate
  interface. The commands are defined in /etc/nagios3/commands.cfg

The next two items should already be done if you have finished the
RT exercises:

* Install the rt-mailgate software and configure it properly
  in the /etc/aliases file for the MTA in use.

* Configure the appropriate queues in RT to receive emails
  passed to it from Nagios via the rt-mailgate software.


4. Configure a Contact in Nagios
--------------------------------

- Edit the file /etc/nagios3/conf.d/contacts_nagios2.cfg

    $ sudo bash
    # vi /etc/nagios3/conf.d/contacts_nagios2.cfg

- In this file we will first add a new contact below the default
  "root" contact entry. The new contact should look like this:

    define contact{
        contact_name                    net
        alias                           RT Alert Queue
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    c
        host_notification_options       d
        service_notification_commands   notify-service-ticket-by-email
        host_notification_commands      notify-host-ticket-by-email
        email                           net@localhost
    }

- _DO NOT_ remove the "root" contact entry! This new entry goes
  below the "root" contact.

- A service_notification_options value of "c" means Nagios notifies only
  once a service is considered "critical" (i.e. down). A
  host_notification_options value of "d" means the host is down. By
  specifying only "c" and "d", notifications will not be sent for any
  other states.

- Note the email address in use, "net@localhost". This is important, as
  this is the address you previously defined for RT in /etc/aliases.

- Now we must create a contact group that contains this contact.
  We will call this group "tickets". Add this at the end of the file:

    define contactgroup{
        contactgroup_name       tickets
        alias                   email to ticket system for RT
        members                 net,root
    }

- You could leave off "root" as a member, but we've left it in so that
  another user receives the email, which helps with troubleshooting if
  there are issues.

- Now that your contact has been created, you need to create the commands
  that were referenced in the contact definition above:
  "notify-service-ticket-by-email" and "notify-host-ticket-by-email".


5. Update Nagios Commands
-------------------------

- To create the notify-service-ticket-by-email and notify-host-ticket-by-email
  commands we need to edit the file /etc/nagios3/commands.cfg:

    # vi /etc/nagios3/commands.cfg

- This file already contains two command definitions that we are using,
  called notify-host-by-email and notify-service-by-email. We are going to
  add two new commands.

- We _strongly_ suggest that you COPY and PASTE the text below. It is almost
  impossible to type it without errors.

- Put these two new entries _BELOW_ the current notify-host-by-email and
  notify-service-by-email command entries. Do not remove the old ones.

- NOTE: The "command_line" entries below do not contain line breaks; each is
  a single line. Be aware of this, as COPY and PASTE between some editors and
  environments may insert line breaks.

    ################################################################
    # Additional commands created for network management workshop #
    ################################################################

    # 'notify-host-ticket-by-email' command definition
    define command{
        command_name    notify-host-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
    }

    # 'notify-service-ticket-by-email' command definition
    define command{
        command_name    notify-service-ticket-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }
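
To see roughly what RT will receive, you can approximate the host
notification by hand. Nagios substitutes the macros ($NOTIFICATIONTYPE$,
$HOSTNAME$, $HOSTSTATE$, and so on) itself at notification time; the
hypothetical helper below takes sample values as arguments instead, runs the
same "%b" format string, and prints to the terminal rather than piping into
/usr/bin/mail. RT uses the mail subject as the ticket subject.

```shell
# Hypothetical helper: preview the mail that notify-host-ticket-by-email
# would send, with Nagios macros replaced by sample values.
# Usage: preview_host_ticket TYPE HOST STATE ADDRESS OUTPUT DATETIME
preview_host_ticket() {
    # Subject, built like the "mail -s" argument in the command definition
    printf 'Subject: ** %s Host Alert: %s is %s **\n\n' "$1" "$2" "$3"
    # Body: same "%b" format string as the command, macros swapped for $1..$6
    printf "%b" "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n\nDate/Time: $6\n"
}
```

For example:

    $ preview_host_ticket PROBLEM s1 DOWN 10.10.0.241 "host unreachable" "$(date)"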


6. Choose a Service to Monitor with RT Tickets
----------------------------------------------

- The final step is to tell Nagios that you wish to notify the contact group
  "tickets" for a particular service. If you look in
  /etc/nagios3/conf.d/generic-service_nagios2.cfg you will see that the
  default contact_groups is "admins". To override this for a service, edit
  the file /etc/nagios3/conf.d/services_nagios2.cfg and add a contact_groups
  entry to one of the service definitions.

- To generate tickets in RT when HTTP goes down on a box, you would edit the
  HTTP service check so that it looks like this:

    # check that web services are running
    define service {
        hostgroup_name                  http-servers
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        notification_interval           0 ; set > 0 if you want to be renotified
        contact_groups                  tickets
    }

  Note the additional item that we now have, "contact_groups". You can do
  this for other entries as well if you wish.

- When you are done, save the file and exit.

- Now restart Nagios to verify your changes are correct:

    # /etc/init.d/nagios3 stop
    # /etc/init.d/nagios3 start
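
Nagios can also check its configuration without restarting, using its "-v"
option. A common pattern is to verify first and restart only if the check
passes. The sketch below assumes the Ubuntu/Debian nagios3 package layout
(binary /usr/sbin/nagios3, main file /etc/nagios3/nagios.cfg); the function
name and the NAGIOS_BIN/NAGIOS_CFG overrides are hypothetical.

```shell
# Hypothetical helper: verify the Nagios configuration before restarting.
# Paths assume the Ubuntu/Debian nagios3 package; override them if yours differ.
verify_nagios_config() {
    bin=${NAGIOS_BIN:-/usr/sbin/nagios3}
    cfg=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}
    # "-v" makes Nagios read and check the configuration, then exit.
    if "$bin" -v "$cfg" > /dev/null 2>&1; then
        echo "Nagios configuration OK"
    else
        echo "Nagios configuration has errors; run: $bin -v $cfg" >&2
        return 1
    fi
}
```

For example, as root: "verify_nagios_config && /etc/init.d/nagios3 restart".
If the check fails, run the suggested "-v" command to see which file and line
Nagios is complaining about.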


7. Generate RT Tickets for Hosts
--------------------------------

- To do this you must either specify "contact_groups tickets" in individual
  host definitions, or update the template used by all hosts and change its
  default contact_groups entry to "tickets". This template is in the file
  /etc/nagios3/conf.d/generic-host_nagios2.cfg.

- This step is optional. If you do it, tickets will be generated whenever a
  host goes down and you have specified the contact_groups for that host as
  "tickets".
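
For a single host, the per-host override is one extra line in its
definition. The fragment below is a sketch based on the s1 host entry used
in this exercise, with a contact_groups line added; adapt the host name,
address, and parent to match your own file.

```
define host {
    use             generic-host
    host_name       s1
    alias           s1
    address         10.10.0.241
    parents         sw
    contact_groups  tickets
}
```

Because contact_groups is set in the host definition itself, it overrides
the "admins" default inherited from generic-host for this host only.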

8. See Nagios Tickets in RT
---------------------------

To verify that your changes work, monitor HTTP on one of our servers that is
not running a web server. Let's pick the second Mac Mini in our class, the
box known as "s1.ws.nsrc.org" (see the network diagram for details).

If you do not have an entry for this machine, add one to the file where your
PCs are defined. If this is in a file called pcs.cfg you would do:

    # vi /etc/nagios3/conf.d/pcs.cfg

In this file add (or verify you have) an entry that looks like this:

    define host {
        use             generic-host
        host_name       s1
        alias           s1
        address         10.10.0.241
        parents         sw
    }

Save and exit from the file.

Now edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg and add s1 to
the hostgroup for HTTP service checks:

    # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

Look for the "hostgroup_name http-servers" entry and update it so that it
looks like this:

    # A list of your web servers
    define hostgroup {
        hostgroup_name  http-servers
        alias           HTTP servers
        members         localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,pc19,pc20,pc21,pc22,pc23,pc24,pc25,pc26,pc28,pc29,pc30,pc31,pc32,pc35,pc37,pc39,s1
    }

_REMEMBER_ that the "members" line must not contain any line breaks. Notice
that "s1" has been added at the end of the line.

Now save the file, exit, and restart Nagios:

    # service nagios3 stop
    # service nagios3 start


- It will take a while (up to 10 minutes) for Nagios to report that HTTP is
  "critical", but once that happens a new ticket generated by Nagios should
  appear in the net queue of your RT instance.

- To see it, go to http://pcX.ws.nsrc.org/rt/ and log in as username
  "sysadmin" with the password you chose when you created the RT sysadmin
  account. The new ticket should appear in the "10 newest unowned tickets"
  box on the main RT page you see after logging in.
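
If no ticket shows up, the MTA's log can tell you whether the message was
ever handed to rt-mailgate. The sketch below assumes Postfix logging to
/var/log/mail.log, where delivery through a pipe alias is logged with the
text "delivered to command:"; other MTAs log differently, and the function
name is hypothetical.

```shell
# Hypothetical helper: show the most recent deliveries to rt-mailgate.
# Assumes Postfix-style log lines containing
# "delivered to command: /usr/bin/rt-mailgate".
# Usage: mailgate_deliveries [/var/log/mail.log] [count]
mailgate_deliveries() {
    log=${1:-/var/log/mail.log}
    count=${2:-5}
    grep 'delivered to command: /usr/bin/rt-mailgate' "$log" | tail -n "$count"
}
```

If Nagios sent the notification but nothing appears here, look in the same
log for the message to net@localhost to see whether it bounced or is still
queued.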

9. Configure Cacti to send emails to net@localhost to generate tickets in RT
----------------------------------------------------------------------------

If you have not installed the Plugin Architecture for Cacti, be sure to
attempt this exercise last.

You can see how this works by logging in to the Cacti instance running on
the noc box, as it has the Cacti Plugin Architecture installed along with
two plugins called "Settings" and "Threshold".

To see how Cacti can generate a ticket, first go to:

    http://noc.ws.nsrc.org/cacti/

Log in as "admin" (system password). Then do:

    * Click on the Console tab (upper-left)
    * Click on "Settings" (lower-left)
    * Click on the "Mail / DNS" tab (upper-right)
    * Verify that the fields for email are properly filled in:
        - Test Email (sysadm@localhost or net@localhost)
        - Mail Services (PHP Mail() Function)
        - From Email Address (cacti@localhost)
        - From Name (Cacti System Monitor)
        - SMTP Hostname (localhost)
        - SMTP Port (25)

Now we need to create a threshold that we'll use to trigger an email that,
in turn, will create a ticket in RT:

    * Click on "Thresholds" (middle-left)
    * Click on the "Add" option (upper-right)
    * Select a Host (localhost, for example)
    * Select a Graph (Processes)
    * Select the Data Source (proc)
    * Click on the "create" button

Now you will be presented with a detailed screen where you can specify what
should happen if the threshold is reached. Verify or do the following:

    * Threshold Name: Something Descriptive
    * Verify that "Threshold Enabled" is checked
    * Threshold Type: High / Low Values (for Processes)
    * High Threshold: 50 (a typical server runs more than 50 processes, so
      this will cause the threshold to trip)
    * Breach Duration: 5 minutes (this should give us a ticket in 5 to 10 minutes)
    * Data Type: Exact Value
    * Re-Alert Cycle: Never
    * Extra Alert Emails: net@localhost,sysadm@localhost

This will send an email to net@localhost within 5 or 10 minutes, which will
create a new ticket in RT. In addition, an email will go to sysadm@localhost.
You can view this email as the sysadm user by doing:

    $ mutt -f /var/mail/sysadm
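
If you just want a quick look at which alerts have arrived without opening
mutt, you can list the Subject headers in the mailbox. This is a rough
sketch: it assumes a traditional mbox file such as /var/mail/sysadm and
simply matches lines that begin with "Subject:", so a message body line
starting the same way would also be listed. The function name is
hypothetical.

```shell
# Hypothetical helper: list the Subject lines of messages in an mbox file.
# Rough heuristic: it matches any line beginning with "Subject:".
# Usage: mbox_subjects [/var/mail/username]   (defaults to your own mailbox)
mbox_subjects() {
    grep '^Subject:' "${1:-/var/mail/$(id -un)}"
}
```

For example, as the sysadm user, "mbox_subjects /var/mail/sysadm" lists one
line per message, including any Cacti threshold alerts.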

You can create all kinds of threshold conditions that can be tripped, each
of which will result in ticket creation. Feel free to experiment with the
Cacti instance on the NOC to create new thresholds. You can see whether they
are working by logging in to the NOC instance of Request Tracker (RT) at:

    http://noc.ws.nsrc.org/rt/

Log in with username "sysadm" and the class password.


+-----+
Last update 2jun2011
Hervey Allen