Registry Operations Curriculum

Nagios Installation and Configuration

Notes:
------
* Commands preceded with "$" imply that you should execute the command as
  a general user - not as root.
* Commands preceded with "#" imply that you should be working as root.
* Commands with more specific command lines (e.g. "RTR-GW>" or "mysql>")
  imply that you are executing commands on remote equipment, or within
  another program.

Exercises
---------

Exercises Part I
----------------

0. Log in to your PC or open a terminal window as the tldadmin user.

1. Install Nagios version 3. Do this as root, or as the tldadmin user
   with the "sudo" command:

        # apt-get install nagios3

   Nagios version 3 is already installed, but you can still run the
   command.

2. Create the Web user password file:

        # htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
        New password:
        Re-type new password:

   We suggest you use your standard user password used in class.

3. You should already have a working Nagios!

   - Open a browser, and go to http://localhost/nagios3/
   - At the login prompt, log in as:

        user: nagiosadmin
        pass: (the password you set in step 2)

4. Let's look at the interface together...
        # cd /etc/nagios3/
        # ls -l
        -rw-r--r-- 1 root root    1882 2008-12-18 13:42 apache2.conf
        -rw-r--r-- 1 root root   10524 2008-12-18 13:44 cgi.cfg
        -rw-r--r-- 1 root root    2429 2008-12-18 13:44 commands.cfg
        drwxr-xr-x 2 root root    4096 2009-02-14 12:33 conf.d
        -rw-r--r-- 1 root root      26 2009-02-14 12:36 htpasswd.users
        -rw-r--r-- 1 root root   42539 2008-12-18 13:44 nagios.cfg
        -rw-r----- 1 root nagios  1293 2008-12-18 13:42 resource.cfg
        drwxr-xr-x 2 root root    4096 2009-02-14 12:32 stylesheets

        # ls -l conf.d/
        -rw-r--r-- 1 root root 1695 2008-12-18 13:42 contacts_nagios2.cfg
        -rw-r--r-- 1 root root  418 2008-12-18 13:42 extinfo_nagios2.cfg
        -rw-r--r-- 1 root root 1152 2008-12-18 13:42 generic-host_nagios2.cfg
        -rw-r--r-- 1 root root 1803 2008-12-18 13:42 generic-service_nagios2.cfg
        -rw-r--r-- 1 root root  210 2009-02-14 12:33 host-gateway_nagios3.cfg
        -rw-r--r-- 1 root root  976 2008-12-18 13:42 hostgroups_nagios2.cfg
        -rw-r--r-- 1 root root 2167 2008-12-18 13:42 localhost_nagios2.cfg
        -rw-r--r-- 1 root root 1005 2008-12-18 13:42 services_nagios2.cfg
        -rw-r--r-- 1 root root 1609 2008-12-18 13:42 timeperiods_nagios2.cfg

   Notice that the files in conf.d have not been renamed for Nagios 3 -
   they carry the same names as in the Nagios version 2 Ubuntu package.
   The one exception is the host-gateway configuration file, which was
   updated and has therefore been renamed.


PART II
Configuring Equipment
-----------------------------------------------------------------------------

0. Order of configuration

   Conceptually we will build our configuration files from the "top" of
   our network down. That is, we define entries for our gateway router and
   switch first, then our group routers and switches. Once we have these
   entries we will add an entry for our NOC machine, then pc1, pc2, pc3,
   etc. By going in this order you will always have defined the devices
   that act as parents before you define the devices that depend on them.

1. Let's configure Nagios to start monitoring our classroom gateway router:

        # cd /etc/nagios3/conf.d/
        # vi routers.cfg

        define host {
            use         generic-host
            host_name   bb-gw
            alias       cctld border router
            address     192.168.17.2
        }

   Now define entries for our two group routers:

        define host {
            use         generic-host
            host_name   pc1-pc9-gw
            alias       cctld group 1 router
            address     192.168.5.129
        }

        define host {
            use         generic-host
            host_name   pc10-pc18-gw
            alias       cctld group 2 router
            address     192.168.5.161
        }

   Save and exit from the file /etc/nagios3/conf.d/routers.cfg

2. Configure our classroom switches

   Now that we have our routers configured we can configure our switches.
   Note that each switch has the router next to it as its parent.

        # vi switches.cfg

   First the switch on our classroom backbone:

        define host {
            use         generic-host
            host_name   bb-sw
            alias       cctld backbone switch
            address     192.168.17.4
            parents     bb-gw
        }

   Notice the "parents" entry. You can only add the parent entry once you
   have a definition for "bb-gw". We did this in our routers.cfg file, so
   this will work.

   Now add in the switches for the two groups:

        define host {
            use         generic-host
            host_name   pc1-pc9-sw
            alias       cctld group 1 switch
            address     192.168.5.130
            parents     pc1-pc9-gw
        }

        define host {
            use         generic-host
            host_name   pc10-pc18-sw
            alias       cctld group 2 switch
            address     192.168.5.162
            parents     pc10-pc18-gw
        }

   Save and exit from the file switches.cfg

3. Update the file routers.cfg with parents

   The border router does not have a parent for purposes of our class. In
   reality it does, but you have to stop your monitoring somewhere.

   Our two group routers, however, now have a parent defined. This is the
   backbone switch. We need to update our group router entries to look
   like this:

        define host {
            use         generic-host
            host_name   pc1-pc9-gw
            alias       cctld group 1 router
            address     192.168.5.129
            parents     bb-sw
        }

        define host {
            use         generic-host
            host_name   pc10-pc18-gw
            alias       cctld group 2 router
            address     192.168.5.161
            parents     bb-sw
        }

   Save and exit from the file routers.cfg

4. Create entries for each PC in the classroom

   Now that we have our routers and switches defined it is quite easy to
   create entries for all our PCs. Think about the parent relationships.
   The parent of the NOC is the backbone switch. The parent of pc1 through
   pc9 is the switch for group 1. The parent of pc10 through pc18 is the
   switch for group 2.

   Below are three sample entries: one for the NOC, one for pc1 and one
   for pc10. You should be able to use these examples to create entries
   for all 18 classroom PCs plus the NOC.

   We could put these entries into separate files, but as our network is
   small we'll use a single file called pcs.cfg.

   NOTE! You do not add an entry for your own PC. It has already been
   defined in the file /etc/nagios3/conf.d/localhost_nagios2.cfg. This
   definition is what defines the Nagios network viewpoint. So, when you
   come to the spot where you might add an entry for your PC, skip it and
   go on to the next PC in the list (i.e. from pc1 to pc18).

        # vi pcs.cfg

        # Our classroom NOC

        define host {
            use         generic-host
            host_name   noc
            alias       aroc cctld NOC machine
            address     192.168.17.5
            parents     bb-sw
        }

        # Group 1 PCs

        define host {
            use         generic-host
            host_name   pc1
            alias       pc1 group 1 aroc cctld
            address     192.168.5.131
            parents     pc1-pc9-sw
        }

        # Group 2 PCs

        define host {
            use         generic-host
            host_name   pc10
            alias       pc10 group 2 aroc cctld
            address     192.168.5.170
            parents     pc10-pc18-sw
        }

   Take the three entries above and expand them to create the remaining
   entries for pc1-pc9 and for pc10-pc18. If you have any questions about
   IP addresses, etc. you can review the Network Diagram for the class
   linked off the classroom wiki main page at http://localhost/trac/.

   Save and exit from the file pcs.cfg

   Now let's verify that our initial Nagios configuration is working:

5. Verify that your configuration files are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

   ... You should get:

        Total Warnings: 0
        Total Errors:   0

        Things look okay - No serious problems were detected during the check.

6. Reload/Restart Nagios

        # /etc/init.d/nagios3 restart

   The "restart" option is not always 100% reliable due to a bug in the
   Nagios init script. To be sure, you may want to get used to doing:

        # /etc/init.d/nagios3 stop
        # /etc/init.d/nagios3 start

7. Go to the web interface (http://localhost/nagios3) and check that the
   hosts you just added are now visible in the interface. Click on the
   "Host Detail" item on the left of the Nagios screen to see this.


PART III
Configure a Service Check for the Classroom NOC
-----------------------------------------------------------------------------

0. Configuring

   Now that we have our hardware configured we can start telling Nagios
   what services to monitor on the configured hardware, how to group the
   hardware in interesting ways, how to group services, etc.

1. Associate a service check with our classroom NOC

        # vi hostgroups_nagios2.cfg

   - Find the hostgroup named "ssh-servers". In the members section of the
     definition change the line:

        members localhost

     to

        members localhost,noc

   Save and exit from the file.

   Verify that your changes are OK:

        # nagios3 -v /etc/nagios3/nagios.cfg

   Restart Nagios to see the new service association with your host:

        # /etc/init.d/nagios3 restart

   Click on the "Service Detail" link in the Nagios web interface to see
   your new entry.


PART IV
Defining Services for all PCs
-----------------------------------------------------------------------------

1. Determine what services to define for what devices

   - This is core to how you use Nagios and network monitoring tools in
     general. So far we are simply using ping to verify that physical
     hosts are up on our network and we have started monitoring a single
     service on a single host (your PC). The next step is to decide what
     services you wish to monitor for each host in the classroom.
   - In this particular class we have:

        routers:  running ssh and snmp
        switches: running telnet and possibly ssh as well as snmp
        pcs:      All PCs are running ssh and http and should be running snmp
                  The NOC is currently running an snmp daemon

     So, let's configure Nagios to check for these services on these
     devices.

2. Verify that SSH is running on the routers and workshop PCs

   - In the file services_nagios2.cfg there is already an entry for the
     SSH service check, so you do not need to create one. Instead, you
     simply need to re-define the "ssh-servers" entry in the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg. The initial entry in the
     file looked like:

        # A list of your ssh-accessible servers
        define hostgroup {
            hostgroup_name  ssh-servers
            alias           SSH servers
            members         localhost
        }

     What do you think you should change? Correct, the "members" line. You
     should add entries for all the classroom PCs, the routers and the
     switches that run ssh. With this information and the network diagram
     you should be able to complete this entry.

     The entry will look something like this:

        define hostgroup {
            hostgroup_name  ssh-servers
            alias           SSH servers
            members         localhost,pc1,pc2,pc3,pc4....,bb-gw,pc1-pc9-gw,pc10-pc18-gw
        }

     Every name in "members" must match a host_name you have already
     defined (for example in routers.cfg, switches.cfg or pcs.cfg).

     Note: leave in "localhost" - This is your PC and represents Nagios'
     network point of view. So, for instance, if you are on "pc3" you
     would not include "pc3" in the list of all the classroom PCs as it is
     represented by the "localhost" entry.

     The "members" entry will be a long line and will likely wrap on the
     screen.

     Remember to include all your PCs.

   - Once you are done, run the pre-flight check:

        # nagios3 -v /etc/nagios3/nagios.cfg

     If everything looks good, then restart Nagios

        # /etc/init.d/nagios3 stop
        # /etc/init.d/nagios3 start

     and view your changes in the Nagios web interface.

3. Check that http is running on all the classroom PCs.

   - This is almost identical to the previous exercise. Just make the
     change to the HTTP service, adding in each PC (no routers or
     switches).
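The "members" lists in these hostgroup exercises are long and easy to mistype. As an optional shortcut, the list can be generated with a small shell loop and pasted into hostgroups_nagios2.cfg. This is only a sketch: it assumes your hosts are named pc1 through pc18 as in pcs.cfg, and MY_PC is a hypothetical variable (not part of the class files) holding the number of the PC you are sitting at.

```shell
#!/bin/sh
# Build a single-line, comma-separated "members" value for a hostgroup.
# Assumptions: hosts are named pc1..pc18 as in pcs.cfg; MY_PC is a
# made-up variable for this sketch, set to the number of your own PC.
MY_PC=3          # example: you are on pc3, which Nagios knows as "localhost"

members="localhost"
i=1
while [ "$i" -le 18 ]; do
    # Skip your own PC: it is already covered by the "localhost" entry.
    if [ "$i" -ne "$MY_PC" ]; then
        members="$members,pc$i"
    fi
    i=$((i + 1))
done

# Print the line exactly as it should appear in the hostgroup definition.
echo "members $members"
```

The output is one unbroken line, so it can be pasted as-is; add routers, switches or the noc by hand where an exercise calls for them.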
     Remember, you don't need to add your machine as it is already defined
     as "localhost".


PART V
Create More Host Groups
-----------------------------------------------------------------------------

1. Update /etc/nagios3/conf.d/hostgroups_nagios2.cfg

   - For the following exercises it will be very useful if we have created
     or updated the following hostgroups:

        debian-servers
        routers
        switches

     If you edit the file /etc/nagios3/conf.d/hostgroups_nagios2.cfg you
     will see an entry for debian-servers that just contains localhost.
     Update this entry to include all the classroom PCs, including the noc
     (this assumes that you created a "noc" entry in your pcs.cfg file).
     Remember to skip your PC entry as it is represented by the localhost
     entry.

        # vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg

     Update the entry that says:

        # A list of your Debian GNU/Linux servers
        define hostgroup {
            hostgroup_name  debian-servers
            alias           Debian GNU/Linux Servers
            members         localhost
        }

     So that the "members" parameter contains something like this. Use
     your classroom network diagram to confirm the exact number of
     machines and names in your workshop.

        members  localhost,pc1,pc2,pc3,pc4,pc5,pc6,pc7,pc8,pc9,pc10,pc11,pc12,pc13,pc14,pc15,pc16,pc17,pc18,noc

     Be sure that this is a single line in the file. It may wrap on your
     screen, but it must not be split across two separate lines. Otherwise
     you will get an error when you go to restart Nagios.

   - Once you have done this, add in two more entries, one for routers
     and one for switches. Call these entries "routers" and "switches".

   - When you are done be sure to verify your work and restart Nagios.


PART VI
Extended Host Information ("making your graphs pretty")
-----------------------------------------------------------------------------

1. Update extinfo_nagios2.cfg

   - If you would like to use appropriate icons for your defined hosts in
     Nagios, this is where you do it.
     We have three types of devices:

        Cisco routers
        Cisco switches
        Ubuntu servers

     There is a fairly large repository of icon images available for you
     to use located here:

        /usr/share/nagios/htdocs/images/logos/

     These were installed by default as dependencies of the nagios3
     package in Ubuntu. In some cases you can find model-specific icons
     for your hardware, but to make things simpler we will use the
     following icons for our hardware:

        /usr/share/nagios/htdocs/images/logos/base/debian.*
        /usr/share/nagios/htdocs/images/logos/cook/router.*
        /usr/share/nagios/htdocs/images/logos/cook/switch.*

   - The next step is to edit the file
     /etc/nagios3/conf.d/extinfo_nagios2.cfg and tell Nagios what image
     you would like to use to represent your devices.

        # vi /etc/nagios3/conf.d/extinfo_nagios2.cfg

     Here is what an entry for your routers looks like (there is already
     an entry for debian-servers that will work as is). Note that the
     router model (3600) is not all that important. The image used
     represents a router in general.

        define hostextinfo {
            hostgroup_name   routers
            icon_image       cook/router.png
            icon_image_alt   Cisco Routers (3600)
            vrml_image       router.png
            statusmap_image  cook/router.gd2
        }

     Now add an entry for your switches. Once you are done check your work
     and restart Nagios. Take a look at the Status Map in the web
     interface. It should be much nicer.


PART VII
Create Service Groups
-----------------------------------------------------------------------------

1. Create service groups for ssh and http for each set of PCs.

   - The idea here is to create one service group per group of PCs. Each
     service group will be for the group of PCs that are connected to
     each router xxxxxxx, yyyyyy, zzzzzz, etc. We want to see these PCs
     grouped together, including the status of their ssh and http
     services.
     To do this, create and edit the file:

        # vi /etc/nagios3/conf.d/servicegroups.cfg

     Here is a sample of the service group for group 1:

        define servicegroup {
            servicegroup_name  group1-servers
            alias              group 1 servers
            members            pc1,SSH,pc1,HTTP,pc2,SSH,pc2,HTTP,pc3,SSH,pc3,HTTP,pc4,SSH,pc4,HTTP,pc5,SSH,pc5,HTTP,pc6,SSH,pc6,HTTP,pc7,SSH,pc7,HTTP,pc8,SSH,pc8,HTTP,pc9,SSH,pc9,HTTP
        }

   - Note that the members value is a single line in the file. It may
     wrap on your screen, but it must not be split across two lines.

   - Note that "SSH" and "HTTP" need to be uppercase as this is how the
     service_description is written in the file
     /etc/nagios3/conf.d/services_nagios2.cfg

   - You should create an entry for the group 2 servers as well.

   - Save your changes, verify your work and restart Nagios. Now if you
     click on the Servicegroup menu items in the Nagios web interface you
     should see this information grouped together.

   - Be sure you do this for TLD1 through TLD8 to create a servicegroup of
     SSH and HTTP servers for all 8 TLDs in the classroom.


PART VIII
Configure Guest Access to the Nagios Web Interface
-----------------------------------------------------------------------------

1. Edit /etc/nagios3/cgi.cfg to give read-only guest user access to the
   Nagios web interface.

   - By default Nagios is configured to give full r/w access via the
     Nagios web interface to the user nagiosadmin. You can change the name
     of this user, add other users, change how you authenticate users,
     what users have access to what resources and more via the cgi.cfg
     file.

   - First, let's create a "guest" user and password in the htpasswd.users
     file.

        # cd /etc/nagios3
        # htpasswd /etc/nagios3/htpasswd.users guest

     You can use any password you want (or none). A password of "guest" is
     not a bad choice.

   - Next, edit the file /etc/nagios3/cgi.cfg and look for what type of
     access has been given to the nagiosadmin user.
     By default you will see the following directives (note, there are
     comments between each directive):

        authorized_for_system_information=nagiosadmin
        authorized_for_configuration_information=nagiosadmin
        authorized_for_system_commands=nagiosadmin
        authorized_for_all_services=nagiosadmin
        authorized_for_all_hosts=nagiosadmin
        authorized_for_all_service_commands=nagiosadmin
        authorized_for_all_host_commands=nagiosadmin

     Now let's tell Nagios to allow the "guest" user some access to
     information via the web interface. You can choose whatever you would
     like, but what is pretty typical is this:

        authorized_for_system_information=nagiosadmin,guest
        authorized_for_configuration_information=nagiosadmin,guest
        authorized_for_system_commands=nagiosadmin
        authorized_for_all_services=nagiosadmin,guest
        authorized_for_all_hosts=nagiosadmin,guest
        authorized_for_all_service_commands=nagiosadmin
        authorized_for_all_host_commands=nagiosadmin

   - Once you make the changes, save the file cgi.cfg, verify your work
     and restart Nagios.

   - To see if you can log in as the "guest" user you may need to clear
     the cookies in your web browser. The web interface will look the
     same as before. The difference is that a number of items that are
     available via the web interface (forcing a service/host check,
     scheduling checks, comments, etc.) will not work for the guest user.


OPTIONAL
--------

1. Check that SNMP is running on the classroom NOC

   - First you will need to add the appropriate service check for SNMP in
     the file /etc/nagios3/conf.d/services_nagios2.cfg. This is where
     Nagios is impressive. There are hundreds, if not thousands, of
     service checks available via the various Nagios sites on the web. You
     can see what plugins are installed by Ubuntu in the nagios3 package
     that we've installed by looking in the following directory:

        # ls /usr/lib/nagios/plugins

     As you'll see there is already a check_snmp plugin available to us.
     If you are interested in the options the plugin takes you can execute
     the plugin from the command line by typing:

        # /usr/lib/nagios/plugins/check_snmp

     to see what options are available, etc. You can use the check_snmp
     plugin and Nagios to create very complex or specific system checks.

   - Now to see all the various service/host checks that have been created
     using the check_snmp plugin you can look in
     /etc/nagios-plugins/config/snmp.cfg. You will see that there are a
     lot of preconfigured checks using snmp, including:

        snmp_load               snmp_cpustats
        snmp_procname           snmp_disk
        snmp_mem                snmp_swap
        snmp_procs              snmp_users
        snmp_mem2               snmp_swap2
        snmp_mem3               snmp_swap3
        snmp_disk2              snmp_tcpopen
        snmp_tcpstats           snmp_bgpstate
        check_netapp_uptime     check_netapp_cpuload
        check_netapp_numdisks   check_compaq_thermalCondition

     And, even better, you can create additional service checks quite
     easily. To verify that snmpd (the SNMP service on Linux) is running
     we need to ask SNMP a question. If we don't get an answer, then
     Nagios can assume that the SNMP service is down on that host. When
     you use service checks such as check_http, check_ssh and check_telnet
     this is what they are doing as well.

   - In our case, let's create a new service check and call it
     "check_system". This service check will connect to the specified
     host, use the private community string we have defined in class and
     ask a question of snmp on that host - in this case we'll ask about
     the System Description, or the OID "sysDescr.0".

   - To do this start by editing the file
     /etc/nagios-plugins/config/snmp.cfg:

        # vi /etc/nagios-plugins/config/snmp.cfg

     At the top (or the bottom, your choice) add the following entry to
     the file:

        # 'check_system' command definition
        define command{
            command_name    check_system
            command_line    /usr/lib/nagios/plugins/check_snmp -H '$HOSTADDRESS$' -C '$ARG1$' -o sysDescr.0
        }

     You may wish to copy and paste this rather than typing it out.

     Note that "command_line" is a single line.
     If you copy and paste in vi the line may not wrap properly and you
     may have to manually add the part:

        '$ARG1$' -o sysDescr.0

     to the end of the line.

   - Now you need to edit the file
     /etc/nagios3/conf.d/services_nagios2.cfg and add this service check.
     Eventually we want to run this check against all our servers in the
     classroom, but for now we attach it to a new hostgroup called
     "snmp-servers", which is created in the next step.

        # vi /etc/nagios3/conf.d/services_nagios2.cfg

     At the bottom of the file add the following definition:

        # check that snmp is up on all servers
        define service {
            hostgroup_name          snmp-servers
            service_description     SNMP
            check_command           check_system!xxxxxx
            use                     generic-service
            notification_interval   0 ; set > 0 if you want to be renotified
        }

     The "xxxxxx" is the community string previously (or to be) defined in
     class.

     Note that we have included our private community string here rather
     than hard-coding it in the snmp.cfg file earlier. You must change the
     "xxxxxx" to the snmp community string given in class or this check
     will not work.

   - Now we must create the "snmp-servers" group in our
     hostgroups_nagios2.cfg file. Edit the file
     /etc/nagios3/conf.d/hostgroups_nagios2.cfg and go to the end of the
     file. Add the following hostgroup definition:

        # A list of snmp-enabled devices on which we wish to run the snmp service check
        define hostgroup {
            hostgroup_name  snmp-servers
            alias           snmp servers
            members         noc
        }

   - Note that for "members" you could also add the switches and routers
     for group 1 and 2. But the particular item (OID) we are checking for,
     "sysDescr.0", may not be available on the switches and/or routers, so
     the check would then fail.

   - Now verify that your changes are correct and restart Nagios.

   - If you click on the Service Detail menu choice in the web interface
     you should see the SNMP check appear.
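If the SNMP service shows up as CRITICAL or UNKNOWN, it can help to run by hand the same query that the check_system command performs. The following is a minimal sketch, assuming the noc address used in pcs.cfg (192.168.17.5); "xxxxxx" is still only the placeholder for the community string given in class, and the variable names are made up for this illustration.

```shell
#!/bin/sh
# Run the check_system query manually against the noc.
# Assumptions: NOC_ADDR is the noc address from pcs.cfg; COMMUNITY is a
# placeholder and must be replaced with the class community string.
PLUGIN=/usr/lib/nagios/plugins/check_snmp
NOC_ADDR=192.168.17.5
COMMUNITY=xxxxxx

# Same options as the command_line in the check_system definition.
cmd="$PLUGIN -H $NOC_ADDR -C $COMMUNITY -o sysDescr.0"
echo "Running: $cmd"

if [ -x "$PLUGIN" ]; then
    $cmd
    echo "plugin exit status: $?"
else
    echo "check_snmp not found at $PLUGIN" >&2
fi
```

Nagios plugins report their result through the exit status: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. If the command works by hand but the service still fails in Nagios, recheck the community string in services_nagios2.cfg.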
   - After we do the SNMP presentation and exercises in class, you can
     come back to this exercise and add all the classroom PCs to the
     members list of the snmp-servers hostgroup definition in the
     hostgroups_nagios2.cfg file. Remember to list your PC as "localhost".


Last update 25 September, 2010 by HA