There are many tools for monitoring performance on Linux.
Here is a list of these tools in a single image, showing which tool monitors what.
Long ago, while working at a previous organization, I had many components, such as services and servers, running in the production environment. I had deployed all the products one by one from scratch, and their number kept increasing. There were PLM servers, a DB server, license management, an internal portal, a container-based virtualization system and many more.
But there was no proper tool to monitor all the components at once. As the count kept increasing, it became difficult to keep an eye on the up/down status of all of them.
So I decided to deploy the Nagios monitoring system in the data center and developed many plugins for it.
I have open-sourced a few of these plugins, thinking they could help other people around the world who may be facing similar challenges.
I posted them on Nagios Exchange four years ago, and they have been a huge success: each has been downloaded more than 50k times, and I have received thanks from many people around the world, which makes me happy.
They can be found here: https://exchange.nagios.org/directory/Owner/divyaimca/1
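For readers new to writing such plugins: a Nagios plugin is just a program that prints a one-line status message and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); text after a `|` is treated as performance data. The Python sketch below is a hypothetical disk-usage check that illustrates this contract. It is not one of the published plugins above, and the default path and the 80%/90% thresholds are assumptions.

```python
import shutil
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn, crit):
    """Map a usage percentage to a Nagios state using >= thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    used_pct = 100.0 * usage.used / usage.total
    code = evaluate(used_pct, warn, crit)
    # Text before '|' is shown to operators; text after it is performance data.
    line = (f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"
            f" | used={used_pct:.1f}%;{warn};{crit}")
    return code, line


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    code, line = check_disk(target)
    print(line)
    sys.exit(code)
```

Nagios only looks at the exit code to decide the state, so the message text can be formatted freely.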
Chef provides a lot of flexibility for infrastructure automation, and I prefer it over the alternatives.
We should design our recipes so that they can be used in any environment without modification, by making maximum use of attributes.
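Chef resolves attributes through its own precedence levels (default, normal, override and so on), so the following is only an illustration of the idea, not Chef's algorithm. The Python sketch deep-merges a set of hypothetical cookbook defaults with hypothetical per-environment overrides, so the "recipe" function never changes while the values it uses do. All attribute names, paths and the `repo.example.com` URL are made up for the example.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where values in `override` win, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical cookbook defaults (the role attributes/default.rb plays in Chef).
DEFAULTS = {
    "oracle": {
        "base": "/u01/app/oracle",
        "sid": "orcl",
        "media_url": "http://repo.example.com/oracle",
    }
}

# Hypothetical per-environment overrides; only the values differ.
ENVIRONMENTS = {
    "dev": {"oracle": {"sid": "devdb"}},
    "prod": {"oracle": {"sid": "proddb", "base": "/data/oracle"}},
}


def recipe(attrs):
    """Stand-in for a recipe: it only reads attributes and hard-codes nothing."""
    ora = attrs["oracle"]
    return f"install {ora['sid']} under {ora['base']} from {ora['media_url']}"


if __name__ == "__main__":
    for env, overrides in ENVIRONMENTS.items():
        print(env, "->", recipe(deep_merge(DEFAULTS, overrides)))
```

Moving to a new environment then means adding an entry with new values, not editing the recipe.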
I was working on a deployment project on the Linux x86-64 platform where I had to automate all the infrastructure components, one of which was Oracle 11g R2 EE. I am sharing the cookbook here in the hope that it helps others. The recipes perform a silent installation of the database using a response file, after pulling the media files from a remote system.
The recipes are also idempotent, so rerunning the cookbook again and again never does any damage. After a successful compile and run of the recipes, they automatically set attributes on the Chef server recording that the DB is installed and running.
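The idempotency described above is a guard pattern: each step checks whether its work has already been done before acting, and the "done" flag is recorded only after the step succeeds. The Python sketch below imitates this with a hypothetical JSON state file standing in for the node attributes on the Chef server. The `Node` class, the `db_installed`/`db_running` flag names and the state file are assumptions for illustration, not part of the cookbook.

```python
import json
import os
import tempfile


class Node:
    """Hypothetical stand-in for node attributes, persisted to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.attrs = {}
        if os.path.exists(path):
            with open(path) as f:
                self.attrs = json.load(f)

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.attrs, f)


def run_step(node, flag, action):
    """Run `action` only if `flag` is not yet set, like a Chef `not_if` guard.

    The flag is saved only after `action` returns without raising, so a
    failed step is simply attempted again on the next run.
    """
    if node.attrs.get(flag):
        return False  # already done: skip without changing anything
    action()
    node.attrs[flag] = True
    node.save()
    return True


def demo(runs=3):
    """Run the same two-step 'cookbook' several times; return the actions taken."""
    calls = []
    state_file = os.path.join(tempfile.mkdtemp(), "node.json")
    for _ in range(runs):
        node = Node(state_file)  # reload state, as a fresh chef-client run would
        run_step(node, "db_installed", lambda: calls.append("install"))
        run_step(node, "db_running", lambda: calls.append("start"))
    return calls


if __name__ == "__main__":
    print(demo())  # each action runs only once despite three runs
```

Setting the flag last is the important design choice: if the install fails halfway, the flag stays unset and the next run retries instead of silently skipping.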
The usernames and passwords are stored in, and pulled from, an encrypted data bag to make the setup more secure.
Here is the cookbook: https://github.com/kumarprd/Ora11gR2-EE-Silent-Install-Chef-Recipe
The recipes perform the steps below in sequence:
NOTE: Create an encrypted data bag with the JSON properties below; the recipes read them at run time.
Any issues or suggestions are welcome.
Docker was designed around one daemon per container, which keeps containers lightweight. For example, to run a web application, one container serves the database, another serves as the web server, and a third serves as a caching server connected to the DB.
So when writing a Dockerfile, the limitation is that only one CMD instruction takes effect, and it runs a single foreground process; when that process exits, the container stops.
But sometimes we need to run more than one daemon in a single container, for example to set up a complete stack in one container.
There are two approaches for this:
Use case: I had to run sshd, httpd and mysqld in a single container, and here is how I approached it with supervisor.
Also, by using supervisor's stdout, we can redirect the service logs to the terminal.
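As a hypothetical sketch of what a supervisor config for this use case can look like (the actual files are in the repo), the Python snippet below builds one with `configparser`. Two settings tie back to the Dockerfile limitation: `nodaemon=true` keeps supervisord in the foreground so it can be the container's single CMD process, and sending each program's output to `/dev/stdout` (with `stdout_logfile_maxbytes=0`, because that stream cannot be rotated) is what makes the logs show up in the terminal. The commands and foreground flags are assumptions for an Oracle Linux 6 image, not the author's files. Setting `stderr_logfile` together with `redirect_stderr=true` is what produces the "filename has been ignored" warnings in the `docker-compose up` output, so the sketch omits it.

```python
import configparser
import io

# Hypothetical programs: each command must stay in the foreground so that
# supervisord, not the service itself, manages the process.
PROGRAMS = {
    "sshd": "/usr/sbin/sshd -D",
    "mysqld": "/usr/bin/mysqld_safe",
    "httpd": "/usr/sbin/httpd -DFOREGROUND",
}


def build_supervisor_conf(programs):
    """Return the text of a minimal supervisord config for a container."""
    conf = configparser.ConfigParser(interpolation=None)
    conf["supervisord"] = {
        "nodaemon": "true",  # stay in the foreground as the container's CMD
        "user": "root",      # explicit user avoids the 'no user in config file' CRIT
    }
    for name, command in programs.items():
        conf[f"program:{name}"] = {
            "command": command,
            "autorestart": "true",
            "redirect_stderr": "true",        # merge stderr into stdout
            "stdout_logfile": "/dev/stdout",  # visible in `docker-compose up`
            "stdout_logfile_maxbytes": "0",   # /dev/stdout cannot be rotated
        }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_supervisor_conf(PROGRAMS))
```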
The three config files used here:
These files can be accessed from my Git repo:
Next, run the commands below:
Step 1 : FROM oraclelinux:6.8
Step 2 : ENV container docker
---> Running in 8cff18dabcc4
Step 20 : CMD /usr/bin/supervisord -c /etc/supervisor.conf
---> Running in 4ffed54b078f
Removing intermediate container 4ffed54b078f
Successfully built dfb974e07bfb
2. docker-compose up
# docker-compose up
Attaching to supervisord_web_1
web_1 | 2016-10-01 05:57:55,357 CRIT Supervisor running as root (no user in config file)
web_1 | 2016-10-01 05:57:55,357 WARN For [program:sshd], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
web_1 | 2016-10-01 05:57:55,357 WARN For [program:mysqld], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
web_1 | 2016-10-01 05:57:55,357 WARN For [program:httpd], redirect_stderr=true but stderr_logfile has also been set to a filename, the filename has been ignored
web_1 | 2016-10-01 05:57:55,364 INFO supervisord started with pid 1
web_1 | 2016-10-01 05:57:56,369 INFO spawned: 'httpd' with pid 7
web_1 | 2016-10-01 05:57:56,373 INFO spawned: 'sshd' with pid 8
web_1 | 2016-10-01 05:57:56,377 INFO spawned: 'mysqld' with pid 9
web_1 | Could not load host key: /etc/ssh/ssh_host_rsa_key
web_1 | Could not load host key: /etc/ssh/ssh_host_dsa_key
web_1 | 161001 05:57:56 mysqld_safe Logging to '/var/log/mysqld.log'.
web_1 | 161001 05:57:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
web_1 | httpd: Could not reliably determine the server's fully qualified domain name, using 172.18.0.2 for ServerName
web_1 | 2016-10-01 05:57:57,649 INFO success: httpd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1 | 2016-10-01 05:57:57,649 INFO success: sshd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
web_1 | 2016-10-01 05:57:57,649 INFO success: mysqld entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
3. Check the container status:
# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
edd870f7e3ca testimg.supervisor "/usr/bin/supervisord" 19 minutes ago Up 19 minutes 0.0.0.0:5002->22/tcp, 0.0.0.0:5000->80/tcp, 0.0.0.0:5001->3306/tcp supervisord_web_1
4. Connect to the container and check the services:
ssh -p 5002 root@<FQDN of the host where docker engine is running>
Last login: Sat Oct 1 06:07:43 2016 from <FQDN>
[root@edd870f7e3ca ~]# /etc/init.d/mysqld status
mysqld (pid 101) is running...
[root@edd870f7e3ca ~]# /etc/init.d/httpd status
httpd (pid 7) is running...
[root@edd870f7e3ca ~]# /etc/init.d/sshd status
openssh-daemon (pid 8) is running...
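To check the published ports from the Docker host without logging in, a plain TCP connect test is enough. The Python sketch below assumes the port mappings from the `docker ps` output above (5000 for httpd, 5001 for mysqld, 5002 for sshd) and that it runs on the Docker host itself. A successful connect only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports published in the `docker ps` output above.
PORTS = {"httpd": 5000, "mysqld": 5001, "sshd": 5002}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def check_services(host="localhost", ports=PORTS):
    """Map each service name to whether its published port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:7s} {'UP' if up else 'DOWN'}")
```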
I recently found that some VMs on one OVS node (out of 30+ nodes) had gone down and would not start, failing with this error:
Xend has probably crashed! Invalid or missing HTTP status code.
There can be many reasons for this, and if you try to restart xend, it will not start.
The first place to look is:
This log will tell you exactly where the issue is.
In my case, the / filesystem was running out of space because one log file had grown to almost 8 GB. After I deleted that file, xend started successfully.
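When a full filesystem is the suspect, a quick confirmation is to check the free space and then list the largest files on that filesystem. The Python sketch below does both. It stays on one filesystem (like `du -x`) by comparing device IDs, and `/var/log` in the usage section is only an assumed starting point for where a runaway log might live.

```python
import heapq
import os
import shutil
import stat


def free_percent(path="/"):
    """Percentage of free space on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def _on_device(path, dev):
    """True if `path` lives on device `dev` (False if it cannot be stat'ed)."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs, largest first.

    Like `du -x`, subdirectories on other filesystems are not descended,
    and symlinks and unreadable entries are skipped.
    """
    root_dev = os.stat(root).st_dev
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not cross into other mounts.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    return heapq.nlargest(top, sizes)


if __name__ == "__main__":
    print(f"/: {free_percent('/'):.1f}% free")
    for size, path in largest_files("/var/log"):
        print(f"{size / 2**30:8.2f} GiB  {path}")
```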
Sometimes we have to read an existing JSON property file and update some values in place.
If we don't use the proper approach, the update may break the JSON structure of the file.
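To keep the keys in their original order, pass OrderedDict from Python's collections module as the object_pairs_hook, so that each JSON object is built from its key/value pairs in file order. (On Python 3.7 and later, a plain dict also preserves insertion order.)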
Here, old_value is replaced with new_value:
"name" : "old_value"
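import json
import os
from collections import OrderedDict

new_value = "new_value"  # the value to write
propJson = os.path.join(os.path.dirname(os.path.abspath(__file__)), "props.json")
if os.path.isfile(propJson):
    with open(propJson, "r+") as f:
        # object_pairs_hook keeps the keys in their original file order
        prop = json.load(f, object_pairs_hook=OrderedDict)
        prop["head"]["name"] = str(new_value)
        f.seek(0)  # rewind, then overwrite from the start
        json.dump(prop, f, default=str, indent=4)
        f.truncate()  # drop leftover bytes if the new JSON is shorter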
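The `r+` approach rewrites the file in place, so a crash between `seek(0)` and `truncate()` can still leave a broken file. A common alternative, sketched below as a hypothetical helper (`update_json_value` is not part of any library), writes the updated JSON to a temporary file in the same directory and swaps it in with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem. It also takes a list of keys so nested values can be updated.

```python
import json
import os
import shutil
import tempfile
from collections import OrderedDict


def update_json_value(path, keys, new_value):
    """Set the value at the nested key path `keys` in the JSON file `path`.

    Key order is preserved via OrderedDict, and the file is replaced
    atomically so readers never see a half-written document.
    """
    with open(path) as f:
        data = json.load(f, object_pairs_hook=OrderedDict)

    target = data
    for key in keys[:-1]:
        target = target[key]  # raises KeyError if the path does not exist
    target[keys[-1]] = new_value

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp, indent=4)
            tmp.write("\n")
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        os.replace(tmp_path, path)  # atomic rename over the original
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Usage for the example above would be `update_json_value("props.json", ["head", "name"], "new_value")`.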