The hardware
In my Homelab: Highly resilient “datacenter-in-two-boxes” with CentOS 7 and Ceph jewel article, I described how to build a low-power homelab.
With this hardware and a bunch of low-power disks (2.5" 5400 rpm), you can build a low-power virtualized storage system with Ceph and store all your data with top-level NAS software.
The software
CentOS 7.3 (1611) x86-64 “minimal”
Ceph “jewel” x86-64
Puppet (configuration management software)
Topology
Number of MONs
It’s recommended to install at least 3 MONs for resilience reasons.
For my needs, I will install 5 MONs on my 5 hosts.
Installing the cluster
Preparing the hardware and the OS
Requirements :
This blog does not cover the OS installation procedure. Before you continue, be sure to configure your OS with these additional requirements:
- Use a correct DNS configuration, or manually configure the /etc/hosts file on each host.
- You will need at least 3 nodes, plus an admin node (for cluster deployment, monitoring, and so on).
- You MUST install NTP on all nodes :
root@n0:~# yum install -y ntp
root@n0:~# systemctl enable ntpd
then configure /etc/ntp.conf with your preferred NTP servers.
It’s safer and more efficient to have a time source close to your cluster. Wi-Fi APs and DSL routers often provide such a service. My configuration uses my ADSL router, based on OpenWrt (you can set up ntpd on OpenWrt…).
Then run :
root@n0:~# timedatectl set-ntp true
- Disable SELinux (see /etc/selinux/config)
- Disable firewalld (systemctl disable firewalld.service)
Finally, ensure everything is still OK after rebooting your nodes.
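If you go the /etc/hosts route instead of DNS, a minimal sketch of the file (the host names and addresses are the ones used later in this article; adapt them to your network):

```
192.168.10.177  admin.int.intra  admin
192.168.10.210  n0.int.intra     n0
192.168.10.211  n1.int.intra     n1
192.168.10.212  n2.int.intra     n2
192.168.10.213  n3.int.intra     n3
192.168.10.214  n4.int.intra     n4
192.168.10.215  n5.int.intra     n5
```

Every node (including admin) should carry the same entries, so that ceph-deploy and dsh resolve the short names identically everywhere.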
Create the ceph admin user on each node:
On each node, create a ceph admin user (used for deployment tasks). It’s important to choose a user name different from “ceph”, which is used by the Ceph installer.
Note: you can omit the -s option of useradd; using bash is a personal choice.
root@n0:~# sudo useradd -d /home/cephadm -m cephadm -s /bin/bash
root@n0:~# sudo passwd cephadm
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
root@n0:~# echo "cephadm ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
root@n0:~# chmod 444 /etc/sudoers.d/ceph
Repeat on the admin host, and on nodes n1, n2 [and n3, …].
Also, install redhat-lsb-core first; it will be useful later:
yum install redhat-lsb-core
Set up SSH authentication with cryptographic keys
On the admin node:
Create the SSH key for the user cephadm:
root@admin:~# su - cephadm
cephadm@admin:~$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/cephadm/.ssh/id_dsa):
Created directory ‘/home/cephadm/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephadm/.ssh/id_dsa.
Your public key has been saved in /home/cephadm/.ssh/id_dsa.pub.
The key fingerprint is:
ec:16:ad:b4:76:e4:32:c6:7c:14:45:bc:c3:78:5a:cf cephadm@admin
The key’s randomart image is:
+---[DSA 1024]----+
| oo |
| .. |
| .o . |
| . ...* |
| S ++ + |
| = B. E |
| % + |
| + = |
| |
+-----------------+
cephadm@admin:~$
Then push it on the nodes of the cluster:
[cephadm@admin ~]$ ssh-copy-id cephadm@n0
[cephadm@admin ~]$ ssh-copy-id cephadm@n1
[cephadm@admin ~]$ ssh-copy-id cephadm@n2
[cephadm@admin ~]$ ssh-copy-id cephadm@n3
[cephadm@admin ~]$ ssh-copy-id cephadm@n4
Or better, automate it (if you do this often 🙂):
#!/bin/bash
# sudo yum install moreutils sshpass openssh-clients
echo 'Enter password:'
read -s SSHPASS
export SSHPASS
for i in {0..4}; do sshpass -e ssh-copy-id -o StrictHostKeyChecking=no -p 22 cephadm@n$i.int.intra ; done
export SSHPASS=''
Install and configure dsh (distributed shell)
[root@admin ~]# yum install -y gcc
[root@admin ~]# yum install -y gcc-c++
[root@admin ~]# yum install -y wget
[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/dsh-0.25.9.tar.gz
[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/libdshconfig-0.20.9.tar.gz
[root@admin ~]# tar xvfz libdshconfig-0.20.9.tar.gz
[root@admin ~]# cd libdshconfig-0.20.9
[root@admin libdshconfig-0.20.9]# ./configure
[root@admin libdshconfig-0.20.9]# make
[root@admin libdshconfig-0.20.9]# make install
[root@admin ~]# tar xvfz dsh-0.25.9.tar.gz
[root@admin ~]# cd dsh-0.25.9
[root@admin dsh-0.25.9]# ./configure
[root@admin dsh-0.25.9]# make
[root@admin dsh-0.25.9]# make install
[root@admin ~]# echo /usr/local/lib > /etc/ld.so.conf.d/dsh.conf
[root@admin ~]# ldconfig
Done. Then configure it :
[root@admin ~]# vi /usr/local/etc/dsh.conf
insert these lines :
remoteshell = ssh
waitshell = 1  # whether to wait for execution
[root@admin ~]# su - cephadm
cephadm@admin:~$ cd
cephadm@admin:~$ mkdir .dsh
cephadm@admin:~$ cd .dsh
cephadm@admin:~/.dsh$ for i in {0..4} ; do echo "n$i" >> machines.list ; done
Test…
[cephadm@admin ~]$ dsh -aM uptime
n0: 16:23:21 up 3 min, 0 users, load average: 0.20, 0.39, 0.20
n1: 16:23:22 up 3 min, 0 users, load average: 0.19, 0.40, 0.21
n2: 16:23:23 up 3 min, 0 users, load average: 0.13, 0.38, 0.20
n3: 16:23:24 up 4 min, 0 users, load average: 0.00, 0.02, 0.02
n4: 16:23:25 up 3 min, 0 users, load average: 0.24, 0.38, 0.20
[cephadm@admin ~]$ dsh -aM cat /proc/cpuinfo | grep model\ name
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n0: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n1: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n2: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n3: model name : Intel Core Processor (Broadwell)
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
n4: model name : Intel(R) Atom(TM) x5-Z8300 CPU @ 1.44GHz
Good!
Now you’re ready to install your cluster with automated commands from your admin node. Note that several other solutions are good enough, like cssh (Cluster SSH). Choose the best one for your needs 😉
Well, I’m now assuming you have followed the installation procedure and met the requirements above 🙂.
Here’s my configuration :
n0 : 192.168.10.210/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n1 : 192.168.10.211/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n2 : 192.168.10.212/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on Crucial MX200 SSD (journal)
n3 : 192.168.10.213/24 1TB WD Red 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n4 : 192.168.10.214/24 1TB Hitachi 3.5" 7200rpm (data) + 20Gb on Crucial MX100 SSD (journal)
n5 : 192.168.10.215/24 1TB ZFS (on 2x WD Green 5TB) + 20Gb on Crucial MX100 SSD (journal)
admin : 192.168.10.177/24 (VM)
Finally, don’t forget to change your yum repositories if you installed the OSes from local media. They should now point to a mirror for all updates (security and software).
Reboot your nodes if you want to be really sure you haven’t forgotten anything, and test them with dsh, for example for NTP:
[cephadm@admin ~]$ dsh -aM "timedatectl status" | grep NTP
n0: NTP enabled: yes
n0: NTP synchronized: yes
n1: NTP enabled: yes
n1: NTP synchronized: yes
n2: NTP enabled: yes
n2: NTP synchronized: yes
n3: NTP enabled: yes
n3: NTP synchronized: yes
n4: NTP enabled: yes
n4: NTP synchronized: yes
…
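To spot laggards quickly in that output, a small filter can help (a sketch: it assumes the “host: NTP synchronized: yes/no” format shown above; a sample is inlined here so it can be run standalone, but in practice you would pipe `dsh -aM "timedatectl status"` into the awk command):

```shell
# Print the names of nodes whose NTP is not yet synchronized.
# Sample dsh output is inlined for illustration.
printf 'n0: NTP synchronized: yes\nn1: NTP synchronized: no\nn2: NTP synchronized: yes\n' |
awk -F': *' '/NTP synchronized/ && $3 == "no" { print $1 }'
```

Any node name printed by the filter still needs its NTP configuration checked before continuing.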
Install your Ceph cluster
Get the software
Ensure each node is up to date at the very beginning of this procedure.
Feel free to use dsh from the admin node for each task you want to apply to all the nodes 😉
[cephadm@admin ~]$ dsh -aM "sudo yum -y upgrade"
Install the repos
On the admin node only, configure the ceph repos.
You have the choice: do it like this if you want to download the Ceph packages from the internet:
[cephadm@admin ~]$ sudo yum install https://download.ceph.com/rpm-jewel/el7/noarch/ceph-release-1-1.el7.noarch.rpm
Or, if you want a local mirror, look at the section below explaining how to set up Puppet (for example) to do that. I prefer this option, because I have a local mirror (for testing purposes, it’s better to download locally).
Install Ceph-deploy
This tool is written in Python.
[cephadm@admin ~]$ sudo yum install ceph-deploy
Install Ceph
Still on the admin node, create a directory that will contain all the configuration for your cluster:
cephadm@admin:~$ mkdir cluster
cephadm@admin:~$ cd cluster
I have chosen to install 4 monitors (3 would be sufficient at home, but my needs aren’t your needs).
cephadm@admin:~/cluster$ ceph-deploy new n{0,2,4,5}
(It generates a lot of stdout messages)
Now edit ceph.conf (in the “cluster” directory): tell Ceph you want 3 replicas, and add the cluster and public networks to the [global] section; for me, 10.1.1.0/24 and 192.168.10.0/24.
The file ceph.conf should contain the following lines now :
[global]
fsid = 74a80a50-b7f9-4588-baa4-bb242c3d4cf0
mon_initial_members = n0, n1, n3
mon_host = 192.168.10.210,192.168.10.211,192.168.10.213
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
cluster network = 10.1.1.0/24
public network = 192.168.10.0/24
[osd]
osd mkfs type = btrfs
osd journal size = 20000
Please note that I will use btrfs to store the data. My kernel is recent enough for that (4.9); I experienced obvious filesystem corruption simply from rebooting nodes that ran kernel 3.10 with an XFS partition for the OSDs.
If you install from a local mirror :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --repo-url {http mirror} --gpg-url {http gpg url} --release jewel n$i; done
For example, in my case: for i in {0..5}; do ceph-deploy install --repo-url http://mirror/ceph/rpm-jewel/el7/ --release jewel n$i; done
Else :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --release jewel n$i; done
This command generates a lot of logs (downloads, debug messages, warnings…) but should return without error. Otherwise, check the error and google it. Depending on the error, simply restarting ceph-deploy may work the second time 😉 (I’ve experienced occasional problems accessing the Ceph repository, for example).
Create the mons:
cephadm@admin:~/cluster$ ceph-deploy mon create-initial
Again, a lot of logs, but no errors.
Create the OSDs (storage units)
You have to know which device will be used for the data on each node, and which device for the journal. If you are building a Ceph cluster for a production environment, you should use SSDs for the journal partition. For testing purposes, you can use a single device.
In my case, I took care to make the OSD data disk /dev/vdb on all nodes, and the journal (SSD) /dev/vdc.
Important note: if you previously installed Ceph on a device, you MUST “zap” (wipe) it first, with a command like “ceph-deploy disk zap n3:sdb”.
Execute this step if you don’t know anything about the past usage of your disks.
Zap the disks. If you have a separate device for the journals (/dev/vdc, here):
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdb; done
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdc; done
If you use only one device :
cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy disk zap n$i:vdb; done
Create the OSDs. Note: use --fs-type btrfs on “osd create” if, like me, you want a filesystem other than xfs (I’ve had obvious problems with xfs: corruption while rebooting).
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create --fs-type btrfs n$i:vdb:vdc; done
Otherwise, use the defaults (xfs):
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create n$i:vdb:vdc; done
And remember, if you have only one device (vdb for ex), use this instead (defaults with xfs):
cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy osd create n$i:vdb; done
Deploy the ceph configuration to all storage nodes
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy admin n$i; done
And check the permissions. For some reason they are not correct:
cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
To correct this, issue the following command:
cephadm@admin:~/cluster$ dsh -aM "sudo chmod +r /etc/ceph/ceph.client.admin.keyring"
and check :
cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
Finally, install the metadata servers
cephadm@admin:~/cluster$ ceph-deploy mds create n0 n1 n5
and the rados gateway
cephadm@admin:~/cluster$ ceph-deploy rgw create n3 n5
(on n3 and n5 for me)
One more time 😉 remember to check the ntp status of the nodes :
cephadm@admin:~/cluster$ dsh -aM "timedatectl | grep synchron"
n0: NTP synchronized: yes
n1: NTP synchronized: yes
n2: NTP synchronized: yes
n3: NTP synchronized: yes
n4: NTP synchronized: yes
Check the cluster; on one node, type:
[cephadm@n0 ~]$ ceph status
cluster 2a663a93-7150-43f5-a8d2-e40e2d9d175f
health HEALTH_OK
monmap e2: 5 mons at {n0=192.168.10.210:6789/0,n1=192.168.10.211:6789/0,n2=192.168.10.212:6789/0,n3=192.168.10.213:6789/0,n4=192.168.10.214:6789/0}
election epoch 8, quorum 0,1,2,3,4 n0,n1,n2,n3,n4
osdmap e32: 5 osds: 5 up, 5 in
flags sortbitwise,require_jewel_osds
pgmap v97: 104 pgs, 6 pools, 1588 bytes data, 171 objects
173 MB used, 3668 GB / 3668 GB avail
104 active+clean
Done !
Test your brand new ceph cluster
You can create a pool to test your new cluster :
[cephadm@n0 ~]$ rados mkpool test
successfully created pool test
[cephadm@n0 ~]$ rados lspools
rbd
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
test
[cephadm@n0 ~]$ rados put -p test .bashrc .bashrc
[cephadm@n0 ~]$ ceph osd map test .bashrc
osdmap e34 pool ‘test’ (6) object ‘.bashrc’ -> pg 6.3d13d849 (6.1) -> up ([2,4,1], p2) acting ([2,4,1], p2)
Take a quick look at the cluster network to ensure it’s used as it should be (tcpdump on n0):
22:43:17.137802 IP 10.1.1.12.50248 > n0.int.intra.acnet: Flags [P.], seq 646:655, ack 656, win 1424, options [nop,nop,TS val 3831166 ecr 3830943], length 9
22:43:17.177297 IP n0.int.intra.acnet > 10.1.1.12.50248: Flags [.], ack 655, win 235, options [nop,nop,TS val 3831203 ecr 3831166], length 0
22:43:17.205945 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [P.], seq 393:515, ack 394, win 1424, options [nop,nop,TS val 4392067 ecr 3829192], length 122
22:43:17.205999 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [.], ack 515, win 252, options [nop,nop,TS val 3831231 ecr 4392067], length 0
22:43:17.206814 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [P.], seq 394:525, ack 515, win 252, options [nop,nop,TS val 3831232 ecr 4392067], length 131
22:43:17.207547 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [.], ack 525, win 1424, options [nop,nop,TS val 4392069 ecr 3831232], length 0
….
Good!
Now, “really” test your new cluster
See http://docs.ceph.com/docs/giant/rbd/libvirt/ :
First, deploy the admin part of Ceph onto the destination system that will test your cluster.
On the admin node :
[cephadm@admin cluster]$ ceph-deploy --overwrite-conf admin hyp03
On a hypervisor with access to the network, of course:
First, give read permission so that processes can access your cluster’s keyring:
chmod +r /etc/ceph/ceph.client.admin.keyring
[root@hyp03 ~]# ceph osd pool create libvirt-pool 128 128
pool ‘libvirt-pool’ created
[root@hyp03 ~]# ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool'
[client.libvirt]
key = AQDsdMYVYR0IdmlkKDLKMZYUifn+lvqMH3D7Q==
Create a 16 GB image on your new cluster:
[root@hyp03 ~]# qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 16G
Formatting ‘rbd:libvirt-pool/new-libvirt-image’, fmt=rbd size=17179869184 cluster_size=0
Important: Jewel enables RBD features that are not compatible with CentOS 7.3. Disable them, otherwise you won’t be able to mount your RBD image (either with rbd map or through qemu-img):
rbd feature disable libvirt-pool/new-libvirt-image exclusive-lock object-map fast-diff deep-flatten
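To double-check which features remain enabled on the image afterwards (a quick verification against the live cluster, using the image name from above):

```
rbd info libvirt-pool/new-libvirt-image | grep features
```

After the disable command above, only the base features (such as layering) should remain.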
Create a secret
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
<usage type='ceph'>
<name>client.libvirt secret</name>
</usage>
</secret>
EOF
[root@hyp03 ~]# sudo virsh secret-define --file secret.xml
[root@hyp03 ~]# ceph auth get-key client.libvirt | sudo tee client.libvirt.key
[root@hyp03 ~]# sudo virsh secret-set-value --secret 89ce37fe-3a9f-4aad-9fdf-9b239b489945 --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
Replicate the secret on every host where you want the VM to run (especially for live migration): repeat the previous steps on each of these hosts, but with a modified secret.xml that includes the secret UUID created during the first run on the first host.
<secret ephemeral='no' private='no'>
<uuid>89ce37fe-3a9f-4aad-9fdf-9b239b489945</uuid>
<usage type='ceph'>
<name>client.libvirt secret</name>
</usage>
</secret>
Follow the guide http://docs.ceph.com/docs/giant/rbd/libvirt/ for the vm configuration and then
[root@hyp03 ~]# virsh start dv03
Domain dv03 started
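For reference, the RBD disk element in the domain XML looks roughly like this (a sketch, not the guide’s exact configuration: the monitor address, pool/image name, and secret UUID are taken from the examples above; adapt them to your cluster):

```xml
<disk type='network' device='disk'>
  <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
    <host name='192.168.10.210' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='89ce37fe-3a9f-4aad-9fdf-9b239b489945'/>
  </auth>
  <target dev='vda' bus='virtio'/>
</disk>
```

Listing several &lt;host&gt; elements (one per MON) makes the VM resilient to a single monitor being down.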
You’re done !
Configure the cluster
Crush map
For my needs, I want my cluster to stay up even if one of my physical hosts is down.
In my home “datacenter”, I have two “racks”, two “physical hosts”, and 6 “ceph virtual hosts”, each of them running a 1TB OSD.
How do I ensure that replication places the data so that no piece of data lives only on one physical host? By managing the Ceph CRUSH map with rules.
First, organize your ceph hosts in your “datacenter”.
Because my home is not really a datacenter, for this example I will call “hosts” the virtual machines running CentOS 7.3/Ceph with one OSD each, “racks” the two physical hosts that run those VMs, and “datacenter” the rack where my two physical hosts are installed.
Create the datacenter, racks, and move them into the right place
ceph osd crush add-bucket rack1 rack
ceph osd crush move n0 rack=rack1
ceph osd crush move n1 rack=rack1
ceph osd crush move n2 rack=rack1
ceph osd crush move n3 rack=rack1
ceph osd crush move rack1 root=default
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack2 root=default
ceph osd crush move n4 rack=rack2
ceph osd crush move n5 rack=rack2
ceph osd crush add-bucket dc datacenter
ceph osd crush move dc root=default
ceph osd crush move rack1 datacenter=dc
ceph osd crush move rack2 datacenter=dc
Look at the results
[root@hyp03 ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.54849 root default
-10 5.54849 datacenter dc
-8 3.63879 rack rack1
-2 0.90970 host n0
0 0.90970 osd.0 up 1.00000 1.00000
-3 0.90970 host n1
1 0.90970 osd.1 up 1.00000 1.00000
-4 0.90970 host n2
2 0.90970 osd.2 up 1.00000 1.00000
-5 0.90970 host n3
3 0.90970 osd.3 up 1.00000 1.00000
-9 1.90970 rack rack2
-6 0.90970 host n4
4 0.90970 osd.4 up 1.00000 1.00000
-7 1.00000 host n5
5 1.00000 osd.5 up 1.00000 1.00000
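The tree above only declares the topology; to actually enforce the stated goal (no data living on a single physical host), the pools also need a CRUSH rule that spreads replicas across both racks. A sketch of such a rule (the name, ruleset id, and numbers are illustrative, not from the original setup): with only two racks and size = 3, a rule can pick both racks first, then up to two hosts in each, so every placement group spans the two physical hosts.

```
rule replicate_racks {
    ruleset 1
    type replicated
    min_size 2
    max_size 3
    step take dc
    step choose firstn 2 type rack
    step chooseleaf firstn 2 type host
    step emit
}
```

The rule is added by decompiling the CRUSH map (ceph osd getcrushmap, crushtool -d), editing it, recompiling and re-injecting it (crushtool -c, ceph osd setcrushmap), then assigning the ruleset to a pool with ceph osd pool set &lt;pool&gt; crush_ruleset &lt;id&gt;.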
Configuration management: Puppet
Now we have to install a configuration management tool. It saves a lot of time.
Master installation
On the admin node, we will install the master:
[root@admin ~]# sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm
[root@admin ~]# sudo yum -y install puppetserver
[root@admin ~]# systemctl enable puppetserver
[root@admin ~]# sudo systemctl start puppetserver
Agents installation:
Use dsh from the admin node:
[root@admin ~]# dsh -aM "sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm"
[root@admin ~]# dsh -aM "sudo yum -y install puppet-agent"
Enable the agent:
[root@admin ~]# dsh -aM "sudo systemctl enable puppet"
Configure the agents: you need to set the server name if it’s not “puppet” (the default). Use an FQDN, it’s important.
[root@admin ~]# dsh -aM "sudo /opt/puppetlabs/bin/puppet config set server admin.int.intra"
Start the agent:
[root@admin ~]# dsh -aM "sudo systemctl start puppet"
Puppet configuration
On the admin node, check that all the agents have published their certificates to the server:
[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list
“n0.int.intra” (SHA256) 95:6B:A3:07:DA:70:04:D7:9B:18:4D:64:30:39:A1:19:9E:68:B9:6B:9C:92:DC:AB:98:36:16:6D:F3:66:B3:56
“n1.int.intra” (SHA256) 07:E3:1B:1F:6F:80:33:6C:A9:A4:96:88:71:A0:74:19:B0:DE:3A:EA:B2:36:2A:38:43:B1:5D:3E:92:3C:D0:47
“n2.int.intra” (SHA256) 62:2E:7E:91:CE:75:53:0C:DA:16:28:C7:14:EA:05:33:CD:DA:8D:B8:A4:A3:59:1B:B0:78:3B:29:AE:A6:CB:C4
“n3.int.intra” (SHA256) 77:92:0F:75:2F:75:E2:8F:68:22:4A:43:4C:BB:79:C5:24:6D:BB:98:42:D0:87:A5:13:57:52:9C:3D:82:D8:74
“n4.int.intra” (SHA256) 55:F4:15:F3:83:3A:39:99:B6:15:EC:D6:09:24:6D:6D:D2:07:9B:54:F5:73:15:C5:C8:74:9F:8F:BB:A0:E2:43
Sign the certificates
[root@admin ~]# for i in {0..4}; do /opt/puppetlabs/bin/puppet cert sign n$i.int.intra ; done
Finished! You can check that all the nodes have a valid certificate:
[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list --all
+ “admin.int.intra” (SHA256) F5:13:EE:E9:C2:F1:A7:86:01:3C:95:EE:61:EE:53:21:E9:75:15:24:45:FB:67:B8:D9:60:60:FE:DE:93:59:F6 (alt names: “DNS:puppet”, “DNS:admin.int.intra”)
+ “n0.int.intra” (SHA256) 9D:C0:3E:AB:FD:67:00:DB:B5:25:CD:23:71:A4:2F:C5:3F:A6:56:FE:55:CA:5D:27:95:C6:97:79:A9:B2:7F:CB
+ “n1.int.intra” (SHA256) 4F:C6:C1:B9:CD:21:4C:3A:76:B5:CF:E4:56:0D:20:D2:1D:72:35:7B:D9:53:86:D9:CD:CB:8D:3C:E8:39:F4:C2
+ “n2.int.intra” (SHA256) D7:6E:85:63:04:CC:C6:24:79:E3:C2:CE:F2:0F:5B:2E:FA:EE:D9:EF:9C:E3:46:6A:83:9F:AA:DA:5D:3F:F8:52
+ “n3.int.intra” (SHA256) 1C:95:61:C8:F6:E2:AF:4F:A5:52:B3:E0:CE:87:CF:16:02:2B:39:2C:61:EC:20:21:D0:BD:33:70:42:7A:6E:D9
+ “n4.int.intra” (SHA256) E7:B6:4B:1B:0A:22:F8:C4:F1:E5:A9:3B:EA:17:5F:54:41:97:68:AF:D0:EC:A6:DB:74:3E:F9:7E:BF:04:16:FF
You now have a working Puppet configuration management system.
Monitoring
Telegraf
Install Telegraf on the nodes, with a Puppet manifest:
vi /etc/puppetlabs/code/environments/production/manifests/site.pp
Include this text in site.pp:
node 'n0', 'n1', 'n2', 'n3', 'n4' {
  file {'/etc/yum.repos.d/influxdb.repo':
    ensure  => present, # make sure it exists
    mode    => '0644',  # file permissions
    content => "[influxdb]\nname = InfluxDB Repository - RHEL \$releasever\nbaseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable\nenabled = 1\ngpgcheck = 1\ngpgkey = https://repos.influxdata.com/influxdb.key\n",
  }
}
Install it on all nodes (we could do that with Puppet, too):
dsh -aM "sudo yum -y install telegraf"
Create a puppet module for telegraf
[root@admin modules]# cd /etc/puppetlabs/code/modules
[root@admin modules]# mkdir -p telegraf_client/{files,manifests,templates}
Create a template for telegraf.conf
[root@admin telegraf_client]# vi templates/telegraf.conf.template
Put the following in that file (note the fqdn variable):
[tags]
# Configuration for telegraf agent
[agent]
debug = false
flush_buffer_when_full = true
flush_interval = "15s"
flush_jitter = "0s"
hostname = "<%= fqdn %>"
interval = "15s"
round_interval = true
Create a template for the inputs :
[root@admin telegraf_client]# vi templates/inputs_system.conf.template
Put the following (no variables yet; customize for your needs):
# Read metrics about CPU usage
[[inputs.cpu]]
percpu = false
totalcpu = true
fieldpass = [ "usage*" ]
# Read metrics about disk usage
[[inputs.disk]]
fielddrop = [ "inodes*" ]
mount_points = ["/","/home"]
# Read metrics about disk IO usage
[[inputs.diskio]]
devices = ["sda2","sda3"]
skip_serial_number = true
# Read metrics about network usage
[[inputs.net]]
interfaces = [ "eth0" ]
fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]
# Read metrics about memory usage
[[inputs.mem]]
# no configuration
# Read metrics about swap memory usage
[[inputs.swap]]
# no configuration
# Read metrics about system load & uptime
[[inputs.system]]
# no configuration
Create a template for the outputs :
[root@admin telegraf_client]# vi templates/outputs.conf.template
and put the following text in the file:
[[outputs.influxdb]]
database = "telegraf"
precision = "s"
urls = [ "http://admin:8086" ]
username = "telegraf"
password = "your_pass"
Create the manifest for your module:
[root@admin ~]# vi /etc/puppetlabs/code/modules/telegraf_client/manifests/init.pp
and add the following content:
class telegraf_client {
  package { 'telegraf':
    ensure => installed,
  }
  file { "/etc/telegraf/telegraf.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/telegraf.conf.template"),
  }
  file { "/etc/telegraf/telegraf.d/outputs.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/outputs.conf.template"),
  }
  file { "/etc/telegraf/telegraf.d/inputs_system.conf":
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => "644",
    content => template("telegraf_client/inputs_system.conf.template"),
  }
  service { 'telegraf':
    ensure => running,
    enable => true,
  }
}
And finally, include the module in the global puppet manifest file. Here is mine :
[root@admin ~]# vi /etc/puppetlabs/code/environments/production/manifests/site.pp
(whose content is:)
node default {
  case $facts['os']['name'] {
    'Solaris':           { include solaris }
    'RedHat', 'CentOS':  { include centos }
    /^(Debian|Ubuntu)$/: { include debian }
    default:             { include generic }
  }
}
node 'n0','n1','n2','n3','n4' {
  include cephnode
}
class cephnode {
  include telegraf_client
}
class centos {
  yumrepo { "CentOS-OS-Local":
    baseurl  => "http://nas4/centos/\$releasever/os/\$basearch",
    descr    => "Centos int.intra mirror (os)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }
  yumrepo { "CentOS-Updates-Local":
    baseurl  => "http://nas4/centos/\$releasever/updates/\$basearch",
    descr    => "Centos int.intra mirror (updates)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }
  yumrepo { "InfluxDB":
    baseurl  => "https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable",
    descr    => "InfluxDB Repository - RHEL \$releasever",
    enabled  => 1,
    gpgcheck => 1,
    gpgkey   => "https://repos.influxdata.com/influxdb.key"
  }
}
Wait a few minutes for Puppet to apply your work on the nodes, or run:
[root@admin ~]# dsh -aM "sudo /opt/puppetlabs/bin/puppet agent --test"
Check that Telegraf is up and running, and check the measurements in InfluxDB.
Monitoring with InfluxDB/Telegraf