Homelab : Resilient and low power “datacenter-in-two-boxes” with X10SDV Mini ITX Xeon D-1540/1518, CentOS 7.3, Ceph Jewel, ZFS, KVM/QEMU and OpenvSwitch

In this article, you will get an overview of how to build a powerful “home lab/datacenter” based on “very cool” open source technologies, in a very small footprint : two network racks, 180W of power max.

The technical specs are as follows :

– 3 compute nodes

– 256GB RAM max capacity (96GB for now)

– 40 TB of raw storage (30 usable), including redundancy and backups

– 10GbE communication between nodes (for performance : storage traffic and backups)

– all these services for less than 200W, please. 🙂

Main goals

– highly resilient data and services (I definitely don’t want to lose my family photos and videos, my work archives, my working environment and all my family’s administrative documents)

– a small power footprint, because power is expensive for me (and probably for you too)

– Hosting all mail and data (photos, videos, ..) for my family

– Hosting home automation systems

– Managing telecom equipment (IP phones, LTE and DSL internet access)

 


 


Architecture

(Architecture diagram)


Computing and storage :

First node (compute/storage) :

Hardware

Mini ITX chassis : In Win MS04.265P.SATA, 4-bay hot-swap

Supermicro X10SDV-TLNF Mini ITX, 8-core/16-thread Xeon D-1540 (TDP = 45W)


64GB RAM DDR4 ECC

256GB Samsung SM951 NVMe SSD

2x 1Gb/s “Intel I350 Gigabit”

2x 10GBE “Intel Ethernet Connection X552/X557-AT 10GBASE-T”

Storage Shelf

U-NAS 8-bay shelf (5x 4TB WD Red drives + others)


 

Software

CentOS 7.3

ZFS, Ceph, KVM, OpenvSwitch

Second node (compute/storage) :

Hardware

Mini ITX chassis : In Win MS04.265P.SATA, 4-bay hot-swap

Supermicro X10SDV-4C+-TLN2F, 4-core Xeon D-1518 (TDP = 35W)

32GB RAM DDR4 ECC

2x 1Gb/s “Intel I350 Gigabit”

2x 10GBE “Intel Ethernet Connection X552/X557-AT 10GBASE-T”

4 hard drives (2x5TB WD GREEN + 1TB Hitachi + SSD)

Software

CentOS 7.3

ZFS, Ceph, KVM, OpenvSwitch

Third node (compute : management, home automation software..) :

Hardware

Intel NUC5i3RYK

16 GB RAM DDR3

120GB SSD M.2. Kingston

Software

CentOS 7.3

KVM, OpenvSwitch

 


Rack mount

All the hardware is located in a garage, in two smart network racks (the APC UPS sits under the racks, near the power outlet).

The two compute/storage nodes in their final location


Network, compute and storage rack :



 

Network topology between the two main compute nodes : bandwidth optimisation and loop management with RSTP

 

With two Supermicro nodes embedding four 10GbE ports between them, you will want :

1. all the traffic switched (at layer 2) over the 10GbE links (for performance reasons : backups, data transfers, traffic between -ceph- storage nodes…)

2. when a node is shut down (for any reason), all the traffic to fall back to the 1Gb/s network cards.

To do that, your switch must support RSTP (other mechanisms exist, but RSTP works well, is supported by entry-level network switches, and is supported by OpenvSwitch).

Here is my network topology :

(Network topology diagram)

Configuration for vs0 (identical on the two nodes) :

Create the switch

ovs-vsctl add-br vs0

Add the 1Gb/s port

ovs-vsctl add-port vs0 eno1

ovs-vsctl set Bridge vs0 rstp_enable=true

ovs-vsctl set Port eno1 other_config:rstp-port-priority=32

ovs-vsctl set Port eno1 other_config:rstp-path-cost=150

(these settings apply to the port just added ; tune the RSTP priority and path cost so the switch blocks the link you want to keep as backup)

Then, once RSTP is configured (not before 🙂 ), add the 10Gb/s port

ovs-vsctl add-port vs0 eno3

Should work.
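You can verify that RSTP actually converged. A quick check, assuming your OpenvSwitch build exposes the rstp/show appctl command (2.4 and later do) :

ovs-appctl rstp/show vs0

One port should be Forwarding while the redundant one sits in the Alternate/Discarding role : that's the loop being cut.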

Finally, set the management IP on the internal port of the switch, in /etc/sysconfig/network-scripts/ifcfg-vs0 :

[root@hyp02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-vs0
DEVICE=vs0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
HOTPLUG=no
IPADDR=192.168.10.75
GATEWAY=192.168.10.1
PREFIX=24
DNS1=192.168.10.1
DOMAIN=localdomain
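The physical NICs enslaved to the bridge also need their own ifcfg files, so the network service hands them over to OVS at boot. A minimal sketch for the 1Gb/s port, following the openvswitch README.RHEL conventions (do the same for eno3) :

[root@hyp02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno1
DEVICE=eno1
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=vs0
BOOTPROTO=none
HOTPLUG=no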

 

First test snapshot : worked well, huh ?

[root@hyp01 vol05]# iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  4] local 192.168.11.1 port 59982 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.09 GBytes  9.35 Gbits/sec   39    656 KBytes
[  4]   1.00-2.00   sec  1.09 GBytes  9.39 Gbits/sec    0    663 KBytes
[  4]   2.00-3.00   sec  1.09 GBytes  9.40 Gbits/sec    0    686 KBytes
[  4]   3.00-4.00   sec  1.09 GBytes  9.34 Gbits/sec  635    447 KBytes
[  4]   4.00-5.00   sec  1.05 GBytes  9.00 Gbits/sec  144    691 KBytes
[  4]   5.00-6.00   sec  1.09 GBytes  9.36 Gbits/sec    0    707 KBytes
[  4]   6.00-7.00   sec  1.09 GBytes  9.40 Gbits/sec    0    723 KBytes
[  4]   7.00-8.00   sec  1.09 GBytes  9.38 Gbits/sec    0    754 KBytes
[  4]   8.00-9.00   sec  1.09 GBytes  9.35 Gbits/sec  270    632 KBytes
[  4]   9.00-10.00  sec  1.07 GBytes  9.16 Gbits/sec  176    635 KBytes
– – – – – – – – – – – – – – – – – – – – – – – – –
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.8 GBytes  9.31 Gbits/sec  1264             sender
[  4]   0.00-10.00  sec  10.8 GBytes  9.31 Gbits/sec                  receiver

iperf Done.
[root@hyp01 vol05]# iperf3 -s
———————————————————–
Server listening on 5201
———————————————————–
Accepted connection from 192.168.11.2, port 33226
[  5] local 192.168.11.1 port 5201 connected to 192.168.11.2 port 33228
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  1.04 GBytes  8.97 Gbits/sec
[  5]   1.00-2.00   sec  1.06 GBytes  9.13 Gbits/sec
[  5]   2.00-3.00   sec  1.07 GBytes  9.22 Gbits/sec
[  5]   3.00-4.00   sec  1.06 GBytes  9.10 Gbits/sec
[  5]   4.00-5.00   sec  1.07 GBytes  9.16 Gbits/sec
[  5]   5.00-6.00   sec  1.06 GBytes  9.15 Gbits/sec
[  5]   6.00-7.00   sec  1.08 GBytes  9.31 Gbits/sec
[  5]   7.00-8.00   sec  1.07 GBytes  9.22 Gbits/sec
[  5]   8.00-9.00   sec  1.05 GBytes  8.98 Gbits/sec
[  5]   9.00-10.00  sec  1.08 GBytes  9.31 Gbits/sec
[  5]  10.00-10.04  sec  40.6 MBytes  9.43 Gbits/sec
– – – – – – – – – – – – – – – – – – – – – – – – –
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec  10.7 GBytes  9.16 Gbits/sec                  receiver
———————————————————–
Server listening on 5201
———————————————————–
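Throughput is one thing, but the point of RSTP here is failover, so test that too. A hedged sketch : ping the other node's vs0 address continuously, take the 10GbE port down, and watch the traffic move to the 1Gb/s path (expect a few seconds of loss while the topology reconverges) :

[root@hyp01 ~]# ping 192.168.10.75    # hyp02's vs0 address, from the config above ; leave running in another terminal
[root@hyp01 ~]# ip link set eno3 down # simulate a 10GbE failure : a few pings drop, then RSTP unblocks the 1Gb/s path
[root@hyp01 ~]# ip link set eno3 up   # restore : traffic should move back to the 10GbE link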

Overall costs

Here is the total amount of money (not time 🙂 ) needed to build such a lab :

… working on it..

Low power/good performance Ceph (jewel) cluster monitored with Grafana/InfluxDB/Telegraf on CentOS 7.3

 

The hardware..

In my Homelab : Highly resilient “datacenter-in-two-boxes” with CentOS 7 and Ceph jewel article, I described how to build a low power homelab.

With this hardware and a bunch of low power disks (2.5″, 5400rpm), you can build a low power virtualized storage system with Ceph, and store all your data with top-level NAS software.

The software :

CentOS 7.3 (1611) x86-64 “minimal”

Ceph “jewel” x86-64

Puppet (configuration management software)

Topology

Number of MONs

It’s recommended to install at least 3 MONs for resilience reasons.

For my needs, I will install 5 MONs on my 5 hosts.

Installing the cluster

Preparing the hardware and the OS

Requirements :

This blog does not cover the OS installation procedure. Before you continue, be sure to configure your OS with these additional requirements :

  • Use a correct DNS configuration, or manually maintain each host’s /etc/hosts file.
  • You will need at least 3 nodes, plus an admin node (for cluster deployment, monitoring, ..)
  • You MUST install NTP on all nodes :
root@n0:~# yum install -y ntp
root@n0:~# systemctl enable ntpd

then configure /etc/ntp.conf with your preferred NTP servers

It’s safer and more efficient to have a time source close to your cluster : Wifi APs and DSL routers often provide such a service. My configuration uses my ADSL router, based on OpenWrt (you can set up ntpd on OpenWrt…)

Then run :

root@n0:~# timedatectl set-ntp true
  • disable SELinux (see /etc/selinux/config)
  • disable firewalld (systemctl disable firewalld.service) ; a scripted sketch of both steps follows below
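A minimal sketch of those two steps on one node, assuming the stock CentOS paths (repeat on every node, or via dsh once it is installed below) :

root@n0:~# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
root@n0:~# setenforce 0                          # immediate ; the config file applies at reboot
root@n0:~# systemctl stop firewalld.service
root@n0:~# systemctl disable firewalld.service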

Finally, ensure everything’s ok when rebooting your node..

Create the ceph admin user on each node :

On each node, create a ceph admin user (used for deployment tasks). It’s important to choose a different user than “ceph” (used by the ceph installer..)

Note : you can omit the -s directive of useradd, it’s a personal choice to use bash.

root@n0:~# sudo useradd -d /home/cephadm -m cephadm -s /bin/bash
root@n0:~# sudo passwd cephadm
 Enter new UNIX password:
 Retype new UNIX password:
 passwd: password updated successfully
root@n0:~# echo "cephadm ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
root@n0:~# chmod 444 /etc/sudoers.d/ceph

and so on for the admin host, and for nodes n1, n2 [and n3, …]

Also, install lsb now ; it will be useful later.

yum install redhat-lsb-core

Setup the ssh authentication with cryptographic keys

On the admin node :

Create the ssh key for the user cephadm

root@admin:~# su - cephadm
cephadm@admin:~$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/cephadm/.ssh/id_dsa):
Created directory ‘/home/cephadm/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephadm/.ssh/id_dsa.
Your public key has been saved in /home/cephadm/.ssh/id_dsa.pub.
The key fingerprint is:
ec:16:ad:b4:76:e4:32:c6:7c:14:45:bc:c3:78:5a:cf cephadm@admin
The key’s randomart image is:
+---[DSA 1024]----+
|           oo    |
|           ..    |
|          .o .   |
|       . ...*    |
|        S ++ +   |
|       = B.   E  |
|        % +      |
|       + =       |
|                 |
+-----------------+
cephadm@admin:~$

Then push it on the nodes of the cluster:

[cephadm@admin ~]$ ssh-copy-id cephadm@n0

[cephadm@admin ~]$ ssh-copy-id cephadm@n1

[cephadm@admin ~]$ ssh-copy-id cephadm@n2

[cephadm@admin ~]$ ssh-copy-id cephadm@n3

[cephadm@admin ~]$ ssh-copy-id cephadm@n4

Or, better, automate it (if you do this often 🙂 ) :

#!/bin/sh
# sudo yum install moreutils sshpass openssh-clients
echo 'Enter password:'
read -s SSHPASS
export SSHPASS
for i in {0..4}; do sshpass -e ssh-copy-id -o StrictHostKeyChecking=no cephadm@n$i.int.intra -p 22 ; done
export SSHPASS=""

Install and configure dsh (distributed shell)

[root@admin ~]# yum install -y gcc

[root@admin ~]# yum install -y gcc-c++

[root@admin ~]# yum install -y wget

[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/dsh-0.25.9.tar.gz

[root@admin ~]# wget https://www.netfort.gr.jp/~dancer/software/downloads/libdshconfig-0.20.9.tar.gz

[root@admin ~]# tar xvfz libdshconfig-0.20.9.tar.gz

[root@admin ~]# cd libdshconfig-0.20.9

[root@admin libdshconfig-0.20.9]# ./configure

[root@admin libdshconfig-0.20.9]# make

[root@admin libdshconfig-0.20.9]# make install

[root@admin ~]# tar xvfz dsh-0.25.9.tar.gz

[root@admin ~]# cd dsh-0.25.9

[root@admin dsh-0.25.9]# ./configure

[root@admin dsh-0.25.9]# make

[root@admin dsh-0.25.9]# make install

[root@admin ~]# echo /usr/local/lib > /etc/ld.so.conf.d/dsh.conf
[root@admin ~]# ldconfig

Done. Then configure it :

[root@admin ~]# vi /usr/local/etc/dsh.conf

insert these lines :

remoteshell =ssh
waitshell=1  # whether to wait for execution

[root@admin ~]# su - cephadm

cephadm@admin:~$ cd
cephadm@admin:~$ mkdir .dsh
cephadm@admin:~$ cd .dsh
cephadm@admin:~/.dsh$ for i in {0..4} ; do echo "n$i" >> machines.list ; done

Test…

[cephadm@admin ~]$ dsh -aM uptime
n0:  16:23:21 up 3 min,  0 users,  load average: 0.20, 0.39, 0.20
n1:  16:23:22 up 3 min,  0 users,  load average: 0.19, 0.40, 0.21
n2:  16:23:23 up 3 min,  0 users,  load average: 0.13, 0.38, 0.20
n3:  16:23:24 up 4 min,  0 users,  load average: 0.00, 0.02, 0.02
n4:  16:23:25 up 3 min,  0 users,  load average: 0.24, 0.38, 0.20

[cephadm@admin ~]$ dsh -aM cat /proc/cpuinfo | grep model\ name
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)
n4: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n4: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n4: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n4: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz

Good.. !!

Now you’re ready to install your cluster with automated commands from your admin node. Note that several other solutions work just as well, like cssh (clustered ssh). Choose the best one for your needs 😉

Well,

Now I’m assuming you have followed the installation procedure and the requirements above :).

Here’s my configuration :

n0 : 192.168.10.210/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n1 : 192.168.10.211/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n2 : 192.168.10.212/24 1TB HGST 2.5" 5400rpm (data) + 20Gb on Crucial MX200 SSD (journal)
n3 : 192.168.10.213/24 1TB WD Red 2.5" 5400rpm (data) + 20Gb on SM951 NVMe SSD (journal)
n4 : 192.168.10.214/24 1TB Hitachi 3.5" 7200rpm (data) + 20Gb on Crucial MX100 SSD (journal) 
n5 : 192.168.10.215/24 1TB ZFS (on 2x WD Green 5TB) + 20Gb on Crucial MX100 SSD (journal)
admin : 192.168.10.177/24 (VM)

Finally, don’t forget to change your yum repositories if you installed the OSes from local media : they should now point to a mirror for all updates (security and software).

Reboot your nodes if you want to be really sure you haven’t forgotten anything, and test them with dsh ; for example, for NTP :

[cephadm@admin ~]$ dsh -aM timedatectl status|grep NTP
n0:      NTP enabled: yes
n0: NTP synchronized: yes
n1:      NTP enabled: yes
n1: NTP synchronized: yes
n2:      NTP enabled: yes
n2: NTP synchronized: yes
n3:      NTP enabled: yes
n3: NTP synchronized: yes
n4:      NTP enabled: yes
n4: NTP synchronized: yes

Install your Ceph cluster

Get the software

Make sure every node is fully up to date at the very beginning of this procedure.

Feel free to use dsh from the admin node for each task you would like to apply to the nodes 😉

[cephadm@admin ~]$ dsh -aM "sudo yum -y upgrade"

Install the repos

On the admin node only, configure the ceph repos.

You have two options. Do it like this if you want to download the Ceph packages from the internet :

[cephadm@admin ~]$ sudo yum install https://download.ceph.com/rpm-jewel/el7/noarch/ceph-release-1-1.el7.noarch.rpm

Or, if you want a local mirror, see the Puppet section below for a way to set one up. I prefer this option myself, because I have a local mirror (for testing purposes, downloading locally is much faster).

Install Ceph-deploy

This tool is written in Python.

[cephadm@admin ~]$ sudo yum install ceph-deploy

Install Ceph

Still on the admin node, create a directory that will contain all the configuration for your cluster :

cephadm@admin:~$ mkdir cluster

cephadm@admin:~$ cd cluster

I have chosen to install 4 monitors (3 would be sufficient at home, but my needs aren’t your needs).

cephadm@admin:~/cluster$ ceph-deploy new n{0,2,4,5}

(It generates a lot of stdout messages)

Now edit ceph.conf (in the “cluster” directory), tell ceph you want 3 replicas of each object, and add the cluster and public networks to the [global] section ; for me : 10.1.1.0/24 and 192.168.10.0/24.

The file ceph.conf should now contain the following lines :

[global]
fsid = 74a80a50-b7f9-4588-baa4-bb242c3d4cf0
mon_initial_members = n0, n1, n3
mon_host = 192.168.10.210,192.168.10.211,192.168.10.213
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3
cluster network = 10.1.1.0/24
public network = 192.168.10.0/24

[osd]
osd mkfs type = btrfs

osd journal size = 20000

Please note that I will use btrfs to store the data. My kernel is recent enough for that (4.9), and I have seen obvious filesystem corruptions simply from rebooting nodes running kernel 3.10 with an XFS partition for the OSDs.

If you install from a local mirror :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --repo-url {http mirror} --gpg-url {http gpg url} --release jewel n$i; done

For example, in my case : for i in {0..5}; do ceph-deploy install --repo-url http://mirror/ceph/rpm-jewel/el7/ --release jewel n$i; done

Else :
cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy install --release jewel n$i; done

This command generates a lot of logs (downloads, debug messages, warnings…) but should return without error. Otherwise, check the error and google it. Depending on the error, simply restarting ceph-deploy may work the second time 😉 (I experienced some problems accessing the ceph repository, for example….)

Create the mons:

cephadm@admin:~/cluster$ ceph-deploy mon create-initial

Idem, a lot of logs… but no error..

Create the OSDs (storage units)

You have to know which device will hold the data on each node, and which device the journal. If you are building a ceph cluster for a production environment, you should use SSDs for the journal partition. For testing purposes, you can use a single device.

In my case, I took care that the OSD data disk is /dev/vdb on all nodes, and the journal (SSD) is /dev/vdc..

Important note : if you previously installed ceph on a device, you MUST “zap” (delete) it before. Use the command “ceph-deploy disk zap n3:sdb” for example.

Execute this step if you don’t know anything about the past usage of your disks.

Zap the disks. If you have a separate device for the journals (/dev/vdc, here) :

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdb; done

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy disk zap n$i:vdc; done

If you use only one device :

cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy disk zap n$i:vdb; done

Create the OSDs

Note : use --fs-type btrfs on “osd create” if you want (as I do) a filesystem other than xfs. I’ve had obvious problems with xfs (corruptions while rebooting..)

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create --fs-type btrfs n$i:vdb:vdc; done

Else use the defaults (xfs)

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy osd create n$i:vdb:vdc; done

And remember, if you have only one device (vdb, for example), use this instead (defaults to xfs) :

cephadm@admin:~/cluster$ for i in {0..4}; do ceph-deploy osd create n$i:vdb; done

Deploy the ceph configuration to all storage nodes

cephadm@admin:~/cluster$ for i in {0..5}; do ceph-deploy admin n$i; done

And check the permissions. For some reason, they are not correct :

cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring

To correct this, issue the following command :

cephadm@admin:~/cluster$ dsh -aM "sudo chmod +r /etc/ceph/ceph.client.admin.keyring"

and check :

cephadm@admin:~/cluster$ dsh -aM "ls -l /etc/ceph/*key*"
n0: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring

Finally, install the metadata servers

cephadm@admin:~/cluster$ ceph-deploy mds create n0 n1 n5

and the rados gateway

cephadm@admin:~/cluster$ ceph-deploy rgw create n3 n5

(on n3 and n5 for me)

One more time 😉 remember to check the ntp status of the nodes :

cephadm@admin:~/cluster$ dsh -aM "timedatectl|grep synchron"
n0: NTP synchronized: yes
n1: NTP synchronized: yes
n2: NTP synchronized: yes
n3: NTP synchronized: yes
n4: NTP synchronized: yes

check the cluster, on one node type :

[cephadm@n0 ~]$ ceph status
cluster 2a663a93-7150-43f5-a8d2-e40e2d9d175f
health HEALTH_OK
monmap e2: 5 mons at {n0=192.168.10.210:6789/0,n1=192.168.10.211:6789/0,n2=192.168.10.212:6789/0,n3=192.168.10.213:6789/0,n4=192.168.10.214:6789/0}
election epoch 8, quorum 0,1,2,3,4 n0,n1,n2,n3,n4
osdmap e32: 5 osds: 5 up, 5 in
flags sortbitwise,require_jewel_osds
pgmap v97: 104 pgs, 6 pools, 1588 bytes data, 171 objects
173 MB used, 3668 GB / 3668 GB avail
104 active+clean

Done !

Test your brand new ceph cluster

You can create a pool to test your  new cluster :

[cephadm@n0 ~]$ rados mkpool test
successfully created pool test
[cephadm@n0 ~]$ rados lspools
rbd
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
test

[cephadm@n0 ~]$ rados put -p test .bashrc .bashrc
[cephadm@n0 ~]$ ceph osd map test .bashrc
osdmap e34 pool ‘test’ (6) object ‘.bashrc’ -> pg 6.3d13d849 (6.1) -> up ([2,4,1], p2) acting ([2,4,1], p2)
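To close the loop, read the object back, compare it, and clean up the test pool (rmpool wants the name twice plus a confirmation flag) :

[cephadm@n0 ~]$ rados get -p test .bashrc /tmp/.bashrc.from-ceph
[cephadm@n0 ~]$ diff ~/.bashrc /tmp/.bashrc.from-ceph    # no output means the round trip is intact
[cephadm@n0 ~]$ rados rmpool test test --yes-i-really-really-mean-it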

A quick look at the cluster network, to make sure it’s used as it should be (tcpdump on n0’s cluster interface) :

22:43:17.137802 IP 10.1.1.12.50248 > n0.int.intra.acnet: Flags [P.], seq 646:655, ack 656, win 1424, options [nop,nop,TS val 3831166 ecr 3830943], length 9
22:43:17.177297 IP n0.int.intra.acnet > 10.1.1.12.50248: Flags [.], ack 655, win 235, options [nop,nop,TS val 3831203 ecr 3831166], length 0
22:43:17.205945 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [P.], seq 393:515, ack 394, win 1424, options [nop,nop,TS val 4392067 ecr 3829192], length 122
22:43:17.205999 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [.], ack 515, win 252, options [nop,nop,TS val 3831231 ecr 4392067], length 0
22:43:17.206814 IP n0.int.intra.acnet > 10.1.1.13.42810: Flags [P.], seq 394:525, ack 515, win 252, options [nop,nop,TS val 3831232 ecr 4392067], length 131
22:43:17.207547 IP 10.1.1.13.42810 > n0.int.intra.acnet: Flags [.], ack 525, win 1424, options [nop,nop,TS val 4392069 ecr 3831232], length 0

….

Good !!

Now, “really” test your new cluster

Cf http://docs.ceph.com/docs/giant/rbd/libvirt/ :

First, deploy the admin part of ceph to the destination system that will test your cluster.

On the admin node :

[cephadm@admin cluster]$ ceph-deploy --overwrite-conf admin hyp03

On a hypervisor with access to the network, of course :

First, give read permission on the keyring so client processes can reach your cluster :

chmod +r /etc/ceph/ceph.client.admin.keyring

[root@hyp03 ~]# ceph osd pool create libvirt-pool 128 128
pool ‘libvirt-pool’ created

[root@hyp03 ~]# ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool'
[client.libvirt]
key = AQDsdMYVYR0IdmlkKDLKMZYUifn+lvqMH3D7Q==

Create a 16G image on your new cluster

[root@hyp03 ~]# qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 16G

Formatting ‘rbd:libvirt-pool/new-libvirt-image’, fmt=rbd size=17179869184 cluster_size=0

Important : Jewel enables RBD image features that the CentOS 7.3 kernel does not support. Disable them, otherwise you won’t be able to map your RBD image (either with rbd map or through qemu-img) :

rbd feature disable libvirt-pool/new-libvirt-image exclusive-lock object-map fast-diff deep-flatten
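To avoid repeating this for every new image, you can also set the default feature set on the client side in /etc/ceph/ceph.conf (a hedged sketch : 1 is the layering feature bit alone, which old kernels understand) :

[client]
rbd default features = 1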

Create a secret

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
        <usage type='ceph'>
                <name>client.libvirt secret</name>
        </usage>
</secret>
EOF

[root@hyp03 ~]# sudo virsh secret-define --file secret.xml

[root@hyp03 ~]# ceph auth get-key client.libvirt | sudo tee client.libvirt.key

[root@hyp03 ~]# sudo virsh secret-set-value --secret 89ce37fe-3a9f-4aad-9fdf-9b239b489945 --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml

Replicate the secret on every host where you want the VM to run (especially for live migration) : repeat the previous steps on each of these hosts, but with a modified secret.xml that includes the secret UUID created on the first host :

<secret ephemeral='no' private='no'>
        <uuid>89ce37fe-3a9f-4aad-9fdf-9b239b489945</uuid>
        <usage type='ceph'>
                <name>client.libvirt secret</name>
        </usage>
</secret>
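For reference, the disk stanza the guide has you add to the domain XML looks roughly like this (a sketch : the pool, image and secret UUID are the ones created above, and the mon host is one of this cluster's MONs) :

<disk type='network' device='disk'>
        <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
                <host name='192.168.10.210' port='6789'/>
        </source>
        <auth username='libvirt'>
                <secret type='ceph' uuid='89ce37fe-3a9f-4aad-9fdf-9b239b489945'/>
        </auth>
        <target dev='vda' bus='virtio'/>
</disk>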

 

Follow the guide http://docs.ceph.com/docs/giant/rbd/libvirt/ for the VM configuration, and then :

[root@hyp03 ~]# virsh start dv03
Domain dv03 started

You’re done !

 

Configure the cluster

Crush map

For my needs, I want my cluster to stay up even if one of my hosts is down.

In my home “datacenter”, I have two “racks”, two “physical hosts”, and 6 “ceph virtual hosts”, each of them running a 1TB OSD.

How do I ensure that replication happens in such a way that no piece of data lives on a single physical host ? You do that by managing your ceph CRUSH map with rules..

First, organize your ceph hosts in your “datacenter”.

Because my home is not really a datacenter, for this example I will call “hosts” the virtual machines hosting CentOS 7.3/ceph, with one OSD per VM.

I will call “racks” the two physical hosts that run those “hosts” (VMs).

And I will call “datacenter” the physical rack where my two physical hosts are installed.

Create the datacenter, racks, and move them into the right place

ceph osd crush add-bucket rack1 rack
ceph osd crush move n0 rack=rack1
ceph osd crush move n1 rack=rack1
ceph osd crush move n2 rack=rack1
ceph osd crush move n3 rack=rack1
ceph osd crush move rack1 root=default
ceph osd crush add-bucket rack2 rack
ceph osd crush move rack2 root=default
ceph osd crush move n4 rack=rack2
ceph osd crush move n5 rack=rack2
ceph osd crush add-bucket dc datacenter
ceph osd crush move dc root=default
ceph osd crush move rack1 datacenter=dc
ceph osd crush move rack2 datacenter=dc

Look at the results

[root@hyp03 ~]# ceph osd tree
ID  WEIGHT  TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.54849 root default
-10 5.54849     datacenter dc
-8 3.63879         rack rack1
-2 0.90970             host n0
0 0.90970                 osd.0      up  1.00000          1.00000
-3 0.90970             host n1
1 0.90970                 osd.1      up  1.00000          1.00000
-4 0.90970             host n2
2 0.90970                 osd.2      up  1.00000          1.00000
-5 0.90970             host n3
3 0.90970                 osd.3      up  1.00000          1.00000
-9 1.90970         rack rack2
-6 0.90970             host n4
4 0.90970                 osd.4      up  1.00000          1.00000
-7 1.00000             host n5
5 1.00000                 osd.5      up  1.00000          1.00000
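The hierarchy alone does not change placement : you still need a CRUSH rule whose failure domain is the rack, and pools that use it. A hedged sketch (the rule name is arbitrary, and on jewel the pool setting is crush_ruleset ; check the new rule's id with "ceph osd crush rule dump"). Note that with only two racks, a size-3 pool cannot put all three replicas in distinct racks : either use size 2 with such a rule, or write a custom rule that first picks the two racks and then hosts inside them.

ceph osd crush rule create-simple replicated_rack default rack
ceph osd pool set rbd crush_ruleset 1    # assuming the new rule got id 1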

 

Configuration management : puppet

Now we have to install a configuration management tool. It saves a lot of time..

Master installation

On the admin node, we will install the master :

[root@admin ~]# sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm

[root@admin ~]# sudo yum -y install puppetserver

[root@admin ~]# systemctl enable puppetserver

[root@admin ~]# sudo systemctl start puppetserver

Agents installation :

Use dsh from the admin node :

[root@admin ~]# dsh -aM "sudo rpm -ivh https://yum.puppetlabs.com/puppetlabs-release-pc1-el-7.noarch.rpm"

[root@admin ~]# dsh -aM "sudo yum -y install puppet-agent"

Enable the agent

[root@admin ~]# dsh -aM "systemctl enable puppet"

Configure the agents : you need to set the server name if it’s not “puppet” (the default). Use an fqdn, it’s important.

[root@admin ~]# dsh -aM "sudo /opt/puppetlabs/bin/puppet config set server admin.int.intra"

Start the agent

[root@admin ~]# dsh -aM "systemctl start puppet"

Puppet configuration

On the admin node, check that all the agents have published their certificates to the server :

[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list
"n0.int.intra" (SHA256) 95:6B:A3:07:DA:70:04:D7:9B:18:4D:64:30:39:A1:19:9E:68:B9:6B:9C:92:DC:AB:98:36:16:6D:F3:66:B3:56
"n1.int.intra" (SHA256) 07:E3:1B:1F:6F:80:33:6C:A9:A4:96:88:71:A0:74:19:B0:DE:3A:EA:B2:36:2A:38:43:B1:5D:3E:92:3C:D0:47
"n2.int.intra" (SHA256) 62:2E:7E:91:CE:75:53:0C:DA:16:28:C7:14:EA:05:33:CD:DA:8D:B8:A4:A3:59:1B:B0:78:3B:29:AE:A6:CB:C4
"n3.int.intra" (SHA256) 77:92:0F:75:2F:75:E2:8F:68:22:4A:43:4C:BB:79:C5:24:6D:BB:98:42:D0:87:A5:13:57:52:9C:3D:82:D8:74
"n4.int.intra" (SHA256) 55:F4:15:F3:83:3A:39:99:B6:15:EC:D6:09:24:6D:6D:D2:07:9B:54:F5:73:15:C5:C8:74:9F:8F:BB:A0:E2:43

Sign the certificates

[root@admin ~]#  for i in {0..4}; do /opt/puppetlabs/bin/puppet cert sign n$i.int.intra ; done

Finished ! You can now check all the nodes with a valid certificate :

[root@admin ~]# sudo /opt/puppetlabs/bin/puppet cert list --all
+ "admin.int.intra" (SHA256) F5:13:EE:E9:C2:F1:A7:86:01:3C:95:EE:61:EE:53:21:E9:75:15:24:45:FB:67:B8:D9:60:60:FE:DE:93:59:F6 (alt names: "DNS:puppet", "DNS:admin.int.intra")
+ "n0.int.intra"    (SHA256) 9D:C0:3E:AB:FD:67:00:DB:B5:25:CD:23:71:A4:2F:C5:3F:A6:56:FE:55:CA:5D:27:95:C6:97:79:A9:B2:7F:CB
+ "n1.int.intra"    (SHA256) 4F:C6:C1:B9:CD:21:4C:3A:76:B5:CF:E4:56:0D:20:D2:1D:72:35:7B:D9:53:86:D9:CD:CB:8D:3C:E8:39:F4:C2
+ "n2.int.intra"    (SHA256) D7:6E:85:63:04:CC:C6:24:79:E3:C2:CE:F2:0F:5B:2E:FA:EE:D9:EF:9C:E3:46:6A:83:9F:AA:DA:5D:3F:F8:52
+ "n3.int.intra"    (SHA256) 1C:95:61:C8:F6:E2:AF:4F:A5:52:B3:E0:CE:87:CF:16:02:2B:39:2C:61:EC:20:21:D0:BD:33:70:42:7A:6E:D9
+ "n4.int.intra"    (SHA256) E7:B6:4B:1B:0A:22:F8:C4:F1:E5:A9:3B:EA:17:5F:54:41:97:68:AF:D0:EC:A6:DB:74:3E:F9:7E:BF:04:16:FF

You now have a working puppet configuration management system..

Monitoring

Telegraf

Install Telegraf on the nodes, with a puppet manifest.

vi /etc/puppetlabs/code/environments/production/manifests/site.pp

include this text in the file site.pp :

node 'n0', 'n1', 'n2', 'n3', 'n4' {
  file {'/etc/yum.repos.d/influxdb.repo':
    ensure  => present,    # make sure it exists
    mode    => '0644',     # file permissions
    content => "[influxdb]\nname = InfluxDB Repository - RHEL \$releasever\nbaseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable\nenabled = 1\ngpgcheck = 1\ngpgkey = https://repos.influxdata.com/influxdb.key\n",
  }
}

Install it on all nodes (we could do that with puppet, too):

dsh -aM "sudo yum install -y telegraf"

Create a puppet module for telegraf

[root@admin modules]# cd /etc/puppetlabs/code/modules
[root@admin modules]# mkdir -p telegraf_client/{files,manifests,templates}

Create a template for telegraf.conf

[root@admin telegraf_client]# vi templates/telegraf.conf.template

put the following in that file (note the @fqdn variable) :

[tags]

# Configuration for telegraf agent
[agent]
  debug = false
  flush_buffer_when_full = true
  flush_interval = "15s"
  flush_jitter = "0s"
  hostname = "<%= @fqdn %>"
  interval = "15s"
  round_interval = true

Create a template for the inputs :

[root@admin telegraf_client]# vi templates/inputs_system.conf.template

put the following (no variables yet ; customize to your needs..) :

# Read metrics about CPU usage
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fieldpass = [ "usage*" ]

# Read metrics about disk usage
[[inputs.disk]]
  fielddrop = [ "inodes*" ]
  mount_points=["/","/home"]

# Read metrics about diskio usage
[[inputs.diskio]]
  devices = ["sda2","sda3"]
  skip_serial_number = true

# Read metrics about network usage
[[inputs.net]]
  interfaces = [ "eth0" ]
  fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration
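Since the point is also to monitor Ceph itself, you may want one more inputs template for the ceph nodes. A hedged sketch using telegraf's stock ceph input (present in telegraf 1.x ; it reads the daemons' admin sockets, so telegraf needs read access to /var/run/ceph), deployed through the telegraf_client module like the other templates :

# Collect Ceph metrics from the local admin sockets
[[inputs.ceph]]
  interval = "60s"
  ceph_binary = "/usr/bin/ceph"
  socket_dir = "/var/run/ceph"
  mon_prefix = "ceph-mon"
  osd_prefix = "ceph-osd"
  socket_suffix = "asok"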

Create a template for the outputs :

[root@admin telegraf_client]# vi templates/outputs.conf.template

and put the following text in the file

[[outputs.influxdb]]
  database = "telegraf"
  precision = "s"
  urls = [ "http://admin:8086" ]
  username = "telegraf"
  password = "your_pass"

create the manifest for your module

[root@admin ~]# vi /etc/puppetlabs/code/modules/telegraf_client/manifests/init.pp

and add the following contents :

class telegraf_client {

  package { 'telegraf':
    ensure => installed,
  }

  file { '/etc/telegraf/telegraf.conf':
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => '644',
    content => template('telegraf_client/telegraf.conf.template'),
  }

  file { '/etc/telegraf/telegraf.d/outputs.conf':
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => '644',
    content => template('telegraf_client/outputs.conf.template'),
  }

  file { '/etc/telegraf/telegraf.d/inputs_system.conf':
    ensure  => present,
    owner   => root,
    group   => root,
    mode    => '644',
    content => template('telegraf_client/inputs_system.conf.template'),
  }

  service { 'telegraf':
    ensure => running,
    enable => true,
  }
}

And finally, include the module in the global puppet manifest file. Here is mine :

[root@admin ~]# vi /etc/puppetlabs/code/environments/production/manifests/site.pp

(its content is :)

node default {
  case $facts['os']['name'] {
    'Solaris':           { include solaris }
    'RedHat', 'CentOS':  { include centos  }
    /^(Debian|Ubuntu)$/: { include debian  }
    default:             { include generic }
  }
}

node 'n0','n1','n2','n3','n4' {
  include cephnode
}

class cephnode {
  include telegraf_client
}

class centos {
  yumrepo { "CentOS-OS-Local":
    baseurl  => "http://nas4/centos/\$releasever/os/\$basearch",
    descr    => "Centos int.intra mirror (os)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }
  yumrepo { "CentOS-Updates-Local":
    baseurl  => "http://nas4/centos/\$releasever/updates/\$basearch",
    descr    => "Centos int.intra mirror (updates)",
    enabled  => 1,
    gpgcheck => 0,
    priority => 1
  }

  yumrepo { "InfluxDB":
    baseurl  => "https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable",
    descr    => "InfluxDB Repository - RHEL \$releasever",
    enabled  => 1,
    gpgcheck => 1,
    gpgkey   => "https://repos.influxdata.com/influxdb.key"
  }
}

Wait a few minutes for puppet to apply your changes on the nodes, or run :

[root@admin ~]# dsh -aM "/opt/puppetlabs/bin/puppet agent --test"

Check that telegraf is up and running, then check the measurements in InfluxDB.

Monitoring with InfluxDB/Telegraf

 

Monitoring a Ceph cluster and other things with InfluxDB, Grafana, collectd & Telegraf

 

Now that we have a working Ceph cluster (cf. the Ceph cluster article), you will certainly want to monitor it..

Here is another cool open source suite of software 🙂

I’m using Debian 8.6 Jessie for this memo.

This post was written with the help of several web pages, including https://www.guillaume-leduc.fr/monitoring-de-votre-serveur-avec-telegraf-influxdb-et-grafana.html

Thanks Guillaume 😉

Influx DB

I have a virtualized admin node in my home “datacenter”. This node will be used to collect and graph these stats.

InfluxDB is a great time series database. I will deploy it for my needs (monitoring all my systems, starting with my ceph cluster).

First add the InfluxDB repository:

cephadm@admin:~$ curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -

root@admin:~# echo "deb https://repos.influxdata.com/debian jessie stable" > /etc/apt/sources.list.d/influxdb.list

cephadm@admin:~$ sudo apt-get update
cephadm@admin:~$ sudo apt-get install influxdb

Installation done.

Now configure the databases. For my needs, I will create two databases, which will become two datasources in grafana (collectd and telegraf). Collectd and Telegraf are two well known agents that collect statistics from hosts. Collectd is useful for hosts like routers (I own an OpenWrt internet router ; very useful for monitoring internet bandwidth..)

For Ceph and the nodes we will use Telegraf.

For other, specialized things, we will use collectd.

Enable influxdb, start it, and enter the database shell like this :

root@admin:~# systemctl enable influxdb
root@admin:~# systemctl start influxdb

root@admin:~# influx
 Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
 Connected to http://localhost:8086 version 1.0.2
 InfluxDB shell version: 1.0.2
 >

Create the databases :

> CREATE DATABASE telegraf
> CREATE DATABASE collectd_db

Check the databases :

> SHOW DATABASES;
 name: databases
 ---------------
 name
 telegraf
 _internal
 collectd_db

Create a dedicated user for the monitoring activities, for example “telegraf” :

> CREATE USER telegraf WITH PASSWORD 'pass'
> GRANT ALL ON telegraf TO telegraf
> GRANT ALL ON collectd_db TO telegraf

Look at InfluxDB’s retention policies if you want the databases to purge old data automatically ; this is not detailed here. Then configure InfluxDB to receive the collectd data on port 25826 (the default). In /etc/influxdb/influxdb.conf, insert this conf :

[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd_db"
  typesdb = "/usr/share/collectd/types.db"

Then install collectd on the admin node to get /usr/share/collectd/types.db (you could just copy the file from an agent.. but you may want to monitor your admin node too 😉 ).
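Then restart InfluxDB and check that the collectd listener is up (collectd speaks UDP) :

root@admin:~# systemctl restart influxdb
root@admin:~# ss -ulnp | grep 25826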
Telegraf

On the admin node, use dsh to install telegraf on the nodes :

cephadm@admin:~$ dsh -aM sudo apt-get install -y curl

then, on all nodes (still via dsh), run :

cephadm@admin:~$ dsh -aM "sudo curl -sL https://repos.influxdata.com/influxdb.key | sudo apt-key add -"

cephadm@admin:~$ dsh -aM "echo 'deb https://repos.influxdata.com/debian jessie stable' | sudo tee /etc/apt/sources.list.d/influxdb.list"

Then install telegraf, for example from the admin node using dsh :

cephadm@admin:~$ dsh -aM sudo apt-get update
cephadm@admin:~$ dsh -aM sudo apt-get install -y telegraf

Configure the nodes : in /etc/telegraf/telegraf.conf, keep only this ; for example, for my node 1 (n1) :

[tags]
 
# Configuration for telegraf agent
[agent]
  debug = false
  flush_buffer_when_full = true
  flush_interval = "15s"
  flush_jitter = "0s"
  hostname = "n1"
  interval = "15s"
  round_interval = true

hostname : replace it with each node’s own host name

Then configure the outputs (to the central influxDB server). In /etc/telegraf/telegraf.d/outputs.conf  :

[[outputs.influxdb]]
  database = "telegraf"
  precision = "s"
  urls = [ "http://admin:8086" ]
  username = "telegraf"
  password = "pass"

And configure the inputs :

In /etc/telegraf/telegraf.d/inputs_system.conf :

# Read metrics about CPU usage
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fieldpass = [ "usage*" ]
 
# Read metrics about disk usage
[[inputs.disk]]
  fielddrop = [ "inodes*" ]
  mount_points=["/","/home"]
 
# Read metrics about diskio usage
[[inputs.diskio]]
  devices = ["sda2","sda3"]
  skip_serial_number = true
 
# Read metrics about network usage
[[inputs.net]]
  interfaces = [ "eth0" ]
  fielddrop = [ "icmp*", "ip*", "tcp*", "udp*" ]
 
# Read metrics about memory usage
[[inputs.mem]]
  # no configuration
 
# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration
 
# Read metrics about system load & uptime
[[inputs.system]]
  # no configuration

Enable and restart the telegraf service on all hosts

cephadm@admin:~$ dsh -aM sudo systemctl enable telegraf
cephadm@admin:~$ dsh -aM sudo systemctl start telegraf

Check the data in InfluxDB :

cephadm@admin:~$ influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
> use telegraf
Using database telegraf
> show measurements;
name: measurements
——————
name
cpu
disk
diskio
kernel
mem
net
processes
swap
system

It worked !

Collectd

If you have other hosts, like OpenWrt routers, that you want to monitor too :

Install collectd on the hosts

opkg update

opkg install collectd collectd-mod-network

Configure collectd to send your data to the admin node :

insert this text in /etc/collectd.conf

## CollectD Servers
LoadPlugin network
<Plugin network>
    Server "admin.int.intra" "25826"
</Plugin>

Set the hostname, the polling interval and other options (user/password if you need them), as in the sketch below.
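For example, a minimal sketch (the hostname and interval values are arbitrary) :

Hostname "router"
Interval 30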

Restart collectd with this config, and check the database :

cephadm@admin:~$ influx
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://localhost:8086 version 1.0.2
InfluxDB shell version: 1.0.2
> use collectd_db
Using database collectd_db
> show measurements;
name: measurements
——————
name
conntrack_value
cpu_value
df_free
df_used
disk_read
disk_write
interface_rx
interface_tx
iwinfo_
iwinfo_value
load_longterm
load_midterm
load_shortterm
memory_value
netlink_rx
netlink_tx
netlink_value
processes_majflt
processes_minflt
processes_processes
processes_syst
processes_threads
processes_user
processes_value
tcpconns_value
wireless_value

It worked 😉

Grafana

To do later.

Here are the results :

 

 

 

An efficient Ceph cluster

Install Calamari

Why ?

For years, I’ve used various technologies at home to host my family’s data (photos, videos, and other files related to our working activities..).

At the very beginning, I owned a single linux host with samba, then a DS409slim Synology NAS, and now : a nice ZFS storage system, and so on….

I’ve always been searching for a perfect resilient storage and computing system for my VMs.

The hardware..

First, I searched for the best hardware for my needs :

1) low power but powerful

2) high speed hard disk and network interfaces

3) cheap

4) could be used and/or re-used later for other things..

I considered the following alternatives :

  • ARM
    • pros : energy efficient
    • cons : poor performance, including disk interfaces..
  • Intel :
    • pros : standard, powerful, ..
    • cons : energy…

I looked at several hardware boxes :

ODROID XU4 (octo core, USB 3.0)

Then I found this hardware : a Z8300 Atom quad core with 2GB DDR3 RAM and 32GB storage, for just 99 - 2 = 97 euros :

https://www.amazon.fr/BoLv-x5-Z8300-Processor-Graphics-Windows10/dp/B01DFJH78U/ref=sr_1_1?ie=UTF8&qid=1477491551&sr=8-1&keywords=bolv+z83

Cheap, and it seems powerful and energy efficient…

 

The software :

Debian 8.6 jessie x86_64 (ISO multi arch)

Ceph “jewel”

Topology

Number of MONs

It’s recommended to install 3 MONs for resilience reasons. As I have 4 nodes, I will choose 2 physical hosts for the first two MONs, and a virtualized host for the last one (this VM is on the main server of my home “datacenter”).

Installing the cluster

Preparing the hardware and the OS

Requirements :

This blog does not cover the OS installation procedure. Before you continue, be sure to :

  • Use a correct DNS configuration, or manually maintain each host’s /etc/hosts file.
  • You will need at least 3 nodes, plus an admin node (for cluster deployment, monitoring, ..)
  • You MUST install NTP on all nodes.

Now I’m assuming you have followed the installation procedure and the requirements above :). Here’s my configuration :

n0 : 192.168.10.210/24 (physical host : Z83)
n1 : 192.168.10.211/24 (physical host : Z83) 
n2 : 192.168.10.212/24 (physical host : Z83)
n3 : 192.168.10.213/24 (VM)
admin : 192.168.10.177/24 (VM)

Once you have installed jessie on the four nodes and the admin node, let’s configure them so everything can be deployed from the admin node. At this point :

1) don’t forget to change your apt repositories if you installed the OSes with a local media. They should now point to a mirror for all updates (security and software).

2) check that “sudo” is installed

root@n0:~# apt-get install sudo -y

Note : at the time these lines are written, you should install some additional packages to avoid the famous SSH hang behavior when rebooting jessie (systemd related..).

On all nodes, you’ll have to do that :

root@n0:~# apt-get install libpam-systemd dbus -y
root@n0:~# reboot (will hang your ssh session for the last time ;) )

And remember, you MUST also set up NTP. To install it :

On each node

sudo apt-get -y install ntp ntpdate ntp-doc

sudo systemctl enable ntp

sudo timedatectl set-ntp true

Configure it and reboot. It’s safer and more efficient to have a time source close to your cluster : Wifi APs and DSL routers often provide such a service. My configuration uses my ADSL router, based on OpenWrt (you can set up ntpd on OpenWrt…)

Install Ceph

Create the ceph admin user on each node :

On each node, create a ceph admin user (used for deployment tasks). It’s important to choose a different user than “ceph” (used by the ceph installer..)

Note : you can omit the -s directive of useradd, it’s a personal choice to use bash.

root@n0:~# sudo useradd -d /home/cephadm -m cephadm -s /bin/bash
root@n0:~# sudo passwd cephadm
 Enter new UNIX password:
 Retype new UNIX password:
 passwd: password updated successfully
root@n0:~# echo "cephadm ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
root@n0:~# chmod 444 /etc/sudoers.d/ceph

and so on to host admin, and nodes n1, n2 [and n3, …]

Setup the ssh authentication with cryptographic keys

On the admin node :

Create the ssh key for the user cephadm

root@admin:~# su - cephadm
cephadm@admin:~$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/cephadm/.ssh/id_dsa):
Created directory ‘/home/cephadm/.ssh’.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cephadm/.ssh/id_dsa.
Your public key has been saved in /home/cephadm/.ssh/id_dsa.pub.
The key fingerprint is:
ec:16:ad:b4:76:e4:32:c6:7c:14:45:bc:c3:78:5a:cf cephadm@admin
The key’s randomart image is:
+---[DSA 1024]----+
|           oo    |
|           ..    |
|          .o .   |
|       . ...*    |
|        S ++ +   |
|       = B.   E  |
|        % +      |
|       + =       |
|                 |
+-----------------+
cephadm@admin:~$

Then push it on the nodes of the cluster:

cephadm@admin:~$ ssh-copy-id cephadm@n0
cephadm@admin:~$ ssh-copy-id cephadm@n1
cephadm@admin:~$ ssh-copy-id cephadm@n2
cephadm@admin:~$ ssh-copy-id cephadm@n3

Install and configure dsh (distributed shell)

cephadm@admin:~$ sudo apt-get install dsh

…..

cephadm@admin:~$ cd
cephadm@admin:~$ mkdir .dsh
cephadm@admin:~$ cd .dsh
cephadm@admin:~/.dsh$ for i in {0..3} ; do echo "n$i" >> machines.list ; done

Test…

cephadm@admin:~$ dsh -aM uptime
n0:  17:52:09 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00
n1:  17:52:08 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00
n2:  17:52:09 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00
n3:  17:52:09 up 0 min,  0 users,  load average: 0.00, 0.00, 0.00

cephadm@admin:~$ dsh -aM cat /proc/cpuinfo | grep model\ name
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n0: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n1: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n2: model name    : Intel(R) Atom(TM) x5-Z8300  CPU @ 1.44GHz
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)
n3: model name    : Intel Core Processor (Broadwell)

Good..

Now you’re ready to install your cluster with automated commands !

Install Ceph software

First, make sure every node is fully up to date, at the very beginning of this procedure.

Feel free to use dsh from the admin node for each task you would like to apply to the nodes 😉

cephadm@admin:~$ dsh -aM sudo apt-get update

cephadm@admin:~$ dsh -aM sudo apt-get -y upgrade

On the admin node only, configure the apt repositories and install the ceph deployment program (python script) :

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
echo deb http://download.ceph.com/debian-jewel/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get -qqy update && sudo apt-get install -qqy ntp ceph-deploy

Now you have to create a new directory in the home dir of the user cephadm ; the configuration of the cluster will be stored there.

cephadm@admin:~$ mkdir cluster
cephadm@admin:~$ cd cluster

Create your three MONs (according to our topology choices : two on the physical hosts, one on a virtualized node)

cephadm@admin:~/cluster$ ceph-deploy new n{0,1,3}
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.33): /usr/bin/ceph-deploy new n0 n1 n3
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f7971b61ab8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  ssh_copykey                   : True
[ceph_deploy.cli][INFO  ]  mon                           : [‘n0’, ‘n1’, ‘n3’]
[ceph_deploy.cli][INFO  ]  func                          : <function new at 0x7f7971b40500>
[ceph_deploy.cli][INFO  ]  public_network                : None
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  cluster_network               : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  fsid                          : None
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[n0][DEBUG ] connected to host: admin
[n0][INFO  ] Running command: ssh -CT -o BatchMode=yes n0
[n0][DEBUG ] connection detected need for sudo
[n0][DEBUG ] connected to host: n0
[n0][DEBUG ] detect platform information from remote host
[n0][DEBUG ] detect machine type
[n0][DEBUG ] find the location of an executable
[n0][INFO  ] Running command: sudo /bin/ip link show
[n0][INFO  ] Running command: sudo /bin/ip addr show
[n0][DEBUG ] IP addresses found: [‘192.168.10.210’]
[ceph_deploy.new][DEBUG ] Resolving host n0
[ceph_deploy.new][DEBUG ] Monitor n0 at 192.168.10.210
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[n1][DEBUG ] connected to host: admin
[n1][INFO  ] Running command: ssh -CT -o BatchMode=yes n1
[n1][DEBUG ] connection detected need for sudo
[n1][DEBUG ] connected to host: n1
[n1][DEBUG ] detect platform information from remote host
[n1][DEBUG ] detect machine type
[n1][DEBUG ] find the location of an executable
[n1][INFO  ] Running command: sudo /bin/ip link show
[n1][INFO  ] Running command: sudo /bin/ip addr show
[n1][DEBUG ] IP addresses found: [‘192.168.10.211’]
[ceph_deploy.new][DEBUG ] Resolving host n1
[ceph_deploy.new][DEBUG ] Monitor n1 at 192.168.10.211
[ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
[n3][DEBUG ] connected to host: admin
[n3][INFO  ] Running command: ssh -CT -o BatchMode=yes n3
[n3][DEBUG ] connection detected need for sudo
[n3][DEBUG ] connected to host: n3
[n3][DEBUG ] detect platform information from remote host
[n3][DEBUG ] detect machine type
[n3][DEBUG ] find the location of an executable
[n3][INFO  ] Running command: sudo /bin/ip link show
[n3][INFO  ] Running command: sudo /bin/ip addr show
[n3][DEBUG ] IP addresses found: [‘192.168.10.213’]
[ceph_deploy.new][DEBUG ] Resolving host n3
[ceph_deploy.new][DEBUG ] Monitor n3 at 192.168.10.213
[ceph_deploy.new][DEBUG ] Monitor initial members are [‘n0’, ‘n1’, ‘n3’]
[ceph_deploy.new][DEBUG ] Monitor addrs are [‘192.168.10.210’, ‘192.168.10.211’, ‘192.168.10.213’]
[ceph_deploy.new][DEBUG ] Creating a random mon key…
[ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring…
[ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf…
cephadm@admin:~/cluster$

Now edit ceph.conf (in the “cluster” directory) and insert this line at the end

osd pool default size = 3

The file ceph.conf should contain the following lines :

[global]
fsid = 74a80a50-b7f9-4588-baa4-bb242c3d4cf0
mon_initial_members = n0, n1, n3
mon_host = 192.168.10.210,192.168.10.211,192.168.10.213
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd pool default size = 3

Now, install ceph :

cephadm@admin:~/cluster$ for i in {0..3}; do ceph-deploy install --release jewel n$i; done

This command generates a lot of logs (downloads, debug messages…) but should return without error. Otherwise, check the error and google it. Depending on the error, simply restarting ceph-deploy may work the second time 😉 (I experienced some problems accessing the ceph repository, for example….)

Now create the mons:

cephadm@admin:~/cluster$ ceph-deploy mon create-initial

Idem, a lot of logs… but no error..

Now create the OSDs (storage units). You have to know which device will be used for the data on each node. In my case, it’s like this :

n0 : /dev/sda : 500GB WD “blue” 2.5″ hard drive

n1 : /dev/sda : 1000GB HGST WD 2.5″ hard drive

n2 : /dev/sda : 1000GB HGST WD 2.5″ hard drive

n3 : /dev/sdb : 500GB WD “blue” 2.5″ hard drive (used through a high performance ZFS On Linux :), presented to the VM as /dev/zvol…)

So, issue the following commands depending on your devices. For me, it is :

cephadm@admin:~/cluster$ for i in {0..2}; do ceph-deploy osd create n$i:sda; done

cephadm@admin:~/cluster$ ceph-deploy osd create n3:sdb

Note : if you previously installed ceph on a device, you MUST “zap” (delete) it before. Use the command “ceph-deploy disk zap n3:sdb” for example.

Now, deploy the ceph configuration to all storage nodes

cephadm@admin:~/cluster$ for i in {0..3}; do ceph-deploy admin n$i; done

And check the permissions. For some reason, they are not correct :

cephadm@admin:~/cluster$ dsh -aM ls -l /etc/ceph/*key*
n0: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw------- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring

So issue the following command :

cephadm@admin:~/cluster$ dsh -aM sudo chmod +r /etc/ceph/ceph.client.admin.keyring

and check :

cephadm@admin:~/cluster$ dsh -aM ls -l /etc/ceph/*key*
n0: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n1: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n2: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring
n3: -rw-r--r-- 1 root root 63 Oct 26 19:01 /etc/ceph/ceph.client.admin.keyring

I like dsh and ssh.. 😉

Finally, install the metadata servers

cephadm@admin:~/cluster$ ceph-deploy mds create n0 n1 n3

and the rados gateway

cephadm@admin:~/cluster$ ceph-deploy rgw create n3

(on n3 for me)

Check the ntp status of the nodes (very important if you have several MONs)

cephadm@admin:~/cluster$ dsh -aM timedatectl|grep synchron
n0: NTP synchronized: yes
n1: NTP synchronized: yes
n2: NTP synchronized: yes
n3: NTP synchronized: yes

check the cluster, on one node type :

cephadm@n2:~$ ceph status
cluster 74a80a50-b7f9-4588-baa4-bb242c3d4cf0
     health HEALTH_OK
     monmap e2: 3 mons at {n0=192.168.10.210:6789/0,n1=192.168.10.211:6789/0,n3=192.168.10.213:6789/0}
            election epoch 32, quorum 0,1,2 n0,n1,n3
     osdmap e82: 4 osds: 4 up, 4 in
            flags sortbitwise
      pgmap v220: 112 pgs, 7 pools, 848 bytes data, 170 objects
            148 MB used, 2742 GB / 2742 GB avail
                 112 active+clean

Done.

Installing Calamari

 

dsh -aM 'echo "deb http://repo.saltstack.com/apt/debian/8/amd64/latest jessie main" | sudo tee /etc/apt/sources.list.d/saltstack.list'