Ceph

Node reboot

  1. Disable rebalancing temporarily

    $ ceph osd set noout
    noout is set
    $ ceph osd set norebalance
    norebalance is set
    $ ceph -s
      cluster:
        id:     xxx
        health: HEALTH_WARN
                noout,norebalance flag(s) set
    [...]
    
  2. Reboot the node

    $ sudo reboot
    
  3. When the reboot is complete, enable cluster rebalancing again

    $ ceph osd unset noout
    noout is unset
    $ ceph osd unset norebalance
    norebalance is unset
    $ ceph -s
      cluster:
        id:     xxx
        health: HEALTH_OK
    [...]
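
When rebooting a node it can be useful to verify, before unsetting the flags, that all OSDs of the rebooted node have rejoined the cluster; ceph osd tree shows the up/down state of each OSD:

$ ceph osd tree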
    

Cluster start and stop

Stop

Ensure that any services/clients using Ceph are stopped and that the cluster is in a healthy state.
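
The state of the cluster can be checked beforehand with:

$ ceph health
$ ceph -s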

  1. Set OSD flags

    $ ceph osd set noout
    $ ceph osd set nobackfill
    $ ceph osd set norecover
    $ ceph osd set norebalance
    $ ceph osd set nodown
    $ ceph osd set pause
    
    $ ceph -s
      cluster:
      [...]
        health: HEALTH_WARN
                pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover flag(s) set
    
      services:
      [...]
        osd: x osds: y up, z in
             flags pauserd,pausewr,nodown,noout,nobackfill,norebalance,norecover
    
  2. Stop the management services (manager, MDS, ...) node by node

    $ sudo systemctl stop ceph-mgr\*.service
    
  3. Stop the OSD services (node by node)

    $ sudo systemctl stop ceph-osd\*.service
    
  4. Stop the monitor service (node by node)

    $ sudo systemctl stop ceph-mon\*.service
    

Start

  1. Start the monitor services (node by node)

    $ sudo systemctl start ceph-mon\*.service
    
  2. Start the OSD services (node by node)

    $ sudo systemctl start ceph-osd@DEVICE.service
    
  3. Start the management services (manager, MDS, ...) node by node

    $ sudo systemctl start ceph-mgr\*.service
    
  4. Unset OSD flags

    $ ceph osd unset pause
    $ ceph osd unset nodown
    $ ceph osd unset norebalance
    $ ceph osd unset norecover
    $ ceph osd unset nobackfill
    $ ceph osd unset noout
    

Check

$ sudo systemctl status ceph\*.service
$ ceph -s
  cluster:
    id:     x
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum A,B,C
    mgr: A(active), standbys: B, C
    mds: cephfs-0/0/1 up
    osd: x osds: y up, z in

  data:
    pools:   7 pools, 176 pgs
    objects: 2816 objects, 18856 MB
    usage:   69132 MB used, 44643 GB / 44711 GB avail
    pgs:     176 active+clean

Deep scrub distribution

  • Distribution per weekday:

    $ for date in $(ceph pg dump | grep active | awk '{ print $20 }'); do date +%A -d $date; done | sort | uniq -c
    
  • Distribution per hour:

    $ for date in $(ceph pg dump | grep active | awk '{ print $21 }'); do date +%H -d $date; done | sort | uniq -c
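
If the distribution turns out to be very uneven, single PGs can be deep scrubbed manually to spread the load (the PG id below is only an example):

$ ceph pg deep-scrub 54.76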
    

Set the number of placement groups

$ ceph osd pool set {pool-name} pg_num {pg_num}
set pool x pg_num to {pg_num}
$ ceph osd pool set {pool-name} pgp_num {pgp_num}
set pool x pgp_num to {pgp_num}
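
The current values can be queried with:

$ ceph osd pool get {pool-name} pg_num
$ ceph osd pool get {pool-name} pgp_num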

The new number of PGs should also be updated in environments/ceph/configuration.yml.

1 pools have many more objects per pg than average

This health warning can be disabled by setting the following override in environments/ceph/configuration.yml:
##########################
# custom

ceph_conf_overrides:
  global:
    mon pg warn max object skew: 0

Logging

  • Ceph daemons are configured to log to the console instead of log files. OSDs are configured to log to MONs.

    $ docker logs ceph-mon-ceph01
    
  • Logs can become very large. docker logs provides useful parameters to show only the newest entries and to follow new messages as they appear.

    $ docker logs --tail 100 --follow ceph-mon-ceph01
    

Add new OSD

  • Add the new device to the devices list in the inventory of the corresponding host (a sketch of such a list follows below)

  • Execute osism-ceph osds -l HOST on the manager node
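
A devices list in, for example, inventory/host_vars/ceph04.yml might look like this (the path and the devices variable follow the usual ceph-ansible layout and may differ in your environment):

devices:
  - /dev/sdb
  - /dev/sdc
  - /dev/sdk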

Replace defective OSD

  • Locate the defective OSD

    $ ceph osd metadata osd.22
      "bluefs_slow_dev_node": "sdk",
      "hostname": "ceph04",
    
    $ ssh ceph04
    $ dmesg -T | grep sdk | grep -i error
      ...
      blk_update_request: I/O error, dev sdk, sector 7501476358
      Buffer I/O error on dev sdk1, logical block 7470017030, async page read
      blk_update_request: I/O error, dev sdk, sector 7501476359
      Buffer I/O error on dev sdk1, logical block 7470017031, async page read
    
  • Find and replace actual hardware

    $ sudo udevadm info --query=all --name=/dev/sdk
    $ sudo hdparm -I /dev/sdk
    
  • Disable the defective OSD/disk

    $ ceph osd out 22
    $ sudo systemctl stop ceph-osd@sdk.service
    $ ceph osd purge osd.22
    
  • Prepare new OSD

    $ docker start -ai ceph-osd-prepare-ceph04-sdk
    $ sudo systemctl start ceph-osd@sdk.service
    
  • Add OSD to tree

    $ ceph osd df tree
       CLASS WEIGHT REWEIGHT SIZE   USE    AVAIL  %USE  VAR TYPE NAME
                7.4       -  3709G  2422G  1287G 65.30 1.06  hdd ceph04-hdd
        hdd     3.7       0      0      0      0     0    0        osd.22
        hdd     3.7 1.00000  3709G  2422G  1287G 65.30 1.08        osd.6
        ...
        hdd     0.0       0      0      0      0     0    0 osd.27
    
    $ ceph osd crush create-or-move osd.22 3.7 hdd=ceph04-hdd
    $ ceph osd df tree
       CLASS WEIGHT REWEIGHT SIZE   USE    AVAIL  %USE  VAR TYPE NAME
                7.4       -  3709G  2422G  1287G 65.30 1.06  hdd ceph04-hdd
        hdd     3.7 1.00000  3709G      0  3709G     0    0        osd.22
        hdd     3.7 1.00000  3709G  2422G  1287G 65.30 1.08        osd.6
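
After the new OSD has been moved into place the cluster starts backfilling it; the progress can be watched with:

$ ceph -s
$ ceph osd df tree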
    

Add new pool

$ ceph osd pool create sample 32 32
pool 'sample' created
$ ceph osd pool application enable sample rbd
enabled application 'rbd' on pool 'sample'
$ ceph auth get client.cinder
[client.cinder]
   key = ...
   caps mon = "allow r"
   caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images"
exported keyring for client.cinder
$ ceph auth caps client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=sample'
updated caps for client.cinder
$ ceph auth get client.nova
[client.nova]
   key = ...
   caps mon = "allow r"
   caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups"
exported keyring for client.nova
$ ceph auth caps client.nova mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images, allow rwx pool=vms, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=sample'
updated caps for client.nova
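
The new pool and the adjusted caps can be verified with:

$ ceph osd pool ls detail
$ ceph auth get client.cinder
$ ceph auth get client.nova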

Export image

$ rbd export --pool=volumes volume-035f3636-ad68-4562-88f5-11d7e295d03e /home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img
$ docker cp cephclient_cephclient_1:/home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img /tmp
$ docker exec -it cephclient_cephclient_1 rm -f /home/dragon/035f3636-ad68-4562-88f5-11d7e295d03e.img
$ rm -f /tmp/035f3636-ad68-4562-88f5-11d7e295d03e.img

Repair PGs

  • Health of Ceph cluster

    $ sudo ceph status
      cluster:
        id:     0155072f-6a71-4f5c-8967-f86e5307033f
        health: HEALTH_ERR
                4 scrub errors
                Possible data damage: 1 pg inconsistent

    $ sudo ceph health detail
    HEALTH_ERR 4 scrub errors; Possible data damage: 1 pg inconsistent
    OSD_SCRUB_ERRORS 4 scrub errors
    PG_DAMAGED Possible data damage: 1 pg inconsistent
        pg 54.76 is active+clean+inconsistent, acting [39,6,15]

  • Repair the PG

    $ sudo ceph pg repair 54.76
    instructing pg 54.76 on osd.39 to repair

  • Give the Ceph cluster some time to repair, then check the health again

    $ sudo ceph health detail
    HEALTH_OK

    $ sudo ceph status
      cluster:
        id:     0155072f-6a71-4f5c-8967-f86e5307033f
        health: HEALTH_OK
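
The objects causing the scrub errors can be listed before (or instead of) triggering the repair (the PG id is the one from the example above):

$ sudo rados list-inconsistent-obj 54.76 --format=json-pretty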

Rebalance the cluster

  1. Test which OSDs would be affected by the reweight

    $ sudo ceph osd test-reweight-by-utilization
    no change
    moved 6 / 4352 (0.137868%)
    avg 51.8095
    stddev 12.3727 -> 12.3621 (expected baseline 7.15491)
    min osd.10 with 30 -> 30 pgs (0.579044 -> 0.579044 * mean)
    max osd.68 with 92 -> 92 pgs (1.77574 -> 1.77574 * mean)

    oload 120
    max_change 0.05
    max_change_osds 4
    average_utilization 0.4187
    overload_utilization 0.5025
    osd.14 weight 0.9500 -> 0.9000
    osd.27 weight 0.9500 -> 0.9000
    osd.37 weight 0.9500 -> 0.9000
    osd.29 weight 1.0000 -> 0.9500

  2. If the OSDs match your “fullest” OSDs, execute the reweight

    $ sudo ceph osd reweight-by-utilization
    no change
    moved 6 / 4352 (0.137868%)
    avg 51.8095
    stddev 12.3727 -> 12.3621 (expected baseline 7.15491)
    min osd.10 with 30 -> 30 pgs (0.579044 -> 0.579044 * mean)
    max osd.68 with 92 -> 92 pgs (1.77574 -> 1.77574 * mean)

    oload 120
    max_change 0.05
    max_change_osds 4
    average_utilization 0.4187
    overload_utilization 0.5025
    osd.14 weight 0.9500 -> 0.9000
    osd.27 weight 0.9500 -> 0.9000
    osd.37 weight 0.9500 -> 0.9000
    osd.29 weight 1.0000 -> 0.9500

  3. Wait for the cluster to rebalance itself and check the disk usage again (see below). Repeat the steps above if necessary.
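
The utilization of the OSDs can be checked with:

$ sudo ceph osd df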