Ceph: OSD "down" and "out" of the cluster - An obvious case

When setting up a cluster with ceph-deploy, just after the ceph-deploy osd activate phase and the distribution of keys, the OSDs should be both “up” and “in” the cluster.

One thing that neither the ceph-deploy quick-install documentation nor the OSD monitoring and troubleshooting pages mention (or at least I didn’t find it) is that, upon (re-)boot, mounting the storage volumes on the mount points that ceph-deploy prepares is up to the administrator (check this discussion on the Ceph mailing list).

So, after a reboot of my storage nodes, the Ceph cluster couldn’t reach a healthy state and showed the following OSD tree, with three of the four OSDs down:

$ ceph osd tree
# id weight type name up/down reweight
-1 3.64 root default
    -2 1.82 host ceph-osd0
        0 0.91 osd.0 down 0
        1 0.91 osd.1 down 0
    -3 1.82 host ceph-osd1
        2 0.91 osd.2 down 0
        3 0.91 osd.3 up 1
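
To pick the down OSDs out of a tree like this without reading it by eye, a small shell filter is enough. This is only a sketch: the function name down_osds is made up for illustration, and it assumes the column layout shown above (id, weight, name, up/down, reweight), which may differ in other Ceph releases.

```shell
# Print the ids of OSDs reported as "down" in `ceph osd tree` output (read from stdin).
# Assumption: columns are id, weight, name, up/down, reweight, as in the output above.
down_osds() {
    awk '$3 ~ /^osd\.[0-9]+$/ && $4 == "down" { print $1 }'
}

# On a live cluster this would be used as:
#   ceph osd tree | down_osds
```

Fed the tree above, it would list the ids of osd.0, osd.1 and osd.2, one per line.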

I wasn’t thinking about mounting the drives, as this process was hidden from me during the initial installation, but a simple mount command would have immediately unveiled the mystery :D.
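
The same check can be scripted. The sketch below lists OSD data directories that have nothing mounted on them. The function name unmounted_osd_dirs and its two parameters are my own invention for illustration, and it assumes the default /var/lib/ceph/osd/ceph-<K> layout and a Linux-style mount table such as /proc/mounts.

```shell
# Print OSD data directories under $1 that are not listed as mount points in $2.
# $1: OSD base directory (default: /var/lib/ceph/osd)
# $2: mount table to read (default: /proc/mounts); a parameter so it can be tested offline
unmounted_osd_dirs() {
    base="${1:-/var/lib/ceph/osd}"
    mounts="${2:-/proc/mounts}"
    for dir in "$base"/ceph-*; do
        [ -d "$dir" ] || continue
        # Field 2 of each mount table line is the mount point.
        if ! awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            echo "$dir"
        fi
    done
}

# On a storage node, with the defaults:
#   unmounted_osd_dirs
```

Any directory it prints is a candidate for the missing mount described here.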

So, the simple solution was to mount the devices:

sudo mount /dev/sd<XY> /var/lib/ceph/osd/ceph-<K>/

and then to start the OSD daemons:

sudo start ceph-osd id=<K>
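
With several OSDs per node, it can help to print the two commands for each device and review them before running anything. A minimal sketch follows. The helper name osd_recover_cmds and the device name in the example are hypothetical, and the start command is the Upstart form used above, which will not exist on systemd-based distributions.

```shell
# Print (do not run) the mount and start commands for one OSD.
# $1: block device holding the OSD data, $2: OSD id
osd_recover_cmds() {
    dev="$1"
    id="$2"
    echo "sudo mount $dev /var/lib/ceph/osd/ceph-$id/"
    echo "sudo start ceph-osd id=$id"
}

# Example with a hypothetical device name:
osd_recover_cmds /dev/sdb1 0
```

Printing rather than executing is deliberate: the mapping between devices and OSD ids has to come from the administrator, so nothing is mounted until the output has been checked.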

For some other troubleshooting hints for Ceph, you may look at this page.

3 Comments

kanchana says:

16. June 2016 at 15:37

Hi,
How can I bring an OSD down and back up on RHEL 7.2 with Ceph version 10.2.2?

sudo start ceph-osd id=1 fails with “sudo: start: command not found”.

I have 5 OSDs in each node and I want to bring down one particular OSD (sudo stop ceph-osd id=1 also fails) and see whether replicas are written to other OSDs without any issues.
Thanks in advance.
–kanchana.

- Constin says:
  
  20. December 2016 at 16:26
  
  ceph-disk activate /var/lib/ceph/osd/ceph-{number}
  or
  ceph-disk activate-all
  
  - Nura21 says:
    
    9. September 2021 at 12:12
    
    Is this to deactivate an OSD or…activate an OSD?

Service Engineering (ICCLab & SPLab)
