Ceph: OSD “down” and “out” of the cluster – An obvious case

When setting up a cluster with ceph-deploy, just after the ceph-deploy osd activate phase and the distribution of keys, the OSDs should be both “up” and “in” the cluster.

One thing that neither the ceph-deploy quick-install documentation nor the OSD monitoring and troubleshooting pages mention (or at least I didn’t find it) is that, after a (re)boot, mounting the storage volumes on the mount points that ceph-deploy prepares is up to the administrator (see this discussion on the Ceph mailing list).

So, after a reboot of my storage nodes, the Ceph cluster couldn’t reach a healthy state and showed the following OSD tree:

$ ceph osd tree
# id weight type name up/down reweight
-1 3.64 root default
    -2 1.82 host ceph-osd0
        0 0.91 osd.0 down 0
        1 0.91 osd.1 down 0
    -3 1.82 host ceph-osd1
        2 0.91 osd.2 down 0
        3 0.91 osd.3 up 1
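
On a larger cluster it can help to filter this output down to the OSDs that are reported as down. The following is only a small sketch: the function name list_down_osds is made up for illustration, and it assumes that the status column directly follows the osd.<N> name, as in the output above.

```shell
# Read `ceph osd tree` output on stdin and print the names of the OSDs
# whose status column (assumed to directly follow the name) is "down".
list_down_osds() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") print $i
    }'
}
```

It would be used as: ceph osd tree | list_down_osds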

I wasn’t thinking about mounting the drives, as this step was hidden from me during the initial installation, but a simple mount command would have immediately unveiled the mystery :D.

So, the simple solution was to mount the devices:

sudo mount /dev/sd<XY> /var/lib/ceph/osd/ceph-<K>/
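
To see which OSD data directories still need mounting, something along these lines may be useful. It is a sketch, not part of ceph-deploy: the function name unmounted_osd_dirs is hypothetical, the default base path is simply the one from the mount command above, and it relies on the mountpoint utility from util-linux.

```shell
# Print every OSD data directory under the base path (first argument,
# defaulting to /var/lib/ceph/osd) that is not currently a mount point.
unmounted_osd_dirs() {
    base="${1:-/var/lib/ceph/osd}"
    for dir in "$base"/ceph-*; do
        # Skip the unexpanded pattern when no ceph-* directory exists.
        [ -d "$dir" ] || continue
        mountpoint -q "$dir" || printf '%s\n' "$dir"
    done
}
```

Each directory it prints is a candidate for the mount command above; which device belongs to which OSD still has to be worked out by the administrator.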

and then to start the OSD daemons:

sudo start ceph-osd id=<K>
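
Note that start is an Upstart command. On systemd-based distributions it does not exist, and the OSD daemons are managed as ceph-osd@<K> units instead. The sketch below prints the command that matches the running init system. The function name osd_start_cmd is hypothetical, and it assumes that the presence of /run/systemd/system means systemd is in charge.

```shell
# Print the command that starts OSD <id> under the running init system.
# Assumption: /run/systemd/system exists only when systemd is the init.
osd_start_cmd() {
    id="$1"
    if [ -d /run/systemd/system ]; then
        printf 'systemctl start ceph-osd@%s\n' "$id"
    else
        printf 'start ceph-osd id=%s\n' "$id"
    fi
}
```

For example, sudo $(osd_start_cmd 1) would start osd.1 on either kind of system.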

For more Ceph troubleshooting hints, have a look at this page.

2 thoughts on “Ceph: OSD “down” and “out” of the cluster – An obvious case”

  1. Hi,
    How can I take an OSD down and bring it back up on RHEL 7.2 with Ceph version 10.2.2?

    sudo start ceph-osd id=1 fails with “sudo: start: command not found”.

    I have 5 OSDs on each node and I want to take one particular OSD down (sudo stop ceph-osd id=1 also fails) and see whether replicas are written to the other OSDs without any issues.
    Thanks in advance.
    –kanchana.
