(Series navigation: Part 1/3 – Installation | Part 3/3 – librados client)
It is quite common that, after the initial installation, the Ceph cluster reports health warnings. Before using the cluster for storage (e.g., before allowing clients to access it), a HEALTH_OK
state should be reached:
cluster-admin@ceph-mon0:~/ceph-cluster$ ceph health
HEALTH_OK
This part of the tutorial provides some troubleshooting hints that I collected during the setup of my deployments. Other helpful resources are the Ceph IRC channel and mailing lists.
Useful diagnostic commands
A collection of diagnostic commands to check the status of the cluster is listed here. Running these commands is how we can tell whether the Ceph cluster is properly configured.
- Ceph status
$ ceph status
In this example, the disk for one OSD had been physically removed, so 2 out of 3 OSDs were in and up.
cluster-admin@ceph-mon0:~/ceph-cluster$ ceph status
    cluster 28f9315e-6c5b-4cdc-9b2e-362e9ecf3509
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon0=192.168.0.1:6789/0}, election epoch 1, quorum 0 ceph-mon0
     osdmap e122: 3 osds: 2 up, 2 in
      pgmap v4699: 192 pgs, 3 pools, 0 bytes data, 0 objects
            87692 kB used, 1862 GB / 1862 GB avail
                 192 active+clean
- Ceph health
$ ceph health
$ ceph health detail
- Pools and OSDs configuration and status
$ ceph osd dump
$ ceph osd dump --format=json-pretty
The second form provides much more information, listing all the pools and OSDs together with their configuration parameters.
- Tree of OSDs reflecting the CRUSH map
$ ceph osd tree
This is very useful to understand how the cluster is physically organized (e.g., which OSDs are running on which host).
- Listing the pools in the cluster
$ ceph osd lspools
This is particularly useful to check client operations (e.g., whether new pools were created).
- Check the CRUSH rules
$ ceph osd crush dump --format=json-pretty
- List the disks of one node from the admin node
$ ceph-deploy disk list osd0
- Check the logs.
Log files in /var/log/ceph/
will provide a lot of information for troubleshooting. Each node of the cluster contains logs only for the Ceph components that it runs, so you may need to SSH into different hosts to get a complete diagnosis. A sketch of how to inspect them follows this list.
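As a minimal example (log file names depend on which daemons run on each node; the patterns below are Ceph's defaults, and the hostname ceph-mon0 is just the one used in the earlier examples):
# On the monitor node
$ tail -f /var/log/ceph/ceph-mon.ceph-mon0.log
# On an OSD node (one log file per OSD id)
$ tail -f /var/log/ceph/ceph-osd.0.log
# Quick scan for problems across all Ceph logs on a node
$ grep -iE "error|fail" /var/log/ceph/*.log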
Check your firewall and network configuration
Every node of the Ceph cluster must be able to successfully run
$ ceph status
If this operation times out without giving any results, it is likely that the firewall (or network configuration) is not allowing the nodes to communicate.
Another symptom of this problem is that OSDs cannot be activated, i.e., the ceph-deploy osd activate <args>
command will time out.
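A quick, low-level way to check basic connectivity to the monitor from any node is to probe its port with netcat (the IP address below is the monitor address from the earlier examples; replace it with your own):
$ nc -zv 192.168.0.1 6789
If this does not report an open port, the firewall or the network configuration is the first thing to fix.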
The Ceph monitor's default port is 6789, while Ceph OSD and MDS daemons take the first available ports starting at 6800.
A typical Ceph cluster might need the following ports:
Mon:  6789
Mds:  6800
Osd1: 6801
Osd2: 6802
Osd3: 6803
Depending on your security requirements, you may want to simply allow any traffic to and from the Ceph cluster nodes.
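If you prefer to open only the Ceph ports, a minimal iptables sketch could look like the following (the OSD/MDS range 6800:6810 is an assumption for a small cluster with few daemons per host; widen it as needed, and remember these rules are not persistent across reboots by default):
# Allow the monitor port (on monitor nodes)
$ sudo iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
# Allow a port range for OSD/MDS daemons (on OSD/MDS nodes)
$ sudo iptables -A INPUT -p tcp --dport 6800:6810 -j ACCEPT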
References: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/2231
Try restarting first
Without going into fine-grained troubleshooting and log analysis, I’ve noticed that sometimes (especially after the first installation) a simple restart of the Ceph components has helped the transition from a HEALTH_WARN
to a HEALTH_OK
state.
If some of the OSDs are not in or not up, like in the case below
  cluster 07d28faa-48ae-4356-a8e3-19d5b81e159e
   health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean; 1/2 in osds are down; clock skew detected on mon.1, mon.2
   monmap e3: 3 mons at {0=192.168.252.10:6789/0,1=192.168.252.11:6789/0,2=192.168.252.12:6789/0}, election epoch 36, quorum 0,1,2 0,1,2
   osdmap e27: 6 osds: 1 up, 2 in
    pgmap v57: 192 pgs, 3 pools, 0 bytes data, 0 objects
          84456 kB used, 7865 MB / 7948 MB avail
               192 incomplete
try to start the OSD daemons with
# on osd0
$ sudo /etc/init.d/ceph -a start osd0
If the OSDs are in, but PGs are in weird states, like in the example below
  cluster 07d28faa-48ae-4356-a8e3-19d5b81e159e
   health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; clock skew detected on mon.1, mon.2
   monmap e3: 3 mons at {0=192.168.252.10:6789/0,1=192.168.252.11:6789/0,2=192.168.252.12:6789/0}, election epoch 36, quorum 0,1,2 0,1,2
   osdmap e34: 6 osds: 6 up, 6 in
    pgmap v71: 192 pgs, 3 pools, 0 bytes data, 0 objects
          235 MB used, 23608 MB / 23844 MB avail
               128 active+degraded
                64 active+replay+degraded
try to restart the monitor(s) with
# on mon0
$ sudo /etc/init.d/ceph -a restart mon0
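In both cases it is useful to watch how the cluster reacts to the restart. Two standard Ceph commands I use for this:
# Stream cluster status updates as they happen (Ctrl-C to stop)
$ ceph -w
# Or re-check the health summary after a few seconds
$ ceph health detail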
Unfortunately, a simple restart will be the solution only in a few rare cases; in the majority of situations, more troubleshooting will be required.
Unable to find keyring
During the deployment of the monitor nodes (the ceph-deploy mon create-initial
step), Ceph may complain about missing keyrings:
[ceph_deploy.gatherkeys][WARNIN] Unable to find /etc/ceph/ceph.client.admin.keyring on ['ceph-server']
If this warning is reported (even though the message is not an error), the Ceph cluster will probably not reach a healthy state.
The solution to this problem is to use exactly the same names for the hostnames (i.e., the output of hostname -s
) and the Ceph node names.
This means that the files
/etc/hosts
/etc/hostname
.ssh/config (only on the admin node)
and the output of the command hostname -s
should all use the same name for a given node. A sketch of a consistent configuration follows.
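For example, for a hypothetical OSD node that should be known as ceph-osd0 (the names and addresses below are purely illustrative), a consistent configuration would look like this:
# /etc/hostname on the node itself
ceph-osd0
# /etc/hosts on every node of the cluster
192.168.0.1   ceph-mon0
192.168.0.2   ceph-osd0
# ~/.ssh/config on the admin node only
Host ceph-osd0
    Hostname ceph-osd0
    User cluster-admin
# and, on the node itself:
$ hostname -s
ceph-osd0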
See also:
- https://www.mail-archive.com/ceph-users@lists.ceph.com/msg03506.html (problem)
- https://www.mail-archive.com/ceph-users@lists.ceph.com/msg03580.html (solution)
Check that replication requirements can be met
I’ve found that most of my problems with Ceph health were related to wrong (i.e., infeasible) replication policies.
This is particularly likely to happen in test deployments, where one doesn’t care about setting up many OSDs or separating them across different hosts.
Some common pitfalls here may be:
- The number of required replicas is higher than the number of OSDs (!!)
- CRUSH is instructed to separate replicas across hosts but multiple OSDs are on the same host and there are not enough OSD hosts to satisfy this condition
The visible effect when running the diagnostic commands is that PGs will be stuck in abnormal states.
CASE 1: the replication level is such that it cannot be accomplished with the current cluster (e.g., a replica size of 3 with 2 OSDs).
Check the replicated size
of the pools with
$ ceph osd dump
Adjust the replicated size
and min_size
, if required, by running
$ ceph osd pool set <pool_name> size <value>
$ ceph osd pool set <pool_name> min_size <value>
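For instance, on a test cluster with only two OSDs (the pool name and values here are just an illustration; pick sizes that your number of OSDs can actually satisfy), the replication requirements of the rbd pool could be lowered with:
$ ceph osd pool set rbd size 2
$ ceph osd pool set rbd min_size 1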
CASE 2: the replication policy would require replicas to sit on separate hosts, but multiple OSDs are running on the same host
Check which crush_ruleset
applies to a certain pool with
$ ceph osd dump --format=json-pretty
In the example below, the pool with id 0
(“data”) is using the crush_ruleset
with id 0:
"pools": [ { "pool": 0, "pool_name": "data", [...] "crush_ruleset": 0, <---- "object_hash": 2, [...]
then check with
$ ceph osd crush dump --format=json-pretty
what crush_ruleset 0
is about.
In the example below, we can observe that this rule says to replicate data by choosing the first available leaf in the CRUSH map, which is of type host.
"rules": [ { "rule_id": 0, "rule_name": "replicated_ruleset", "ruleset": 0, "type": 1, "min_size": 1, "max_size": 10, "steps": [ { "op": "take", "item": -1, "item_name": "default"}, { "op": "chooseleaf_firstn", <----------- "num": 0, "type": "host"}, <----------- { "op": "emit"}]}],
If not enough hosts are available, then the application of this rule will fail.
To allow replicas to be created on different OSDs but possibly on the same host, we need to create a new ruleset:
$ ceph osd crush rule create-simple replicate_within_hosts default osd
After the rule has been created, it should be listed in the output of
$ ceph osd crush dump
from where we can note its id.
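Since that output can be long, a quick way to spot the new rule and its id is to grep around the rule name we just created (replicate_within_hosts, as chosen above):
$ ceph osd crush dump --format=json-pretty | grep -B 2 replicate_within_hosts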
The next step is to apply this rule to the pools as required:
$ ceph osd pool set data crush_ruleset <rulesetId>
$ ceph osd pool set metadata crush_ruleset <rulesetId>
$ ceph osd pool set rbd crush_ruleset <rulesetId>
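As a final check, you can verify that each pool now references the new rule; the crush_ruleset value is printed on each pool line of the plain ceph osd dump output, so a simple grep is enough:
$ ceph osd dump | grep pool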