02.05.2014 piiv

Deploy Ceph and start using it: end to end tutorial – Troubleshooting (part 2/3)

(Part 1/3 – Installation – Part 3/3 – librados client)

It is quite common that after the initial installation, the Ceph cluster reports health warnings. Before using the cluster for storage (e.g., allow clients to access it), a HEALTH_OK state should be reached:

cluster-admin@ceph-mon0:~/ceph-cluster$ ceph health
HEALTH_OK

This part of the tutorial provides some troubleshooting hints that I collected during the setup of my deployments. Other helpful resources are the Ceph IRC channel and mailing lists.

Useful diagnostic commands

A collection of diagnostic commands to check the status of the cluster is listed here. Running these commands is how we can understand that the Ceph cluster is not properly configured.

Ceph status

$ ceph status

In this example, the disk for one OSD had been physically removed, so 2 out of 3 OSDs were in and up.

cluster-admin@ceph-mon0:~/ceph-cluster$ ceph status
    cluster 28f9315e-6c5b-4cdc-9b2e-362e9ecf3509
     health HEALTH_OK
     monmap e1: 1 mons at {ceph-mon0=192.168.0.1:6789/0}, election epoch 1, quorum 0 ceph-mon0
     osdmap e122: 3 osds: 2 up, 2 in
      pgmap v4699: 192 pgs, 3 pools, 0 bytes data, 0 objects
            87692 kB used, 1862 GB / 1862 GB avail
                 192 active+clean

Ceph health
```
$ ceph health
$ ceph health detail
```
Pools and OSDs configuration and status
```
$ ceph osd dump
$ ceph osd dump --format=json-pretty
```
the second version provides much more information, listing all the pools and OSDs and their configuration parameters
Tree of OSDs reflecting the CRUSH map
```
$ ceph osd tree
```
This is very useful to understand how the cluster is physically organized (e.g., which OSDs are running on which host).
Listing the pools in the cluster
```
$ ceph osd lspools
```
This is particularly useful to check clients operations (e.g., if new pools were created).

Check the CRUSH rules

$ ceph osd crush dump --format=json-pretty

List the disks of one node from the admin node
```
$ ceph-deploy disk list osd0
```
Check the logs.
Log files in /var/log/ceph/ will provide a lot of information for troubleshooting. Each node of the cluster will contain logs about the Ceph components that it runs, so you may need to SSH on different hosts to have a complete diagnosis.

Check your firewall and network configuration

Every node of the Ceph cluster must be able to successfully run

$ ceph status

If this operation times out without giving any results, it is likely that the firewall (or network configuration) is not allowing the nodes to communicate.

Another symptom of this problem is that OSDs cannot be activated, i.e., the ceph-deploy osd activate <args> command will timeout.

Ceph monitor default port is 6789. Ceph OSDs and MDS try to get the first available ports starting at 6800.

A typical Ceph cluster might need the following ports:

Mon:  6789
Mds:  6800
Osd1: 6801
Osd2: 6802
Osd3: 6803

Depending on your security requirements, you may want to simply allow any traffic to and from the Ceph cluster nodes.

References: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/2231

Try restarting first

Without going for fine troubleshootings and log analysis, sometimes (especially after the first installation), I’ve noticed that a simple restart of the Ceph components has helped the transition from a HEALTH_WARN to a HEALTH_OK state.

If some of the OSDs are not in or not up, like in the case below

    cluster 07d28faa-48ae-4356-a8e3-19d5b81e159e
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean; 1/2 in osds are down; clock skew detected on mon.1, mon.2
     monmap e3: 3 mons at {0=192.168.252.10:6789/0,1=192.168.252.11:6789/0,2=192.168.252.12:6789/0}, election epoch 36, quorum 0,1,2 0,1,2
     osdmap e27: 6 osds: 1 up, 2 in
      pgmap v57: 192 pgs, 3 pools, 0 bytes data, 0 objects
            84456 kB used, 7865 MB / 7948 MB avail
                 192 incomplete

try to start the OSD daemons with

# on osd0
$ sudo /etc/init.d/ceph -a start osd0

If the OSDs are in, but PGs are in weird states, like in the example below

cluster 07d28faa-48ae-4356-a8e3-19d5b81e159e
     health HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; clock skew detected on mon.1, mon.2
     monmap e3: 3 mons at {0=192.168.252.10:6789/0,1=192.168.252.11:6789/0,2=192.168.252.12:6789/0}, election epoch 36, quorum 0,1,2 0,1,2
     osdmap e34: 6 osds: 6 up, 6 in
      pgmap v71: 192 pgs, 3 pools, 0 bytes data, 0 objects
            235 MB used, 23608 MB / 23844 MB avail
                 128 active+degraded
                  64 active+replay+degraded

try to restart the monitor(s) with

# on mon0
$ sudo /etc/init.d/ceph -a restart mon0

Unfortunately, a simple restart will be the solution in just a few rare cases. More troubleshooting will be required in the majority of the situations.

Unable to find keyring

During the deployment of the monitor nodes (the ceph-deploy <mon> [<mon>] create-initial step), Ceph may complain about missing keyrings:

[ceph_deploy.gatherkeys][WARNIN] Unable to find
/etc/ceph/ceph.client.admin.keyring on ['ceph-server']

If this warning is reported (even if the message is not an error), the Ceph cluster will probably not reach an healthy state.

The solution to this problem is to use exactly the same names for the hostnames (i.e., the output of hostname -s) and the Ceph node names.

This means that the files

/etc/hosts
/etc/hostname
.ssh/config (only for the admin node)

and the result of the command hostname -s, all should have the same names for a certain node.

Check that replication requirements can be met

I’ve found that most of my problems with Ceph health were related to wrong (i.e., unfeasible) replication policies.

This is particularly likely to happen in test deployment where one doesn’t care about setting up many OSDs or separating them across different hosts.

Some common pitfalls here may be:

The number of required replicas is higher than the number of OSDs (!!)
CRUSH is instructed to separate replicas across hosts but multiple OSDs are on the same host and there are not enough OSD hosts to satisfy this condition

The visible effect when running diagnostic commands is that PGs will be in wrong statuses.

CASE 1: the replication level is such that it cannot be accomplished with the current cluster (e.g., a replica size of 3 with 2 OSDs).

Check the replicated size of pools with

$ ceph osd dump

Adjust the replicated size and min_size, if required, by running

$ ceph osd pool set <pool_name> size <value>
$ ceph osd pool set <pool_name> min_size <value>

CASE 2: the replication policy would require replicas to sit on separate hosts, but OSDs are running within the same hosts

Check what crush_ruleset applies to a certain pool with

$ ceph osd dump --format=json-pretty

In the example below, the pool with id 0 (“data”) is using the crush_ruleset with id 0

"pools": [
        { "pool": 0,
          "pool_name": "data",
          [...]
          "crush_ruleset": 0,  <----
          "object_hash": 2,
          [...]

then check with

$ ceph osd crush dump --format=json-pretty

what crush_ruleset 0 is about.

In the example below, we can observe that this rules says to replicate data by choosing the first available leaf in the CRUSH map, which is of type host.

"rules": [
        { "rule_id": 0,
          "rule_name": "replicated_ruleset",
          "ruleset": 0,
          "type": 1,
          "min_size": 1,
          "max_size": 10,
          "steps": [
                { "op": "take",
                  "item": -1,
                  "item_name": "default"},
                { "op": "chooseleaf_firstn",     <-----------
                  "num": 0,
                  "type": "host"},               <-----------
                { "op": "emit"}]}],

If not enough hosts are available, then the application of this rule will fail.

To allow replicas to be created on different OSDs but possibly on the same host, we need to create a new ruleset:

$ ceph osd crush rule create-simple replicate_within_hosts default osd

After the rule has been created, it should be listed in the output of

$ ceph osd crush dump

from where we can not its id.

The next step is to apply this rule to the pools as required:

$ ceph osd pool set data crush_ruleset <rulesetId>
$ ceph osd pool set metadata crush_ruleset <rulesetId>
$ ceph osd pool set rbd crush_ruleset <rulesetId>

Schlagwörter: Ceph, cloud storage, troubleshooting

Service Engineering (ICCLab & SPLab)

2 Kommentare

Leave a Reply Cancel reply