
Managing hosts in a running OpenStack environment

How does one remove a faulty, unprovisioned, or re-provisioned physical machine from the list of managed physical nodes in OpenStack Nova? Recently we had to remove a compute node in our cluster for management reasons (read: it went dead on us). But Nova perpetually maintains the host entry, hoping that at some point it will come back online and start reporting its willingness to host new jobs.

Normally, things will not break if you simply leave the dead node's entry in place. But it will mess up the overall view of the cluster if you wish to do some capacity planning. The resources once reported by the dead node will continue to show up in the statistics, and things will look all "blue" when in fact they should be "red".

There is no straightforward command to fix this problem, so here is a quick and dirty fix.

  1. Log on as administrator on the controller node
  2. Locate the nova configuration file, typically found at /etc/nova/nova.conf
  3. Locate the "connection" parameter – this will tell you which database the nova service uses

Depending on whether the database endpoint is MySQL or SQLite, adjust your queries accordingly. The ones shown next are for a MySQL endpoint.

# mysql -u root
mysql> use nova;
mysql> show tables;
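
If your deployment uses a SQLite endpoint instead, an equivalent session would look like the following; the database file path here is an assumption, so check the "connection" parameter for the actual location:

# sqlite3 /var/lib/nova/nova.sqlite
sqlite> .tables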

The tables of interest to us are "compute_nodes" and "services". Next, find the "host" entry of the dead node in the "services" table.

mysql> select * from services;
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+
| created_at          | updated_at          | deleted_at | id | host              | binary           | topic       | report_count | disabled | deleted | disabled_reason |
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+
| 2013-11-15 14:25:48 | 2014-04-29 06:20:10 | NULL       |  1 | stable-controller | nova-consoleauth | consoleauth |      1421475 |        0 |       0 | NULL            |
| 2013-11-15 14:25:49 | 2014-04-29 06:20:05 | NULL       |  2 | stable-controller | nova-scheduler   | scheduler   |      1421421 |        0 |       0 | NULL            |
| 2013-11-15 14:25:49 | 2014-04-29 06:20:06 | NULL       |  3 | stable-controller | nova-conductor   | conductor   |      1422189 |        0 |       0 | NULL            |
| 2013-11-15 14:25:52 | 2014-04-29 06:20:05 | NULL       |  4 | stable-compute-1  | nova-compute     | compute     |      1393171 |        0 |       0 | NULL            |
| 2013-11-15 14:25:54 | 2014-04-29 06:20:06 | NULL       |  5 | stable-compute-2  | nova-compute     | compute     |      1393167 |        0 |       0 | NULL            |
| 2013-11-15 14:25:56 | 2014-04-29 06:20:05 | NULL       |  6 | stable-compute-4  | nova-compute     | compute     |      1392495 |        0 |       0 | NULL            |
| 2013-11-15 14:26:34 | 2013-11-15 15:06:09 | NULL       |  7 | 002590628c0c      | nova-compute     | compute     |          219 |        0 |       0 | NULL            |
| 2013-11-15 14:27:14 | 2014-04-29 06:20:10 | NULL       |  8 | stable-controller | nova-cert        | cert        |      1421467 |        0 |       0 | NULL            |
| 2013-11-15 15:48:53 | 2014-04-29 06:20:05 | NULL       |  9 | stable-compute-3  | nova-compute     | compute     |      1392736 |        0 |       0 | NULL            |
+---------------------+---------------------+------------+----+-------------------+------------------+-------------+--------------+----------+---------+-----------------+

The output for one of our test clouds is shown above; clearly the node that we want to remove is "002590628c0c". Note down the corresponding id of the erring host entry. This "id" value will be used as "service_id" in the following queries. Adapt the example with your own specific data. It is important that you first remove the corresponding entry from the "compute_nodes" table and then from the "services" table, otherwise the deletion will fail due to foreign key dependencies.

mysql> delete from compute_nodes where service_id=7;
mysql> delete from services where host='002590628c0c';

Replace the values above with the corresponding values in your case. Voilà! The erring compute entries are gone from the dashboard view and from the consumed-resources metrics.
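
To double-check that the stale entry is really gone, list the services nova knows about (with admin credentials sourced; on older installations, nova-manage service list gives the same view):

# nova service-list

The dead host (002590628c0c in our example) should no longer appear in the output.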

Nagios / Ceilometer integration: new plugin available

The famous Nagios open source monitoring system has become a de facto standard in recent years. Unlike commercial monitoring solutions, Nagios does not come as a one-size-fits-all monitoring system with thousands of monitoring agents and monitoring functions. Nagios is rather a small, lightweight monitoring system reduced to the bare essentials of monitoring: an event management and notification engine. Nagios is very lightweight and flexible, but it must be extended in order to become a solution which is valuable for your organization. Plugins are a very important part of setting up a Nagios environment. Though Nagios is extremely customizable, there are no plugins that capture OpenStack-specific metrics like the number of floating IPs or network packets entering a virtual machine (even if there are some Nagios plugins to check that OpenStack services are up and running).

Ceilometer is the OpenStack component that captures these metrics. OpenStack measures typical performance indices like CPU utilization, memory allocation, disk space used etc. for all VM instances within OpenStack. When an OpenStack environment has to be metered and monitored, Ceilometer is the right tool for the job. Though Ceilometer is quite a powerful and flexible metering tool for OpenStack, it lacks capabilities to visualize the collected data.

It can easily be seen that Nagios and Ceilometer are complementary products which can be used in an integrated solution. There are no Nagios plugins to integrate the Ceilometer API with the Nagios monitoring environment (though eNovance has developed plugins to check that OpenStack components are alive) and therefore allow Nagios to monitor not only the OpenStack components, but also all the hosted VMs and other services.

The ICCLab has developed a Nagios plugin which can be used to capture metrics through the Ceilometer API. The plugin is available for download on GitHub. It can be used to capture a Ceilometer metric and define thresholds for the Nagios alerting system.

In order to use the plugin, simply copy it into your Nagios plugins folder (e.g. /usr/lib/nagios/plugins/) and define a Nagios command in your commands.cfg file (in /etc/nagios/objects/commands.cfg). Don't forget to make the plugin executable for the Nagios user (chmod u+x).
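
A minimal sketch of those two steps, assuming the plugin file is named ceilometer-call as in the command definition below:

$ cp ceilometer-call /usr/lib/nagios/plugins/
$ chmod u+x /usr/lib/nagios/plugins/ceilometer-call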

A command to monitor the CPU utilization could look like this:

define command {
command_name    check_ceilometer-cpu-util
command_line    /usr/lib/nagios/plugins/ceilometer-call -s "cpu_util" -t 50.0 -T 80.0
}
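
Before wiring the command into a service, you can invoke the plugin manually with the same arguments to verify that it can reach the Ceilometer API (the -t and -T values are presumably the warning and critical thresholds; the exact output depends on the plugin):

$ /usr/lib/nagios/plugins/ceilometer-call -s "cpu_util" -t 50.0 -T 80.0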

Then you have to define a service that uses this command.

define service {
check_command check_ceilometer-cpu-util
host_name
normal_check_interval 1
service_description OpenStack instances CPU utilization
use generic-service
}

Now Nagios can employ the Ceilometer API to monitor VMs inside OpenStack.

Getting Started with OpenShift and OpenStack

In Mobile Cloud Networking (MCN) we rely heavily on OpenStack, OpenShift and, of course, automation. So that developers can get working fast with their own local infrastructure, we've spent time setting up an automated workflow using Vagrant and Puppet to set up both OpenStack and OpenShift. If you want to experiment with both OpenStack and OpenShift locally, simply clone this project:

$ git clone https://github.com/dizz/os-ops.git

Once it has been cloned you’ll need to initialise the submodules:

$ git submodule init
$ git submodule update

After that you can begin the setup of OpenStack and OpenShift. You'll need an installation of VirtualBox and Vagrant.

OpenStack

  • run in controller/worker mode:
      $ vagrant up os_ctl
      $ vagrant up os_cmp
    

There are some gotchas specific to OpenStack, so look at the known issues in the README. Otherwise, open your web browser at: http://10.10.10.51.

OpenShift

You’ve two OpenShift options:

  • run all-in-one:
      $ cd os-ops
      $ vagrant up ops_aio
    
  • run in controller/worker mode:
      $ cd os-ops
      $ vagrant up ops_ctl
      $ vagrant up ops_node
    

Once done, open your web browser at: https://10.10.10.53/console/applications. There's more info in the README.

In the next post we'll look at getting OpenShift running on OpenStack, quickly, using two approaches: directly with Puppet and using Heat orchestration.

Floating IP management in OpenStack

OpenStack is generally well suited for typical use cases, and there is hardly a reason to tinker with the advanced options and features available. Normally you would plan your public IP address usage and management well in advance, but if you run an experimental lab like ours, things are often handled in an ad hoc manner. Recently, we ran into a peculiar problem which took us some time to solve.

We manage a full 160.xxx.xxx.xxx/24 block of 256 public IP addresses. Due to an underestimated user demand forecast, we ended up with a floating-IP pool in our external cloud that was woefully inadequate. One solution was to remove the external network altogether and recreate a new one with a larger floating-IP pool. The challenge: we had real users with experiments running on our cloud, and destroying the external network was not an option.

So here is what we did to add more floating IPs to the pool without stopping or restarting any of the Neutron services:

  1. Log on to your OpenStack controller node
  2. Read the neutron configuration file (usually located at /etc/neutron/neutron.conf)
  3. Locate the connection string – this will tell you where the neutron database is located
  4. Depending on the database type (mysql, sqlite), use the appropriate database manager (ours was using sqlite)

Next I will show what to do to add more IPs to the floating pool using sqlite3; this can easily be adapted for MySQL.

$ sqlite3 /var/lib/neutron/ovs.sqlite
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables

The list of tables used by neutron, dumped by the previous command, will be similar to this:

agents ovs_tunnel_endpoints
allowedaddresspairs ovs_vlan_allocations
dnsnameservers portbindingports
externalnetworks ports
extradhcpopts quotas
floatingips routerl3agentbindings
ipallocationpools routerroutes
ipallocations routers
ipavailabilityranges securitygroupportbindings
networkdhcpagentbindings securitygrouprules
networks securitygroups
ovs_network_bindings subnetroutes
ovs_tunnel_allocations subnets

The tables that are of interest to us are:

  • ipallocationpools
  • ipavailabilityranges

Next, look at the schema of these tables; this will shed more light on what needs to be modified:

sqlite> .schema ipavailabilityranges
CREATE TABLE ipavailabilityranges (
allocation_pool_id VARCHAR(36) NOT NULL,
first_ip VARCHAR(64) NOT NULL,
last_ip VARCHAR(64) NOT NULL,
PRIMARY KEY (allocation_pool_id, first_ip, last_ip),
FOREIGN KEY(allocation_pool_id) REFERENCES ipallocationpools (id) ON DELETE CASCADE
);
sqlite> .schema ipallocationpools
CREATE TABLE ipallocationpools (
id VARCHAR(36) NOT NULL,
subnet_id VARCHAR(36),
first_ip VARCHAR(64) NOT NULL,
last_ip VARCHAR(64) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY(subnet_id) REFERENCES subnets (id) ON DELETE CASCADE
);
sqlite>

Next, look at the content of these tables; for brevity, only partial outputs are shown below. I have masked some of the IP addresses with xxx; replace these with real values when using this guide.

sqlite> select * from ipallocationpools;
b5a7b8b4-ad10-4d92-b877-e406df8ceb91|f0034b20-3566-4f9f-a6d5-b725c02f98fc|10.10.10.2|10.10.10.254
7bca3261-e578-4cfa-bba1-51ba6eae7791|765adcdf-72a4-4e07-8860-f443c7b9098b|160.xxx.xxx.32|160.xxx.xxx.80
a9994f70-2b9a-45f3-b5db-31ccc6cb7e90|72250c58-5fda-4d1b-a847-b71b432ea218|10.10.1.2|10.10.1.254
23032620-731a-4092-9509-7591b53b5ddf|12849c1f-4456-4fc1-bea6-444cce4f1ac6|10.10.2.2|10.10.2.254
fcf22336-2bd6-4e1c-92cd-e33af0b23ad9|bcf1082d-50d5-4ebc-a311-7e0618096356|10.10.11.2|10.10.11.254
bc961a47-4902-4ca2-b4f4-c5fd581a364e|09b79d08-aa92-4b99-b1fd-61d5f31d3351|10.10.25.2|10.10.25.254
sqlite> select * from ipavailabilityranges;
b5a7b8b4-ad10-4d92-b877-e406df8ceb91|10.10.10.6|10.10.10.254
a9994f70-2b9a-45f3-b5db-31ccc6cb7e90|10.10.1.2|10.10.1.2
7bca3261-e578-4cfa-bba1-51ba6eae7791|160.xxx.xxx.74|160.xxx.xxx.74
7bca3261-e578-4cfa-bba1-51ba6eae7791|160.xxx.xxx.75|160.xxx.xxx.75

Looking at the above two outputs, it is immediately clear what needs to be done next in order to add more IPs to the floating-ip range.

  1. modify the floating-IP record in the ipallocationpools table, extending the first_ip and/or last_ip value(s)
  2. for each new IP address to be added to the pool, create an entry in the ipavailabilityranges table with first_ip equal to last_ip (set to the actual IP address)

As an example, say I want to extend my pool from 160.xxx.xxx.80 to 160.xxx.xxx.82; this is what I would do:

sqlite> update ipallocationpools set last_ip='160.xxx.xxx.82' where first_ip='160.xxx.xxx.32';
sqlite> insert into ipavailabilityranges values ('7bca3261-e578-4cfa-bba1-51ba6eae7791', '160.xxx.xxx.81', '160.xxx.xxx.81');
sqlite> insert into ipavailabilityranges values ('7bca3261-e578-4cfa-bba1-51ba6eae7791', '160.xxx.xxx.82', '160.xxx.xxx.82');
sqlite> .exit

And that's all. You have 2 additional IPs available for use in your floating-IP pool, and you don't even need to restart any of the Neutron services. Just make sure that the allocation_pool_id in the new ipavailabilityranges rows matches the id of the corresponding ipallocationpools entry.
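
As a quick sanity check you can allocate a floating IP right away (replace ext-net with the name of your external network):

$ neutron floatingip-create ext-net

Repeated allocations should eventually hand out the newly added addresses.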

Empty parameter list in C function, do you write func(void) or func()?

While reviewing code for the KIARA project I came across a change set which read like this:

- void super_duper_func () {
+ void super_duper_func (void) {

I was puzzled: what's the difference anyway, except for making it explicitly clear that no parameters are expected? Well, I was wrong. The ISO 9899 standard (read: the C99 standard) states under paragraph '6.7.5.3 Function declarators (including prototypes)' that

10 — The special case of an unnamed parameter of type void as the only item in the list
specifies that the function has no parameters.
14 — An identifier list declares only the identifiers of the parameters of the function. An empty
list in a function declarator that is part of a definition of that function specifies that the
function has no parameters. The empty list in a function declarator that is not part of a
definition of that function specifies that no information about the number or types of the
parameters is supplied.

Therefore we can conclude that even though your code may compile and work correctly, it is not standard compliant, and you may even miss out on compile-time error detection. Have a look at this snippet, which compiled flawlessly with clang 3.4:

#include <stdio.h>

void func();

int main() {
    func("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
    return 0;
}

void func() {
    printf("in func()\n");
}

When you turn on all warnings in clang you will get a warning, but it is easily overlooked and not very obvious:

$ clang -std=c99 -Weverything -o empty_param_list empty_param_list.c
empty_param_list.c:10:6: warning: no previous prototype for function 'func' [-Wmissing-prototypes]
void func() {
     ^
empty_param_list.c:3:6: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function
void func();
     ^
          void
1 warning generated.

If you go through the code you will find a function prototype, and you may think that if there were no previous prototype and the function were defined after 'main', the compiler would fail anyway. Indeed, if you forgot the function prototype, the compiler would throw an error (conflicting types for 'func') even if you passed no arguments.

To sum it up:

  • Create function prototypes/declarations (they go before the first function definition in your source)
  • If you don’t need any parameters explicitly write void in the parameter list (helps you with finding mistakes)
  • Turn on all warnings with either ‘-Wall’ (gcc) or ‘-Weverything’ (clang) and don’t ignore those warnings!
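
To illustrate the fix, here is a minimal sketch of the earlier snippet with a proper prototype; with void in the parameter list, the bogus call from above no longer compiles:

#include <stdio.h>

void func(void); /* prototype: explicitly takes no parameters */

int main(void) {
    /* func("AAAA...") would now fail with a compile-time error */
    func();
    return 0;
}

void func(void) {
    printf("in func()\n");
}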

Mirantis Fuel – OpenStack installation for Noddy

While we have lots of experience working with cloud automation tools, for OpenStack in particular, it has taken us a little while to get around to checking out Fuel from Mirantis. Here, we give a short summary of our initial impressions of this very interesting tool.


Const in C++, a brief overview

A word of warning at the beginning: this post is about C++ and not about C! So whatever you read here may not necessarily apply to C. Nevertheless, you may have a look at Sugih's page on which of these features exist in C. Besides, I'm not going to explain what references and pointers are.

Introduction

While working on the KIARA project writing C++11 code, I was faced with the task of passing variables to functions which must not alter them, or writing pure getter methods. After reading a post about C++ constness I was left with more questions than real understanding. A quick search revealed a thorough FAQ about const usage in C++. Here I'd like to give a short round-up of it.

How to read const

First of all, read it from right-to-left. For instance int const* const p would be read as “p is a constant pointer to a constant int”.

A few examples

string const& s

Let’s say you read something like this:

void MyClass::func (std::string const& s);

So reading from right-to-left this says "s is a reference to a constant std::string". This implies that func is not going to modify s. But keep in mind that you may have a dangling reference here, especially if you work with multi-threaded code. If this were a pointer instead of a reference, it could still be NULL.

string* const s

void MyClass::func (std::string* const s);

Again, read from right-to-left: "s is a constant pointer to a string". The string s points to may be modified, but s may not be re-pointed to a different object. As you may have noticed, it's not that hard to read when you stick to the "read from right-to-left" rule.

string const* const s

void MyClass::func (std::string const* const s);

This would read "s is a constant pointer to a constant string", and therefore the function guarantees (or rather, the compiler enforces) that the content of s may not be changed, nor may s point to a different object.

Similar consts

Obviously some signatures mean the same thing:

void MyClass::func (const std::string& s); // you may know this from C
void MyClass::func (std::string const& s);

void MyClass::func (const std::string* s);
void MyClass::func (std::string const* s); // equivalent
void MyClass::func (std::string* const s); // Pitfall: NOT equivalent

It doesn't matter which version you use; it is more a decision you have to make. If previously written code opts for one variant, you should use the same one for the sake of consistency.

Methods that do not change their object

When you write a pure getter method you may want to tell your compiler that the following code may not change the object itself.

std::string const& MyClass::func () const;

The trailing const indicates that this function may not alter its object's data.
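
A minimal sketch of such a class (the names are illustrative):

#include <string>

class MyClass {
public:
    // const member function: may read, but not modify, the object's data
    std::string const& name() const { return name_; }

    // non-const member function: marking this one const would not compile,
    // because it modifies name_
    void set_name(std::string const& n) { name_ = n; }

private:
    std::string name_;
};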

Conclusion

C++ is not a language you master in a few hours of training; like meditation, it takes daily practice.

OpenStack Grizzly Multi-Node Installation with Stackforge Puppet-Modules on CentOS 6.4

This blog post describes the installation of OpenStack Grizzly with help of the Stackforge Puppet modules on CentOS 6.4 – with the use of network namespaces. The setup consists of a controller/network node and a compute node. Of course, additional compute nodes can be added later as needed.

How to model service quality in the cloud

Why is service quality important?

A cloud can be seen as a service which is provided by a cloud provider and consumed by an end user. The cloud provider's goal is to maximize profit by providing cloud services to end users. Usually there are no fixed prices for using cloud services: users pay a variable price that depends on their consumption. Service quality is a constraint on the cloud provider's optimization goal of profit maximization: the provider should deliver cloud services with sufficiently good performance, capacity, security and availability while maximizing profit. Since quality costs money, a low quality cloud service seems preferable to a high quality one from a pure cost perspective. So why should profit-oriented cloud providers bother with quality at all?

A new view of service quality

In the new view, we see service quality not as a restriction on profit maximization: cloud service quality is an enabler of further service consumption and therefore a force that increases the profit of cloud providers. If we think of cloud computing as a low quality service with low availability (many outages), running slowly and in an insecure environment, one can easily see that cloud consumers will stop using the cloud service as soon as there are alternatives. But there is another argument in favour of clouds with a high degree of quality of service (QoS): if the cloud service performs well, it can be used more often and by more users at once. Therefore an operator of a quality cloud service can handle more user requests, at lower cost, than a non-quality-oriented cloud provider.

What is quality in the cloud?

Quality can have different meanings: for us it must be measured in terms of availability, performance, capacity and security. For each of these four terms we have to define metrics that measure quality. The following metrics are used in service management practice:

  1. Availability: Availability can only be calculated indirectly by measuring the downtime, because outages are directly observable while normal operation of a system is not. When an outage occurs, the downtime is reported as the time difference between discovery of the outage and restoration of the service. Availability is then the ratio of total operating time minus downtime to the total operating time (see the worked example after this list). Availability of a system can be tested using the Dependability Modeling Framework, i.e. a series of simulated random outages which tell system operators how stable their system is.
  2. Performance: Performance is usually tested by measuring the time it takes to perform a set of sample queries in a computer program. Such a time measurement is called a benchmark test. Performance of a cloud service can be measured by running multiple standard user queries and measuring their execution time.
  3. Capacity: By capacity we mean storage which is free for service consumption. Capacity on disks can be measured directly by checking how much storage is used and how much is free. If we want to know how much working memory must be available, the measurement becomes a little more complicated: we must measure memory consumption during certain operations. Usually this can be done by profiling the system: as in benchmarking, we run a set of sample queries and measure how much memory is consumed. Then we calculate the memory which is necessary to operate the cloud service.
  4. Security: Security is the most abstract quality indicator, because it cannot be measured directly. A common practice is to create a vector of potential security threats, estimate the probability that a threat will lead to an attack, and estimate the potential damage in case of an attack. Threats can be rated as the product of the attack probability and the potential damage. The goal should be to mitigate the biggest risks with a given budget. A risk is mitigated when there are countermeasures against identified security threats (risk avoidance), minimization measures for potential damages (damage minimization), transfer of security risks to other organizations (e.g. insurances) and (authorized) risk acceptance. Because nobody can know all potential threats in advance, there is always an unknown residual risk which cannot be avoided. Security management of a cloud service is good when the security threat vector is regularly updated and the worst risks are mitigated.
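
For example, applying the availability formula from item 1 to one month of operation (the numbers are illustrative):

Availability = (total operating time - downtime) / total operating time

A month of 30 days has 43,200 minutes of operating time. With 43 minutes of accumulated downtime, A = (43200 - 43) / 43200 ≈ 0.999, i.e. 99.9% availability.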

The given metrics are a good starting point for modelling service quality. In optimization there are two types of models: descriptive models and optimization models.

A descriptive model of service quality in the cloud

Descriptive models describe how a process is performed and are used to explore how the process works. Usually descriptive models answer "What if?" questions. They consist of an input of variables, a function that transforms the input into output, and a set of (unchangeable) parameters that influence the transformation function. A descriptive model of cloud service quality would describe how a particular configuration of service components (service assets like hardware, software etc. and the management of those assets) delivers a particular set of outputs in terms of service quality metrics. If we can, e.g., increase availability of the cloud service by using a recovery tool like Pacemaker, a descriptive model is able to tell us how the quality of the cloud service changes.

Sets are all possible resources we can use in our model to produce an outcome; in OpenStack we use hardware, software and labour. Parameters are attributes of the set entities which are not variable, e.g. labour cost, price of hardware assets etc. All other attributes are called variables: the goal of the modeler is to change these variables and see what comes out. The outcomes are called consequences.

A descriptive model of the OpenStack service could be described as follows:

  • Sets:
    • Technology used in the OpenStack environment
      • Hardware (e. g. physical servers, CPU, RAM, harddisks and storage,  network devices, cables, routers)
      • Operating system (e. g. Ubuntu, openSUSE)
      • Services used in OpenStack (e. g. Keystone, Glance, Quantum, Nova, Cinder, Horizon, Heat, Ceilometer)
      • HA Tools (e. g. Pacemaker, Keepalive, HAProxy)
      • Monitoring tools (e. g.
      • Benchmark tools
      • Profiling tools
      • Security Tools (e. g. ClamAV)
    • Management of the OpenStack environment
      • Interval of availability tests.
      • Interval of performance benchmark tests.
      • Interval of profiling and capacity tests.
      • Interval of security tests.
      • Interval of Risk Management assessments (reconsideration of threat vector).
  • Parameters:
    • Budget to run the OpenStack technology and service management actions
      • Hardware costs
      • Energy costs
      • Software costs (you don’t have to pay licence fees in the Open Source world, but you still have maintenance costs)
      • Labor cost to handle tests
      • Labor costs to install technologies
      • Labor costs to maintain technologies
    • Price of technology installation, maintenance and service management actions
      • Price of tangible assets (hardware) and intangible assets (software, energy consumption)
      • Salaries, wages
    • Quality improvement by operation of particular technology or by performing service management actions
      • Price of tangible assets (hardware) and intangible assets (software, energy consumption)
      • Salaries, wages
  • Variables:
    • Quantities of a particular technology which should be installed and maintained:
      • Hardware (e. g. quantity of physical servers, CPU speed, RAM size, harddisk and storage size, number of network devices, speed of cables, routers)
      • Operating system of each node (e. g. Ubuntu, openSUSE)
      • OpenStack services per node (e. g. Keystone, Glance, Quantum, Nova, Cinder, Horizon, Heat, Ceilometer)
      • HA Tools per node (e. g. Pacemaker, Keepalive, HAProxy)
      • Monitoring tools (e. g.
      • Benchmark tools
      • Profiling tools
      • Security Tools (e. g. ClamAV)
  • Consequences:
    • Costs for installation and maintenance of the OpenStack environment:
      • Infrastructure costs
      • Labour costs
    • Quality of the OpenStack service in terms of:
      • Availability
      • Performance
      • Capacity
      • Security

In the following picture we show a generic descriptive model for optimization of quality of an IT service:

Fig. 1: Descriptive model of service quality of an IT service.

Such a descriptive model is good for exploring the quality improvements delivered by different system architectures and service management operations. The input variables form a vector of systems and operations: hardware, network architecture, operating systems, OpenStack services, HA tools, benchmark tools, profiling monitors, security software and service operations performed by system administrators. One can experiment with different systems and operations and then check the outcomes. The outcomes are the costs (as a product of prices and systems) and the service quality. The service quality is measured by the metrics we have defined.

Even if the descriptive model is quite useful, it is very hard to actually optimize service quality. Therefore the descriptive model has to be extended to an optimization model.

An optimization model of service quality in the cloud

Optimization models enhance descriptive models by adding constraints to the inputs and by defining an objective function. Optimization models answer "What's best?" questions. Like descriptive models, they consist of an input of variables, a function that transforms the input into output, and a set of (unchangeable) parameters that influence the transformation function. Additionally, they contain constraints that restrict the set of possible inputs, and an objective function which tells the model user what output should be achieved.

An optimization model of the OpenStack service could be described as follows:

  • Sets:
    • Technology used in the OpenStack environment
      • Hardware (e. g. physical servers, CPU, RAM, harddisks and storage, network devices, cables, routers)
      • Operating system (e. g. Ubuntu, openSUSE)
      • Services used in OpenStack (e. g. Keystone, Glance, Quantum, Nova, Cinder, Horizon, Heat, Ceilometer)
      • HA Tools (e. g. Pacemaker, Keepalive, HAProxy)
      • Monitoring tools (e. g.
      • Benchmark tools
      • Profiling tools
      • Security Tools (e. g. ClamAV)
    • Management of the OpenStack environment
      • Interval of availability tests.
      • Interval of performance benchmark tests.
      • Interval of profiling and capacity tests.
      • Interval of security tests.
      • Interval of Risk Management assessments (reconsideration of threat vector).
  • Parameters:
    • Budget to run the OpenStack technology and service management actions
      • Hardware costs
      • Energy costs
      • Software costs (you don’t have to pay licence fees in the Open Source world, but you still have maintenance costs)
      • Labor cost to handle tests
      • Labor costs to install technologies
      • Labor costs to maintain technologies
    • Price of technology installation, maintenance and service management actions
      • Price of tangible assets (hardware) and intangible assets (software, energy consumption)
      • Salaries, wages
    • Quality improvement by operation of particular technology or by performing service management actions
      • Price of tangible assets (hardware) and intangible assets (software, energy consumption)
      • Salaries, wages
  • Variables:
    • Quantities of a particular technology which should be installed and maintained:
      • Hardware (e. g. quantity of physical servers, CPU speed, RAM size, harddisk and storage size, number of network devices, speed of cables, routers)
      • Operating system of each node (e. g. Ubuntu, openSUSE)
      • OpenStack services per node (e. g. Keystone, Glance, Quantum, Nova, Cinder, Horizon, Heat, Ceilometer)
      • HA Tools per node (e. g. Pacemaker, Keepalive, HAProxy)
      • Monitoring tools (e. g.
      • Benchmark tools
      • Profiling tools
      • Security Tools (e. g. ClamAV)
  • Constraints:
    • Budget limitation for installation and maintenance of the OpenStack environment:
      • Infrastructure costs
      • Labour costs
    • Technological constraints:
      • Incompatible technologies
      • Limited knowledge of system administrators
    • Objective Function:
      • Maximization of service quality in terms of:
        • Availability
        • Performance
        • Capacity
        • Security

The following picture shows a generic optimization model for an IT service:

Fig. 2: Service quality optimization model for an IT service.

With such an optimization model at hand we are able to optimize service quality of an OpenStack environment. What we need are clearly defined values for the sets, parameters, constraints and objective functions. We must be able to create a formal notation for all model elements.

What further investigations are required?

The formal model can be created once we know all the information required to assign concrete values to all model elements. This information is:

  • List of all set items (OpenStack system environment plus regular maintenance operations): First we must know all possible values for the systems and operations used in our OpenStack environment. We must know which hardware, OS and software we can use to operate OpenStack and which actions (maintenance) must be performed regularly in order to keep OpenStack up and running.
  • List of all parameters (costs of OpenStack system environment elements, labour cost for maintenance operations and quality improvement per set item): In a second step we must obtain all prices for our set items. This means we must know how much it costs to install a particular piece of hardware, OS or software, and we must know how much the maintenance operations cost in terms of salaries. Additionally we must know the quality improvement which is delivered per set item: this can be determined by testing the environment with and without the item (additional system or service operation) and using our quality metrics.
  • List of constraints (budget limit and technical constraints): In a third step we must get to know the constraints, i.e. budget limits and technical constraints. A technical constraint can be a restriction such as being able to use only one profiling tool.
  • Required outcomes (targeted quality metric value maximization): Once we know the sets, parameters and constraints, we must define how quality is measured in a function. Again we can use our quality metrics for that.
  • Computation of optimal variable values (which items should be bought): Once we know all model elements, we can compute the optimal variables. Since we will not get a strict mathematical formula for the target function, and since we may also work with incomplete information, it is obvious that we should use a metaheuristic (e.g. evolutionary algorithms) to find a way to optimize service quality.

We have seen that creating a model for service quality optimization in the cloud requires a lot of investigation. Some details about it will be revealed in further articles.

 

OpenStack Development Process

by Josef Spillner

Preface

OpenStack is a cloud computing project to provide infrastructure as a service (IaaS). It is free open source software released under the terms of the Apache License. The project is managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community. More than 200 companies have joined the project.
As you can understand, this is a large project with hundreds of developers and hundreds of thousands of lines of code. This post will make the development process in OpenStack clear, and show how to push code into OpenStack.

Start OpenStack development

1. Signing up for accounts:

The first thing you should do is sign up for a Launchpad account. Launchpad is a web application and website that allows users to develop and maintain software, particularly open-source software. Launchpad is developed and maintained by Canonical Ltd. The OpenStack project uses Launchpad for mailing lists, blueprints, groups and bug tracking. Each OpenStack project has a Launchpad project. You can create an account here.

Next you should sign up in Gerrit. Gerrit is a free, web-based team software code review tool. Software developers in a team can review each other's modifications to their source code using a web browser and approve or reject those changes. It integrates closely with Git, a distributed version control system. To interact with Gerrit you need to set up an SSH key, because all Gerrit commands use the SSH protocol on host port 29418. A user can access Gerrit's Git repositories with the SSH or HTTP protocols. The user must be registered in Gerrit and have uploaded a public SSH key before any command line commands can be used.

2. Communication tools: 

You should be on OpenStack's mailing list and also on your OpenStack project's mailing list. It's necessary to take part in discussions about code development, project design etc. You can subscribe to the mailing lists according to these instructions.

Also, for quick answers you can use the IRC channels, for example with questions about how to work with particular methods or why tests fail. Every week you should take part in the IRC meeting of your project, where you can discuss release details, your own bugs etc. Information about the IRC channels can be found here.

3. Setting up development environment:

You can develop on any system you want, but the most used and most comfortable one for this purpose is Ubuntu. The first thing you need is Git. In software development, Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. You need Git to pull code from and push code into Gerrit.

The next step is DevStack. DevStack’s mission is to provide and maintain tools used for the installation of the central OpenStack services from source (git repository master, or specific branches) suitable for development and operational testing. It also demonstrates and documents examples of configuring and running services as well as command line client usage.

git clone git://github.com/openstack-dev/devstack.git
cd devstack; ./stack.sh

Now you can get an Openstack project:

git clone https://git.openstack.org/openstack/ceilometer

This post is not about the development of Ceilometer, so let's skip that part and imagine that you already have complete code.
You need to install pip, a tool for installing and managing Python packages.

sudo apt-get install python-pip

You mostly need pip to install tox.

sudo pip install tox

OpenStack has a lot of projects. For each project, the OpenStack Jenkins needs to be able to perform a lot of tasks. If each project has a slightly different way to accomplish those tasks, it makes the management of a consistent testing infrastructure very difficult to deal with. Additionally, because of the high volume of development changes and testing, the testing infrastructure has to be able to pre-cache artifacts that are normally fetched over the internet. To that end, each project should support a consistent interface for driving tests and other necessary tasks.

  • tox -epy26 – unit tests for Python 2.6
  • tox -epy27 – unit tests for Python 2.7
  • tox -epep8 – pep8 checks

If all tests pass, you can push the code to Gerrit.

4. Publishing code:

Simply running git review should be sufficient to push your changes to Gerrit, assuming your repository is set up as described above. You don't need to read the rest of this section unless you want to use an alternate workflow.
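
A minimal sketch of that default workflow (assuming the git-review tool is installed; the branch name is purely illustrative):

$ git checkout -b my-fix        # work on a topic branch
$ # ... edit code, run the tox tests ...
$ git commit -a
$ git review                    # pushes the change to Gerrit for review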

If you want to push your changes without using git-review, you can push them to Gerrit as you would to any other git repository, using the following syntax (assuming "gerrit" is configured as a remote repository):

git push gerrit HEAD:refs/for/$BRANCH[/$TOPIC]

Where $BRANCH is the name of the Gerrit branch to push to (usually “master”), and you may optionally specify a Gerrit topic by appending it after a slash character.
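
For instance, pushing the current HEAD to master with an (illustrative) topic name looks like this:

$ git push gerrit HEAD:refs/for/master/my-topic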

When committing changes, note that Git commit messages should start with a short summary of 50 characters or less, in a single paragraph. The following paragraph(s) should explain the change in more detail.

If your change addresses a blueprint or a bug, be sure to mention them in the commit message using the following syntax:

blueprint BLUEPRINT
Closes-Bug: ####### (Partial-Bug or Related-Bug are options)

For example:
Adds keystone support

...Long multiline description of the change...

Implements: blueprint authentication
Closes-Bug: #123456
Change-Id: I4946a16d27f712ae2adf8441ce78e6c0bb0bb657

5. Code review:

Automatic testing occurs and the results are displayed. Reviewers comment in the comment box or in the code itself.

If someone leaves an in-line comment, you can see it from the expanded "Patch Set". The "Comments" column shows how many comments are in each file. If you click a file name that has comments, the new page shows a diff with the reviewer's name and comments. Click "Reply" and write your response; it is saved as a draft if you click "Save". Now go back to the page that shows the list of patch sets, click "Review", and then click "Publish comments".

If your code is not ready for review, click "Work in Progress" to indicate that reviewers do not need to review it for now. Note that the button is invisible until you log in to the site.

 
