ZHAW Zurich University of Applied Sciences
Bachelor's degree programme Computer Science
Bachelor Thesis
Author: Fabian Vogler, fabian.vogler@gmail.com
Student number: 10-966-596
Advisors: Beat Seeliger, Silvan Spross
2015-07-31, Version 1.1
This thesis was done as part of a bachelor's degree study at ZHAW Zurich University of Applied Sciences in Zurich. The source code of this document is available online at https://github.com/fabian/code-driven-infrastructure-and-deployment.
All sources are numbered [n] and listed in the bibliography in the appendix. Basic knowledge in computer science is required for reading and understanding the thesis. The most important acronyms and concepts are explained in the glossary that can be found in the appendix as well.
This document has been written in HTML and was converted into a PDF document with Prince. The font used is Helvetica Neue, created by D. Stempel AG and based on Helvetica by Max Miedinger.
The icons used on the cover and in this document are from small-n-flat; they have been released into the public domain.
WorldSkills International manages its web infrastructure manually with a small team of developers. There is no designated system administrator; the responsibility for managing the web infrastructure is shared among the developers. The author of this thesis is employed as a developer by WorldSkills International. The goal of this thesis is to develop a concept for a testable and reproducible infrastructure in which all changes are made within a revision control system. This increases the visibility of changes within the development team and makes every change traceable.
Many software solutions exist for the provisioning of servers and the deployment of applications. They can be grouped into software containers, configuration repositories, and remote command execution. With software containers, the software needed to run an application is encapsulated and run in operating system-level virtualization. A configuration repository is a centralized repository of configuration files from which software configures the servers. With remote command execution, installation commands and configuration files are transmitted to a remote server in a coordinated manner.
The current infrastructure requirements were analyzed and documented. The different types of software for the provisioning of servers and the deployment of applications were considered for solving the problem. Each type was evaluated against the requirements with a popular representative. The evaluation showed that software containers need additional software for orchestrating provisioning and deployment. Due to this additional complexity, and because no requirement demands software containers, their adoption was deferred to a potential separate project after this thesis. The automation software Ansible was chosen as the best fit for the requirements.
An architecture concept for an automated infrastructure was developed and successfully verified in a proof-of-concept. All required software and configuration files are defined in structured text files that can be read and transmitted to the servers with Ansible. Local development and testing can be done in a virtual machine on the developers' computers. The whole infrastructure can easily be cloned by running the provisioning scripts against new servers. A continuous integration server executes the provisioning script after each change and verifies the infrastructure with system tests. The automated creation of the proof-of-concept infrastructure takes about 20 minutes. Setting up a new infrastructure environment for testing is initiated simply by creating a new branch that adheres to a naming scheme. The implementation of the architecture concept and a migration to the new automated infrastructure is planned for fall 2015.
WorldSkills International is a non-profit membership association which organizes a world championship in skilled professions every two years. The author of this thesis is employed by WorldSkills International. To manage members and to organize the preparation and execution of the competition the organization runs multiple web applications. The mix of PHP and Java applications consists of legacy systems and a newly developed software system with a service-oriented architecture.
All web software is running on rented virtual servers with a Linux operating system. They are managed manually using a web control panel (Parallels Plesk). Changes to the infrastructure are made manually by the four internal software developers.
Fundamental changes like the migration to a new server or the switch to a new runtime engine require a lot of knowledge about the existing system and manual testing of the new installation.
The main goal of this thesis is to develop a concept for a versioned, testable and reproducible infrastructure. Changes to the system should be visible to the IT team and traceable if needed. As a result, knowledge is shared in written form.
To achieve this goal, manual steps to build or change the infrastructure should be replaced by code stored in a revision control system. Three different types of software for provisioning of servers and deployment exist at the moment:

- Software containers
- Configuration repository
- Remote command execution
These types of software should be evaluated and an architecture documentation as well as a test concept should be written. The proposed architecture should then be tested in a proof-of-concept.
The following tasks will be completed by the student as part of the bachelor thesis:

- Requirements analysis
- Software evaluation
- Architecture concept
- Test concept
- Proof-of-concept
- Documentation
The following Gantt diagram shows an overview of the actual timeline during the project as well as the dates of the most important milestones.
[Gantt chart, February to July 2015: Requirements analysis, Software evaluation, Architecture concept, Test concept and Proof-of-concept. Milestones: Kick-Off 25.02.15, Design Review 29.04.15, Final date 31.07.15.]
According to the regulations, at least 360 hours have to be invested in the bachelor thesis. The planning was based on these hours.
Both the requirements analysis and the software evaluation took longer than expected, as many aspects had to be studied in detail. In return, the architecture concept could be completed faster. The proof-of-concept also took less time than expected, as only a small number of problems occurred during development. The effort for writing the documentation was slightly underestimated.
Description | Planned | Actual |
---|---|---|
Requirements analysis | 64 h | ~72 h |
Software evaluation | 48 h | ~64 h |
Architecture concept | 88 h | ~80 h |
Test concept | 40 h | ~40 h |
Proof-of-concept | 80 h | ~64 h |
Write documentation | 40 h | ~56 h |
Total | 360 h | 376 h |
The following requirements were established by analysing the existing infrastructure and taking known problems into account.
The table below lists the stakeholders identified for the current infrastructure. They have a direct or indirect influence on the requirements. These stakeholders are also used in the system context diagram.
Stakeholder | Description |
---|---|
Developer |
Works at WorldSkills International and is responsible for the development and maintenance of the infrastructure. There are four developers working full-time for WorldSkills International; there is no designated system administrator. The developers have different backgrounds and therefore different knowledge about specific components of the infrastructure. They all share the responsibility for keeping the infrastructure running. All developers use Mac OS X for development and work from three different time zones. |
User |
Interacts with applications running on the WorldSkills infrastructure. This includes the Secretariat, Members, competition personnel and website visitors. Most of them are registered users. Their expectations for fast and continuously running services influence the requirements. |
Hosting provider |
Provides virtual Linux servers for the infrastructure. As hosting providers operate in a highly competitive and fast-moving market, the current provider's offering can become obsolete, in which case it needs to be replaced by another hosting provider with a better offering. |
GitHub |
Hosts Git code repositories for WorldSkills International. Provides a web interface for managing permissions of the repositories. They control how code can be accessed. |
Codeship |
Provides a hosted continuous integration software for WorldSkills International. The software is based on Linux with support for PHP, Java and JavaScript applications. Their functionality defines how applications can be built and deployed. |
Both Developer and User need to interact with the infrastructure or the applications running on it. They have a direct influence on the requirements and lie within the system context.
The source code of most applications is stored on GitHub. Codeship is used for running automated tests and executing the deployment of new versions. The hosting provider supplies the servers for running the infrastructure. All three vendors influence the requirements indirectly with the constraints of their services. They are outside of the system context as they cannot be influenced.
The current infrastructure is composed of multiple applications deployed on three servers. There is no requirement to keep them on separate servers. The following diagram gives an overview of all applications. A short description of each application can be found in the appended table.
The following table lists all applications and their special requirements.
Application | Description | RabbitMQ | MySQL | JavaMail | Uploads |
---|---|---|---|---|---|
Web services | Java applications for managing organization information | ||||
Management | JavaScript applications for accessing the web services | ||||
Auth | PHP application for login | ||||
worldskills.org | Organization website | ||||
WSC2015 website | WorldSkills São Paulo 2015 event website | ||||
WSC2017 website | WorldSkills Abu Dhabi 2017 event website | ||||
Members map | World map with Facebook pages from other countries | ||||
IL | PHP application for managing infrastructure lists | ||||
Aggregator | PHP application which serves the mobile app content | ||||
Who-is-who | PHP application for managing organization personnel | ||||
Registrations | PHP application for registering people and a web service | ||||
Forums | Discussion forums |||||
Portal | Website with information about WorldSkills Competition participants | ||||
CIS demo | Competition Information System demo | ||||
Rooms | Java application for reserving meeting rooms | ||||
Archive | Static copies of old event websites | ||||
Mailer | PHP application for sending emails to groups of people | ||||
SMP | Skill Management Plan for planning the skill competitions | ||||
CPT | Competition Planning Timetable with important deadlines |
User stories are used in this chapter to describe the functional requirements for the new infrastructure. They are prioritized, in agreement with the developers, on three levels: Must, Should, Could.
Name | R01 PHP applications |
---|---|
Description | As a developer I want to run multiple PHP applications so users can access them. A PHP application usually needs a MySQL database, the source code is stored on GitHub. |
Acceptance Criteria | Each PHP application is running and can be accessed with a web browser. |
Priority | Must |
Name | R02 Java applications |
---|---|
Description | As a developer I want to run multiple Java applications on a Tomcat server so users and other applications can use the services provided by the applications. A Java application usually needs a MySQL database, the source code is stored on GitHub. |
Acceptance Criteria | Each Java application is running and can be accessed over HTTP/S. |
Priority | Must |
Name | R03 Server configuration |
---|---|
Description | As a developer I might want to change a server configuration file to optimize a setting. For example the MySQL Query Cache size needs to be increased. Another example would be that the servlet container configuration needs a new variable due to a change in the application. |
Acceptance Criteria | A configuration file gets modified and the change is pushed to the code repository. The new configuration file is automatically deployed to the server and the affected applications loads the new configuration. |
Priority | Must |
Name | R04 Application deployment |
---|---|
Description | As a developer I want to deploy a new version of an application so users can benefit from new features or bug fixes. The deployment causes no downtime of the application. |
Acceptance Criteria | A new version of an application gets pushed to the code repository. Automated tests of the application are executed and if they pass the application gets deployed to the server. The old version keeps responding to requests until the new version is ready. |
Priority | Should |
Name | R05 Staging environment |
---|---|
Description | As a developer, I want to test a new version of a web service in a staging environment so I can make sure it works correctly with all other components of the system. |
Acceptance Criteria | A new version of a web service is pushed in a separate branch, the functionality is available for testing in a staging environment within 30 minutes from the push. |
Priority | Should |
Name | R06 Hosting provider switch |
---|---|
Description | As a developer I want to switch the server hosting provider so I can profit from a better offer. Another reason could be that the current hosting provider shuts down or its no longer justifiable because of his actions (e.g. security problems). |
Acceptance Criteria | The infrastructure can be ported to a different provider within 48 hours. All needed software is installed automatically, databases and user files are transferred manually. |
Priority | Could |
The following non-functional requirements are not exhaustive, but they are the most critical ones. They are classified according to the quality model of ISO/IEC 25010:2011 and prioritized, in agreement with the developers, on three levels: Must, Should, Could.
The requirements are based on the current infrastructure and events in the past related to it.
Name | R11 Configuration files |
---|---|
Classification | Changeability |
Description | All configuration files are stored in a code repository. |
Priority | Must |
Name | R12 Change history |
---|---|
Classification | Accountability |
Description | Every change must be traceable by a developer. Associated with every change is an explanation. |
Priority | Must |
Name | R13 Open Source |
---|---|
Classification | Replaceability |
Description | All software used for infrastructure has to be built on open-source software. This guarantees that components can easily be ported to different providers or maintainers. It also allows other Members to easily copy parts of the infrastructure. |
Priority | Must |
Name | R14 Test environment |
---|---|
Classification | Testability |
Description | To test configuration changes to it, the whole or part of the infrastructure can be started in a test environment. This is different from the staging environment in that the test environment can be local and automated tests are executed against it. |
Priority | Must |
Name | R15 Encrypted passwords |
---|---|
Classification | Confidentiality |
Description | Server passwords should be stored only encrypted on third-party systems. The advantage of storing encrypted passwords in the code repository and sharing a key file instead of sharing the passwords in a file, is that the file doesn't need to be updated for everyone each time a password is added. |
Priority | Must |
Name | R16 Superuser access |
---|---|
Classification | Technical accessibility |
Description | In case of a problem that only occurs in a certain environment, a developer needs unrestricted access to the server to debug the error and try out different solutions. |
Priority | Must |
Name | R17 Custom software |
---|---|
Classification | Interoperability |
Description | New software can be installed without restrictions. New features or analytics tools might require the installation of additional software. |
Priority | Must |
Name | R18 Learning curve |
---|---|
Classification | Learnability |
Description | How to use the software system to install and configure the infrastructure can be learned quickly so all developers can make changes to the infrastructure without spending weeks studying it. |
Priority | Should |
Name | R19 Horizontal scaling |
---|---|
Classification | Changeability |
Description | Horizontal scalable applications can be installed on multiple servers and served through a load balancer. |
Priority | Could |
Based on the given requirements, the following evaluation defines the guiding principles for the architecture concept. It compares the different approaches for configuring and deploying software and how well they fit the existing environment.
Only specific attributes of the approaches and their software are analyzed; general comparisons have already been published elsewhere.
Cloud solutions for managing applications exist, but they usually involve vendor lock-in and are targeted at high volumes. Our scaling requirements are low, as the infrastructure mainly needs to serve the Competition and the Members, and both are limited by other resources (e.g. Member budgets).
This type of software recently became popular with Docker. The idea is to use the advantages of a virtual machine (isolation, portability, efficiency) while sharing resources. Docker has been chosen as it has been perceived as the most active project. A similar project is rkt (Rocket) by CoreOS.
Docker only works on Linux, so for development on Mac OS X the software Boot2Docker is used on the development laptop. Installation instructions are provided in the Docker documentation in the chapter Installation for Mac OS X.
After installation the virtual machine with Linux running Docker is launched with the following two commands.
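A minimal sketch of what this typically looks like (the exact invocation depends on the installed Boot2Docker version):

```shell
boot2docker init   # create the Linux virtual machine that will run Docker
boot2docker up     # boot the virtual machine and start the Docker daemon
```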
Another useful tool provided by Docker is Docker Compose, which allows multiple instances to be started based on a configuration file. However, its usage in production is not recommended at the moment, as it is missing some features for managing running instances.
During the following verification The Docker Book was used as a reference.
Requirement | R01 PHP applications |
---|---|
Verification |
Docker publishes official images for certain applications and programming languages in the Docker Hub Repository. There's also an official image for PHP which can be found on GitHub. Maintained images for recent PHP versions are provided with built-in Apache or PHP-FPM for running the application as a service. The easiest way to run a PHP application with Docker is to use the Apache web server, so the application itself and static files (JavaScript, CSS, etc.) can both be served from one process. The MySQL server would be started in a separate instance, so an additional Dockerfile for MySQL is needed, and the resulting container then has to be linked with the container running the PHP application. To simplify this, Docker Compose can be used. It's a tool that can start multiple Docker instances based on a configuration file. |
Verdict | Pass |
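As an illustration of linking a PHP container with a MySQL container through Docker Compose, a minimal configuration file could look roughly like this; image tags, paths and the password are placeholders, not the actual WorldSkills setup:

```yaml
# docker-compose.yml (illustrative sketch)
web:
  image: php:5.6-apache        # official PHP image with the Apache web server
  ports:
    - "80:80"
  volumes:
    - ./src:/var/www/html      # application source code and static files
  links:
    - db                       # makes the MySQL container reachable as "db"
db:
  image: mysql:5.6
  environment:
    MYSQL_ROOT_PASSWORD: secret
```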
Requirement | R02 Java applications |
---|---|
Verification |
For Tomcat official images are available from the Docker Hub Repository as well. Multiple versions for Java 7 and 8 are available. Configuration and the application code can be copied to the working directory of Tomcat - the application is then started automatically. Multiple applications can be bundled into one container or one container per application can be used. Again as with PHP, a MySQL server would be started in a separate instance. |
Verdict | Pass |
Requirement | R03 Server configuration |
---|---|
Verification |
To guarantee reproducible containers, configuration files are usually copied into the container at build time. This allows the container to be tested in a staging environment and guarantees the same results in the production environment. Docker provides the tools to build the image and upload it to a central repository, however it doesn't provide any tools to do an actual deployment of the image to a live environment out of the box. |
Verdict | Fail |
Requirement | R04 Application deployment |
---|---|
Verification |
There are different ways to implement this with Docker. One way would be to build a new image, deploy it separately and then route requests with a load balancer to the new image. However, as noted before, Docker provides neither tools for deploying the image nor for orchestrating running machines. Another possibility would be to use volumes to deploy the new application into the running container, but this would require each application to have built-in support for zero downtime deployment, which is not always the case (Tomcat supports this with parallel deployment, but PHP does not). |
Verdict | Fail |
Requirement | R05 Staging environment |
---|---|
Verification |
Containers can be started quickly in a new environment (e.g. a new virtual server). Database fixtures can be loaded from additional containers which are linked to the database container. |
Verdict | Pass |
Requirement | R06 Hosting provider switch |
---|---|
Verification |
Thanks to virtualization the only requirement for a new hosting provider would be to run Docker. Transfer of the data is possible by launching additional containers on the old server and mounting the data volumes. The installation of Docker would need to be done manually if Docker is not pre-installed by the hosting provider. But more importantly the configuration of the host machine (e.g. SSH keys, logging, security settings) could not be automated with Docker alone. |
Verdict | Fail |
Requirement | R11 Configuration files |
---|---|
Verification |
All configuration files can be stored in the code repository and then added to the container during build time. |
Verdict | Pass |
Requirement | R12 Change history |
---|---|
Verification |
By storing all Dockerfiles and configuration files in a code repository and building the container images on a continuous integration server each change can be traced back. |
Verdict | Pass |
Requirement | R13 Open Source |
---|---|
Verification |
Both Docker and the registry server for Docker are Open Source on GitHub; they are licensed under the Apache License. |
Verdict | Pass |
Requirement | R14 Test environment |
---|---|
Verification |
Multiple containers can be launched locally so tests can be executed. |
Verdict | Pass |
Requirement | R15 Encrypted passwords |
---|---|
Verification |
Secret data is usually passed as environment variables to the Docker container. Docker doesn't provide functionality for encrypting information. |
Verdict | Fail |
Requirement | R16 Superuser access |
---|---|
Verification |
Containers can be accessed from the outside with different methods and superuser access to the host system is required by Docker. |
Verdict | Pass |
Requirement | R17 Custom software |
---|---|
Verification |
There are some limitations on the kernel level of what can be run inside a container, but they do not affect most use cases. |
Verdict | Pass |
Requirement | R18 Learning curve |
---|---|
Verification |
Immutable applications are a complex topic and Docker requires specific methods for normal tasks. |
Verdict | Fail |
Requirement | R19 Horizontal scaling |
---|---|
Verification |
Multiple instances of images can be started in parallel and Docker Swarm, a clustering tool for Docker, can be used to manage them. |
Verdict | Pass |
One notable property of Software Containers is their iteration speed. Due to how file changes are saved by the layered file system, setup commands don't need to be repeated each time; the resulting files can be restored within seconds. This makes experimenting with a container much faster than setting up a complete virtual machine each time.
It became clear that Docker itself is a great tool for reducing the overhead of virtual machines, however it is not a complete solution for infrastructure management (yet).
Advantages | Disadvantages |
---|---|
Fast iteration thanks to the layered file system; lower overhead than full virtual machines; official images for PHP, Tomcat and MySQL | No built-in tooling for deployment and orchestration; no encryption for confidential data; host configuration still needs other tools; steeper learning curve |
A comparison of all requirements can be found after the evaluation of the three different software types.
Chef and Puppet are the most popular representatives of this type of software. It has existed for a few years now and aims to provide a central repository for infrastructure configuration from which server clients regularly pull their changes. This type of software usually also provides a way to use it without a central server, but that loses the advantage of having a central point to coordinate changes.
Chef is used here because of the already existing knowledge of Ruby and its popularity. It is usually running on a Linux machine, so for development a virtual machine with Linux is required. To simplify the setup of the virtual machine, Vagrant can be used. Vagrant is software to run virtual machines based on a configuration file. After installing Vagrant, a virtual machine can be started with a Linux image provided by Chef which already has all the required software pre-installed.
Chef organizes commands in recipes and cookbooks. Variables are stored in files called data bags. The books Cooking Infrastructure by Chef and Taste Test were used during the following verification.
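To illustrate this terminology, two typical commands of the Chef workflow are sketched below; the cookbook, data bag and file names are only examples:

```shell
# upload a cookbook from the local chef-repo to the Chef server
knife cookbook upload apache2

# create a data bag item; --secret-file encrypts it for confidential values
knife data bag create credentials mysql --secret-file ~/.chef/data_bag_secret
```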
Requirement | R01 PHP applications |
---|---|
Verification |
Cookbooks for PHP are available in the Supermarket, the official repository for cookbooks. The PHP cookbook is published and maintained by the company behind Chef itself. However, the functionality provided by the cookbook is mostly focused on the PHP extension repository PEAR, which is not a requirement. A more popular alternative is the apache2 cookbook. It provides the possibility to install the Apache web server with mod_php so that PHP runs within the web server. For MySQL there's an official cookbook available as well, which installs and starts a MySQL server as required. |
Verdict | Pass |
Requirement | R02 Java applications |
---|---|
Verification |
The Supermarket also has an official cookbook for Tomcat. Java must be installed separately, again there's a cookbook available in the Supermarket. The same MySQL cookbook as mentioned before can be used. |
Verdict | Pass |
Requirement | R03 Server configuration |
---|---|
Verification |
With Chef, configuration files are usually built from templates which are stored in the code repository. Once they are uploaded to the Chef server, the client receives them, replaces the variables in them and puts them in the target location. In addition, the service using the file can be restarted. |
Verdict | Pass |
Requirement | R04 Application deployment |
---|---|
Verification |
For Tomcat, parallel deployment can be used here. The actual deployment to the server is done using the Tomcat Manager App. |
Verdict | Pass |
Requirement | R05 Staging environment |
---|---|
Verification |
Chef can be used to set up a virtual machine with the needed environment. Once the virtual machine with the Chef client is running, it can pull all information needed to set up the environment from the Chef server. |
Verdict | Pass |
Requirement | R06 Hosting provider switch |
---|---|
Verification |
The Chef client needs to be installed manually on a new virtual machine if the hosting provider doesn't provide machines with Chef pre-installed. Once the Chef client is installed it can install all required software and configure the machine as needed. |
Verdict | Pass |
Requirement | R11 Configuration files |
---|---|
Verification |
Configuration files can be stored as templates in the code repository. They get compiled and written to their target destination by Chef. |
Verdict | Pass |
Requirement | R12 Change history |
---|---|
Verification |
All cookbooks and templates can be stored in a code repository, a continuous integration server uploads them to the Chef server for distribution to the clients. |
Verdict | Pass |
Requirement | R13 Open Source |
---|---|
Verification |
Chef Client, Chef Server and the Chef Development Kit (DK) are Open Source on GitHub; they are licensed under the Apache License. |
Verdict | Pass |
Requirement | R14 Test environment |
---|---|
Verification |
The cookbooks can also be used to configure the virtual machine on which the tests are running. |
Verdict | Pass |
Requirement | R15 Encrypted passwords |
---|---|
Verification |
Data bags can store variables on the Chef server. Confidential information can be stored in encrypted data bags. The secret key can be shared among the developers over other secure channels. |
Verdict | Pass |
Requirement | R16 Superuser access |
---|---|
Verification |
No restrictions about access to the server are imposed by Chef. |
Verdict | Pass |
Requirement | R17 Custom software |
---|---|
Verification |
Packages from the Linux distribution are simple to install with Chef, but custom software can also be downloaded and installed. |
Verdict | Pass |
Requirement | R18 Learning curve |
---|---|
Verification |
Chef has a complex architecture and many dependencies on other libraries. Developers not familiar with Ruby also need to learn the syntax first. |
Verdict | Fail |
Requirement | R19 Horizontal scaling |
---|---|
Verification |
Multiple Chef clients can connect to the same Chef server and the same cookbooks can be installed on multiple hosts. |
Verdict | Pass |
The installation of the Chef DK was difficult as it relies on overriding rbenv, a tool for managing Ruby versions, in the PATH variable. Some modifications to .bash_profile were needed to get it running properly.
The possibility to test the cookbooks locally inside a virtual machine with Vagrant proved handy. However a few times the cookbook failed unexpectedly (e.g. because a wrong package was accidentally installed first) - a complete rebuild solved the problem but took quite some time.
A popular cookbook that was initially selected for deploying Java applications turned out to be incompatible with the latest version of the official Tomcat cookbook for no obvious reason. In general the whole system seemed complex: many components rely on others and compatibility has to be figured out manually.
Advantages | Disadvantages |
---|---|
Official cookbooks for Apache, PHP, Tomcat and MySQL; central server distributes changes to all clients; encrypted data bags for confidential data | Complex architecture with many dependencies; Ruby knowledge required; compatibility between cookbooks has to be figured out manually; cumbersome installation of the Chef DK |
This type of software is similar to the Configuration Repository described in the previous chapter: Software to be installed and commands to be executed are defined in text files. The main difference is that instead of having a central server where everything is available to get pulled, everything gets pushed to the hosts. In fact Configuration Repository software like Chef also supports this kind of operation mode with Chef Solo.
Ansible, Salt, and Rex are implementations of this software type. Ansible is used here because it focuses solely on this approach and because of its popularity.
Ansible is a Python application and can be installed on a development machine as a Python package according to the Ansible documentation. Ansible uses SSH to communicate to remote servers and execute commands on them. Directives are organized in playbooks and roles.
Again Vagrant can be used to simplify the setup of a remote host and test the playbook. As Ansible requires no specific software on the host any Linux distribution can be used with Vagrant. The following snippet shows the initialization process of Vagrant with Ansible:
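A sketch of the usual workflow; the box name is only an example, and the Ansible provisioner itself is configured in the generated Vagrantfile:

```shell
vagrant init ubuntu/trusty64   # create a Vagrantfile based on an Ubuntu box
vagrant up                     # boot the virtual machine and run the provisioner
```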
The book Taste Test was used during the following verification.
Requirement | R01 PHP applications |
---|---|
Verification |
Ansible already includes most tools needed for installing and configuring software using the operating system mechanisms. Examples for how to use them are provided on GitHub by Ansible itself. The Apache web server and mod_php can be installed with the operating system packages, their configuration can be created with the Ansible template directive, and the service directive makes sure Apache is running. The source code of the PHP application can either be copied to the host or cloned from Git. MySQL can be installed and configured the same way as Apache. |
Verdict | Pass |
Requirement | R02 Java applications |
---|---|
Verification |
In the examples provided by Ansible they also describe how to install a standalone instance of Tomcat. The installation package gets downloaded from the Tomcat download server, extracted to an appropriate location, the configuration files are created from templates and Tomcat is started as a service. MySQL can be installed with the operating system packages. |
Verdict | Pass |
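A hedged sketch of such tasks could look as follows; the Tomcat version, paths and the name of the init script are illustrative assumptions:

```yaml
# illustrative tasks for a standalone Tomcat installation
- name: download the Tomcat distribution
  get_url:
    url: http://archive.apache.org/dist/tomcat/tomcat-8/v8.0.24/bin/apache-tomcat-8.0.24.tar.gz
    dest: /opt/apache-tomcat-8.0.24.tar.gz

- name: extract Tomcat to /opt
  unarchive: src=/opt/apache-tomcat-8.0.24.tar.gz dest=/opt copy=no

- name: create server.xml from a template
  template: src=server.xml.j2 dest=/opt/apache-tomcat-8.0.24/conf/server.xml

- name: make sure Tomcat is running (assumes an init script named "tomcat")
  service: name=tomcat state=started
```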
Requirement | R03 Server configuration |
---|---|
Verification |
Configuration files can either simply be copied to the server from the local playbook or templates with variables can be used. Ansible checks if there were any changes in the configuration file and notifies a service to restart if needed. The playbook can also be run from a continuous integration server. |
Verdict | Pass |
Requirement | R04 Application deployment |
---|---|
Verification |
For Tomcat, parallel deployment can be used here again. The actual deployment to the server can be done using the Tomcat Manager App or by using Ansible to copy the file to the server. |
Verdict | Pass |
Requirement | R05 Staging environment |
---|---|
Verification |
As long as Ansible can connect via SSH to a machine it can set it up as needed. Ansible can also launch new virtual machines in the cloud which is useful for quickly starting a staging environment. |
Verdict | Pass |
Requirement | R06 Hosting provider switch |
---|---|
Verification |
As mentioned before, the only requirement on the host machine is that Ansible can connect to it. Once this is given, Ansible can set up the new server as required. The only limitation is that Ansible differentiates between package management systems, so switching from Apt (used by Debian) to Yum (used by Red Hat-based distributions) would require some code changes, but such a switch is outside of the scope of this requirement. |
Verdict | Pass |
Requirement | R11 Configuration files |
---|---|
Verification |
All configuration files can be stored as templates or raw files in the code repository of the playbook. Ansible copies them to the target server when being executed. |
Verdict | Pass |
Requirement | R12 Change history |
---|---|
Verification |
The playbook and all its files can be stored in the code repository. A continuous integration server can run the playbook if a change has been made. |
Verdict | Pass |
Requirement | R13 Open Source |
---|---|
Verification |
Ansible is Open Source on GitHub and licensed under the GNU General Public License (GPL) v3.0. |
Verdict | Pass |
Requirement | R14 Test environment |
---|---|
Verification |
The playbook can also be executed locally to configure a test environment on the same machine. |
Verdict | Pass |
Requirement | R15 Encrypted passwords |
---|---|
Verification |
Ansible provides a tool called Vault to encrypt files with variables. The encrypted file is automatically decrypted when running the playbook. The password can be stored in a separate file and shared among developers. |
Verdict | Pass |
Requirement | R16 Superuser access |
---|---|
Verification |
Ansible doesn't restrict how the server can be accessed. In fact, Ansible requires that the server can be accessed using SSH, which is the preferred way for developers to access the server as well. |
Verdict | Pass |
Requirement | R17 Custom software |
---|---|
Verification |
Operating system packages as well as custom downloaded software can be installed using Ansible. |
Verdict | Pass |
Requirement | R18 Learning curve |
---|---|
Verification |
Ansible files are easy to read and the concept can be understood quickly. Only the YAML syntax requires a bit of learning, but most examples already give a good idea of how it works. |
Verdict | Pass |
Requirement | R19 Horizontal scaling |
---|---|
Verification |
Ansible can connect to multiple servers and execute the defined commands there. Servers can also be grouped by their role so that only specific commands are executed on certain servers. |
Verdict | Pass |
Ansible was easy to install with the Python package manager pip and was instantly ready to use. Because all servers are configured with SSH key authentication already, no additional setup is required to make Ansible usable on them.
It is basically just a thin layer above shell scripts but provides all the tools needed for provisioning and maintaining a server (commands, services, configuration files). It lacks a few advanced features for highly complex setups, but the fact that it's easy to learn is useful in a small team with shared responsibilities.
Advantages | Disadvantages |
---|---|
Easy to install and to learn; only SSH access is required on the managed servers; built-in modules for packages, templates, services and encrypted variables (Vault) | Fewer advanced features for highly complex setups; playbooks have to account for different package management systems (e.g. Apt vs. Yum) |
Requirement | Software Containers | Configuration Repository | Remote Command Execution |
---|---|---|---|
R01 PHP applications | Pass | Pass | Pass |
R02 Java applications | Pass | Pass | Pass |
R03 Server configuration | Fail | Pass | Pass |
R04 Application deployment | Fail | Pass | Pass |
R05 Staging environment | Pass | Pass | Pass |
R06 Hosting provider switch | Fail | Pass | Pass |
R11 Configuration files | Pass | Pass | Pass |
R12 Change history | Pass | Pass | Pass |
R13 Open Source | Pass | Pass | Pass |
R14 Test environment | Pass | Pass | Pass |
R15 Encrypted passwords | Fail | Pass | Pass |
R16 Superuser access | Pass | Pass | Pass |
R17 Custom software | Pass | Pass | Pass |
R18 Learning curve | Fail | Fail | Pass |
R19 Horizontal scaling | Pass | Pass | Pass |
Software Containers don't comply with all Must requirements, a Configuration Repository misses one Should requirement (Learning curve), and Remote Command Execution matches all requirements.
Even though Software Containers don't comply with all requirements, they are still considered a useful concept if combined with a Configuration Repository or Remote Command Execution for orchestration. The next chapter evaluates the possibility of combining Software Containers with a Configuration Repository or Remote Command Execution.
The following rough estimate of the effort for introducing a software type alone or together with Software Containers under regular workload also reflects the higher complexity of a Configuration Repository compared with Remote Command Execution.
 | Training | Setup | Migration | Total |
---|---|---|---|---|
Configuration Repository | 4 weeks | 7 weeks | 3 weeks | 14 weeks |
Configuration Repository with Software Containers | 6 weeks | 10 weeks | 5 weeks | 21 weeks |
Remote Command Execution | 2 weeks | 4 weeks | 2 weeks | 8 weeks |
Remote Command Execution with Software Containers | 5 weeks | 8 weeks | 5 weeks | 18 weeks |
The introduction of an orchestration software together with Software Containers could be divided into two phases. In the first phase the orchestration could be set up with a Configuration Repository or Remote Command Execution for the existing infrastructure. In a second phase new applications which fit into the model of Software Containers could be adapted to those.
[Timeline over five months: Phase 1 covers training, setup and migration for the orchestration software; Phase 2 covers training, setup and migration for the Software Containers.]
Software Containers add additional complexity, and their advantages are not urgently needed by a team of four developers. One of these advantages, being able to quickly set up a development environment, is less important as no new developers are expected to join the team within the next four years. The initial architecture concept of this thesis will therefore focus only on phase one, based on the requirements and the time available.
The evaluation has shown that Remote Command Execution is easier to understand and maintain than a Configuration Repository. Being able to understand the system easily is crucial, as all developers need to be able to make quick changes to the infrastructure. Based on this, the infrastructure concept will use Remote Command Execution software to implement phase one as described above. The software Ansible will be used, based on its popularity and the good experience made during the evaluation.
The following architecture concept is based on the arc42 template, which provides a guideline for documenting software architecture. Goals, constraints and the scope (requirements related information) have already been documented in the previous chapters. For easier understanding some core concepts of Ansible are outlined in this introduction.
Ansible uses the term playbook to describe the collective of configuration files, deployment instructions, and environment specific variables. Each playbook can have multiple roles, which allow modular grouping of related files. Best practices for organizing playbooks and roles are provided in the Ansible documentation.
A playbook can also be described as a better structured shell script that tells the server which commands to execute in which order. Before a command is executed, however, Ansible checks whether the command is still required or whether it is redundant. For configuration files it can check whether the file needs to be updated by comparing the desired content to the actual content on the server.
Different environment variables can be grouped in inventory files. An inventory can contain a list of server addresses of an environment as well as configuration variables specific to that environment.
The YAML syntax is used in playbooks for defining variables and commands to execute. The following is an example of a file for setting up a webserver.
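A minimal sketch of such a file, with an illustrative host group, variables and paths:

```yaml
# site.yml (illustrative sketch)
---
- hosts: web
  sudo: yes
  vars:
    server_name: www.worldskills.org
  tasks:
    - name: install the Apache web server
      apt: name=apache2 state=present

    - name: write the virtual host configuration from a template
      template: src=templates/vhost.conf.j2 dest=/etc/apache2/sites-available/000-default.conf
      notify: restart apache

    - name: make sure Apache is running and starts on boot
      service: name=apache2 state=started enabled=yes

  handlers:
    - name: restart apache
      service: name=apache2 state=restarted
```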
The whole infrastructure is described in an Ansible playbook which is shared with a code repository on GitHub. Different services are split up into roles. Inside the roles provisioning and deployment are separate tasks and can be executed independently.
Changes to the infrastructure are pushed to the code repository and then Ansible runs on the continuous integration system Codeship to deploy the changes to the servers. Confidential information like passwords and encryption keys are stored in encrypted Ansible Vaults.
For application deployment the playbook is downloaded by the continuous integration system of the application and the required deployment tasks are executed with Ansible to update the application on the server.
The files and their organization within the playbook used to maintain the WorldSkills infrastructure are explained below. The most important parts of the playbook are the inventories and the roles, they are displayed in the following overview diagram.
Environment specific variables are stored in different inventories. There are two inventories: prod (for production) and staging (for system tests). As environment specific variables often contain sensitive information they are encrypted with Ansible Vault.
The role common is used to execute frequently needed tasks like adding the public keys of all developers. Instead of hardcoding all servers in the inventory file, they are created or fetched dynamically by the servers role. Each piece of software has its own role for its installation and the setup of its configuration files. Additionally, each self-developed application has a role defining its deployment and the setup of its configuration files.
The following table shows the file structure of the playbook and explains the function of most files and folders. The role apache is used as an example; the structure applies to all roles.
File | Description |
---|---|
site.yml | Main playbook file, delegates execution of commands to the servers |
inventories/prod/hosts | Inventory file with the list of servers for production |
inventories/prod/group_vars/all | Variables for production (e.g. database password) |
inventories/staging/hosts | Inventory file with the list of servers for staging |
inventories/staging/group_vars/all | Variables for the staging environment |
roles/apache | Role for the installation and configuration of the webserver |
roles/apache/defaults/main.yml | Default webserver configuration variables (e.g. virtual hosts) |
roles/apache/files/worldskills.org.crt | SSL certificate |
roles/apache/files/worldskills.org.key | SSL key |
roles/apache/handlers/main.yml | Commands for restarting the webserver |
roles/apache/tasks/vhosts.yml | Setup of virtual hosts |
roles/apache/tasks/main.yml | Installation of the webserver and setup of the configuration files |
roles/apache/templates/vhost.conf.j2 | Virtual host configuration file template |
This view explains the dynamic aspects of the playbook execution. In particular, the creation of the dynamic inventory and how it works together with the roles are highlighted.
While running the playbook, the first role that gets executed is the servers role. This role doesn't execute any remote commands but runs locally to connect to the web service of the hosting provider. It either creates and boots the servers if they don't exist yet or just fetches their information. The returned IP addresses are added to the dynamic inventory so following commands can be executed on the servers.
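A sketch of how such a servers role could look is shown below; the digital_ocean module (as used for the proof-of-concept described later) and its exact parameters depend on the Ansible version in use, and the droplet size, region and variable names are illustrative:

```yaml
# roles/servers/tasks/main.yml (illustrative sketch, executed in a play that
# targets localhost instead of a remote server)
- name: create the web server droplet or fetch it if it already exists
  digital_ocean:
    state: present
    command: droplet
    name: "{{ digital_ocean_droplet_prefix }}-web"
    unique_name: yes
    api_token: "{{ digital_ocean_api_token }}"
    size_id: 1gb
    region_id: ams2
    image_id: ubuntu-14-04-x64
  register: web_droplet

- name: add the droplet to the in-memory inventory for the following roles
  # the IP address can also be stored in a variable such as digital_ocean_ip_web
  add_host:
    name: "{{ web_droplet.droplet.ip_address }}"
    groups: web
```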
Subsequently all other roles are executed. This includes setup of software and applications. The commands are sent to the remote servers by Ansible.
In the bigger picture, the playbook is not executed manually on the developer's machine but automatically on a continuous integration server.
Source code is stored in Git repositories. The playbook and each application each have their own private code repository. The continuous integration server has a build pipeline for each code repository. For each pipeline a unique private SSH key is used. The public keys for those private keys can be exported and are deployed to the code repository provider to guarantee access to the source code. They are also copied to the virtual servers to allow Ansible to connect to the servers via SSH.
Each application pipeline uses the playbook to provision the infrastructure and then deploys the application to the targeted environment. The servers required for the production and staging environment are supplied by the hosting provider.
A fundamental decision of the project is how to organize and structure the playbooks. Each application has its own code repository and allows actions to be executed after a push.
Two variations are possible: One code repository with a playbook for the infrastructure and deployment, or a playbook in each application code repository. The first approach has been chosen to easily allow sharing of variables from the infrastructure setup with the application deployment.
Ansible is built to also manage legacy infrastructures with existing servers, so the main purpose of inventories is to provide a list of servers grouped by their function. But Ansible also has built-in server provisioners, which allow the approach of an idempotent infrastructure to be taken one step further by also creating the virtual servers with Ansible as part of the playbook.
Usually inventory files in Ansible playbooks contain a list of IP addresses or server names which belong to the infrastructure, but with the proposed concept the only server listed in the inventory file is localhost. This is because the servers get added to the inventory list in memory while running the playbook. If the servers don't exist yet, they get created; if they already exist, their IP addresses are added to the inventory.
This approach has the advantage that temporary environments (like staging) can easily be created and that the IP addresses of existing servers don't need to be maintained manually inside an inventory file.
The company behind Ansible also distributes commercial software called Ansible Tower for running playbooks through a user interface. The software provides a web interface with permission management, an audit trail and visualizations.
Ansible Tower was not used for this project due to the high price (starting at $5000 per year) and because it wouldn't bring many additional benefits. As all four developers responsible for the system are used to working with Git, there's no need for a graphical user interface. It would also contradict the principle of having all changes driven by code and its advantage of being able to easily trace a change.
The following paragraphs outline potential technical risks of the architecture concept. The factors convenience and security often conflict and need to be weighed in each case.
One disadvantage of automating the server setup is that the executor needs full administrative access to the servers. In the proposed concept the continuous integration server has superuser access to all servers. This leads to two attack scenarios: an attacker could compromise the continuous integration server to gain access to the servers, or the attacker could compromise the code repository and infiltrate the servers through malicious commands. Both GitHub and Codeship take appropriate measures to prevent these kinds of attacks.
As the staging environment is always built by syncing databases from production, the risk of data leakage increases, because staging environments are often regarded as non-critical. By treating the staging environment as confidentially as the production environment (e.g. encrypting staging passwords), such incidents can be prevented.
The testing is divided into multiple stages which all aim to find bugs in the Ansible playbook. Testing of the self-developed applications has to be done by each application individually. All automated tests are executed on the continuous integration server.
After each code change a syntax check of the Ansible playbook is performed. The playbook is then executed to create a temporary staging environment. To verify the integrity of the staging environment system tests are performed as a last step. Once the staging environment is ready manual tests can be executed depending on the change made. Testing during development can be done locally with Vagrant.
Ansible provides two flags when running playbooks which are particularly useful for testing. --syntax-check validates the YAML syntax of the playbook and can be used to discover problems early.
The flag --check runs the playbook in a special mode where a dry run is performed. In this mode no changes are made on remote systems, but all tasks are checked to see whether they could be executed. Thanks to this it can easily be verified whether the playbook would run or fail (e.g. because of missing variables). Unfortunately this is not possible in all cases, as some modules require certain dependencies to be installed (e.g. rabbitmq_plugins).
Because of the restrictions of the check mode only the syntax check is performed as an automated test.
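For illustration, the two invocations could look roughly like this, using the playbook and inventory names from the structure described above:

```shell
# validate the playbook syntax without contacting any server
ansible-playbook -i inventories/staging site.yml --syntax-check

# dry run: report which tasks would change something without applying anything
ansible-playbook -i inventories/staging site.yml --check
```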
New functionality is developed in feature branches in the application repositories. To test new functionality which requires changes in multiple applications, they all need to be deployed to the same staging environment. For this, a branch naming convention is defined as follows: each time a branch starting with staging- is created, a new staging environment is automatically created.
The first time an application uses the branch name, it boots the required servers and installs all applications from their master branches. Only the application using the branch name is installed from the branch.
If another application creates a branch with the same name, it gets deployed to the same staging environment. The playbook makes sure the servers in the environment are in the desired state, and as applications are only updated during the initial provisioning, the first application won't be overridden with its master branch.
After testing is completed and the branch has been merged into the master branch the staging environment can be removed with a separate Ansible script.
Simple system tests can be done with the uri module of Ansible. The module allows HTTP requests to be sent and the response to be stored in a variable. The response can then be checked for certain content.
Testing the web interface of the applications also verifies that all underlying software like the Apache web server and PHP is running properly. For each self-developed application at least one system test should be created to make sure it is running as expected.
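A hedged sketch of such a system test; the URL and the expected text are placeholders:

```yaml
# illustrative system test using the uri module
- name: fetch the start page of the website
  uri:
    url: https://www.worldskills.org/
    return_content: yes
  register: start_page

- name: check that the expected content is present
  fail: msg="Start page does not contain the expected content"
  when: "'WorldSkills' not in start_page.content"
```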
To migrate the existing infrastructure to the new automated infrastructure multiple steps need to be taken. Not only do all applications need to be installed and configured on the new infrastructure but existing data needs to be transferred and DNS entries need to be updated.
The following steps should be executed in the order outlined. Some have to be done manually, others can be automated with Ansible.
The first steps are reducing the time DNS entries are cached so the switch to the new server gets propagated faster. The new infrastructure is then created with the main Ansible playbook. Once the new infrastructure is ready the old one can be switched off so no changes can be made anymore by users. To synchronize the database and user files from the existing infrastructure a separate Ansible playbook should be written so it can be executed on its own. After everything has been transferred to the new infrastructure the DNS entries can be changed to point to the new infrastructure and the DNS TTL can be increased again.
For the proof-of-concept the following reduced infrastructure has been selected to test the feasibility of the concept focusing on the requirements. The proof-of-concept infrastructure includes a PHP application (R01), two Java applications (R02), one JavaScript application, multiple MySQL databases and the RabbitMQ message queue.
The proof-of-concept also includes the dynamic generation of staging environments based on branch naming (R05). The source code for all applications and the Ansible playbook is stored in a separate organization on GitHub. The domain name worldskills.ch is used for the proof-of-concept.
To prepare for horizontal scaling (R19) databases are installed on a separate server. Initial data fixtures are imported from the existing WorldSkills infrastructure (using an SSH connection).
All Must and Should requirements can be tested with the proof-of-concept infrastructure. The following list includes all requirements that are checked with the proof-of-concept infrastructure.
The Could requirements R06 Hosting provider switch and R19 Horizontal scaling are not explicitly tested with the proof-of-concept infrastructure because of their complexity.
The playbook code repository is organized with roles. Each role is responsible for the installation and configuration of the applications. The main playbook delegates the roles for execution to the servers.
The PHP role is an example of software that gets installed with the package manager of the operating system. The template module of Ansible is used to create the required configuration files. The same approach is used for the installation of MySQL, RabbitMQ, Tomcat and the Apache web server.
Virtual hosts in the Apache web server are also created with the template module and by linking the configuration files to /etc/apache2/sites-available.
Self-developed applications are installed by checking out the source code and then installing all dependencies. To keep the existing application running the source code is installed in a new folder and a symlink pointing to the latest checkout is updated at the end of the installation process.
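A sketch of such deployment tasks; the repository, paths and the release_timestamp variable are illustrative and assume Composer is available on the server:

```yaml
# illustrative deployment tasks for a self-developed PHP application
- name: check out the source code into a new release folder
  git:
    repo: git@github.com:worldskills/auth.git
    dest: "/var/www/auth/releases/{{ release_timestamp }}"
    version: master

- name: install the application dependencies
  command: composer install chdir=/var/www/auth/releases/{{ release_timestamp }}

- name: switch the symlink used by the web server to the new release
  file:
    src: "/var/www/auth/releases/{{ release_timestamp }}"
    dest: /var/www/auth/current
    state: link
```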
Java applications are deployed using the Tomcat Maven Plugin. This plugin builds the application and uploads it to the Tomcat server. Again the old version of the application is kept running until the new application has been started and is ready. For Tomcat this is achieved with the built-in functionality for parallel deployment as described in the Tomcat Documentation.
The current build timestamp is used as a substitute for the version number of the application and configured in the Maven file pom.xml.
As environment specific variables contain sensitive data like database passwords, they are encrypted with Ansible Vault. The password for the Vault is stored in the file vault_password.txt, which is not inside the code repository but shared manually between developers. To prevent the password file from accidentally being added to the code repository, it is explicitly excluded in the file .gitignore.
The encrypted file in the inventory can be edited with the following command which uses the password file and opens the configured text editor with the content of the file.
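Assuming the file names used in this playbook, the command looks roughly like this:

```shell
ansible-vault edit --vault-password-file vault_password.txt inventories/prod/group_vars/all
```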
As the CMS of the website heavily depends on content in the database to work properly, the database must be synced from the current infrastructure to the proof-of-concept infrastructure. Additionally this synchronization must be done when creating a staging environment.
Because Ansible is using SSH the server which the new database is running on can get an export of the database directly from the production database server.
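A sketch of such a synchronization, executed on the new database server; host, user and database names are placeholders:

```shell
# pull a dump from the production database server over SSH and import it locally
ssh deploy@db.worldskills.org "mysqldump --single-transaction worldskills" | mysql worldskills
```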
For local development the playbook can be executed in Vagrant. Ansible is configured as the provisioner in Vagrant and points to the main playbook. As the Ubuntu image used by Vagrant has the root user disabled by default, the playbook is executed with the user vagrant and with superuser permissions (sudo). Environment-specific variables like URLs are overridden with the extra_vars parameter. To simplify things all server groups are mapped to the default server, so only one virtual machine is needed to run the infrastructure locally.
An independent GitHub organization is used to keep the proof-of-concept separated from existing WorldSkills projects. All projects which are part of the proof-of-concept have been replicated as private repositories in this organization. Most notable is the new code repository worldskills-playbook, which contains the Ansible playbook.
Every commit is also displayed in Slack, the chat application used by WorldSkills. This guarantees that all developers are aware of the change.
Codeship distinguishes between setup, test and deployment commands. Setup and test commands are always executed; deployment commands can differ for each branch and are only executed if the test commands succeed. If there is a syntax error in the Ansible playbook, the deployment commands are not executed.
The setup commands on Codeship install Ansible and other required software. The Vault password is stored as an environment variable on Codeship and written to a Vault password file during setup. This approach prevents the password from being displayed in the execution logs.
The test command then runs the syntax check of the main Ansible playbook file locally. The comma at the end of localhost, tells Ansible to use the passed string as the inventory instead of trying to load an inventory file.
With the deployment commands, the master branch gets deployed to the production environment. The production inventory is loaded from the file inventories/prod and the Vault password file created with the setup commands is used for decrypting the production variables.
All branches starting with staging- are deployed to an environment with the branch name. Codeship provides the current branch name as the environment variable $CI_BRANCH. The variable digital_ocean_droplet_prefix gets overridden from the command line with the option -e (additional variables).
All commands are logged and reported in the Codeship web interface.
With a webhook the status of every build is also displayed in Slack.
The danger of a race condition with multiple playbooks running at the same time is avoided by the Codeship plan which allows only one concurrent build.
The servers for the infrastructure are created by the servers role with Ansible on DigitalOcean. The servers are identified by their name; their IP address is stored for use in other roles (e.g. digital_ocean_ip_web).
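A sketch of such a task is given below; the size, region and image IDs and the SSH key variable are placeholders, API credentials are omitted, and the real role may differ in detail.

```yaml
# roles/servers/tasks/main.yml – sketch; IDs and the SSH key variable are placeholders
- name: Create web server droplet
  digital_ocean:
    state: present
    command: droplet
    name: "{{ digital_ocean_droplet_prefix }}web"
    unique_name: yes
    size_id: 66          # placeholder size
    region_id: 5         # placeholder region
    image_id: 3240036    # placeholder Ubuntu image
    ssh_key_ids: "{{ digital_ocean_ssh_key_id }}"
    wait: yes
  register: created_web

- name: Store the IP address for other roles
  set_fact:
    digital_ocean_ip_web: "{{ created_web.droplet.ip_address }}"
```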
After running the servers role the servers are visible in the DigitalOcean web interface.
During the implementation the following problems occurred and had to be fixed. Outlined are descriptions of the problems, the steps taken, and the final solutions.
Originally it was planned to use group_vars on the playbook level to define default values for various variables and override them with environment-specific values. However, in Ansible group_vars on the playbook level take precedence over inventory variables. This hierarchy was reversed in Ansible 1.7 due to a bug and is still controversial.
The problem has been solved by moving the default values to the individual roles. This has the disadvantage that default values can only be shared between roles that run on the same server. Hopefully Ansible will introduce default values on the playbook level in the future.
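For illustration, role-level defaults live in a defaults/main.yml file inside the role and can be overridden by inventory variables; the variable names below are examples only, not the actual variables of the playbook.

```yaml
# roles/php/defaults/main.yml – example default values (variable names are illustrative)
website_url: http://www.worldskills.ch
php_memory_limit: 128M
```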
A problem that occurred during testing with DigitalOcean was that sometimes the servers couldn't be created properly. When trying to connect to the newly created server via SSH the connection always failed.
After resetting the root password of the server via the DigitalOcean web interface and connecting via their remote control interface, it turned out the server was missing the SSH keys required to connect to it successfully. Contacting the DigitalOcean support didn't bring any new information; they assured that they are not aware of any software problem. As this problem only occurs sometimes, it is most likely related to a race condition when deleting a server and creating a new one with the same name. Waiting at least 5 minutes before trying to create the same servers again solved the problem.
Another similar SSH problem was that sometimes the SSH connection was reset while the Ansible playbook was running. A first part of the playbook was successfully executed on the database servers but the next part which was to be executed on the API servers failed with the following message.
Checking the SSH timeout variables (ClientAliveInterval, ClientAliveCountMax, ServerAliveInterval) showed that both the SSH server and the client are configured correctly. Manual testing of the SSH connection from Codeship to DigitalOcean by keeping the connection idle for some time also worked as expected. The problem could not be reproduced and happened only twice; it was most likely a temporary network problem between Codeship and DigitalOcean.
After running the playbook repeatedly, PHP was suddenly unable to connect to the MySQL database. Investigating the problem showed that the MySQL socket had disappeared. The issue could be tracked down to a problem in Ubuntu when the command /etc/init.d/mysql restart is executed: due to a recent change in Ubuntu to upstart scripts this is actually a legacy command, and service mysql restart should be executed instead.
The main issue is that Ansible isn't using the correct restart command, so instead of relying on the Ansible service module, the correct command is now executed with the shell module directly. A fix for this bug will be available in Ansible 1.9.2; after the release the service module can be used again.
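As a sketch, the corresponding handler could look like this until the fixed Ansible release is available.

```yaml
# handlers/main.yml – workaround: restart MySQL with the upstart-aware command
- name: restart mysql
  shell: service mysql restart
```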
If a task in an Ansible playbook fails, the execution of the whole playbook stops. Queued handlers are lost if this happens and in the next run the handlers might not get notified again. Because of this a required restart easily gets forgotten and can cause unexpected server states (e.g. a configuration file has been updated but a restart is required for loading the updated configuration). Luckily Ansible can be forced to always run handlers, even if a task fails, with the setting force_handlers: True. This solved the problem of missed application restarts.
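The setting is applied on the play level; a minimal sketch, with assumed group and role names, is shown below.

```yaml
# sketch of a play that always runs queued handlers, even when a task fails
- hosts: web
  sudo: yes
  force_handlers: True
  roles:
    - php
    - apache
```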
The whole proof-of-concept infrastructure can be created in 18 minutes. The output of the execution can be found below.
After the playbook has been executed the WorldSkills website can be accessed on the proof-of-concept infrastructure at www.worldskills.ch.
As the whole playbook is idempotent, it can be executed repeatedly without changing the state of the servers. A run without any modifications takes 3 minutes.
All system tests are executed after the main Ansible playbook. They check content on various URLs to make sure the applications are running as expected.
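The actual system tests may be implemented with a different tool; purely as an illustration, a check of this kind could be expressed with Ansible's uri module as follows.

```yaml
# illustration only – checks that a page responds and contains expected content
- hosts: localhost
  connection: local
  tasks:
    - name: Fetch the website homepage
      uri:
        url: http://www.worldskills.ch/
        return_content: yes
      register: homepage

    - name: Verify that expected content is present
      assert:
        that:
          - "'WorldSkills' in homepage.content"
```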
The requirements outlined for the proof-of-concept are verified manually against the running proof-of-concept infrastructure.
Requirement | R01 PHP applications
---|---
Acceptance Criteria | Each PHP application is running and can be accessed with a web browser.
Verification | The website is running at http://www.worldskills.ch/ and displays information from the Events web service. Clicking on Login redirects to http://auth.worldskills.ch/ where the Auth application is running as expected.
Verdict | Pass
Requirement | R02 Java applications
---|---
Acceptance Criteria | Each Java application is running and can be accessed over HTTP/S.
Verification | The Auth and the Events web service are running. Events are listed when accessing https://api.worldskills.ch/events and organizational groups are listed on https://api.worldskills.ch/auth/ws_entities/1.
Verdict | Pass
Requirement | R03 Server configuration
---|---
Acceptance Criteria | A configuration file gets modified and the change is pushed to the code repository. The new configuration file is automatically deployed to the server and the affected applications load the new configuration.
Verification | After increasing the value of
Verdict | Pass
Requirement | R04 Application deployment
---|---
Acceptance Criteria | A new version of an application gets pushed to the code repository. Automated tests of the application are executed and if they pass the application gets deployed to the server. The old version keeps responding to requests until the new version is ready.
Verification | The new favicon is only visible after the new version of the website is completely ready.
Verdict | Pass
Requirement | R05 Staging environment
---|---
Acceptance Criteria | A new version of a web service is pushed in a separate branch; the functionality is available for testing in a staging environment within 30 minutes from the push.
Verification | A new branch called staging-timetable is created for the Events web service and pushed to GitHub. The unit tests are executed on Codeship and the Ansible playbook is checked out. A new staging environment is created on DigitalOcean and all applications are installed. The new staging environment can be used at http://staging-timetable-www.worldskills.ch/ after 19 minutes.
Verdict | Pass
Requirement | R11 Configuration files
---|---
Acceptance Criteria | All configuration files are stored in a code repository.
Verification | Configuration files are part of the Ansible playbook and stored in the code repository.
Verdict | Pass
Requirement | R12 Change history
---|---
Acceptance Criteria | Every change must be traceable by a developer. Associated with every change is an explanation.
Verification | Every change gets committed to the code repository. An explanation is included in the commit comment.
Verdict | Pass
Requirement | R13 Open Source
---|---
Acceptance Criteria | All software used for the infrastructure has to be built on open-source software. This guarantees that components can easily be ported to different providers or maintainers. It also allows other Members to easily copy parts of the infrastructure.
Verification | The playbook can be executed with Ansible which is Open Source. The required Python dependencies dopy and httplib2 are Open Source too.
Verdict | Pass
Requirement | R14 Test environment
---|---
Acceptance Criteria | To test configuration changes, the whole infrastructure or parts of it can be started in a test environment. This differs from the staging environment in that the test environment can be local and automated tests are executed against it.
Verification | The playbook can be executed locally with Vagrant. Automated tests can be executed locally too.
Verdict | Pass
Requirement | R15 Encrypted passwords
---|---
Acceptance Criteria | Server passwords should be stored only encrypted on third-party systems. The advantage of storing encrypted passwords in the code repository and sharing a key file instead of sharing the passwords in a file is that the file doesn't need to be updated for everyone each time a password is added.
Verification | Sensitive data is stored inside the Ansible Vault file.
Verdict | Pass
Requirement | R16 Superuser access
---|---
Acceptance Criteria | In case of a problem that only occurs in a certain environment, a developer needs unrestricted access to the server to debug the error and try out different solutions.
Verification | The servers on DigitalOcean can be accessed directly with SSH.
Verdict | Pass
Requirement | R17 Custom software
---|---
Acceptance Criteria | New software can be installed without restrictions. New features or analytics tools might require the installation of additional software.
Verification | Any software for Linux can be installed on the servers. As an example, rsync was installed on the servers to get data files for the website from the existing infrastructure.
Verdict | Pass
Requirement | R18 Learning curve
---|---
Acceptance Criteria | How to use the software system to install and configure the infrastructure can be learned quickly so all developers can make changes to the infrastructure without spending weeks studying it.
Verification | Ansible can be understood within one day. The development of the Ansible playbook took about two weeks.
Verdict | Pass
The proof-of-concept passes all 13 requirements that were defined for it and covers all Must and Should requirements of the infrastructure.
The project has shown that idempotence on a computer system is hard to achieve. This applies not only to Ansible but to all server management software. There are always multiple sources of changes on a modern computer system, and an almost infinite number of states exists. Reducing the configuration of the whole system to one code repository (the Ansible playbook in this case) helps by reducing dependencies.
The promise of idempotence is also not true in every case with Ansible, as some attributes require additional manual actions after a change. Changing the physical server location inside the digital_ocean module, for example, won't move the server from one location to another: the attribute is only used during creation of the server. Another example is the copy module, which makes sure that a file exists in the specified location. The module checks this with every execution; however, when the file path is changed (e.g. because it had a typo in it) it only makes sure the file exists in the new location. To delete the old file a separate task has to be written in Ansible. So the developers always have to be aware of what exactly a module does, and they need to take care of cleanup operations themselves.
Given that the whole infrastructure can be duplicated automatically, one could think of using this for automated testing of every change to the Ansible playbook. However, as the creation of such a testing environment takes around 20 minutes, every little change to the infrastructure would also take at least that long, so this option is not considered useful. Instead, the choice of creating a staging environment is left up to the developer if a certain change needs to be tested in more detail.
In summary the usage of Ansible makes the infrastructure more structured and changes more visible while keeping it flexible for future developments (like the integration of Docker).
The time required to create the staging environment is considered a potential problem, especially as it gets worse with every additional application. Most of the time is spent building the JavaScript and PHP applications. Storing compiled versions of these applications on a central server would improve the setup time of a new staging environment but requires development effort outside the scope of this project.
Another option for speeding up the creation of the staging environment would be to clone the servers completely (on system level) instead of recreating them every time with Ansible. However, this only works within the same hosting provider (and not even with all hosting providers), and one of the targets of this project was to be independent of the hosting provider. By recreating the servers each time, the developers can be sure that the playbook is ready to be used with a different hosting provider.
The staging environment is also well suited for further end-user tests and automated tests with tools like Selenium.
The evaluation showed that software containers have potential in abstracting the deployment by bundling applications into their own units and separating the infrastructure from the applications. However, adopting this technology requires changes to the applications, and the deployment needs to be orchestrated with additional software, which increases the complexity of the whole system.
Applications can be moved more easily between servers and are less prone to configuration mistakes. The advantages need to be carefully weighed in each case, and testing new approaches before implementing them in production is recommended.
The objective of creating a versioned, testable and reproducible infrastructure where all changes are visible to the whole IT team has been fully achieved. The combination of multiple technologies and services allows quick modifications and the results are visible to everyone.
Comparing the deployment pipeline before and after the infrastructure concept shows that both infrastructure changes and application deployments now follow the same steps. Manual actions on the server and custom application deployment scripts have been replaced with Ansible.
Handling changes to the infrastructure the same way as changes to applications helps the developers in their daily work as the processes and feedback loops are aligned. They get notified about modifications to the infrastructure in the chat application Slack, can review the diff on GitHub and give feedback there. The testing and deployment status is displayed in Slack as well, and the correction of failed changes can be organized right there with the developers online.
It is recommended to keep an eye on the development of software containers and to re-evaluate their usage in six months. Suitable applications should then be migrated within three months if software containers bring reasonable advantages.
Using Ansible for provisioning and deployment of the infrastructure makes sense. It is recommended to extend the proof-of-concept playbook with all applications running on the current infrastructure. Once the setup of the new infrastructure is completed a migration to it can be done with Ansible as outlined before. The targeted time for this migration is fall 2015.
As the setup of a staging environment takes some time, it is questionable how useful it is for small features. Investing in unit tests that can be executed locally is recommended here. Being able to replicate the complete infrastructure is useful for bigger features that span multiple services. It is recommended to implement the staging environment as described in the architecture documentation but use it only if needed, and to further invest in automated system tests.
Agile Orbit (2015): java Cookbook. https://supermarket.chef.io/cookbooks/java (retrieved 21. April 2015)
Ansible, Inc. (2015): Ansible Documentation. April 2015. http://docs.ansible.com/ (retrieved 20. April 2015)
Ansible, Inc. (2015): Ansible Tower Pricing. http://www.ansible.com/pricing (retrieved 13. July 2015)
Ansible, Inc. (2015): Ansible Tower. http://www.ansible.com/tower (retrieved 18. June 2015)
Ansible, Inc. (2015): Ansible. http://www.ansible.com/ (retrieved 20. April 2015)
Apache Software Foundation (2013): Apache Tomcat Maven Plugin. November 2013. Version 2.2. https://tomcat.apache.org/maven-plugin-2.2/ (retrieved 18. July 2015)
Apache Software Foundation (2015): Apache HTTP Server Project. https://httpd.apache.org/ (retrieved 19. July 2015)
Apache Software Foundation (2015): Apache Tomcat 7 Configuration Reference - The Context Container. June 2015. Version 7.0.63. https://tomcat.apache.org/tomcat-7.0-doc/config/context.html (retrieved 18. July 2015)
Ben-Kiki O., Evans C., Döt Net I. (2009): YAML Ain’t Markup Language. http://www.yaml.org/spec/1.2/spec.html (retrieved 31. May 2015)
CFEngine AS (2015): CFEngine. http://cfengine.com/ (retrieved 13. July 2015)
Chef Software, Inc. (2015): Chef. https://www.chef.io/ (retrieved 20. April 2015)
Chef Software, Inc. (2015): Supermarket. https://supermarket.chef.io/ (retrieved 21. April 2015)
Chesne A. (2014): small-n-flat. http://paomedia.github.io/small-n-flat/ (retrieved 3. November 2015)
Codeship (2015): Codeship. https://codeship.com/ (retrieved 19. July 2015)
Codeship (2015): Security Guidelines of Codeship. https://codeship.com/security (retrieved 4. June 2015)
CoreOS, Inc. (2015): rkt Documentation. https://coreos.com/rkt/docs/0.5.6 (retrieved 13. July 2015)
DeHaan M. (2014): Ansible GitHub. https://github.com/ansible/ansible (retrieved 26. July 2015)
DeHaan M. (2014): Ansible GitHub issue #9877. https://github.com/ansible/ansible/issues/9877 (retrieved 24. June 2015)
DeHaan M. (2014): Ansible GitHub issue #999. https://github.com/ansible/ansible/issues/999 (retrieved 24. June 2015)
DigitalOcean, Inc. (2015): Digital Ocean. https://www.digitalocean.com/ (retrieved 26. July 2015)
Docker, Inc. (2015): Docker. https://www.docker.com/ (retrieved 20. April 2015)
Docker, Inc. (2015): Docker Documentation. Version v1.5. https://docs.docker.com/v1.5/ (retrieved 7. April 2015)
Docker, Inc. (2015): Docker Hub. https://hub.docker.com/ (retrieved 13. July 2015)
Free Software Foundation, Inc. (2007): GNU General Public License. June 2007. Version 3. https://www.gnu.org/licenses/gpl.html (retrieved 13. July 2015)
Geerling J. (2015): Ansible for DevOps. May 2015. ISBN 978-0-9863934-0-2
GitHub, Inc. (2015): GitHub. https://github.com/ (retrieved 19. July 2015)
GitHub, Inc. (2015): GitHub Security. https://help.github.com/articles/github-security/ (retrieved 4. June 2015)
Gravi T. (2015): Docker Official Image packaging for PHP. https://github.com/docker-library/php (retrieved 7. April 2015)
Gregorio J. (2015): httplib2. https://pypi.python.org/pypi/httplib2 (retrieved 19. July 2015)
HashiCorp (2015): Vagrant. https://www.vagrantup.com/ (retrieved 13. July 2015)
Hochstein L. (2015): Ansible: Up and Running. March 2015. Early release revision 4. ISBN 063-6-920-03562-6
International Organization for Standardization (2011): ISO/IEC 25010:2011. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35733 (retrieved 13. July 2015)
Jaynes M. (2014): Taste Test: Puppet, Chef, SaltStack, Ansible. June 2014. https://devopsu.com/books/taste-test-puppet-chef-salt-stack-ansible.html (retrieved 7. April 2015)
Parallels IP Holdings GmbH (2015): Plesk. http://parallels.com/en/products/plesk (retrieved 7. April 2015)
Puppet Labs (2015): Open Source Puppet. https://puppetlabs.com/puppet/puppet-open-source (retrieved 13. July 2015)
PyPA (2014): pip. https://pip.pypa.io/en/stable/ (retrieved 13. July 2015)
(R)?ex - A simple framework to simplify system administration and datacenter automation. http://www.rexify.org/ (retrieved 13. July 2015)
SaltStack (2015): SaltStack automation for CloudOps, ITOps & DevOps at scale. http://saltstack.com/ (retrieved 13. July 2015)
SPI (2015): Apt, Advanced Package Tool. https://wiki.debian.org/Apt (retrieved 13. July 2015)
Starke G., Hruschka P. (2014): arc42 Template. March 2012. Version 6.0. http://www.arc42.de/template/ (retrieved 13. July 2015)
Stephenson S. (2013): rbenv. July 2014. https://github.com/sstephenson/rbenv (retrieved 26. July 2015)
Selenium Project (2015): Selenium. http://docs.seleniumhq.org/ (retrieved 19. July 2015)
Slack Technologies, Inc. (2015): Slack. https://slack.com/ (retrieved 19. July 2015)
The Apache Software Foundation (2004): Apache License. January 2004. Version 2.0. https://www.apache.org/licenses/LICENSE-2.0.html (retrieved 13. July 2015)
The PHP Group (2015): PEAR. http://pear.php.net/ (retrieved 13. July 2015)
The PHP Group (2015): PHP Supported Versions. https://php.net/supported-versions.php (retrieved 7. April 2015)
Turnbull J. (2015): The Docker Book. February 2015. Version v1.5.0. ISBN 978-0-9888202-3-4
Van Zoest S. (2015): apache2 Cookbook. https://supermarket.chef.io/cookbooks/apache2 (retrieved 21. April 2015)
Vasiliev A. (2014): Cooking Infrastructure by Chef. http://chef.leopard.in.ua/ (retrieved 18. April 2015)
Venezia P. (2013): Puppet vs. Chef vs. Ansible vs. Salt. http://www.infoworld.com/article/2609482/data-center/data-center-review-puppet-vs-chef-vs-ansible-vs-salt.html (retrieved 13. July 2015)
Viallet V. (2015): dopy. https://pypi.python.org/pypi/dopy (retrieved 19. July 2015)
YesLogic Pty. Ltd. (2015): Prince. http://www.princexml.com/ (retrieved 25. July 2015)
Yum Package Manager. http://yum.baseurl.org/ (retrieved 13. July 2015)
Zürcher Hochschule für Angewandte Wissenschaften (2014): Reglement Bachelorarbeit Studiengang Informatik der ZHAW am Standort Zürich. March 2015. Version 3.3. https://ebs.zhaw.ch/files/documents/informatik/Reglemente/Bachelor/Bachelorarbeit/a_Reglement-Bachelorarbeit_Studiengang-Informatik_V3.3.pdf (retrieved 25. July 2015)
To make the results of this thesis reproducible, the following list shows the versions of the software used on the development machine:
The following versions of software were used on the server: