It is hard to imagine the DevOps world without YAML. Strictly speaking, YAML (since version 1.2) is a superset of JavaScript Object Notation (JSON), but JSON focuses more on data serialization (e.g., making data available to an API).
YAML, in contrast, plays to its strengths as a configuration language, because it is easier to read than JSON. Python programmers love YAML because, unlike JSON, it uses indentation instead of curly braces to structure objects.
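To see the relationship in practice, the following sketch parses the same data once as JSON with Python's standard json module and once as YAML. It assumes the third-party PyYAML package is installed (pip install pyyaml). Because JSON is essentially a subset of YAML, the YAML parser also reads the JSON text unchanged, at least for simple documents like this one.

```python
import json

import yaml  # third-party PyYAML package (pip install pyyaml)

# The same data written as JSON: braces, brackets, and quoted keys ...
JSON_TEXT = '{"name": "starwars collection", "publication-year": [1977, 1980, 1983]}'

# ... and as YAML: structure expressed through indentation and dashes
YAML_TEXT = """
name: starwars collection
publication-year:
  - 1977
  - 1980
  - 1983
"""

from_json = json.loads(JSON_TEXT)
from_yaml = yaml.safe_load(YAML_TEXT)
# A YAML parser also accepts the JSON text as input
json_read_as_yaml = yaml.safe_load(JSON_TEXT)

print(from_json == from_yaml == json_read_as_yaml)  # True
```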
Basic YAML Syntax
Listing 1 shows a simple YAML document. The --- string in the first line marks the start of a document; because documents can be separated this way, a single file can contain several of them. It is followed by typical key-value pairs, which will be familiar if you have used JSON. The first pair is a simple scalar with a string value, although numbers and booleans are also allowed. The key that follows holds a list, in this case of numeric values only; each list item starts with a dash and is indented with spaces.
Listing 1: YAML Objects
---
name: starwars collection
publication-year:
  - 1977
  - 1980
  - 1983
movies:
  # Only movies from the original trilogy (OT) are listed here.
  ot:
    - Episode IV - A New Hope
    - Episode V - The Empire Strikes Back
    - "Episode VI - Return of the Jedi"
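Because --- starts a new document, one file can hold several documents. The following minimal sketch assumes PyYAML is installed and uses its safe_load_all function, which yields one Python object per document; the second document (the prequel trilogy) is made up for illustration.

```python
import yaml  # third-party PyYAML package

# Two YAML documents in one stream, each introduced by ---
STREAM = """\
---
name: starwars collection
publication-year: [1977, 1980, 1983]
---
name: prequel collection
publication-year: [1999, 2002, 2005]
"""

# safe_load_all returns a generator; list() collects one dict per document
documents = list(yaml.safe_load_all(STREAM))

for doc in documents:
    print(doc["name"], doc["publication-year"])
```

By contrast, yaml.safe_load expects exactly one document and raises an error for a stream like this one.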
Do not use tabs for indentation: YAML does not allow them, and parsers reject them. By the way, strings do not need quotation marks, although you can use them, as in the final line of Listing 1. A collection of key-value pairs like this one is a dictionary. Unlike JSON, YAML also supports comments, which start with a hash mark (#) and can stand at the beginning or at the end of a line.
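These rules are easy to check with a parser. The following sketch (again assuming PyYAML is installed) shows that quoted and unquoted strings load identically, that comments are discarded, and that tab indentation raises an error.

```python
import yaml  # third-party PyYAML package

# Quoted and unquoted scalars load as the same Python string
items = yaml.safe_load('- Episode IV - A New Hope\n- "Episode IV - A New Hope"\n')
print(items[0] == items[1])  # True

# Comments, on their own line or at the end of a line, are discarded
doc = yaml.safe_load("# full-line comment\nname: starwars collection  # trailing comment\n")
print(doc)  # {'name': 'starwars collection'}

# A tab used for indentation is rejected by the parser
try:
    yaml.safe_load("movies:\n\tot: []\n")
except yaml.YAMLError as err:
    print("tab rejected:", type(err).__name__)
```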
To process the data stored in this way with Python, you can use the PyYAML module, which converts YAML mappings into Python dictionary (dict) objects that you can then process further according to your own requirements. Listing 2 shows a simple Python script that reads the data from the starwars.yaml file and formats it before output.
Listing 2: starwars.py
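#!/usr/bin/env python3

import yaml
from yaml.loader import SafeLoader

# SafeLoader builds only plain Python objects (dict, list, str, int, ...)
with open('starwars.yaml') as f:
    sw = yaml.load(f, Loader=SafeLoader)

# Print the data as block-style YAML with four-space indentation
print(yaml.dump(sw, indent=4, default_flow_style=False))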
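Once loaded, the data is ordinary Python: YAML mappings become dict objects and sequences become list objects. The following minimal sketch assumes PyYAML is installed, embeds Listing 1 as a string (with the publication-year key used in the yq examples later on) instead of reading starwars.yaml, and uses yaml.safe_load, the shorthand for yaml.load(..., Loader=SafeLoader).

```python
import yaml  # third-party PyYAML package

# Listing 1 as an inline string instead of the starwars.yaml file
STARWARS_YAML = """\
---
name: starwars collection
publication-year:
  - 1977
  - 1980
  - 1983
movies:
  # Only movies from the original trilogy (OT) are listed here.
  ot:
    - Episode IV - A New Hope
    - Episode V - The Empire Strikes Back
    - "Episode VI - Return of the Jedi"
"""

sw = yaml.safe_load(STARWARS_YAML)

# Mappings become dicts, sequences become lists
print(type(sw).__name__, type(sw["movies"]["ot"]).__name__)  # dict list

# Walk the nested structure like any other Python data
for year, title in zip(sw["publication-year"], sw["movies"]["ot"]):
    print(year, title)
```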
Command-Line YAML Parser
The yq parser (see the yq documentation) is a very good choice for processing configuration files written in YAML. Because it is modeled on the well-known JSON processor jq, it uses very similar syntax. A nice side effect is that you can process JSON data with yq, as well. If your Linux distribution does not offer a prebuilt yq package, simply download the binary from the project's GitHub releases page:
wget https://github.com/mikefarah/yq/releases/download/v4.14.1/yq_linux_amd64 -O ~/bin/yq
chmod u+x ~/bin/yq
On macOS, you can also install yq with the Homebrew package manager (brew install yq).
Searching YAML Documents
A typical task when processing YAML files is to search for a specific key and the value assigned to it. For example, to list all publication years in the starwars.yaml file, use:
yq eval ".publication-year[]" starwars.yaml
If you only want to know when the first movie was released, access the first element of the list by its index, 0:
yq eval ".publication-year[0]" starwars.yaml
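For comparison, here is what the two yq expressions amount to on the parsed data in Python. The dict below is a hand-written stand-in for the part of Listing 1 that these queries touch, as a YAML parser would return it; no YAML library is needed.

```python
# Part of Listing 1 as the dict a YAML parser would return for it
sw = {
    "name": "starwars collection",
    "publication-year": [1977, 1980, 1983],
}

# yq eval ".publication-year[]"  -> every element of the list, one per line
for year in sw["publication-year"]:
    print(year)

# yq eval ".publication-year[0]" -> only the first element (indexes start at 0)
first = sw["publication-year"][0]
print(first)  # 1977
```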
Listing 3 contains a slightly more extensive YAML document that describes a Kubernetes pod. For an initial overview of the top-level keys in this file, run the command:
yq eval keys pod.yaml
Listing 3: Kubernetes Pod in YAML
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: db1-container
      image: k8s.gcr.io/busybox
      env:
        - name: DB_URL
          value: postgres://db_url:5431
    - name: db2-container
      image: k8s.gcr.io/busybox
      env:
        - name: DB_URL
          value: postgres://db_url:5432
You can view a list of all container names with:
yq eval ".spec.containers[].name" pod.yaml
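The keys and .spec.containers[].name queries map directly onto dict and list operations. In this sketch, Listing 3 is written out by hand as the nested structure a YAML parser would produce; the comments name the yq command each line corresponds to.

```python
# Listing 3 as the nested dict/list structure a YAML parser would produce
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "my-pod"},
    "spec": {
        "containers": [
            {
                "name": "db1-container",
                "image": "k8s.gcr.io/busybox",
                "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}],
            },
            {
                "name": "db2-container",
                "image": "k8s.gcr.io/busybox",
                "env": [{"name": "DB_URL", "value": "postgres://db_url:5432"}],
            },
        ]
    },
}

# yq eval keys pod.yaml -> the top-level keys
top_level_keys = list(pod.keys())
print(top_level_keys)  # ['apiVersion', 'kind', 'metadata', 'spec']

# yq eval ".spec.containers[].name" pod.yaml -> one name per container
names = [c["name"] for c in pod["spec"]["containers"]]
print(names)  # ['db1-container', 'db2-container']
```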
The following command filters the values with the select function and returns only the one ending in 32:
yq eval '.spec.containers[].env[].value | select(. == "*32")' pod.yaml
postgres://db_url:5432
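In this yq expression, the "*" in the comparison string acts as a wildcard. To illustrate the same filtering step in Python, the sketch below swaps in the standard fnmatch module (fnmatchcase, so matching is case-sensitive on every platform) as a stand-in for yq's wildcard comparison; the data is the relevant part of Listing 3, written by hand.

```python
from fnmatch import fnmatchcase

# Only the part of Listing 3 that the select example touches
containers = [
    {"name": "db1-container", "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}]},
    {"name": "db2-container", "env": [{"name": "DB_URL", "value": "postgres://db_url:5432"}]},
]

# .spec.containers[].env[].value -> flatten every env value of every container
values = [e["value"] for c in containers for e in c["env"]]

# select(. == "*32") -> keep only values matching the glob pattern *32
matches = [v for v in values if fnmatchcase(v, "*32")]
print(matches)  # ['postgres://db_url:5432']
```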
Validating Values
The length function can be quite useful for validating certain values; for example, the following command returns the length of each container name:
yq eval ".spec.containers[].name | length" pod.yaml
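A length check becomes a validation once it is compared against a limit. The sketch below computes the same name lengths in Python and then applies a check that is our own illustration, not part of the yq example: Kubernetes container names must be DNS labels, which are limited to 63 characters.

```python
# Container names from Listing 3
containers = [{"name": "db1-container"}, {"name": "db2-container"}]

# .spec.containers[].name | length -> character count of each name
lengths = [len(c["name"]) for c in containers]
print(lengths)  # [13, 13]

# Illustrative check: DNS labels (used for container names) allow at most 63 characters
MAX_NAME_LENGTH = 63
too_long = [c["name"] for c in containers if len(c["name"]) > MAX_NAME_LENGTH]
print(too_long or "all names OK")
```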
A YAML template can also be easily modified by yq to create configurations for different environments. For example, if you want to insert the hostname of your production database into the DB_URL variable of the first container in the pod.yaml template, you can use:
yq eval '.spec.containers[0].env[0].value = "postgres://prod.example.com:5431"' pod.yaml > prod-pod.yaml
So that the change does not just appear on the screen, the command redirects the modified YAML document to a new file named prod-pod.yaml. The following command confirms that the file contains the modification:
yq eval ".spec.containers[0].env[0].value" prod-pod.yaml
postgres://prod.example.com:5431
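The key idea of the template approach is that the template itself stays untouched and each environment gets its own copy. In Python, that corresponds to a deep copy before the assignment; the data below is the relevant part of Listing 3, written by hand.

```python
import copy

# The relevant part of the pod.yaml template (Listing 3) as parsed data
template = {
    "spec": {
        "containers": [
            {"name": "db1-container", "env": [{"name": "DB_URL", "value": "postgres://db_url:5431"}]},
            {"name": "db2-container", "env": [{"name": "DB_URL", "value": "postgres://db_url:5432"}]},
        ]
    }
}

# Deep-copy first so nested lists and dicts are not shared with the template
prod = copy.deepcopy(template)
prod["spec"]["containers"][0]["env"][0]["value"] = "postgres://prod.example.com:5431"

print(prod["spec"]["containers"][0]["env"][0]["value"])      # postgres://prod.example.com:5431
print(template["spec"]["containers"][0]["env"][0]["value"])  # postgres://db_url:5431
```

With PyYAML installed, yaml.safe_dump(prod, f) could then write the copy to its own file, much like the shell redirection to prod-pod.yaml.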
With Kubernetes, this capability proves extremely useful, because it lets you change existing configurations immediately. For example, you can pipe the output of the Kubernetes kubectl tool through yq (kubectl ... | yq eval ...) and then use kubectl apply to send the result straight back to Kubernetes.
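A Python step could sit in the same kind of pipeline, because kubectl can print a resource as JSON (-o json) and kubectl apply -f - reads a manifest from standard input. The function below is a hypothetical sketch of such a filter: set_in_json, the shortened manifest, and the image name registry.example.com/busybox:latest are all made up for illustration.

```python
import json


def set_in_json(text, path, value):
    """Parse a JSON document, replace the value at `path`, return new JSON text.

    `path` is a list of dict keys and list indexes, the equivalent of a yq
    path such as .spec.containers[0].image.
    """
    doc = json.loads(text)
    node = doc
    for step in path[:-1]:
        node = node[step]  # descend through dicts (by key) and lists (by index)
    node[path[-1]] = value
    return json.dumps(doc, indent=2)


# Hypothetical, heavily shortened input as "kubectl get pod ... -o json" might print it
manifest = '{"kind": "Pod", "spec": {"containers": [{"name": "db1-container", "image": "k8s.gcr.io/busybox"}]}}'
print(set_in_json(manifest, ["spec", "containers", 0, "image"], "registry.example.com/busybox:latest"))
```

Wrapped in a small script that reads sys.stdin and prints the result, the function could be used in a pipeline along the lines of kubectl get pod my-pod -o json | python3 filter.py | kubectl apply -f - (filter.py being a hypothetical name). The API server still decides whether a change is allowed: for a running Pod, the container image can be updated, whereas most other fields, such as env, cannot.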
This article originally appeared in ADMIN magazine and is reprinted here with permission.