YAML: The Missing Battery in Python


Python is often marketed as a batteries-included language because it comes with almost everything you’d ever expect from a programming language. This statement is mostly true, as the standard library and the external modules cover a broad spectrum of programming needs. However, Python lacks built-in support for the YAML data format, commonly used for configuration and serialization, despite clear similarities between the two languages.

In this tutorial, you’ll learn how to work with YAML in Python using the available third-party libraries, with a focus on PyYAML. If you’re new to YAML or haven’t used it in a while, then you’ll have a chance to take a quick crash course before diving deeper into the topic.

In this tutorial, you’ll learn how to:

Read and write YAML documents in Python
Serialize Python’s built-in and custom data types to YAML
Safely read YAML documents from untrusted sources
Take control of parsing YAML documents at a lower level

Later, you’ll learn about YAML’s advanced, potentially dangerous features and how to protect yourself from them. To parse YAML at a low level, you’ll build a syntax highlighter tool and an interactive preview in HTML. Finally, you’ll take advantage of custom YAML tags to extend the data format’s syntax.

To get the most out of this tutorial, you should be familiar with object-oriented programming in Python and know how to create a class. If you’re ready to jump in, then you can follow the link below to get the source code for the examples that you’ll code in this tutorial:

Get Source Code: Click here to get the source code you’ll use to work with YAML in Python.

Taking a Crash Course in YAML

In this section, you’re going to learn the basic facts about YAML, including its uses, syntax, and some of its unique and powerful features. If you’ve worked with YAML before, then you can skip ahead and continue reading from the next section, which covers using YAML in Python.

Historical Context

YAML, which rhymes with camel, is a recursive acronym that stands for YAML Ain’t Markup Language because it’s not a markup language! Interestingly enough, the original draft for the YAML specification defined the language as Yet Another Markup Language, but later the current backronym was adopted to more accurately describe the language’s purpose.

An actual markup language, such as Markdown or HTML, lets you annotate text with formatting or processing instructions intermixed with your content. Markup languages are, therefore, primarily concerned with text documents, whereas YAML is a data serialization format that integrates well with common data types native to many programming languages. There’s no inherent text in YAML, only data to represent.

YAML was originally meant to simplify Extensible Markup Language (XML), but in reality, it has a lot more in common with JavaScript Object Notation (JSON). In fact, it’s a superset of JSON.

Even though XML was initially designed to be a metalanguage for creating markup languages for documents, people quickly adopted it as the standard data serialization format. The HTML-like syntax of angle brackets made XML look familiar. Suddenly, everyone wanted to use XML as their configuration, persistence, or messaging format.

As the first kid on the block, XML dominated the scene for many years. It became a mature and trusted data interchange format and helped shape new concepts like building interactive web applications. After all, the letter X in AJAX, a technique for getting data from the server without reloading the page, stands for none other than XML.

Ironically, it was AJAX that ultimately led to XML’s decline in popularity. The verbose, complex, and redundant XML syntax wasted a lot of bandwidth when data was sent over the network. Parsing XML documents in JavaScript was slow and tedious because of XML’s fixed document object model (DOM), which wouldn’t match the application’s data model. The community finally acknowledged that they’d been using the wrong tool for the job.

That’s when JSON entered the picture. It was built from the ground up with data serialization in mind. Web browsers could parse it effortlessly because JSON is a subset of JavaScript, which they already supported. Not only was JSON’s minimalistic syntax appealing to developers, but it also made porting to other platforms easier than XML did. To this day, JSON remains the slimmest, fastest, and most versatile textual data interchange format on the Internet.

YAML came into existence the same year as JSON, and by pure coincidence, it was almost a complete superset of JSON on the syntactical and semantic levels. Starting from YAML 1.2, the format officially became a strict superset of JSON, meaning that every valid JSON document also happens to be a YAML document.

In practice, however, the two formats look different, as the YAML specification puts more emphasis on human readability by adding a lot more syntactic sugar and features on top of JSON. As a result, YAML is more applicable to configuration files edited by hand rather than as a transport layer.

Comparison With XML and JSON

If you’re familiar with XML or JSON, then you might be wondering what YAML brings to the table. All three are major data interchange formats, which share some overlapping features. For example, they’re all text based and more or less human readable. At the same time, they differ in many respects, which you’ll find out next.

Note: There are other, less popular textual data formats like TOML, which the new build system in Python is based on. Currently, only external packaging and dependency management tools like Poetry can read TOML, but Python 3.11 will soon have a TOML parser in the standard library.

Common binary data serialization formats you’d find in the wild include Google’s Protocol Buffers and Apache’s Avro.

Now have a look at a sample document expressed in all three data formats but representing the same person. You can click to expand the collapsible sections and reveal data serialized in those formats:

<?xml version=”1.0″ encoding=”UTF-8″ ?>
<person firstName=“John” lastName=“Doe”>
<person firstName=“Jane” lastName=“Doe”>
<dateOfBirth/> <!- This is a comment –>

“person”: {
“dateOfBirth”: “1969-12-31”,
“firstName”: “John”,
“lastName”: “Doe”,
“married”: true,
“spouse”: {
“dateOfBirth”: null,
“firstName”: “Jane”,
“lastName”: “Doe”

Read the full article at https://realpython.com/python-yaml/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]