How to read and write YAML in Rust with Serde

In this article, we will learn how to read and write YAML files in Rust with the Serde framework. Being able to read YAML files lets you, for example, use them as configuration files for your projects. This is not an in-depth treatment of Serde, just a simple example of reading YAML data from a file and writing it back.

The full code for this article can be found on my GitHub.

Prerequisites

Some understanding of Rust is required. Familiarity with YAML and its structure is also a plus.

Creating the project and dependencies

Let’s create the project and add the required dependencies.

cargo new rust-yaml-file-tutorial

Then add the following dependencies to Cargo.toml:

  • serde: a framework for serializing and deserializing Rust data structures efficiently and generically.
  • serde_yaml: a library for using the Serde serialization framework with data in the YAML file format.

[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_yaml = "0.8"

We need the derive feature so that we can put #[derive(Serialize, Deserialize)] on our own structs and let Serde generate the serialization code for them.

Read a file and deserialize

In this section, we are going to read a YAML file and deserialize it into an object.

First, let’s create the file we are going to read. Imagine we are creating a simple news scraping app and want to configure some things. For example:

  • how often a scrape happens: every 120 seconds.
  • how many threads to spin off to run in parallel: 1.
  • what urls to scrape

Create a file called config.yml in the root directory of the project and add the following contents:

update_frequency_sec: 120
num_threads: 1
data_sources:
  - news.google.com
  - finance.yahoo.com
  - www.msn.com

The cleanest way to read this data in and use it is to deserialize it into an object that matches this structure. So, let’s create a struct in main.rs that represents this data:

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Config {
    update_frequency_sec: u32,
    num_threads: u32,
    data_sources: Vec<String>,
}

We create the Config struct here and derive Debug for easy printing to the terminal, Serialize to write the object out to a file, and of course Deserialize to read a file into an object.

The names of the struct fields must match the keys in the YAML file, and each field's type must be able to hold the corresponding YAML value.

Deserializing is easy with serde_yaml:

fn main() {
    let f = std::fs::File::open("config.yml").expect("Could not open file.");
    let scrape_config: Config = serde_yaml::from_reader(f).expect("Could not read values.");

    println!("{:?}", scrape_config);
}

First, we open the YAML file with a normal file object from the standard library: std::fs::File.

Then, we pass that object to serde_yaml::from_reader() to read in the file. The type annotation on the scrape_config variable tells serde_yaml which type to deserialize the data into.

Finally, we print the object using debug print {:?}. The result should look like this:

Config { update_frequency_sec: 120, num_threads: 1, data_sources: ["news.google.com", "finance.yahoo.com", "www.msn.com"] }

Of course, we can also access the fields of this object individually:

    println!(
        "update_frequency_sec: {}",
        scrape_config.update_frequency_sec
    );

    for data_source in scrape_config.data_sources.iter() {
        println!("{}", data_source);
    }

Serialize and write to a YAML file

In this final section, we are going to change the values of the object we loaded in and then serialize and write it to a different file. Thanks to having deserialized the data to a struct, this is very easy.

We first have to change the declaration of scrape_config to let mut scrape_config so that the object becomes mutable. Then we can simply change values, or push a value onto the data_sources Vec, and then write to a YAML file.

The following lines change values in the object and then write the object to a YAML file:

    scrape_config.num_threads = 2;

    scrape_config
        .data_sources
        .push("www.nytimes.com".to_string());
    scrape_config
        .data_sources
        .push("news.yahoo.com".to_string());

    let f = std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        // Empty the file first if it already exists.
        .truncate(true)
        .open("new_config.yml")
        .expect("Couldn't open file");
    serde_yaml::to_writer(f, &scrape_config).unwrap();

This opens a file and then uses serde_yaml::to_writer to write the serialized object to it. For those who are not familiar with OpenOptions: it lets us configure what should happen when opening a file. In this case, we tell it to open the file for writing, to create the file if it does not exist, and to truncate it if it does, so that no bytes from an earlier, longer version are left behind.
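One detail of OpenOptions is easy to miss: create(true) on its own does not clear a file that already exists, so writing shorter content over longer content leaves the old tail in place. The following sketch uses only the standard library to show the difference; the helper write_with and the file name truncate_demo.yml are made up for this example.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

// Hypothetical helper: write `contents` to `path` through OpenOptions, with or
// without truncation, then return what the file holds afterwards.
fn write_with(path: &Path, contents: &str, truncate: bool) -> std::io::Result<String> {
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(truncate)
        .open(path)?;
    f.write_all(contents.as_bytes())?;
    drop(f); // close the file before reading it back
    fs::read_to_string(path)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("truncate_demo.yml");

    // Start from a longer file, then write a shorter value over it.
    fs::write(&path, "num_threads: 100000\n")?;
    let without = write_with(&path, "num_threads: 2\n", false)?;
    // The last five bytes of the old file survive.
    assert_eq!(without, "num_threads: 2\n0000\n");

    // Same starting point, this time with truncation.
    fs::write(&path, "num_threads: 100000\n")?;
    let with = write_with(&path, "num_threads: 2\n", true)?;
    assert_eq!(with, "num_threads: 2\n");

    fs::remove_file(&path)
}
```

For a config file that is rewritten repeatedly, leftover bytes like these can turn valid YAML into invalid YAML, which is why truncating on open is usually what you want.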

Deserializing and serializing arrays of objects

In our example, we have an array of URLs for what should be scraped. But what if we wanted to provide more information about the URL, such as the name of the site and what type of site it is?

We can do this by changing the array of strings to an array of objects. Let’s update the config struct to look like this:

#[derive(Debug, Serialize, Deserialize)]
struct DataSource {
    name: String,
    url: String,
    source_type: String,
}

#[derive(Debug, Serialize, Deserialize)]
struct Config {
    update_frequency_sec: u32,
    num_threads: u32,
    data_sources: Vec<DataSource>,
}

We now have an extra struct called DataSource representing the additional information we want to convey. Of course, this has implications for the existing code. Because of that, we have to update the code for reading and writing this extra information:

fn main() {
    let f = std::fs::File::open("config.yml").expect("Could not open file.");
    let mut scrape_config: Config = serde_yaml::from_reader(f).expect("Could not read values.");

    println!("{:?}", scrape_config);

    println!(
        "update_frequency_sec: {}",
        scrape_config.update_frequency_sec
    );

    for data_source in scrape_config.data_sources.iter() {
        println!(
            "name: {}, type: {}, url: {}",
            data_source.name, data_source.source_type, data_source.url
        );
    }

    scrape_config.num_threads = 2;

    scrape_config.data_sources.push(DataSource {
        name: "NYTimes".to_string(),
        url: "www.nytimes.com".to_string(),
        source_type: "news".to_string(),
    });
    scrape_config.data_sources.push(DataSource {
        name: "Yahoo News".to_string(),
        url: "news.yahoo.com".to_string(),
        source_type: "news".to_string(),
    });

    let f = std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        // Empty the file first if it already exists.
        .truncate(true)
        .open("new_config.yml")
        .expect("Couldn't open file");
    serde_yaml::to_writer(f, &scrape_config).unwrap();
}

Two things changed compared to the earlier version. Because each data_source is now a struct, the loop accesses its fields (name, source_type, and url) instead of printing a plain string. And to push new entries onto the data_sources array, we have to construct DataSource objects instead of strings.

We have to update config.yml as well to reflect the new expected structure:

update_frequency_sec: 120
num_threads: 1
data_sources:
  - name: Google News
    url: news.google.com
    source_type: news
  - name: Yahoo Finance
    url: finance.yahoo.com
    source_type: financial news
  - name: Microsoft News
    url: www.msn.com
    source_type: news

Conclusion

We now know how to read and write YAML files in Rust with the help of the Serde framework, and how to deserialize and serialize our own structs. This will be useful for bigger projects where we do not want to hard-code certain values but make them configurable instead. For an example, a crypto triangle arbitrage backend project, see my article here.

The complete code for this article can be viewed on my GitHub.
