Rust vs Python. Rust impl & final results
Hello, dear readers,
Welcome to the second part of our series comparing the performance of Rust and Python when solving the same problem. If you missed the first part, you can find it here.
Rust is a powerful language that ensures you write correct and efficient code. The official website for Rust highlights the following key benefits:
Performance
Reliability
Productivity
The only notable downside of Rust is its steep learning curve. However, the compiler provides helpful suggestions, and with time, you will get accustomed to the borrow checker and ownership model.
Rust is highly praised for its ability to avoid bugs related to manual memory management. For example, you can check out Rust vs Common C++ Bugs. Rust also allows fearless concurrency (though it is race condition-free, it is not deadlock-free), has zero-cost abstractions, and can be compiled to WebAssembly. However, let's return to our main topic. The Rust version of our project relies on several tools:
cargo as the build system
cargo-llvm-cov to measure code coverage
criterion.rs for benchmarking
The full code for the Rust version is available at msgpack-py-vs-rs.
Data model
Let's define the Item model that represents a single message:
use rmpv::Value;

#[derive(Debug, PartialEq)]
pub struct Item<'a> {
    pub id: i32,
    pub process_id: i32,
    pub thread_id: i32,
    pub timestamp_ns: i64,
    pub line: i32,
    pub value: f32,
    pub filename: &'a str,
    pub path: &'a str,
}

impl<'a> Item<'a> {
    pub fn from_list(arr: &'a [Value]) -> Item<'a> {
        Item {
            id: arr[0].as_i64().unwrap() as i32,
            process_id: arr[1].as_i64().unwrap() as i32,
            thread_id: arr[2].as_i64().unwrap() as i32,
            timestamp_ns: arr[3].as_i64().unwrap(),
            line: arr[4].as_i64().unwrap() as i32,
            value: arr[5].as_f64().unwrap() as f32,
            filename: arr[6].as_str().unwrap(),
            path: arr[7].as_str().unwrap(),
        }
    }
}
Let's recap what 'a means here. The Item struct has two fields, filename and path, which are references with the lifetime 'a. This lifetime 'a is a generic lifetime parameter that specifies how long the references in these fields are valid. The rest of the fields (id, process_id, thread_id, timestamp_ns, line, value) are not references, so they do not have lifetimes associated with them. When I create an instance of Item, the 'a lifetime must be connected to the lifetime of the data that filename and path point to. This means that the references filename and path cannot outlive the data they point to.
The method Item::from_list accepts &'a [Value] (a slice of Value) with lifetime 'a and returns Item<'a>. This prevents any potential dangling references, ensuring that the Item struct cannot outlive the data it references in the arr slice.
I'm using &str (a string slice) instead of String to avoid unnecessary copies: the actual string was already allocated inside Value during parsing!
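The borrow checker enforces this relationship at compile time. Here is a minimal standalone sketch (a plain Record struct stands in for Item, so the example compiles without rmpv):

```rust
// A struct that borrows a string slice, like `filename` and `path` in `Item`.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    name: &'a str,
}

fn main() {
    let owned = String::from("items.msgpack"); // owns the data
    let rec = Record { name: &owned };         // `rec` borrows from `owned`
    // drop(owned); // uncommenting this fails to compile: `rec` is used below,
    //              // so it would outlive the data it references
    println!("{:?}", rec);
}
```

Trying to drop (or move) the owner while the borrowing struct is still in use is exactly the kind of dangling-reference bug the lifetime 'a rules out.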
Parser
Let's define how we want to use the parser/reader from the caller side:
fn main() -> Result<()> {
    let s = Instant::now();
    let f = File::open("items.msgpack")?;
    let rdr = BufReader::new(f);
    let mut msgs: usize = 0;
    let mut sum_of_ids: i64 = 0;
    let mut parser: ItemMsgPackParser<BufReader<File>> = ItemMsgPackParser::new(rdr);
    let on_next = |value: &Item| {
        sum_of_ids += value.id as i64;
        msgs += 1;
        Ok(())
    };
    parser.parse(on_next)?;
    println!(
        "Read {msgs} messages, sum_of_ids is {sum_of_ids} in {} ms",
        s.elapsed().as_millis()
    );
    Ok(())
}
on_next is a closure that accepts a reference to Item and returns the Result<()> type.
Let's define the generic MsgPackParser type first:
pub struct MsgPackParser<R> {
    reader: R,
}

impl<R: std::io::Read> MsgPackParser<R> {
    pub fn new(reader: R) -> Self {
        MsgPackParser { reader }
    }

    pub fn parse(
        &mut self,
        mut on_next: impl FnMut(&Value) -> errors::Result<()>,
    ) -> errors::Result<()> {
        loop {
            match rmpv::decode::read_value(&mut self.reader) {
                Ok(value) => on_next(&value)?,
                Err(err) => {
                    if err.kind() != UnexpectedEof {
                        // In properly written msgpack files this should not happen, log and return error
                        warn!("Failed with err: {err}, kind: {}", err.kind());
                        return Err(errors::Error::from(err));
                    } else {
                        // Reached EOF, we can stop the loop
                        break;
                    }
                }
            }
        }
        Ok(())
    }
}
As you may have noticed, it has a generic type R that must implement std::io::Read. This way I can pass anything that implements that trait, for example std::fs::File or std::io::Cursor. This simplifies unit testing and benchmarking.
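To illustrate why this design helps testing, here is a minimal sketch (count_bytes is a trivial stand-in for the parser, assumed for illustration): any Read source works, so an in-memory Cursor can replace a real file in a unit test.

```rust
use std::io::{Cursor, Read};

// A trivial stand-in for the parser: anything implementing `Read` works,
// whether it is a `File`, an in-memory `Cursor`, or a network stream.
fn count_bytes<R: Read>(mut reader: R) -> std::io::Result<usize> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf.len())
}

fn main() -> std::io::Result<()> {
    // In a unit test, an in-memory `Cursor` replaces a real file on disk.
    let n = count_bytes(Cursor::new(b"\x01\x02\x03".as_slice()))?;
    assert_eq!(n, 3);
    println!("read {n} bytes");
    Ok(())
}
```

The same test can later be pointed at File::open(...) without touching the function under test; that is the payoff of being generic over Read.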
Note the signature of on_next:
mut on_next: impl FnMut(&Value) -> errors::Result<()>
It must be FnMut so that the provided closure on_next can be called repeatedly and may mutate state.
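A standalone sketch of that distinction (call_three_times is a hypothetical helper): a closure that mutates its captured state implements FnMut but not plain Fn, so the parser must accept FnMut.

```rust
// Calls the provided closure several times; taking `impl FnMut` allows the
// closure to mutate whatever it captures between calls.
fn call_three_times(mut f: impl FnMut(i32)) {
    for i in 0..3 {
        f(i);
    }
}

fn main() {
    let mut sum = 0;
    // This closure mutates the captured `sum`, so it is `FnMut`, not `Fn`.
    call_three_times(|i| sum += i);
    assert_eq!(sum, 3); // 0 + 1 + 2
    println!("sum = {sum}");
}
```

This mirrors the main function above, where on_next mutates msgs and sum_of_ids on every parsed message.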
MsgPackParser is too generic; I want a type that knows how to convert rmpv::Value to our Item<'a>, so let's define a type ItemMsgPackParser<R> for that. It uses MsgPackParser<R> and Item::from_list from above to parse and convert a generic Value to the Item type:
pub struct ItemMsgPackParser<R> {
    parser: MsgPackParser<R>,
}

impl<R: std::io::Read> ItemMsgPackParser<R> {
    pub fn new(reader: R) -> Self {
        ItemMsgPackParser {
            parser: MsgPackParser::new(reader),
        }
    }

    pub fn parse(
        &mut self,
        mut on_next: impl FnMut(&Item) -> errors::Result<()>,
    ) -> errors::Result<()> {
        self.parser.parse(|value| match value {
            Value::Array(arr) => {
                let item = Item::from_list(arr.as_slice());
                on_next(&item)?;
                Ok(())
            }
            other => {
                let t = match other {
                    Value::Nil => "Nil",
                    Value::Boolean(_) => "Boolean",
                    Value::Integer(_) => "Integer",
                    Value::F32(_) => "F32",
                    Value::F64(_) => "F64",
                    Value::String(_) => "String",
                    Value::Binary(_) => "Binary",
                    Value::Array(_) => "Array",
                    Value::Map(_) => "Map",
                    Value::Ext(_, _) => "Ext",
                };
                let msg = format!("Expected `Array` type but got `{}`", t);
                Err(Error::from(ErrorKind::ItemMsgPackParser(msg)))
            }
        })?;
        Ok(())
    }
}
Error handling
If you're new to Rust, you may have noticed the question mark operator ?. It unwraps a valid value, or returns the error and propagates it to the calling function; check The question mark operator, ? to understand it better. The last piece of the code is the type errors::Result. This type represents all known errors that can happen during parsing; it uses the thiserror crate to simplify error handling:
use thiserror::Error;

#[derive(Error, Debug)]
#[error(transparent)]
pub struct Error(Box<ErrorKind>);

#[derive(Error, Debug)]
pub enum ErrorKind {
    #[error("IoError: {0}")]
    IoError(#[from] std::io::Error),
    #[error("MsgPackDecodeError: {0}")]
    MsgPackDecodeError(#[from] rmpv::decode::Error),
    #[error("ItemMsgPackParser: {0}")]
    ItemMsgPackParser(String),
}

impl<E> From<E> for Error
where
    ErrorKind: From<E>,
{
    fn from(err: E) -> Self {
        Error(Box::new(ErrorKind::from(err)))
    }
}

pub type Result<T> = std::result::Result<T, Error>;
You may notice that the ErrorKind is boxed; the reason is to limit the size of Result<T>. The size of the enum ErrorKind equals its largest variant plus padding, and in some cases it can become quite large; a method returning a type that contains ErrorKind would then need to reserve that much stack space for the return value (check my article Rust: enum, boxed error and stack size mystery for a deep dive on the topic).
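A standalone sketch of the effect (the BigKind enum here is hypothetical, not the one above; its 256-byte variant exaggerates the problem to make it measurable):

```rust
use std::mem::size_of;

// A hypothetical error enum with one large inline variant.
#[allow(dead_code)]
enum BigKind {
    Io(std::io::Error),
    Message([u8; 256]), // 256-byte payload stored inline
}

#[allow(dead_code)]
struct Unboxed(BigKind);
#[allow(dead_code)]
struct Boxed(Box<BigKind>);

fn main() {
    // The unboxed wrapper must be at least as large as its biggest variant...
    assert!(size_of::<Unboxed>() >= 256);
    // ...while the boxed wrapper is just a single pointer.
    assert_eq!(size_of::<Boxed>(), size_of::<usize>());
    println!(
        "unboxed: {} bytes, boxed: {} bytes",
        size_of::<Unboxed>(),
        size_of::<Boxed>()
    );
}
```

With the box, every Result<T, Error> stays pointer-sized on the error side, regardless of how many large variants ErrorKind grows over time.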
Benchmark
Let's use criterion.rs to write a benchmark that parses msgpack from memory, the same way we did for Python. criterion.rs requires the Rust project to be a library, not a binary. Adding a new benchmark is quite easy and requires two steps:
Modify rust/msgpack-core/Cargo.toml#L23 by adding a [[bench]] section
Create a benches folder and put the source code of the bench there, using the name from the [[bench]] section
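As a sketch, the added Cargo.toml section might look like this (the bench name is an assumption; it must match the file name under benches/):

```toml
[[bench]]
name = "parsing_benchmark"  # assumed name; expects benches/parsing_benchmark.rs
harness = false             # criterion provides its own main, so disable libtest's harness
```

Setting harness = false is what lets criterion_main! take over as the benchmark entry point.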
The code of the benchmark:
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use msgpack_core::msgpack_parser::ItemMsgPackParser;
use std::io::Cursor;

fn parse_items(bytes: &[u8]) {
    let mut parser = ItemMsgPackParser::new(Cursor::new(bytes));
    parser
        .parse(|v| {
            // Let's consume `v` using `black_box` to make sure compiler won't get rid of unused arg
            black_box(v);
            Ok(())
        })
        .unwrap();
}

pub fn criterion_benchmark(c: &mut Criterion) {
    // The file contains exact same data as for Python benchmark
    let bytes = include_bytes!("../test_resources/10000_items.msgpack").to_vec();
    c.bench_function("in_memory_stream_benchmark for 10000 messages", |b| {
        b.iter(|| parse_items(bytes.as_slice()))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
Benchmark results
Initial results were quite odd on Windows: it was twice as slow as on Ubuntu 20.04 running under WSL2 on the same Windows host machine. I suspected that the memory allocator was the root cause, so I decided to add two allocators, snmalloc-rs and jemallocator, to verify the hypothesis.
The results are interesting, especially for Windows:
snmalloc-rs, which uses Microsoft's snmalloc, makes the Rust code run almost twice as fast (48.6% faster)
snmalloc-rs outperforms jemallocator on Ubuntu 24.04 LTS as well
with the default allocator, the version compiled on Ubuntu 24.04 LTS is 35% faster than the Windows 10 one
MAD is the median absolute deviation
SD is the standard deviation
Average throughput in MBytes/s is calculated from the average message size of 55.9439 bytes/msg (559439 bytes is the total size of 10000 messages)
Comparison with Python
Let's combine the best results from Python and Rust to see which is faster. The winner is Rust (🦀): it is 35 times faster than Python (🐍).
Comments and suggestions are welcome! Thank you for your time.