Updating Wren's benchmarks

2023Q2.

Wren is a "small, fast, class-based, concurrent scripting language", originally designed by Bob Nystrom (who you might recognise as the author of Game Programming Patterns and Crafting Interpreters). It's a really fun language to study - the implementation is compact and easily readable, and although class-based languages aren't considered very hip these days, there's a real elegance to its design. I noticed Wren's performance page hadn't been updated in a very long time and, especially given the recent upstream interpreter performance work on Python, I was interested in seeing how performance on these microbenchmarks has changed. Hence this quick post to share some new numbers.

New results

To cut to the chase, here are the results I get running the same set of benchmarks with Wren and a collection of Python, Ruby, and Lua versions (those available in current Arch Linux).

Method Call:

wren0.4: 0.079s
luajit2.1 -joff: 0.090s
ruby2.7: 0.102s
ruby3.0: 0.104s
lua5.4: 0.123s
lua5.3: 0.156s
python3.11: 0.170s
lua5.2: 0.184s
mruby: 0.193s
python3.10: 0.313s

Delta Blue:

wren0.4: 0.086s
python3.11: 0.106s
python3.10: 0.202s

Binary Trees:

luajit2.1 -joff: 0.073s
ruby2.7: 0.113s
ruby3.0: 0.115s
python3.11: 0.137s
lua5.4: 0.138s
wren0.4: 0.144s
mruby: 0.163s
python3.10: 0.186s
lua5.3: 0.195s
lua5.2: 0.196s

Recursive Fibonacci:

luajit2.1 -joff: 0.055s
lua5.4: 0.090s
ruby2.7: 0.109s
ruby3.0: 0.117s
lua5.3: 0.126s
lua5.2: 0.138s
wren0.4: 0.148s
python3.11: 0.157s
mruby: 0.185s
python3.10: 0.252s

I've used essentially the same presentation and methodology as in the original benchmark, partly to save time pondering the optimal approach, partly so I can redirect any critiques to the original author (sorry Bob!). Benchmarks do not measure interpreter startup time, and each benchmark is run ten times with the median used (thermal throttling could potentially mean this isn't the best methodology, but changing the number of test repetitions to e.g. 1000 seems to have little effect).
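
In other words, the core timing for each runner boils down to something like the following (a trimmed-down sketch of the full script in the appendix; time_benchmark is just an illustrative name):

import statistics
import subprocess


def time_benchmark(cmdline, repetitions=10):
    # Each benchmark prints its elapsed time as the trailing "...: <seconds>"
    # on stdout, so parse that rather than timing the whole process (meaning
    # interpreter startup isn't included in the measurement).
    times = []
    for _ in range(repetitions):
        bench_out = subprocess.run(
            cmdline, capture_output=True, check=True, encoding="utf-8"
        ).stdout
        times.append(float(bench_out.split(": ")[-1].strip()))
    return statistics.median(times)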

The tests were run on a machine with an AMD Ryzen 9 5950X processor. Wren 0.4 as of commit c2a75f1 was used, along with the following Arch Linux packages:

The Python 3.10 and 3.11 packages were compiled with the same GCC version (12.2.1 according to python -VV), though this won't necessarily be true for all the other packages (e.g. the lua52 and lua53 packages are several years old, so will have been built with an older GCC).
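
As an aside, if you'd rather not squint at python -VV output, the same build information can be queried from within Python (illustrative snippet only, not part of the benchmarking script):

import platform
import sys

# platform.python_compiler() reports the compiler recorded when the
# interpreter was built, e.g. "GCC 12.2.1 ..." for the packages above.
print(platform.python_compiler())
# sys.version contains the same string that `python -VV` prints.
print(sys.version)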

I've submitted a pull request to update the Wren performance page.

Old results

The following results are copied from the Wren performance page (archive.org link) for ease of comparison. They were run on a MacBook Pro with a 2.3GHz Intel Core i7, using Lua 5.2.3, LuaJIT 2.0.2, Python 2.7.5, Python 3.3.4, and ruby 2.0.0p247.

Method Call:

wren2015: 0.12s
luajit2.0 -joff: 0.16s
ruby2.0: 0.20s
lua5.2: 0.35s
python3.3: 0.78s
python2.7: 0.85s

DeltaBlue:

wren2015: 0.13s
python3.3: 0.48s
python2.7: 0.57s

Binary Trees:

luajit2.0 -joff: 0.11s
wren2015: 0.22s
ruby2.0: 0.24s
python2.7: 0.37s
python3.3: 0.38s
lua5.2: 0.52s

Recursive Fibonacci:

luajit2.0 -joff: 0.10s
wren2015: 0.20s
ruby2.0: 0.22s
lua5.2: 0.28s
python2.7: 0.51s
python3.3: 0.57s

Observations

A few takeaways:

Appendix: Benchmark script

Health warning: this is incredibly quick and dirty (especially the repeated switching between the python packages to allow testing both 3.10 and 3.11):

#!/usr/bin/env python3

# Copyright Muxup contributors.
# Distributed under the terms of the MIT license, see LICENSE for details.
# SPDX-License-Identifier: MIT
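
# Quick-and-dirty benchmark driver. It assumes it is being run from the root
# of a Wren checkout (hence the ./test/benchmark/ and ./bin/wren_test paths
# below) with the various interpreters installed, and writes the generated
# chart markup to out.md.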

import statistics
import subprocess

out = open("out.md", "w", encoding="utf-8")


def run_single_bench(bench_name, bench_file, runner_name):
    bench_file = "./test/benchmark/" + bench_file
    if runner_name == "lua5.2":
        bench_file += ".lua"
        cmdline = ["lua5.2", bench_file]
    elif runner_name == "lua5.3":
        bench_file += ".lua"
        cmdline = ["lua5.3", bench_file]
    elif runner_name == "lua5.4":
        bench_file += ".lua"
        cmdline = ["lua5.4", bench_file]
    elif runner_name == "luajit2.1 -joff":
        bench_file += ".lua"
        cmdline = ["luajit", "-joff", bench_file]
    elif runner_name == "mruby":
        bench_file += ".rb"
        cmdline = ["mruby", bench_file]
    elif runner_name == "python3.10":
        bench_file += ".py"
        subprocess.run(
            [
                "sudo",
                "pacman",
                "-U",
                "--noconfirm",
                "/var/cache/pacman/pkg/python-3.10.10-1-x86_64.pkg.tar.zst",
            ],
            check=True,
        )
        cmdline = ["python", bench_file]
    elif runner_name == "python3.11":
        bench_file += ".py"
        subprocess.run(
            [
                "sudo",
                "pacman",
                "-U",
                "--noconfirm",
                "/var/cache/pacman/pkg/python-3.11.3-1-x86_64.pkg.tar.zst",
            ],
            check=True,
        )
        cmdline = ["python", bench_file]
    elif runner_name == "ruby2.7":
        bench_file += ".rb"
        cmdline = ["ruby-2.7", bench_file]
    elif runner_name == "ruby3.0":
        bench_file += ".rb"
        cmdline = ["ruby", bench_file]
    elif runner_name == "wren0.4":
        bench_file += ".wren"
        cmdline = ["./bin/wren_test", bench_file]
    else:
        raise SystemExit("Unrecognised runner")

    times = []
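    # Each benchmark prints its elapsed time as the trailing "...: <seconds>"
    # on stdout; parse that (so interpreter startup isn't measured) and take
    # the median of 10 runs.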
    for _ in range(10):
        bench_out = subprocess.run(
            cmdline, capture_output=True, check=True, encoding="utf-8"
        ).stdout
        times.append(float(bench_out.split(": ")[-1].strip()))
    return statistics.median(times)


def do_bench(name, file_base, runners):
    results = {}
    for runner in runners:
        results[runner] = run_single_bench(name, file_base, runner)
    results = dict(sorted(results.items(), key=lambda kv: kv[1]))
    longest_result = max(results.values())
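    # Emit a simple HTML bar chart, with each bar's width scaled relative to
    # the slowest runner, matching the presentation on the Wren performance
    # page.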
    out.write(f"**{name}**:\n")
    out.write('<table class="chart">\n')
    for runner, result in results.items():
        percent = round((result / longest_result) * 100)
        out.write(
            f"""\
  <tr>
    <th>{runner}</th><td><div class="chart-bar" style="width: {percent}%;">{result:.3f}s&nbsp;</div></td>
  </tr>\n"""
        )
    out.write("</table>\n\n")


all_runners = [
    "lua5.2",
    "lua5.3",
    "lua5.4",
    "luajit2.1 -joff",
    "mruby",
    "python3.10",
    "python3.11",
    "ruby2.7",
    "ruby3.0",
    "wren0.4",
]
do_bench("Method Call", "method_call", all_runners)
do_bench("Delta Blue", "delta_blue", ["python3.10", "python3.11", "wren0.4"])
do_bench("Binary Trees", "binary_trees", all_runners)
do_bench("Recursive Fibonacci", "fib", all_runners)
print("Output written to out.md")