
Performance

This page summarizes results of Fusion performance benchmarks.

Test Environment

Component   Specification
CPU         AMD Ryzen 9 9950X3D 16-Core Processor
RAM         96 GB DDR5
OS          Windows 11
.NET        10.0.1

Note that Ryzen 9 9950X3D has 32 logical cores due to SMT.

Run-PerformanceTest.cmd from the Fusion test suite

The benchmark measures throughput of a simple repository-style user lookup service (UserService.Get(userId)) that retrieves user records from a database. The test compares two scenarios:

  1. With Fusion: UserService.Get is a [ComputeMethod], so its results are cached, and the majority of database calls are avoided (except for the calls that happen right after a mutation invalidates a cached result).

  2. Without Fusion: UserService.Get is a regular method, so every call to it executes a simple SQL query.
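
Fusion's real [ComputeMethod] machinery does much more than this (dependency tracking, distributed invalidation, consistent snapshots), but the effect the benchmark measures reduces to a simple idea: memoize lookups and invalidate on mutation. Below is a conceptual Python sketch of that idea only; UserService, the dictionary "database", and every name in it are invented for illustration and are not Fusion's API.

```python
class UserService:
    """Conceptual model of a compute-method cache: results are memoized
    per user id and invalidated on mutation. NOT Fusion's API -- just
    the caching behavior the benchmark exercises."""

    def __init__(self, db):
        self._db = db       # a dict standing in for the real database
        self._cache = {}    # user_id -> cached user record

    def get(self, user_id):
        # "With Fusion" path: repeat lookups are served from memory;
        # only a cache miss touches the database ("runs the SQL query").
        if user_id not in self._cache:
            self._cache[user_id] = self._db[user_id]
        return self._cache[user_id]

    def update(self, user_id, name):
        # A mutation writes through and invalidates the cached entry,
        # so the next get() re-reads from the database.
        self._db[user_id] = {"id": user_id, "name": name}
        self._cache.pop(user_id, None)

db = {1: {"id": 1, "name": "Alice"}}
svc = UserService(db)
svc.get(1)                 # misses the cache, hits the database
svc.get(1)                 # served from the cache
svc.update(1, "Bob")       # invalidates the cached entry
assert svc.get(1)["name"] == "Bob"
```

In the benchmark's "multiple readers, 1 mutator" scenario, the readers correspond to the get() calls and the mutator to update(); only the lookups immediately following an update() reach the database.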

Test Scenarios

  • Multiple readers, 1 mutator: Simulates a realistic high-intensity workload with ~640 concurrent reader tasks (20 per CPU core) performing lookups, while a single mutator task periodically updates random user records. This tests how well Fusion handles cache invalidation under concurrent load.

  • Single reader, no mutators: A single task performs sequential lookups with no concurrent mutations. This measures the peak lookup throughput per CPU core.

The test uses a pool of 1,000 pre-populated user records. Each run performs multiple iterations, and the best result from 3 runs is reported.

Results

Multiple Readers + 1 Mutator (all cores)

Test             SQLite            PostgreSQL
Without Fusion   155.68K calls/s   38.61K calls/s
With Fusion      316.34M calls/s   313.75M calls/s
Speedup          2,032x            8,126x

Single Reader, No Mutators

Test             SQLite            PostgreSQL
Without Fusion   55.70K calls/s    1.78K calls/s
With Fusion      19.54M calls/s    19.66M calls/s
Speedup          351x              11,045x
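
The Speedup rows are simply the ratio of the two throughput figures; recomputing them shows the tables are internally consistent:

```python
def speedup(with_fusion, without_fusion):
    """Speedup = Fusion throughput / plain throughput (both in calls/s)."""
    return with_fusion / without_fusion

# Multiple readers + 1 mutator
assert round(speedup(316.34e6, 155.68e3)) == 2032    # SQLite
assert round(speedup(313.75e6, 38.61e3)) == 8126     # PostgreSQL

# Single reader, no mutators
assert round(speedup(19.54e6, 55.70e3)) == 351       # SQLite
assert round(speedup(19.66e6, 1.78e3)) == 11045      # PostgreSQL
```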

Key Observations

  • With Fusion + concurrent readers: ~315M calls/s regardless of the database, because most calls are served from Fusion's in-memory cache. This is roughly 2,000x faster than direct SQLite access and 8,000x faster than direct PostgreSQL access.

  • Without Fusion: Performance is entirely database-bound. SQLite (in-process) outperforms PostgreSQL (network round-trip) significantly, especially for single-threaded access.

  • Concurrent access amplifies the difference: With many readers, Fusion's lock-free cache scales linearly with CPU cores, while database access becomes the bottleneck.

Benchmark.cmd from ActualLab.Fusion.Samples

The benchmark measures throughput of a simple repository-style user lookup service that retrieves and updates user records from a database: UserService.Get(userId) and Update(userId, ...).

Local Services

Test              Result            Speedup
Regular Service   135.44K calls/s   -
Fusion Service    266.58M calls/s   ~1,968x

Remote Services

Test                                    Result            Speedup
HTTP Client → Regular Service           100.72K calls/s   -
HTTP Client → Fusion Service            431.35K calls/s   ~4.3x
ActualLab.Rpc Client → Fusion Service   6.92M calls/s     ~69x
Fusion Client → Fusion Service          226.73M calls/s   ~2,251x

RpcBenchmark.cmd from ActualLab.Fusion.Samples

This benchmark compares ActualLab.Rpc with gRPC, SignalR, and other RPC frameworks. The tables below include only ActualLab.Rpc, gRPC, and SignalR; other options, such as StreamJsonRpc and a RESTful API, are significantly slower, so they are omitted.

Calls

Test       ActualLab.Rpc   gRPC            SignalR         Speedup
Sum        9.33M calls/s   1.11M calls/s   5.30M calls/s   1.8..8.4x
GetUser    8.37M calls/s   1.10M calls/s   4.43M calls/s   1.9..7.6x
SayHello   5.99M calls/s   1.04M calls/s   2.25M calls/s   2.7..5.8x

Streams

Test        ActualLab.Rpc     gRPC              SignalR           Speedup
Stream1     101.17M items/s   39.59M items/s    17.17M items/s    2.6..5.9x
Stream100   47.53M items/s    21.19M items/s    14.00M items/s    2.2..3.4x
Stream10K   955.44K items/s   691.20K items/s   460.80K items/s   1.4..2.1x

Test names indicate item size: Stream1 = 1-byte items, Stream100 = 100-byte items, Stream10K = 10KB items.

Throughput (items/s × item size)

Test        ActualLab.Rpc   gRPC         SignalR
Stream1     101.17 MB/s     39.59 MB/s   17.17 MB/s
Stream100   4.75 GB/s       2.12 GB/s    1.40 GB/s
Stream10K   9.78 GB/s       7.08 GB/s    4.72 GB/s
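
These figures follow from the items/s table by multiplying by the item size, where "10KB" means 10 × 1024 bytes and MB/GB are decimal units. A quick Python check of the ActualLab.Rpc column:

```python
# Throughput = items/s x item size.
# "10KB" items are 10 * 1024 bytes; MB/GB are decimal (1e6 / 1e9 bytes).
stream1   = 101.17e6 * 1            # 1-byte items
stream100 = 47.53e6  * 100          # 100-byte items
stream10k = 955.44e3 * 10 * 1024    # 10KB items

assert round(stream1 / 1e6, 2) == 101.17   # MB/s
assert round(stream100 / 1e9, 2) == 4.75   # GB/s
assert round(stream10k / 1e9, 2) == 9.78   # GB/s
```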

Docker-Based RPC Benchmarks

These benchmarks run in Docker containers with CPU limits to measure 4-core server performance. The server container is limited to 4 CPUs, while client containers have 24 CPUs available, ensuring the server is the bottleneck. This setup matches grpc_bench; the SayHello scenario with gRPC is identical to what grpc_bench measures.
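
The page doesn't show the container configuration; a minimal docker-compose sketch of such an asymmetric CPU limit (service and image names here are placeholders, not taken from the benchmark) might look like:

```yaml
# Sketch only: image and service names are placeholders.
services:
  server:
    image: rpc-benchmark-server   # placeholder
    cpus: 4     # cap the server at 4 CPUs so it is the bottleneck
  client:
    image: rpc-benchmark-client   # placeholder
    cpus: 24    # give the clients enough CPU to saturate the server
```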

Docker Calls

Framework       Sum               GetUser           SayHello
ActualLab.Rpc   1.49M calls/s     1.40M calls/s     1.13M calls/s
SignalR         1.31M calls/s     1.14M calls/s     667.69K calls/s
gRPC            480.48K calls/s   476.97K calls/s   447.06K calls/s
MagicOnion      453.41K calls/s   448.39K calls/s   417.47K calls/s
StreamJsonRpc   279.14K calls/s   236.43K calls/s   107.29K calls/s
HTTP            164.10K calls/s   156.26K calls/s   129.30K calls/s

Docker Streams

Test names indicate item size: Stream1 = 1-byte items, Stream100 = 100-byte items, Stream10K = 10KB items.

Framework       Stream1           Stream100         Stream10K
ActualLab.Rpc   34.24M items/s    15.56M items/s    432.72K items/s
gRPC            12.60M items/s    6.15M items/s     259.20K items/s
SignalR         5.28M items/s     3.93M items/s     202.32K items/s
StreamJsonRpc   144.00K items/s   144.00K items/s   86.40K items/s

Reference: Redis Benchmark

Reference benchmark using the redis-benchmark tool on the same machine (500K requests, best of 5 runs). The optimal client count (12) was determined via binary search over the 1-1000 range.

Operation     Result
PING_INLINE   231.59K req/s
GET           229.25K req/s
SET           229.67K req/s
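
Assuming throughput is roughly unimodal in the client count (it rises until clients saturate the server, then falls under contention), a narrowing search like the one described above can be sketched as follows. measure() is a hypothetical stand-in for launching redis-benchmark with a given number of clients; this is an illustration, not the actual harness used here.

```python
def find_best_client_count(measure, lo=1, hi=1000):
    """Narrow [lo, hi] toward the client count with peak throughput,
    assuming throughput is unimodal in the client count.
    `measure(n)` is a hypothetical hook that runs the benchmark with
    n clients and returns requests/s."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if measure(m1) < measure(m2):
            lo = m1 + 1   # the peak lies to the right of m1
        else:
            hi = m2 - 1   # the peak lies to the left of m2
    # At most 3 candidates remain; measure them all.
    return max(range(lo, hi + 1), key=measure)

# Synthetic unimodal curve peaking at 12 clients, for illustration only.
curve = lambda n: -(n - 12) ** 2
assert find_best_client_count(curve) == 12
```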