Performance

Updated: 2026-04-25
ActualLab.Fusion Version: 12.3.76

This page summarizes results of Fusion performance benchmarks.

Test Environment

| Component | Specification |
|-----------|---------------|
| CPU | AMD Ryzen 9 9950X3D 16-Core Processor |
| RAM | 96 GB DDR5 |
| OS | Windows 11 |
| .NET | 10.0.7 |

Note that the Ryzen 9 9950X3D has 32 logical cores due to SMT.

Run-PerformanceTest.cmd from the Fusion test suite

The benchmark measures throughput of a simple repository-style user lookup service (UserService.Get(userId)) that retrieves user records from a database. The test compares two scenarios:

  1. With Fusion: UserService.Get is a [ComputeMethod], so its results are cached, and thus most database calls are avoided (unless they happen right after a mutation); a sketch of this variant follows the list.

  2. Without Fusion: UserService.Get is a regular method, so every call to it executes a simple SQL query.
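A minimal sketch of the "with Fusion" variant is shown below. The [ComputeMethod] attribute, the IComputeService marker interface, and the requirement that compute methods be virtual and async are Fusion conventions; the User record and the in-memory dictionary standing in for the database are placeholders for the EF Core + SQLite/PostgreSQL storage the real benchmark uses.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using ActualLab.Fusion;

public sealed record User(long Id, string Name);

public class UserService : IComputeService
{
    // Placeholder storage; the real benchmark queries SQLite/PostgreSQL via EF Core.
    private readonly ConcurrentDictionary<long, User> _dbStandIn = new();

    // [ComputeMethod] caches the result per userId: repeated Get(userId) calls return
    // the cached value without touching the database until that value is invalidated.
    [ComputeMethod]
    public virtual async Task<User?> Get(long userId, CancellationToken cancellationToken = default)
    {
        await Task.Yield(); // stands in for the actual async database query
        return _dbStandIn.TryGetValue(userId, out var user) ? user : null;
    }
}
```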

Test Scenarios

  • Multiple readers, 1 mutator: Simulates a realistic high-intensity workload with ~640 concurrent reader tasks (20 per CPU core) performing lookups, while a single mutator task periodically updates random user records. This tests how well Fusion handles cache invalidation under concurrent load; a sketch of the invalidation step follows below.

  • Single reader, no mutators: A single task performs sequential lookups with no concurrent mutations. This measures the peak lookup throughput per CPU core.

The test uses a pool of 1,000 pre-populated user records. Each run performs multiple iterations, and the best result from 3 runs is reported.
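The mutator's update path looks roughly like the sketch below, which continues the UserService sketch above. It assumes Fusion's Invalidation.Begin() invalidation scope from recent ActualLab.Fusion versions: inside that scope, calling a compute method does not execute its body, it only marks the corresponding cached result as invalid, so the next reader recomputes it (and hits the database once).

```csharp
// Continues the UserService sketch above. Invalidation.Begin() is assumed from recent
// ActualLab.Fusion versions; the database write below is still a placeholder.
public Task Update(long userId, string newName, CancellationToken cancellationToken = default)
{
    _dbStandIn[userId] = new User(userId, newName); // stands in for the SQL UPDATE

    // Inside the invalidation scope, Get(userId) doesn't run its body;
    // it just marks the cached Get(userId) result as invalid.
    using var invalidating = Invalidation.Begin();
    _ = Get(userId, default);
    return Task.CompletedTask;
}
```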

Results

Multiple Readers + 1 Mutator (all cores)

| Test | SQLite | PostgreSQL |
|------|--------|------------|
| Without Fusion | 155.68K calls/s | 38.61K calls/s |
| With Fusion | 316.34M calls/s | 313.75M calls/s |
| Speedup | 2,032x | 8,126x |

Single Reader, No Mutators

| Test | SQLite | PostgreSQL |
|------|--------|------------|
| Without Fusion | 55.70K calls/s | 1.78K calls/s |
| With Fusion | 19.54M calls/s | 19.66M calls/s |
| Speedup | 351x | 11,045x |

Key Observations

  • With Fusion + concurrent readers: ~315M calls/s regardless of the database, because most calls are served from Fusion's in-memory cache. Under the same concurrent load, that is roughly 2,000x faster than direct SQLite access and 8,000x faster than direct PostgreSQL access without Fusion.

  • Without Fusion: Performance is entirely database-bound. SQLite (in-process) outperforms PostgreSQL (network round-trip) significantly, especially for single-threaded access.

  • Concurrent access amplifies the difference: With many readers, Fusion's lock-free cache scales linearly with CPU cores, while database access becomes the bottleneck.

Benchmark.cmd from ActualLab.Fusion.Samples

The benchmark measures throughput of a simple repository-style user service that retrieves and updates user records in a database via UserService.Get(userId) and UserService.Update(userId, ...).

Local Services

| Test | Result | Speedup |
|------|--------|---------|
| Regular Service | 118.15K calls/s | |
| Fusion Service | 261.32M calls/s | ~2,212x |

Remote Services

| Test | Result | Speedup |
|------|--------|---------|
| HTTP Client → Regular Service | 80.43K calls/s | |
| HTTP Client → Fusion Service | 393.65K calls/s | ~4.9x |
| ActualLab.Rpc Client → Fusion Service | 7.92M calls/s | ~98x |
| Fusion Client → Fusion Service | 215.45M calls/s | ~2,679x |

RpcBenchmark.cmd from ActualLab.Fusion.Samples

This benchmark compares ActualLab.Rpc with gRPC, SignalR, and other RPC frameworks. The tables below include only ActualLab.Rpc, gRPC, and SignalR; other options, such as StreamJsonRpc and a plain RESTful API, are far slower, so they are omitted here (they do appear in the Docker-based results further down).
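To make the table labels concrete, the three call types can be pictured as an RPC service along the lines of the sketch below. The method names come from the tables; the parameter and result types are illustrative assumptions, not the actual contract from ActualLab.Fusion.Samples.

```csharp
using System.Threading.Tasks;

// Illustrative only: names match the benchmark tables, signatures are assumptions.
public interface IBenchmarkService
{
    Task<long> Sum(long a, long b);
    Task<User?> GetUser(long userId);   // User is the record from the earlier sketch
    Task<string> SayHello(string name); // grpc_bench-style hello call
}
```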

Calls

| Test | ActualLab.Rpc | gRPC | SignalR | Speedup |
|------|---------------|------|---------|---------|
| Sum | 10.16M calls/s | 1.29M calls/s | 5.31M calls/s | 1.9..7.9x |
| GetUser | 9.03M calls/s | 1.26M calls/s | 4.43M calls/s | 2.0..7.2x |
| SayHello | 6.16M calls/s | 1.18M calls/s | 2.24M calls/s | 2.7..5.2x |

Call Latency Under Peak Throughput

| Framework | Sum (p50 / p95 / p99) | GetUser (p50 / p95 / p99) | SayHello (p50 / p95 / p99) |
|-----------|-----------------------|---------------------------|----------------------------|
| ActualLab.Rpc | 2.0ms / 2.7ms / 6.4ms | 2.2ms / 3.1ms / 8.7ms | 3.2ms / 4.3ms / 10.4ms |
| gRPC | 3.3ms / 5.1ms / 13.7ms | 3.3ms / 5.9ms / 16.7ms | 3.5ms / 4.7ms / 14.2ms |
| SignalR | 3.9ms / 8.1ms / 11.1ms | 4.6ms / 5.8ms / 11.1ms | 9.0ms / 12.8ms / 15.0ms |

Streams

| Test | ActualLab.Rpc | gRPC | SignalR | Speedup |
|------|---------------|------|---------|---------|
| Stream1 | 96.96M items/s | 43.78M items/s | 18.30M items/s | 2.2..5.3x |
| Stream100 | 43.01M items/s | 25.87M items/s | 14.25M items/s | 1.7..3.0x |
| Stream10K | 820.08K items/s | 572.76K items/s | 414.72K items/s | 1.4..2.0x |

Test names indicate item size: Stream1 = 1-byte items, Stream100 = 100-byte items, Stream10K = 10KB items.

Throughput (items/s × item size)

| Test | ActualLab.Rpc | gRPC | SignalR |
|------|---------------|------|---------|
| Stream1 | 96.96 MB/s | 43.78 MB/s | 18.30 MB/s |
| Stream100 | 4.30 GB/s | 2.59 GB/s | 1.43 GB/s |
| Stream10K | 8.40 GB/s | 5.86 GB/s | 4.25 GB/s |
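For example, the ActualLab.Rpc Stream100 figure follows from 43.01M items/s × 100 B ≈ 4.30 GB/s, and the Stream10K figures are consistent with 10 KiB (10,240-byte) items: 820.08K items/s × 10,240 B ≈ 8.40 GB/s.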

Docker-Based RPC Benchmarks

These benchmarks run in Docker containers with CPU limits to measure 4-core server performance. The server container is limited to 4 CPUs, while client containers have 24 CPUs available, ensuring the server is the bottleneck. This setup matches grpc_bench; the SayHello scenario with gRPC is identical to what grpc_bench measures.

Docker Calls

| Framework | Sum | GetUser | SayHello |
|-----------|-----|---------|----------|
| ActualLab.Rpc | 4.75M calls/s | 4.38M calls/s | 2.52M calls/s |
| SignalR | 2.23M calls/s | 1.82M calls/s | 842.45K calls/s |
| gRPC | 437.85K calls/s | 441.44K calls/s | 399.32K calls/s |
| MagicOnion | 392.59K calls/s | 402.85K calls/s | 362.84K calls/s |
| StreamJsonRpc | 265.72K calls/s | 226.77K calls/s | 99.09K calls/s |
| HTTP | 105.25K calls/s | 103.12K calls/s | 88.18K calls/s |

Docker Call Latency Under Peak Throughput

| Framework | Sum (p50 / p95 / p99) | GetUser (p50 / p95 / p99) | SayHello (p50 / p95 / p99) |
|-----------|-----------------------|---------------------------|----------------------------|
| ActualLab.Rpc | 3.8ms / 8.1ms / 23.7ms | 4.3ms / 7.6ms / 16.2ms | 6.5ms / 35.5ms / 39.9ms |
| SignalR | 5.2ms / 12.2ms / 50.7ms | 7.0ms / 12.0ms / 37.3ms | 19.9ms / 51.0ms / 58.1ms |
| gRPC | 3.3ms / 32.3ms / 45.4ms | 3.5ms / 6.8ms / 32.1ms | 4.2ms / 9.6ms / 36.2ms |
| MagicOnion | 4.5ms / 8.3ms / 24.1ms | 5.0ms / 10.8ms / 23.1ms | 5.5ms / 8.8ms / 12.3ms |
| StreamJsonRpc | 43.4ms / 56.4ms / 59.6ms | 58.9ms / 70.1ms / 72.5ms | 107.3ms / 212.3ms / 222.0ms |
| HTTP | 33.9ms / 51.4ms / 54.9ms | 34.2ms / 44.3ms / 45.9ms | 30.4ms / 44.0ms / 45.5ms |

Docker Streams

Test names indicate item size: Stream1 = 1-byte items, Stream100 = 100-byte items, Stream10K = 10KB items.

| Framework | Stream1 | Stream100 | Stream10K |
|-----------|---------|-----------|-----------|
| ActualLab.Rpc | 31.80M items/s | 12.66M items/s | 279.72K items/s |
| gRPC | 11.27M items/s | 6.04M items/s | 125.64K items/s |
| SignalR | 5.42M items/s | 3.62M items/s | 106.20K items/s |
| StreamJsonRpc | 115.20K items/s | 115.20K items/s | 0 items/s |

Reference: Redis Benchmark

Reference benchmark using the redis-benchmark tool on the same machine (500K requests, best of 5 runs). The optimal client count (12) was determined via binary search over the 1-1000 range.

| Operation | Result |
|-----------|--------|
| PING_INLINE | 231.59K req/s |
| GET | 229.25K req/s |
| SET | 229.67K req/s |