GOGC advice.
Not long ago I needed to benchmark the performance of Golang on a many-core machine. I took several of the benchmarks that are bundled with the Go source code, copied them, and modified them to run on all available threads. In this case the machine had 24 cores and 48 threads.
Bumping up the GC threshold in our Kafka consumers was a big fat Turbo button.
TL;DR: the benchmark showed that (in that case) Go spends most of its time garbage collecting, because the program generates a huge amount of short-lived data, and so the speedup is nowhere near linear. Playing with GOGC, a value that sets the GC target percentage (threshold), revealed how to achieve a linear speedup with the number of cores.
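For a rough feel of what the GOGC knob does, here is a minimal sketch. It assumes the simplified model in which the next collection starts once the heap has grown by GOGC percent over the live heap left by the previous cycle (newer Go releases also count GC roots, which this ignores). The helper `nextGCTarget` and the 100 MiB live-heap figure are made up for illustration; `debug.SetGCPercent` is the standard library call that changes the same setting at runtime.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// nextGCTarget is a hypothetical helper, not part of the Go runtime.
// It estimates the heap size at which the next collection would start,
// under the simplified model: target = live + live*GOGC/100.
// It assumes gogc >= 0 (a negative value disables the collector).
func nextGCTarget(liveHeap uint64, gogc int) uint64 {
	return liveHeap + liveHeap*uint64(gogc)/100
}

func main() {
	// Same effect as starting the process with GOGC=400 in the environment.
	// SetGCPercent returns the previous setting.
	prev := debug.SetGCPercent(400)
	fmt.Printf("previous GC percent: %d\n", prev)

	// Illustrative figure only: pretend 100 MiB survived the last cycle.
	const live = 100 << 20
	for _, gogc := range []int{100, 400, 1600} {
		target := nextGCTarget(live, gogc)
		fmt.Printf("GOGC=%d: next collection near %d MiB\n", gogc, target>>20)
	}
}
```

The trade-off is memory for CPU: a higher GOGC lets the heap grow further between collections, so a program that churns through short-lived data collects less often but holds more memory at its peak. The right value depends on the workload, so measure before and after.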
Vlad writes about benchmarking Go crypto performance on multiple cores:
