Thanks for reading and for your smart questions!
1. I ran a batch that tests all the regions in a row, so the timing is the same from the tester's side.
2. I tested all the services (10 s × 20 regions, so about 200 s of waiting) and then started the next test. So it's a matter of minutes, not hours or days.
However, I tested again a few days later (and surely at a different hour of the day). The results were very similar (+/- 300ms) and the distribution was the same. No big change in the ranking.
3. You may be partly right. When I test, I use us-central1 by default, even if I'm in Europe (mainly because most Alpha/preview features are released in us-central1 first).
However, the product is serverless. When you get 1 vCPU, it doesn't depend on the datacenter workload, the hour of the day, or the usage by legacy users. 1 vCPU is 1 vCPU: it's for you, and you pay for it.
I could get a 429 because of a lack of resources, for example, or a huge cold start (I also ran a batch of tests with fibo(1) to evaluate the cold start/latency, and it is small and can be minimized).
I also tried to come up with hypotheses to explain this difference and, apart from a different CPU generation, I didn't find any other relevant explanation.
But I'm eager to hear other suggestions/explanations!
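
PS: for anyone who wants to reproduce the approach from points 1 and 2, here is a rough sketch of what such a batch could look like. It is not my actual script: `run_batch`, `ranking`, `max_drift` and the `call_service` callback are hypothetical names, and the real call (an HTTP request to each region's endpoint) is left to you.

```python
import time
from typing import Callable, Dict, List


def run_batch(regions: List[str],
              call_service: Callable[[str], None],
              clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Call the service once per region, one after another, and time each call."""
    durations = {}
    for region in regions:
        start = clock()
        call_service(region)  # assumption: e.g. an HTTP request to that region's endpoint
        durations[region] = clock() - start
    return durations


def ranking(durations: Dict[str, float]) -> List[str]:
    """Regions ordered from fastest to slowest."""
    return sorted(durations, key=durations.get)


def max_drift(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """Largest absolute per-region difference between two runs, in seconds."""
    return max(abs(run_a[region] - run_b[region]) for region in run_a)
```

To compare two runs made a few days apart, you could check that `ranking(run_a) == ranking(run_b)` and that `max_drift(run_a, run_b)` stays around 0.3 s.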