I love Google Cloud Run and highly recommend it as the best option[1]. Cloud Run GPU, however, is not something I can recommend. It is not cost-effective (instance-based billing is expensive compared to request-based billing), GPU choices are limited, and loading/unloading a multi-gigabyte model into GPU memory makes it slow to use as serverless.
Once you compare the numbers, a VM + GPU comes out ahead if your service is utilized for even just 30% of the day.
1 - https://ashishb.net/programming/free-deployment-of-side-proj...
google vp here: we appreciate the feedback! i generally agree that if you have a strong understanding of your static capacity needs, pre-provisioning VMs is likely to be more cost efficient with today's pricing. cloud run GPUs are ideal for more bursty workloads -- maybe a new AI app that doesn't yet have PMF, where you really need that scale-to-zero + fast start for more sparse traffic patterns.
Appreciate the thoughtful response! I’m actually right in the ICP you described — I’ve run my own VMs in the past and recently switched to Cloud Run to simplify ops and take advantage of scale-to-zero. In my case, I was running a few inference jobs and expected a ~$100 bill. But due to the instance-based behavior, it stayed up the whole time, and I ended up with a $1,000 charge for relatively little usage.
I’m fairly experienced with GCP, but even then, the billing model here caught me off guard. When you’re dealing with machines that can run up to $64K/month, small missteps get expensive quickly. Predictability is key, and I’d love to see more safeguards or clearer cost modeling tooling around these types of workloads.
Apologies for the surprise charge there. It sounds like your workload pattern might be sitting in the middle of the VM vs. Serverless spectrum. Feel free to email me at (first)(last)@google.com and I can get you some better answers.
> But due to the instance-based behavior, it stayed up the whole time, and I ended up with a $1,000 charge for relatively little usage.
Indeed. IIRC, if you get a single request every 15 mins (~96 requests a day), you will pay for Cloud Run GPU for the full day.
Has this changed? When I looked pre-GA, the requirement was that you had to pay for the CPU 24x7 to attach a GPU, so that is not really scaling to zero unless that requirement has changed...
Speaking from my experience, it does scale to zero, except you pay for 15 mins after the last request.
So if you get all your requests in a 2-hour window, that's great. It will scale to zero for the rest of the 22 hours.
However, if you get at least one request every 15 mins, you will pay for the full 24 hours, and that is ~3X more expensive than an equivalent VM on Google Cloud.
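The arithmetic above can be sketched in a few lines. The hourly rates below are placeholders I made up for illustration, not actual GCP pricing; the 15-minute idle window matches what I described:

```python
# Back-of-the-envelope: Cloud Run GPU vs. a dedicated GPU VM.
# Prices are HYPOTHETICAL placeholders -- check current GCP pricing.

CLOUD_RUN_GPU_PER_HOUR = 2.10   # assumed $/hr while a Cloud Run instance is warm
VM_GPU_PER_HOUR = 0.70          # assumed $/hr for an equivalent GPU VM
IDLE_TIMEOUT_MIN = 15           # instance stays warm this long after a request

def cloud_run_billed_hours(request_gap_min: float, requests_per_day: int) -> float:
    """Hours billed per day: each request keeps the instance up for IDLE_TIMEOUT_MIN."""
    if request_gap_min <= IDLE_TIMEOUT_MIN:
        return 24.0  # never scales to zero
    return requests_per_day * IDLE_TIMEOUT_MIN / 60.0

# One request every 15 minutes: billed for the whole day.
sparse = cloud_run_billed_hours(request_gap_min=15, requests_per_day=96)
print(f"${sparse * CLOUD_RUN_GPU_PER_HOUR:.2f}/day")  # $50.40/day
print(f"${24 * VM_GPU_PER_HOUR:.2f}/day")             # VM: $16.80/day, i.e. ~3x cheaper

# All requests in a 2-hour burst: scale-to-zero actually pays off.
bursty = cloud_run_billed_hours(request_gap_min=60, requests_per_day=8)
print(f"${bursty * CLOUD_RUN_GPU_PER_HOUR:.2f}/day")  # $4.20/day
```

With these assumed rates, the steady-trickle pattern lands right at the ~3X premium mentioned above, while the bursty pattern is where Cloud Run wins.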
How does that compare to spinning up some EC2 instances with Amazon Trainium accelerators?
Depending on your model, you may spend a lot of time trying to get it to work with Trainium.
Cloud Run is a great service. I find it much easier to work with than AWS's equivalent (ECS/Fargate).
AWS App Runner is the closest equivalent to Cloud Run. It's really not close, though; App Runner is an unloved service at AWS and is missing a lot of the features that make Cloud Run nice.
App Runner was Amazon's answer to App Engine, a full decade+ later. Cloud Run is miles ahead.
I agree with the unloved part. It was a great middle ground between Lambda and Fargate (zero cold start, reasonable pricing), but has seemingly been in maintenance mode for quite a while now. Really sad to see.
i am biased, but i agree :)
hah. I looked at your comments and saw you were a google VP! I've migrated some small systems from AWS to GCP for various POCs and prototypes, mostly Lambda and ECS to Cloud Run, and find GCP provides a better developer experience overall.
love that you're enjoying the devex. we put a lot of sweat into it, especially in services like cloud run.
Yeah, anyone who uses GCP and AWS thoroughly will agree that GCP is a superior developer experience.
The problem is continuous product churn. This was discussed at length at https://news.ycombinator.com/item?id=41614795
I think Lambda is more or less the AWS equivalent.
It's not. Cloud Run can be longer running: you can have batch and services. Lambda is closer to Cloud Functions.
I think Cloud Run Functions would be the direct equivalent to Lambda.
I agree, but in the GCP world a lot of these things are merging. My understanding is that Cloud Run, Cloud Run Functions (previously known as Cloud Functions gen2), and even App Engine Flexible all run on the same underlying Cloud Run infrastructure, so the remaining differences seem more like historical legacy/backwards-compatibility interface distinctions than meaningful functionality differences (e.g., Functions can now handle multiple concurrent requests).
FWIW, App Engine Flexible is a different product that runs on GCE VMs.
Other products (App Engine Standard, Cloud Functions gen1, Cloud Run, Cloud Run Functions) share much of the same underlying infrastructure.
Oh, thanks! I guess I had it backwards; I thought App Engine Standard was the one on different infrastructure.
Oh, you’re probably right.
Eh, idk, Cloud Run is much better suited to long-running instances than Lambda. You would use Cloud Functions for those types of workloads in GCP.
For those who don't know, AWS Lambda functions have a hard limit of 15 minutes.
The problem is you can't reliably get VMs on GCP.
All the major clouds are suffering from this. On AWS you can't ever get an 80GB GPU without a long-term reservation, and even then it's wildly expensive. On GCP you sometimes can, but it's also insanely expensive.
These companies claim to be "startup friendly"; they are anything but. All the neo-clouds somehow manage to do this well (RunPod, Nebius, Lambda), but the big clouds are just milking enterprise customers who won't leave and, in the process, screwing over the startups.
This is a massive mistake they are making, which will hurt their long term growth significantly.
To massively improve your chances of getting GPUs, you can use something like SkyPilot (https://github.com/skypilot-org/skypilot) to fall back across regions, clouds, or GPU choices. E.g.,
$ sky launch --gpus H100
will fall back across GCP regions, AWS, your own clusters, etc. There are options to say: try H100, or H200, or A100, or <insert>.
Essentially the way you deal with it is to increase the infra search space.
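For completeness, the same fallback can also be expressed in a task YAML. This is a sketch based on SkyPilot's documented support for a set of candidate accelerators; the file name and workload are placeholders, so double-check the syntax against their docs:

```yaml
# task.yaml -- illustrative sketch; verify against current SkyPilot docs
resources:
  # A set of acceptable GPUs; SkyPilot's optimizer provisions whichever
  # it can actually obtain across the enabled clouds and regions.
  accelerators: {H100:1, H200:1, A100:1}

run: |
  python inference.py   # placeholder workload
```

Launched with `sky launch task.yaml`, this widens the search space without changing the launch command per attempt.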
We've run into this a lot lately too, even on AWS. "Elastic" compute, but all the elasticity's gone. It's especially bitter since pooling the cost of spare capacity is the major benefit of scale here...
Enterprises are just gobbling up all the supply on reservations, so the clouds see no need to lower prices.
All the while saying they are "startup friendly".
Agreed. Pricing is insane and availability generally sucks.
If anyone is curious about these neo-clouds, a YC startup called Shadeform has their availability and pricing in a live database here: https://www.shadeform.ai/instances
They have a platform where you can deploy VMs and bare metal from 20 or so popular ones like Lambda, Nebius, Scaleway, etc.
I had the opposite experience with Cloud Run: mysterious scale-outs/restarts. I had to buy a paid Cloud Support subscription to get answers and found none. Moved to self-managed VMs. Maybe things have changed now.
Sadly this is still the case. Cloud Run helped us get off the ground. But we've had two outages where Google Enhanced Support could give us no suggestion other than "increase the maximum instances" (not minimum instances). We were doing something like 13 requests/min on this instance at the time, and resource utilization looked just fine. But somehow we had a blip where no containers were available; the count even dropped below our minimum instances. The fix was to manually redeploy the latest revision.
We're now investigating moving to Kubernetes where we will have more control over our destiny. Thankfully a couple people on the team have experience with this.
Something like this never happened with Fargate in the years my previous team had used that.
https://github.com/claceio/clace is a project I am building that gives a Cloud Run-type deployment experience on your own VMs. For each app, it supports scaling down to zero containers (scaling up beyond one is being built).
The authorization and auditing features are designed for internal tools, but otherwise any app can be deployed.
Have a look at Knative
Knative is amazing!
You don't go to cloud services because they are cheaper.
You go there because you are already there, or have contracts, etc.
Does Cloud Run still use a fake Linux kernel implemented in Go, rather than a real VM?
Does Cloud Run give you root?
You're thinking of gVisor. But no, the "gen2" runtime is a microVM à la Firecracker and performs a lot better as a result.
Ah, that's great.
And it looks like Cloud Run can do something Lambda can't: https://cloud.google.com/run/docs/create-jobs . "Unlike a Cloud Run service, which listens for and serves requests, a Cloud Run job only runs its tasks and exits when finished. A job does not listen for or serve requests."
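As a rough sketch of what that looks like in practice (the job name, image path, and region are placeholders; verify the flags against the current gcloud reference):

```shell
# Create a job that fans its work out across 10 container instances
gcloud run jobs create nightly-report \
  --image=us-docker.pkg.dev/my-project/repo/report:latest \
  --region=us-central1 \
  --tasks=10 \
  --max-retries=3

# Run it; the job executes its tasks and exits rather than serving requests
gcloud run jobs execute nightly-report --region=us-central1 --wait
```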
https://github.com/cloud-hypervisor/cloud-hypervisor or something else?
Possibly? I haven't found any public documentation that says specifically what hypervisor is used.
Google built crosvm, which was the initial inspiration for Firecracker, but Cloud Run runs on top of Borg (this fact is publicly documented). Borg is closed source, so it's possible the specific hypervisor they're using is as well.
We (I work on Cloud Run) are working on root access. If you'd like to know more you can reach me rpei@google.com
Awesome! I'll reach out to you, thank you.
> I love Google Cloud Run and highly recommend it as the best option
I'd love to see the numbers for Cloud Run. It's nice for toy projects, but it's a money sink for anything serious, at least in my experience. On one project, we had a long-standing issue with G regarding autoscaling: scaling to zero sounds nice on paper, but they won't tell you about the warmup phases, where CR can spin up multiple containers for a single request and keep them around for a while. And good luck hunting down inexplicably running containers when there is no apparent CPU or network use (G will happily charge you for them).
Additionally, startup time is often abysmal with Java and Python projects (it might be better with Go/C++/Rust, but I don't have experience running those on CR).
> It's nice for toy projects, but it's a money sink for anything serious, at least from my experience.
This is really not my experience with Cloud Run at all. We've found it to be quite cost-effective for a lot of different types of systems. For example, we ended up helping a customer migrate a ~$5B/year ecommerce platform onto it (mostly Java/Spring and TypeScript services). We originally told them they should target GKE, but they were adamant about serverless, and it ended up being a perfect fit. They were paying like $5k/mo, which is absurdly cheap for a platform generating that kind of revenue.
I guess it depends on the nature of each workload, but for businesses that tend to "follow the sun" I've found it to be a great solution, especially when you consider how little operations overhead there is with it.
Maybe I just don't know, but I really don't think most people here can point to a cloud GPU setup serving 1,000 concurrent users that doesn't end in a million-dollar bill.