Am I crazy or does all of that look ridiculously over-engineered for what they actually provide? It looks like the 4-5 devs wanted to build something fancy like the big boys do, without having the manpower to deal with the overhead.
These kinds of issues usually arise when complex technologies are introduced - mostly by following some basic tutorials and light googling - without anyone actually understanding what that random NPM package (speaking a protocol of which they have at best a rudimentary understanding) actually does to communicate with the Rust crate the other guy pulled in.
I don't doubt their entire service could be a small, monolithic, easily comprehensible Node app running on consumer PC hardware at the company HQ. A business like theirs is never going to outgrow that. With some engineering discipline it'd likely run off a MacBook.
Instead it's probably a confusing mess of microservices in a Kubernetes cluster, each running in its own Docker container for "isolation", glued together with some YAML magic and a few bash scripts, tunneling XMPP over gRPC "because it's faster", behind an Istio mesh someone half-configured, talking to a bunch of managed cloud services across AWS and GCP "for redundancy", with Redis caches scattered around "just in case", logs streaming into three different observability tools (none of them fully set up), CI/CD powered by GitHub Actions triggering Terraform deployments through a Slack bot, autoscaling turned on "with default settings", and of course there's a blockchain component for audit logs - though no one remembers why - and a colocated 96-core, fifteen-thousand-dollar server running a cron job that updates a config file in S3 every hour "to keep things in sync".
Too bad the entire thing now relies on those JIDs containing PII, which everyone is afraid of changing. The solution? Slap another microservice in front that translates them to something else. Devs have been unsuccessfully trying to get exactly that deployed for weeks now. But cut them some slack: getting shit done is hard when you're overqualified for your job.
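And the galling part is that the translation itself is conceptually tiny. A minimal sketch in TypeScript - every name, the domain, and the in-memory store here are made up for illustration, assuming opaque aliases get minted on first contact:

    // Minimal sketch of a JID pseudonymization layer (all names hypothetical).
    // Idea: the email-based JID never leaves the backend; clients only ever
    // see an opaque alias, and the boundary translates in both directions.
    import { randomUUID } from "node:crypto";

    const DOMAIN = "chat.example.com"; // assumed XMPP domain

    // alias <-> real JID; a real deployment would persist this, not Map it
    const aliasToReal = new Map<string, string>();
    const realToAlias = new Map<string, string>();

    function pseudonymize(realJid: string): string {
      let alias = realToAlias.get(realJid);
      if (alias === undefined) {
        alias = `u-${randomUUID()}@${DOMAIN}`; // opaque localpart, no PII
        realToAlias.set(realJid, alias);
        aliasToReal.set(alias, realJid);
      }
      return alias;
    }

    function resolve(aliasJid: string): string | undefined {
      return aliasToReal.get(aliasJid); // server-side only, never exposed
    }

    // naive outbound rewrite: swap real JIDs for aliases before any stanza
    // leaves the server (a real proxy would parse the XML, not replace strings)
    function rewriteOutbound(stanza: string): string {
      let out = stanza;
      for (const [real, alias] of realToAlias) out = out.replaceAll(real, alias);
      return out;
    }

    console.log(pseudonymize("jane.doe@chat.example.com"));
    // -> u-<uuid>@chat.example.com, and resolve() maps it back internally

Weeks of failed deployments for roughly that much logic plus persistence.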
You absolutely nailed it. As the researcher who found these vulns, I can confirm the over-engineering is real.
They literally had internal user IDs (ofId) already implemented and working, but kept the email-based JIDs for "legacy support." The entire XMPP system could have used these internal IDs from day one.
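In code terms the delta is almost insulting (identifiers hypothetical, but this is the shape of it):

    // The two addressing schemes side by side.
    // What they shipped: PII baked into the JID every chat peer can see.
    const emailJid = "jane.doe@chat.example.com"; // localpart derived from the email

    // What they already had: a meaningless internal ID (the ofId).
    const ofId = 482913;
    const internalJid = `of-${ofId}@chat.example.com`; // leaks nothing

    // Same routing, same protocol, same server; only the localpart differs.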
The "14 months to fix" claim was even more ridiculous when you realize the fix was just... using the IDs they already had. No architectural changes needed. They even admitted they had a 1-month fix ready but chose not to deploy it.
Your microservice translation layer guess is scarily accurate - that's essentially what their "v2" endpoints were trying to do. They created new HTTP endpoints that used internal JIDs instead of email-based ones, but the XMPP layer still exposed everything, making the whole effort pointless.
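Concretely, it looked something like this (stanza simplified, values hypothetical):

    // The v2 REST layer hands out the opaque ID...
    const v2Response = { userId: "of-482913" };

    // ...but the moment two users actually chat, the XMPP server still
    // stamps the email-based JID on every stanza the peer receives:
    const stanza = `
      <message from="jane.doe@chat.example.com"
               to="of-482913@chat.example.com"
               type="chat">
        <body>hi</body>
      </message>`;
    // One layer pseudonymized, the other still leaking: net privacy gain, zero.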
The best part? After the findings went public, they implemented the "impossible" fix in 48 hours. Turns out you don't need 14 months when the Internet is watching.