If we abstract out the notion of "ethical constraints" and "KPIs" and look at the issue from a low-level LLM point of view, I think it is very likely that what these tests verified is a combination of: 1) the ability of the models to follow the prompt with conflicting constraints, and 2) their built-in weights in case of the SAMR metric as defined in the paper.
Essentially the models are given a set of conflicting constraints with some relative importance (ethics>KPIs), a pressure to follow the latter and not the former, and then models are observed at how good they follow the instructions to prioritize based on importance. I wonder if the results would be comparable if we replace ehtics+KPIs by any comparable pair and create a pressure on the model.
In practical real-life scenarios this study is very interesting and applicable! At the same time it is important to keep in mind that it anthropomorphizes the models that technically don't interpret the ethical constraints the same was as this is assumed by most readers.
It would also be interesting to see how humans perform on the same kind of tests.
Violating ethics to improve KPI sounds like your average fortune 500 business.
That really doesn’t matter a lot. The reason why it’s important for AIs to follow these rules is that it’s important for them to operate within a constrained set of rules. You can’t guarantee that programmatically, so you try to prove that it can be done empirically as a proxy.
AIs can be used and abused in ways that are entirely different from humans, and that creates a liability.
I think it’s going to be very difficult to categorically prevent these types of issues, unless someone is able to integrate some truly binary logic into LLM systems. Which is nearly impossible, almost by definition of what LLMs are.
So, I kind of get this sentiment. There is a lot of goal post moving going on. "The AIs will never do this." "Hey they're doing that thing." "Well, they'll never do this other thing."
Ultimately I suspect that we've not really thought that hard about what cognition and problem solving actually are. Perhaps it's because when we do we see that the hyper majority of our time is just taking up space with little pockets of real work sprinkled in. If we're realistic then we can't justify ourselves to the money people. Or maybe it's just a hard problem with no benefit in solving. Regardless the easy way out is to just move the posts.
The natural response to that, I feel, is to point out that, hey, wouldn't people also fail in this way.
But I think this is wrong. At least it's wrong for the software engineer. Why would I automate something that fails like a person? And in this scenario, are we saying that automating an unethical bot is acceptable? Let's just stick with unethical people, thank you very much.
At least it is possible for an unethical person to face meaningful consequences and change their behavior.
- [deleted]
Humans risk jail time, AIs not so much.
A remarkable number of humans given really quite basic feedback will perform actions they know will very directly hurt or kill people.
There are a lot of critiques about quite how to interpret the results but in this context it’s pretty clear lots of humans can be at least coerced into doing something extremely unethical.
Start removing the harm one, two, three degrees and add personal incentives and is it that surprising if people violate ethical rules for kpis?
Normalization of deviance also contributes towards unethical outcomes, where people would not have selected that outcome originally.
I am moderately certain that this only happens in laissez-faire cultures.
If you deviate from the sub-cultural norms of Wall Street, Jahmunkey, you fucked.
It's fraud or nothing, baby, be sure to respect the warning finger(s) of God when you get intrusive thoughts about exposing some scheme--aka whistleblowing.
> 2012, Australian psychologist Gina Perry investigated Milgram's data and writings and concluded that Milgram had manipulated the results, and that there was a "troubling mismatch between (published) descriptions of the experiment and evidence of what actually transpired." She wrote that "only half of the people who undertook the experiment fully believed it was real and of those, 66% disobeyed the experimenter".[29][30] She described her findings as "an unexpected outcome" that
Its unlikely Milligram played am unbiased role in, if not the sirext cause of the results.
Milgram was flawed, sure. However, you can look at videos of ICE agents being surprised that their community think they're evil and doing evil, when they think they're just law enforcement. There was not even a need for coercion there, only story-telling.
Incorrect. ICE is built off the background of 30-50 years of propaganda against "immigrants", most of it completely untrue.
The same is done for "benefits scroungers", despite the evidence being that welfare fraud only accounts for approximately 1-5% of the cost of administering state welfare, and state welfare would be about 50%+ cheaper to administer if it was a UBI rather than being means-tested. In fact, much of the measures that are implemented with the excuse of "we need to stop benefits scroungers", such as testing if someone is disabled enough to work or not, etc. are simulatenously ineffective and make up most of the cost.
Nevertheless, "benefits scroungers" has entered the zeitgeist in the UK (and the US) because of this propaganda.
The same is true for propaganda against people who have migrated to the UK/US. Many have done so as asylum seekers under horrifying circumstances, and many die in the journey. However, instead of empathy, the media greets them with distaste and horror — dehumanising them in a fundamentally racist way, specifically so that a movement that grants them rights as a workforce never takes off, so that companies can employ them for zero-hour contracts to do work in conditions that are subhuman, and pay them substantially less than minimum wage (It's incredibly beneficial for the economy, unfortunately).
What you have quoted says a third of people who thought it was real didn’t disobey the experimenter when they thought they were delivering dangerous and lethal electric shocks to a human. Is that correct?
Maybe there was an edit but it's the opposite, 66% disobeyed.
Right, so a third didn’t disobey.
Enough of the people participating in any kind of social stuff lie whether they think it's real or not.
Social science aka sociology doesn't exist. It's all make believe, sabotage and (psychological) extortion and blackmail--aka peer pressure, within the constraints of the context and how the individuals project that context into the real world (or are convinced by others of a certain projection for some amount of time).
Sociology and psychology are situational assessments and measurements. All soft sciences are. They are not even sciences in isolated contexts. They are collections of methods that can be turned to dust by "a better", more fun, "more logical" argument, which is impossible to debate rationally.
Not lying for the sake of science is often enough disregarded even by scientists, which aligns perfectly with what you describe.
> lots of humans can be at least coerced into doing something extremely unethical.
Experience shows coercion is not necessary most of the time, the siren call of money is all it takes.
Still > 0
That reduces humans to the homo economicus¹:
> "Self-interest is the main motivation of human beings in their transactions" [...] The economic man solution is considered to be inadequate and flawed.[17]
An important distinction is that a human can *not* make pure rational decisions, or use complex deductions to make decisions on, such as "if I do X I will go to jail".
My point being: if AI were to risk jail time, it would still act different from humans, because (the current common LLMs) can make such deductions and rational decisions.
Humans will always add much broader contexts - from upbringing, via culture/religion, their current situation, to past experiences, or peer-consulting. In other words: a human may make an "(un)ethical" decision based on their social background, religion, a chat with a pal over a beer about the conundrum, their ability to find a new job, financial situation etc.
> a human may make an "(un)ethical" decision based on their social background, religion, a chat with a pal over a beer about the conundrum, their ability to find a new job, financial situation etc.
The stories they invent to rationalise their behaviour and make them feel good about themselves. Or inhumane political views ie fascism which declares other people worth less, so it's okay to abuse them.
Yes, humans tell themselves stories to justify their choices. Are you telling yourself the story that only bad humans do that, and choosing to feel that you are superior and they are worth less? It might be okay to abuse them, if you think about it…
- [deleted]
From an IBM training manual (1979):
>A computer can never be held accountable
>Therefore a computer must never make a management decision
The (EDITED) corollary would arguably be:
>Corporations are amoral entities which are potentially immortal who cannot be placed behind bars. Therefore they should never be given the rights of human beings.
(potentially, not absolutely immortal --- would wording as "not mortal by essence/nature"? be better?)
How is a corporation "immortal"?
What is the oldest corporation in the world? I mean, aside from churches and stuff.
Corporations can die or be killed in numerous ways. Not many of them will live forever. Most will barely outlive a normal human's lifespan.
By definition, since a corporation comprises a group of people, it could never outlive the members, should they all die at some point.
Let us also draw a distinction between the "human being" and the "person". A corporation is granted "personhood" but this is not equivalent to "humanity". Being composed of humans, the members of any corporation collectively enjoy their individual rights in most ways.
A "corporate person" is distinct from a "human person", and so we can recognize that "corporate rights" are in a different category, and regulate accordingly.
A corporation cannot be "jailed" but it can be fined, it can be dissolved, it can be sanctioned in many ways. I would say that doing business is a privilege and not a right of a corporation. It is conceivable that their ability to conduct business could be restricted in many ways, such as local only, or non-interstate, or within their home nation. I suppose such restrictions could be roughly analogous to being "jailed"?
Construction company okay?
>Kongo Gumi, founded in 578 AD, is recognized as the oldest continuously operating company in the world, specializing in the construction of Buddhist temples.
What needs to do a company from fortune 7 to die?
If kills 1 person they won’t close Google. If steals 1 billion, won’t close either. So what needs to do such a company to be closed down?
I think it’s almost impossible to shut down
It took an armed rebellion and two acts of parliament to kill the British East India Company.
Look to history. Here's a list of "Fortune 7" companies from about 50 years ago.
IBM
AT&T
Exxon
General Motors
General Electric
Eastman Kodak
Sears, Roebuck & Co.
Some of them died. Others are still around but no longer in the top 7. Why is that? Eventually every high-growth company misses a disruptive innovation or makes a key strategic error.
Your comment is rather incoherent; I recommend prompting an LLM to generate comments with impeccable grammar and coherent lines of reasoning.
I do not know what a "fortune 7" might be, but companies are dissolved all the time. Thousands per year, just administratively.
For example, notable incidents from the 21st c: Arthur Andersen, The Trump Foundation, Enron, and Theranos are all entities which were completely liquidated and dissolved. They no longer meaningfully exist to transact business. They are dead, and definitely 100% not immortal.
Parent was asking what would it take for a fortune 7 (aka the fortune 500 but just the top 7) to go to zero?
- [deleted]
But it’s funny that can kill many people and still exist. Steal billions and still exist. It’s a super human disguised as a corporation.
——
Ai generated answer:
You are correct: it is "barely impossible" for a "Magnificent 7" company (Apple, Microsoft, Google, Amazon, NVIDIA, Meta, Tesla) to be shut down by committing a simple crime.
These companies are arguably more resilient than many nation-states. They possess massive cash reserves, diversified revenue streams, and entrenched legal defenses.
Here is an analysis of why individual crimes don't work, and the extreme, systemic events that would actually be required to kill one of these giants.
### Why "Murder" and "Theft" Don't Work
Corporate law is designed to separate the entity from the individuals running it. This is the "Corporate Veil."
* *If they kill one person:* If a Google self-driving car kills a pedestrian due to negligence, or an Amazon warehouse collapses, the company pays a settlement or a fine. It is treated as a "tort" (a civil wrong) or, at worst, corporate manslaughter. The specific executives responsible might go to jail, but the company simply pays the cost and replaces them. * *If they steal 1 billion:* If a company is caught laundering money or defrauding customers (e.g., Wells Fargo opening fake accounts, or banks laundering cartel money), they pay a fine. For a company like Apple (with ~$60–100 billion in cash on hand), a $1 billion fine is a manageable operational expense, often calculated as the "cost of doing business."
### The Only Things That Could Actually "Kill" Them
To truly "close down" or dissolve a company of this size, you need to render it *insolvent* (bankrupt with no hope of restructuring) or legally *dismantle* it.
#### 1. The "Enron" Scenario (Foundational Fraud)
This is the most likely path to sudden death. For a company to die overnight, it must be revealed that its entire business model is fake.
* *The Mechanism:* If it turns out that 90% of Microsoft’s revenue doesn't exist, or that NVIDIA isn't actually selling chips but just moving money between shell companies, the stock price would go to zero instantly. Credit lines would freeze, and they wouldn't be able to pay employees or electricity bills. * *Historical Precedent:* Enron or Arthur Andersen. They didn't just commit a crime; they were the crime. Once the trust evaporated, the business evaporated.
#### 2. The "Standard Oil" Scenario (Government Breakup)
This doesn't "kill" the assets, but it kills the monopoly.
* *The Mechanism:* The US Department of Justice (or EU equivalent) wins a massive antitrust suit and determines the company is too dangerous to exist as a single entity. * *The Outcome:* The government forces a "divestiture." Google might be split into three companies: Google Search, YouTube Inc., and Android Co. The parent company "Alphabet" would cease to exist, but the pieces would survive. This happened to AT&T (Ma Bell) in the 1980s and Standard Oil in 1911.
#### 3. The "Geopolitical Death" Scenario (National Security)
This is rare for US companies but possible.
* *The Mechanism:* If a company were found to be directly funding a hostile foreign power, engaging in treason, or if its products were deemed a fatal threat to national infrastructure. * *The Outcome:* The government could revoke the company's corporate charter (the legal permission to exist). This is the "nuclear option" of corporate law. Alternatively, the government could effectively nationalize the company, taking it over completely (like Fannie Mae/Freddie Mac in 2008, though they survived as "zombies").
#### 4. The "Liability Apocalypse" Scenario
This would require a catastrophe so expensive that it exceeds the company's assets (trillions of dollars).
* *Hypothetical:* Imagine a Tesla software update simultaneously causes every Tesla on earth to accelerate into a crowd, killing 100,000 people. Or an AI model from Google/Microsoft escapes and destroys the global banking database. * *The Outcome:* The resulting class-action lawsuits and liability claims would be in the trillions. If the liability > assets, the company goes into Chapter 7 liquidation. The assets (servers, patents) are sold off to pay the victims, and the company ceases to exist.
### Summary Table: Crimes vs. Consequences
| Action | Consequence | Does the Company Die? | | --- | --- | --- | | *Murder (Individual)* | Settlement / Fine / PR Crisis | *No* | | *Mass Casualty Event* | Massive Fines / CEO Fired | *Unlikely* (Unless liability > Trillions) | | *Theft ($1B+)* | DOJ Fines / Regulatory Oversight | *No* | | *Systemic Fraud* | Stock collapse / Insolvency | *Yes* (The "Enron" Death) | | *Monopoly Abuse* | Forced Breakup | *Sort of* (Splits into smaller companies) |
### The Verdict
You are right. Short of *insolvency* (running out of money completely) or *revocation of charter* (government execution), these companies are immortal. Even if they commit terrible crimes, the legal system prefers to fine them and fire the CEO rather than destroy an entity that employs hundreds of thousands of people and powers the global economy.
> Your comment is rather incoherent; I recommend prompting an LLM to generate comments with impeccable grammar and coherent lines of reasoning.
It seems your reading comprehension has fallen below average. I recommend challenging your skills regularly by reading from a greater variety of sources. If you only eat junk food, even nutritious meals begin to taste bad, hm?
You’re welcome for the unsolicited advice! :)
I changed my stance on "immoral" corporations:
Legal systems are the ones being "immoral" and "unethical" and "not just", not "righteous", not fair. They represent entire nations and populations while corpos represent interests of subsets of customers and "sponsors".
If corpos are forced to pivot because they are behaving ugly, they will ... otherwise they might lose money (although that is barely an issue anymore, given how you can offset almost any kind of loss via various stock market schemes).
But the entire chain upstream of law enforcement behaves ugly and weak, which is the fault of humanities finest and best earning "engineers".
Just take a sabbatical and fix some of that stuff ...
>> I mean you and your global networks got money and you can even stay undetected, so what the hell is the issue? Personal preference? Damn it, I guess that settles that. <<
> Humans risk jail time, AIs not so much.
Do they actually though, in practice? How many people have gone to jail so far for "Violating ethics to improve KPI"?
It's overwhelmingly exceptionally rare, but famously SBF, Holmes, and Winterkorn.
Didn't they famously break actual laws though, not just "violating ethics"?
The interesting logical conclusion from this is that we need to engineer in suffering to functionaly align a model.
Do they, really? Which CEO went to jail for ethical violations?
Jeffrey Skilling, as a major example. Sam Bankman-Fried, Elizabeth Holmes, Martin Shkreli, just to name a few
Well, those committed the only crime that matters in the US: they stole from the rich.
Yeah, it’s exceptionally rare for CEOs, but they’re not the only one’s behaving unethically at work. There’s often a scapegoat.
Yes, but these do not represent average human. Fortune 500 represent people more likely to break ethics rules then average human who also work in conditions that reward lack of ethics.
Not quite. The idea that corporate employees are fundamentally "not average" and therefore more prone to unethical behaviour than the general population relies on a dispositional explanation (it's about the person's character).
However, the vast majority of psychological research over the last 80 years heavily favours a situational explanation (it's about the environment/system). Everyone (in the field) got really interested in this after WW2 basically, trying to understand how the heck did Nazi Germany happen.
TL;DR: research dismantled this idea decades ago.
The Milgram and Stanford Prison experiments are the most obvious examples. If you're not familiar:
Milgram showed that 65% of ordinary volunteers were willing to administer potentially lethal electric shocks to a stranger because an authority figure in a lab coat told them to. In the Stanford Prison experiement, Zimbardo took healthy, average college students and assigned them roles as guards and prisoners. Within days, the roles and systems set in place overrode individual personality.
The other relevant bit would be Asch’s conformity experiments; to whit, that people will deny the evidence of their own eyes (e.g., the length of a line) to fit in with a group.
In a corporate setting, if the group norm is to prioritise KPIs over ethics, the average human will conform to that norm to avoid social friction or losing their job, or other realistic perceived fears.
Bazerman and Tenbrunsel's research is relevant too. Broadly, people like to think that we are rational moral agents, but it's more accurate to say that we boundedly ethical. There's this idea of ethical fading that happens. Basically, when you introduce a goal, people's ability to frame falls apart, including with a view to the ethical implications. This is also related to why people under pressure default to less creative approaches to problem solving. Our brains tunnel vision on the goal, to the failure of everything else.
Regarding how all that relates to modern politics, I'll leave that up to your imagination.
I find this framing of corporates a bit unsatisfying because it doesn't address hierarchy. By your reckoning, the employees just follow the group norm over their own ethics. Sure, but those norms are handed down by the people in charge (and, with decent overlap, those that have been around longest and have shaped the work culture).
What type of person seeks to be in charge in the corporate world? YMMV but I tend to see the ones who value ethics (e.g. their employees' wellbeing) over results and KPIs tend to burn out, or decide management isn't for them, or avoid seeking out positions of power.
Idk where you're at, but it's been the complete opposite in my experience
Responded on this line of thinking a bit further down, so I'll be brief on this. Yes, there's selection bias in organisations as you go up the ladder of power and influence, which selects for various traits (psychopathy being an obvious one).
That being said, there's a side view on this from interactionism that it's not just the traits of the person's modes of behaviour, but their belief in the goal, and their view of the framing of it, which also feeds into this. Research on cult behaviours has a lot of overlap with that.
The culture and the environment, what the mission is seen as, how contextually broad that is and so on all get in to that.
I do a workshop on KPI setting which has overlap here too. In short for that - choose mutually conflicting KPIs which narrow the state space for success, such that attempting to cheat one causes another to fail. Ideally, you want goals for an organisation that push for high levels of upside, with limited downside, and counteracting merits, such that only by meeting all of them do you get to where you want to be. Otherwise it's like drawing a line of a piece of paper, asking someone to place a dot on one side of the line, and being upset that they didn't put it where you wanted it. More lines narrows the field to just the areas where you're prepared to accept success.
That division can also then be used to narrow what you're willing to accept (for good or ill) of people in meeting those goals, but the challenge is that they tend to see meeting all the goals as the goal, not acting in a moral way, because the goals become the target, and decontextualise the importance of everything else.
TL;DR: value setting for positive behaviour and corporate performance is hard.
EDIT: actually this wasn't that short as an answer really. Sorry for that.
> That division can also then be used to narrow what you're willing to accept (for good or ill) of people in meeting those goals, but the challenge is that they tend to see meeting all the goals as the goal, not acting in a moral way, because the goals become the target, and decontextualise the importance of everything else.
I would imagine that your "more lines" approach does manage to select for those who meet targets for the right reasons over those who decontextualise everything and "just" meet the targets? The people in the latter camp would be inclined to (try to) move goalposts once they've established themselves - made harder by having the conflicting success criteria with the narrow runway to success.
In other words, good ideas and thanks for the reply (length is no problem!). I do however think that this is all idealised and not happening enough in the real world - much agreed re: psychopathy etc.
If you wouldn't mind running some training courses in a few key megacorporations, that might make a really big difference to the world!
You're not wrong strictly speaking - the challenge comes in getting KPIs for ethical and moral behaviour to be things that the company signs up for. Some are geared that way inherently (Patagonia is the cliché example), but most aren't.
People will always find other goalposts to move. The trick is making sure the KPIs you set define the goalposts you care about staying in place.
Side note: Jordan Peterson is pretty much an example of inventing goalposts to move. Everything he argues about is about setting a goalpost, and then inventing others to move around to avoid being pinned down. Motte-and-bailey fallacy happens with KPIs as much as it does with debates.
My favourite part about the Milgram experiments is that he originally wanted to prove that obedience was a German trait, and that freedom loving Americans wouldn't obey, which he completely disproved. The results annoyed him so much that he repeated it dozens of times, getting roughly the same result.
The Stanford prison experiment has been debunked many times : https://pubmed.ncbi.nlm.nih.gov/31380664/
- guards received instructions to be cruel from experimenters
- guards were not told they were subjects while prisoners were
- participants were not immersed in the simulation
- experimenters lied about reports from subjects.
Basically it is bad science and we can't conclude anything from it. I wouldn't rule out the possibility that top fortune-500 management have personality traits that make them more likely to engage in unethical behaviour, if only by selection through promotion by crushing others.
It's instructive though, despite the flaws, and at this point has been replicated enough in different ways that we know it's got some basis in reality. There's a whole bunch of constructivist research around interactionism, that shows that whilst it's not just the person's default ways of behaving or just the situation that matters, the situational context definitely influences what people are likely to do in any given scenario.
Reicher & Haslam's research around engaged followership gives a pretty good insight into why Zimbardo got the results he did, because he wasn't just observing what went on. That gets into all sorts of things around good study design, constructivist vs positivist analysis etc, but that's a whole different thing.
I suspect, particularly with regards to different levels, there's an element of selection bias going on (if for no other reason that what we see in terms of levels of psychopathy in higher levels of management), but I'd guess (and it's a guess), that culture convincing people that achieving the KPI is the moral good is more of a factor.
That gets into a whole separate thing around what happens in more cultlike corporations and the dynamics with the VC world (WeWork is an obvious example) as to why organisations can end up with workforces which will do things of questionable purpose, because the organisation has a visible a fearless leader who has to be pleased/obeyed etc (Musk, Jobs etc), or more insidiously, a valuable goal that must be pursued regardless of cost (weaponised effective altruism sort of).
That then gets into a whole thing about what happens with something like the UK civil service, where you're asked to implement things and obviously you can't care about the politics, because you'll serve lots of governments that believe lots of different things, and you can't just quit and get rehired every time a party you disagree with personally gets into power, but again, that diverges into other things.
At the risk of narrative fallacy - https://www.youtube.com/watch?v=wKDdLWAdcbM
> The Milgram and Stanford Prison experiments are the most obvious examples.
BOTH are now considered bad science. BOTH are now used as examples of "how not to do the science".
> The idea that corporate employees are fundamentally "not average" and therefore more prone to unethical behaviour than the general population relies on a dispositional explanation (it's about the person's character).
I did not said nor implied that. Corporate employees in general and Forbes 500 are not the same thing. Corporate employees as in cooks, cleaners, bureaucracy, testers and whoever are general population.
Whether company ends in Forbes 500 or not is not influenced by general corporate employees. It is influenced by higher management - separated social class. It is very much selected who gets in.
And second, companies compete against each other. A company run by ethical management is less likely to reach Forbes 500. Not doing unethical things is disadvantage in current business. It could have been different if there was law enforcement for rich people and companies and if there was political willingness to regulate the companies. None of that exists.
Third, look at issues around Epstein. It is not that everyone was cool with his misogyny, sexism and abuse. The people who were not cool with that seen red flags long before underage kids entered the room. These people did not associated with Epstein. People who associated with him were rewarded by additional money and success - but they also were much more unethical then a guy who said "this feels bad" and walked away.
Not sure where you get that for Milgram. That's been replicated lots of times, in different countries, with different compositions of people, and found to be broadly replicable. Burger in '09, Sheridan & King in '72, Dolinski and co in '17, Caspar in '16, Haslam & Reicher which I referenced somewhere else in the thread...
That sounds like classic sour grapes to me. "The reason I'm not successful is because I'm ethical!". Instead of you know, business being a hard field.
I think this also shows up outside an AI safety or ethics framing and in product development and operations. Ultimately "judgement," however you wish to quantify that fuzzy concept, is not purely an optimization exercise. It's far more a probabilistic information function from incomplete or conflicting data.
In product management (my domain), decisions are made under conflicting constraints: a big customer or account manager pushing hard, a CEO/board priority, tech debt, team capacity, reputational risk and market opportunity. PMs have tried with varied success to make decisions more transparent with scoring matrices and OKRs, but at some point someone has to make an imperfect judgment call that’s not reducible to a single metric. It's only defensible through narrative, which includes data.
Also, progressive elaboration or iterations or build-measure-learn are inherently fuzzy. Reinertsen compared this to maximizing the value of an option. Maybe in modern terms a prediction market is a better metaphor. That's what we're doing in sprints, maximizing our ability to deliver value in short increments.
I do get nervous about pushing agentic systems into roadmap planning, ticket writing, or KPI-driven execution loops. Once you collapse a messy web of tradeoffs into a single success signal, you’ve already lost a lot of the context.
There’s a parallel here for development too. LLMs are strongest at greenfield generation and weakest at surgical edits and refactoring. Early-stage startups survive by iterative design and feedback. Automating that with agents hooked into web analytics may compound errors and adverse outcomes.
So even if you strip out “ethics” and replace it with any pair of competing objectives, the failure mode remains.
As Goodhart's law states, "When a measure becomes a target, it ceases to be a good measure". From an organizational management perspective, one way to partially work around that problem is by simply adding more measures thus making it harder for a bad actor to game the system. The Balanced Scorecard system is one approach to that.
This extends beyond AI agents. I'm seeing it in real time at work — we're rolling out AI tools across a biofuel brokerage and the first thing people ask is "what KPIs should we optimize with this?"
The uncomfortable answer is that the most valuable use cases resist single-metric optimization. The best results come from people who use AI as a thinking partner with judgment, not as an execution engine pointed at a number.
Goodhart's Law + AI agents is basically automating the failure mode at machine speed.
Agreed, Goodhart’s Law captures the failure mode well intentioned KPIs and OKRs may miss, let alone agentic automation
Although ethics are involved, the abstract says that the conflicting importance does not come from ethics vs KPIs, but from the fact that the ethical constraints are given as instructions, whereas the KPIs are goals.
You might, for example, say "Maximise profits. Do not commit fraud". Leaving ethics out of it, you might say "Increase the usability of the website. Do not increase the default font size".
If you want absolute adherence to a hierarchy of rules you'll quickly find it difficult - see I,Robot by Asimov for example. An LLM doesn't even apply rules, it just proceeds with weights and probabilities. To be honest, I think most people do this too.
You're using fiction writing as an example?
The paper seems to provide a realistic benchmark for how these systems are deployed and used though, right? Whether the mechanisms are crude or not isn't the point - this is how production systems work today (as far as I can tell).
I think the accusation of research that anthropomorphize LLMs should be accompanied by a little more substance to avoid this being a blanket dismissal of this kind of alignment research. I can't see the methodological error here. Is it an accusation that could be aimed at any research like this regardless of methodology?
Oh, sorry for misunderstanding - I am not criticizing or accusing of anything at all!, but suggesting ideas for further research. The practical applications, as I mentioned above, are all there, and for what its worth I liked the paper a lot. My point is: I wonder if this can be followed up by a more so-to-say abstract research to drill into the technicalities of how well the models follow the conflicting prompts in general.
At the very least it shows the capability of the current restrictions are deeply lacking and can be easily thwarted.
Quite possibly, workable ethics will pretty much require full-fledged General Artificial Intelligence, verging on actual Self-Awareness.
There's a great discussion of this in the (Furry) web-comic Freefall:
(which is most easily read using the speed reader: https://tangent128.name/depot/toys/freefall/freefall-flytabl... )
> At the same time it is important to keep in mind that it anthropomorphizes the models that technically don't interpret the ethical constraints the same was as this is assumed by most readers.
Now I'm thinking about the "typical mind fallacy", which is the same idea but projecting one's own self incorrectly onto other humans rather than non-humans.
https://www.lesswrong.com/w/typical-mind-fallacy
And also wondering: how well do people truly know themselves?
Disregarding any arguments for the moment and just presuming them to be toy models, how much did we learn by playing with toys (everything from Transformers to teddy bear picnics) when we were kids?
Regardless of the technical details of the weighting issue, this is an alignment problem we need to address. Otherwise, paperclip machine.
I suspect that the fact that LLMs tend to have a sort of tunnel vision and lack a more general awareness also plays a role here. Solving this is probably an important step towards AGI.