"The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations."
As might any plaintiff. NYT might be the first of many others and the lawsuits may not be limited to copyright claims
Why has OpenAI collected and stored 20 million conversations (including "deleted chats")
What is the purpose of OpenAI storing millions of private conversations
By contrast the purpose of NYT's request is both clear and limited
The documents requested are not being made public by the plaintiffs. They will presumably be redacted to protect any confidential information before being produced to the plaintiffs. The documents can only be used by the plaintiffs for the purpose of the litigation against OpenAI and, unlike OpenAI, which has collected and stored these conversations for as long as it desires, the plaintiffs are prohibited from retaining copies of the documents after the litigation is concluded
The privacy issue here has been created by OpenAI for their own commercial benefit
It is not even clear what this benefit, if any, will be as OpenAI continues to search for a "business model"
Wanton data collection
NB. There is no order to "collect". The order is to preserve what is already being collected and stored in the ordinary course of business
https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...
https://ia801404.us.archive.org/31/items/gov.uscourts.nysd.6...
Why does OpenAI collect and retain for 30 days^1 chats that the user wants to be deleted
It was doing this prior to being sued by the NYT and many others
OpenAI was collecting chats even when the user asked for deletion, i.e., the user did not want them saved
That's why a lawsuit could require OpenAI to issue a hold order, retain these chats for longer and produce them to another party in discovery
If OpenAI was not collecting these chats in the ordinary course of its business before being sued by the NYT and many others, then there would be no "deleted chats" for OpenAI to be compelled by court order to retain and produce to the plaintiffs
1. Or whatever period OpenAI decides on. It could change at any time for any reason. However OpenAI cannot change their retention policy to some shortened period after being sued. Google tried this a few years ago. It began destroying chats between employees after Google was on notice it was going to be sued by the US government and state AGs
I'd trust Sam Altman about as far as I could throw him and there is absolutely no way OpenAI should be having sensitive private conversations with anybody. Sooner or later all that data will end up with Microsoft who can then correlate it with a ton of data they already have from other sources (windows, office online, linkedin, various communications services including 'teams', github and so on).
This is an intelligence service's wet dream.
> […] there is absolutely no way OpenAI should be having sensitive private conversations with anybody. Sooner or later all that data will end up with Microsoft who can then […]
I don't think you even need to go as far as Microsoft (who have earned zero points in the Privacy Protection league), just have a look at Altman's "I want to create a biometric database of every human" Orb/Worldcoin eye-scanning project: https://www.ft.com/content/0c5c2b8d-b185-40b6-9221-b80ee130b...
- [deleted]
- [deleted]
I'm not commenting on the core point of your comment, only the "why retain for 30 days" question.
In an age of automated backups and failovers, deleting can be really hard. Part of the answer could simply be that syncing a delete across all the redundancies (while ensuring those redundancies are reliable when a disaster happens and they need to recover or maintain uptime) may take days to weeks. Also, the 30 days could be the limit, as opposed to the average or median time it takes.
The most likely explanation is whatever storage solution they're using has a built-in "recycle bin" functionality and deleted data stays there for 30 days before it's actually deleted. I see this a lot in very large databases. The recycle bin functionality is built into the data store product.
I'm doubtful that a data store product used at their scale can't be configured not to keep data for 30 days; for large clients that could be TB of deleted data or more. This would be neither cheap nor easy to manage.
Oh, I realize that, but deviating from the defaults they have now would require so much testing, and all the risk that goes along with it, that they'll avoid it at all costs.
That sounds very plausible.
The problem when dealing with any company that has proven itself untrustworthy is that by default the innocent "plausible" option is probably no longer the "likely" one.
And I say this knowing that intentionally deleting data is harder than it looks.
That doesn't sound quite right to me.
Something about game theory, art of war, and the difference between stated intentions and actual intentions.
Trustworthiness comes from alignment of stated intentions, actual intentions, abilities and actions. Someone can have integrity between stated and actual intentions, but fail to follow through. In this case I think we doubt the integrity between OpenAI's stated and actual intentions.
So Sam can be saying stuff and then we find out he wasn't being honest. We can learn over time about his intentions by watching actions instead of listening to what he says. Then we can make new assumptions based on what his actual intentions seem like.
Based on what I assume Sam's intentions to be (with some healthy suspicion of the alignment between his stated intentions and actual intentions), I'm still skeptical that the reason for the 30 day thing goes far beyond quality control, the difficulty of balancing deletion and redundancy and the features of the tech stack they are using.
> I'm not commenting on the core point of your comment, only the "why retain for 30 days" question. In an age of automated backups and failovers, deleting can be really hard.
I doubt it's that. Deletion is hard, but it's not "exactly 30 days" hard.
The most likely explanation is that OpenAI wants the ability to investigate abuse and / or publicly-made claims ("ChatGPT told my underage kid to <x>!" / "ChatGPT praised Hitler!"). If they delete chats right away, they're flying blind and you can claim anything you want.
Now, whether you should have a "delete" button that doesn't really delete stuff is another question.
What is the standard way of being forced to restore from backup while ensuring deleted data does not also become restored? Is every delete request stored so that it can be replayed against any restore?
I have only had to manage this in a startup context with relatively low stakes and it was hard and messy. I don't know what best practice is at the scale that openai operates, but from my limited experience I have an intuition that the challenge is not trivial.
Also I suspect there is a big gap between best practice and common practice. My guess is common practice is dysfunctional. I would also suspect there is no standard way, but there are established practices within different technology stacks that vary between performative, barely compliant and effective at scale.
In one case I saw there was a substantial manual effort to load snapshots into instances run the delete and then save new snapshots. This was over 10 years ago though and it was more of a "we just need to get this done" than a "what's the most elegant way to do this at scale"
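There is no single standard that I know of, but one common pattern is an append-only log of delete requests (tombstones) that gets replayed against any restored snapshot. A minimal sketch of that idea in Python, with purely hypothetical names and an in-memory dict standing in for the real data store:

    from datetime import datetime, timezone

    deletion_log = []  # append-only tombstone log; must be at least as durable as the backups

    def delete_record(db: dict, record_id: str) -> None:
        """Delete now, and remember the request so a later restore cannot resurrect the data."""
        db.pop(record_id, None)
        deletion_log.append({
            "record_id": record_id,
            "requested_at": datetime.now(timezone.utc).isoformat(),
        })

    def restore_from_backup(backup: dict) -> dict:
        """Load a snapshot, then replay every recorded tombstone so deleted data stays deleted."""
        db = dict(backup)
        for tombstone in deletion_log:
            db.pop(tombstone["record_id"], None)
        return db

The messy part in practice is exactly what the parent describes: the tombstone log itself has to survive the same disasters the backups are protecting against.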
- [deleted]
> Why does OpenAI collect and retain for 30 days^1 chats that the user wants to be deleted
When working on an e-commerce gig we would get "delete my data" requests from customers, which we're legally obliged to comply with. A script would delete everything we could from the DB immediately. Since we had 30 day backups, their data would only be deleted from the backups on day 31. I think this was acceptable to the GDPR consultant.
Going into the backups to delete their data there is insane.
> Going into the backups to delete their data there is insane.
If I was legally obliged to delete data then I'd make sure I deleted, regardless of the purpose or location of the storage. If you can't handle a delete request you shouldn't collect the data in the first place.
People expect to see their past orders, save their address, keep a shopping cart, a list of favorites etc.
If you don't want your data online then don't put it there.
What you want to do is encrypt/anonymize per user information using a translation layer that also gets backed up. In case of a gdpr request, you delete this mapping / key and voila: data cleanup. The backup data becomes unusable.
But this obviously means building an extensive system to ensure the encoded identifier is the only thing used across your system (or a giant key management system).
In the past I’ve been a part of systems at exabyte scale that had to implement this. Hard but not impossible. I can see how orgs try to ‘legalese’ their way out of doing this though because the only forcing function is judicial.
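A minimal sketch of that crypto-shredding approach, using the third-party "cryptography" package; the key-store table and function names here are made up for illustration, not anyone's actual implementation:

    from cryptography.fernet import Fernet  # pip install cryptography

    key_store = {}  # user_id -> per-user key; kept and backed up separately from the data

    def store_user_data(user_id: str, plaintext: bytes) -> bytes:
        """Encrypt with a per-user key; the ciphertext can be backed up anywhere."""
        key = key_store.setdefault(user_id, Fernet.generate_key())
        return Fernet(key).encrypt(plaintext)

    def read_user_data(user_id: str, ciphertext: bytes) -> bytes:
        return Fernet(key_store[user_id]).decrypt(ciphertext)

    def handle_erasure_request(user_id: str) -> None:
        """Crypto-shredding: drop the key and every copy of the ciphertext,
        including copies sitting in old backups, becomes unreadable."""
        key_store.pop(user_id, None)

As the parent says, the hard part is not the encryption itself but ensuring the keyed identifier is the only thing that ever flows through the rest of the system.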
- [deleted]
Maybe an append only data store where actual hard deletes only happen as an async batch job? Still 30 days seems really long for this.
The two documents you linked are responses to specific parts of OpenAI's objection. They're not good sources for the original order.
Nevertheless, you're generally correct but you don't realize why: A core feature of ChatGPT is that it keeps your conversation history right there so you can click on it, review it, and continue conversations across all of your devices. The court order is to preserve what is already present in the system even if the user asks to delete it.
For those who are confused: A core feature of ChatGPT and other LLM accounts is that your past conversations are available to return to, until you specifically delete them. The problem now is that if a user asks for the conversation to be deleted, OpenAI has to retain the conversation for the court order even though it appears deleted.
Is it possible to install ChatGPT on only one computer ("device")
Is it a requirement that ChatGPT users own multiple computers
Is it a requirement that ChatGPT users use ChatGPT on multiple computers
Is it true that a goal of online advertising services providers is to learn about all of an ad target's computers and link them to a single identity
Is every software "feature" necessary
Are there "features" in some software that benefit software developers more than software users, e.g., through data colllection, surveilllance and advertising services
Should all software "features" chosen by developers be "opt-out", with default settings chosen by developers not users, or should some be "opt-in"
What if a "feature" chosen by a developer that no user ever requested cannot be implemented as "opt-in". Should users that do not wish to subject themselves to the "feature" use the software
Is ChatGPT chat history a "feature"
- [deleted]
- [deleted]
> What is the purpose of OpenAI storing millions of private conversations
Your previous ChatGPT conversations show up right in the ChatGPT interface.
They have to store the private conversations to enable users to bring them up in the interface.
This isn't a secretive, hidden data collection. It's a clear and obvious feature right in the product. They're fighting for the ability to not retain secret records of past conversations that have been deleted.
The problem with the court order is that it requires them to keep the conversations even after a user presses the 'Delete' button on them.
They could have been stored on the client, and encrypted before being optionally synced back to OpenAI servers in a way that the stored chats can only be read back by the user. Signal illustrates how this is possible.
OpenAI made a choice in how the feature was and is implemented.
Signal does End-to-end encryption, so they (Signal) can never read it.
The whole point of ChatGPT conversations is so they can be read by the model on the server.
Conversations are kept around because they can be picked up and continued at any point (I use this feature frequently).
Additionally you can use conversations in their scheduled notification feature, where the conversation is replayed and updates are sent to you, all done on the server.
> OpenAI made a choice in how the feature was and is implemented.
Indeed they did, and it was a sensible choice given how the conversations are used.
You could definitely do this E2EE.
Models should run in ephemeral containers where data is only processed in RAM. For active conversation a unique and temporary key-pair is generated. Saved chats are encrypted client side and stored encrypted server side. To resume a conversation[0], decrypt client side, establish connection to container, generate new temporary key-pair, and so on. There's more details and nuances but this is very doable.
How Mullvad handles your data, for some inspiration: https://mullvad.net/en/help/no-logging-data-policy
I'm not sure why this is a problem. There's no requirement that data at rest needs to be unencrypted. Nor is there a requirement that those storing the data need to have the keys to decrypt that data. Encrypted storage is a really common thing...

> Conversations are kept around because they can be picked up and continued at any point (I use this feature frequently).

For this we can use the above scenario, or we can use a multi-key setting if you want to ping multiple devices, or you can have data temporarily decrypted. There is still no need to store the data on disk unencrypted or encrypted with keys OAI owns.

> Additionally you can use conversations in their scheduled notification feature, where the conversation is replayed and updates are sent to you, all done on the server.

Of course, I also don't see OAI pushing the state of Homomorphic Encryption forward either... But there's definitely a lot of research and more than acceptable solutions that allow data to be processed server side while being encrypted for as long as possible and making access to that data incredibly difficult.
Again, dive deep into how Mullvad does it. It is not possible for them to make all their data encrypted, but they make it as close to impossible to get, including by themselves. There doesn't need to be a perfect solution, but there's no real reason these companies couldn't restrict their own access to that data. There's only 2 reasons they are not doing so. Either 1) they just don't care enough about your privacy or 2) they want it for themselves. Considering how OpenAI pushes the "Scale is All You Need" narrative, and "scale" includes "data", I'm far more inclined to believe the reason is option 2.
[0] Remember, this isn't so much a conversation in the conventional sense. The LLMs don't "remember". You send them the entire chat history in each request. In this sense they are Markovian. It's not like they're tuning a model just to you. And even if they were, well we can store weights encrypted too. Doesn't matter if a whole model, LoRA, embeddings, or whatever. That can be encrypted at rest via keys OAI does not have access to.
Services like Mullvad and Signal are in the business of passing along messages between other parties; messages the service isn't a party to. With chatgpt chat histories, the user is talking directly to the service - you're suggesting the service should E2EE messages to and from itself, to prevent itself from spying on data generated by its own service?
You cannot compare these examples. There is currently no way to encrypt the user message and have the model on the server read/process the message without it being decrypted first.
Mullvad and E2EE Messengers do not need to process the contents of the message on their server. All they do is, passing it to another computer. It could be scrambled binary for all they care. But any AI company _has_ to read the content of the message by definition of their service.
It's a solved problem. Lumo.
Lumo never promises encryption while processing a conversation on their servers. Chats HAVE to be decrypted at some point on the server, or sent already decrypted by the client, even when they are stored encrypted.
Read the marketing carefully and you will notice that there is no word about encrypted processing, just storage - and of course that's a solved problem, because it was solved decades ago.
The agent needs the data decrypted, at least for the moment, I know of no model that can process encrypted data. So as long as the model runs on a server, whoever manages that server has access to your messages while they are being processed.
EDIT: Even found an article where they acknowledge this [0]. Even though there seem to exist models/techniques that can produce output from encrypted messages with 'Homomorphic Encryption' [1], it is not practical, as it would take days to produce an answer and would consume huge amounts of processing power.
> Models should run in ephemeral containers where data is only processed in RAM
Maybe, but leaving aside that they are two different kinds of products, how can you trust them to really do so? And anyway, in the case of ChatGPT, where should I store my client-side private key, as I use those bots only in my web browser? Maybe in my password manager, and I copy-paste it every time I start a new conversation.
My take is that if they went this way we would not be talking about them now, we would be talking about one of their competitors that didn't put hurdles between their product and their customers.
In other words, survivor bias.
As it happens...
I built E2E encrypted LLMs using secure enclaves, so I know a bit about this space.
The tech works, for small LLMs - the sort of thing you can run on your mobile already. It isn't yet (?) there for LLMs the size of ChatGPT.
People are responding in this thread as if ChatGPT is a one-on-one conversation with another person. The data isn’t “shared” with OpenAI. You’re chatting with OpenAI. ChatGPT is just a service. There’s no way to use ChatGPT without sharing all of your chats with OpenAI, that’s what the entire product is.
This doesn’t sound realistic. Signal is end to end encrypted and only sends one message at a time, while ChatGPT needs the entire chat context for every message and they need to decrypt your messages in their services in order to feed them into the LLM.
This is what Proton is doing with Lumo [0]
> Our long-term roadmap includes advanced security features designed to keep your data private, including client-side encryption for your messages with ChatGPT. We believe these features will help keep your private conversations private and inaccessible to anyone else, even OpenAI.
This sort of thing is pretty trivial to implement from the start, they just chose not to because they wanted the data themselves
- [deleted]
Hah. I seriously doubt it is even close to trivial. Especially when they are to exist on any device you use the service from.
- [deleted]
No it's not. It's literally a court order mandating them to collect this data.
- [1] https://arstechnica.com/tech-policy/2025/08/openai-offers-20...
This article says nothing of the sort. The court order is to preserve existing logs they already have, not to disable logging, and to hand all the logs over to the plaintiffs. OpenAI's objections are mainly that 1/ there are too many logs (so they're proposing a sample instead) and 2/ there's identifying data in the logs and so they are being "forced" to anonymize the logs at their expense (even though that's what they want as a condition of transferring the logs).
There is nothing in the article that mentions OpenAI being forced to create new logs they don't already have.
This response is misleading. Almost all computer services keep logs for a short period of time, so a court order to retain existing information is quite a bit more powerful than a layman would think, because a huge amount of data is retained for a short period of time and then rapidly deleted in most web services I've worked on for the past 30 years.
This is true in services like Datadog, New Relic, and logging services like Splunk. But even privacy-focused services like Mullvad keep logs for 24 hours to monitor for abuse. So the idea that an order to retain logs is significantly weaker than an order to collect them is really a bit of misdirection. I'm not sure whether it's intentional, but it's definitely misleading.
There is an important distinction that relates to a court’s ability to order a defendant to perform work to facilitate discovery. A court can order preservation of records, but they generally cannot order a defendant to create new ones. I was responding to your use of the word “collect,” which implies significantly more effort than merely not destroying logs (i.e. logging new information that they weren’t already).
It’s not misdirection or misleading; it lies in an understanding of the law. There’s plenty of case law out there on the subject if you’re interested.
Both are simply software changes. In one case, they're going to have to alter the software to not delete chats that users request to be deleted. In the other case, they'll alter the software to log new information. Neither of these are particularly difficult.
I understand, but the law still distinguishes between the two cases. In my experience, typically expunging is handled by a process separate from its creation (it depends on the logging framework, of course). And with the increasing trend of generated logs being ingested, processed, and stored by separate services, often disabling log deletion is a mere API call away.
Well, don’t get yourself sued and you won’t have to perform discovery for the plaintiffs.
[flagged]
If OpenAI truly didn't keep conversation records for any length of time, they would not be subject to this kind of order. Lots of stateless services get these and are able to defeat them because they never store the user's data. The fact that they store them at all means that they are in scope for a preservation order. It also means that they are in scope for all manner of usage by OpenAI themselves even if a user requests deletion.
It seems as if the court has forced OpenAI into collecting logs that they weren't otherwise collecting, or that they were deleting at user request.
So in this case not keeping logs as ordered by the court would be contempt of court.
Respectfully, it doesn’t matter the way it “seems,” it matters what is. They were collecting these logs, and as soon as they got the preservation order, they disabled deletion functionality and notified their customers of that.
There is a separate higher-tier private API customers can pay for that never had logging enabled, and the court did not force the company to add it.
- [deleted]
This is an excellent article and source. Thank you.
> What is the purpose of OpenAI storing millions of private conversations
It's needed for the conversation history feature, a core feature of the ChatGPT product
It's like saying "What is the purpose of Google Photos storing millions of private images"
This is true but why retain deleted conversations?
That's the objection: The court order requires them to retain everything they currently have, even if the user requests that it be deleted.
Because the New York Times sued them and made them.
ChatGPT (the app) specifically says they keep deleted conversations for up to 30 days. That's probably why.
yeah but the link states "The 20 million user conversations were randomly sampled from Dec. 2022 to Nov. 2024" so this makes no sense. 2024 was much longer than 30 days ago
Because the court ordered them to retain the records longer than they normally would.
>What is the purpose of OpenAI storing millions of private conversations
Have you used ChatGPT? Your conversation history is on the left rail
I read in the pleadings that OpenAI claims it cannot search its logs without decompressing them first
I can search the logs I keep without decompressing
Every user is different and each is free to use whatever software they want
"Have you used ChatGPT?"
No
Large number of upvotes on the quoted comment however. Maybe some of those voters are ChatGPT users
I do searching from the command line in text mode. The script I use keeps a "log" (a customised SERP) of all query strings and search result URLs. I also have these URLs stored in the logs from the forward proxy. These are compressed using RePair. I can search the compressed logs faster this way than with something like
    zstd -dc log.zst | grep pattern

or

    rg -z pattern log.zst

> No
Given that, I'd suggest not offering "alternatives" to the features described in TFA for a service you've never used. There are people here talking about oranges, a lot of them with domain expertise, and you're not just talking about apples, you're talking about bird migrations.
> No
Okay well it's a chat app where you chat directly with an LLM. The way LLMs work is you feed the entire chat history into it, and it generates the next message. Therefore, there's no way you can chat with it without storing the history. It's impossible
> Large number of upvotes on the quoted comment however.
Sure, and also downvotes - that measures factionalism, not correctness.
But tech wise, you're confused. Functionally speaking chatgpt is a shared document editor - the server needs to store chat histories for the same reason Google Docs stores the content of documents. Users can submit text to chatgpt.com from one browser, and later edit that text from the app or a different browser. Ergo the text is stored on the server, simple as that.
- [deleted]
Downvotes are a tiny fraction
3 versus 190+, so far
Many commenters cannot distinguish rhetorical questions from questions that seek an answer
By attempting to answer a rhetorical question one may only strengthen the point being made by the question, for example, poor decision-making, and may reveal an absence of self-awareness
Using RePair for compression I can also search inside compressed tarballs full of logs
To do this, I first insert a blank line at the top of each log file before adding to the tarball
IME, RePair is faster than compressing with zstd and the size reduction is almost the same
The only "catch" is that RePair requires more memory during compression
Pardon, but do you have a link for this RePair compressor?
Unfortunately, different searches for this RePair you mentioned have only revealed links to resources for repairing broken air compressors, damaged compressed files, spinal injuries, etc.
They made the feature, now they get to live with it. So they can spare us the feigned surprise and outrage.
Instead of writing open letters they could of course do something about it. Even Google stopped storing your location timeline on their servers and now have it per-device only.
We’re talking about two different things. It would be like Gmail not storing your emails. Expecting ChatGPT to not store your chats is ridiculous
> The documents requested are not being made public by the plaintiffs
In fact, as far as I understand it, they could not be made public by the plaintiffs even if they wanted to do so, or even if one of their employees decided to leak them.
That's because the plaintiffs themselves never actually see the documents. They will only be seen by the plaintiffs' lawyers and any experts hired by those lawyers to analyze them.
You are correct. I've operated under many protective orders that require me to redact portions of reports clients paid for because they were not authorized to see those specific parts due to the order.
News Plaintiffs October 15, 2025 Letter Motion to Compel
https://ia801205.us.archive.org/1/items/gov.uscourts.nysd.61...
OpenAI October 30, 2025 Letter Opposing Motion to Compel
https://ia601205.us.archive.org/1/items/gov.uscourts.nysd.61...
November 7, 2025 Order on Motion to Compel
https://ia601205.us.archive.org/1/items/gov.uscourts.nysd.61...
"OpenAI has failed to explain how its consumers privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAIs exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.1
1. As News Plaintiffs point out, OpenAI has spent the last two and a half months processing and deidentifying this 20 million record sample. (ECF 719 at 1 n.1)."
If an analogy to the history of search engines can be made,^1 then we know that log retention policies in the US can change over time. The user has no control over such changes
https://ide.mit.edu/wp-content/uploads/2018/01/w23815.pdf
Companies operating popular www search engines might claim that the need for longer retention is "to provide better service" or some similar reason that focuses on users' interests rather than the company's interests^2
2. Generally, advertising services
This paper attempts to expose such claims as bogus
1. According to some reports OpenAI is sending some queries to Google
Amusingly, this discussion thread is filled with replies that attempt to "answer" the question of "why" OpenAI collects chat histories even when it must have known it would be sued for copyright infringement
For users affected by OpenAI's conduct, an "answer" makes no difference. Anyone can construct any "answer" they want and we can see that in this thread. For users affected by OpenAI's conduct, it does not matter
In the above paper on search engines, the claim was that longer retention of sensitive data leads to better search. This was the "answer" presented in response to the question of "why"
But the "answer" is only misdirection. The companies have no reputation for being honest and their operations are non-transparent. Accordingly, user focus will be on the consequences for users of the company's practices, not "why"
Some readers are probably too young to have read through the AOL search data
https://en.wikipedia.org/wiki/AOL_search_log_release
Did anyone care "why" AOL released the data
IMHO, it is unfortunate that papers like the one above need to be published
The question of "why" is rhetorical. It is meant to draw attention to the consequences for users, not to seek an "answer"
- [deleted]
Instead of asking, "What is the purpose of OpenAI storing millions of private conversations" and having HN commenters (mis)interpret this as something other than a rhetorical question, one could ask, "What are the consequences for users of OpenAI storing millions of private conversations that users do not wish to save"
HN replies might try to answer this as well but the answer is already known to the world
The conversations will be made available to the plaintiffs' (including New York Times') attorneys and the plaintiffs' attorneys' experts
If OpenAI did not store such conversations as a matter of practice before being sued, then there would be no private conversations to make available to the plaintiffs' attorneys and their experts
275 upvotes
AFAICT, most HN readers did _not_ misinterpret the question
HN replies != HN, it is a small subset of the readership
>Why has OpenAI collected and stored 20 million conversations (including "deleted chats")
To train the AI further. Obviously. Simple as.
"Fighting the New York Times' lawyers' and experts' invasion of user privacy"
Is there a technical limitation that prevents chat histories from being stored locally on the user's computer instead of being stored on someone else's computer(s)
Why do chat histories need to be accessible by OpenAI, its service partners and anyone with the authority to request them from OpenAI
If users want this design, as suggested by HN commenters, if users want their chat histories to be accessible to OpenAI, its service providers and anyone with authority to request them from OpenAI, then wouldn't it also be true that these users are not much concerned with "privacy"
If so, then why would OpenAI proclaim they are "fighting the New York Times' invasion of user privacy", knowing that NYT is prohibited from making the logs public and users generally do not care much about "privacy" anyway
The restrictions on plaintiff NYT's use of the logs are greater than the restrictions, if any,^1 on OpenAI's use of them
1. If any such restrictions existed, for example if OpenAI stated "We don't do X" in a "privacy policy" and people interpreted this as a legally enforceable restriction,^2 how would a user verify that the statement was true, i.e., that OpenAI has not violated the "restriction". Silicon Valley companies like OpenAI are highly secretive
2. As opposed to a statement by OpenAI of what OpenAI allegedly does not do. Compare with a potentially legally-enforceable promise such as "OpenAI will not do X". Also consider that OpenAI may do Y, Z, etc. and make no mention of it to anyone. As it happens, Silicon Valley companies generally have a reputation for dishonesty
Presumably for cross-device interactivity. If I interact with ChatGPT on my phone, then open it on my desktop, I might be a bit frustrated that I can't get to the chat I was having on my phone previously.
OpenAI could store the chat conversation in an encrypted format that only you, the user, can decrypt, with the client-side determining the amount of previous messages to include for additional context, but there's plenty of user overhead involved in an undertaking like that (likely a separate decryption password would be needed to ensure full user-exclusive access, etc).
I'd appreciate and use a feature like that, but I doubt most "average" users would care.
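A rough sketch of what that user-exclusive encryption could look like on the client, using the third-party "cryptography" package; the password-derived key, the KDF parameters, and the idea of uploading only the ciphertext are all assumptions for illustration, not anything OpenAI has published:

    import base64, os
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    def derive_key(passphrase: bytes, salt: bytes) -> bytes:
        # The key is derived and kept on the client; the server only ever sees ciphertext.
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
        return base64.urlsafe_b64encode(kdf.derive(passphrase))

    salt = os.urandom(16)                                   # stored alongside the ciphertext
    key = derive_key(b"user-chosen passphrase", salt)
    blob = Fernet(key).encrypt(b'{"messages": ["..."]}')    # this is all the server would store
    history = Fernet(key).decrypt(blob)                     # any device with the passphrase can read it back

The overhead mentioned above is real: lose the passphrase and the history is gone, and the server can no longer do anything useful with the conversation (continue it, search it, schedule notifications) without the client decrypting it first.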
Syncthing could do that, if the software is designed to store locally.
Ever since I put the effort into Syncthing across all my devices (paired with restic on one of them for backup), I can't help but see how cross-device functionality and cloud sync are the Sysco hash potatoes that balloon Big Corp services' profit margins.
Not saying it's easy to set up. But when you get there it's so liberating and you wish all software was bring-your-own-network.
SyncThing syncs only when both clients are running at the same time. Nobody who edits a document on a website expects that they'll need to leave that browser window open in order to see the document in a different browser.
Am I missing something? Is this seriously a heated HN debate over "why does this website need to store the text it sends to people who view the website?"?
We're not talking about collaborative tooling, just a record of what you've asked an AI assistant. If it doesn't sync right away, it's not the end of the world. I find that's true with most things.
And the clients don't need to be running at the same time if you have a third device that's always on and receiving the changes from either (like a backup system). Eventually everything arrives. It's not as robust as what Google or iCloud gives you, but it's good enough for me.
Chatgpt.com is essentially a CRUD app. What you're saying here amounts to saying that it could conceivably have been designed to work dramatically differently from all other CRUD apps. And obviously that's true, but why would it be?
It's a website! You submit text, that you'll view or edit later, so the server stores it. How is that controversial to a HN audience?
Also:
> the clients don't need to be running at the same time if you have a third device that's always on
An always-on device that stores data in order to sync it to clients is a server.
> An always-on device that stores data in order to sync it to clients is a server.
Yes. But it's my server. I burden myself to operate it so that persistence does not come at the cost of control.
I think we might be tilting at different windmills here.
TBH it sounds like you're just imagining a very different service than the one OpenAI operates. You're imagining something where you send an input, the server returns an output - and after that they're out of the equation, and storing the output somewhere is a separate concern that could be left up to the user.
But the service they actually operate is functionally a collaborative document editor - the chat histories are basically rich text docs that you can view, edit, archive, share with others, and which are integrated with various server-side tools. And the document very obviously needs to be stored on the server to do all those things.
- [deleted]
It's great that you'd enjoy a significantly worse product that requires you to also be familiar with a completely unrelated product.
For some reason, consumers have decided that they prefer a significantly better product that doesn't require any additional applications or technical expertise ¯\_(ツ)_/¯
Facebook Messenger tries to marry end-to-end encryption with multi-device access and it's a horrible mess, with some messages not being delivered to some devices for hours, days, or ever.
I absolutely want OpenAI to keep all of my chats and I absolutely don't want them to share them ( voluntarily or by force) with any private agent.
I have exactly the same expectation of any document or communication platform. It's been long established as an accepted compromise between security and convenience.
> Is there a technical limitation that prevents chat histories from being stored locally on the user's computer
People access ChatGPT through different interfaces: Web, desktop app, their phones, tablets.
Therefore the conversations are stored on the servers. It's really not some hidden plot against users to steal their data. It's just how most users expect their apps to work.
Nonsense. It's easy to design an app where the server stores all information in an encrypted form. If OpenAI "cared about privacy" like this PR piece claims, they would do this. They don't because they (obviously) don't care and they (obviously) want the data for their purposes.
"Easy" does not mean "lowest cost" or "easiest". It's far far far easier to stor conversations as plain text and return them as is, instead of having to encrypt, rotate keys, etc. etc.
That's a tricky system to get right and maintain
(Please do not interpret this as a defense of OpenAI! I just think that we shouldn't trivialize the task of encrypting user data so that it's not visible to the provider).
> It's easy to design an app where the server stores all information in an encrypted form.
If you read the article, you'd see this:
> Our long-term roadmap includes advanced security features designed to keep your data private, including client-side encryption for your messages
Look, Proton somehow baked it into the design.
Open Ai didn't want to.
"We will add privacy features in the future" is hard to reconcile with "we are fighting for privacy now"
If I am sending HTTP POST requests using my own choice of software via the command line to some website, e.g., an OpenAI server, then I can save those requests on local storage. I can keep a record of what I have done. This history does not need to be saved by OpenAI and consequently end up being included in a document production when (not if) OpenAI is sued. But I cannot control what OpenAI does; that's their decision
For example, I save all the POST request bodies I send over the internet in the local forward proxy's log. I add logs to tarballs and compress with an algorithm that allows for searching the logs in the tarballs without decompressing them
It does not matter what "reason" or "excuse" or "explanation" anyone presents, technical or otherwise, for why OpenAI does what it does
The issue is what are the consequences
They're very valuable data, and it's convenient to log in to see a previous chat.
If you have ever played with the API, it's clear as day that the protocol itself is stateless.
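A minimal illustration of that statelessness with the OpenAI Python SDK; the model name and prompts are placeholders, and the chat history you see in the ChatGPT sidebar is a separate product feature layered on top of this:

    # Each request carries the whole conversation so far; the completion endpoint keeps no state.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    messages = [{"role": "user", "content": "Summarize the discovery dispute."}]

    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    messages.append({"role": "assistant", "content": reply.choices[0].message.content})

    # A follow-up question means resending everything above plus the new turn.
    messages.append({"role": "user", "content": "And what did the court order?"})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)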