Is there a general solution to this problem? I assume you can only start buffering tokens once you see a construct that has continuations which, once completed, would cause the preceding text to be rendered differently. Of course, you don't want to keep buffering for too long, since that would defeat the purpose of streaming, and you never know whether the potential construct will actually be generated. The solution probably also has to be context-sensitive: within code blocks, for example, you never want to render []() constructs as links.
EDIT: One library I found is https://github.com/thetarnav/streaming-markdown, which seems to combine incremental parsing with optimistic rendering. That works well enough in practice, I guess.
There are a few things in our implementation that make a more general solution unnecessary. We only need the output to support a limited set of markdown: typically text, bullet points, and links. So we don't need code blocks (yet).
The second thing (not mentioned in the post), however, is that we don't render the markdown to HTML on the server, so []() markdown is sent to the client as []() markdown rather than converted into <a href=...>. Even if a []()-style link appears inside a code block, that text is still sent to the client as []() text, just in a single chunk and perhaps with the link URL replaced. The client has its own library to render the markdown to HTML in React.
Also, the answers are typically short, so even if OpenAI outputs some malformed markdown links, the worst case is that we buffer more than we need to and the user experiences a pause, after which the entire response becomes visible at once (the last step is to flush any buffered text to the client).
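A rough sketch of that behavior (Python; `forward` and `replace_url` are made-up names, and the regexes only cover simple inline links, not a full markdown grammar): hold back any trailing text that could still become a []() link, send completed links with the URL optionally rewritten, and flush whatever is left at end of stream.

```python
import re

# A complete inline link: [text](url)
LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")
# A trailing fragment that could still grow into a link: "[", "[te", "[text](ht", ...
PARTIAL = re.compile(r"\[[^\]]*(\]\([^)]*)?$")

def forward(tokens, replace_url=lambda u: u):
    """Stream tokens through, holding back anything that may become a link."""
    buf = ""
    for tok in tokens:
        buf += tok
        m = PARTIAL.search(buf)
        # Everything before an ambiguous tail is safe to send now.
        safe, buf = (buf[: m.start()], buf[m.start():]) if m else (buf, "")
        if safe:
            yield LINK.sub(lambda m: f"[{m.group(1)}]({replace_url(m.group(2))})", safe)
    if buf:
        # End of stream: flush whatever is left, even a never-completed link.
        yield buf
```

If a link fragment like `[broken` never completes, it just sits in the buffer until the final flush, which is exactly the worst case described above.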
Yes. You can define a regex matching what you want, and every regex can be compiled into a state machine (https://en.wikipedia.org/wiki/Nondeterministic_finite_automa...). Then you advance the state machine one step per character and pause the output while the machine is in the middle of a potential match.
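As a minimal sketch of that idea, here is the state machine for the link regex \[[^\]]*\]\([^)]*\) written out by hand instead of compiled (all names are made up; re-scanning a flushed mismatch is omitted for brevity):

```python
# Hand-built state machine for the regex \[[^\]]*\]\([^)]*\) (an inline
# markdown link), advanced one character at a time. While the machine is
# in the middle of a potential match, output is paused (buffered).
OUT, TEXT, AFTER_TEXT, URL = range(4)

def step(state, ch):
    """One transition; None means the buffered text cannot be a link."""
    if state == OUT:
        return TEXT if ch == "[" else OUT
    if state == TEXT:
        return AFTER_TEXT if ch == "]" else TEXT
    if state == AFTER_TEXT:            # expecting '(' right after ']'
        return URL if ch == "(" else None
    return OUT if ch == ")" else URL   # state == URL

def stream(chars):
    state, buf = OUT, ""
    for ch in chars:
        nxt = step(state, ch)
        if nxt is None or (nxt == OUT and state != OUT):
            yield buf + ch             # mismatch or completed link: flush
            state, buf = OUT, ""
        elif nxt == OUT:
            yield ch                   # ordinary character, emit immediately
        else:
            state, buf = nxt, buf + ch # possible link so far: keep buffering
```

With `list(stream("go [x](y) now"))`, the surrounding text streams character by character while the link arrives as a single chunk.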
This exact problem is why I wrote Streamdown https://github.com/day50-dev/Streamdown
Almost every model has its own slightly, but meaningfully, different opinion on what markdown is and how creative it can be with it.
Doing it well is a non-trivial problem.
Generating simple HTML instead of markdown would have been a solution. But I guess that ship has sailed.