One interesting bit of context is that the author of this post is a legit world-class software engineer already (though probably too modest to admit it). Former staff engineer at Google and co-founder / CTO of Tailscale. He doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.
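For what it's worth, here is roughly what I pictured, sketched in Go (every name here is invented for illustration): the human writes only the signature, the doc comment, and a conformance test, and the body is a hole for the shared registry - or nowadays an LLM - to fill in.

    // Sketch of the idea in Go (all names invented): slug.go would hold the
    // signature and doc comment, slug_test.go the conformance test, and the
    // body is a hole to be filled in by the registry / LLM.
    package slug

    import "testing"

    // Slugify turns a title into a lowercase, hyphen-separated URL slug.
    // The signature and this comment are the only parts a human writes.
    func Slugify(title string) string {
        panic("TODO: implementation to be generated")
    }

    // The conformance test is the part the human actually cares about.
    func TestSlugify(t *testing.T) {
        cases := map[string]string{
            "Hello, World!":  "hello-world",
            "  Go  is fun  ": "go-is-fun",
        }
        for in, want := range cases {
            if got := Slugify(in); got != want {
                t.Errorf("Slugify(%q) = %q, want %q", in, got, want)
            }
        }
    }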
Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.
> That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
There is likely to be a great rift in how very talented people look at sharper tools.
I've seen the same division pop up with CNC machines, 3d printers, IDEs and now LLMs.
If you are good at doing something, you might find the new tool's output to be sub-par compared to what you can achieve yourself, but often the lower-quality output comes much faster than you can generate it.
That causes the people who are deliberate & precise about their process to hate the new tool completely - expressing the idea directly in the actual code (or paint, or marks on wood) is much better than trying to explain it in a less precise language partway through. The only exception I've seen is that engineering folks often use a blueprint & refine it on paper.
There's a double translation overhead which is wasteful if you don't need it.
If you have dealt with a new hire while being the senior of the pair, there's that familiar feeling of wanting to grab their keyboard instead of explaining how to build that regex - being able to do more things than you can explain or just having a higher bandwidth pipe into the actual task is a common sign of mastery.
The incrementalists on the other hand, tend to love the new tool as they tend to build 6 different things before picking what works the best, slowly iterating towards what they had in mind in the first place.
I got into this profession simply because I could Ctrl-Z to the previous step much more easily than I could in chemical engineering, my then-favourite career goal. In chemistry, if you get a step wrong, you go back to the start & start over. Plus even when things work, yield is just a pain there (prove it first, then scale up the ingredients, etc).
Just from the name of sketch.dev, it appears that this author is of the 'sketch first & refine' model where the new tool just speeds up that loop of infinite refinement.
> If you are good at doing something, you might find the new tool's output to be sub-par compared to what you can achieve yourself, but often the lower-quality output comes much faster than you can generate it. That causes the people who are deliberate & precise about their process to hate the new tool completely
Wow, I've been there! Years ago we dragged a GIS system kicking and screaming from its nascent era of a dozen ultrasharp dudes with the whole national fiber optics network, full of clever optimizations, in their heads, to three thousand mostly clueless users churning out industrial-scale spaghetti... The old hands wanted a dumb fast tool that did their bidding - they hated the slower wizard-assisted handholding, which turned out to be essential to the new population's productivity.
Command line vs. GUI again... Expressivity vs. discoverability, all the choices vs. don't make me think. Know your users !
This whole thing makes me think of that short story "The Machine Stops".
As we keep burrowing deeper and deeper into an overly complex system that allows people to get into parts of it without understanding the whole, we are edging closer to a situation where no one is left who can actually reason about the system and it starts to deteriorate beyond repair until it suddenly collapses.
We are so, so far beyond that point already. The complexity of the world economy is beyond any one mind to fully comprehend. The microcosm of building black-box LLMs that perform feats we don't understand is yet another instance of us building systems which may forever be beyond human understanding.
How is any human meant to understand a billion lines of code in a single codebase? How is any human meant to understand a world where there are potentially trillions of lines of code operating?
When your house is on fire and someone says "get out", certainly grabbing a jerrycan of gasoline and dousing yourself in fuel is worse than just getting out?
I believe it’s more that people hate trying new tools because they’ve already made their choice and made it their identity.
However, there are also people who love everything new and jump onto the latest hype too. They try new things, but then immediately advocate for them without merit.
Where are the sane people in the middle?
As an experienced software developer, I paid for ChatGPT for a couple of months, I trialed Gemini Pro for a couple of months, and I've used the current version of Claude.
I'd be happy if LLMs could produce working code as often and as quickly as the evangelists claim, but whenever I try to use an LLM for my day-to-day tasks, I almost always walk away frustrated and disappointed - and most of my work is boring on technical merits; I'm not writing novel comp-sci algorithms or cryptography libraries.
Every time I say this, I'm painted as some luddite who just hates change when the reality is that no, current LLMs are just not fit for many of the purposes they're being evangelized for. I'd love nothing more than to be a 2x developer on my side projects, but it just hasn't happened and it's not for the lack of trying or open mindedness.
edit: I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics that we're missing out on something revolutionary?
I see less "painting as a luddite" in response to statements like this, and more... surprise. Mild skepticism, perhaps!
Your experience diverges from that of other experienced devs who have used the same tools, on probably similar projects, and reached different conclusions.
That includes me, for what it's worth. I'm a graybeard whose current work is primarily cloud data pipelines that end in fullstack web. Like most devs who have fully embraced LLMs, I don't think they are a magical panacea. But I've found many cases where they're unquestionably an accelerant -- more than enough to justify the cost.
I don't mean to say your conclusions are wrong. There seems to be a bimodal distribution amongst devs. I suspect there's something about _how_ these tools are used by each dev, and in the specific circumstances/codebases/social contexts, that leads to quite different outcomes. I would love to read a better investigation of this.
I think it also depends on _what_ the domain is, and also to a certain degree the tools / stack you use. LLMs aren’t coherent or correct when working on novel problems, novel domains or using novel tools.
They’re great for doing something that has been done before, but their hallucinations are wildly incorrect when novelty is at play - and I’ll add they’re always very authoritative! I’m glad my languages of choice have a compiler!
My recent example for where its helpful.
Pretty nice at autocomplete. Like writing json tags in go structs. Can just autocomplete that stuff for me no problem, it saved me seconds per line, seconds I tell you.
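For anyone who hasn't written Go, this is the kind of thing I mean - an illustrative struct (not my real code) where the json tags are pure mechanical typing:

    package billing

    import "time"

    // Illustrative struct: the field names already say everything, so the
    // json tags are pure mechanical typing - exactly what autocomplete nails.
    type Invoice struct {
        ID          string     `json:"id"`
        CustomerID  string     `json:"customer_id"`
        AmountCents int64      `json:"amount_cents"`
        IssuedAt    time.Time  `json:"issued_at"`
        PaidAt      *time.Time `json:"paid_at,omitempty"`
    }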
It's stupid as well... Autofilled a function, looks correct. Reread it 10 minutes later and well... Minor mistake that would have caused a crash at runtime. It looked correct but in reality it just didn't have enough context ( the context is in an external doc on my second screen ... ) and there was no way it would ever have guessed the correct code.
It took me longer to figure out why the code looked wrong than if I had just typed it myself.
Did it speed up my workflow on code I could have given a junior to write? Not really, but some parts were quicker while others were slower.
And imagine if that code had crashed in production next week instead of right now while the whole context is still in my head. Maybe that would be hours of debugging time...
Maybe, as the parent said, for a domain where you are breaking new ground, it can generate some interesting ideas you wouldn't have thought about. Like a stupid pair programmer that can get you out of a local minimum - in general it doesn't help much, but there it can be a significant help.
But then again you could do what has been done for decades and speak to another human about the problem, at least they may have signed the same NDA as you...
Yeah, absolutely.
LLMs work best for code when both (a) there's sufficient relevant training data aka we're not doing something particularly novel and (b) there's sufficient context from the current codebase to pick up expected patterns, the peculiarities of the domain models, etc.
Drop (a) and get comical hallucinations; drop (b) and quickly find that LLMs are deeply mediocre at top-level architectural and framework/library choices.
Perhaps there's also a (c) related to precision. You can write code to issue a SQL query and return JSON from an API endpoint in multiple just-fine ways. Misplace a pthread_mutex_lock, however, and you're in trouble. I certainly don't trust LLMs to get things like this right!
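To put a face on (c), here's the Go flavour of the same trap - a hypothetical counter type, not from any real codebase. The two methods differ by a single line, and only one of them survives the early-return path:

    package counter

    import "sync"

    // Counter is a hypothetical concurrency-safe tally, used only to
    // illustrate the precision point above.
    type Counter struct {
        mu sync.Mutex
        n  map[string]int
    }

    // IncRisky shows the one-line mistake that's easy to miss in generated
    // code: the early return leaves the mutex held forever.
    func (c *Counter) IncRisky(key string) {
        c.mu.Lock()
        if c.n == nil {
            return // bug: still holding the lock
        }
        c.n[key]++
        c.mu.Unlock()
    }

    // Inc pairs the unlock with the lock via defer, so every exit path is safe.
    func (c *Counter) Inc(key string) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if c.n == nil {
            c.n = make(map[string]int)
        }
        c.n[key]++
    }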
(It's worth mentioning that "novelty" is a tough concept in the context of LLM training data. For instance, maybe nobody has implemented a font rasterizer in Rust before, but plenty of people have written font rasterizers and plenty of others have written Rust; LLMs seem quite good at synthesizing the two.)
Yesterday I wanted to understand what a team was doing in a Go project. I have never really touched Go before. I do understand software, because I've been developing for 20+ years. But ChatGPT was perfectly able to give me a summary of how the implementation worked. It gave me examples and suggestions. And within a day of full-time pasting code and asking questions I had a good understanding of the codebase. It would have been a lot more difficult with only Google.
how often do you get to learn an unfamiliar language? is it something you need to do every day? so this use case, did it save you much time overall?
Totally respect your position, given that you actually tried the tool and found it didn't work for you. That said, one valid explanation is that the tool isn't good for what you're trying to achieve. But an alternative explanation is that you haven't learned how to use the tool effectively.
You seem open to this possibility, since you ask:
> I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics that we're missing out on something revolutionary?
I don't know many yet, but Steve Yegge, a fairly famous developer in his own right, has been talking about this for the last few months, and has walked a few people through his "Chat Oriented Programming" (CHOP) ideas. I believe if you search for that phrase, you'll find a few videos, some from him and some from others. Can't guarantee they're all quality videos, though anything Steve himself does is interesting, IMO.
I have, in person. Their code requires a lot of cleanup and then there's the AI pit of death they often are not able to crawl out of due to... Mostly language differences. They don't know enough English to look stuff up and figure out how to fix things. Programming resources in other languages are pretty much non-existent
I have very similar experience. For me LLM are good at explaining someone else's complex code, but for some reason they don't help me write new code well. I would also like to see any LLM-driven developers work in real time.
You're the middle ground I was talking about. You tried it. You know where it works and where it doesn't.
I've used LLM to generate code samples and my IDE (IntelliJ) uses an LLM for auto-suggestions. That's mostly about it for me.
My experience thus far is that LLMs can be quite good at:
* Information lookup
-- when search engines are enshittified and bogged down by SEO spam and when it's difficult to transform a natural language request into a genuinely unique set of search keywords
-- Search-enabled LLMs have the most up to date reach in these circumstances but even static LLMs can work in a pinch when you're searching for info that's probably well represented in their training set before their knowledge cutoff
* Creatively exploring a vaguely defined problem space
-- Especially when one's own head feels like it's too full of lead to think of anything novel
-- Watch out to make sure the wording of your request doesn't bend the LLM too far into a stale direction. For example naming an example can make them tunnel vision onto that example vs considering alternatives to it.
* Pretending to be Stack Exchange
-- E.g., the types of questions one might pose on SE one can pose to an LLM and get instant answers, with less criticism for having asked the question in the first place (though Claude is apparently not above gently checking in if one is encountering an XY problem), and often the LLM's hallucination rate is no worse than that of other SE users
* Shortcut into documentation for tools with either thin or difficult to navigate docs
-- While one must always fact-check the LLM, doing so is usually quicker in this instance than fishing online for which facts to even check
-- This is most effective for tools where tons of people do seem to already know how the tool works (vs tools nobody has ever heard of) but it's just not clear how they learned that.
* Working examples to ice-break a start of project
* Simple automation scripts with few moving parts, especially when one is particular about the goal and the constraints
-- Online one might find example scripts that almost meet your needs but always fail to meet them in some fashion that's irritating to figure out how to corral back into your problem domain
-- LLMs have deep experience with tools and with short snippets of coherent code, so their success rate on utility scripts is much higher than on "portions of complex larger projects" (a representative example follows below)
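A representative example of that last category - my own made-up script, not anything from a real project: rename *.jpeg files to *.jpg under a directory, dry-run unless told otherwise. Small, self-contained, particular about the constraints - exactly the shape LLMs handle well.

    // One-shot utility of the kind LLMs reliably get right: rename *.jpeg to
    // *.jpg under a directory, dry-run unless -apply is passed.
    package main

    import (
        "flag"
        "fmt"
        "io/fs"
        "os"
        "path/filepath"
        "strings"
    )

    func main() {
        apply := flag.Bool("apply", false, "actually rename (default is dry run)")
        root := flag.String("root", ".", "directory to scan")
        flag.Parse()

        err := filepath.WalkDir(*root, func(path string, d fs.DirEntry, err error) error {
            if err != nil || d.IsDir() || !strings.HasSuffix(path, ".jpeg") {
                return err
            }
            newPath := strings.TrimSuffix(path, ".jpeg") + ".jpg"
            fmt.Printf("%s -> %s\n", path, newPath)
            if *apply {
                return os.Rename(path, newPath)
            }
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, "error:", err)
            os.Exit(1)
        }
    }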
> Where are the sane people in the middle?
They are the quiet ones.
Yup! I don't have a lot to say about LLMs for coding. There are places where I'm certain they're useful and that's where I use them. I don't think "generate a react app from scratch" helps me, but things like "take a CPU profile and write it to /tmp/pprof.out" have worked well. I know how to do the latter, but would need to look at the docs for the exact function name to call, and the LLM just knows and checks the error on opening the file and all that tedium. It's helpful.
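For reference, the kind of snippet I mean - a from-memory sketch, not the exact code it gave me:

    package profiling

    import (
        "fmt"
        "os"
        "runtime/pprof"
    )

    // startCPUProfile writes a CPU profile to /tmp/pprof.out until the
    // returned stop function is called - the sort of tedium (error checks,
    // cleanup) the LLM fills in without me looking up the pprof docs.
    func startCPUProfile() (stop func(), err error) {
        f, err := os.Create("/tmp/pprof.out")
        if err != nil {
            return nil, fmt.Errorf("create profile file: %w", err)
        }
        if err := pprof.StartCPUProfile(f); err != nil {
            f.Close()
            return nil, fmt.Errorf("start cpu profile: %w", err)
        }
        return func() {
            pprof.StopCPUProfile()
            f.Close()
        }, nil
    }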
At my last job I spent a lot of time on cleanups and refactoring and never got the LLM to help me in any way. This is the thing that I try every few months and see what's changed, because one day it will be able to do the tedious things I need to get done and spare me the tedium.
Something I should try again is having the LLM follow a spec and see how it does. A long time ago I wrote some code to handle HTTP conditional requests. I pasted the standard into my code, and wrote each chunk of code in the same order as the spec. I bet the LLM could just do that for me; not a lot of knowledge of code outside that file was required, so you don't need many tokens of context to get a good result. But alas the code is already written and works. Maybe if I tried doing that today the LLM would just paste in the code I already wrote and it was trained on ;)
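(For the curious, the core of that conditional-request logic is small. A from-memory Go sketch of just the If-None-Match/ETag half - simplified, and not the actual code from that project:)

    package web

    import "net/http"

    // serveWithETag answers conditional GETs: if the client's If-None-Match
    // matches the current ETag, reply 304 Not Modified and skip the body.
    // Simplified: ignores comma-separated tag lists and weak validators.
    func serveWithETag(w http.ResponseWriter, r *http.Request, etag string, body []byte) {
        w.Header().Set("ETag", etag)
        if match := r.Header.Get("If-None-Match"); match != "" {
            if match == "*" || match == etag {
                w.WriteHeader(http.StatusNotModified)
                return
            }
        }
        w.Write(body)
    }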
Middle Ground Fallacy
Fallacy fallacy
The middle ground between hyping the new tech and being completely skeptical about it is usually right. New tech is usually not everything it's hyped up to be, but also usually not completely useless or bad for society. It's likely we're not about to usher in the singularity or doom society, but LLMs are useful enough to stick around in various tools. Also it's probably the case that a percentage of the hype is driven by wanting funding.
> New tech is usually not everything it's hyped up to be, but also usually not completely useless or bad for society.
Except for cryptocurrencies (at least their ratio of investments to output) :-p
I can't relate to this comment at all. Doesn't feel like what's said in GP either.
IMO, LLMs are super fast predictive input and hallucinatory unzip; files to be decompressed don't have to exist yet, but input has to be extremely deliberate and precise.
You have to have a valid formula that gives the resultant array and doesn't require more than 100 IQ to comprehend, and then they unroll it for you into the whole code.
They don't reward trial and error that much. They don't seem to help outsiders like 3D printers did, either. It is indeed a discriminatory tool as in it mistreats amateurs.
And, by the way, it's also increasingly obvious to me that assuming a more pro-AI posture than a purely rational, utilitarian standpoint would warrant triggers a unique mode of insanity in humans. People seem to contract a lot of negativity doing it. Don't do that.
This is a good characterization. I'm precision-driven and know what I need to do at any low level. It's the high-level definition that is uncertain. So it doesn't really help to produce a dozen prototypes of an idea and pick one, nor does it help to fill in function definitions.
Interesting.
So engineers that like to iterate and explore are more likely to like LLMs.
Whereas engineers that have a more rigid, specific process are more likely to dislike LLMs.
I frequently iterate and explore when writing code. Code gets written multiple times before being merged. Yet, I still haven't found LLMs to be helpful in that way. The author gives "autocomplete", "search", and "chat-driven programming" as 3 paradigms. I get the most out of search (though a lot of this is due to the decreasing value of Google), autocomplete is pretty weak to me especially as I macro or just use contextual complete, and I've failed miserably at chat-driven programming on every attempt. I spend more time debugging the AI than it would take to debug it myself. Albeit it __feels__ faster because I'm doing more typing + waiting rather than continuous thinking (but the latter has extra benefits).
FWIW I find LLMs almost useless for writing novel code. Like it can spit out a serviceable UUID generator when I need it, but try writing something with more than a layer or two of recursion and it gets confused. I turn copilot on for boilerplate and off for solving new problems.
> I got into this profession simply because I could Ctrl-Z to the previous step much more easily than I could in chemical engineering, my then-favourite career goal.
That is interesting. Asking as a complete ignoramus - is there not a way to do this now? Like start off with 100 units of reagent and at every step use a bit and discard it if wrong.
But for every step that turns out to be "correct" you now have to go back and redo that in your held-out sample anyways. So it's not like you get to save on repeating the work -- IIUC you just changed it from depth-first execution order to breadth-first execution order.
> International Islamic University Chittagong
??? What's up with native English speakers and random acronyms of stuff that isn't said that often? YMMV, IIUC, IANAL, YSK... Just say it and save everyone else a google search.
I'm not a native English speaker, but IIUC is clearly 'If I Understand Correctly'. If you look at the context it's often fairly easy to figure out what an initialism means. I mean even I can usually deduce the meaning and I'm barely intelligent enough to qualify as 'sentient'.
So just to make sure I'm on the same page: you're bemoaning how commonly people abbreviate uncommon sayings?
I'm bemoaning the fact that I have to google random acronyms every time an American wants to say the most basic shit as if everyone on the internet knows their slang and weird four letter abbreviations
And googling those acronyms usually returns unrelated shit unless you go specifically to urban dictionary
And then it's "If I understand correctly". Oh. Of course. He couldn't be arsed to type that
FWIW IMO YTA
frfr
That likely ends up with 100 failed results all attributed to the same set of causes
Not so sure about those examples and pairing with the idea of quick and dirty work.
Calculators vs slide rules.
I also have many years of programming experience and find myself strongly "accelerated" by LLMs when writing code. But, if you think about it, it makes sense that many seasoned programmers are using LLMs better. LLMs are a helpful tool, but also a hard-to-use tool, and in general it's fair to think that better programmers can make better use of an assistant (human or otherwise): better understanding its strengths, identifying faster the good and bad output, providing better guidance to correct the approach...
Other than that, what correlates more strongly with the ability to use LLMs effectively is, I believe, language skills: the ability to describe problems very clearly. LLMs' reply quality changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively provide the model with many design hints, details on where to focus, ..., basically escaping many local minima immediately.
I completely agree that communication skills are critical in extracting useful work or insight from LLMs. The analogy for communicating with people is not far-fetched. Communicating successfully with a specific person requires an understanding of their strengths and weaknesses, their tendencies and blind spots. The same is true for communicating with LLMs.
I have actually found that, from a documentation point of view, querying LLMs has made me better at explaining things to people. If, given the documentation for a system or API, a modern LLM can't answer specific questions about how to perform a task, a person using the same documentation will also likely struggle. It's proving to be a good way to test the effectiveness of documentation, for humans and for LLMs.
Communication skills are the keys to using LLMs. Think about it: every type of information you want is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea. If one is casual in their request, using casual language, then the LLM will reply with a casual reply, because that matched your request best. To get a hard, factual answer from those that are experts in a subject, use the formal term, use the expert's language, and you'll get back a reply more likely to be correct because it's at the same level of formal treatment as the correct answers.
>every type of information you want is in them
Actually, I'm afraid not. It won't give us the step-by-step scalable processes to make humanity as a whole enter an indefinitely long period of world peace, with each of us enjoying life in our own thriving manner. That would be great information to broadcast, though.
It is also equally able to produce large piles of completely delusional answers that mimic genuinely sincere statements just as well. Of course, we can also receive that kind of misguided answer from humans. But the amount of output that mere humans can throw out in such a form is far more limited.
All that said, it's great to be able to experiment with it, and there are a lot of nice and fun things to do with it. It can be a great additional tool, but it won't be a self-sufficient panacea of an information source.
> It won't give us the step-by-step scalable processes to make humanity as a whole enter an indefinitely long period of world peace
That's not anywhere, that's a totally unsolved and open ended problem, why would you think an LLM would have that?
If what you meant was
> Think about it: every type of already solved problem you want information about is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea.
then that was not clear from your comment saying LLMs contain any information you want.
One has to be careful communicating about LLMs because the world is full of people who actually believe LLMs are generally intelligent super-beings.
I think GP's saying that it must be in your prompt, not in the weights.
If you want an LLM to make a sandwich, you have to tell it you `want triangular sandwiches of standard serving size made with white bread and egg based filling`, not `it's almost noon and I'm wondering if sandwich for lunch is a good idea`. Fine-tuning partially solves that problem, but they still prefer the former.
After a bit of prompt engineering: https://0bin.net/paste/zolMrjVz#dgZrZzKU-PlxdkJTdG0pZU9bsCM3...
Interesting, thanks for sharing. Could you also give some insights on the process you followed?
Sure. Lately I've found that the "role" part of prompt engineering seems to be the most important. So what I've been doing is telling ChatGPT to play the role of the most educated/wise/knowledgeable/skilled $field $role(advisor, lawyer, researcher etc) in the history of the world and then giving it some context for the task before asking for the actual task.
Sometimes asking it to self reflect on how the prompt itself could be better engineered helps if the initial response isn't quite right.
Hey! Asking because I know you're a fellow vimmer [0]. Have you integrated LLMs into your editor/shell? Or are you largely copy-pasting context between a browser and vim? This context-switching of it all has been a slight hang-up for me in adopting LLMs. Or are you asking more strategic questions where copy-paste is less relevant?
[0] your videos on writing systems software were part of what inspired me to make a committed switch into vim. thank you for those!
You want aider.
> "seasoned programmers are using LLMs better".
I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it's always broken.
From a fellow "seasoned" programmer to another: how the hell do you write the prompts to get back correct working code?
I'd ask things like "which LLM are you using", and "what language or APIs are you asking it to write for".
For the standard answers of "GPT-4 or above", "Claude Sonnet or Haiku", or models of similar power, and well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or unheard-of APIs or project contexts, the failure rate of 4-5-line code snippets in my experience is less than 1%.
It's mostly Go, some Python, and I'm not asking anything niche. I'm asking for basic utility functions that I could implement in 10-20 lines of code. There's something broken every single time and I spend more time debugging the generated code than actually writing it out.
I'm pretty sure everybody measures "failure rate" differently and grossly exaggerates the success rate. There are a lot of suggestions below about "tweaking", but if I have to "tweak" generated code in any way then that is a failure for me. So the failure rate of generated code is about 99%.
Step 1: https://claude.ai
Step 2: Write out your description of the thing you want to the best of your ability but phrase it as "I would like X, could you please help me better define X by asking me a series of clarifying questions and probing areas of uncertainty."
Step 3: Once both Claude and you are satisfied that X is defined, say "Please go ahead and implement X."
Step 4a: If feature Y is incorrect, go to Step 2 and repeat the process for Y
Step 4b: If there is a bug, describe what happened and ask Claude to fix it.
That's the basics of it, should work most of the time.
I write the prompt as if I’m writing an email to a subordinate that clearly specifies what the code needs to do.
If what I'm requesting is an improvement to existing code, I paste the whole code if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.
Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.
If what I'm doing is complex (a subjective judgement) I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs up or clarify its understanding of what I'm asking for if its plan is off base.
Check my YouTube channel if you have a few minutes. I just published a video about adding a complex feature (UTF-8) to the Kilo editor, using Claude.
dc: not a seasoned dev, with <b> and <h1> tags on "not".
They can't think for you. All intelligent thinking you have to do.
First, give them a high-level requirement that can be clarified into indented bullet points that look like code. Or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.
Then let them decompress those pseudocode bullet points further into code. They'll give you back code that resembles a digitized paper test answer. Fix the obvious errors and you get B-grade compiling code.
They can't do non-conventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc., just good old it-takes-what-it-takes GUI app, API, and data manipulation code.
For those use cases, with these points in mind, it's a lot faster to let the LLM generate tokens than to type `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That should arguably be a productivity boost in the sense that the user of the LLM is effectively typing faster.
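To make that concrete, here's a made-up example of the kind of bullet-point spec and B-grade decompression I mean (names and behaviour invented for illustration). The bullets:

    - read the lines of a text file
      - trim whitespace; skip blank lines and lines starting with #
    - return the remaining lines, deduplicated, keeping original order
    - return any read error

and roughly what comes back, give or take a fix or two:

    package config

    import (
        "bufio"
        "os"
        "strings"
    )

    // ReadEntries decompresses the bullet points above: read lines, trim,
    // skip blanks and #-comments, dedupe while keeping the original order.
    func ReadEntries(path string) ([]string, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, err
        }
        defer f.Close()

        seen := make(map[string]bool)
        var out []string
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            line := strings.TrimSpace(sc.Text())
            if line == "" || strings.HasPrefix(line, "#") {
                continue
            }
            if !seen[line] {
                seen[line] = true
                out = append(out, line)
            }
        }
        return out, sc.Err()
    }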
The story from the article matches my experience. The LLM's first answer is often a little broken, so I tweak it until it's actually correct.
I rarely get back non-working code, but I've also internalized its limitations, so I no longer ask it for things it's not going to be able to do.
As other commenters have pointed out, there's also a lot of variation between different models, and some are quite dumb.
I've had no issues with 10-20 line coding problems. I've also had it build a lot of complete shell scripts and had no problem there either.
> the ability to describe problems very clearly
Yes, and to provide enough context.
There's probably a lot that experience is contributing to the interaction as well, for example - knowing when the LLM has gone too far, focusing on what's important vs irrelevant to the task, modularising and refactoring code, testing etc
That's really interesting. What are the most important things you've learned to do with the LLMs to get better results? What do your problem descriptions look like? Are you going back and forth many times, or crafting an especially-high-quality initial prompt?
I'm posting a set of videos on my YT channel where I'll show the process I follow. Thanks!
That's fantastic! I thought about asking if you had streamed any of it, but I didn't want to sound demanding and entitled :)
> [David, Former staff engineer at Google ... CTO of Tailscale,] doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me...
Don't doubt for a second the pedigree of founding engs at Tailscale, but David is careful to point out exactly why LLMs work for them (but might not for others):
> I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning. If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
> If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
I am not a software dev, I am a security researcher. LLMs are great for my security research! It is so much easier and faster to iterate on code like fuzzers to do security testing. Writing code to do a padding oracle attack would have taken me a week+ in the past. Now I can work with an LLM to write code, learn, and break things within the day.
It has accelerated my security research 10-fold, just because I am able to write code and parse and interpret logs at a level above what I was able to a few years ago.
I'm in a similar situation: I jump between many environments, mainly between Python and TypeScript; however, I'm currently testing a new learning-algorithm idea in C++, and I simply don't always remember all the syntax. I was very skeptical about LLMs at first. Now I'm using LLMs daily. I can focus more on thinking rather than searching Stack Overflow. Very often I just need a simple function, and it is much faster to create it with chat.
And if anyone remembers: before Stack Overflow you more or less had to specialize in a domain, become good at using a handful of frameworks/APIs, on one platform. Learning a new language, a new API (god forbid a new platform) was to sail, months long, into seas unknown.
In this regard, with first Stack Overflow and now LLMs, the field has improved mightily.
That approach sounds similar to the Idris programming language with Type Driven Development. It starts by planning out the program structure with types and function signatures. Then the function implementation (aka holes) can be filled in after the function signatures and types are set.
I feel like this is a great approach for LLM assisted programming because things like types, function signatures, pre/post conditions, etc. give more clarity and guidance to the LLM. The more constraints that the LLM has to operate under, the less likely it is to get off track and be inconsistent.
I've taken a shot at doing some little projects for fun with this style of programming in TypeScript and it works pretty well. The programs are written in layers with the domain design, types, schema, and function contracts being figured out first (optionally with some LLM help). Then the function implementations can be figured out towards the end.
It might be fun to try Effect-TS for ADTs + contracts + compile time type validation. It seems like that locks down a lot of the details so it might be good for LLMs. It's fun to play around with different techniques and see what works!
100% this is what I do in python too!
I am not a genius, but I have a couple of decades of experience and finally started using LLMs in anger in the last few weeks. I have to admit that when my free quota from GitHub Copilot ran out (I had already run out of JetBrains AI as well!! Our company will start paying for some service, as the trials have been very successful), I had a slight bad feeling, as my experience was very similar to OP's: it's really useful to get me started, and I can finish it much more easily from what the AI gives me than if I started from scratch.

Sometimes it just fills in boilerplate, other times it actually tells me which functions to call on an unfamiliar API. And it turns out it's really good at generating tests, so it makes my testing more comprehensive, as it's so much faster to just write them out (and refine a bit, usually by hand).

The chat has almost completely replaced my StackOverflow queries, which saves me much time and anxiety (God forbid I have to ask something on SO, as that's a time sink: if I just quickly type out something I am just asking to be obliterated by the "helpful" SO moderators... with the AI, I barely type anything at all, leave it with typos and all, and the AI still gets me!).
Have you tried using Ollama? You can download and run an LLM locally on your machine.
You can also pick the right model for the right need and it's free.
Yes. If the AI is not integrated with the IDE, it's not as helpful. If there were an IDE plugin that let you use a local model, perhaps that would be an option, but I haven't seen that (Github Copilot allows selecting different models, but I didn't check more carefully whether that also includes a local one, anyone knows?).
> (Github Copilot allows selecting different models, but I didn't check more carefully whether that also includes a local one, anyone knows?).
To my knowledge, it doesn't.
On Emacs there's gptel, which integrates different LLMs quite nicely inside Emacs, including a local Ollama.
> gptel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It works in the spirit of Emacs, available at any time and uniformly in any buffer.
This can use Ollama: https://www.continue.dev/
> If there were an IDE plugin that let you use a local model
TabbyML
It’s doable as it’s what I use to experiment.
Ollama + CodeGPT IntelliJ plugin. It allows you to point at a local instance.
I also use Ollama for coding. I have a 32G M2 Mac, and the models I can run are very useful for coding and debugging, as well as data munging, etc. That said, sometimes I also use Claude Sonnet 3.5 and o1. (BTW, I just published an Ollama book yesterday, so I am a little biased towards local models.)
Thanks for the book!
I’m using ChatGPT4o to convert a C# project to C++. Any recommendation on what Ollama model I could use instead?
The one that does not convert C# at all and asks you to just optimize it in C# instead (and to use the appropriate build option) :D
I’m converting game logic from C# to UE5 C++. So far made great progress using ChatGPT4o and o1
Do you find these working out better for you than Claude 3.5 Sonnet? So far I've not been a fan of the ChatGPT models' output.
I find ChatGPT better with UE4/5 C++ but they are very close.
Biggest advantage is the o1 128k context. I can one shot an entire 1000 line class where normally I’d have to go function by function with 4o.
I'm genuinely curious, but what did you use StackOverflow for before? With a couple of decades in the industry I can't remember when the last time I "Google programmed" anything was. I always go directly to the documentation for whatever it is I'm working with, because where else would I find out how it actually works? It's not like I haven't "Google programmed" when I was younger, but it's just such a slow process based on trusting strangers on the internet that it never really made much sense once I started knowing what I was doing.

I sort of view LLMs in a similar manner. Why would you go to them rather than the actual documentation? I realize this might sound arrogant or rude, and I really hope you believe me when I say that I don't mean it like that. The reason I'm curious is because we're really struggling to get junior developers to look at the documentation first, instead of everywhere but the documentation. Which means they often don't actually know how what they build works. Which can be an issue when they load every object of a list into memory instead of using a generator...
As far as using LLMs in anger goes, I would really advise anyone to use them. GitHub Copilot hasn't been very useful for me personally, but I get a lot of value out of running my thought process by an LLM. I think better when I "think out loud", and that is obviously challenging when everyone is busy. Running my ideas by an LLM helps me process them in a similar (if not better) fashion; often it won't even really matter what the LLM conjures up, because simply describing what I want to do often gives me new ideas, like "thinking out loud".
As far as coding goes. I find it extremely useful to have LLMs write cli scripts to auto-generate code. The code the LLM will produce is going to be absolute shite, but that doesn't matter if the output is perfectly fine. It's reduced my personal reliance on third party tools by quite a lot. Because why would I need a code generator for something (and in that process trust a bunch of 3rd party libraries) when I can have a LLM write a similar tool in half an hour?
I believe you don't mean to be rude, but you just sound completely naive to me. To think that documentation includes everything is just... like, have you actually been coding anything at all that goes even slightly off the happy path? Example from yesterday: I have a modular JavaFX application (i.e. it uses Java JPMS modules, not just Maven/Gradle modules). I introduced a call to `url()` in JavaFX CSS. That works when running using the classpath, but not when using the module path. I spent half an hour reading docs to see what they say about modular applications. They didn't mention anything at all. Especially because in my case, I was not just doing `getClass().getResource`... I was using the CSS directive to load a resource from the jar. This is exactly when I would likely go on SO and ask if anyone had seen this before. It used to be highly likely that someone who's an expert on JavaFX would see and answer my question - sometimes even people who directly worked on JavaFX!
StackOverflow was not really meant for juniors, as juniors usually can indeed find answers on documentation, normally. It was, like ExpertsExchange before it, a place for veterans to exchange tribal knowledge like this. If you think only juniors use SO, you seem to have arrived at the scene just yesterday and just don't know what you're talking about.
> Why would you go to them rather than the actual documentation?
Not every documentation is made equal. For example: Android docs are royal shit. They cover some basic things, e.g. showing a button, but good luck finding esoteric Bluetooth information or package management, etc. Most of it is a mix of experimentation and historical knowledge (baggage).
> Not every documentation is made equal.
They are wildly different. I'm not sure the Android API reference is that bad, but that is mainly because I've spent a good amount years with the various .Net API references and the Android one is a much more shiny turd than those. I haven't had issues with Bluetooth myself, the Bluetooth SIG has some nice specification PDF's but I assume you're talking about the ones which couldn't be found? I mean this in a "they don't seem to exist" kind of way and not that, you specifically, couldn't find them.
I agree though. It's just that I've never really found internet answers to be very useful. I did actually search for information a few years back when I had to work with a solar inverter datalogger, but it turned out that having the ridiculously long German engineering manual scanned, OCR processed and translated was faster. Anyway, we all have our great white whales. I'm virtually incapable of understanding the SQLAlchemy documentation, as an example; luckily I'll probably never have to use it again.
I have been using LLMs to generate functional code from *pseudo-code* with excellent results. I am starting to experiment with UML diagrams, both with LLMs and computer vision, to actually generate code from UML diagrams; for example, a simple activity diagram could be the prompt for an LLM, and might look like:
Start -> Enter Credentials -> Validate -> [Valid] -> Welcome Message -> [Invalid] -> Error Message
Corresponding Code (Python Example):
    class LoginSystem:
        def validate_credentials(self, username, password):
            if username == "admin" and password == "password":
                return True
            return False

        def login(self, username, password):
            if self.validate_credentials(username, password):
                return "Welcome!"
            else:
                return "Invalid credentials, please try again."

*Edited for clarity
This example illustrates one of the risks of using LLMs without subject expertise though. I just tested this with claude and got that exact same validation method back. Using string comparison is dangerous from a security perspective [1], so this is essentially unsafe validation, and there was no warning in the response about this.
1. https://sqreen.github.io/DevelopersSecurityBestPractices/tim...
Are you talking about the timing based attacks on that website which fails miserably at rendering a useable page on mobile?
Could you add to the prompt that the password is stored in an sqlite database using argon2 for encryption, the encryption parameters are stored as environment variables.
You would like it to avoid timing based attacks as well as dos attacks.
It should also generate the functions as pure functions so that state is passed in and passed out and no side effects(printing to the console) happen within the function.
Then also confirm for me that it has handled all error cases that might reasonably happen.
While you are doing that, just think about how much implicit knowledge I just had to type into the comment here and that is still ignoring a ton of other knowledge that needs to be considered like whether that password was salted before being stored. All the error conditions for the sqlite implementation in python, the argon2 implementation in the library.
TLDR: that code is useless and would have taken me the same amount of time to write as your prompt.
I think what you're describing is basically "interface driven development" and "test driven development" taken to the extreme: where the formal specification of an implementation is defined by the test suite. I suppose a cynic would say that's what you get if you left an AI alone in a room with Hyrum's Law.
> His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow
Regardless of language, that's basically how you approach the design of a new large project - top-down architecture first, then split the implementation into modules, design the major data types, write function signatures. By the time you are done, what is left is basically the grunt work of implementing it all, which is the part that LLMs should be decent at, especially if the functions/methods are documented to the level (input/output assertions as well as functionality) where it can also write good unit tests for them.
> the grunt work of implementing it all
you mean the fun part. I can really empathize with digital artists. I spent twenty years honing my ability to write code and love every minute of it and you're telling me that in a few years all that's going to be left is PM syncs and OKRs and then telling the bot what to write
if I'm lucky to have a job at all
I think it depends on the size of the project. To me, the real fun of being a developer is the magic of being able to conceive of something and then conjure it up out of thin air - to go from an idea to reality. For a larger more complex project the major effort in doing this is the solution conception, top-down design (architecture), and design of data structures and component interfaces... The actual implementation (coding), test cases and debugging, then does become more like drudgework, not the most creative or demanding part of the project, other than the occasional need for some algorithmic creativity.
Back in the day (I've been a developer for ~45 years!) it was a bit different as hardware constraints (slow 8-bit processors with limited memory) made algorithmic and code efficiency always a primary concern, and that aspect was certainly fun and satisfying, and much more a part of the overall effort than it is today.
>> where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them
AIUI that's where Idris is headed
> designed around filling in the implementations for you. 20 years ago that would have been from a live online database
This reminds me a bit of PowerBuilder (or was it PowerDesigner?) from early 1990s. They sold it to SAP later, I was told it's still being used today.
Isn't that the idea behind UML? Which didn't work out so well, however, with the advent of LLMs today, I think that premise could work.
I knew he was a world-class engineer the moment I saw that his site didn't bother with CSS stylesheets, ads, pictures, or anything beyond a rudimentary layout.
The whole article page reads like a site from the '90s, written from scratch in HTML.
That's when I knew the article would go hard.
Substantive pieces don't need fluffy UIs - the idea takes the stage, not the window dressing.
I wonder what he uses; I noticed the first paragraph took over a second to load...

    Largest Contentful Paint element: 1,370 ms
    This is the largest contentful element painted within the viewport. Element: p
Looks like it loads all the Google surveillance without asking. Should IP-block the EU.
Glad to know I was a world class engineer at the age of 8, when all I knew were the <h1> and <b> tags!
He is using LLMs for coding. You don't become a staff engineer by being a badass coder. Not sure how they are related.
Being a dev at a large company is usually the sign that you're not very good though. And anyone can start a company with the right connections.
That's a terrible blanket statement, very US-centric. Not everyone wants to start a company, and you can't just reduce one's motivations to your measure of success.
God knows many of the best devs I've known would be an absolute nightmare on the business side, they'd rather have a capable business person if they could avoid it
You've just disproved your own assertion. Either that or you believe everyone who's any good has the right connections.