Building a shell is a great exercise, but honestly having to deal with string parsing is such a bother that it robs like 2/3 of the joy along the way. I once built a very simple one in Go [0] as a learning exercise and I stopped once I started getting frustrated with all the corner cases.
[0] https://github.com/lourencovales/codecrafters/blob/master/sh...
A common problem I noticed is that if you took certain courses in computer science, you may have a pre-conceived notion of how to parse programming languages, and the shell language doesn't quite fit that model
I have seen this misconception many times
In Oils, we have some pretty minor elaborations of the standard model, and it makes things a lot easier
How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html
Everything I wrote there still holds, although that post could use some minor updates (and OSH is the most bash-compatible shell, and more POSIX-compatible than /bin/sh on Debian - e.g. https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp... )
---
To summarize that, I'd say that doing as much work as possible in the lexer, with regular languages and "lexer modes", drastically reduces the complexity of writing a shell parser
And it's not just one parser -- shell actually has 5 to 15 different parsers, depending on how you count
I often show this file to make that point: https://oils.pub/release/0.37.0/pub/src-tree.wwz/_gen/_tmp/m...
(linked from https://oils.pub/release/0.37.0/quality.html)
Fine-grained heterogenous algebraic data types also help. Shells in C tend to use a homogeneous command* and word* kind of representation
https://oils.pub/release/0.37.0/pub/src-tree.wwz/frontend/sy... (~700 lines of type definitions)
Author here, and yeah, I agree. I skipped writing a parser altogether and just split on whitespace and `|` so that I could get to the interesting bits.
For side-projects, I have to ask myself if I'm writing a parser, or if I'm building something else; e.g. for a toy programming language, it's way more fun to start with an AST and play around, and come back to the parser if you really fall in love with it.
Can say the same for control characters in terminals. I even think maybe it's just easier to ditch them all and use QT to build a "terminal" with clickable urls, something similar to what TempleOS does.