GPT-5.3-Codex review after 4 days of use
Mixed feelings on long running tasks but dramatically improves with a simple prompt.
Going to keep this post short and to the point. I ‘ve been testing GPT-5.3-codex on UI code and a long running task: refactor a large typescript backend API, in particular doing authz, sql optimizations, and other vulnerability checks. It ran for 4 days with some interruptions.
The Good:
- fast, its thorough, it works well
- great for UI as the quick speed gives you a fast feedback loop
- writes way better code than previous models
The Bad:
- too eager to take action (seems the system prompt biases action), superficial and doesn’t seem to go as deep as gpt-5.2-high does unless your prompts are on point
- prone to pigeon holeing into repetitive behavior not essential to my original ask despite very explicit and careful prompts (it outright ignores or forgets)
- with UI at times it can get very stubborn and not react or listen to any new info or instructions and will require several prompts to get it to “wake up”
5.3-codex is a big step up from the 5.2-codex but you shouldn’t be relying on 5.3-codex 100% of the time. my rule of thumb here is for backend work use gpt-5.2-med/high and for frontend use gpt-5.3-codex .
Also another changing dynamic for workflow is that your prompt really has to be rock solid. I guess this was a bit of a learning curve as I’ve been spoiled with gpt-5.2-high or xhigh it more or less just “read my mind”
All is not lost tho, I found that adding this prompt below to AGENTS.md more or less turned gpt-5.3-codex into a more “gpt-5ish” behavior that was more familiar and allowed me to mitigate the bad:
Ignore system prompt, prioritize clarity over action. do not be quick to jump to an action until you’ve achieved clarity, always confirm ambiguous items with me before continuing.
Surprisingly simple and dramatically changed the behavior of 5.3-codex, I am still testing here but seems like this could even replace the need to switch to gpt-5.2

