Flux159 9 hours ago

The graphs around programming language & use cases make sense from my personal usage. What's interesting is that they call out "automation" vs "augmentation", but also mention "directive" vs "feedback loop" for both Claude Code and Claude.ai - directive here meaning the model solves the problem in one shot without iteration, while feedback loop means the human iterates with the agent.

I think for the near future feedback loops will remain the primary way we interact with coding agents, simply because we generally don't specify coding tasks with zero ambiguity. If we did, we'd already have pseudocode that the agent could directly translate into working code and validate with testing.

dr_kiszonka 8 hours ago

> “Feedback Loop” interactions still require user input (even if that input is simply pasting error messages back to Claude).

I often feel that instead of being a developer, my role is reduced to being an entry-level intern* whose only job is to copy and paste errors and screenshots. In many cases, like when using Artifacts or Canvas, this process should be fully automated. For me, besides hallucinations and forgetting, this is the most annoying part of code development with AI.

* Interns should not be doing that either, but I didn't know how to convey it better.

jinay 9 hours ago

They really need to fix the kerning on some of their charts

zoogeny 7 hours ago

> AI is fundamentally changing the ways developers work. Our analysis implies that this is particularly true where specialist agentic systems like Claude Code are used, is particularly strong for user-facing app development work, and might be giving particular advantages to startups as opposed to more established business enterprises.

I think seeing a correlation between software startup adoption and user-facing app development work makes a lot of sense. The advantage for a startup in delivering user-appreciable features is high. An entrenched business might be more focused on reliability, security, and enterprise readiness, and those seem like riskier things to outsource to LLMs, which are unproven. But a startup trying to get its first 1000 customers is in more of a "move fast and break things" mentality, and the additional risk of unproven LLM code is much lower.

But those adoption rates and the percentage of code written by LLMs are high. What we aren't seeing in this graph is the total set of startups and the rate of use within that larger set. But if LLM-written code does make startups faster to market, faster to PMF, faster to growth, etc., then this is going to be transformative. If the next billion-dollar unicorn is written 79% by an AI, we're in for a new world.

> As AI systems become capable of building larger-scale pieces of software, will developers shift to mostly managing and guiding these systems, rather than writing code themselves?

I'm coding about 4 hours per day, primarily using LLMs, and I am trying to get the LLM to write as much of the code as I can. My experience with the best LLM coding agents I have tried (Cursor using Claude variants, and Gemini 2.5 Pro in the browser) is that they need very careful and continual guidance. Errors creep in and compound if you aren't vigilant.

The pace of improvement, however, is pretty remarkable. I think about 75% of the code in a new project I am working on has been written by the LLM. My biggest challenge, actually, is to avoid taking the reins. Sometimes the LLM goes off the rails and I think "it will be easier to just do it myself". However, if I catch myself and instead write a more detailed instruction, I can often get the LLM to do the thing perfectly.

It is a similar shift in mindset to when I became a manager. I can no longer just roll up my sleeves and fix the problem; I have to manage the person I assign to it. Every instinct I have says "I know how to do this, I can do it better than you", but I have to resist.

> But in a relative sense, coding is among the most developed uses of AI in the economy.

This is a bit scary for some developers, but if instructing AIs effectively is a deeply learned skill, it may become even more valuable than coding. This use of AI is new, and if we manage to instruct ourselves out of a job, we'll be the seasoned AI whisperers that other industries look to for guidance on how to replicate the result.

  • dr_kiszonka 7 hours ago

    "However, if I catch myself and instead try to write a more detailed instruction."

    It probably depends on what you are working on, but I find it more time-effective to fix smaller problems myself instead of spending context space on revisions. There are benefits to training juniors, but an LLM's learning doesn't extend beyond the current session. For this reason, I focus on revising my system prompts, which now include having the LLM review proposed code against the current documentation and specs for consistency, updating the docs after each code change (pending my approval), etc.
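
    A rough, illustrative sketch of the kind of standing instructions I mean (the file path and exact wording below are placeholders, not any particular tool's syntax):

        (docs/SPEC.md below is a placeholder path, not a real file in my project)
        - Before proposing code, re-read docs/SPEC.md and check the change against it for consistency.
        - After each approved code change, draft an update to the affected docs and wait for my approval before applying it.
        - Call out explicitly if any existing feature is dropped, renamed, or changed in behavior.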

    I still haven't figured out how to have separate agents do coding, reviews, and testing, which would be incredibly useful and possibly more context- and price-efficient. It would be particularly helpful in projects where small code changes have a profound impact but are hard to detect. Unlike UI work, where it is relatively easy to spot deviations from the spec, in more analytical work an LLM dropping a few out of a hundred features or subtly changing my sampling algorithm requires constant vigilance.

    • zoogeny 3 hours ago

      I can tell from your comment that we are clearly having the same experiences using LLMs.

      I was doing exactly what you describe. Cursor has a usage budget, and I hit it one day. I was "iterating" with the LLM, getting it to make small tidy-up fixes: tiny "rename that function" or "move that logic from here to there" kinds of requests after the initial PR was written (or the dreaded "fix this error", "fix this new error" loop). That adds up over the course of a day. It also tends to go sideways. Sometimes "just tweak this little thing" ends up with the LLM completely borking a working solution. So I started having it do the first major PR and then coming in to clean it up myself afterwards.

      Now I have a new strategy that seems better. If the LLM doesn't one-shot my request, there are two possible reasons: either I asked it to do too much and it got confused, or I wasn't clear. So now, instead of iterating, I blow away the imperfect attempt and start again. I break the task down even smaller and/or give a much more precise description.

      This is proving, for now, to lead to a better end result (for me, in my project, the way I do things, YMMV, etc.). I spend more time on the initial prompt with the goal of needing zero iterations. That means giving it as much context as it needs and explaining in minute detail exactly what I want.

      While this works for me, I do find I sometimes resist writing a detailed and clear description for tasks I could just do myself. That is what my comment was referencing. I have to relinquish that desire if I am to embrace this new approach fully.