Can a system of agents replace the traditional SDLC? Should it? The Software Factory (TSF) is an approach to answering these questions.
The goal is to improve productivity through automation and augmentation of the current SDLC process.
The desired outcomes are answers to the following questions:
How do we establish trust if the human is taken out of the loop? How do we trust an agent to both create the code and tell us the code is correct? What do we do about hallucinations?
How can we demonstrate a given requirement will result in an expected outcome, repeatably?
How can we "show our workings" to demonstrate why a given outcome occurred?
The TSF approach is to model the roles and processes we have now, then automate them. Applying this process across the software engineering stack is the experiment.
I thought that software went from an IDEA through some PROCESS to REALITY. Zooming in one level of abstraction, we apply different roles, lifecycles and methods of description. For example, with Roles:

That is, the job is more or less a baton, handed on to the next role in line.
In fact, what happens is that everyone talks to everyone - sometimes. The Engineer clarifies a point with the QA, who talks to the BA, and the Release Manager coordinates with the Engineering Director. And so on.

A Product Owner contradicts a prior feature, changing the acceptance criteria on the fly.
The engineer has a bad day and writes a bug, which slips through to release because QA is on vacation.
Four of the team members get together in an ad-hoc meeting to decide what to do about some ambiguity of design or some impediment that has been discovered, decide on an outcome, and don't rewrite the specification.
For the new system to earn trust, it cannot be unpredictable with regard to the validity of its outcomes.
The prior examples will be addressed through modelling of system processes. When the modelling is insufficient, the system should call this out and stop, or do something useful about it.
An LLM will hallucinate and/or produce poor-quality output. The purpose of a COUNCIL is to assess a given task outcome and decide on the next best step.
This means task outcomes of poor quality are both accepted as a risk and mitigated by the council itself.
In fact, we are not attempting to change a foundation model here - hallucinations are baked into LLMs as a class of system. We accept this as a risk and mitigate it via decomposition of work into well-described tasks, council and consensus.
How does software get built to any reasonable quality?
What happens is that roles get together in a little council and decide to agree or to disagree - on the definition of the job, on how many tasks it is decomposed into, on the acceptance criteria, on whether the outcome of the work was successful, and on whether the tests are useful.

This council happens in micro-meetings, with perhaps some established processes or ceremonies (set ticket FOO-123 status to "in test") - but it also happens ad hoc, in ways we have not properly captured.
The progression of any work is the result of CONSENSUS. Where work proceeds with a consensus of 1 (that is, the engineer says: "it works, trust me"), the risk is high.
Where there is NO consensus, escalation occurs. It is possible we will learn that some outcomes cannot be decided autonomously - the system could escalate/defer to the HUMAN in the loop. This is not strictly a failure - rather an outcome we will learn from. Escalations may inform higher-quality role descriptions, yielding fewer escalations over time.
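As a sketch only, the consensus/escalation rules above might look like the following. The names (`Verdict`, `Vote`, `decide`) and the quorum threshold are illustrative assumptions, not part of any existing TSF implementation:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"

@dataclass
class Vote:
    role: str          # e.g. "QA", "Tech Lead" (hypothetical role names)
    verdict: Verdict

def decide(votes: list[Vote], quorum: int = 2) -> str:
    """Progress work only on consensus; flag risk or escalate otherwise."""
    accepts = [v for v in votes if v.verdict is Verdict.ACCEPT]
    rejects = [v for v in votes if v.verdict is Verdict.REJECT]
    if accepts and rejects:
        # Split decision: no consensus, so defer to the HUMAN in the loop.
        return "escalate"
    if not rejects and len(accepts) >= quorum:
        return "proceed"
    if not rejects and len(accepts) == 1:
        # Consensus of 1 ("it works, trust me") - high risk.
        return "proceed-at-risk"
    return "rework"
```

The key design choice is that a split decision never silently proceeds - it becomes an escalation, which is itself learning material for better role descriptions.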
Work is carried out by a discretely named Role - for example a Tech Lead. A role has motivation, skills and relationships.

The output of the work is informed by all the aspects of the role itself, combined with the task at hand.
This output - the work - is then received by other Roles with different motivations. Those Roles will then appraise the quality of the work.
This is as true for a written document as it is for a unit test or a video game.
Our system must therefore capture the Role definition - motivations, skills and relationships - accurately.
Memory in a company is "institutional" - stored in the employees. The challenge is that the HUMAN currently IS the ROLE - meaning they are not differentiated effectively, so "tribal knowledge" remains in the head of a specific human.
This means there is no such thing as a "QA Engineer" - rather there is a "QA Engineer called Sally".
Here is the conundrum: Sally has years of experience and value, which contribute to the high-quality, rigorous completion of tasks. Sally is the memory. Sally "knows" the different ROLES to assume, or to go find in the company, to achieve the outcome.
This "institutional" memory needs to be modelled within our system:
A ROLE contains as much of the institutional memory as possible - it describes the job function, motivation, objectives, boundaries and relationships with other ROLES, along with the expected performance, inputs and outputs.
SKILLs - reusable technology capabilities that a ROLE calls upon to achieve a given objective.
STORY - A given piece of work. It contains the description of the objective itself, acceptance criteria and its place in the hierarchy; it also contains its history - which roles did what to it, and their outcomes. The data model for a task is therefore significant, as it will contain sections that are effectively a living document of change.
The theory is the state - the memory - is the aggregation of ROLES, SKILLS and a STORY.
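One way to make the ROLE/SKILL/STORY aggregation concrete is a minimal data-model sketch. All field names here are assumptions for illustration, not a settled schema:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str            # reusable capability, e.g. "run-unit-tests"
    description: str

@dataclass
class Role:
    name: str                      # e.g. "QA Engineer"
    motivation: str                # why the role does what it does
    objectives: list[str]
    boundaries: list[str]
    skills: list[Skill] = field(default_factory=list)
    # relationships: other role name -> nature of the relationship
    relationships: dict[str, str] = field(default_factory=dict)

@dataclass
class HistoryEntry:
    role: str
    action: str
    outcome: str         # e.g. "accepted", "rework"

@dataclass
class Story:
    objective: str
    acceptance_criteria: list[str]
    epic: str                      # place in the hierarchy
    # the "living document of change"
    history: list[HistoryEntry] = field(default_factory=list)

def record(story: Story, role: Role, action: str, outcome: str) -> None:
    """Append to the story's living document of change."""
    story.history.append(HistoryEntry(role.name, action, outcome))
```

The aggregate state - the memory - is then exactly the set of `Role` and `Skill` definitions plus each `Story` with its accumulated history.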
Any piece of work will then have lineage - the effort taken, by whom and when, and the outcome of that work. This state represents the "history of the story" and can be useful in informing future decisions.
This means any piece of work would have a history of effort and outcome for all the roles that "worked" on the work.
Caution will be needed, as there is no upper bound to the history of a given problem - so with contemporary LLMs we will need to address this with regard to context window sizes. My assumption is that as a story grows it presents its own challenge, so another task is created to "prepare this story for the role", which may act as a summarisation agent.
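The "prepare this story for the role" step might be sketched like this. The token budget and the chars-per-token heuristic are rough assumptions for illustration, not measured values:

```python
# If the accumulated story history would blow the model's context window,
# spawn a summarisation task instead of passing the raw history through.

APPROX_CHARS_PER_TOKEN = 4   # crude heuristic, assumed for this sketch

def estimate_tokens(text: str) -> int:
    return len(text) // APPROX_CHARS_PER_TOKEN

def prepare_story_for_role(history: list[str], budget_tokens: int = 2000) -> dict:
    raw = "\n".join(history)
    if estimate_tokens(raw) <= budget_tokens:
        # Small enough: the role receives the full living document.
        return {"action": "pass-through", "context": raw}
    # Too large: emit a new task whose outcome is a role-scoped summary.
    return {"action": "summarise",
            "task": "prepare this story for the role",
            "entries": len(history)}
```

The point of the sketch is that context management is itself modelled as work - another task with its own role and outcome, rather than an invisible preprocessing step.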
As WORK is completed by a ROLE, the outcome will be assessed by a COUNCIL. This is effectively another role - a "role of roles" - whose motivation is to decide the appropriate next step.
INTERACTION (ASCII)

+--------+   +---------+   +----------------------+
| ROLE 1 |-->| OUTCOME |-->|       COUNCIL        |
| works  |   | "done"  |   |  Role A/B/C assess   |
+--------+   +---------+   +----------+-----------+
                                      |
                               +------+--------+
                               |               |
                              OK            NOT OK
                               |               |
                         +------------+  +------------+
                         |   ROLE 2   |  |   ROLE 3   |
                         | next work  |  |   rework   |
                         +------------+  +------------+
It is NOT that the work is DONE; rather, the work of the prior ROLE is finished for now, and the decision to progress the work in any direction is the result of the council's decision.
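The interaction above can be sketched as a single routing step. The assessor signature and the role names are illustrative assumptions:

```python
from typing import Callable

# An assessor is any role's judgement over a claimed outcome:
# outcome text -> accepted?
Assessor = Callable[[str], bool]

def council_step(outcome: str, assessors: dict[str, Assessor]) -> str:
    """Route the work: decided by the council, never by the producing role."""
    verdicts = {name: assess(outcome) for name, assess in assessors.items()}
    if all(verdicts.values()):
        return "ROLE 2: next work"
    # Any rejection means the prior role's work was merely "finished for now".
    return "ROLE 3: rework"
```

Usage might look like `council_step(claimed_outcome, {"QA": qa_check, "BA": ba_check})` - the producer's claim of "done" carries no weight on its own.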
A story itself will be part of an EPIC, so there are hints in the story itself as to what the intent is. Additionally, an initial WORKFLOW structure will be provided by the OPERATOR which will act as advice on the available roles and various acceptance criteria that must be met.
This means that the outcome of a given piece of work by a given ROLE is just some claimed state - "I am a Programmer and I have finished my work according to the specification and acceptance criteria". But really the proof and quality of work is decided by the recipient - the COUNCIL.
A COUNCIL is an appropriate set of ROLES at any given moment - it is not a fixed decision-making process; different roles and motivations will apply depending on the memory of the work at that point in time.
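Assembling a council dynamically might be sketched as a lookup over the current state of the work. The state names and the state-to-roles mapping are invented examples, not a defined TSF vocabulary:

```python
# Hypothetical mapping from story state to the roles that should assess it.
COUNCIL_RULES = {
    "spec-drafted":  ["Product Owner", "Architect"],
    "code-complete": ["QA Engineer", "Tech Lead"],
    "tested":        ["Release Manager", "Engineering Director"],
}

def select_council(story_state: str, available_roles: set[str]) -> list[str]:
    """Pick the assessing roles for this moment in the work's history."""
    wanted = COUNCIL_RULES.get(story_state, [])
    council = [r for r in wanted if r in available_roles]
    if not council:
        # No appropriate roles exist: escalate rather than pass unreviewed.
        return ["HUMAN OPERATOR"]
    return council
```

Note the fallback: an empty council is treated as a missing-consensus condition and routed to the human, consistent with the escalation rule above.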
Critical to the whole endeavour, then, is the use of large language models. Given the decomposition of work into ROLES, SKILLS and TASKs, and the nonlinearity of the work, I expect we will see in the first outcomes:
The introduction of evolutionary roles in a subsequent phase may then allow for novel role and skill descriptions; however, I think they are beyond the scope of our V1.
The use of LLMs, then, IS the whole decision-making process. This endeavour is about strictly controlling the entire context for any given prompt by modelling roles, skills and tasks.
Referring to the question at the top of the document, the outcomes will be directional:
An SDLC process is not cast in stone - it changes. The method of change is based on outcomes, objectives, learning, experimentation and prior experience.
We will model this adaptive system with:
In V1, we won't address evolution, but we will prepare for it. Once we have built up a corpus of tasks and their outcomes, we will be able to use it as learning material to see if we can create new ROLE descriptions or SKILLs that yield a better outcome.
This document has proposed an approach to automating the SDLC. The challenges of Trust, Repeatability and Provability are addressed through the decomposition of work into ROLEs, TASKs and COUNCILs.
COMPONENTS (ASCII)

+----------------+   +------------------+   +----------------+
|    OPERATOR    |-->| ROLE DEFINITIONS |<->| SKILLS LIBRARY |
|    (human)     |   |  BA/Arch/Eng/QA  |   | shared skills  |
+----------------+   +------------------+   +----------------+

+--------+ -> +---------+ -> +--------------+ -> +----------+
| STORY  |    | ROLE:BA |    | COUNCIL (OK) |    | ROLE:ENG |
+---+----+    +----+----+    +------+-------+    +-----+----+
    ^              |                |                  |
    |              +----------------+------------------+
    |                rework loop (as needed)           |
    +--------------------------------------------------+
                             |
                             v
                    +-----------------+
                    | LIVING DOCUMENT |
                    |   history/log   |
                    +--------+--------+
                             |
                             v
                     +--------------+
                     |  ESCALATION  |
                     | no consensus |
                     +--------------+
The first version, tsf-1, will attempt to implement a reference factory and stop as it arrives at evolutionary roles. This means tsf-1 will be
In the first iterations of tsf, a HUMAN operator will need to describe the ROLEs, the SKILLs and the conditions under which a COUNCIL operates, so as to bootstrap the system - as well as provide requirements.
This means the first version of the system will need simple-to-use tools (CLI, website) to allow an operator to define, refine and experiment.
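A minimal operator CLI for bootstrapping could look like the following sketch. The `tsf` program name, subcommands and flags are all hypothetical - nothing here describes an existing tool:

```python
# Sketch of an operator CLI: define ROLEs (with attached SKILLs) before
# running a STORY through the factory.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tsf")
    sub = parser.add_subparsers(dest="command", required=True)

    define = sub.add_parser("define-role", help="create or update a ROLE")
    define.add_argument("name")
    define.add_argument("--motivation", required=True)
    define.add_argument("--skill", action="append", default=[],
                        help="attach a SKILL (repeatable)")

    run = sub.add_parser("run", help="run a STORY through the factory")
    run.add_argument("story")
    return parser
```

An operator session might then be `tsf define-role QA --motivation "verify quality" --skill run-tests` followed by `tsf run STORY-1` - define, refine, experiment.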
Consensus can loosely be described as "all ROLES in the current state of the work accept that the outcome meets their objectives". This assumes each ROLE is of high enough quality that we are not collapsing into a house of cards of false positives - hence the iteration during bootstrapping.