Document Type : Research Article
Authors
1. Entrepreneurship Development Department, Faculty of Entrepreneurship, University of Tehran, Tehran, Iran.
2. New Business Department, Faculty of Entrepreneurship, University of Tehran, Tehran, Iran.
Abstract
Introduction: The emergence of generative AI and large language models has made the development of ChatGPT-based extensions a major route of software innovation; yet the life-cycle of such tools, shaped by data drift, continual base-model updates and the difficulty of collecting early user feedback, does not fit linear or even classical agile templates. The absence of a process that unifies rapid prompt iteration, heuristic evaluation and LLMOps maintenance leaves a marked operational gap. Traditional waterfall, and even Scrum, assumes early, stable user insight, which is rarely available when prompts substitute for hard requirements and model updates change behaviour overnight. Without a dedicated process, teams risk either premature user exposure or endless internal tinkering. This study therefore seeks to bridge the gap by proposing a non-linear, flexible process that guides the creation and evolution of ChatGPT extensions, reconnecting software-engineering theory with the concrete demands of GenAI systems.
Methodology: The research followed a qualitative, inductive content-analysis design. Twelve experts in AI, product design and software engineering took part in semi-structured interviews lasting 40–80 minutes each. The interview guide covered need identification, concept evolution, quality assessment, integration with external agents, process differences and security concerns. Transcripts were double-coded in NVivo, and inter-coder disagreements were resolved through discussion until kappa exceeded 0.75, indicating acceptable reliability. Initial codes were mapped onto the Stage-Gate framework and iteratively refined until theoretical saturation was reached, yielding the final pattern. This constant-comparison approach preserved analytical rigour while allowing new categories to surface.
Findings: Coding produced the “BML-H cycle”: (1) vision-driven building, (2) internal test design, (3) heuristic evaluation in place of early external feedback, (4) a composite learning loop combining heuristic, qualitative and quantitative signals, (5) heuristic decision-making on whether to iterate or progress, (6) innovators’ feedback once an MVP is reached, and (7) finalisation coupled with continuous LLMOps maintenance. The process proved parallel, adaptive and able to keep pace with rapid base-model change. Unlike the classical Build–Measure–Learn cycle, it starts from a vision rather than a falsifiable hypothesis and treats developer intuition as a first-class evaluation metric until a usable prototype permits metric-based testing.
Conclusion: The BML-H cycle shows that success in designing ChatGPT extensions hinges on fusing developer intuition with systematically staged feedback. By embedding heuristic judgement, multi-metric evaluation dashboards such as HELM and ongoing model upkeep in a single loop, the model balances exploration speed with responsibility and quality. It thereby narrows the gap between agile theory and the intricate realities of GenAI, offering practitioners, researchers and start-ups a concrete roadmap for delivering generative-AI products to market faster, with lower risk and higher added value. For policy-makers, the framework highlights the need to couple regulatory guidance with tooling that supports traceable prompt evolution. For educators, it offers a scaffold for curricula that merge software engineering, HCI and machine-learning operations. Future studies can extend the model with quantitative checkpoints and automate parts of the heuristic loop.
Keywords
Subjects