AI-Driven Development: A Framework for Rapid Quality Software Delivery

November 26, 2023 12:11 pm

The journey from a piece of code's inception to its deliverable form is full of challenges: bugs, security vulnerabilities, and, often, a ticking clock against tight delivery timelines. Traditional methods of tackling these challenges, such as manual code reviews or bug-tracking systems, now appear sluggish against the growing demands of today's fast-paced technological landscape. In most cases, product managers and their teams have to strike a delicate equilibrium between reviewing code, fixing bugs, and adding new features if they want to deploy quality software on time. That's where Large Language Models (LLMs) and Artificial Intelligence can lend a very efficient eye, analyzing more information in less time than even the most expert team of humans could.
According to Google's State of DevOps Report 2023, speeding up code reviews is one of the most effective actions to improve software delivery performance. Teams that have successfully implemented faster code review strategies have, on average, 50% higher software delivery performance. However, LLMs and AI tools capable of aiding in these tasks are very recent, and most companies lack the guidance or frameworks to integrate them into their processes. In fact, in the same report, when analyzing how much different practices contribute to a variety of software development tasks, companies gave AI an average score of 3.3/10, while the speed of code review was rated 6.5/10. In other words, businesses are aware of the importance of faster code reviews but don't yet have a way to make AI aid them in that task effectively.
With this in mind, I created an AI-driven framework that diligently monitors and enhances the speed and quality of software development. By harnessing the power of source code analysis, this approach assesses the quality of the software being developed, classifies the maturity level of the development process, and, through a predictive component, forecasts the cost savings that can be achieved by improving software quality. Armed with these insights, stakeholders can make informed decisions regarding resource allocation and prioritize initiatives that drive quality improvements.

Low-quality software is costly

Numerous factors impact the cost and ease of resolving bugs and vulnerabilities, including their severity, complexity, the stage of the Software Development Life Cycle (SDLC) in which they are identified, availability of resources, quality of the code, communication and collaboration within the team, compliance requirements, impact on users and business, and the testing environment, among others. This multitude of elements makes calculating software development costs directly via algorithms challenging. But we can be certain that the cost of identifying and rectifying defects in software tends to increase exponentially as the software progresses through the SDLC. Fixing bugs during the early stages is more cost-effective and efficient compared to addressing them in later stages.
For instance, the National Institute of Standards and Technology reported that fixing a bug found during testing costs five times more than fixing one identified during design, and fixing a bug found during deployment can cost six times more than one found during testing.
Img1
The diverging cost implications of addressing software defects across different stages of the SDLC underline the economic rationale behind a proactive approach to bug detection and resolution. More than that, by fostering a culture of continuous improvement and learning, organizations are not merely fixing bugs; they are cultivating a mindset that constantly seeks to push the boundaries of what is achievable in software quality.

Entering AI-Driven Development

In essence, this methodology introduces a straightforward set of AI rules, driven by extensive code analysis data, to evaluate code quality and optimize it using a pattern-matching-based machine learning approach. We estimate bug fix costs by considering developer and tester productivity across SDLC stages, comparing them to the resources allocated for feature development: the higher the percentage of resources invested in feature development, the lower the cost of bad quality, and vice versa.
Img2
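As a rough illustration of this estimation, here is a minimal sketch (hypothetical numbers and function names, not the framework's actual implementation) that expresses the cost of poor quality as the share of effort spent on rework instead of features:

```python
# Hypothetical sketch: the cost of poor quality expressed as the share of
# developer/tester effort spent on rework (bug fixing, security patching)
# versus feature development. All figures are illustrative assumptions.

def cost_of_poor_quality(hours_on_features: float, hours_on_rework: float,
                         blended_hourly_rate: float) -> dict:
    """Return the rework share and its monetary cost for one SDLC period."""
    total = hours_on_features + hours_on_rework
    rework_share = hours_on_rework / total if total else 0.0
    return {
        "rework_share": rework_share,  # higher share => higher cost of bad quality
        "rework_cost": hours_on_rework * blended_hourly_rate,
    }

# Example: a sprint with 320 hours on features and 180 hours on bug fixing.
print(cost_of_poor_quality(320, 180, blended_hourly_rate=75))
# => {'rework_share': 0.36, 'rework_cost': 13500.0}
```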

First Step: Assessing Quality, Defining a Benchmark

Just as it's hard for a writer to decide when their novel is ready and no further changes are required, the standards for good code quality, and the point at which to stop improving it, are not easy to define. Code quality is relative and depends on various factors. Any quality assurance (QA) process compares the actual state of a product with something considered “perfect.” In automotive, the QA process matches an assembled car against its original design, considering the average number of imperfections detected across the sample sets. In fintech, it is usually defined by identifying transactions misaligned with the legal framework.
But never compare apples to oranges, especially in software development, where the quality standard is often not so straightforward to define. If we were to compare the quality of one codebase to another that utilizes a completely different tech stack, serves a different market sector, or differs significantly in maturity level, the conclusions on quality assurance could be misleading.
Let’s simplify this assessment by focusing on six key quality characteristics: defect density, code duplications, hardcoded tokens, security vulnerabilities, outdated packages, and the exposure to non-permissive open-source libraries.
Img3
Companies should prioritize the characteristics most relevant to their clients to minimize change requests and maintenance costs. While there could be more variables, the methodology remains the same.
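To make these characteristics concrete, they can be captured in a small data structure. The schema below is an illustrative sketch; the field names and units are assumptions, not a prescribed format:

```python
from dataclasses import dataclass

# Illustrative schema for the six quality characteristics discussed above.
# Units are assumptions: densities per 1,000 lines of code (KLOC), the rest
# raw counts found in the codebase. Lower values mean better quality.
@dataclass
class QualitySnapshot:
    defect_density: float            # defects per KLOC
    duplication_pct: float           # % of duplicated lines
    hardcoded_tokens: int            # credentials/tokens committed to source
    security_vulnerabilities: int    # open SAST/SCA findings
    outdated_packages: int           # dependencies behind the latest stable release
    non_permissive_licenses: int     # copyleft or otherwise non-permissive dependencies

snapshot = QualitySnapshot(
    defect_density=4.2, duplication_pct=7.5, hardcoded_tokens=1,
    security_vulnerabilities=12, outdated_packages=9, non_permissive_licenses=2,
)
```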
After completing this internal assessment, it's time to look for a point of reference for what can be considered high-quality software. Product managers should curate a collection of source code from products within their own market sector. The code of open-source projects is publicly available and can be accessed from repositories on platforms like GitHub or GitLab, or from the project's own version control system. Subsequently, compute the average, maximum, and minimum values for the previously chosen characteristics.
Example:
Consider the quality scorecard shown below.
Img4
We have inspected the quality of widely used AI frameworks using different quality benchmarks.
Here is the quality score using a benchmark calculated by aggregating results from a wide variety of OSS frameworks implemented in several tech stacks.
Img5
Compare it with the quality inspection results using a C++-based quality benchmark.
Img6

Codebase vs Benchmark

Through a comprehensive code analysis process that involves suitable linters (code scanners), SAST (Static Application Security Testing), software composition analysis tools, license compliance checks, and productivity analysis tools, we can calculate the chosen characteristics and assess them against our quality benchmark.
This is not a one-shot process; it's an iterative one. In each iteration, our method focuses on a specific area, improves the code accordingly, and repeats the process until the quality aligns with the defined quality benchmark.
Img7
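A minimal sketch of that loop, assuming a `measure` callable that wraps the linters and SAST/SCA tools above and an `improve` callable standing in for the team's focused refactoring work (both hypothetical):

```python
def worst_gap(metrics: dict, benchmark: dict) -> tuple[str, float]:
    """Return the characteristic with the largest relative gap to the benchmark."""
    gaps = {k: (metrics[k] - benchmark[k]) / benchmark[k]
            for k in benchmark if benchmark[k]}
    area = max(gaps, key=gaps.get)
    return area, gaps[area]

def quality_loop(measure, improve, benchmark: dict, max_iterations: int = 10) -> dict:
    """Iterate: measure, find the weakest area, improve it, repeat."""
    for _ in range(max_iterations):
        metrics = measure()              # run linters, SAST, SCA, license checks
        area, gap = worst_gap(metrics, benchmark)
        if gap <= 0:                     # every characteristic meets the benchmark
            return metrics
        improve(area)                    # focused refactoring on one area per pass
    return measure()
```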

Identifying the Pain Points (If Any)

We should focus our source code improvement efforts on the most impactful areas, such as security, license compliance, and the product's reliability. Complexity and productivity are expected to improve as well once technical debt is calculated and addressed.
While numerous articles and books discuss the detrimental effects of managing technical debt, very few provide a concrete calculation method. Quantifying technical debt is indeed challenging unless it has been closely monitored over time alongside development and maintenance costs, as well as its impact on product success, which includes factors like downtime and client-reported bugs.
It’s crucial to encompass all software quality issues within the concept of technical debt, including those that may rarely occur but carry significant consequences or pose risks to a company’s reputation, such as potential regulatory violations.
Img8
Here's an example of a technical debt calculation.
Img9
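For readers who want a starting point, here is one simple way such a calculation could be structured. The effort figures, rates, and risk premium are illustrative assumptions, not the exact model behind the example above:

```python
# One possible technical-debt estimate: remediation effort per issue type,
# priced at a blended hourly rate, plus a premium for low-frequency but
# high-consequence risks (e.g. regulatory violations). Numbers are assumptions.

REMEDIATION_HOURS = {
    "defect": 4.0,
    "duplication_block": 1.5,
    "hardcoded_token": 2.0,
    "security_vulnerability": 8.0,
    "outdated_package": 1.0,
    "license_violation": 6.0,
}

def technical_debt(issue_counts: dict, hourly_rate: float,
                   risk_premium: float = 0.0) -> float:
    """Estimated cost to pay down the measured debt."""
    effort = sum(REMEDIATION_HOURS[t] * n for t, n in issue_counts.items())
    return effort * hourly_rate + risk_premium

debt = technical_debt(
    {"defect": 120, "security_vulnerability": 12, "outdated_package": 30},
    hourly_rate=75,
    risk_premium=20_000,   # potential regulatory/reputational exposure
)
print(f"Estimated technical debt: ${debt:,.0f}")   # => $65,450
```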

Applying the Quality Improvement Suggestions

Our method compiles tailored information for each target audience: developers, middle-level management, and executives.

Developer Report Overview

Here is an example overview of the report for the development team. The complete report should include all the details required to inspect and resolve the issues, as well as the reasoning behind each reported issue.
Img10
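The exact layout depends on the tooling, but a single entry in such a report might carry fields like the following (an illustrative shape; all field names and values are assumptions):

```python
# Illustrative shape of one developer-report entry: each finding ships with
# enough context (location, reasoning, suggested fix) to be resolved
# without re-triage. Values are hypothetical.
report_entry = {
    "id": "SEC-0142",
    "severity": "critical",
    "characteristic": "security_vulnerability",
    "file": "src/server/auth.py",
    "line": 88,
    "finding": "Hardcoded API token used for outbound requests",
    "reasoning": "Tokens committed to source leak through VCS history and forks",
    "suggested_fix": "Load the token from an environment variable or a secret store",
}
```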

Middle-Level Management Report Overview

The middle-level report focuses on risk and cost estimation. It should also provide enough information for planning code-refactoring resources.
Img11

Executive Report

The executive report should be short and comprehensive. The focus should be on risk management, and each risk should be associated with an actionable risk mitigation suggestion.
Img12

Monitoring the Software's Quality Evolution on a Daily Basis

Modify the quality model by adjusting the quality definition and quality benchmark. This adjustment should consider factors such as development costs, operation and maintenance costs, and alignment with business goals.
In the early stages of software development, the quality target can be more lenient, but it should become stricter in later stages and after the product is released.
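In practice, this adjustment can be as simple as a stage-dependent threshold table. The stages and values below are illustrative assumptions:

```python
# Illustrative stage-dependent quality targets: lenient early, strict after
# release. Units follow the quality model (e.g. defects per KLOC, counts of
# open findings); the specific thresholds are assumptions.
QUALITY_TARGETS = {
    "prototype": {"defect_density": 10.0, "security_vulnerabilities": 20},
    "beta":      {"defect_density": 5.0,  "security_vulnerabilities": 5},
    "released":  {"defect_density": 1.0,  "security_vulnerabilities": 0},
}

def meets_target(metrics: dict, stage: str) -> bool:
    """Check the current metrics against the targets for the given stage."""
    return all(metrics.get(name, 0) <= limit
               for name, limit in QUALITY_TARGETS[stage].items())
```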

The Methodology's Implementation

Example

Step 1. Model Criteria Selection

  • Defect density
  • Code duplications
  • Hardcoded tokens
  • Security vulnerabilities
  • Outdated packages
  • Exposure to non-permissive open-source libraries

Step 2. Calculate the Quality Benchmark

Analyze 10 of the most widely used AI OSS frameworks using a general-purpose quality benchmark.
Img13
We calculate a new quality benchmark considering only the subset of these frameworks where C++ is the primary language.
Img14
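A sketch of how this step could be computed, assuming each framework's analysis results are stored as a dictionary of the chosen characteristics (names and values are placeholders):

```python
from statistics import mean

# Hypothetical per-framework analysis results (placeholder values).
frameworks = [
    {"name": "framework-a", "language": "C++",    "defect_density": 3.1, "duplication_pct": 6.0},
    {"name": "framework-b", "language": "Python", "defect_density": 4.8, "duplication_pct": 9.2},
    {"name": "framework-c", "language": "C++",    "defect_density": 2.4, "duplication_pct": 5.1},
]

def benchmark(samples: list[dict], characteristics: list[str]) -> dict:
    """Average, minimum, and maximum per characteristic across the samples."""
    return {c: {"avg": mean(s[c] for s in samples),
                "min": min(s[c] for s in samples),
                "max": max(s[c] for s in samples)}
            for c in characteristics}

chars = ["defect_density", "duplication_pct"]
general_benchmark = benchmark(frameworks, chars)              # all tech stacks
cpp_benchmark = benchmark([f for f in frameworks
                           if f["language"] == "C++"], chars) # C++ subset only
```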

Step 3. Static Code Analysis

Run static code analysis on the same 10 AI OSS frameworks and assess them against the general-purpose quality benchmark.
Img15
Re-assess the quality of the C++-based AI open-source frameworks against the newly calculated quality benchmark.
Img16

Step 4. Identify the Pain Points

Let's consider the MXNet framework as an example.
Let's start by inspecting the tech stack and checking whether the composition of the team is aligned with the programming languages used.
Img17
The package analysis is shown below.
Img18
The PyYAML pip library is outdated, and at least three vulnerabilities are reported against the version currently in use.
Img19
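To reproduce this kind of package check locally, one option is to ask pip itself which installed dependencies are behind. The snippet below shells out to `pip list --outdated` (a real pip flag); a full analysis would also query a vulnerability database for each outdated version, which is omitted here:

```python
import json
import subprocess
import sys

# Minimal sketch of an outdated-package check using pip's JSON output.
result = subprocess.run(
    [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
    capture_output=True, text=True, check=True,
)
for pkg in json.loads(result.stdout):
    print(f'{pkg["name"]}: {pkg["version"]} -> {pkg["latest_version"]}')
```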
Let's also check the defects (bugs and best-practice violations).
Img20
We see that the majority of the critical issues appear in the folder “\root\docs\static_site”.
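Surfacing such hot spots is straightforward once the findings are exported. A sketch, assuming the analysis step dumps records with `severity` and `path` fields (the sample data is hypothetical):

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical findings exported from the static-analysis step.
findings = [
    {"severity": "critical", "path": "docs/static_site/app.js"},
    {"severity": "critical", "path": "docs/static_site/index.html"},
    {"severity": "minor",    "path": "src/engine/graph.cc"},
]

# Count critical findings per containing folder to surface hot spots.
critical_dirs = Counter(
    str(PurePosixPath(f["path"]).parent)
    for f in findings if f["severity"] == "critical"
)
print(critical_dirs.most_common(3))   # => [('docs/static_site', 2)]
```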

Step 5. Report the Source Code Analysis Results

  1. Executive's overview
Img21
  2. Middle-level management report
Img22
Img23
  3. Developer/Engineers report
Img24

Step 6. Monitor the Code Quality Evolution Over Time

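A lightweight way to support this step is to record a dated quality snapshot on every build and inspect the trend. A minimal sketch, assuming snapshots are appended to a simple JSON history file (the file name and fields are assumptions):

```python
import json
from datetime import date
from pathlib import Path

HISTORY = Path("quality_history.json")   # hypothetical history file

def record_snapshot(metrics: dict) -> None:
    """Append today's quality metrics to the history file."""
    history = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    history.append({"date": date.today().isoformat(), **metrics})
    HISTORY.write_text(json.dumps(history, indent=2))

def trend(metric: str) -> list[tuple[str, float]]:
    """Return (date, value) pairs for one characteristic, oldest first."""
    if not HISTORY.exists():
        return []
    return [(h["date"], h[metric])
            for h in json.loads(HISTORY.read_text()) if metric in h]

record_snapshot({"defect_density": 3.8, "security_vulnerabilities": 7})
print(trend("defect_density"))
```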

Step 7. Understanding the Basics

Unveiling the True Costs of Subpar Software
Low-quality software imposes significant burdens on the development and operational aspects of any digital product. It not only inflates costs but also exposes businesses to regulatory risks, and, in the worst cases, tarnishes the reputation of both the developers and users of the product.
Elevating Software Quality with AI-Powered Solutions
Embracing an AI-driven methodology offers a transformative path to elevate software quality. Through the adoption of AI technology, we can model and automate best practices for code-quality improvement, involving all stakeholders in a manner that ensures the product’s success remains uncompromised.
Apples to Apples: The Benchmark for Software Quality
When evaluating software quality, it’s essential to avoid comparing apples and oranges. The benchmark for quality should be established by assessing codebases that implement similar products utilizing similar technology stacks. This approach provides a fair and relevant standard for assessment.
The Iterative Journey of Quality Improvement
Quality and security are not abstract concepts—they are vital and present throughout the software development lifecycle. Assessing source code quality should be an integral part of every new release, making quality improvement an iterative process that safeguards the integrity and success of your digital products.