An Artificial Intelligence to automate bug detection and correction, capable of reducing debugging and code maintenance time.

The idea

During the "AI-Disruption Hackathon" held by the Reply Group, an international team (including a colleague of Technology Reply) developed, in 48 hours, a demo application to detect bugs in code snippets and to provide a version without them. The idea is that a developer submits pieces of code to this application, which, through an analysis employing a neural network, creates a list of possible errors in the code. The developer can then select which of these errors she or he wants to correct, and the application generates a version without them.
The team developed two requests (“prompts”): one for bug detection and one for bug fixing. Repeated sampling from the neural network allows the desired result to be achieved.
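The two-prompt workflow with repeated sampling can be sketched roughly as follows. This is a minimal illustration, not the team's actual implementation: the prompt templates, the `sample` callable (standing in for a call to the neural network), and the voting rule that keeps only bugs reported by several samples are all assumptions introduced for the example.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical prompt templates: the prompts actually used by the team
# are not given in the text.
DETECT_TEMPLATE = (
    "### Python code\n{code}\n"
    "### List the bugs in the code above, one per line:\n"
)
FIX_TEMPLATE = (
    "### Python code\n{code}\n"
    "### Bugs to fix\n{bugs}\n"
    "### Fixed Python code\n"
)


def build_detection_prompt(code: str) -> str:
    """Wrap the submitted snippet in the detection prompt."""
    return DETECT_TEMPLATE.format(code=code)


def build_fix_prompt(code: str, selected_bugs: List[str]) -> str:
    """Wrap the snippet and the bugs chosen by the developer in the fix prompt."""
    bugs = "\n".join(f"- {bug}" for bug in selected_bugs)
    return FIX_TEMPLATE.format(code=code, bugs=bugs)


def detect_bugs(
    code: str,
    sample: Callable[[str], str],
    n_samples: int = 5,
    min_votes: int = 2,
) -> List[str]:
    """Query the model n_samples times and keep bugs reported at least min_votes times.

    The voting rule is an assumed aggregation strategy for this sketch.
    """
    prompt = build_detection_prompt(code)
    votes: Counter = Counter()
    for _ in range(n_samples):
        completion = sample(prompt)
        reported = set()  # count each bug at most once per sample
        for line in completion.splitlines():
            bug = line.strip().lstrip("-").strip()
            if bug:
                reported.add(bug)
        votes.update(reported)
    return [bug for bug, count in votes.most_common() if count >= min_votes]
```

In the real application, `sample` would wrap a request to the model; here it is left as a parameter so that the aggregation logic can be tried with a stub.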

Codex: the solution!

The solution is based on Codex, a set of Artificial Intelligence engines developed by OpenAI as a specialization of GPT-3 (Generative Pre-trained Transformer 3). GPT-3 is a machine learning neural network able to generate natural language texts similar to those produced by humans.
Codex is a version of GPT-3 further trained with publicly available code written in a wide variety of programming languages (Python, SQL, etc.). The result is a neural network capable of understanding and generating both natural and programming languages. Such a network can thus be used for analyzing and synthesizing source code from simple instructions written in natural language.


Synthesizing programs from docstrings, the short texts used to document a specific segment of code, was the main goal for which GPT-3 was further trained on source code. Measurements of functional correctness (i.e., whether the generated code performs the required operation correctly) show that accuracy exceeds 70 percent when several candidate solutions are sampled for each problem.
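As an illustration of what measuring functional correctness means in practice, the sketch below runs each generated candidate against a few input/output cases and reports how many candidates pass. The function names and the test cases are hypothetical, and executing generated code with `exec` is unsafe outside a sandbox; a real evaluation would isolate the code first.

```python
from typing import List, Tuple


def passes_tests(source: str, func_name: str, cases: List[Tuple[tuple, object]]) -> bool:
    """Return True if the function defined in `source` gives the expected result on every case.

    WARNING: exec runs arbitrary code; this sketch assumes a trusted or sandboxed setting.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions and runtime errors all count as failures.
        return False


def fraction_correct(candidates: List[str], func_name: str, cases: List[Tuple[tuple, object]]) -> float:
    """Share of sampled candidates that are functionally correct."""
    if not candidates:
        return 0.0
    passed = sum(passes_tests(source, func_name, cases) for source in candidates)
    return passed / len(candidates)


def solved_by_any(candidates: List[str], func_name: str, cases: List[Tuple[tuple, object]]) -> bool:
    """With repeated sampling, a problem counts as solved if at least one candidate passes."""
    return any(passes_tests(source, func_name, cases) for source in candidates)
```

The difference between `fraction_correct` and `solved_by_any` shows why repeated sampling helps: a problem can be solved even when most individual samples are wrong.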
While Codex interoperates with many programming languages without the need for any specific configuration, it was mainly trained with Python. Therefore, during the Hackathon we tested our application only with Python code, leaving the other programming languages for future case studies.

Just as modern mail services (Gmail, Outlook, etc.) automatically suggest text while we are composing a message, Codex can be used to automatically complete the line of code we are typing.
With the aim of enhancing programmers’ productivity, a version of Codex refined for this specific task has already been integrated into some of the tools used to develop code (IDEs, "Integrated Development Environments") such as IntelliJ, Visual Studio, etc.
However, the real impact on developers' productivity, their quality of life, and their compensation remains to be measured.

Self-contained pieces of code can be restated maintaining functional correctness (i.e., they keep on performing the same task), while increasing readability and thus maintainability.
Repeated invocations of Codex have proven to be a surprisingly effective strategy for improving code readability. Applying this technique to different self-contained portions of a software project can cut down the maintenance time of entire applications.
For example, in an application developed with an object-oriented language, it is possible to replace an intricate implementation of a method with a more intelligible version of it. The method thus becomes easier to adjust in case of syntax or functional errors, or even implementation-level vulnerabilities.
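A small, hand-written illustration of this kind of rewrite is shown below. The `Order` class and both method bodies are invented for the example (they are not output from Codex); the point is that the readable version must return the same results as the intricate one, which can be checked on sample inputs.

```python
from typing import List


class Order:
    """Hypothetical class used only to illustrate a readability rewrite."""

    def __init__(self, prices: List[int], quantities: List[int]) -> None:
        self.prices = prices
        self.quantities = quantities

    def total_intricate(self) -> int:
        # Original style: manual indexing, terse names, nested condition.
        t = 0
        i = 0
        while i < len(self.prices):
            if i < len(self.quantities):
                t = t + self.prices[i] * self.quantities[i]
            i = i + 1
        return t

    def total_readable(self) -> int:
        # Rewritten style: same behaviour, expressed as a sum over price/quantity pairs.
        return sum(price * quantity for price, quantity in zip(self.prices, self.quantities))


def same_behaviour(orders: List[Order]) -> bool:
    """Check that both implementations agree on every sample order."""
    return all(order.total_intricate() == order.total_readable() for order in orders)
```

A check such as `same_behaviour` only gives evidence on the inputs tried; it does not prove that the two versions are equivalent in general.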

Codex is capable of generating source code in a dozen of the most popular programming languages (Python, SQL, Java, etc.) from as many different natural languages (English, Italian, etc.). This can be leveraged by developers to:

  1. Increase their skills in a specific programming language;
  2. Translate requests formulated in natural language into working code;
  3. Learn how certain functionality can be implemented using specific libraries or APIs.

However, the effectiveness of Codex in lowering barriers to entry into the field, and in supporting the education and career progression of developers and software engineers, is still to be measured.


Codex's engines, offered on a pay-per-use basis, are extremely efficient and do not require any training phase before they can be used. Indeed, they have already been trained on a huge dataset of both text and code. However, as Codex is a "general purpose" neural network, it is essential that the request (“prompt”) for the desired task is formulated in a way that achieves the expected behavior for the specific use case.
Thus, the most interesting challenge was to tune the settings for the two desired tasks: bug detection and bug fixing. This involved choosing the specific Codex "engine," formulating the proper prompt to be executed and, most importantly, setting the request parameters correctly.
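To make these choices concrete, the sketch below groups the settings for the two tasks into one small structure each. The engine name and every numeric value are illustrative assumptions, not the values used during the Hackathon; the parameter names (`temperature`, `max_tokens`, `n`, `stop`) follow those of the OpenAI completion API, and no request is actually sent.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskSettings:
    """Settings for one task; all concrete values below are assumed for illustration."""

    engine: str
    temperature: float  # higher values give more varied samples
    max_tokens: int     # upper bound on the length of the completion
    n: int              # number of samples drawn per request
    stop: Optional[List[str]] = None


# Detection: several varied samples, short answers (assumed values).
DETECTION = TaskSettings(
    engine="code-davinci-002", temperature=0.5, max_tokens=128, n=5, stop=["###"]
)

# Fixing: a single deterministic completion, long enough for the whole snippet (assumed values).
FIXING = TaskSettings(
    engine="code-davinci-002", temperature=0.0, max_tokens=512, n=1, stop=["###"]
)


def build_request(settings: TaskSettings, prompt: str) -> dict:
    """Combine the task settings and the prompt into a request payload."""
    payload = asdict(settings)
    payload["prompt"] = prompt
    return payload
```

One design point worth noting: repeated sampling is only useful with a temperature above zero, since at zero temperature the samples would be essentially identical, which is why the two tasks are given different values in this sketch.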