AI Copilot Evaluation Module

The AI Copilot Evaluation Module helps you maximize the value of coding assistants such as GitHub Copilot, Amazon CodeWhisperer, and others.

This module provides a view into adoption, developer sentiment, and downstream impact to help your organization:

  • Track adoption and use over time

  • Measure the time savings and economic benefit

  • Identify which teams benefit the most and how saved time is being reinvested

  • Monitor speed, quality, and security to mitigate unintended consequences and maximize value

Accessing the AI Copilot Evaluation Module

The AI Copilot Evaluation Module is a premium add-on in the Faros platform. It is accessible under the AI Copilot Evaluation folder within Faros Reports: under Dashboards > Dashboards Directory, select Faros Reports. It is also accessible via the Modules dropdown at the top left of the Faros application.


AI Copilot Evaluation Module Dashboards

The AI Copilot Evaluation Module is composed of four dashboards, which constitute a coding assistant value journey, from initial roll-out to larger scale deployments and long-term value optimization.

  •  The AI Copilot Evaluation Summary Dashboard provides an overview of the adoption, usage, and ROI of deploying coding assistants in your organization

  •  The AI Copilot Evaluation A/B Test Dashboard is helpful for organizations evaluating coding assistants via a small-scale pilot. It helps you compare different velocity and quality metrics that are likely to be most immediately impacted by using coding assistants

  •  The AI Copilot Evaluation Rollout Dashboard helps you analyze how key velocity and quality metrics are changing for your developers as coding assistants are rolled out more broadly and for a longer period of time

  •  The AI Copilot Evaluation Impact Dashboard gives insights into impact on more downstream velocity and quality metrics such as lead time or incidents for teams reaching a high level of coding assistant use

We will go into detail about each one and how to set them up below.


AI Copilot Evaluation Summary Dashboard

The AI Copilot Evaluation Summary Dashboard provides an overview of adoption, usage, time savings and ROI of deploying coding assistants for your organization.

All metrics can be filtered down by team and specific time period. For GitHub Copilot, they can also be filtered down by GitHub org.

  • The desired time period is set by the Date filter at the top of the page, and defaults to the previous 3 months. 

  • The desired team(s) can be set using the Teams filter, and defaults to all teams. 

  • The GitHub org can be set using the GitHub Org filter, and defaults to all orgs (applicable only if you use GitHub Copilot)

When rolling out coding assistants, adoption and usage are the first indicators of how useful they are to your organization. We also recommend monitoring time savings to understand the extra capacity being released and to get a sense of the ROI a broad roll-out would deliver for your organization. The Summary Of Impact section is composed of the following metrics (a worked example of these calculations follows the list):

  • Adoption: Percentage of Monthly Active Users, calculated by dividing the number of active users (at least 1 use of a coding assistant in a month) by the total number of users with a license
  • Activity: Percentage of Daily Active Users, calculated by dividing the number of active users (at least 1 use of a coding assistant in a day) by the total number of users with a license
  • Cumulative Time Savings Over Time: Cumulative time savings for all developers in the selected organization, summing up time savings collected from surveys for that time period
  • Equivalent Economic Benefit: ROI for your organization to date, calculated by multiplying time savings by a flat engineering rate of $80/hour
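To make the arithmetic concrete, here is a minimal sketch using made-up numbers; only the $80/hour flat engineering rate comes from the dashboard definition above.

```python
# Illustrative calculation of the Summary of Impact metrics.
# All input numbers are hypothetical; the $80/hour flat rate matches
# the rate used by the Equivalent Economic Benefit metric.

licensed_users = 200          # developers holding a coding assistant license
monthly_active_users = 150    # at least 1 use of a coding assistant in the month
daily_active_users = 90       # at least 1 use of a coding assistant on a given day
survey_hours_saved = 1200.0   # cumulative self-reported time savings, in hours

adoption = monthly_active_users / licensed_users   # 0.75 -> 75% adoption
activity = daily_active_users / licensed_users     # 0.45 -> 45% activity
economic_benefit = survey_hours_saved * 80         # $96,000 at $80/hour

print(f"Adoption: {adoption:.0%}")
print(f"Activity: {activity:.0%}")
print(f"Equivalent economic benefit: ${economic_benefit:,.0f}")
```

The dashboard performs the same calculations from the license, usage, and survey data ingested into Faros.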

The Adoption section helps you track how many licenses have been assigned to developers, how many are actively being used, and which teams are benefiting the most from the tool. It is composed of the following metrics:

  • Number of Devs with Licenses: Number of developers with a license for a coding assistant tool
  • Number of Monthly Active Users: Number of Monthly Active Users (at least 1 use of a coding assistant in a month)
  • Number of Daily Active Users: Number of Daily Active Users (at least 1 use of a coding assistant in a day)
  • Number of Daily Active Users by Sub-Org: Number of Daily Active Users (at least 1 use of a coding assistant in a day), broken down by sub-org

The Usage section provides a summary of coding assistant usage across all organization members, with a breakdown by language and editor. Note that this data is not available for all coding assistants; it is available for GitHub Copilot and Amazon CodeWhisperer.

You can see lines of code generated and accepted, as well as acceptance rate, by org over time. You can also see a breakdown by programming language and editor.

The Time Savings section provides crucial insights into a key benefit of these tools: saving your developers time. We recommend monitoring time savings on a regular basis via self-reported surveys, at least during the pilot or early rollout stage. These surveys can also give you insights into which tasks coding assistants are most helpful for, which teams are getting the most benefit, and how different teams are reinvesting the extra capacity released.

Faros provides out of the box surveys which are available upon request.

Finally, the Unused Licenses section helps you track licenses that are not being actively used, so you can identify licenses you don’t need or teams that may need additional training to increase adoption.


AI Copilot Evaluation A/B Test Dashboard

We recommend evaluating coding assistants via a small-scale pilot at first, where licenses are provided to a selected number of developers or teams. Ideally, these developers are a diverse set that is representative of the broader roll-out. This lets you run an A/B test between developers using a coding assistant and those who do not, and compare performance and outcomes.

The AI Copilot Evaluation A/B Test Dashboard helps you compare different velocity and quality metrics that are likely to be most immediately impacted by coding assistant tools.

All metrics can be filtered down by team and specific time period.

  • The desired time period is set by the Date filter at the top of the page, and defaults to the previous 6 months. 

  • The desired team(s) can be set using the Teams filter, and defaults to all teams. 

The Velocity section focuses on a key promise of coding assistants: saving developers time while coding. Use the following metrics to see whether there is a notable difference in velocity between developers using a coding assistant and those who do not (a minimal sketch of the cohort comparison follows the list):

  • PR Merge Rate: Merge rate of Pull Requests (PRs) over time, for developers using a coding assistant and for those who do not. Helpful to track whether developers who use a coding assistant are shipping code faster than those who don’t, and whether there is a trend over time
  • PR Merge Rate by Cohort: Merge rate of Pull Requests in the selected time frame, averaged by cohort. Helpful to track whether developers who use a coding assistant ship code faster on average than those who don’t
  • PR Review Time: Average review time for pull requests authored by developers using or not using a coding assistant tool, over time. Are PRs authored by developers using a coding assistant reviewed faster, for example because the code is cleaner or better documented, or do they take longer because the code is too concise or the author understands it less well and takes longer to reply to comments?
  • PR Review Time by Cohort: Average review time for pull requests authored in the selected time frame, averaged by cohort. Helpful to track whether PRs authored by developers using a coding assistant are reviewed faster on average than those authored by developers who don’t
  • Task Throughput: Number of tasks completed per developer over time, for developers using a coding assistant and for those who do not. Helpful to monitor whether developers using coding assistants get through tasks faster than those who do not
  • Task Throughput by Cohort: Number of tasks completed per developer in the selected time frame, averaged by cohort. Helpful to track whether, on average, developers using a coding assistant get more tasks done than those who don’t, potentially impacting your release schedule in a positive way
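To make the cohort comparison concrete, the sketch below shows one way such a comparison could be computed from PR-level data. This is an illustration, not the dashboard’s actual query; the column names and data are assumptions.

```python
import pandas as pd

# Hypothetical PR-level data: one row per merged PR, tagged with the
# author's cohort. Column names are illustrative, not the Faros schema.
prs = pd.DataFrame({
    "author": ["a", "a", "b", "c", "c", "d"],
    "cohort": ["copilot", "copilot", "copilot", "control", "control", "control"],
    "week":   ["2024-01", "2024-02", "2024-01", "2024-01", "2024-02", "2024-02"],
})

# Merged PRs per developer per week, then averaged within each cohort.
per_dev = (
    prs.groupby(["cohort", "author", "week"])
       .size()
       .rename("merged_prs")
       .reset_index()
)
merge_rate_by_cohort = per_dev.groupby("cohort")["merged_prs"].mean()
print(merge_rate_by_cohort)
```

The same pattern (group by cohort, then average the per-developer metric) applies to review time, task throughput, and the quality metrics below.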

Moving fast can have an adverse impact on quality. But coding assistants can also help write tests or produce crisper code. The Quality section helps you see whether there is a notable difference in quality metrics between developers using a coding assistant and those who do not. It is composed of the following metrics:

  • PR Size: Size of Pull Requests (PRs) over time, for developers using a coding assistant and for those who do not. Smaller PRs are typically better. Are coding assistants making PRs larger or smaller over time?
  • PR Size by Cohort: Size of Pull Requests in the selected time frame, averaged by cohort. Helpful to track whether developers who use a coding assistant merge smaller PRs on average than those who don’t
  • PR Test Coverage: Test coverage for pull requests authored by developers using or not using a coding assistant tool, over time. Coding assistants are typically helpful for writing tests, but moving fast can also hurt test coverage. See which way coverage is going for developers using a coding assistant tool
  • PR Test Coverage by Cohort: Test coverage for pull requests authored in the selected time frame, averaged by cohort. Helpful to track whether PRs authored by developers using a coding assistant have higher or lower test coverage on average than those authored by developers who don’t
  • Code Smells: Average number of code smells in pull requests authored by developers using a coding assistant and by those who do not. Helpful to monitor whether PRs authored by developers using coding assistants accumulate more code smells, for example due to duplicate code
  • Code Smells by Cohort: Average number of code smells in pull requests authored in the selected time frame, averaged by cohort. Helpful to track whether PRs authored by developers using coding assistants have more code smells on average, potentially impacting long-term code quality negatively

AI Copilot Evaluation Rollout Dashboard

As coding assistants are rolled out more broadly in your organization and for a longer period of time, we recommend analyzing how key velocity and quality outcomes are changing for your developers.

The AI Copilot Evaluation Rollout Dashboard compares average performance per user from before they were actively using a coding assistant to after, for developers who have started to use a coding assistant in the last 6 months. It works best when usage has increased significantly during that time period.

All metrics can be filtered down by team. 

  • The desired team(s) can be set using the Teams filter, and defaults to all teams.

The Usage section at the top shows the ratio of active and inactive users of a coding assistant tool in the last 6 months, which helps put the data below in perspective, as it will be more useful and interesting if usage has increased significantly during that time.

The Velocity section focuses on a key promise of coding assistants: saving developers time while coding. See whether there is a notable difference in velocity before and after using a coding assistant, and where you stand relative to what can be expected based on industry standards and proprietary Faros data (a small sketch of the percent-change calculation follows the list).

  • PR Merge Rate Before and After Copilot: Average merge rate of Pull Requests (PRs) for developers now using a coding assistant, compared to before they were using it, for the same set of developers. Helpful to track whether developers who use a coding assistant are shipping code faster than before
  • PR Merge Rate Percent Change: Change in merge rate of Pull Requests before and after adopting a coding assistant, for developers now using one. Is it going up for your organization?
  • PR Review Time Before and After Copilot: Average review time for pull requests authored by developers now using a coding assistant tool, compared to before they were using it, for the same set of developers. Are these PRs reviewed faster, for example because the code is cleaner or better documented, or do they take longer because the code is too concise or the author understands it less well and takes longer to reply to comments?
  • PR Review Time Percent Change: Change in review time for pull requests authored by developers now using a coding assistant, compared to before. Is it going down for your organization?
  • Task Cycle Time Before and After Copilot: Average time for a task to be completed by developers before and after adopting a coding assistant. Helpful to monitor whether developers now using coding assistants get through tasks faster than before
  • Task Cycle Time Percent Change: Change in task cycle time before and after adopting a coding assistant. Is it going down for your organization?
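As a minimal sketch of the percent-change metrics, with hypothetical before/after averages for the same set of developers:

```python
# Hypothetical averages for the same developers, before and after adoption.
merge_rate_before = 2.1   # PRs merged per developer per week, before
merge_rate_after = 2.6    # PRs merged per developer per week, after

percent_change = (merge_rate_after - merge_rate_before) / merge_rate_before * 100
print(f"PR merge rate percent change: {percent_change:+.1f}%")  # +23.8%
```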

AI Copilot Evaluation Impact Dashboard

Once your organization has deployed coding assistants more broadly and for a longer period of time, you can start evaluating impact on more downstream velocity and quality metrics.

The AI Copilot Evaluation Impact Dashboard helps you analyze this impact over time, for teams reaching a threshold of 50% of team members with active coding assistant use in a month.

All metrics can be filtered down by team and specific time period.

  • The desired time period is set by the Date filter at the top of the page, and defaults to the previous 6 months. 

  • The desired team(s) can be set using the Teams filter, and defaults to all teams. 

The Usage section provides a breakdown of usage by team. It is helpful to identify areas of strong adoption within your organization, and areas lagging behind or not benefiting as much.

The charts below are focused on teams for which usage of a coding assistant has reached at least 50%.

With broader usage and more use within a team, individual gains can start to translate into meaningful impact on downstream velocity metrics such as lead time. In some cases, however, these gains can be erased by bottlenecks in the software development lifecycle, such as longer review times. 

The Velocity section helps you see which teams are seeing meaningful improvements to bottom line velocity metrics and identify existing or new bottlenecks that prevent them from reaping maximum benefits.

Usage % corresponds to the percentage of active contributors in the selected teams who are also active with a coding assistant during the same time period. Impact on the metrics below can typically start being measurable when a team reaches at least 50% sustained usage.

It is composed of the following metrics:

  • PR Cycle Time vs Usage by Sub-Org: Average Cycle Time of Pull Requests (PRs) and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • PR Cycle Time over Time and Usage: Average Cycle Time of Pull Requests and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on cycle time?
  • PR Cycle Time comparison when usage above 50%: PR Cycle Time breakdown for months where usage of a coding assistant tool is above or below 50%. Helpful to understand shifting bottlenecks when using coding assistants, which could erase potential gains upstream
  • Lead Time vs Usage by Sub-Org: Lead Time and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • Lead Time over Time and Usage: Lead time and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on lead time?
  • Lead Time comparison when usage above 50%: Lead time breakdown for months where usage of a coding assistant tool is above or below 50%. Helpful to understand shifting bottlenecks when using coding assistants, which could erase potential gains upstream
  • Task Cycle Time vs Usage by Sub-Org: Task Cycle Time and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • Task Cycle Time over Time and Usage: Task Cycle Time and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on task cycle time?
  • Task Cycle Time comparison when usage above 50%: Task Cycle Time breakdown for months where usage of a coding assistant tool is above or below 50%. Helpful to understand shifting bottlenecks when using coding assistants, which could erase potential gains upstream

With broader usage and more use within a team, downstream impact on quality can start to be felt more acutely, either in a positive or in a negative way. 

The Quality section helps you monitor whether coding assistants have an impact on incidents and other quality metrics. It is composed of the following metrics:

  • Bugs per Developer vs Usage by Sub-Org: Bugs per Developer and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • Bugs per Developer over Time and Usage: Bugs per Developer and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on code quality?
  • Bugs per Developer comparison when usage above 50%: Bugs per Developer for months where usage of a coding assistant tool is above or below 50%.
  • Incidents per Developer vs Usage by Sub-Org: Incidents per Developer and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • Incidents per Developer over Time and Usage: Incidents per Developer and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on quality?
  • Incidents per Developer comparison when usage above 50%: Incidents per Developer for months where usage of a coding assistant tool is above or below 50%.
  • Change Failure Rate vs Usage by Sub-Org: Change Failure Rate and Usage of a coding assistant, for each team in the selected org. See if there is a relationship between the two, by seeing where teams fall in this scatter plot. Each bubble is a team and its size represents the size of the team.
  • Change Failure Rate over Time and Usage: Change Failure Rate and usage of a coding assistant tool, over time. As coding assistant usage is ramping up, is there an impact on quality?
  • Change Failure Rate comparison when usage above 50%: Change Failure Rate for months where usage of a coding assistant tool is above or below 50%.

Setting up the AI Copilot Evaluation Module 

The AI Copilot Evaluation Module is focused on providing a holistic view into the adoption, developer sentiment, and downstream impact of rolling out a coding assistant in your organization.

You can get started by simply providing access to usage data from your coding assistant. To get additional insights into time savings, developer sentiment, and benefits, you can optionally run developer surveys, which can be ingested into Faros (templates are provided). To measure downstream impact on velocity, quality, or security, additional data sources can be connected as described below.

Provide access to your coding assistant usage data

💡These instructions are specific to GitHub Copilot. If you use a different coding assistant, such as Amazon CodeWhisperer, please contact the Faros team at [email protected] and we will help you set it up.

Please follow the instructions here to connect to GitHub. This is required to get PR data from GitHub as well as Copilot usage data. 

⚠️ Note: Make sure to add the following scope when generating the GitHub token: “manage_billing:copilot”. This is needed to get Copilot usage data from GitHub.
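As an illustration of what this scope enables, a token generated with manage_billing:copilot can read an organization's Copilot seat data from the GitHub REST API. The sketch below uses the endpoint as documented by GitHub at the time of writing, with a placeholder org name and environment variable; check GitHub's current API reference before relying on it.

```python
import os
import requests

# Sketch: verify that the token can read Copilot seat data for an org.
token = os.environ["GITHUB_TOKEN"]   # token generated with manage_billing:copilot
org = "your-github-org"              # placeholder

resp = requests.get(
    f"https://api.github.com/orgs/{org}/copilot/billing/seats",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    },
)
resp.raise_for_status()
print("Copilot seats:", resp.json().get("total_seats"))
```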

To be able to analyze the data by team, Faros also needs to capture your organizational structure. The easiest way to do that is to select the “Bootstrap Organization” option when setting up the GitHub Source. This will use the GitHub Orgs and Users to create your Teams in Faros. Alternatively, you can supply a list of Teams and Employees (see next step).

(Optional) Import Your Org Data

As an alternative to using the GitHub Source to create your organization and employees via the “Bootstrap Organization” option described above, you can instead supply a list of team members and the teams they belong to in your organization. You may want to use this approach when your functional teams differ from the way things are organized within GitHub. There are two main steps to this process.

  1. Create Teams and Employees using the instructions for Importing Org Data.

  2. Associate team members with their GitHub accounts. This can be done as part of step 1 if you have each team member’s GitHub user ID. Otherwise, you can use the Employee Bootstrap source to attempt to associate them automatically based on similarities. (An illustrative sketch of this kind of mapping follows this list.)
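For illustration only, the kind of team-to-GitHub-account mapping you would assemble before importing might look like the sketch below. The field names and file layout are placeholders, not the Faros import format; follow the Importing Org Data instructions for the actual format.

```python
import csv

# Placeholder field names -- see the Importing Org Data instructions
# for the format Faros actually expects.
members = [
    {"team": "Payments", "employee": "Jane Doe",  "github_login": "janedoe"},
    {"team": "Payments", "employee": "Raj Patel", "github_login": "rajp"},
    {"team": "Platform", "employee": "Ana Silva", "github_login": "anasilva"},
]

with open("org_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["team", "employee", "github_login"])
    writer.writeheader()
    writer.writerows(members)
```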

(Optional) Set up your Copilot survey

Setting up a survey enables you to quantify time savings and collect qualitative data from your developers on their experience with their coding assistants, such as which tasks they are most helpful for, how developers plan to reinvest the time saved, and their satisfaction with the tool. You can use one of our templates, modify it to your needs, or use your own.

There are two main ways to trigger these surveys:

Triggering the survey when a PR is opened

Pros: Surveys the developers while they are in their flow, and the experience with the coding assistant is fresh in their minds.

Cons: Potential survey fatigue, which can be mitigated by adjusting the logic for when the survey is triggered, e.g. triggered only on large PRs or at most twice a week for each developer.

  1. Create your survey. You can clone this Copilot Survey as a starting point and adjust it to your needs.
  2. Create the trigger mechanism. We recommend setting up a GitHub Action on the desired repositories with a link to the survey, as explained here. Fine-tune the triggering logic to suit your organization (a sketch of one possible triggering script follows this list).
  3. Ingest the survey results into Faros by following these instructions.
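The triggering logic in step 2 can live in a small script invoked by your GitHub Action. The sketch below is one possible approach rather than the linked template: it reads the PR’s size from the GitHub REST API and posts the survey link as a PR comment only when the PR exceeds a size threshold. The repository, survey URL, and threshold are placeholders.

```python
import os
import requests

# Placeholders -- adjust to your repository and survey.
OWNER, REPO = "your-org", "your-repo"
SURVEY_URL = "https://example.com/your-copilot-survey"
MIN_CHANGED_LINES = 100   # only prompt on reasonably large PRs to limit fatigue

token = os.environ["GITHUB_TOKEN"]
pr_number = int(os.environ["PR_NUMBER"])   # supplied by the workflow
headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}

# Fetch the PR to check how many lines it changes.
pr = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{pr_number}", headers=headers
).json()

if pr.get("additions", 0) + pr.get("deletions", 0) >= MIN_CHANGED_LINES:
    # PR comments are posted through the issues API.
    requests.post(
        f"https://api.github.com/repos/{OWNER}/{REPO}/issues/{pr_number}/comments",
        headers=headers,
        json={"body": f"How much time did your coding assistant save on this PR? {SURVEY_URL}"},
    )
```

You can extend the condition to also throttle by developer (e.g. at most twice a week), as suggested above.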

Note: If you significantly modify the survey template, some of the charts in the dashboards are likely to break. Please contact [email protected] if you need help.

Triggering the survey on a cadence (e.g. weekly, bi-weekly, monthly)

Pros: Limits survey fatigue. Since developers will be prompted less frequently, you can also afford to ask a few more questions.

Cons: Answers to quantitative questions like ‘how much time was saved’ will likely be less accurate. The out-of-the-box dashboards do not currently support this option and will need to be modified to ingest the survey data. Please contact [email protected] if you need help.

  1. Create your survey. You can clone this Coding Assistant Weekly Survey Template as a starting point and adjust it to your needs.
  2. Ingest the survey results following these instructions.

(Optional) Connect additional data sources

You can optionally connect additional data sources to Faros to provide more holistic insights into the downstream impacts of GenAI tools on velocity and quality, including:

  • CI/CD pipelines: To get insights into the impact of your Coding Assistant on lead time, you can integrate Faros with your CI/CD systems following instructions here.
  • Static Code Analysis tools: To monitor quality data such as code coverage and code smells following the introduction of coding assistants, you can connect static code analysis tools such as SonarQube following instructions here.
  • Task Management Systems: To assess the impact of coding assistants on task throughput or number of bugs created for example, you can connect task management systems such as Jira following instructions here.