Probability Rates of Scope Completion for Burn-up
Introduction
“The earliest date by which the work could
conceivably be done makes an excellent goal but
an awful schedule.”
Tom DeMarco, Slack: Getting Past Burnout, Busywork,
and the Myth of Total Efficiency
This metric predicts the probability of scope completion using the Normal (Gaussian) distribution and mathematical statistics.
One of the biggest problems for a project manager is understanding when the scope can be completed. It is especially hard in a real-world environment, with constantly changing team performance, floating scope, and various types of work.
The metric analyzes the statistics of “added” work vs “done” work per short iteration and predicts the probability that these trends cross, using the Normal Cumulative Distribution.
In other words, if we have two curves describing the growth of the scope and the number of tickets in “done” status, we want to know the probability that these curves cross in the future.
To achieve that, the metric analyzes the averages of scope changes and of done work, and the standard deviations for both. The probability is calculated by the following formula:
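Written out with the notation used later in this document (WL is the remaining work, ME and SD are the average and standard deviation of the normal distribution for a given iteration):

$$P = \int_{WL}^{ME + 4\,SD} \frac{1}{SD\sqrt{2\pi}}\, e^{-\frac{(x - ME)^2}{2\,SD^2}}\,dx$$

The integral is evaluated numerically with Simpson’s rule; every step of the calculation is described in the sections below.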
The metric uses the following main parameters:
- Jira type of work – bugs, stories, epics, etc.
- NoEstimate vs Estimate – calculation based on the number of tickets or on story points
- Scope change – take scope change statistics into account or ignore them
- Iteration duration – the length of an iteration to analyze, in days
- Iterations to analyze – how many past iterations are taken for the analysis
- Future iterations number – for how many future iterations the probability is predicted
- 1 Introduction
- 2 Creating a custom metric
- 3 Creating a Custom Selection
- 4 Fill the required parameters
- 5 Start date
- 6 Iteration days
- 7 Status Buckets
- 8 Dates and buckets
- 9 Retrieving data for scope-bucket
- 10 Retrieving data for done-bucket
- 11 Intermediate result
- 12 Last dates of iterations and switching between tickets and story points
- 13 Added tickets and velocity tickets
- 14 Averages for added and done tickets
- 15 Dispersions
- 16 Normal distribution parameters for future iterations
- 17 Dividing the line segment between the lower and upper integration limits
- 18 Normal Distribution values calculation
- 19 Simpson’s rule formula
- 20 Full code recap
Creating a custom metric
See Custom Metric development for details.
Creating a Custom Selection
See Custom Metric development for how to get to the Custom Selection mode.
The following sections explain the code you should put into the “PerfQL” field.
Fill the required parameters
First of all, we need to define the parameters for the forecast. The first and second queries in the metric code are meant to be changed by the user as required.
with variables as (
    select
        't' as calculate_by,         -- choose the basis of calculation: tickets or story points
        7 as iteration_duration,     -- in days
        20 as iterations_to_analyze,
        20 as future_iterations_num, -- number of iterations to make the forecast for
        'no' as scope_change,        -- yes/no
        200 as division_step         -- must be an even number!
),
types as (
    select a.* from (values ('bug'), ('task')) a (ticket_type) -- add ticket types to include in the forecast
)
calculate_by: the metric calculation can be based on two types of data: tickets or story points. To make the forecast by tickets, type exactly 't' before “as calculate_by” (third row in the code snippet above). Otherwise, type any other word/symbol in quotation marks instead.
iteration_duration: the timeline in the metric is divided into weeks, so this parameter is set to 7 days and there is no need to change it.
iterations_to_analyze: the number of previous iterations, including the current one, whose data is used in the analysis.
future_iterations_num: the number of future iterations shown in the chart with their probabilities.
scope_change: defines whether future additions of tickets or story points to the scope are taken into account. To ignore scope changes, type 'no' before “as scope_change”; otherwise, type any other word/symbol in quotation marks instead.
division_step: controls the accuracy of the calculations. The higher the number, the higher the accuracy. The number must be even, and it should not be so large that it slows the code down; for our purpose of two decimal places of accuracy, 200 is enough.
ticket_type: the list of types that the retrieved tickets must have. To add a new ticket type to the calculation, type it in single quotes and parentheses in the “values” clause; entries in the “values” clause must be separated by commas.
Start date
start_date_ as (
    select date_trunc('week', current_date)::date
         - (select iterations_to_analyze * iteration_duration from variables)
           as start_date
)
In this query we calculate the start date of the first iteration defined by the “iterations_to_analyze” parameter. To do so, we subtract the number of iterations multiplied by the iteration duration from the first day of the current week.
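For example, with the default parameters above (20 iterations of 7 days each), the start date falls 20 × 7 = 140 days before the first day of the current week.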
Example of the output:
Iteration days
iteration_days as (
select generate_series(a.min_d,
date_trunc('week', current_date)::date + a.iteration_duration * a.future_iterations_num,
'1 day'
)
as day_
from (
select
sd.start_date::date as min_d,
v.iteration_duration,
v.future_iterations_num
from start_date_ sd,
variables v
) a
)
In this query we generate all days from the start date to the last day of the last future iteration.
Example of the output:
Status Buckets
This query creates a small table of one column and two rows, one per status bucket.
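A minimal sketch of such a CTE, in the same style as the “types” query above (the bucket names 'scope' and 'done' are assumptions based on the following sections):

buckets as (
    select b.* from (values ('scope'), ('done')) b (bucket) -- one row per status bucket
)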
Example of the output:
Dates and buckets
Here we join the two previous queries together to get a two-column table where each date in our timeline has both status buckets.
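A minimal sketch of this join, assuming the bucket CTE above is named “buckets”:

dates_and_buckets as (
    select d.day_, b.bucket
    from iteration_days d
    cross join buckets b -- every day of the timeline appears once per bucket
)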
Example of the output:
Retrieving data for scope-bucket
This query retrieves all tickets of the required types defined before, regardless of their status (done or not done).
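The actual PerfQL source tables are not shown in this section; the sketch below assumes a hypothetical “ticket” table with “created”, “type”, and “story_points” columns:

scope_tickets as (
    select
        t.created::date as day_,        -- the day the ticket entered the scope
        count(*) as tickets,
        sum(t.story_points) as story_points
    from ticket t                       -- hypothetical table name
    where t.type in (select ticket_type from types)
    group by t.created::date
)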
Example of the output:
Retrieving data for done-bucket
This query retrieves all tickets of the required types defined before that are in a done status.
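A sketch in the same spirit, with the additional status filter (the “resolved” column and the 'done' status value are assumptions):

done_tickets as (
    select
        t.resolved::date as day_,       -- the day the ticket was done
        count(*) as tickets,
        sum(t.story_points) as story_points
    from ticket t                       -- same hypothetical table as above
    where t.type in (select ticket_type from types)
      and t.status = 'done'
    group by t.resolved::date
)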
Example of the output:
Intermediate result
In this query we join our timeline and status buckets together with the scope tickets and the done tickets. As a result we get the number of tickets and story points distributed by days. We also calculate the number of every iteration and the “n” parameter.
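A sketch of this step (CTE names follow the sketches above; treating “n” as the iteration’s offset from the current one, so that n = 0 is the current iteration, is an assumption based on the later sections):

intermediate as (
    select
        db.day_,
        db.bucket,
        -- running totals per bucket up to each day
        sum(coalesce(s.tickets, d.tickets, 0))
            over (partition by db.bucket order by db.day_) as tickets,
        sum(coalesce(s.story_points, d.story_points, 0))
            over (partition by db.bucket order by db.day_) as story_points,
        -- iteration index counted from the start date, and its offset "n" from the current iteration
        (db.day_::date - sd.start_date) / v.iteration_duration as iteration,
        (db.day_::date - sd.start_date) / v.iteration_duration - v.iterations_to_analyze as n
    from dates_and_buckets db
    cross join variables v
    cross join start_date_ sd
    left join scope_tickets s on db.bucket = 'scope' and s.day_ = db.day_::date
    left join done_tickets  d on db.bucket = 'done'  and d.day_ = db.day_::date
)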
Example of the output:
Last dates of iterations and switching between tickets and story points
In this query we find the last day of every iteration in order to summarize the numbers per iteration (“sprint_end_date” column). We also choose whether probabilities are calculated by tickets or by story points, according to the “calculate_by” parameter set in the “variables” query above (“scope” and “done” columns). The “where” condition in this query defines how many iterations before the current one are taken to calculate the parameters of the normal distribution.
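A sketch of the per-iteration totals and the tickets/story-points switch (column and CTE names follow the sketches above):

iteration_totals as (
    select
        i.iteration,
        i.n,
        max(i.day_) as sprint_end_date,
        -- calculate_by = 't' switches the metric to tickets; anything else means story points
        max(case when i.bucket = 'scope'
                 then case when v.calculate_by = 't' then i.tickets else i.story_points end
            end) as scope,
        max(case when i.bucket = 'done'
                 then case when v.calculate_by = 't' then i.tickets else i.story_points end
            end) as done
    from intermediate i
    cross join variables v
    where i.day_ = (select max(x.day_) from intermediate x where x.iteration = i.iteration)
    group by i.iteration, i.n
)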
Example of the output:
Added tickets and velocity tickets
In this query we summarize the scope and done columns for every iteration and calculate values for the “added_in” column, which shows how many tickets were added to the scope, and the “velocity” column, which shows how many tickets reached “done” status in every iteration. Both columns are calculated the same way: the current value of the scope/done column minus its previous value.
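A sketch of the difference calculation using the lag() window function:

added_and_velocity as (
    select
        sprint_end_date,
        iteration,
        n,
        scope,
        done,
        scope - lag(scope) over (order by iteration) as added_in, -- tickets added to the scope
        done  - lag(done)  over (order by iteration) as velocity  -- tickets that reached "done"
    from iteration_totals
)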
Example of the output:
Averages for added and done tickets
Here we calculate the averages of added tickets and of done tickets (the “velocity” column), attached to all iterations in the past and all iterations in the future.
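A sketch; restricting the averages to the analyzed iterations via n <= 0 is an assumption:

averages as (
    select
        a.*,
        avg(a.added_in) filter (where a.n <= 0) over () as me_in,  -- average added per iteration
        avg(a.velocity) filter (where a.n <= 0) over () as me_out  -- average done per iteration
    from added_and_velocity a
)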
Example of the output:
Dispersions
Now, using the averages found in the previous query, we calculate the dispersions for added tickets and for velocity, according to the standard statistical formulas:
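These are the usual dispersions (variances) of the per-iteration values; the population form with k in the denominator is an assumption:

$$D_{in} = \frac{1}{k}\sum_{i=1}^{k}\bigl(added\_in_i - ME_{in}\bigr)^2, \qquad D_{out} = \frac{1}{k}\sum_{i=1}^{k}\bigl(velocity_i - ME_{out}\bigr)^2$$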
where ME_in and ME_out are the averages found in the previous step, and k is the number of analyzed iterations.
Example of the output:
Normal distribution parameters for future iterations
For each iteration with a value of the “n” column greater than or equal to 0, we need to calculate the averages and standard deviations. We calculate them according to the following formulas:
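Treating the n future iterations as a sum of n independent iterations with the averages and dispersions found above (a reconstruction based on the surrounding description), the parameters grow with n as:

$$ME_n = n\,\bigl(ME_{out} - ME_{in}\bigr), \qquad SD_n = \sqrt{n\,\bigl(D_{out} + D_{in}\bigr)}$$

When scope_change is set to 'no', the ME_in and D_in terms are dropped, giving ME_n = n·ME_out and SD_n = √(n·D_out).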
Every iteration has its own average and standard deviation, depending on the value of its “n” column.
Example of the output:
Dividing the line segment between the lower and upper integration limits
In order to find our probabilities we need the normal distribution function, but we only have the normal distribution density function:
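$$f(x) = \frac{1}{SD\sqrt{2\pi}}\,e^{-\frac{(x - ME)^2}{2\,SD^2}}$$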
To find the probability we need to calculate a definite integral of the normal distribution density function:
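$$P = \int_{WL}^{ME + 4\,SD} f(x)\,dx, \qquad WL = \text{Work Left}$$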
We will calculate this integral on the line segment between the “Work Left” value (Work Left = Scope − Done for the iteration with n = 0) and the average of the given iteration plus four times its standard deviation. We chose to multiply the standard deviation by 4 because the range of four standard deviations around the average (from ME − 4·SD to ME + 4·SD) covers more than 99.99% of the values of a normal distribution (Fig. 5) and provides the required accuracy.
What we want to find is the area under the curve of the normal distribution density function on the line segment between “Work Left” and ME + 4·SD (Fig. 6).
To do so, we divide this line segment into an even number of pieces, which we will later use in the normal distribution values calculation. In the query below we use the generate_series() function to perform the division. The number of pieces can be changed with the “division_step” parameter in the “variables” query described above. We also calculate the “h” parameter, the length of one piece of the line segment.
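A sketch of the division (the CTE holding per-iteration ME, SD, and Work Left is assumed to be named “nd_parameters” with columns “me”, “sd”, and “work_left”):

division as (
    select
        p.n,
        p.me,
        p.sd,
        -- "h" is the length of one piece of the integration segment
        (p.me + 4 * p.sd - p.work_left) / v.division_step as h,
        -- "points" runs from Work Left up to ME + 4*SD in division_step equal pieces
        p.work_left + s * (p.me + 4 * p.sd - p.work_left) / v.division_step as points
    from nd_parameters p
    cross join variables v
    cross join generate_series(0, v.division_step) s
)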
Example of the output:
Normal Distribution values calculation
Here we calculate the normal distribution value for every value in the “points” column within every iteration. We also assign a rank (“index” in the query) to every normal distribution value, to be used later in Simpson’s rule formula.
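A sketch of the density calculation and ranking (names follow the sketch above):

nd_values as (
    select
        d.n,
        d.h,
        d.points,
        row_number() over (partition by d.n order by d.points) - 1 as index, -- 0-based rank
        -- normal distribution density value at each point
        exp(-((d.points - d.me) ^ 2) / (2 * d.sd ^ 2)) / (d.sd * sqrt(2 * pi())) as nd_value
    from division d
)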
Example of the output:
Simpson’s rule formula
To find the value of the definite integral we use an accurate and simple method of numerical integration, Simpson’s rule. The formula we will use:
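$$\int_a^b f(x)\,dx \approx \frac{h}{3}\Bigl[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \dots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\Bigr]$$

where h = (b − a)/n and n is the (even) number of pieces, i.e. the “division_step” parameter (here n denotes the number of pieces, not the iteration offset).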
We have already calculated all the parts of this formula. All that remains is to multiply every normal distribution value, except the first and the last one, by 4 or by 2 depending on the value of the “index” column (if the index is even we multiply by 2, otherwise by 4), then sum them up and multiply the sum by the value in the “h” column divided by 3. As a result we get the probability for every iteration.
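A sketch of the final aggregation (names follow the sketches above):

probabilities as (
    select
        nd.n,
        -- Simpson's rule: h/3 * (f0 + 4*f1 + 2*f2 + ... + 4*f(n-1) + fn)
        max(nd.h) / 3 * sum(case
            when nd.index = 0
              or nd.index = (select division_step from variables) then nd.nd_value
            when nd.index % 2 = 0 then 2 * nd.nd_value
            else 4 * nd.nd_value
        end) as probability
    from nd_values nd
    group by nd.n
)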
Example of the output:
Full code recap