...
Creating a metric
See Custom Metric development for details
Creating a Custom Selection
See Custom Metric development for how to switch to Custom Selection mode.
The sections below explain the code you should put in the “PerfQL” field.
Introduction
We want to build a metric that predicts the probability of scope completion in several future iterations. In other words, given two curves, one describing the growth of the scope and one the number of tickets in “done” status, we want to know the probability that these curves cross in the future.
...
To do so, we use the scope and done data for every past iteration together with the normal distribution, one of the most widely applicable distributions for real-world processes. Now let’s explore the SQL query that calculates the required probabilities.
Fill the required parameters
First of all, we need to define the parameters of the forecast. The first and the second queries in the metric code (referenced later as “variables” and “types”) are meant to be adjusted by the user as required.
...
calculate_by: the metric can be calculated from two types of data, tickets and story points. To make the forecast by tickets, type (exactly!) 'tickets' before “as calculate_by” (third row in the code snippet above). If not, type any other word/symbol in quotation marks instead.
iteration_duration: the timeline in the metric is divided into weeks, so this parameter is set to 7 days and there is no need to change it.
iterations_to_analyze: the number of previous iterations, including the current one, whose data is used in the analysis.
future_iterations_num: the number of future iterations to be shown in the chart with their probabilities.
scope_change: defines whether tickets or story points are added to the scope in the future. If not, type 'no' before “scope_change”; otherwise type any other word/symbol in quotation marks instead.
division_step: controls the accuracy of the calculations. The higher the number, the higher the accuracy, but the number must be even and should not be so high that it slows the query down (to get two decimal places in the result, 200 is enough).
ticket_type: the list of types that the retrieved tickets must have. To add a new ticket type to the calculation, type it in single quotes and parentheses in the “values” clause, separating the entries of the “values” clause with commas. A sketch of these two queries is shown below.
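A minimal sketch of what these two queries might look like, assuming the names that the rest of the code references (“variables” and “types”); the exact values and row order in your metric may differ:

```sql
variables as (
    select 'tickets' as calculate_by,        -- exactly 'tickets', or any other word for story points
           7          as iteration_duration, -- days per iteration; no need to change
           6          as iterations_to_analyze,
           5          as future_iterations_num,
           'no'       as scope_change,       -- 'no' keeps the future scope unchanged
           200        as division_step       -- must be even
),
types as (
    -- hypothetical example types; add yours in single quotes, separated by commas;
    -- values must be lowercase, since they are compared with lower(t.type)
    select a.ticket_type
    from (values ('story'), ('bug')) a (ticket_type)
)
```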
Start date
```sql
start_date_ as (
    select date_trunc('week', current_date)::date
           - (select iterations_to_analyze * iteration_duration from variables) as start_date
)
```
In this query we calculate the start date of the first iteration defined by the “iterations_to_analyze” parameter. To do so, we subtract the number of iterations multiplied by the iteration duration from the first day of the current week; for example, with 6 iterations of 7 days each we go back 42 days.
Example of the output:
...
Iteration days
```sql
iteration_days as (
    select generate_series(
               a.min_d,
               date_trunc('week', current_date)::date
                   + a.iteration_duration * a.future_iterations_num,
               '1 day'
           ) as day_
    from (
        select sd.start_date::date as min_d,
               v.iteration_duration,
               v.future_iterations_num
        from start_date_ sd, variables v
    ) a
)
```
In this query we generate all days from the start date to the last day of the last future iteration.
Example of the output:
...
Status Buckets
```sql
buckets as (
    select a.*
    from (values ('scope'), ('done')) a (bucket)
)
```
This query creates a small table with one column and two rows: “scope” and “done”.
Example of the output:
...
Dates and buckets
Here we cross-join the two previous queries to get a two-column table in which every date in our timeline appears with both status buckets.
```sql
dates_buckets as (
    select itd.day_, b.bucket
    from iteration_days itd, buckets b
)
```
Example of the output:
...
Retrieving data for scope-bucket
This query retrieves all tickets of the required types defined above, regardless of status (done or not done).
```sql
ticket_keys_created as (
    select t.id, t.created, t.type, t.status, t.story_points,
           'scope' as category
    from ticket t
    where lower(t.type) in (select ticket_type from types)
)
```
Example of the output:
...
Retrieving data for done-bucket
This query retrieves all tickets of the required types defined above that are in the done (closed) status and were completed on or after the start date.
```sql
ticket_keys_done as (
    select t.id, t.type, t.status, t.done_date, t.story_points,
           'done' as category
    from ticket t
    where lower(t.status) in ('closed')
      and lower(t.type) in (select ticket_type from types)
      and t.done_date >= (select sd.start_date::date from start_date_ sd)
)
```
Example of the output:
...
Intermediate result
In this query we join our timeline and status buckets with the scope tickets and the done tickets. As a result we get the number of tickets and story points distributed by day. We also calculate the number of every iteration and the “N” parameter: the iteration number relative to the current iteration (negative for past iterations, 0 for the current one, positive for future ones).
```sql
intermediate_result as (
    select s.day_::timestamp,
           count(tkc.id) as scope_tickets,
           count(tkd.id) as done_tickets,
           sum(tkc.story_points) as scope_story_points,
           sum(tkd.story_points) as done_story_points,
           (s.day_::date - (select sd.start_date::date from start_date_ sd))
               / (select sd.iteration_duration from variables sd) + 1 as iteration_n,
           -- the "N" parameter: iteration number relative to the current iteration
           -- (aliased without quotes so that the later unquoted references to n resolve)
           ((s.day_::date - (select sd.start_date::date from start_date_ sd))
               / (select v.iteration_duration from variables v) + 1)
           - ((current_date - (select sd.start_date::date from start_date_ sd))
               / (select v.iteration_duration from variables v) + 1) as n
    from dates_buckets s
    left join ticket_keys_created tkc
           on s.day_ between tkc.created and now()
          and s.bucket = tkc.category
    left join ticket_keys_done tkd
           on s.day_ between tkd.done_date and now()
          and s.bucket = tkd.category
    group by s.day_
    order by s.day_
)
```
Example of the output:
...
Last dates of iterations and switching between tickets and story points
In this query we find the last day of every iteration in order to summarize each iteration’s numbers (the “sprint_end_date” column). We also choose whether the probabilities are calculated by tickets or by story points, according to the “calculate_by” parameter set in the “variables” query above (the “scope” and “done” columns). The “where” condition defines how many iterations before the current one are taken to calculate the parameters of the normal distribution.
...
```sql
iteration_end_dates_ as (
    select day_,
           max(day_) over (partition by iteration_n) as sprint_end_date,
           case when (select v.calculate_by from variables v) = 'tickets'
                then scope_tickets
                else scope_story_points
           end as scope,
           case when (select v.calculate_by from variables v) = 'tickets'
                then done_tickets
                else done_story_points
           end as done,
           n
    from intermediate_result
    where n >= - (select v.iterations_to_analyze from variables v)
)
```
Example of the output:
...
Added tickets and velocity tickets
In this query we summarize the scope and done values for every iteration (the “scope” and “done” columns) and calculate the “added_in” column, which shows the amount of tickets added to the scope, and the “velocity” column, which shows how many tickets reached the “done” status in every iteration. Both columns are calculated in the same way: the current value of the scope/done column minus its previous value, as the formulas below show.
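Written out (a reconstruction from the query below, with i indexing iterations):

$$\mathrm{velocity}_i = \mathrm{done}_i - \mathrm{done}_{i-1}, \qquad \mathrm{added\_in}_i = \mathrm{scope}_i - \mathrm{scope}_{i-1}$$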
```sql
added_tickets_velocity_tickets as (
    select day_,
           scope,
           done,
           case when n <= 0
                then done - (lag(done) over (order by n))
           end as velocity,
           case when (select v.scope_change from variables v) = 'no'
                then 0
                else case when n <= 0
                          then scope - (lag(scope) over (order by n))
                     end
           end as added_in,
           n
    from iteration_end_dates_
    where day_ = case n when 0 then current_date else sprint_end_date end
)
```
Example of the output:
...
Averages for added and done tickets
Here we calculate the averages of the “added_in” and “velocity” columns over all past iterations and attach them to every row, past and future (the window functions ignore the nulls that future iterations have in these columns); see the formulas below.
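Written as formulas (a reconstruction; m is the number of past iterations that have a value in the corresponding column):

$$\overline{in} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{added\_in}_i, \qquad \overline{out} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{velocity}_i$$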
```sql
avg_in_avg_out as (
    select day_,
           scope,
           done,
           added_in,
           velocity,
           avg(added_in) over () as AverageIn,
           avg(velocity) over () as AverageOut,
           n
    from added_tickets_velocity_tickets
)
```
Example of the output:
...
Dispersions
Now, using the averages found in the previous query, we calculate the dispersions (variances) of the added tickets and of the velocity according to the usual statistical rules. We use these formulas:
...
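Judging by the query below, these are the unbiased sample variances (m past values, divisor m − 1):

$$s_{in}^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(\mathrm{added\_in}_i - \overline{in}\right)^2, \qquad s_{out}^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(\mathrm{velocity}_i - \overline{out}\right)^2$$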
```sql
disp_in_disp_out as (
    select day_,
           scope,
           done,
           added_in,
           velocity,
           AverageIn,
           coalesce(
               (sum(case when n <= 0
                         then (added_in - AverageIn) * (added_in - AverageIn)
                    end) over ())
               / nullif((select -n - 1 from iteration_end_dates_ order by n limit 1), 0),
               0) as disp_in, -- sigma(2)
           AverageOut,
           coalesce(
               (sum(case when n <= 0
                         then (velocity - AverageOut) * (velocity - AverageOut)
                    end) over ())
               / nullif((select -n - 1 from iteration_end_dates_ order by n limit 1), 0),
               0) as disp_out,
           n
    from avg_in_avg_out
)
```
Example of the output:
...
Normal distribution parameters for future iterations
For each iteration whose “n” value is larger than or equal to 0 we need to calculate the averages and standard deviations. We calculate them according to the following formulas:
...
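Reconstructed from the query below, where N is the number of the future iteration:

$$\mu_N = N\,\left(\overline{out} - \overline{in}\right), \qquad \sigma_N = \sqrt{N\,\left(s_{in}^2 + s_{out}^2\right)}$$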
```sql
meanX_standdevX as (
    select day_,
           scope,
           done,
           case when n = 0 then scope - done
                when n > 0 then lag(scope - done, n) over (order by n)
           end as work_left,
           n,
           (case when n >= 0 then n end) * (AverageOut - AverageIn) as me,   -- average
           sqrt((case when n >= 0 then n end) * (disp_in + disp_out)) as sd  -- standard deviation
    from disp_in_disp_out
)
```
Example of the output:
...
Dividing line segment between lower and higher integration limits
In order to find our probabilities we need the normal cumulative distribution function, but we only have the normal density function, so we approximate the probability by numerical integration:
...
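Concretely (a reconstruction from the queries below), the probability for a future iteration is the integral of the density from the remaining work (“work_left”) to μ + 4σ (the query’s “me” and “sd”), beyond which the density is negligible:

$$P_N \approx \int_{\mathrm{work\_left}}^{\mu_N + 4\sigma_N} \frac{1}{\sigma_N\sqrt{2\pi}}\, e^{-\frac{(x-\mu_N)^2}{2\sigma_N^2}}\,dx$$

The query below divides this integration segment into “division_step” equal parts of width h, giving the points at which the density is evaluated.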
```sql
points as (
    select day_,
           scope,
           done,
           n,
           work_left,
           me,
           sd,
           generate_series(
               (work_left)::numeric,
               (me + 4*sd)::numeric,
               (me + 4*sd - work_left)::numeric / (select v.division_step from variables v)
           ) as points,
           (me + 4*sd - work_left)::numeric
               / (select v.division_step from variables v) as h
    from meanX_standdevX
    where n > 0
)
```
Example of the output:
...
Normal Distribution values calculation
Here we calculate the normal distribution density value for every value in the “points” column within every iteration. We also assign a rank (“index” in the query below) to every value to use it later in the Simpson’s rule formula.
```sql
normal_dist_values as (
    select day_,
           scope,
           done,
           n,
           work_left,
           me,
           sd,
           points,
           h,
           dense_rank() over (partition by n order by points) as index,
           case when h > 0
                then 1::numeric / (sd * sqrt(2*pi()))
                     * round(cast(exp(-power((points - me), 2) / (2*power(sd, 2))) as numeric), 4)
                else 0
           end as normal_dist_value
    from points
)
```
Example of the output:
...
Simpson’s rule formula
To find the value of the definite integral we use Simpson’s rule, one of the most accurate of the basic numerical integration methods. The formula we will use:
...
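Reconstructed for an even number of subintervals m = division_step with step h and 1-based point indices (the “index” column below):

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\Big[f(x_1) + 4f(x_2) + 2f(x_3) + 4f(x_4) + \cdots + 4f(x_m) + f(x_{m+1})\Big]$$

Note that with 1-based indices the two endpoints take weight 1, interior points with an even index take weight 4, and interior points with an odd index take weight 2; the code below applies these weights.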
```sql
simpson_rule as (
    select day_,
           scope,
           done,
           n,
           work_left,
           me,
           sd,
           points,
           h,
           index,
           case when index = 1
                  or index = (select v.division_step from variables v) + 1
                then normal_dist_value                    -- endpoints: weight 1
                when mod(index, 2) = 0
                then normal_dist_value * 4                -- even interior indices: weight 4
                when mod(index, 2) = 1
                then normal_dist_value * 2                -- odd interior indices: weight 2
           end as simpson_values
    from normal_dist_values
)
select to_char(day_, 'yyyy') || ' - ' || to_char(day_, 'ww') as week_number,
       round((max(h)::numeric / 3) * sum(simpson_values)::numeric, 2) as probability
from simpson_rule
group by day_
order by day_
```
Example of the output:
...
Full code recap
The full metric code is all of the queries above combined into a single “with” statement, followed by the final select.
Copyright © 2022 EPAM Systems, Inc.
All Rights Reserved. All information contained herein is, and remains the property of EPAM Systems, Inc. and/or its suppliers and is protected by international intellectual property law. Dissemination of this information or reproduction of this material is strictly forbidden, unless prior written permission is obtained from EPAM Systems, Inc.