
Introduction

“The earliest date by which the work could conceivably be done makes an excellent goal but an awful schedule.”

Tom DeMarco, Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency

This metric predicts the probability of scope completion using the Normal (Gaussian) distribution and mathematical statistics.


One of the biggest problems for a project manager is understanding when the scope can be completed. This is especially hard in a real-world environment with constantly changing team performance, a floating scope, and various types of work.


This metric analyzes statistics of “added” work vs. “done” work per short iteration and predicts the probability that these trends will cross, using the Normal Cumulative Distribution.

In other words, if we have two curves describing the growth of the scope and the number of tickets in “done” status, we want to know the probability that these curves will cross in the future.

To achieve that, the metric analyzes the averages of scope changes and done work, and the standard deviations for them. The probability is calculated by the following formula:
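In LaTeX notation, as implemented by the steps described below, the probability for a future iteration N is the definite integral of the normal density over the remaining work:

\[ P(N) = \int_{WL}^{\,ME_N + 4 \cdot SD_N} \frac{1}{SD_N \sqrt{2\pi}} \; e^{-\frac{(x - ME_N)^2}{2 \cdot SD_N^2}} \, dx \]

where WL is the work left (Scope − Done) at the current iteration, and ME_N and SD_N are the mean and standard deviation for iteration N, derived from the per-iteration averages and dispersions of scope changes and done work (see below).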

The metric uses the following main parameters:

Jira type of work – bugs, stories, epics, etc.
NoEstimate vs Estimate – calculation based on the number of tickets or on story points
Scope change – take scope change statistics into account or ignore them
Iteration duration – length of an iteration to analyze, in days
Iterations to analyze – how many past iterations are taken for the analysis
Future iterations number – for how many future iterations the probability is predicted

Creating a custom metric

See Custom Metric development for details.

Creating a Custom Selection

See Custom Metric development to learn how to get to the Custom Selection mode.

Below is an explanation of the code you should put into the “PerfQL” field.

Fill the required parameters

First of all, we need to define the parameters used for the forecast. The first and second queries in the metric code are meant to be changed by the user as required.

variables as (
select 
        't'  as calculate_by,           -- basis of calculation: 't' = tickets, anything else = story points
        7    as iteration_duration,     -- in days
        20   as iterations_to_analyze,
        20   as future_iterations_num,  -- number of iterations to make the forecast for
        'no' as scope_change,           -- yes/no
        200  as division_step           -- must be an even number!
),

types as (
select  a.* from (values ('bug'), ('task')) a (ticket_type) -- add ticket types to make forecast
)

  1. calculate_by: the metric calculation can be based on two types of data - tickets or story points. To make the forecast by tickets, type exactly 't' before “as calculate_by” (third row in the code snippet above). Otherwise, type any other word/symbol in quotation marks instead.

  2. iteration_duration: the timeline in the metric is divided into weeks, so this parameter is set to 7 days and there is no need to change it.

  3. iterations_to_analyze: the number of previous iterations, including the current one, whose data is used in the analysis.

  4. future_iterations_num: the number of future iterations to be shown in the chart with their related probabilities.

  5. scope_change: this parameter defines whether we assume tickets or story points will be added to the scope in the future. If not, type 'no' before “as scope_change”; otherwise, type any other word/symbol in quotation marks instead.

  6. division_step: this parameter controls the accuracy of the calculations. The higher the number, the higher the accuracy; however, it must be even and should not be so high that it slows the query down (for our purpose of getting two decimal places, 200 is enough).

  7. ticket_type: this is the list of Jira types a ticket must have to be included in the calculation. To add a new ticket type, put it in single quotes and parentheses inside the “values” clause; entries of the “values” clause must be separated by commas, as in the example below.
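For example, to also include stories in the forecast (assuming the Jira type is literally named 'story'), the “types” query would look like this:

types as (
select  a.* from (values ('bug'), ('task'), ('story')) a (ticket_type)
)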

Start date

start_date_ as (
select date_trunc('week', current_date)::date 
- 
(select iterations_to_analyze*iteration_duration from variables) as start_date
)

In this query we calculate the start date of the first analyzed iteration, as defined by the “iterations_to_analyze” parameter. To do so, we subtract the number of iterations multiplied by the iteration duration from the first day of the current week. For example, with the default parameters (20 iterations of 7 days), the start date is the Monday of the current week minus 140 days.

Example of the output:

Iteration days

iteration_days as (
select  generate_series(a.min_d, 
                        date_trunc('week', current_date)::date + a.iteration_duration * a.future_iterations_num,
                        '1 day'
                        ) 
    as  day_
  from  (
select 
        sd.start_date::date  as min_d,
        v.iteration_duration,
        v.future_iterations_num
  from  start_date_ sd,
        variables v
    )   a
)

In this query we generate all days from the start date to the last day of the last future iteration.

Example of the output:

Status Buckets

buckets as (
select  a.* from (values ('scope'), ('done')) a (bucket)
)

This query creates a small table of one column and two rows.

Example of the output:

Dates and buckets

Here we join the two previous queries together to get a table of two columns, where each date in our timeline appears with both status buckets.

dates_buckets as (
 select 
        itd.day_, b.bucket
   from iteration_days itd, buckets b
)

Example of the output:

Retrieving data for scope-bucket

This query retrieves all tickets of the required types defined before, regardless of their status (done or not done).

ticket_keys_created as (
 select 
        t.id,
        t.created,
        t.type,
        t.status,
        t.story_points,
        'scope' as category
   from 
        ticket t
  where
        lower(t.type) in (select ticket_type from types)
)

Example of the output:

Retrieving data for done-bucket

This query retrieves all tickets of the required types defined before that are in a done status.

ticket_keys_done as (
 select 
        t.id,
        t.type,
        t.status,
        t.done_date,
        t.story_points,
        'done' as category
   from 
        ticket t 
  where 
        lower(t.status) in ('closed')
    and lower(t.type) in (select ticket_type from types)
    and t.done_date >= (select sd.start_date::date from start_date_ sd)
)

Example of the output:

Intermediate result

In this query we join our timeline and status buckets with the scope tickets and the done tickets. As a result we get the number of tickets and story points distributed by days. We also calculate the number of every iteration (“iteration_n”) and the “N” parameter - the iteration number relative to the current iteration (0 for the current one, negative for past iterations, positive for future ones).
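A small worked example with hypothetical dates: if start_date = 2022-01-03 and iteration_duration = 7, then for the day 2022-01-10 we get iteration_n = (2022-01-10 − 2022-01-03) / 7 + 1 = 7 / 7 + 1 = 2, and if current_date falls into iteration 21, then N = 2 − 21 = −19, i.e. that day lies 19 iterations in the past.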

intermediate_result as (
select 

        s.day_::timestamp,
        count(tkc.id)         as scope_tickets,
        count(tkd.id)         as done_tickets,
        sum(tkc.story_points) as scope_story_points,
        sum(tkd.story_points) as done_story_points,

        (s.day_::date - (select sd.start_date::date from start_date_ sd))
        /
        (select sd.iteration_duration from variables sd)+1 as iteration_n, 
     	 
        ((s.day_::date - (select sd.start_date::date from start_date_ sd))
        /
        (select v.iteration_duration from variables v)+1)
        -
        ((current_date - (select sd.start_date::date from start_date_ sd))
        /
        (select v.iteration_duration from variables v)+1) as "N"
           
   from 
        dates_buckets s
   left join 
        ticket_keys_created tkc
     on s.day_ between tkc.created and now()
    and s.bucket = tkc.category
   left join
        ticket_keys_done tkd
     on s.day_ between tkd.done_date and now()
    and s.bucket = tkd.category
  group by s.day_
  order by s.day_
  )

Example of the output:

Last dates of iterations and switching between tickets and story points

In this query we find the last day of every iteration (the “sprint_end_date” column), so that we can later summarize the numbers per iteration. We also choose whether to calculate probabilities by tickets or by story points, according to the “calculate_by” parameter set in the “variables” query above (the “scope” and “done” columns). The “where” condition defines how many iterations before the current one are taken to calculate the parameters of the normal distribution.

iteration_end_dates_ as (
select 
        day_, 
        max(day_) over (partition by iteration_n) as sprint_end_date,
        case 
            when (select v.calculate_by from variables v) = 't'
            then scope_tickets
            else scope_story_points
        end as scope,
    
        case 
            when (select v.calculate_by from variables v) = 't'
            then done_tickets
            else done_story_points
        end as done,
        n
   from intermediate_result
  where n >= - (select v.iterations_to_analyze from variables v)
)

Example of the output:

Added tickets and velocity tickets

In this query we take the scope and done totals at the end of every iteration (the “scope” and “done” columns) and calculate the “added_in” column, which shows how much was added to the scope, and the “velocity” column, which shows how many tickets reached the “done” status in every iteration. Both columns are calculated the same way: the current value of the scope/done column minus its previous value.

added_tickets_velocity_tickets as (    
select 
        day_,
        scope,
        done,
        case 
            when n <= 0 
            then done - (lag(done) over (order by n)) 
        end as velocity,
        case 
            when (select v.scope_change from variables v) = 'no'
            then 0
            else 
                case 
                    when n <= 0 
                    then scope - (lag(scope) over (order by n)) 
                end 
        end as added_in,
        n
   from iteration_end_dates_
  where day_ = case n when 0 then current_date else sprint_end_date end
)

Example of the output:

Averages for added and done tickets

Here we calculate the averages of the “added_in” and “velocity” columns and attach them to every row, for all past and future iterations.

avg_in_avg_out as (    
select 
        day_,
        scope,
        done,
        added_in,
        velocity,
        avg(added_in) over () as AverageIn,
        avg(velocity) over () as AverageOut,
        n
  from  
        added_tickets_velocity_tickets
)

Example of the output:

Dispersions

Now, using the averages found in the previous query, we calculate the dispersions (variances) for added tickets and for velocity, according to the standard statistical formulas:
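In LaTeX notation, matching the query below (the sums run over the k analyzed past iterations only, and the divisor k − 1 is the usual sample-variance correction):

\[ \sigma_{in}^{2} = \frac{1}{k-1} \sum_{i=1}^{k} \left( added\_in_i - ME_{in} \right)^2 \qquad \sigma_{out}^{2} = \frac{1}{k-1} \sum_{i=1}^{k} \left( velocity_i - ME_{out} \right)^2 \]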

where MEin and MEout are the averages found in the previous step.

disp_in_disp_out as (
select 
        day_,
        scope,
        done,  
        added_in,
        velocity,
        AverageIn,
        coalesce(
            (sum(case 
                     when n <= 0 
                     then (added_in - AverageIn)*(added_in - AverageIn)
                 end) over ())
            /
            nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_in,  --sigma^2 for added tickets
        AverageOut,
        coalesce(
            (sum(case 
                     when n <= 0 
                     then (velocity - AverageOut)*(velocity - AverageOut)
                 end) over ())
            /
            nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_out,  --sigma^2 for velocity
        n
from avg_in_avg_out
)

Example of the output:

Normal distribution parameters for future iterations

For each iteration having a value of the “n” column larger than or equal to 0, we need to calculate the averages and standard deviations. We calculate them according to the following formulas:
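In LaTeX notation, as implemented in the query below, for a future iteration with offset N:

\[ ME_N = N \cdot (ME_{out} - ME_{in}) \qquad SD_N = \sqrt{N \cdot (\sigma_{in}^{2} + \sigma_{out}^{2})} \]

The reasoning: the net burn-down of one iteration has mean ME_out − ME_in and variance σ²_in + σ²_out, so over N independent iterations both the mean and the variance scale by N.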

Every iteration has its own average and standard deviation, depending on the value of the “n” column.

meanX_standdevX as (
select 
        day_,
        scope,
        done,
        case 
            when n = 0
            then scope - done
            when n > 0
            then lag(scope - done, n) over (order by n)
        end as work_left,
        n,
        (case 
             when n>= 0
             then n
          end) * (AverageOut - AverageIn) as me,    --average
        
        sqrt(
            (case 
                when n>= 0
                then n
            end) * (disp_in + disp_out)
            ) as sd                                 --standard deviation
  from 
        disp_in_disp_out
)

Example of the output:

Dividing line segment between lower and higher integration limits

In order to find our probabilities we need the normal cumulative distribution function, but we only have the normal distribution density function:
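In LaTeX notation, the density of a normal distribution with mean μ and standard deviation σ (this is exactly what the “normal_dist_value” expression computes below, with μ = ME and σ = SD):

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \; e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]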

To find the probability we need to calculate a definite integral of the normal distribution density function:
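With the integration limits explained below, this is:

\[ P = \int_{WL}^{\,ME + 4 \cdot SD} f(x) \, dx \]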

We will calculate this integral on the line segment between the “Work Left” parameter (Work Left = Scope − Done for the iteration with n = 0) and the sum of the current iteration's average and four times its standard deviation. We multiply the standard deviation by 4 because the range within four standard deviations of the average (from ME − 4*SD to ME + 4*SD) covers more than 99.99% of the values of a normal distribution (Fig. 5), which provides the required accuracy.

What we want to find is the area under the curve of the normal distribution density function on the line segment between “Work Left” and ME+4*SD (Fig. 6).

To do so we have to divide this line segment into an even number of pieces, which we will use later in the normal distribution value calculations. In the query below we use the generate_series() function to perform this division. The number of pieces can be changed with the “division_step” parameter in the “variables” query described above. We also calculate the “h” parameter - the length of one piece of the line segment.

points as (
select 
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        generate_series((work_left)::numeric,
                        (me+4*sd)::numeric,
                        (me+4*sd - work_left)::numeric
                        /
                        (select v.division_step from variables v)
                       
                       ) as points,
        
        (me+4*sd - work_left)::numeric
        /
        (select v.division_step from variables v) as h
   from 
        meanX_standdevX 
  where n > 0
  )

Example of the output:

Normal Distribution values calculation

Here we calculate the normal distribution value for every value in the “points” column within every iteration. We also assign a rank (“index” in the query below) to every normal distribution value, to be used later in the Simpson's rule formula.

normal_dist_values as (    
select
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        dense_rank() over (partition by n order by points) as index,
        case 
            when h > 0
            then 1::numeric/(sd*sqrt(2*pi())) * round(cast(exp(-power((points - me), 2)/(2*power(sd, 2))) as numeric), 4)
            else 0
        end as normal_dist_value
  from  points
  )

Example of the output:

Simpson’s rule formula

To find the value of the definite integral we use one of the most accurate simple numerical integration methods - Simpson's rule. The formula we will use:
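In LaTeX notation, the composite Simpson's rule over an even number of pieces m (our “division_step”) with step length h is:

\[ \int_{a}^{b} f(x)\,dx \approx \frac{h}{3} \left[ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \dots + 2f(x_{m-2}) + 4f(x_{m-1}) + f(x_m) \right] \]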

We have already calculated all the parts of this formula. What remains is to multiply each normal distribution value (except the first and the last ones) by 4 or by 2 depending on the value of the “index” column: since dense_rank() starts at 1, interior points with an even index correspond to the odd-numbered points of Simpson's pattern and are multiplied by 4, while those with an odd index are multiplied by 2. Then we sum the weighted values and multiply the sum by the value in the “h” column divided by 3. As a result we get our probability for every iteration.

simpson_rule as (
select  
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        index,
        case 
                when index = 1 or index = (select v.division_step from variables v)+1
                then normal_dist_value
                when mod(index, 2) = 0
                then normal_dist_value*4
                when mod(index, 2) = 1
                then normal_dist_value*2
           end  as simpson_values
from
normal_dist_values
  )

select 
      to_char(day_, 'yyyy')||' - '||date_part('week', day_) as week_number,
      round((max(h)::numeric/3)*sum(simpson_values)::numeric*100, 2) as probability
 from simpson_rule
group by day_
order by day_

Example of the output:

Full code recap

with

variables as (
 select 
        't'  as calculate_by,           -- basis of calculation: 't' = tickets, anything else = story points
        7    as iteration_duration,     -- in days
        20   as iterations_to_analyze,
        20   as future_iterations_num,  -- number of iterations to make the forecast for
        'no' as scope_change,           -- yes/no
        200  as division_step           -- must be an even number!
),

types as (
select  a.* from (values ('bug'), ('task')) a (ticket_type) -- add ticket types to make forecast
),

start_date_ as (
select date_trunc('week', current_date)::date 
- 
(select iterations_to_analyze*iteration_duration from variables) as start_date
),

iteration_days as (
select  generate_series(a.min_d, 
                        date_trunc('week', current_date)::date + a.iteration_duration*a.future_iterations_num,
                        '1 day'
                        ) 
    as  day_
  from  (
select 
        sd.start_date::date  as min_d,
        v.iteration_duration,
        v.future_iterations_num
  from  start_date_ sd,
        variables v
    )   a
),

buckets as (
select  a.* from (values ('scope'), ('done')) a (bucket)
),

dates_buckets as (
 select 
        itd.day_, b.bucket
   from iteration_days itd, buckets b
),

ticket_keys_created as (
 select 
        t.id,
        t.created,
        t.type,
        t.status,
        t.story_points,
        'scope' as category
   from 
        ticket t
  where
        lower(t.type) in (select ticket_type from types)
),

ticket_keys_done as (
 select 
        t.id,
        t.type,
        t.status,
        t.done_date,
        t.story_points,
        'done' as category
   from 
        ticket t 
  where 
        lower(t.status) in ('closed')
    and lower(t.type) in (select ticket_type from types)
    and t.done_date >= (select sd.start_date::date from start_date_ sd)
),

intermediate_result as (
select 

        s.day_::timestamp,
        count(tkc.id)         as scope_tickets,
        count(tkd.id)         as done_tickets,
        sum(tkc.story_points) as scope_story_points,
        sum(tkd.story_points) as done_story_points,

        (s.day_::date - (select sd.start_date::date from start_date_ sd))/(select sd.iteration_duration from variables sd)+1 as iteration_n, --also a week number or an iteration number
     	 
        ((s.day_::date - (select sd.start_date::date from start_date_ sd))/(select v.iteration_duration from variables v)+1)
        -
        ((current_date - (select sd.start_date::date from start_date_ sd))/(select v.iteration_duration from variables v)+1) as "N"
           
   from 
        dates_buckets s
   left join 
        ticket_keys_created tkc
     on s.day_ between tkc.created and now()
    and s.bucket = tkc.category
   left join
        ticket_keys_done tkd
     on s.day_ between tkd.done_date and now()
    and s.bucket = tkd.category
  group by s.day_
  order by s.day_
  ),

iteration_end_dates_ as (
select 
        day_, 
        max(day_) over (partition by iteration_n) as sprint_end_date,
        case 
            when (select v.calculate_by from variables v) = 't'
            then scope_tickets
            else scope_story_points
        end as scope,
    
        case 
            when (select v.calculate_by from variables v) = 't'
            then done_tickets
            else done_story_points
        end as done,
        n
   from intermediate_result
  where n >= - (select v.iterations_to_analyze from variables v)
),

added_tickets_velocity_tickets as (    
select 
        day_,
        scope,
        done,
        case 
            when n <= 0 
            then done - (lag(done) over (order by n)) 
        end as velocity,
        case 
            when (select v.scope_change from variables v) = 'no'
            then 0
            else 
                case 
                    when n <= 0 
                    then scope - (lag(scope) over (order by n)) 
                end 
        end as added_in,
        n
   from iteration_end_dates_
  where day_ = case n when 0 then current_date else sprint_end_date end
),

avg_in_avg_out as (    
select 
        day_,
        scope,
        done,
        added_in,
        velocity,
        avg(added_in) over () as AverageIn,
        avg(velocity) over () as AverageOut,
        n
  from  
        added_tickets_velocity_tickets
),

disp_in_disp_out as (
select 
        day_,
        scope,
        done,  
        added_in,
        velocity,
        AverageIn,
        coalesce(
            (sum(case 
                     when n <= 0 
                     then (added_in - AverageIn)*(added_in - AverageIn)
                 end) over ())
            /
            nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_in,  --sigma^2 for added tickets
        AverageOut,
        coalesce(
            (sum(case 
                     when n <= 0 
                     then (velocity - AverageOut)*(velocity - AverageOut)
                 end) over ())
            /
            nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_out,  --sigma^2 for velocity
        n
from avg_in_avg_out
),

meanX_standdevX as (
select 
        day_,
        scope,
        done,
        case 
            when n = 0
            then scope - done
            when n > 0
            then lag(scope - done, n) over (order by n)
        end as work_left,
        n,
        (case 
                when n>= 0
                then n
        end)*(AverageOut - AverageIn) as me,
        
        sqrt(
            (case 
                when n>= 0
                then n
            end)*(disp_in + disp_out)
            ) as sd
  from 
        disp_in_disp_out
),

points as (
select 
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        generate_series((work_left)::numeric,
                        (me+4*sd)::numeric,
                        (me+4*sd - work_left)::numeric
                        /
                        (select v.division_step from variables v)
                       
                       ) as points,
        
        (me+4*sd - work_left)::numeric/(select v.division_step from variables v) as h
   from 
        meanX_standdevX 
  where n > 0
  ),
    
normal_dist_values as (    
select
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        dense_rank() over (partition by n order by points) as index,
        case 
            when h > 0
            then 1::numeric/(sd*sqrt(2*pi())) * round(cast(exp(-power((points - me), 2)/(2*power(sd, 2))) as numeric), 4)
            else 0
        end as normal_dist_value
  from  points
  ),
 
simpson_rule as (
select  
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        index,
        case 
                when index = 1 or index = (select v.division_step from variables v)+1
                then normal_dist_value
                when mod(index, 2) = 0
                then normal_dist_value*4
                when mod(index, 2) = 1
                then normal_dist_value*2
           end  as simpson_values
from
normal_dist_values
  )

select 
      to_char(day_, 'yyyy')||' - '||date_part('week', day_) as week_number,
      round((max(h)::numeric/3)*sum(simpson_values)::numeric*100, 2) as probability
 from simpson_rule
group by day_
order by day_
