

Creating a metric

See Custom Metric development for details

Creating a Custom Selection

See Custom Metric development to learn how to get to the Custom Selection mode.

The sections below explain the code you should put in the “PerfQL” field.

Introduction

We want to build a metric that predicts the probability of completing the scope within several future iterations. In other words, given two curves, one describing the growth of the scope and one describing the number of tickets in “done“ status, we want to know the probability that these curves cross in the future.

...

To do so we will use the scope and done data for every past iteration, together with the normal distribution, one of the most widely applicable models for real-world processes. Now let’s explore the SQL query that calculates the required probabilities.
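As a rough Python sketch of the underlying model (the function name and all numbers are hypothetical; the actual query derives every statistic from ticket data):

```python
import math

def completion_probability(work_left, avg_in, avg_out, disp_in, disp_out, n):
    """Probability that the done curve catches up with the scope curve
    within n future iterations, modelling the net progress per iteration
    as a normal random variable."""
    mean = n * (avg_out - avg_in)              # expected net progress after n iterations
    sd = math.sqrt(n * (disp_in + disp_out))   # spread grows with the horizon
    if sd == 0:
        return 1.0 if mean >= work_left else 0.0
    # P(net progress >= work_left), via the normal CDF expressed through erf
    z = (work_left - mean) / sd
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

The probability grows with the number of future iterations n, because the expected net progress n * (avg_out - avg_in) grows while the remaining work stays fixed.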

Fill the required parameters

First of all we need to define the parameters for the forecast. The first and second queries in the metric code are meant to be adjusted by the user as required.

...

  1. calculate_by: the metric can be calculated from two types of data: tickets and story points. To make the forecast by tickets, type (exactly!) 't' before “as calculate_by“ (third row in the code snippet above). Otherwise, type any other word/symbol in quotation marks instead.

  2. iteration_duration: the timeline in the metric is divided into weeks, so this parameter is set to 7 days and there’s no need to change it.

  3. iterations_to_analyze: the number of previous iterations, including the current one, whose data will be used in the analysis.

  4. future_iterations_num: the number of future iterations to be shown in the chart with their related probabilities.

  5. scope_change: this parameter defines whether tickets or story points will be added to the scope in the future. If not, type 'no' before “scope_change“; otherwise, type any other word/symbol in quotation marks before “scope_change“ instead.

  6. division_step: this parameter controls the accuracy of the calculation: the higher the number, the higher the accuracy. The number must be even, and it shouldn’t be so high that it slows the query down (to get two decimal places, 200 is enough).

  7. ticket_type: the list of ticket types that tickets must have to be retrieved by our query. To add a new ticket type to the calculation, type it in single quotes and parentheses in the “values“ clause. Entries in the “values“ clause must be separated by commas.

Start date

Code Block
languagesql
start_date_ as (
select date_trunc('week', current_date)::date 
- 
(select iterations_to_analyze*iteration_duration from variables) as start_date
)

In this query we calculate the start date of the first iteration defined by the “iterations_to_analyze“ parameter. To do so we subtract the number of iterations multiplied by the iteration duration from the first day of the current week. For example, with 10 iterations of 7 days each, 70 days are subtracted.

Example of the output:

...

Iteration days

Code Block
breakoutModewide
languagesql
iteration_days as (
select  generate_series(a.min_d, 
                        date_trunc('week', current_date)::date + a.iteration_duration * a.future_iterations_num,
                        '1 day'
                        ) 
    as  day_
  from  (
select 
        sd.start_date::date  as min_d,
      	v.iteration_duration,
      	v.future_iterations_num
  from  start_date_ sd,
      	variables v
    )   a
)

In this query we generate all days from the start date to the last day of the last future iteration.

Example of the output:

...

Status Buckets

Code Block
languagesql
buckets as (
select  a.* from (values ('scope'), ('done')) a (bucket)
)

This query creates a small table of one column and two rows.

Example of the output:

...

Dates and buckets

Here we join the two previous queries to get a table of two columns, where each date in our timeline has two status buckets.

Code Block
languagesql
dates_buckets as (
 select 
        itd.day_, b.bucket
   from iteration_days itd, buckets b
)

Example of the output:

...

Retrieving data for scope-bucket

This query retrieves all tickets of the required types defined earlier, regardless of their status (done or not done).

Code Block
languagesql
ticket_keys_created as (
 select 
        t.id,
        t.created,
        t.type,
        t.status,
        t.story_points,
        'scope' as category
   from 
        ticket t
  where
        lower(t.type) in (select ticket_type from types)
)

Example of the output:

...

Retrieving data for done-bucket

This query retrieves all tickets of the required types defined earlier that are in a done status.

Code Block
languagesql
ticket_keys_done as (
 select 
        t.id,
        t.type,
        t.status,
        t.done_date,
        t.story_points,
        'done' as category
   from 
        ticket t 
  where 
        lower(t.status) in ('closed')
    and lower(t.type) in (select ticket_type from types)
    and t.done_date >= (select sd.start_date::date from start_date_ sd)
)

Example of the output:

...

Intermediate result

In this query we join our timeline and status buckets with the scope tickets and the done tickets. As a result we get the counts of tickets and story points distributed by day. We also calculate the number of every iteration and the “N” parameter: the iteration number relative to the current one (negative for past iterations, zero for the current one, positive for future ones).

Code Block
breakoutModewide
languagesql
intermediate_result as (
select 

        s.day_::timestamp,
        count(tkc.id)         as scope_tickets,
        count(tkd.id)         as done_tickets,
        sum(tkc.story_points) as scope_story_points,
        sum(tkd.story_points) as done_story_points,

        (s.day_::date - (select sd.start_date::date from start_date_ sd))
        /
        (select sd.iteration_duration from variables sd)+1 as iteration_n, 
     	 
        ((s.day_::date - (select sd.start_date::date from start_date_ sd))
        /
        (select v.iteration_duration from variables v)+1)
        -
        ((current_date - (select sd.start_date::date from start_date_ sd))
        /
        (select v.iteration_duration from variables v)+1) as n
           
   from 
        dates_buckets s
   left join 
        ticket_keys_created tkc
     on s.day_ between tkc.created and now()
    and s.bucket = tkc.category
   left join
        ticket_keys_done tkd
     on s.day_ between tkd.done_date and now()
    and s.bucket = tkd.category
  group by s.day_
  order by s.day_
  )

Example of the output:

...
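The iteration-number expressions above are plain integer day arithmetic; a small Python sketch of the same computation (the dates in this sketch are hypothetical):

```python
from datetime import date

ITERATION_DURATION = 7  # days, matching the iteration_duration parameter

def iteration_n(day, start_date):
    """1-based iteration (week) number of `day`, counted from start_date."""
    return (day - start_date).days // ITERATION_DURATION + 1

def n_relative(day, start_date, today):
    """The "N" column: iteration number of `day` relative to the current
    iteration (negative = past, 0 = current, positive = future)."""
    return iteration_n(day, start_date) - iteration_n(today, start_date)
```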

Last dates of iterations and switching between tickets and story points

In this query we find the last day of every iteration, so that each iteration’s numbers can be summarized (the “sprint_end_date” column). We also choose whether to calculate probabilities by tickets or by story points, according to the “calculate_by“ parameter set (or left as is) in the “variables“ query above (the “scope“ and “done“ columns). The “where“ condition defines how many iterations before the current one are used to calculate the parameters of the normal distribution.

...

Code Block
languagesql
iteration_end_dates_ as (
select 
        day_, 
        max(day_) over (partition by iteration_n) as sprint_end_date,
        case 
            when (select v.calculate_by from variables v) = 't'
            then scope_tickets
            else scope_story_points
        end as scope,
    
        case 
            when (select v.calculate_by from variables v) = 't'
            then done_tickets
            else done_story_points
        end as done,
        n
   from intermediate_result
  where n >= - (select v.iterations_to_analyze from variables v)
)

Example of the output:

...

Added tickets and velocity tickets

In this query we summarize the scope and done values for every iteration (the “scope” and “done” columns) and calculate the “added_in” column, which shows how many tickets were added to the scope in each iteration, and the “velocity“ column, which shows how many tickets reached “done“ status in each iteration. Both columns are calculated in the same way: the current value of the scope/done column minus its previous value.

Code Block
languagesql
added_tickets_velocity_tickets as (    
select 
        day_,
        scope,
        done,
  		case 
            when n<=0 
            then done - (lag(done) over (order by n)) 
        end as velocity,
  		case 	
  			when 
                (select v.scope_change from variables v) = 'no'
  			then 0
  			else 
        		case 
                    when n<=0 
                    then scope - (lag(scope) over (order by n)) 
                 end 
  		 end as added_in,
        n
   from iteration_end_dates_
  where day_ = case n when 0 then current_date else sprint_end_date end
)

Example of the output:

...
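The lag(...) differences above turn cumulative totals into per-iteration deltas. In Python terms (with hypothetical cumulative counts), this is just a list of consecutive differences:

```python
def per_iteration_deltas(cumulative):
    """Equivalent of `x - lag(x) over (order by n)`: the difference between
    each iteration's cumulative total and the previous one.
    The first iteration has no predecessor, so its delta is None (SQL NULL)."""
    return [None] + [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]
```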

Averages for added and done tickets

Here we calculate the averages of the added tickets (“added_in“) and the done tickets (“velocity“) over the past iterations; the same average values are attached to every row of the timeline, past and future.

Code Block
languagesql
avg_in_avg_out as (    
select 
        day_,
        scope,
        done,
		added_in,
  	    velocity,
  	    avg(added_in) over () as AverageIn,
        avg(velocity) over () as AverageOut,
        n
  from  
        added_tickets_velocity_tickets
)

Example of the output:

...

Dispersions

Now, using the averages found in the previous query, we calculate the dispersions (sample variances) for the added tickets and for the velocity, according to the standard statistical formulas:

...

Code Block
breakoutModewide
languagesql
disp_in_disp_out as (
select 
        day_,
        scope,
        done,  
        added_in,
  		velocity,
        AverageIn,
        coalesce(
        (sum(case 
            when 
            n <= 0 
            then 
            (added_in - AverageIn)*(added_in - AverageIn)
        end
        ) over ()
                )/nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_in, --sigma(2)
        AverageOut,
        coalesce(
        (sum(case 
            when 
            n <= 0 
            then 
            (velocity - AverageOut)*(velocity - AverageOut)
        end 
        ) over ()
                )/nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_out,
        n
from avg_in_avg_out
)

Example of the output:

...
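What disp_in and disp_out compute is the sample variance of the past per-iteration deltas: the sum of squared deviations from the average, divided by the number of past iterations minus one. A minimal Python sketch of the same formula (input numbers are hypothetical):

```python
def sample_variance(values):
    """Sample variance with the (count - 1) denominator, mirroring the
    sum-of-squared-deviations / nullif(count - 1, 0) expression in the query."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)
```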

Normal distribution parameters for future iterations

For each iteration whose “n“ value is larger than or equal to 0 we need to calculate the averages and standard deviations. We calculate them according to the following formulas:

...

Code Block
languagesql
meanX_standdevX as (
select 
        day_,
        scope,
        done,
        case 
            when n = 0
            then scope - done
  		    when n > 0
  		    then lag(scope - done, n) over (order by n)
         end as work_left,
        n,
        (case 
             when n>= 0
             then n
          end) * (AverageOut - AverageIn) as me,    --average
        
        sqrt(
            (case 
                when n>= 0
                then n
            end) * (disp_in + disp_out)
            ) as sd                                 --standard deviation
  from 
        disp_in_disp_out
)

Example of the output:

...

Dividing the line segment between the lower and upper integration limits

In order to find our probabilities we need the normal cumulative distribution function, but we only have the normal distribution density function:

...

Code Block
languagesql
points as (
select 
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        generate_series((work_left)::numeric,
                        (me+4*sd)::numeric,
                        (me+4*sd - work_left)::numeric
                        /
                        (select v.division_step from variables v)
                       
                       ) as points,
        
        (me+4*sd - work_left)::numeric
        /
        (select v.division_step from variables v) as h
   from 
        meanX_standdevX 
  where n > 0
  )

Example of the output:

...

Normal Distribution values calculation

Here we calculate the normal distribution density value for every value in the “points” column within every iteration. We also assign a rank (“index” in the query below) to every value, to use it later in the Simpson’s rule formula.

Code Block
breakoutModewide
languagesql
normal_dist_values as (    
select
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        dense_rank() over (partition by n order by points) as index,
        case 
            when h > 0
            then
            1::numeric/(sd*sqrt(2*pi())) * round(cast(exp(-power((points - me), 2)/(2*power(sd, 2))) as numeric), 4)
  		    else 0
  		   end as normal_dist_value
  from  points
  )

Example of the output:

...

Simpson’s rule formula

To find the value of the definite integral we use Simpson’s rule, one of the most accurate of the simple numerical integration methods. The formula we will use:

...

Code Block
languagesql
simpson_rule as (
select  
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        index,
        case 
                when index = 1 or index = (select v.division_step from variables v)+1
                then normal_dist_value
                when mod(index, 2) = 0
                then normal_dist_value*4
                when mod(index, 2) = 1
                then normal_dist_value*2
           end  as simpson_values
from
normal_dist_values
  )
  
select 
      to_char(day_, 'yyyy')||' - '||date_part('week', day_) as week_number,
      round((max(h)::numeric/3)*sum(simpson_values)::numeric, 2) as probability
 from simpson_rule
group by day_
order by day_

Example of the output:

...

Full code recap

Expand
Code Block
languagesql
with

variables as (
 select 
        't'  as calculate_by,      -- basis of calculation: 't' for tickets, anything else for story points
        7    as iteration_duration,     --in days
        10   as iterations_to_analyze,
  	    20   as future_iterations_num,  --number of iterations to make forecast for
  	   'no'  as scope_change,           --'no' to keep the scope fixed, anything else to model scope growth
  	    200  as division_step           --must be an even number!
),

types as (
select  a.* from (values ('bug'), ('task')) a (ticket_type) -- add ticket types to make forecast
),

start_date_ as (
select date_trunc('week', current_date)::date 
- 
(select iterations_to_analyze*iteration_duration from variables) as start_date
),

iteration_days as (
select  generate_series(a.min_d, 
                        date_trunc('week', current_date)::date + a.iteration_duration*a.future_iterations_num,
                        '1 day'
                        ) 
    as  day_
  from  (
select 
        sd.start_date::date  as min_d,
      	v.iteration_duration,
      	v.future_iterations_num
  from  start_date_ sd,
      	variables v
    )   a
),

buckets as (
select  a.* from (values ('scope'), ('done')) a (bucket)
),

dates_buckets as (
 select 
        itd.day_, b.bucket
   from iteration_days itd, buckets b
),

ticket_keys_created as (
 select 
        t.id,
        t.created,
        t.type,
        t.status,
        t.story_points,
        'scope' as category
   from 
        ticket t
  where
        lower(t.type) in (select ticket_type from types)
),

ticket_keys_done as (
 select 
        t.id,
        t.type,
        t.status,
        t.done_date,
        t.story_points,
        'done' as category
   from 
        ticket t 
  where 
        lower(t.status) in ('closed')
    and lower(t.type) in (select ticket_type from types)
    and t.done_date >= (select sd.start_date::date from start_date_ sd)
),

intermediate_result as (
select 

        s.day_::timestamp,
        count(tkc.id)         as scope_tickets,
        count(tkd.id)         as done_tickets,
        sum(tkc.story_points) as scope_story_points,
        sum(tkd.story_points) as done_story_points,

        (s.day_::date - (select sd.start_date::date from start_date_ sd))/(select sd.iteration_duration from variables sd)+1 as iteration_n, --also a week number or an iteration number
     	 
        ((s.day_::date - (select sd.start_date::date from start_date_ sd))/(select v.iteration_duration from variables v)+1)
        -
        ((current_date - (select sd.start_date::date from start_date_ sd))/(select v.iteration_duration from variables v)+1) as n
           
   from 
        dates_buckets s
   left join 
        ticket_keys_created tkc
     on s.day_ between tkc.created and now()
    and s.bucket = tkc.category
   left join
        ticket_keys_done tkd
     on s.day_ between tkd.done_date and now()
    and s.bucket = tkd.category
  group by s.day_
  order by s.day_
  ),

iteration_end_dates_ as (
select 
        day_, 
        max(day_) over (partition by iteration_n) as sprint_end_date,
        case 
            when (select v.calculate_by from variables v) = 't'
            then scope_tickets
            else scope_story_points
        end as scope,
    
        case 
            when (select v.calculate_by from variables v) = 't'
            then done_tickets
            else done_story_points
        end as done,
        n
   from intermediate_result
  where n >= - (select v.iterations_to_analyze from variables v)
),

added_tickets_velocity_tickets as (    
select 
        day_,
        scope,
        done,
  		case 
            when n<=0 
            then done - (lag(done) over (order by n)) 
        end  as velocity,
  		case 	
  			when 
                 (select v.scope_change from variables v) = 'no'
  			then 0
  			else 
        		case 
                    when n<=0 
                    then scope - (lag(scope) over (order by n)) 
                 end 
  		end  as added_in,
        n
   from iteration_end_dates_
  where day_ = case n when 0 then current_date else sprint_end_date end
),

avg_in_avg_out as (    
select 
        day_,
        scope,
        done,
		added_in,
  	    velocity,
  	    avg(added_in) over () as AverageIn,
        avg(velocity) over () as AverageOut,
        n
  from  
        added_tickets_velocity_tickets
),

disp_in_disp_out as (
select 
        day_,
        scope,
        done,  
        added_in,
  		velocity,
        AverageIn,
        coalesce(
        (sum(case 
            when 
            n <= 0 
            then 
            (added_in - AverageIn)*(added_in - AverageIn)
        end
        ) over ()
                )/nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_in, --sigma(2)
        AverageOut,
        coalesce(
        (sum(case 
            when 
            n <= 0 
            then 
            (velocity - AverageOut)*(velocity - AverageOut)
        end 
        ) over ()
                )/nullif((select -n-1 from iteration_end_dates_ order by n limit 1), 0),
        0) as disp_out,
        n
from avg_in_avg_out
),

meanX_standdevX as (
select 
        day_,
        scope,
        done,
        case 
            when n = 0
            then scope - done
  		    when n > 0
  		    then lag(scope - done, n) over (order by n)
        end as work_left,
        n,
        (case 
                when n>= 0
                then n
        end)*(AverageOut - AverageIn) as me,
        
        sqrt(
            (case 
                when n>= 0
                then n
            end)*(disp_in + disp_out)
            ) as sd
  from 
        disp_in_disp_out
),

points as (
select 
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        generate_series((work_left)::numeric,
                        (me+4*sd)::numeric,
                        (me+4*sd - work_left)::numeric
                        /
                        (select v.division_step from variables v)
                       
                       ) as points,
        
        (me+4*sd - work_left)::numeric/(select v.division_step from variables v) as h
   from 
        meanX_standdevX 
  where n > 0
  ),
    
normal_dist_values as (    
select
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        dense_rank() over (partition by n order by points) as index,
        case 
            when h > 0
            then
            1::numeric/(sd*sqrt(2*pi())) * round(cast(exp(-power((points - me), 2)/(2*power(sd, 2))) as numeric), 4)
  		    else 0
  		end as normal_dist_value
  from  points
  ),
 
simpson_rule as (
select  
        day_,
        scope,
        done,
        n, 
        work_left,
        me,
        sd,
        points,
        h,
        index,
        case 
                when index = 1 or index = (select v.division_step from variables v)+1
                then normal_dist_value
                when mod(index, 2) = 0
                then normal_dist_value*4
                when mod(index, 2) = 1
                then normal_dist_value*2
          end  as simpson_values
from
normal_dist_values
  )

select 
      to_char(day_, 'yyyy')||' - '||date_part('week', day_) as week_number,
      round((max(h)::numeric/3)*sum(simpson_values)::numeric, 2) as probability
 from simpson_rule
group by day_
order by day_

Copyright © 2022 EPAM Systems, Inc.

All Rights Reserved. All information contained herein is, and remains the property of EPAM Systems, Inc. and/or its suppliers and is protected by international intellectual property law. Dissemination of this information or reproduction of this material is strictly forbidden, unless prior written permission is obtained from EPAM  Systems, Inc.