A specific elaboration of Test Phase-Space: A Rudimentary Mathematical Model of the Testing Phenomenon in General


UNDER CONSTRUCTION

Note 1: the equations presented herein are rudimentary; further input, corrections, and suggestions are welcome.

Note 2: although this article was written from the perspective of software testing, we believe the model can be expanded to encompass testing phenomena in general.

 

Summary: this article attempts to describe testing using a Confidence Scale, with the various stages interrelated through mathematical equations. As a prerequisite, we build on the following assumptions:

 

Assumption #1.

 

the confidence/level/quality scale runs from 1 (low) to 10 (perfect), effectively divided into three ranges:

Low confidence (1-3),

Medium confidence (4-6),

High confidence (7-9),

with 5 = median confidence/level/quality, and

10 = "perfect confidence", which is generally accepted as highly unlikely to be attainable.

(To those familiar with the Kepner-Tregoe method of problem / decision analysis, this scale may sound familiar.)

 

Assumption #2.

 

the general relationship from "idea" to "end-user acceptance" is:

 

idea > coding.willpower > product > test.iterations > acceptance

 

we would like to somehow provide a mathematical relationship between these stages.

 

 

Equation-A. (idea > coding.willpower > product)

 

w = coding.Willpower-level

i = idea robustness-level

P = productConfidence

 

P = (w*i) / ((w+i)/2)

 

this would...

a) put median product confidence at 5, perfect product confidence at 10, and so on: {1 <= low <= 3}, {4 <= med <= 6}, {7 <= high <= 9};

b) forecast that a low idea robustness (level 2) combined with a high codingWillpower (level 8) will still result in a low productConfidence (≈3); the resolution being to increase idea robustness in order to raise the confidence level of the output product.
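
As a minimal illustrative sketch (our own code, not part of the original model), here is Equation-A in Python. Note that (w*i)/((w+i)/2) is algebraically 2*w*i/(w+i), the harmonic mean of w and i, which is why the weaker input dominates the result:

```python
def product_confidence(w: float, i: float) -> float:
    """Equation-A: P = (w*i) / ((w+i)/2).

    w = coding.Willpower level (1-10), i = idea robustness level (1-10).
    Algebraically this is 2*w*i/(w+i), the harmonic mean of w and i,
    so the weaker of the two inputs caps the result.
    """
    return (w * i) / ((w + i) / 2)

# Reproduces point (b) above: a robust-idea deficit caps the product
# confidence even under high coding willpower.
print(product_confidence(w=8, i=2))    # 3.2  -> low productConfidence
print(product_confidence(w=5, i=5))    # 5.0  -> median
print(product_confidence(w=10, i=10))  # 10.0 -> "perfect"
```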

 

 

Equation-B.   relationship of: {coding.Willpower, coder.skill-set, coder.iNtent}

 

w = coding.Willpower

s = coder.Skill-set

N = coder.iNtent

d = coding duration, similar to test duration (see Equation-D below)

w = ((0.5*s) + ln(N)) * (0.649 * ln(d))

  = ((0.5*s) + ln(N)) * (0.649 * ln(C/c))
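
A quick Python sketch of Equation-B (the function name is ours); the two calls reproduce the top and bottom rows of the Example B table below:

```python
import math

def coding_willpower(s: float, N: float, d: float) -> float:
    """Equation-B: w = ((0.5*s) + ln(N)) * (0.649 * ln(d)).

    s = coder.Skill-set (1-10), N = coder.iNtent (1-10),
    d = coding duration (equivalently C/c, per Equation-D).
    """
    return ((0.5 * s) + math.log(N)) * (0.649 * math.log(d))

print(round(coding_willpower(s=10, N=10, d=10), 5))  # 10.91282 (top row)
print(round(coding_willpower(s=3, N=3, d=3), 5))     # 1.85281 (bottom row)
```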

 

Example B.

 

 

 

 

| coding.willpower (w) | skill (s) | iNtent (N) | d = C/c | notes |
| --- | --- | --- | --- | --- |
| 10.91282 | 10 | 10 | 10 |  |
| 9.55023 | 9 | 9 | 9 |  |
| 7.31638 | 9 | 6 | 6 |  |
| 7.07046 | 9 | 3 | 7 | a low intent, given enough time, may generate enough coding willpower, provided there is an ultra-high level of skill involved |
| 6.85500 | 6 | 8 | 8 |  |
| 6.38146 | 9 | 5 | 5 | a high skill-set coupled with only median intent and median duration is predicted to score less than an upper-medium skill-set with higher intent and duration |
| 6.24617 | 6 | 7 | 7 | high intent (and an increase in coding duration) can actually grow a skill-set, accelerating an increase in coding willpower |
| 5.86005 | 5 | 5 | 9 |  |
| 5.68303 | 9 | 1 | 7 | an ultra-low intent, even given a reasonably sufficient coding duration, might yield only about as much as a 5-7-7 scenario |
| 5.61472 | 5 | 7 | 7 | median skill coupled with high intent and a substantial coding duration will provide just enough push for coding willpower |
| 5.57211 | 6 | 6 | 6 | an upper-medium skill-set with upper-medium intent will be limited by the coding duration in which they operate |
| 5.32523 | 4 | 7 | 8 | if a low skill-set grows, coding willpower may increase given enough time |
| 4.99068 | 5 | 6 | 6 |  |
| 4.81467 | 6 | 5 | 5 |  |
| 4.54467 | 5 | 3 | 7 | a high or median skill level with low intent will not produce enough coding willpower |
| 4.35183 | 3 | 7 | 7 | fairly high intent with a low skill level will suffer from the constraints of the limited skill-set available, even with a prolonged coding period |
| 4.29241 | 5 | 5 | 5 | median skill with median intent, operating on quite limited time, will not provide enough push for coding willpower (a half-hearted job will not be very productive) |
| 3.82783 | 3 | 6 | 6 |  |
| 3.24789 | 3 | 5 | 5 |  |
| 2.92231 | 6 | 3 | 3 |  |
| 2.56581 | 5 | 3 | 3 |  |
| 1.85281 | 3 | 3 | 3 |  |

 

 

From this we would conclude that satisfactory product outputs are mostly created in the upper-mid to high range of this scale.

 

Equation-C. correlation of: (product > test.iterations > acceptance)

 

r = testing.iterations

T = TestLevel applied in the iteration (see Equation-E below)

P = productConfidence

A = AcceptanceConfidence

 

A = r * (T*P / (T+P))

 

can be rewritten as,

 

A = 0.84 * ln(r+5) * (T*P / (T+P))

 

This last equation introduces a logarithmic "saturation" limit, via the 0.84 * ln(r+5) term, for successive iterations with the exact same unchanging contributors -- meaning that if all other factors (T, i, w) remain unchanged, AcceptanceConfidence increases as iterations increase, but eventually reaches a saturation limit (where each additional iteration provides marginally less information than the previous runs).

 

Note that whenever an iteration is run with changes in any of the contributing factors, the r-value reverts to 1.

 

So, for the following values:

r = test iterations = 1

T = TestLevel applied in the iteration = 7

P = productConfidence = 6.46154 {i = 6, w = 7}

A = AcceptanceConfidence grows logarithmically, demonstrating increasing confidence as the test iterations increase, with each successive iteration adding marginally less than the previous runs (so long as the contributing factors remain the same).
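
Before the worked examples, a minimal Python sketch of Equation-C (helper name ours); it reproduces a few rows of Example Ca below:

```python
import math

def acceptance_confidence(r: float, T: float, P: float) -> float:
    """Equation-C: A = 0.84 * ln(r+5) * (T*P / (T+P)).

    r = test iterations (reverts to 1 whenever a contributing factor
    changes), T = TestLevel applied, P = productConfidence.
    The ln(r+5) term supplies the saturation behaviour: each extra
    iteration adds less AcceptanceConfidence than the one before.
    """
    return 0.84 * math.log(r + 5) * ((T * P) / (T + P))

P = (7 * 6) / ((7 + 6) / 2)  # Equation-A with w=7, i=6 -> 6.46154
for r in (1, 5, 10):
    print(r, round(acceptance_confidence(r, T=7, P=P), 5))
# 1 -> 5.05706, 5 -> 6.49882, 10 -> 7.64320 (saturating growth)
```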

 

Examples:

 

Example Ca.

AcceptanceConfidence of a fairly high test-level effort, applied to an upper-medium thought-out idea, coded with a fairly high level of coding willpower. (This would suggest that software products with high AcceptanceConfidence scores are achievable only with fairly high levels of coding effort and test effort.)

 

A = 0.84 * ln(r+5) * (T*P / (T+P))

| r (iteration) | T (test level employed) | i (idea robustness) | w (coding willpower) | P (product confidence) | A (acceptance confidence) | notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 7 | 6 | 7 | 6.46154 | 5.05706 | starts at Acceptance Confidence level 5 (median) |
| 2 | 7 | 6 | 7 | 6.46154 | 5.49214 | and Acceptance Confidence accelerates into the mid-high level |
| 3 | 7 | 6 | 7 | 6.46154 | 5.86902 |  |
| 4 | 7 | 6 | 7 | 6.46154 | 6.20145 |  |
| 5 | 7 | 6 | 7 | 6.46154 | 6.49882 |  |
| 6 | 7 | 6 | 7 | 6.46154 | 6.76782 |  |
| 7 | 7 | 6 | 7 | 6.46154 | 7.01340 |  |
| 8 | 7 | 6 | 7 | 6.46154 | 7.23931 |  |
| 9 | 7 | 6 | 7 | 6.46154 | 7.44848 |  |
| 9.5 | 7 | 6 | 7 | 6.46154 | 7.54752 |  |
| 9.99 | 7 | 6 | 7 | 6.46154 | 7.64132 |  |
| 9.999999 | 7 | 6 | 7 | 6.46154 | 7.64320 |  |
| 10 | 7 | 6 | 7 | 6.46154 | 7.64320 |  |

 

 

 

Example Cb.

AcceptanceConfidence of a median test-level effort, applied to a median thought-out idea, coded with a median level of coding willpower

 

A = 0.84 * ln(r+5) * (T*P / (T+P))

| r (iteration) | T (test level employed) | i (idea robustness) | w (coding willpower) | P (product confidence) | A (acceptance confidence) | notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 5 | 5 | 5 | 5 | 3.7626949 | Acceptance Confidence starts out low |
| 3 | 5 | 5 | 5 | 5 | 4.3668272 | and approaches mid-level confidence as the testing proceeds |
| 6 | 5 | 5 | 5 | 5 | 5.0355801 |  |
| 9 | 5 | 5 | 5 | 5 | 5.5420204 |  |
| 9.5 | 5 | 5 | 5 | 5 | 5.6157122 |  |
| 9.99 | 5 | 5 | 5 | 5 | 5.6855050 |  |
| 9.999999 | 5 | 5 | 5 | 5 | 5.6869053 |  |

 

 

 

Example Cc.

AcceptanceConfidence of a low test-level effort, applied to a low thought-out idea, coded with a low level of coding willpower

A = 0.84 * ln(r+5) * (T*P / (T+P))

| r (iteration) | T (test level employed) | i (idea robustness) | w (coding willpower) | P (product confidence) | A (acceptance confidence) | notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 | 3 | 3 | 3 | 2.2576169 | Acceptance Confidence starts, and persists, at a low level |
| 3 | 3 | 3 | 3 | 3 | 2.6200963 | given that quite little effort is being put in |
| 6 | 3 | 3 | 3 | 3 | 3.0213480 |  |
| 9 | 3 | 3 | 3 | 3 | 3.3252122 |  |
| 9.5 | 3 | 3 | 3 | 3 | 3.3694273 |  |
| 9.99 | 3 | 3 | 3 | 3 | 3.4113030 |  |
| 9.999999 | 3 | 3 | 3 | 3 | 3.4121432 |  |

 

 

 

Example Cd.

AcceptanceConfidence of a high test-level effort, applied to a low thought-out idea, coded with a low level of coding willpower

 

A = 0.84 * ln(r+5) * (T*P / (T+P))

| r (iteration) | T (test level employed) | i (idea robustness) | w (coding willpower) | P (product confidence) | A (acceptance confidence) | notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 9 | 3 | 3 | 3 | 3.3864254 | Acceptance Confidence starts low at the first iteration |
| 3 | 9 | 3 | 3 | 3 | 3.9301445 | and very slowly approaches mid-level confidence as the testing proceeds |
| 6 | 9 | 3 | 3 | 3 | 4.5320221 |  |
| 9 | 9 | 3 | 3 | 3 | 4.9878184 |  |
| 9.5 | 9 | 3 | 3 | 3 | 5.0541409 |  |
| 9.99 | 9 | 3 | 3 | 3 | 5.1169545 |  |
| 9.99999 | 9 | 3 | 3 | 3 | 5.1182136 |  |

 

 

 

Example Ce.

AcceptanceConfidence of a low test-level effort, applied to a highly thought-out idea, coded with a high level of coding willpower

 

A = 0.84 * ln(r+5) * (T*P / (T+P))

| r (iteration) | T (test level employed) | i (idea robustness) | w (coding willpower) | P (product confidence) | A (acceptance confidence) | notes |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 3 | 7 | 7 | 7 | 3.1606637 | Acceptance Confidence remains low |
| 3 | 3 | 7 | 7 | 7 | 3.6681349 | given that there is little testing employed to evaluate the output product |
| 6 | 3 | 7 | 7 | 7 | 4.2298873 |  |
| 9 | 3 | 7 | 7 | 7 | 4.6552971 |  |
| 9.5 | 3 | 7 | 7 | 7 | 4.7171982 |  |
| 9.99 | 3 | 7 | 7 | 7 | 4.7758242 |  |
| 9.99999 | 3 | 7 | 7 | 7 | 4.7769994 |  |

 

 

 

From these examples:

 

A) Examples Cd and Ce imply that a less-tested, highly-engineered product (A = 4.65 at r = 9) would score a lower AcceptanceConfidence than a highly-tested, low-engineered product (A = 4.98 at r = 9) -- meaning a highly tested product will have better predictability of acceptability than an untested or under-tested one.

 

B) In the traditional sense of pass/fail:

B.1) AcceptanceConfidence roughly corresponds to pass/fail scores:

B.1.1) scores below 5 most probably correspond to the customary Fail;

B.1.2) scores equal to or above 5 most probably correspond to the customary Pass, or Pass with Conditions.

 

B.2) Except that AcceptanceConfidence means much more; it is a measure of User Acceptance -- meaning:

B.2.1) examples Cd and Ce would seem to indicate that both products "Fail" in the traditional sense, but

B.2.2) that is where the term "AcceptanceConfidence" transcends the traditional concepts of Pass/Fail:

B.2.2.1) Examples Cd and Ce both have low confidence scores because

a) Example Cd was low-engineered in the first place, so no amount of testing would increase its score;

b) Example Ce, although highly engineered, has not undergone enough testing (in both effort and scope) to discover any possible critical flaws;

c) which is why Acceptance Confidence scores would seem to be much better indicators than Pass/Fail tagging.

 

C) Insertion of issues can occur wherever the human element is present -- in this case at i (idea), w (coding willpower), and T (test level employed).

 

 

Equation-D.   correlation of: {test.duration, idea.Complexity, client.clamour for release}

 

d = test.duration

(9 = longer test duration; 1 = shorter test duration)

C = idea.Complexity

c = client.clamour for release

 

d = C/c

This would seem to indicate that test duration is directly proportional to the complexity of the idea and inversely proportional to the degree of clamour for release: the higher the clamour for release, the shorter the test duration.

 

And the correlation between test duration/completion and testing backlogs would be:

 

G = test.drag-level

(the amount of unfinished backlog tasks, tickets, and WIP falling behind the perceived schedule)

 

G = (10 - d)

This would mean that the shorter the test duration, the larger the backlog, and the higher the test drag-level.
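
A small Python sketch of Equation-D and the drag relation (the function names and sample complexity/clamour values are our own, hypothetical):

```python
def test_duration(C: float, c: float) -> float:
    """Equation-D: d = C/c.

    C = idea.Complexity (1-10), c = client.clamour for release (1-10).
    Duration rises with complexity and falls as clamour grows louder.
    """
    return C / c

def test_drag_level(d: float) -> float:
    """G = 10 - d: shorter test durations leave a larger backlog."""
    return 10 - d

d = test_duration(C=8, c=2)   # hypothetical inputs
print(d, test_drag_level(d))  # 4.0 6.0
```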

 

Equation-E.   And the correlation of {test-duration, idea.Complexity, client.clamour for release, Tester.intent, tester.skill, Scope.tested, Test.level.applied} would be:

S = scope

N = tester.intent

s = testing skill

d = C/c  (as above)

T  = {N,s}+{d,S}

 

  = 0.72 * (((2/3)*N) + ln(s) + ln(d) + ln(S))

  = 0.72 * (((2/3)*N) + ln(s) + ln(C/c) + ln(S))

 

| T (test level applied) | N (intent) | s (skill) | C (complexity) | c (clamour) | S (scope) |
| --- | --- | --- | --- | --- | --- |
| 9.0660 | 9 | 9 | 9 | 1 | 9 |
| 7.8970 | 8 | 7 | 5 | 1 | 8 |
| 7.7861 | 8 | 6 | 5 | 1 | 8 |
| 7.4183 | 8 | 3 | 6 | 1 | 8 |
| 7.0786 | 7 | 5 | 5 | 1 | 7 |
| 6.8421 | 7 | 3 | 6 | 1 | 7 |
| 6.7109 | 7 | 3 | 5 | 1 | 7 |
| 6.7109 | 7 | 5 | 3 | 1 | 7 |
| 6.5796 | 7 | 5 | 5 | 2 | 7 |
| 6.3431 | 7 | 3 | 3 | 1 | 7 |
| 6.2309 | 6 | 5 | 3 | 1 | 7 |
| 6.2118 | 7 | 5 | 3 | 2 | 7 |
| 5.4601 | 6 | 6 | 6 | 6 | 6 |
| 4.7176 | 5 | 5 | 5 | 5 | 5 |
| 3.0220 | 3 | 3 | 3 | 3 | 3 |
| 0.4800 | 1 | 1 | 1 | 1 | 1 |

 

A few takeaways here, generally speaking:

a) tester intent ((2/3)*N) has a heavier impact than tester skill (ln(s));

b) the factors that contribute to a higher test level applied are:

i. a longer test duration (ln(d)), which in turn is directly proportional to the complexity of the idea being tested versus the clamour for release (ln(C/c));

ii. a larger scope being tested / explored (ln(S)).

All of which seem to correspond with real-world experience.
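
For completeness, a small Python sketch of Equation-E (the function name is ours), checked against the top and bottom rows of the table above:

```python
import math

def test_level(N: float, s: float, C: float, c: float, S: float) -> float:
    """Equation-E: T = 0.72 * (((2/3)*N) + ln(s) + ln(C/c) + ln(S)).

    N = tester intent, s = testing skill, C = idea complexity,
    c = client clamour for release, S = scope tested.
    Intent enters linearly; skill, duration, and scope only
    logarithmically -- hence intent's heavier impact.
    """
    return 0.72 * (((2 / 3) * N) + math.log(s) + math.log(C / c) + math.log(S))

print(round(test_level(N=9, s=9, C=9, c=1, S=9), 4))  # 9.0660 (top row)
print(round(test_level(N=1, s=1, C=1, c=1, S=1), 4))  # 0.4800 (bottom row)
```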

 

 

Equation-F.   correlation of: {Scope, Risk, scenarios}

S = scope

R = risk

e = a scenario involved in the Scope (with associated risk Re)

m = number of scenarios above median risk (count of Re > 5)

z = total number of scenarios

 

S = (0.4189 * ln(m + 0.000000001)) + ((Re1 + Re2 + Re3 + ... + Rez) / z)

This would indicate that the scope of testing is based not solely on the number of test scenarios, but also on the risk level associated with each scenario. It would forecast that a large number of low-risk scenarios gives a Scope value of 5 or less, while a ratio of 66% or more of major/critical-risk scenarios provides a satisfactory scope to work with.

 

In hindsight, everything about heuristics, oracles, mapping, and BVAs focuses on this one equation -- how to get a better handle on scope.
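
A minimal Python sketch of Equation-F (helper name ours); the tiny additive constant mirrors the 0.000000001 in the formula, keeping ln() defined when no scenario exceeds median risk. The two calls reproduce the first and last rows of Example Fa below:

```python
import math

def scope_level(risks: list[float]) -> float:
    """Equation-F: S = 0.4189 * ln(m + 1e-9) + (sum of Re) / z.

    risks = risk level Re of each scenario e in the scope;
    z = len(risks); m = count of scenarios with Re > 5.
    """
    m = sum(1 for r in risks if r > 5)
    return 0.4189 * math.log(m + 1e-9) + sum(risks) / len(risks)

print(round(scope_level([9] * 9), 3))  # 9.920
print(round(scope_level([3] * 9), 3))  # -5.681 (m = 0 drags S negative)
```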

 

Example Fa (short table)

| S | Re1 | Re2 | Re3 | Re4 | Re5 | Re6 | Re7 | Re8 | Re9 | count Re > 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9.920 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| 7.295 | 6 | 6 | 6 | 6 | 6 | 7 | 8 | 6 | 6 | 9 |
| 7.251 | 8 | 8 | 8 | 8 | 8 | 8 | 3 | 1 | 2 | 6 |
| 6.920 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 9 |
| 6.876 | 7 | 6 | 7 | 5 | 6 | 7 | 6 | 5 | 5 | 6 |
| 6.674 | 9 | 9 | 9 | 9 | 9 | 1 | 1 | 1 | 4 | 5 |
| 6.581 | 9 | 9 | 9 | 9 | 3 | 3 | 3 | 3 | 3 | 4 |
| 6.001 | 6 | 6 | 6 | 6 | 6 | 6 | 3 | 3 | 3 | 6 |
| 5.549 | 6 | 6 | 6 | 6 | 6 | 3 | 3 | 3 | 3 | 5 |
| 5.460 | 9 | 6 | 5 | 5 | 7 | 3 | 3 | 2 | 2 | 3 |
| 5.174 | 7 | 7 | 7 | 7 | 2 | 2 | 1 | 3 | 7 | 5 |
| 5.000 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 1 |
| 3.750 | 9 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 1 |
| 3.665 | 2 | 2 | 2 | 2 | 1 | 5 | 6 | 7 | 5 | 2 |
| -5.681 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0 |

Example Fb (expanded table)

| S | Re1 | Re2 | Re3 | Re4 | Re5 | Re6 | Re7 | Re8 | Re9 | Re10 | Re11 | Re12 | Re13 | Re14 | Re15 | Re16 | Re17 | Re18 | count Re > 5 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10.211 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 18 |
| 7.377 | 6 | 6 | 6 | 6 | 6 | 7 | 8 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 18 |
| 7.211 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 18 |
| 6.994 | 7 | 7 | 7 | 7 | 2 | 2 | 1 | 3 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 14 |
| 6.520 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 10 |
| 5.917 | 7 | 6 | 7 | 3 | 6 | 7 | 6 | 3 | 3 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 6 |
| 5.563 | 9 | 9 | 9 | 9 | 9 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 5 |
| 4.914 | 9 | 9 | 9 | 9 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 4 |
| 4.751 | 8 | 8 | 8 | 8 | 8 | 8 | 3 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 6 |
| 4.751 | 6 | 6 | 6 | 6 | 6 | 6 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 6 |
| 4.568 | 2 | 2 | 2 | 2 | 1 | 5 | 6 | 7 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2 |
| 4.508 | 6 | 6 | 6 | 6 | 6 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 5 |
| 3.794 | 9 | 6 | 5 | 5 | 7 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 |
| 3.333 | 9 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 1 |
| -5.681 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0 |

 

 

Equation-G.   evaluating issues (bugs) -- relating bug risk to {UI risk, function risk}

R(b) = Risk level of a bug/issue

R(UI) = Risk level observed in the UI

R(fx) = Risk level observed in the functionality

 

R(b) = ln(2) * ((ln(2.9) * R(UI)) + (ln(3) * R(fx)))

| R(b) | R(UI) | R(fx) |
| --- | --- | --- |
| 13.496 | 9 | 9 |
| 11.282 | 6 | 9 |
| 11.211 | 9 | 6 |
| 9.068 | 3 | 9 |
| 8.997 | 6 | 6 |
| 8.927 | 9 | 3 |
| 7.592 | 1 | 9 |
| 7.404 | 9 | 1 |
| 6.783 | 3 | 6 |
| 6.713 | 6 | 3 |
| 5.307 | 1 | 6 |
| 5.190 | 6 | 1 |
| 4.499 | 3 | 3 |
| 3.023 | 1 | 3 |
| 2.976 | 3 | 1 |
| 1.500 | 1 | 1 |

 

This presupposes that an issue/bug is never a UI issue alone, nor a functional issue alone; a bug/issue is always a combination of {R(UI), R(fx)}.

Values above 6 would be considered critical to get fixed,

values around 5 or 6 would be major,

and values below 5 would be low risk.

In the words of a famous tester, "learn from every bug" -- indeed, the table contains a number of bugs/issues tied to low R(UI) values that nonetheless carry a high functional risk R(fx). This corroborates real-world experiences where an issue/bug displaying a low R(UI) turned out to have an underlying major/critical functional issue (R(fx)) attached to it -- because every bug is a tandem of {R(UI), R(fx)}. The only exceptions are where R(UI) or R(fx) is zeroed out; for example, integration issues with no UI involved (R(UI) = 0), and cosmetic flaws with no function involved (R(fx) = 0).
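
A small Python sketch of Equation-G (function name ours); the sample calls reproduce two of the mixed-severity rows from the table above:

```python
import math

def bug_risk(R_ui: float, R_fx: float) -> float:
    """Equation-G: R(b) = ln(2) * ((ln(2.9) * R(UI)) + (ln(3) * R(fx))).

    Functional risk carries slightly more weight than UI risk
    (ln 3 > ln 2.9). Zero out R_ui or R_fx for the exception cases:
    no UI involved, or purely cosmetic.
    """
    return math.log(2) * ((math.log(2.9) * R_ui) + (math.log(3) * R_fx))

print(round(bug_risk(R_ui=1, R_fx=9), 3))  # 7.592: minor UI symptom,
                                           # critical functional risk
print(round(bug_risk(R_ui=3, R_fx=1), 3))  # 2.976: mostly cosmetic
```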

 

Equation-H.   correlation of: {feature-execution.completion, Acceptance.confidence level, environment}

 

F = feature/sub-feature completion-execution

A = acceptance-confidence level

E = environment {data, preconditions, conditions}

F = (0.76*A) + ln(E) 

This means that even though both E and A contribute to this phenomenon, a feature's (or sub-feature's) completion of execution depends more on the code's AcceptanceConfidence than on the data that drives it.
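
Finally, a minimal Python sketch of Equation-H (the function name and sample inputs are our own, hypothetical):

```python
import math

def feature_completion(A: float, E: float) -> float:
    """Equation-H: F = (0.76 * A) + ln(E).

    A = AcceptanceConfidence level, E = environment level
    {data, preconditions, conditions}. A enters linearly, E only
    logarithmically, so execution completion leans more on the
    code's confidence than on the environment that drives it.
    """
    return (0.76 * A) + math.log(E)

print(round(feature_completion(A=7, E=5), 3))  # 6.929 (hypothetical inputs)
```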

 

 
