Sapienza trust model derivation showing equivalence with random answers

So far we have stated that the trust model in Sapienza (https://ceur-ws.org/Vol-1664/w9.pdf) is the same as modeling the untrustworthy part of a source as random. What does this mean? If I trust my source 90%, then 10% of its reporting is untrustworthy, meaning it answers the question randomly for that 10%.

Let's take a look at some examples and then try to derive a general equation which we will then equate to the trust equation in Sapienza.

We will use the real/fake node case because it is simple to follow. We have 100 new nodes, 50 real and 50 fake, and a source whose test is correct 60% of the time, so it reports with 60% confidence that a node is real (or fake). For purposes of review we'll take the case of 100% trust first. For a single source we would have a confidence of 60% that the node is real. We just believe the source and no calculation is required.

For two sources at 60% we can model the situation as follows:

100 N

         60%       60%

50 R ==> 30 Tr ==> 18 Tr
                   12 Tf
         20 Tf ==> 12 Tr
                    8 Tf

50 F ==> 30 Tf ==> 18 Tf
                   12 Tr
         20 Tr ==> 12 Tf
                    8 Tr

For the 50 Real nodes, our first 60% confident source reports 30 to be real (Tr) and 20 to be fake (Tf). Our second source independently does the same to each of those groups. The same holds for the 50 Fake nodes. It is important to note that if our source is 60% confident about real nodes, it is also 60% confident about fake nodes.

This is two 60% tests in a row. If two tests in a row say it's real, what is the probability of it being real? I.e., how confident should we now be?

18+8 nodes tested real twice. Of those, 18 are actually real ==> 18/(18+8) = 0.692. Now we are 69.2% sure the node is real. This is the same, btw, as the Bayes eqn in Sapienza: 0.6*0.6/(0.6*0.6+0.4*0.4) = 0.692.
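The count from the tree and the Bayes form can be checked side by side. This is a minimal sketch, assuming a 50/50 split of real and fake nodes and two independent, fully trusted tests; the function names are made up for illustration.

```python
def two_test_counts(p, n_real=50.0, n_fake=50.0):
    """Return (real, fake) node counts that both fully trusted tests call real."""
    real_twice = n_real * p * p              # real nodes correctly called real twice
    fake_twice = n_fake * (1 - p) * (1 - p)  # fake nodes wrongly called real twice
    return real_twice, fake_twice


def bayes_two_sources(p1, p2):
    """Posterior that a node is real given two independent 'real' results (50/50 prior)."""
    return p1 * p2 / (p1 * p2 + (1 - p1) * (1 - p2))


real_twice, fake_twice = two_test_counts(0.6)
print(real_twice, fake_twice)                  # about 18 and 8
print(real_twice / (real_twice + fake_twice))  # about 0.692
print(bayes_two_sources(0.6, 0.6))             # about 0.692
```

Both routes give the same number because the population size cancels out of the ratio.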

Now let's build the trust factor into the eqn. Let's suppose I trust Source 1 90%. This means that 10% of the time my source will report randomly that the node is real or fake, no matter what its test says.

SOURCE 1 WITH TRUST:

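100 Nodes With Trust

        S1 60% confidence 90% trust (90% of reported answers are the same as the test and 10% are random)
        Tr/Tf = tested real/fake, rTr/rTf = reported real/fake
        Each bracket lists the faithful reports first, then the two halves of the random reports

                                                  Totals

50 R ==> 30 Tr (27 rTr, 1.5 rTr, 1.5 rTf)    ==>  29.5 rTr
         20 Tf (18 rTf, 1 rTr, 1 rTf)             20.5 rTf

50 F ==> 30 Tf (27 rTf, 1.5 rTf, 1.5 rTr)    ==>  29.5 rTf
         20 Tr (18 rTr, 1 rTr, 1 rTf)             20.5 rTr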

If the first test REPORTS that the node is real, what is the probability of it really being real? We have 27 + 1.5 + 1 = 29.5 reportedly real tests which are actually real, and 29.5 + 1 + 18 + 1.5 = 50 reportedly real tests in total. 29.5 / 50 = 0.59 = 59%. The trust factor of 90% should make us 59% confident in our results, down from 60%.
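The same bookkeeping can be written as a few lines of code. This is a sketch under the assumptions used in the tree (50 real and 50 fake nodes, random reports split evenly between real and fake); `reported_counts` is a hypothetical helper name.

```python
def reported_counts(n_test_real, n_test_fake, trust):
    """Turn test outcomes into reported outcomes.

    A fraction `trust` of results is reported as tested; the rest is
    reported at random, half as real and half as fake.
    """
    random_part = (n_test_real + n_test_fake) * (1 - trust)
    reported_real = n_test_real * trust + random_part / 2
    reported_fake = n_test_fake * trust + random_part / 2
    return reported_real, reported_fake


p, trust = 0.6, 0.9
# 50 real nodes: 30 test real, 20 test fake
real_rtr, real_rtf = reported_counts(50 * p, 50 * (1 - p), trust)
# 50 fake nodes: 20 test real, 30 test fake
fake_rtr, fake_rtf = reported_counts(50 * (1 - p), 50 * p, trust)

print(real_rtr, fake_rtr)                # about 29.5 and 20.5
print(real_rtr / (real_rtr + fake_rtr))  # about 0.59
```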

Now let's show that this is equivalent to the Trust equation in the Sapienza paper:

( 27 + 1.5 + 1 ) / ( 27 + 1.5 + 1 + 18 + 1 + 1.5 ) = 0.59

( 30*0.9 + 30*(1-0.9)/2 + 20*(1-0.9)/2 ) / ( 30*0.9 + 30*(1-0.9)/2 + 20*(1-0.9)/2 + 20*0.9 + 20*(1-0.9)/2 + 30*(1-0.9)/2 ) = 0.59

( 50*0.6*0.9 + 50*0.6*(1-0.9)/2 + 50*0.4*(1-0.9)/2 ) / ( 50*0.6*0.9 + 50*0.6*(1-0.9)/2 + 50*0.4*(1-0.9)/2 + 50*0.4*0.9 + 50*0.4*(1-0.9)/2 + 50*0.6*(1-0.9)/2 )

We can cancel out the 50:

( 0.6*0.9 + 0.6*(1-0.9)/2 + 0.4*(1-0.9)/2 ) / ( 0.6*0.9 + 0.6*(1-0.9)/2 + 0.4*(1-0.9)/2 + 0.4*0.9 + 0.4*(1-0.9)/2 + 0.6*(1-0.9)/2 )

Replace 0.9 with T and look at just the numerator for now:

( 0.6*T + 0.6*(1-T)/2 + 0.4*(1-T)/2 )
= 0.6*T + 0.3 - 0.3*T + 0.2 - 0.2*T
= 0.6*T + 0.5 - 0.5*T
= 0.5 + (0.6 - 0.5)*T
= Pnom + (P - Pnom)*T    where P = 0.6 and Pnom = 0.5

Now add the denominator back in:

( Pnom + (P - Pnom)*T ) / ( Pnom + (P - Pnom)*T + 0.4*T + 0.4*(1-T)/2 + 0.6*(1-T)/2 )

The remaining denominator terms simplify the same way:

0.4*T + 0.4*(1-T)/2 + 0.6*(1-T)/2
= 0.4*T + 0.2 - 0.2*T + 0.3 - 0.3*T
= 0.2*T - 0.3*T + 0.5
= 0.5 - 0.1*T
= Pnom + (P - Pnom)*T    where P = 0.4 (the 2nd prob, call it Pb)

Calling the first probability Pa, the whole expression becomes:

( Pnom + (Pa - Pnom)*T ) / ( Pnom + (Pa - Pnom)*T + Pnom + (Pb - Pnom)*T )

This is the same as Pa/(Pa+Pb) except with all probabilities adjusted by Trust. This shows that the Trust eqn. in Sapienza is equivalent to answering randomly for the untrusted part.
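To get a feel for the adjustment, it helps to sweep T. The sketch below assumes Pnom = 0.5 for the two-outcome real/fake case; `trust_adjust` is an illustrative name, not taken from the paper.

```python
P_NOM = 0.5  # nominal probability for a two-outcome (real/fake) question


def trust_adjust(p, trust, p_nom=P_NOM):
    """Pull probability p toward p_nom as trust drops from 1 to 0."""
    return p_nom + (p - p_nom) * trust


for trust in (0.0, 0.5, 0.9, 1.0):
    pa = trust_adjust(0.6, trust)  # adjusted Pa
    pb = trust_adjust(0.4, trust)  # adjusted Pb
    print(trust, round(pa, 4), round(pb, 4), round(pa / (pa + pb), 4))
```

At T = 0 the adjusted value falls back to Pnom, and at T = 1 it equals the raw probability.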

Can we do this symbolically? Given Phi, Plo = 1 - Phi, and T (trust):

50 R ==> 50*Phi Tr ( 50*Phi*T rTr, (50*Phi-50*Phi*T)/2 rTr, (50*Phi-50*Phi*T)/2 rTf )

        50*Plo Tf ( 50*Plo*T rTf, (50*Plo-50*Plo*T)/2 rTr, (50*Plo-50*Plo*T)/2 rTf )

50 F ==> 50*Phi Tf ( 50*Phi*T rTf, (50*Phi-50*Phi*T)/2 rTf, (50*Phi-50*Phi*T)/2 rTr)

        50*Plo Tr ( 50*Plo*T rTr, (50*Plo-50*Plo*T)/2 rTr, (50*Plo-50*Plo*T)/2 rTf)

The 50 is arbitrary. We can just say our population is 2 and each category has a population of 1.

                                                                       Totals:

1 R ==> Phi Tr ( Phi*T rTr, (Phi-Phi*T)/2 rTr, (Phi-Phi*T)/2 rTf ) ==> Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2 rTr

       Plo Tf ( Plo*T rTf, (Plo-Plo*T)/2 rTr, (Plo-Plo*T)/2 rTf )      Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2   rTf

1 F ==> Phi Tf ( Phi*T rTf, (Phi-Phi*T)/2 rTf, (Phi-Phi*T)/2 rTr) ==> Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2 rTf

       Plo Tr ( Plo*T rTr, (Plo-Plo*T)/2 rTr, (Plo-Plo*T)/2 rTf)       Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2   rTr

Total nodes that were reported real:
Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2 + Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2

Total nodes that are real and were reported real:
Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2

Simplify the second total first, using Phi + Plo = 1 and Pnom = 0.5:

Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2
= Phi*T + 0.5*Phi - 0.5*Phi*T + 0.5*Plo - 0.5*Plo*T
= Phi*T + 0.5*(Phi+Plo) - 0.5*T*(Phi+Plo)
= Phi*T + 0.5 - 0.5*T
= 0.5 + (Phi - 0.5)*T
= Pnom + (Phi - Pnom)*T

Substitute this into the first total:

Phi*T + (Phi-Phi*T)/2 + (Plo-Plo*T)/2 + Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2
= Pnom + (Phi - Pnom)*T + Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2

Look at just the remaining terms:

Plo*T + (Plo-Plo*T)/2 + (Phi-Phi*T)/2
= Plo*T + 0.5*Plo - 0.5*Plo*T + 0.5*Phi - 0.5*Phi*T
= Plo*T + 0.5*(Plo+Phi) - 0.5*T*(Plo+Phi)
= Plo*T + 0.5 - 0.5*T
= 0.5 + (Plo - 0.5)*T
= Pnom + (Plo - Pnom)*T

So, nodes that are real and were reported real / total nodes that were reported real

= ( Pnom + (Phi - Pnom)*T ) / ( Pnom + (Phi - Pnom)*T + Pnom + (Plo - Pnom)*T )

= Phi_Trust / (Phi_Trust + Plo_Trust)

This is just the trust-modified probability being used in Bayes eqn. per Sapienza.
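The symbolic result can also be spot-checked with exact rational arithmetic. This sketch compares the raw tree terms against the trust-adjusted form over a grid of Phi and T values, assuming Pnom = 1/2; the function names are hypothetical.

```python
from fractions import Fraction

P_NOM = Fraction(1, 2)


def reported_real_share(p_hi, trust):
    """Share of reported-real nodes that are real, built from the tree's raw terms."""
    p_lo = 1 - p_hi
    real_rtr = p_hi * trust + (p_hi - p_hi * trust) / 2 + (p_lo - p_lo * trust) / 2
    fake_rtr = p_lo * trust + (p_lo - p_lo * trust) / 2 + (p_hi - p_hi * trust) / 2
    return real_rtr / (real_rtr + fake_rtr)


def trust_adjusted_share(p_hi, trust):
    """Same share, computed as Phi_Trust / (Phi_Trust + Plo_Trust)."""
    p_lo = 1 - p_hi
    hi_trust = P_NOM + (p_hi - P_NOM) * trust
    lo_trust = P_NOM + (p_lo - P_NOM) * trust
    return hi_trust / (hi_trust + lo_trust)


grid = [Fraction(k, 10) for k in range(11)]  # 0, 0.1, ..., 1 as exact fractions
mismatches = [(p, t) for p in grid for t in grid
              if reported_real_share(p, t) != trust_adjusted_share(p, t)]
print(mismatches)  # an empty list means the two forms agree on the whole grid
```

A grid check is not a proof, but with exact fractions any algebra slip would show up as a mismatch rather than hide in rounding.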

Let's add Source 2 with 80% trust. This means Source 1 will report its result accurately 90% of the time and the rest will be random. Source 2 will report accurately 80% of the time and the rest will be random.

SOURCE 1 AND 2 WITH TRUST (numerical example):

         S1 60% confidence 90% trust    S2 60% confidence 80% trust         Totals for S2

50 R ==> 29.5 rTr  ==>  17.7 Tr (14.16 rTr, 1.77 rTr, 1.77 rTf)  ==>  17.11 rTr
                        11.8 Tf (9.44 rTf, 1.18 rTr, 1.18 rTf)        12.39 rTf

         20.5 rTf  ==>  12.3 Tr (9.84 rTr, 1.23 rTf, 1.23 rTr)   ==>  11.89 rTr
                         8.2 Tf (6.56 rTf, 0.82 rTr, 0.82 rTf)         8.61 rTf

50 F ==> 29.5 rTf  ==>  17.7 Tf (14.16 rTf, 1.77 rTr, 1.77 rTf)  ==>  17.11 rTf
                        11.8 Tr (9.44 rTr, 1.18 rTf, 1.18 rTr)        12.39 rTr

         20.5 rTr  ==>  12.3 Tf (9.84 rTf, 1.23 rTr, 1.23 rTf)   ==>  11.89 rTf
                         8.2 Tr (6.56 rTr, 0.82 rTf, 0.82 rTr)         8.61 rTr

Probability of node being real given 2 reportedly real tests:

  Total of two real tests: 17.11 + 8.61 = 25.72
  Of those, 17.11 are actually real: 17.11 / 25.72 = 0.6652

Now our confidence is 66.52%. Same as Eric's app.
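The two-source tree can be walked mechanically rather than by hand. This is a sketch that enumerates every branch (test outcome, then faithful or random report) for each source, assuming the sources are independent given the node's true state; the helper names are made up here.

```python
def report_branches(is_real, p, trust):
    """Yield (reported_real, weight) for every branch of one source's tree."""
    for tests_real in (True, False):
        w_test = p if tests_real == is_real else 1 - p  # test is right with prob p
        yield tests_real, w_test * trust                # faithful report of the test
        yield True, w_test * (1 - trust) / 2            # random report: real
        yield False, w_test * (1 - trust) / 2           # random report: fake


def count_all_reported_real(is_real, n, sources):
    """Number of the n nodes for which every source reports real."""
    for p, trust in sources:
        n *= sum(w for says_real, w in report_branches(is_real, p, trust) if says_real)
    return n


sources = [(0.6, 0.9), (0.6, 0.8)]  # (confidence, trust) for S1 and S2
real = count_all_reported_real(True, 50.0, sources)
fake = count_all_reported_real(False, 50.0, sources)
print(real, fake)            # about 17.11 and 8.61
print(real / (real + fake))  # about 0.6652
```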

Again, the trust as modeled in the Sapienza paper matches what you'd expect from a source giving random answers for its untrustworthy part.
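A direct simulation of the "random answers" story is another way to see this. The sketch below draws nodes at random, lets each source run its test and then either report it faithfully or flip a coin, and measures how often a node reported real by both sources is actually real. The seed, sample size, and function names are arbitrary choices for illustration.

```python
import random


def report(is_real, p, trust, rng):
    """One source's report: test the node, then report faithfully or at random."""
    tests_real = is_real if rng.random() < p else not is_real
    if rng.random() < trust:
        return tests_real       # trusted part: pass the test result through
    return rng.random() < 0.5   # untrusted part: answer at random


def simulate(sources, n_nodes=200_000, seed=1):
    """Estimate P(real | every source reports real) by sampling."""
    rng = random.Random(seed)
    real_hits = total_hits = 0
    for _ in range(n_nodes):
        is_real = rng.random() < 0.5  # half the nodes are real
        if all(report(is_real, p, t, rng) for p, t in sources):
            total_hits += 1
            real_hits += is_real
    return real_hits / total_hits


print(simulate([(0.6, 0.9), (0.6, 0.8)]))  # should land close to 0.665
```

The estimate carries sampling noise, so it will only agree with the exact 0.6652 to about two decimal places at this sample size.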

This shows that all you need to do is pre-adjust each probability by trust to get a new probability dist. and then do the calc from there the usual way (by applying Bayes). This is what is done in Sapienza. Let's do a quick review:

Sources: 0.6, 0.6
T: 0.9, 0.8

For 100% Trust:
0.6*0.6/(0.6*0.6 + 0.4*0.4) = 0.692

For 0.9, 0.8 Trust:
P1 = Pnom + (P - Pnom)*T = 0.5 + (0.6 - 0.5)*0.9 = 0.59
P2 = 0.5 + (0.6 - 0.5)*0.8 = 0.58

P1l = 0.5 + (0.4 - 0.5)*0.9 = 0.41
P2l = 0.5 + (0.4 - 0.5)*0.8 = 0.42

Now Sources become: 0.59, 0.58 / 0.41, 0.42

0.59*0.58 / (0.59*0.58 + 0.41*0.42) = 0.6652 = 66.52% ==> Same as above
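The review generalizes to any number of sources: adjust each probability by its trust, then apply Bayes as usual. This is a minimal sketch for the two-outcome case with Pnom = 0.5; `fuse` and `trust_adjust` are illustrative names and are not claimed to match any existing script.

```python
from math import prod

P_NOM = 0.5  # two-outcome (real/fake) case


def trust_adjust(p, trust):
    """Trust-adjusted probability: Pnom + (P - Pnom)*T."""
    return P_NOM + (p - P_NOM) * trust


def fuse(confidences, trusts):
    """Posterior that the node is real when every source reports real."""
    hi = [trust_adjust(p, t) for p, t in zip(confidences, trusts)]
    lo = [trust_adjust(1 - p, t) for p, t in zip(confidences, trusts)]
    return prod(hi) / (prod(hi) + prod(lo))


print(fuse([0.6, 0.6], [1.0, 1.0]))  # about 0.692
print(fuse([0.6, 0.6], [0.9, 0.8]))  # about 0.6652
```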

We can build this idea into an equation in the sapienza_bayes.py model for verification:

From above (cancel out the 50):

( 0.6*T + 0.6*(1-T)/2 + 0.4*(1-T)/2 )

Phi*T + Phi*(1-T)/2 + Plo*(1-T)/2

0.6*0.9 + 0.6*(1-0.9)/2 + 0.4*(1-0.9)/2 = 0.59 (numerical verification)

0.4*0.9 + 0.4*(1-0.9)/2 + 0.6*(1-0.9)/2 = 0.41 (numerical verification)

Plo*T + Plo*(1-T)/2 + Phi*(1-T)/2 (same form as above, with Phi and Plo swapped)

sapienza_bayes2.py

Original Sapienza formulation:
P = Pnom + (P - Pnom)*T = P*T + Pnom*(1-T)

New formulation as above:
P2 = P*T + P*(1.0-T)*Pnom + (1.0-P)*(1.0-T)*Pnom
   = P*T + Pnom*(P*(1.0-T) + (1.0-T)*(1.0-P))
   = P*T + Pnom*(1-T)*(P+(1-P))
   = P*T + Pnom*(1-T)

So both these eqns are the same.
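The equality of the two formulations can be confirmed numerically as well. The sketch below is not the contents of sapienza_bayes2.py; it is a hypothetical stand-alone check with made-up function names that evaluates both forms on a grid of P, T, and Pnom values.

```python
from math import isclose


def sapienza_form(p, trust, p_nom=0.5):
    """Original formulation: Pnom + (P - Pnom)*T."""
    return p_nom + (p - p_nom) * trust


def random_answer_form(p, trust, p_nom=0.5):
    """New formulation: trusted part plus random answers for the untrusted part."""
    return p * trust + p * (1.0 - trust) * p_nom + (1.0 - p) * (1.0 - trust) * p_nom


steps = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0
all_equal = all(
    isclose(sapienza_form(p, t, p_nom), random_answer_form(p, t, p_nom), abs_tol=1e-12)
    for p in steps for t in steps for p_nom in (0.25, 0.5)
)
print(all_equal)  # True when the two forms agree everywhere on the grid
```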