Sapienza trust model derivation showing equivalence with random answers
So far we have stated that the trust model in Sapienza (https://ceur-ws.org/Vol-1664/w9.pdf) is the same as modeling the untrustworthy part of a source as random. What does this mean? If I trust my source 90%, then 10% of its reporting is untrustworthy, meaning that for that 10% it answers the question randomly.
Let's take a look at some examples and then try to derive a general equation which we will then equate to the trust equation in Sapienza.
We will use the real/fake node case because it is simple to follow. We have 100 new nodes and a source that reports with 60% confidence that the nodes are real (or fake). For purposes of review we'll take the case of 100% trust first. For a single source we would now have a confidence of 60% that the node is real. We just believe the source and no calculation is required.
For two sources at 60% we can model the situation as follows:
100 N
         1st 60%   2nd 60%
50 R ==> 30 Tr ==> 18 Tr
                   12 Tf
         20 Tf ==> 12 Tr
                    8 Tf
50 F ==> 30 Tf ==> 18 Tf
                   12 Tr
         20 Tr ==> 12 Tf
                    8 Tr
(i.e., for 50 Real nodes, our first 60% confident source reports 30 to be real (Tr) and 20 to be fake (Tf). Our second source independently does the same. The same holds for the 50 Fake nodes. It is important to note that if our source is 60% confident about real nodes it is also 60% confident about fake nodes.)
This is two 60% tests in a row. If two tests in a row say it's real, what is the probability of it being real? I.e., how confident should we now be?
18+8 nodes tested real twice. Of those 18 are actually real ==> 18/(18+8) = 0.692. Now we are 69.2% sure the node is real. This is the same, btw, as the Bayes eqn in Sapienza: 0.6*0.6/(0.6*0.6+0.4*0.4) = 0.692.
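The two-source count above can be checked with a few lines of Python. This is only a sketch under the assumptions already made in the example (a 50/50 split of real and fake nodes, independent sources, full trust); the function name `combine_independent_reports` is made up for illustration and does not come from the Sapienza paper.

```python
from fractions import Fraction


def combine_independent_reports(confidences):
    """Probability a node is real when every source says 'real'.

    Assumes a 50/50 prior (50 real, 50 fake nodes) and independent,
    fully trusted sources, as in the worked example.
    """
    p_all_say_real_given_real = Fraction(1)
    p_all_say_real_given_fake = Fraction(1)
    for c in confidences:
        c = Fraction(str(c))  # exact arithmetic, e.g. 0.6 -> 3/5
        p_all_say_real_given_real *= c
        p_all_say_real_given_fake *= 1 - c
    total = p_all_say_real_given_real + p_all_say_real_given_fake
    return p_all_say_real_given_real / total


posterior = combine_independent_reports([0.6, 0.6])
print(posterior, float(posterior))  # 9/13, about 0.692

# Same number from the node counts: 18 real and 8 fake nodes tested real twice.
assert posterior == Fraction(18, 18 + 8)
```

Passing a single confidence, `combine_independent_reports([0.6])`, returns 3/5, which matches the "just believe the source" case for one fully trusted source.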
Now let's build the trust factor into the eqn. Let's suppose I trust Source 1 90%. This means 10% of the time my source will report randomly that the node is real or fake no matter what their test says it is.
SOURCE 1 WITH TRUST:
100 Nodes With Trust
S1: 60% confidence, 90% trust (90% of reported answers are the same as the test and 10% are random)
rTr = reported real, rTf = reported fake
         Test     Reported                             Totals
50 R ==> 30 Tr    (27 rTr, 1.5 rTr, 1.5 rTf)    ==>    29.5 rTr
         20 Tf    (18 rTf, 1 rTr, 1 rTf)               20.5 rTf
50 F ==> 30 Tf    (27 rTf, 1.5 rTf, 1.5 rTr)    ==>    29.5 rTf
         20 Tr    (18 rTr, 1 rTr, 1 rTf)               20.5 rTr
If the first test REPORTS that the node is real, what is the probability of it really being real? We have 27 + 1.5 + 1 = 29.5 reportedly real tests which are actually real, and 29.5 + 1 + 18 + 1.5 = 50 reportedly real tests in total. 29.5 / 50 = 0.59 = 59%. The trust factor of 90% should make us 59% confident in our results, down from 60%.
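The same bookkeeping can be reproduced in code. The sketch below assumes, as in the example, that the untrusted share of reports is a fair coin flip between "real" and "fake" regardless of the test result; the function name `reported_real_counts` and its parameters are hypothetical and chosen only for this illustration.

```python
def reported_real_counts(n_real, n_fake, confidence, trust):
    """Expected number of 'real' reports, split by the node's true type.

    Assumption: a fraction `trust` of reports repeats the test result and
    the remaining (1 - trust) is a 50/50 random answer.
    """
    untrusted = 1 - trust
    # Real nodes: tests say real with probability `confidence`.
    real_reported_real = (
        n_real * confidence * trust                 # test real, reported as tested
        + n_real * confidence * untrusted / 2       # test real, random answer 'real'
        + n_real * (1 - confidence) * untrusted / 2  # test fake, random answer 'real'
    )
    # Fake nodes: tests say real with probability (1 - confidence).
    fake_reported_real = (
        n_fake * (1 - confidence) * trust
        + n_fake * (1 - confidence) * untrusted / 2
        + n_fake * confidence * untrusted / 2
    )
    return real_reported_real, fake_reported_real


real_rr, fake_rr = reported_real_counts(50, 50, confidence=0.6, trust=0.9)
confidence_after_trust = real_rr / (real_rr + fake_rr)
print(f"{real_rr:.1f} {fake_rr:.1f} {confidence_after_trust:.3f}")  # 29.5 20.5 0.590
```

Setting `trust=1.0` gives back the plain 60% confidence, and `trust=0.0` gives 50%, i.e. a source that tells us nothing.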
Now let's show that this is equivalent to the Trust equation in the Sapienza paper:
( 27 + 1.5 + 1 ) / ( 27 + 1.5 + 1 + 18 + 1 + 1.5 ) = 0.59

( 30*0.9 + 30*(1-0.9)/2 + 20*(1-0.9)/2 ) / ( 30*0.9 + 30*(1-0.9)/2 + 20*(1-0.9)/2 + 20*0.9 + 20*(1-0.9)/2 + 30*(1-0.9)/2 ) = 0.59

( 50*0.6*0.9 + 50*0.6*(1-0.9)/2 + 50*0.4*(1-0.9)/2 ) / ( 50*0.6*0.9 + 50*0.6*(1-0.9)/2 + 50*0.4*(1-0.9)/2 + 50*0.4*0.9 + 50*0.4*(1-0.9)/2 + 50*0.6*(1-0.9)/2 )

We can cancel out the 50:

( 0.6*0.9 + 0.6*(1-0.9)/2 + 0.4*(1-0.9)/2 ) / ( 0.6*0.9 + 0.6*(1-0.9)/2 + 0.4*(1-0.9)/2 + 0.4*0.9 + 0.4*(1-0.9)/2 + 0.6*(1-0.9)/2 )
Write T for the trust (0.9 here) and just look at the numerator for now:

( 0.6*T + 0.6*(1-T)/2 + 0.4*(1-T)/2 )
= ( 0.6*T + 0.3 - 0.3*T + 0.2 - 0.2*T )
= ( 0.6*T + 0.5 - 0.5*T )
= ( 0.5 + (0.6 - 0.5)*T )
= ( Pnom + (P - Pnom)*T )   where P = 0.6 and Pnom = 0.5 (the 50/50 value of a random answer)

Now add the denominator back in:

( Pnom + (P - Pnom)*T ) / ( Pnom + (P - Pnom)*T + 0.4*T + 0.4*(1-T)/2 + 0.6*(1-T)/2 )

The remaining terms of the denominator simplify the same way:

0.4*T + 0.4*(1-T)/2 + 0.6*(1-T)/2
= 0.4*T + 0.2 - 0.2*T + 0.3 - 0.3*T
= 0.2*T - 0.3*T + 0.5
= 0.5 - 0.1*T
= Pnom + (P - Pnom)*T   where P = 0.4 (the 2nd prob, call it Pb)

Calling the first probability Pa, the full expression is:

( Pnom + (Pa - Pnom)*T ) / ( Pnom + (Pa - Pnom)*T + Pnom + (Pb - Pnom)*T )

This is the same as Pa/(Pa+Pb), except with all probabilities adjusted by Trust.
This shows that the Trust eqn. in Sapienza is equivalent to answering randomly for the untrusted part.
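As a numeric sanity check on the algebra, the sketch below compares the trust-adjusted form Pnom + (P - Pnom)*T with the "answer randomly for the untrusted part" model over a small grid of confidences and trust values. It assumes a binary question, so Pb = 1 - Pa and Pnom = 0.5; the function names are hypothetical and the code only tests the expressions as derived on this page, not the paper's implementation.

```python
import math

P_NOM = 0.5  # assumed nominal probability for a two-way (real/fake) question


def trust_adjusted(p, trust, p_nom=P_NOM):
    """Probability pulled toward p_nom by the trust factor."""
    return p_nom + (p - p_nom) * trust


def random_answer_model(p, trust):
    """Confidence when the untrusted share of reports is a coin flip."""
    untrusted = 1 - trust
    # Reports agreeing with the claim and actually correct.
    correct = p * trust + p * untrusted / 2 + (1 - p) * untrusted / 2
    # Reports agreeing with the claim but actually wrong.
    wrong = (1 - p) * trust + (1 - p) * untrusted / 2 + p * untrusted / 2
    return correct / (correct + wrong)


for p in (0.5, 0.6, 0.75, 0.9, 1.0):
    for t in (0.0, 0.25, 0.5, 0.9, 1.0):
        pa = trust_adjusted(p, t)
        pb = trust_adjusted(1 - p, t)
        assert math.isclose(pa / (pa + pb), random_answer_model(p, t))

print(f"{trust_adjusted(0.6, 0.9):.3f}")  # 0.590, matching the worked example
```

The grid is arbitrary; any confidence between 0 and 1 and any trust between 0 and 1 should pass, since both expressions reduce to 0.5 + (P - 0.5)*T.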