Notes on using the algorithm interface

From Information Rating System Wiki
Latest revision as of 19:04, 26 September 2024

Main article: Technical overview of the ratings system

Introduction

An algorithms.py (the interface) and a custom_algo.py (user implementations of their algorithms) have been provided. We've added some algorithms to custom_algo.py and made a couple of suggested changes to algorithms.py. We will discuss these here, among other things.

Both these files are located in the snippets:

algorithms.py

custom_algo.py

Administrative Details

I had to upgrade my version of Python from 3.8.10 to 3.10.11 (Windows) to support subscripted types, e.g.:

class OpinionData:
    pdf_points: list[float]

The list[float] is an example of a subscripted type. Python 3.9 should also work but 3.10 is the version being used for development.

The exact error you get before upgrading Python is: builtins.TypeError: 'type' object is not subscriptable
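If upgrading Python is not convenient, one possible workaround (a sketch, not part of the provided code, shown here with a cut-down OpinionData for illustration) is to postpone evaluation of annotations, which lets list[float] parse in annotation positions even on older interpreters:

```python
# Hypothetical sketch: postponed annotation evaluation (PEP 563) makes
# list[float] legal as an annotation before Python 3.9. Note this only
# helps annotations; subscripting list at runtime would still fail on 3.8.
# (The provided code targets 3.10, where no workaround is needed.)
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class OpinionData:
    pdf_points: list[float]

od = OpinionData([0.5, 0.5])
print(od.pdf_points)  # [0.5, 0.5]
```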

I also had to pip install loguru in my Python 3.10 installation. This is not critical because you can always just remove the logging (a single line in the algorithms.py code).

Thereafter everything worked fine. This discussion will now focus on the details of integrating custom algorithms custom_algo.py with the interface, algorithms.py.

Pre-existing Algorithms

We have provided two algorithms to go with the interface (in algorithms.py), sapienza_bayes_v1 and trust_weighted_average_v1. They both appear to work and give correct results. There is some concern that sapienza_bayes_v1 has an error because it gives two peaks rather than one when running the Binned and continuous distributions example. We will investigate whether this is in fact an error and, if so, fix it. For now we are concerned with the process of getting algorithms into the interface and uploading them to the main server code.

straight_average and straight_average_intermediate algorithm

A straight_average algorithm was added to custom_algo.py. This implements the ideas in A simple averaging technique to supplement the Bayes equation. It does not handle multiple levels, however, because the averaged output in this case is not automatically the input to the next level (as it is in Bayes). Using the averaged output as the input to the next level results in an average of averages which is not the same as averaging all the probabilities in the population. See the discussion in the above link under ‘Combining input probabilities in simple averaging’ to understand this better.
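The average-of-averages pitfall is easy to see with a small numeric sketch (illustrative numbers only, unrelated to the worked examples below):

```python
# Two sub-groups of unequal size: averaging the sub-group averages is
# not the same as averaging the whole population.
def avg(xs):
    return sum(xs) / len(xs)

group_a = [0.5, 0.6, 0.7]   # average = 0.6
group_b = [0.8, 0.9]        # average = 0.85

avg_of_avgs = avg([avg(group_a), avg(group_b)])   # 0.725
population_avg = avg(group_a + group_b)           # 0.7

print(avg_of_avgs, population_avg)
```

Because the sub-groups have different sizes, the average of averages over-weights the smaller group, which is exactly why the averaged output cannot simply be fed in as an ordinary input at the next level.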

A proposed way to handle this issue involves the straight_average_intermediate algorithm which is very similar to straight_average but uses intermediate_results to allow for correctly sending results to the next level up.

For this to work, an intermediate_results field is added to ComponentData (on the input side):

class ComponentData:
    opinion: Optional[OpinionData]
    trust_factor: float
    intermediate_results: list

and to AlgorithmOutput (on the output side):

class AlgorithmOutput:
    opinion: OpinionData
    intermediate_results: list

Calculations now proceed with the input intermediate_results and result in new output intermediate_results which then become the input to the next level. Of course, the calculated OpinionData, which is what is actually of interest to the user, is also produced.

In essence, for the case of straight averaging, the intermediate_results are just the Sapienza trust-modified probabilities of every source in the sub-group to be analyzed. The sub-group in this case is a node and its direct descendants (children). The average of the group is calculated and that plus the list of probabilities (modified by trust) become the results for the next level. Two examples of a multi-level tree were created manually to test this concept (just run custom_algo.py). In real life, transferring results between nodes will presumably be handled by the server code, so this is really just an experiment to check that the concept is sound.
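A rough sketch of this flow, assuming the Sapienza trust modification takes the form p' = t*p + (1 - t)*0.5 (an assumption on our part; the actual implementation is in custom_algo.py) and representing each component as a bare tuple rather than a ComponentData:

```python
# Conceptual sketch of straight_average_intermediate. A node that
# carries intermediate_results contributes those (trust-modified again)
# instead of a single opinion, so the top-level average is taken over
# the whole population rather than over sub-group averages.
def trust_modified(p, t):
    return t * p + (1 - t) * 0.5

def straight_average_sketch(components):
    """components: list of (probability, trust_factor, intermediate_results)."""
    probs = []
    for p, t, inter in components:
        if inter:   # node summarizes a lower-level sub-group
            probs.extend(trust_modified(q, t) for q in inter)
        else:       # leaf node: just its own trust-modified opinion
            probs.append(trust_modified(p, t))
    # The average is the opinion; the full list of trust-modified
    # probabilities becomes the intermediate_results for the next level.
    return sum(probs) / len(probs), probs

# Three levels, using the numbers from the worked example below:
_, inter134 = straight_average_sketch([(0.5, 1.0, []), (0.6, 0.9, []), (0.7, 0.9, [])])
_, inter256 = straight_average_sketch([(0.5, 1.0, []), (0.8, 0.9, []), (0.9, 0.9, [])])
p_ave, _ = straight_average_sketch(
    [(0.5, 1.0, []), (0.5, 0.9, inter134), (0.5, 0.9, inter256)])
print(round(p_ave, 3))  # 0.616
```

Under these assumptions the sketch reproduces the final result from A simple averaging technique to supplement the Bayes equation (P_ave = 0.616).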

A privacy-enhancing variant of this algorithm can also be created in which each sub-group’s average is broken out as a numerator, denominator, and number of nodes. These three pieces would then constitute the intermediate_results. The numerator could then, as a whole, be trust-modified for the next level up calculation. Doing so would require a change to the Sapienza trust-modification since it is now a function of the number of nodes. The trust-modified numerator would be added to other numerators at the next level up, denominators would be added, and so on. Doing this would obscure a particular node’s trust information, which can currently be back-calculated by a parent node from the node’s probability.
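To make the node-count dependence concrete, here is a hypothetical sketch (again assuming the per-node modification p' = t*p + (1 - t)*0.5, which summed over n nodes gives t*numerator + (1 - t)*0.5*n):

```python
def trust_modified(p, t):
    # Assumed per-node Sapienza trust modification.
    return t * p + (1 - t) * 0.5

def modify_numerator(num, n, t):
    # Trust modification of a whole sub-group numerator; note the
    # explicit dependence on the node count n, as discussed above.
    return t * num + (1 - t) * 0.5 * n

# A sub-group reports only (numerator, node count); the parent never
# sees the individual trust-modified probabilities behind them.
probs = [0.5, 0.59, 0.68]        # hidden inside the sub-group
num, n = sum(probs), len(probs)  # what the parent actually receives
# Modifying the numerator as a whole equals modifying each probability:
assert abs(modify_numerator(num, n, 0.9)
           - sum(trust_modified(p, 0.9) for p in probs)) < 1e-12
print(round(modify_numerator(num, n, 0.9), 3))  # 1.743
```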

Example of using straight_average_intermediate

Let’s start by using the case from A simple averaging technique to supplement the Bayes equation:

Following straight_average_intermediate -- TEST1 in the snippet custom_algo.py, we divide the calculation into three sub-groups: 1,3,4 and 2,5,6 at the bottom level, followed by 0,1,2 at the top level.

For sub-group 1,3,4 the opinion and components are defined according to the diagram, noting that Node 1 trusts itself fully (trust_factor = 1) and that intermediate_results is empty because we are at the bottom level.

opinion1 = OpinionData([0.5,0.5], 1)
opinion3 = OpinionData([0.6,0.4], 1)
opinion4 = OpinionData([0.7,0.3], 1)
intermediate_results = []
comp1 = ComponentData(opinion1, 1.0, intermediate_results)
comp3 = ComponentData(opinion3, 0.9, intermediate_results)
comp4 = ComponentData(opinion4, 0.9, intermediate_results)

Next we define AlgorithmInput and perform the calculation for the sub-group:

alginp134 = AlgorithmInput([comp1, comp3, comp4],{})
output134 = straight_average_intermediate(alginp134)

At this point we have intermediate results for the output of the sub-group which can be passed on as input to subsequent calculations. Since comp1 will be in those calculations, we set its intermediate_results as follows:

comp1.intermediate_results = output134.intermediate_results

Next is the 2,5,6 sub-group which is essentially the same except with different probabilities:

opinion2 = OpinionData([0.5,0.5], 1)
opinion5 = OpinionData([0.8,0.2], 1)
opinion6 = OpinionData([0.9,0.1], 1)
intermediate_results = []
comp2 = ComponentData(opinion2, 1.0, intermediate_results)
comp5 = ComponentData(opinion5, 0.9, intermediate_results)
comp6 = ComponentData(opinion6, 0.9, intermediate_results)
alginp256 = AlgorithmInput([comp2, comp5, comp6],{})
output256 = straight_average_intermediate(alginp256)
print('output256 = ', output256.opinion.pdf_points)
print('inter_results256 = ', output256.intermediate_results)
comp2.intermediate_results = output256.intermediate_results

Now we are ready to move a level up and do the 0,1,2 sub-group. Since we haven’t defined comp0 yet we do so as follows, noting that it starts with no intermediate_results:

opinion0 = OpinionData([0.5,0.5], 1)
intermediate_results = []
comp0 = ComponentData(opinion0, 1.0, intermediate_results)

We also reset the trust_factor for comp1 and comp2 because now we are representing the trust Node 0 has for them, rather than the trust they have for themselves (as we did previously):

comp1.trust_factor = 0.9
comp2.trust_factor = 0.9

Next is to define the AlgorithmInput and do the calculation:

alginp012 = AlgorithmInput([comp0, comp1, comp2],{})
output012 = straight_average_intermediate(alginp012)

Since we’re at the top node, this result contains the final average result of the calculation and is seen in the following print statement:

print('output012 = ', output012.opinion.pdf_points)

We don’t need the following step since we’re done, but if Node 0 were required to transmit its results to a yet higher-level node, we would set:

comp0.intermediate_results = output012.intermediate_results

Running the snippet custom_algo.py for this case gives an overall P_ave = 0.616, same as in A simple averaging technique to supplement the Bayes equation.

Trust-weighted histogram algorithm

Also added was the trust-weighted histogram algorithm. This algorithm requires an additional input, the number of bins, so a misc_input field was added to AlgorithmInput to include this easily:

class AlgorithmInput:
    components: list[ComponentData]
    misc_input: dict

The user (or algorithm developer) would then include this when creating an AlgorithmInput, e.g.:

alginp = AlgorithmInput([comp1, comp2, comp3, comp4], {'Nbins':10})

Example problem for Trust-weighted Histogram Algorithm

The example problem is the same as the one solved in Trust-weighted histograms:

First we start with Nodes 2,5,6 which have no intermediate_results since they are the bottom level. Node 2 trusts itself completely so its trust_factor will be 1. Nodes 5 and 6 have trust_factor set to 0.9 which is Node 2’s trust for them. The opinions and components are created as usual:

intermediate_results = []
opinion2 = OpinionData([0.3,0.7], 1)
opinion5 = OpinionData([0.6,0.4], 1)
opinion6 = OpinionData([0.9,0.1], 1)
comp2 = ComponentData(opinion2, 1.0, intermediate_results)
comp5 = ComponentData(opinion5, 0.9, intermediate_results)
comp6 = ComponentData(opinion6, 0.9, intermediate_results)

Next the AlgorithmInput is created, noting that we are choosing to use 10 bins for the histogram, inserted into the misc_input field discussed above. The calculation then proceeds:

alginp256 = AlgorithmInput([comp2, comp5, comp6], {'Nbins':10})
output256 = trust_weighted_histogram(alginp256)

Since Node 2 will be used for the next level, we set its intermediate_results:

comp2.intermediate_results = output256.intermediate_results

In similar fashion, Nodes 3,7,8 and Nodes 4,9,10 are computed. Once this is done, we can move to the next level up and do Nodes 1,2,3,4. First we modify the trust_factor for 2,3,4 to be 0.9 which is Node 1’s trust for them rather than their trust for themselves:

comp2.trust_factor = 0.9
comp3.trust_factor = 0.9
comp4.trust_factor = 0.9

Next, comp1 needs to be created, starting with no intermediate_results:

intermediate_results = []
opinion1 = OpinionData([0.2,0.8], 1)
comp1 = ComponentData(opinion1, 1.0, intermediate_results)

Next is the creation of the AlgorithmInput and the calculation itself:

alginp1234 = AlgorithmInput([comp1, comp2, comp3, comp4], {'Nbins':10})
output1234 = trust_weighted_histogram(alginp1234)

At this point we’re done. Running the snippet custom_algo.py yields the expected result, same as in Trust-weighted histograms, for Node 1’s computed histogram:

[0.0, 0.0, 0.5555555555555556, 0.5, 1.0, 0.45, 0.9, 0.45, 0.45, 0.45]

If Node 1 needed to feed its results to the next higher level, its intermediate_results would need to be updated:

comp1.intermediate_results = output1234.intermediate_results

A comment on intermediate_results

For the trust-weighted histogram algorithm the intermediate_results are really the same thing as the output, so it is possible that these could be folded together without the extra variable. However, it is convenient to use this variable for differentiating between nodes that have not yet been binned (because they are the first nodes in the calculation) and those that have: the latter simply use their intermediate_results, while the former undergo a binning process (done in GetHForAllLeafNodes). For the straight average algorithm, however, the output and the intermediate_results are completely different.

The method by which we update intermediate_results here could be improved by doing it automatically, within the calculation. We refrained from that, however, because doing so is the province of the server code, which is responsible for understanding the tree’s overall configuration and how results should pass between the nodes.

Also, the intermediate_results really belong to the parent component of each sub-group. This just happens to be the first component passed to AlgorithmInput in the cases above, but there is no mechanism to enforce that. We could, of course, have set every component’s intermediate_results to the parent component’s, and things would have worked. But this seems like a shortcut that might inadvertently cause problems later and be hard to detect. A better practice appears to be setting variables explicitly at the point where we know we will need them for a subsequent calculation.