Trust-weighted histograms


Main article: Aggregation techniques

A proposal for trust-weighted histograms

This algorithm behaves differently from previous algorithms:

  • the output of this algorithm is a histogram that could be presented to the user as-is
  • previous algorithms used the trust factor to pull stronger opinions towards the center. In other words, previous algorithms would treat a source who was 50% confident in their own opinion (a personal opinion of 25% or 75%) but whom we trusted at 50% exactly the same as a source who was 25% confident in their own opinion (a personal opinion of 37.5% or 62.5%) but whom we trusted at 100% – both would be weighted at 37.5% or 62.5% (see the sketch after this list). This algorithm preserves each source’s “confidence” in the final result – if every source in your graph has 100% confidence in one outcome or the other, the resulting histogram will only have nonzero values in the bins containing the extreme values of 0% and 100%.
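
To make the contrast concrete, here is a minimal sketch in Python. It assumes the earlier algorithms computed a trust-weighted opinion as 0.5 + trust × (opinion − 0.5), which is consistent with the numbers above but is an assumption here; the function name is illustrative.

def pull_toward_center(opinion: float, trust: float) -> float:
    # Assumed form of the earlier aggregation: the trust factor pulls
    # the opinion toward the 50% midpoint.
    return 0.5 + trust * (opinion - 0.5)

# A 50%-confident source (personal opinion 25%) trusted at 50%...
print(pull_toward_center(0.25, 0.5))    # 0.375
# ...is indistinguishable from a 25%-confident source (opinion 37.5%) trusted at 100%.
print(pull_toward_center(0.375, 1.0))   # 0.375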
setup

Assume a simple “a or b” predicate, and any node that has a personal opinion expresses it as a probability from 0% to 100%. If a personal opinion is 0%, it means they are completely confident that the answer is “a”. If their personal opinion is 100%, it means that they are completely confident that the answer is “b”. If it’s 50%, they have no idea and probably shouldn’t be wasting your time by answering.

Instead of reporting a single answer for their computed opinion, each node will report a histogram.

algorithm

We decide on a number of “bins”; let’s assume we choose 10, defined the obvious way (a small helper for mapping an opinion to its bin is sketched after this list):

  • bin 1: 0% to 10%
  • bin 2: 10% to 20%
  • etc.
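
As a concrete reading of “the obvious way”, here is a minimal bin-index helper in Python, assuming ten equal-width bins with 100% falling in the last bin; the function name is illustrative.

def opinion_to_bin(opinion: float, num_bins: int = 10) -> int:
    # Map an opinion in [0.0, 1.0] to a bin index in [0, num_bins - 1];
    # 1.0 (100%) lands in the last bin rather than overflowing past it.
    return min(int(opinion * num_bins), num_bins - 1)

# opinion_to_bin(0.0)  -> 0  (the 0%-10% bin)
# opinion_to_bin(0.25) -> 2  (the 20%-30% bin)
# opinion_to_bin(1.0)  -> 9  (the 90%-100% bin)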

To generate a computed answer, each node will generate a trust-weighted histogram as follows:

  • start with a computed opinion of ten zeros
  • if they have a personal opinion, set the bin containing their personal answer to 1 (the node trusts itself completely)
  • iterate through the sources the node trusts. Each of those nodes will be providing their own trust-weighted histogram. For each source:
    • for each bin:
      • multiply the source’s value for that bin by your trust factor for that source. Add it to the bin
  • normalize the histogram by dividing all bins by the highest value in any bin (if the highest value is zero, you may want to skip this step). Note: this “normalize to make the highest value = 1” was my initial assumption, but we should investigate alternatives, like “normalize to make the area under the histogram = 1”. A sketch of the full procedure in Python follows this list.
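
Here is a minimal sketch of the full procedure, under stated assumptions: each trusted source is represented as a (trust factor, histogram) pair whose histogram has already been computed, a node without a personal opinion passes None, and it reuses the opinion_to_bin helper sketched above. The function name is illustrative, not part of any existing codebase.

from typing import Optional, Sequence, Tuple

def trust_weighted_histogram(
    personal_opinion: Optional[float],
    sources: Sequence[Tuple[float, Sequence[float]]],
    num_bins: int = 10,
) -> list:
    # start with a computed opinion of num_bins zeros
    hist = [0.0] * num_bins

    # if the node has a personal opinion, set that bin to 1
    # (the node trusts itself completely)
    if personal_opinion is not None:
        hist[opinion_to_bin(personal_opinion, num_bins)] = 1.0

    # for each trusted source, scale its histogram by the trust factor
    # for that source and accumulate it bin by bin
    for trust, source_hist in sources:
        for i, value in enumerate(source_hist):
            hist[i] += trust * value

    # normalize so the highest bin equals 1; skip if everything is zero.
    # An alternative would be normalizing so the bins sum to 1.
    peak = max(hist)
    if peak > 0:
        hist = [value / peak for value in hist]
    return hist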
example

Let’s work Pete’s example:

Starting at the leaf nodes, the nodes will just return a histogram containing 1 in the bin with their personal opinion, and 0 elsewhere:

On the middle layer of the graph, it gets more interesting:

Node 2 starts with its own personal opinion, in the form of a histogram: . Then it takes the opinion of node , scales it by to get , and adds it to its personal opinion to get: . It repeats the process with node , scaling ’s histogram down to and adding it to the running total to get . Finally, it would scale the histogram, but the highest value is already so no scaling is necessary, so the final result is:

Node 3 and 4 repeat the same process to get:

Finally, node gets their say. They start with . Then they scale node ’s computed opinion by the trust factor to get: , and add it to their personal opinion to get . They do the same with ’s computed opinion, scaled to , and accumulating it to get . Scaling ’s yields , and adding it gives: . Finally, they scale their answer back down, dividing by to get the final answer of:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Final Normalized Result",
  "width": 640,
  "data": {
    "values": [
      {"bin": "0%", "weight": 0}, {"bin": "10%", "weight": 0}, {"bin": "20%", "weight": 0.56},
      {"bin": "30%", "weight": 0.5}, {"bin": "40%", "weight": 1}, {"bin": "50%", "weight": 0.45},
      {"bin": "60%", "weight": 0.9}, {"bin": "70%", "weight": 0.45}, {"bin": "80%", "weight": 0.45}, {"bin": "90%", "weight": 0.45}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "bin", "type": "nominal", "axis": {"labelAngle": 0}},
    "y": {"field": "weight", "type": "quantitative"}
  }
}

This shows a slight preference for the 40%–50% interval, because two sources favored it. Two sources also favored the 60%–70% interval, but those sources were more distant, so their effect was watered down.
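
As a usage note, the sketch above could be applied to a hypothetical three-node chain like the following; all opinions and trust factors here are made up for illustration and are unrelated to Pete's example.

# Hypothetical chain: C trusts B at 0.5, B trusts A at 0.8.
hist_a = trust_weighted_histogram(personal_opinion=0.15, sources=[])
hist_b = trust_weighted_histogram(personal_opinion=0.65, sources=[(0.8, hist_a)])
hist_c = trust_weighted_histogram(personal_opinion=None, sources=[(0.5, hist_b)])
print(hist_c)  # nonzero weight only in the bins containing 15% and 65%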

A variant on this algorithm allows us to ask more than simple predicate questions.