Avoiding feedback

Revision as of 15:32, 25 September 2024

Main article: Technical overview of the ratings system

Methods of preventing feedback

Say we have a fully-connected trust network that looks like:

And alice wants bob’s computed opinion. A careless implementation could run into an infinite loop, where alice asks bob for his computed opinion, who asks carol for her computed opinion, who asks alice for her computed opinion, who asks bob…

This also means that alice can get bob’s and carol’s opinions twice – once directly, once indirectly through the other. This is acceptable, and probably desirable. If we add some trust factors to this network like so:

We can see that alice doesn’t trust carol much, so she will assign a low weight to carol’s direct opinions. But when she asks bob for his opinion, bob rates carol highly, so his computed opinion will include carol’s opinion, and carol’s influence on alice’s computed opinion will probably be significantly higher through bob than when presented directly to alice. But we can say that alice has placed high trust in bob because bob’s computed opinions are usually right, regardless of how he comes up with them. She doesn’t know that he trusts carol, and doesn’t need to. So getting carol’s opinion through bob is probably a good thing, as her low direct trust in carol may be a mistake. Or maybe this is a situation where we have different trust factors for different domains: alice is asking a car question, and bob knows that carol is a mechanic so has a high trust factor on automotive predicates, but alice just knows her as someone with a reputation for being prone to groupthink on culture-war issues.
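To make the weighting concrete, here is a toy calculation with made-up trust weights (the diagram’s actual values aren’t reproduced here): alice trusts bob at 0.9 and carol at 0.1, bob trusts carol at 0.8, and computed opinions are assumed to be simple trust-weighted averages.

```python
# Toy numbers, assuming trust-weighted averaging. The weights are
# invented for illustration: alice->bob 0.9, alice->carol 0.1,
# bob->carol 0.8. Opinions are floats in [0, 1].
carol_opinion = 1.0   # carol's personal opinion
bob_personal = 0.0    # bob's personal opinion

# bob's computed opinion mixes his own view with carol's, weighted by trust:
bob_computed = (bob_personal + 0.8 * carol_opinion) / (1 + 0.8)

# alice combines bob's computed opinion with carol's direct opinion:
alice_computed = (0.9 * bob_computed + 0.1 * carol_opinion) / (0.9 + 0.1)

# Of alice's total, carol's indirect share through bob is
# 0.9 * (0.8 / 1.8) = 0.4 -- four times her direct share of 0.1.
```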

Depth Limit

A simple way to avoid infinite recursion is to put in a depth limit. When alice asks bob and carol for their computed opinions, she will set a depth limit of, say, depth = 4. Then, when bob gets the request, he would ask the nodes he trusts for their computed opinions, but pass depth - 1 to alice and carol, and so on. Eventually, if you are asked for a computation with depth = 0, you just return your own personal opinion, or null if you don’t have one. The initial depth limit would need to be set to roughly the depth of an acyclic version of your network so there aren’t any nodes you can’t reach because they’re too far away.
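A sketch of this scheme in Python (the Node class, the trust weights, and the trust-weighted-average combining rule are illustrative assumptions; the article doesn’t prescribe how opinions are combined):

```python
class Node:
    """One participant in the trust network (illustrative sketch)."""

    def __init__(self, name, personal_opinion=None):
        self.name = name
        self.personal_opinion = personal_opinion  # e.g. a float in [0, 1], or None
        self.trusted = {}  # Node -> trust weight

    def computed_opinion(self, depth):
        # depth = 0: just return the personal opinion (possibly None/null).
        if depth == 0:
            return self.personal_opinion
        # Otherwise ask every trusted node, passing depth - 1, and fold
        # the answers into a trust-weighted average with our own opinion.
        total = weight = 0.0
        if self.personal_opinion is not None:
            total += self.personal_opinion
            weight += 1.0
        for node, trust in self.trusted.items():
            opinion = node.computed_opinion(depth - 1)
            if opinion is not None:
                total += trust * opinion
                weight += trust
        return total / weight if weight > 0 else None
```

The recursion always terminates because depth shrinks by one per hop, but in a cyclic network the same nodes get re-asked, which is exactly the multiple-counting discussed under Cons.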

Pros:

  • simple to implement
  • doesn’t require any node identifiers
  • low network overhead (1 byte?) and low computing overhead
  • results are stable – if you ask the same question on the same network multiple times, you’ll get the same answer.

Cons:

  • people’s opinions are counted multiple times. My gut feeling is that people near you tend to get relatively more weight than those farther away from you.
  • you have to balance your initial value of depth somewhere between cutting off distant sources of information and over-valuing nearby information

Notes:

  • The ideal value for the three-person fully-connected network would be depth = 1, and even that may have alice getting her own personal opinion fed back to her unless the algorithm has a way for alice to tell bob not to turn around and ask her right back, which would be a step towards a node identifier.
  • For privacy, you may want to stop at depth = 1 in the algorithm as described above, because allowing depth = 0 requests directly exposes your personal opinion.
  • My ten-seconds-of-effort Google search turned up these statistics on social networks: https://miro.medium.com/v2/resize:fit:4800/format:webp/1*rkITSNe7ngMh8vmQexayQw.png The highlight: for Twitter, the average path length was 2.6 and the diameter was 18. Other services had somewhat higher average path lengths and somewhat lower diameters, but the difference was still at least a factor of 3.

Unique identifier for each inquiry

This has a few variants. The property they all have in common is that when alice starts trying to figure out her computed opinion, she assigns a unique identifier to the computation, and passes that identifier to bob and carol with her requests. If a node gets a request with an identifier it has never seen before, it does a full computation. If it gets a request with an identifier it has seen before, it does something less. That could be:

  • returning null, effectively saying “I have no opinion on this, because my opinion has already been accounted for in this calculation”
  • returning the same (cached) result they returned the first time, when they did the full computation
  • returning their own personal opinion
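A sketch of the first variant (return null on a repeated identifier); the combining rule, a trust-weighted average, is an illustrative assumption:

```python
import uuid

class Node:
    """Participant sketch for the 'return null on a repeat' variant."""

    def __init__(self, name, personal_opinion=None):
        self.name = name
        self.personal_opinion = personal_opinion
        self.trusted = {}            # Node -> trust weight
        self.seen_inquiries = set()  # inquiry ids this node has processed

    def computed_opinion(self, inquiry_id):
        # A repeated identifier means our opinion is already accounted
        # for in this computation, so contribute nothing.
        if inquiry_id in self.seen_inquiries:
            return None
        self.seen_inquiries.add(inquiry_id)
        total = weight = 0.0
        if self.personal_opinion is not None:
            total += self.personal_opinion
            weight += 1.0
        for node, trust in self.trusted.items():
            opinion = node.computed_opinion(inquiry_id)
            if opinion is not None:
                total += trust * opinion
                weight += trust
        return total / weight if weight > 0 else None

# alice starts a computation by minting a fresh identifier:
# alice.computed_opinion(uuid.uuid4())
```

Note the instability listed under Cons: whichever neighbor’s request reaches carol first is the path her opinion flows back through.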

Pros:

  • low network overhead (16 bytes) and low computational overhead
  • doesn’t require node identifiers
  • doesn’t loop back and count someone’s opinions multiple times

Cons:

  • unstable: results change depending on the order in which nodes receive requests
  • some variants wouldn’t allow someone’s opinion to count multiple times, which we convinced ourselves was desirable

Passing the path

In this method, each time you send a request to a new node, you include the list of all upstream nodes, so it knows not to loop back. When a node receives a request and all of the nodes it trusts are listed in the request’s upstream nodes, it will just return its personal opinion, or null if it has none.
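A sketch of path-passing (again with an assumed trust-weighted average as the combining rule; node names double as the node identifiers):

```python
class Node:
    """Participant sketch for the path-passing scheme."""

    def __init__(self, name, personal_opinion=None):
        self.name = name
        self.personal_opinion = personal_opinion
        self.trusted = {}  # Node -> trust weight

    def computed_opinion(self, upstream=frozenset()):
        # If everyone we trust is already upstream, there is no one left
        # to ask: return our personal opinion (possibly None/null).
        askable = [n for n in self.trusted if n.name not in upstream]
        if not askable:
            return self.personal_opinion
        total = weight = 0.0
        if self.personal_opinion is not None:
            total += self.personal_opinion
            weight += 1.0
        # Add ourselves to the upstream list before asking downstream.
        downstream = upstream | {self.name}
        for node in askable:
            opinion = node.computed_opinion(downstream)
            if opinion is not None:
                total += self.trusted[node] * opinion
                weight += self.trusted[node]
        return total / weight if weight > 0 else None
```

Termination is guaranteed because the upstream set grows by one name per hop, and no request revisits a node already on its own path – while alice can still receive carol’s opinion both directly and through bob, since those are different paths.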

Pros:

  • stable
  • counts opinions twice only when we want it to (once directly, once through an intermediary)
  • doesn’t double-count anyone along a path or loop back

Cons:

  • requires node identifiers
  • higher network overhead for the list of node identifiers
  • list of upstream nodes leaks a lot of information

Privacy-preserving variant

To make the above more palatable, we can change it as follows:

  • Every time a node makes a request
    • they create a random UUID to identify themselves for this computation. They store it in a short-term cache so they’ll know it if they see it again.
    • they also generate a number of other decoy UUIDs that help obscure the trust paths. Maybe between 3 and 10 decoys per request.
    • then they send their request with both the real and decoy UUIDs (sorted lexicographically, say)
  • When a node receives a request, it checks to see if any of the UUIDs in the request are in its short-term cache of ids that represent itself.
    • if so, it returns null
    • if not, it computes its opinion normally. When it has to send requests to other nodes, it does the same thing the originating node did: create and cache a real UUID, create several decoys, and append them all to the list of UUIDs in the request it’s processing
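The steps above can be sketched as follows. The decoy count follows the 3–10 in the text; the combining rule is an assumed trust-weighted average; and a real implementation would send the ids sorted lexicographically so the fresh ones don’t stand out, which passing a set makes moot here.

```python
import random
import uuid

class Node:
    """Participant sketch for the privacy-preserving decoy-UUID variant."""

    def __init__(self, name, personal_opinion=None):
        self.name = name
        self.personal_opinion = personal_opinion
        self.trusted = {}    # Node -> trust weight
        self.my_ids = set()  # short-term cache of ids that mean "me"

    def handle_request(self, request_ids):
        # Any of our own ids in the request means our opinion is already
        # part of this computation: return null.
        if self.my_ids & request_ids:
            return None
        # Mint a real id plus 3-10 decoys and forward all of them.
        real = uuid.uuid4()
        self.my_ids.add(real)
        decoys = {uuid.uuid4() for _ in range(random.randint(3, 10))}
        forwarded = request_ids | {real} | decoys
        total = weight = 0.0
        if self.personal_opinion is not None:
            total += self.personal_opinion
            weight += 1.0
        for node, trust in self.trusted.items():
            opinion = node.handle_request(forwarded)
            if opinion is not None:
                total += trust * opinion
                weight += trust
        return total / weight if weight > 0 else None
```

alice starts a computation with handle_request(frozenset()). As the Cons note, a node can no longer tell which ids map to which neighbors, so it must contact every neighbor.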

Pros:

  • makes it much harder to figure out who is whom

Cons:

  • makes requests much larger on the network (figure what, maybe another 512 bytes on average?)
  • also increases the number of requests (a full new set of leaf nodes). In the non-privacy version, if carol was processing request(ignore=alice,bob), she knew not to contact alice. In the privacy variant, she needs to contact alice because she doesn’t know if any of the UUIDs represent alice.