Latest revision as of 15:43, 25 September 2024
Main article: Technical overview of the ratings system
Methods of preventing feedback
Say we have a fully-connected trust network that looks like:
And alice wants bob’s computed opinion. A careless implementation could run into an infinite loop, where alice asks bob for his computed opinion, who asks carol for her computed opinion, who asks alice for her computed opinion, who asks bob…
This also leads to the fact that alice can get bob and carol’s opinion twice – once directly, once indirectly through the other. This is acceptable, and probably desirable. If we add some trust factors to this network like so:
We can see that alice doesn’t trust carol much, so she will assign a low weight to carol’s direct opinions. But when she asks bob for his opinion, bob rates carol highly, so his computed opinion will include carol’s opinion, and carol’s influence on alice’s computed opinion will probably be significantly higher through bob than when presented directly to alice. But we can say that alice has assigned high trust to bob because bob’s computed opinions are usually right, regardless of how he comes up with them. She doesn’t know that he trusts carol, and doesn’t need to. So getting carol’s opinion through bob is probably a good thing, as her low direct trust in carol may be a mistake. Or maybe this is a situation where we have different trust factors for different domains: alice is asking a car question, and bob knows that carol is a mechanic so he has a high trust factor on automotive predicates, but alice just knows her as someone with a reputation for being prone to groupthink on culture-war issues.
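As an illustration of how trust factors might weight opinions, here is a minimal sketch; the specific trust values, the 0-to-1 opinion scale, and the weighted-average rule are all assumptions for illustration, not part of the design above.

```python
def combine(opinions, trust):
    """Trust-weighted average of neighbours' opinions (assumed rule)."""
    total = sum(trust[name] for name in opinions)
    return sum(trust[name] * op for name, op in opinions.items()) / total

# Hypothetical trust factors and opinions on a 0-to-1 scale:
trust = {"bob": 0.9, "carol": 0.2}      # alice trusts bob, not carol
opinions = {"bob": 0.8, "carol": 0.4}   # what each reported to alice
alice_view = combine(opinions, trust)   # dominated by bob's answer
```

If bob’s reported 0.8 is itself a computed opinion that leans on carol, carol’s influence reaches alice through bob’s weight of 0.9 rather than her own 0.2 – the effect described above.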
Depth limit
A simple way to avoid infinite recursion is to put in a depth limit. When alice asks bob and carol for their computed opinions, she will set a depth limit of, say, d. Then, when bob gets the request, he would ask the nodes he trusts for their computed opinions, but pass d − 1 to alice and carol, and so on. Eventually, if you are asked for a computation with d = 0, you just return your own personal opinion, or null if you don’t have one. The initial depth limit would need to be set to roughly the depth of an acyclic version of your network, so there aren’t any nodes you can’t reach because they’re too far away.
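A minimal sketch of the depth-limit scheme, assuming a three-node network, made-up personal opinions, and a naive unweighted mean as the aggregation rule:

```python
NETWORK = {                       # who trusts whom (fully connected triangle)
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob"],
}
PERSONAL = {"alice": 0.9, "bob": 0.5, "carol": 0.1}  # made-up opinions

def computed_opinion(node, d):
    """Return node's computed opinion with a remaining depth budget d."""
    if d == 0:
        return PERSONAL[node]              # base case: personal opinion only
    replies = [computed_opinion(peer, d - 1) for peer in NETWORK[node]]
    replies.append(PERSONAL[node])
    return sum(replies) / len(replies)     # naive aggregation (assumption)

answer = computed_opinion("bob", 2)        # alice asks bob with a depth limit of 2
```

Note that bob turns straight around and asks alice at the next depth down, so her own opinion is fed back to her – the multiple-counting problem listed under Cons.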
Pros:
- simple to implement
- doesn’t require any node identifiers
- low network overhead (1 byte?)/low computing overhead
- results are stable – if you ask the same question on the same network multiple times, you’ll get the same answer.
Cons:
- people’s opinions are counted multiple times; my gut feeling is that people near you tend to get relatively more weight than those farther away from you
- you have to balance your initial value of d somewhere between cutting off distant sources of information and over-valuing nearby information
Notes:
- The ideal value for the three-person fully-connected network would be d = 1, and even that may have alice getting her own personal opinion fed back to her, unless the algorithm has a way for alice to tell bob not to turn around and ask her right back, which would be a step towards a node identifier.
- For privacy, you may want to stop at d = 1 in the algorithm as described above, because allowing d = 0 requests directly exposes your personal opinion.
- My ten-seconds-of-effort Google search turned up these statistics on social networks: https://miro.medium.com/v2/resize:fit:4800/format:webp/1*rkITSNe7ngMh8vmQexayQw.png The highlight is that for Twitter, the average path length was 2.6 and the diameter was 18. Other services had somewhat higher average path lengths and somewhat lower diameters, but the difference was still at least a factor of 3.
Unique identifier for each inquiry
This has a few variants. The property they all have in common is that when alice starts trying to figure out her computed opinion, she assigns a unique identifier to the computation, and passes that identifier to bob and carol with her requests. If a node gets a request with an identifier it has never seen before, it does a full computation. If it gets a request with an identifier it has seen before, it does something less. That could be:
- returning null, effectively saying “I have no opinion on this, because my opinion has already been accounted for in this calculation”
- returning the same (cached) result they returned the first time they did the full computation
- returning their own personal opinion
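A sketch of the identifier scheme, using the return-null variant from the list above; the network, opinion values, and mean aggregation are the same illustrative assumptions as before:

```python
import uuid

NETWORK = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob"],
}
PERSONAL = {"alice": 0.9, "bob": 0.5, "carol": 0.1}  # made-up opinions
SEEN = {node: set() for node in NETWORK}   # each node's memory of inquiry ids

def computed_opinion(node, inquiry_id):
    if inquiry_id in SEEN[node]:
        return None               # "already accounted for in this calculation"
    SEEN[node].add(inquiry_id)
    replies = [computed_opinion(peer, inquiry_id) for peer in NETWORK[node]]
    replies = [r for r in replies if r is not None]   # drop null answers
    replies.append(PERSONAL[node])
    return sum(replies) / len(replies)

result = computed_opinion("alice", uuid.uuid4())   # fresh identifier per inquiry
```

The order-dependence listed under Cons is visible here: whichever peer alice happens to ask first does the deep computation, and the later peer just answers null because it has already seen the identifier.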
Pros:
- low network overhead (16 bytes). low computational overhead
- doesn’t require node identifiers
- doesn’t loop back and count someone’s opinions multiple times
Cons:
- unstable: results change depending on the order in which nodes receive requests
- some variants wouldn’t allow someone’s opinion to count multiple times, which we convinced ourselves was desirable
Passing the path
In this method, each time you send a request to a new node, you include the list of all upstream nodes, so it knows not to loop back. When a node receives a request and all of the nodes it trusts are listed in the request’s upstream nodes, it will just return its personal opinion, or null if it has none.
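A sketch of path-passing, again with the illustrative three-node network and mean aggregation; representing the upstream list as a frozenset is an implementation choice, not prescribed above:

```python
NETWORK = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob"],
}
PERSONAL = {"alice": 0.9, "bob": 0.5, "carol": 0.1}  # made-up opinions

def computed_opinion(node, upstream=frozenset()):
    peers = [p for p in NETWORK[node] if p not in upstream]
    if not peers:
        return PERSONAL[node]      # everyone this node trusts is upstream
    replies = [computed_opinion(p, upstream | {node}) for p in peers]
    replies.append(PERSONAL[node])
    return sum(replies) / len(replies)

answer = computed_opinion("alice")   # alice starts with an empty upstream list
```

In this run alice still receives carol’s opinion twice – once directly, once inside bob’s reply – but no request ever loops back to a node already on the path, and the result is the same no matter which peer answers first.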
Pros:
- stable
- only counts opinions twice when we want it to
- doesn’t double-count anyone or loop back
Cons:
- requires node identifiers
- higher network overhead for the list of node identifiers
- list of upstream nodes leaks a lot of information
Privacy-preserving variant
To make the above more palatable, we can change it as follows:
- Every time a node makes a request:
  - it creates a random UUID to identify itself for this computation, and stores it in a short-term cache so it will recognize it if it sees it again
  - it also generates a number of other decoy UUIDs that help obscure the trust paths – maybe between 3 and 10 decoys per request
  - then it sends its request with both the real and decoy UUIDs (sorted lexicographically, say)
- When a node receives a request, it checks to see if any of the UUIDs in the request are in its short-term cache of ids that represent itself:
  - if so, it returns null
  - if not, it computes its opinion normally. When it has to send requests to other nodes, it does the same thing the originating node did: create and cache a real UUID, create several decoys, and append them to the list of UUIDs in the request it’s processing
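The UUID handling above might look like this sketch; the 3-to-10 decoy range comes from the text, while the string representation and the set-based cache are assumptions:

```python
import random
import uuid

def extend_request(request_ids, my_cache):
    """Add our real id plus 3-10 decoys before forwarding a request."""
    real = str(uuid.uuid4())
    my_cache.add(real)                    # so we recognize it if it comes back
    decoys = {str(uuid.uuid4()) for _ in range(random.randint(3, 10))}
    return sorted(set(request_ids) | {real} | decoys)   # lexicographic order

def handle_request(request_ids, my_cache):
    """Return None on a loop, otherwise signal that a full computation runs."""
    if my_cache & set(request_ids):       # one of these UUIDs represents us
        return None
    return "full computation"             # placeholder for the real work

cache = set()
outgoing = extend_request([], cache)      # a node originates a request
```

Because decoys are indistinguishable from real ids, a downstream node such as carol cannot tell whether alice is on the path, so – as noted under Cons below – she has to contact alice anyway.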
Pros:
- makes it much harder to figure out who is whom
Cons:
- makes requests much larger on the network (figure what, maybe another 512 bytes on average?)
- also increases the number of requests (a full new set of leaf nodes). In the non-privacy version, if carol was processing request(ignore=alice,bob), she knew not to contact alice. In the privacy variant, she needs to contact alice because she doesn’t know if any of the UUIDs represent alice.