Privacy differences between the subjective and community ratings systems
Main article: Privacy, identity, and fraud in the ratings system
Differences between the Community and Subjective Ratings Systems
Earlier, we presented various schemes of increasing complexity designed to mitigate trust concerns in the subjective ratings system (SRS). In the next section, we will revisit those schemes and see how they apply to the community ratings system (CRS). But first it is instructive to outline, in general terms, how the two systems differ.
Here we treat the CRS as simply a special version of the SRS that the community agrees to use.
Let’s suppose Alice has her subjective ratings system and tells her friends how it works: the algorithms she uses, how she set it up, her trust factors, etc. Her friends like it and ask to use it. So Alice sets up a server that she and her friends can access. Any time one of her friends has a query, they ask the ratings system and it goes out to the network and asks the question. When the answer comes back, it is processed and aggregated by whatever algorithm Alice chose, and everyone in the group can see the answer. The public nature of the answer is one of the main differences between the CRS and the SRS.
Eventually, Alice’s group might become larger and the number of queries might get so large that they have to prioritize which ones get sent to the server. So the group somehow votes on how to do this (they can use the ratings system itself). A selection of queries goes out and the answers are made public to the group. This process is more complex than, but analogous to, Alice prioritizing her own queries.
Other issues also present themselves for community deliberation: who the trusted parties are and what level of trust to assign to them. Here again, we use the ratings system itself to answer the question, e.g.: In foreign policy, whom do we trust the most? When a question about foreign policy is asked, do we ask only the top X trusted people or everyone? For informational questions, we may only be interested in the most trusted answers. For policy decisions, we may want to ask everyone in a spirit of one person, one vote.
In terms of privacy, many of the concerns in the CRS are the same as in the SRS. Individuals may certainly want to hide their opinions and only consent to revealing them as part of an aggregate. The aggregate opinion will, in all probability, be published for the whole community to see, whereas in the SRS only one person would see it. This privacy concern is no doubt magnified when dealing with a community server, which is a likely target of hacking attacks.
In this vein, the privacy concerns of the rater are mostly similar between the SRS and the CRS. But, given that the CRS makes its results public, we may have an additional concern. Bob may not have a problem giving his opinion to Alice (if it’s aggregated with Carol’s and Dave’s) but might object if he knew she were going to share the result with everyone she knows. This could happen because he trusts Alice to treat the information carefully, whereas he has no such confidence in the community. For example, he may be speculating on who may have committed a murder. He believes Alice will treat the information with the proper discretion, but if it leaked to the community it might result in retaliation against someone who could, in fact, be innocent. Not wanting anything to do with such an outcome, he declines to answer the community’s question but is fine answering the same question from Alice. This problem clearly necessitates transparency about how results are disseminated and used. It also underscores the value of hiding the raters through privacy mechanisms, where that is desired. In the CRS, then, we have a potential conflict between the community’s right to know who its raters are and the raters’ right to remain private.
Of course, as the survey population gets larger, Bob can be more confident that his opinion will be lost in the crowd. Furthermore, with differential privacy, the results become more accurate as the population grows, and Bob can plausibly deny having given any meaningful opinion at all. If Bob wants stronger reassurances than this, we can also turn to homomorphic encryption (HE) or secure multiparty computation (SMC) techniques at the community level. We will discuss ideas of this nature in the next section.
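To make the plausible-deniability point concrete, here is a minimal sketch (in Python, with made-up function names) of how a community aggregator might add calibrated Laplace noise to a trust-weighted average before publishing it. The sensitivity bound and parameter choices are illustrative assumptions, not part of the system as specified.

```python
import random

def dp_weighted_average(opinions, trusts, epsilon, opinion_range=1.0):
    """Return a differentially private weighted average of opinions.

    opinions      -- ratings, each assumed to lie in [0, opinion_range]
    trusts        -- non-negative trust weights supplied by the community
    epsilon       -- privacy budget; smaller values mean more noise
    opinion_range -- width of the opinion scale, used to bound sensitivity
    """
    total_trust = sum(trusts)
    weighted_avg = sum(o * t for o, t in zip(opinions, trusts)) / total_trust

    # Rough sensitivity bound: changing one rater's opinion can shift the
    # average by at most (max trust / total trust) * opinion_range.
    sensitivity = (max(trusts) / total_trust) * opinion_range

    # Laplace noise with scale sensitivity / epsilon (sign * exponential).
    noise = random.choice([-1, 1]) * sensitivity / epsilon * random.expovariate(1.0)
    return weighted_avg + noise

# Example: 1000 raters, opinions on a 0..1 scale, equal trust.
opinions = [random.random() for _ in range(1000)]
trusts = [1.0] * 1000
print(dp_weighted_average(opinions, trusts, epsilon=0.5))
```

The larger the population, the smaller the sensitivity and the less the published aggregate is distorted, which is exactly the accuracy-with-deniability trade-off described above.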
Individuals will similarly want their trust values kept private and only used in the aggregate. But it’s important to keep in mind that in the CRS, aggregate trust values will, indeed, be public. If we trust Bob on foreign policy, we will obviously want to know how much he’s being trusted, and Bob will certainly want to know that as well. It is each person’s trust, and their knowledge of it, that makes the CRS a socially transformational tool (among other things).
However, it would be natural for someone to think that aggregate trust information about oneself should also be kept private. Alice, in the accounting department, does not want her 360 rating at work to be disseminated to the public. In the same vein, she does not want her trust rating in the CRS being divulged either. This is an obvious problem and it is hard to think of a way out. The premise of the CRS is that being constantly rated creates better people and a better society. But we can see how living under a ratings microscope could be intrusive. We might, in that case, divide the system into those who exist in a public sphere and those who want to remain essentially private. Bob, the foreign policy expert, will consent to having his trust made public for the sake of his elite position in society. Alice, an accountant and regular citizen, wants to stay out of the limelight. She tries to improve herself based on her ratings but doesn’t want everyone to know what they are. It is possible that there is a compromise here.
Various Privacy Methods in the CRS Context
Now let’s turn to the privacy-enhancement methods proposed earlier. For each one, we will summarize the method and then apply it to the community ratings system (CRS).
The first was a “simple trusted aggregator”. Alice wants the opinions of three other people, Bob, Carol, and Dave, which they supply to the aggregator. Alice supplies her trust in Bob, Carol, and Dave to the aggregator. The aggregator performs the calculation and gives Alice only the average (or whatever aggregate she wants). Alice does not learn Bob’s, Carol’s, or Dave’s individual opinions. Everyone trusts the aggregator to act in good faith and not reveal anyone’s personal opinion or Alice’s trust in her raters.
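As a concrete illustration of this flow, here is a minimal sketch of the trusted aggregator. The class and message names are invented for the example; the point is simply that the raters never see each other’s opinions and Alice only ever receives the aggregate.

```python
class TrustedAggregator:
    """A party everyone trusts to see raw opinions and trust weights."""

    def __init__(self):
        self.opinions = {}   # rater name -> opinion
        self.trusts = {}     # rater name -> Alice's trust in that rater

    def submit_opinion(self, rater, opinion):
        self.opinions[rater] = opinion

    def submit_trust_vector(self, trusts):
        self.trusts = dict(trusts)

    def aggregate(self):
        # Trust-weighted average; only this number is ever returned to Alice.
        total = sum(self.trusts.values())
        return sum(self.opinions[r] * t for r, t in self.trusts.items()) / total


aggregator = TrustedAggregator()

# Bob, Carol, and Dave each privately submit an opinion (say, on a 0-10 scale).
aggregator.submit_opinion("bob", 7)
aggregator.submit_opinion("carol", 4)
aggregator.submit_opinion("dave", 9)

# Alice privately submits how much she trusts each of them.
aggregator.submit_trust_vector({"bob": 0.9, "carol": 0.5, "dave": 0.2})

# Alice receives only the aggregate, never the individual opinions.
print(aggregator.aggregate())
```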
At the community level, Alice’s role is played by the community, and not much changes except that the community will, under normal circumstances, tell everyone the aggregate opinion. The question to ask is decided by the community and put to the community server, to be sent out in some prioritized fashion, as noted earlier. The trust factors are also decided by the community and are either based on a separate question (whom should we trust, and how much, for this question?) or already present in some default form. We trust the community server not to reveal its trust in specific individuals to the public. So these aspects are different but, fundamentally, the information flow is the same as before: the people chosen to answer (Bob, Carol, and Dave) receive a message from the community asking for their opinion. They send the aggregator their opinions, the community sends the aggregator its trust vector, and the aggregator sends the result back to the community. The CRS models the “trusted aggregator” nicely.
The second scheme was a less trusted aggregator. Here we use homomorphic encryption so the aggregator performs the calculation on data it cannot read. This involves a public/private key pair for the session (generated by Alice), where everyone encrypts using the public key. This key is, of course, homomorphic in that it produces ciphertexts that can be operated on mathematically without being decrypted (unlike ordinary encryption, where the ciphertexts are useless for computation). So now the aggregator doesn’t know the aggregate, the opinions of each person, or the trust vector supplied by Alice. But it would still know the question Alice is asking and which people she wants asked for their opinions. We might also worry that the aggregator could insert fake data and use that instead, since it holds the public key for the session.
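Here is a minimal sketch of this arrangement using the additively homomorphic Paillier scheme (via the third-party phe library, which is an assumption about tooling; any additively homomorphic scheme would do). One simplification relative to the scheme described above: the trust weights appear to the aggregator as plaintext scalars, since Paillier only supports multiplying a ciphertext by a known number; hiding the weights as well would require a scheme with ciphertext-ciphertext multiplication or a secure multiparty computation protocol.

```python
from phe import paillier

# Alice generates a key pair for the session and publishes the public key.
public_key, private_key = paillier.generate_paillier_keypair()

# Bob, Carol, and Dave encrypt their opinions with the session public key.
encrypted_opinions = [public_key.encrypt(o) for o in (7, 4, 9)]

# Simplification: the trust weights are visible to the aggregator here.
trusts = [0.9, 0.5, 0.2]

# The aggregator computes the weighted sum on ciphertexts it cannot read:
# multiplying a ciphertext by a plaintext scalar and adding ciphertexts are
# both supported homomorphic operations in Paillier.
encrypted_weighted_sum = sum(c * t for c, t in zip(encrypted_opinions, trusts))

# Only Alice, who holds the private key, can decrypt the result.
weighted_average = private_key.decrypt(encrypted_weighted_sum) / sum(trusts)
print(weighted_average)
```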
The community system, under this arrangement, works the same way: again, Alice’s role is played by the community, and the results would generally be made public. One difference, alluded to earlier, is that we might trust Alice more than the community to keep trust levels private. Alice is a human being and is quite motivated not to reveal her trust in any one specific person. The community server may have no such compunction and, even if it did, there will be highly motivated hackers after exactly this type of information. Nevertheless, as it stands, the community system can implement a protocol with the aggregator similar to Alice’s.
To mitigate the concerns about fake data and about the aggregator knowing the people and question Alice is asking about, we might add the following: Alice does not tell the aggregator the session homomorphic public key, the question being asked, or the people she wants answers from. She simply gives the aggregator a message encrypted with Bob’s own public key (and signed with her private key) asking for his opinion and requesting that he supply the answer encrypted with the session homomorphic public key. Alice does this for Carol and Dave as well. So the aggregator only receives these encrypted messages, one for each rater, and sends them out. The aggregator would still know that Alice is trusting Bob, Carol, and Dave, however. So we might add the stipulation that Alice also asks many more people for their opinion and sets their trust to 0 so as not to affect the calculation. With these decoys, the aggregator does not know that it is specifically Bob, Carol, and Dave whom Alice is trusting.
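A small sketch of how such a request set might be constructed, padding the genuinely trusted raters with zero-weight decoys so the aggregator cannot tell who is actually being trusted. All names and the encryption placeholder are purely illustrative.

```python
import random

def build_requests(trusted, decoys, encrypt_for):
    """Build per-rater request messages, mixing in zero-weight decoys.

    trusted     -- dict of rater -> non-zero trust weight
    decoys      -- raters added with trust 0, so they do not affect the
                   aggregate but hide who is really being trusted
    encrypt_for -- function(rater, payload) -> message readable only by that rater
    """
    weights = dict(trusted)
    weights.update({d: 0.0 for d in decoys})

    payload = "please send your rating, encrypted under the session public key"
    requests = [(rater, encrypt_for(rater, payload)) for rater in weights]
    random.shuffle(requests)   # avoid leaking anything through ordering

    # The weights would go to the aggregator encrypted, keyed by rater.
    return requests, weights


# Illustrative use with a placeholder "encryption" function.
requests, weights = build_requests(
    trusted={"bob": 0.9, "carol": 0.5, "dave": 0.2},
    decoys=["erin", "frank", "grace"],
    encrypt_for=lambda rater, payload: f"<{payload!r} sealed for {rater}>",
)
print(len(requests), "requests; the aggregator cannot tell decoys from trusted raters")
```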
In a community ratings system, all of these mechanisms would be available to the community vis-à-vis the aggregator. However, if the polled population is large, or is, say, the entire community, there is no much larger population to hide it within. But for large samples it probably would not matter much if the aggregator knows whom the community is trusting; and if that is everybody in the community, then so be it.
Another means of doing this might be for Alice to agree with Bob, Carol, and Dave to use a common secondary server that the aggregator sends its messages to, so that it does not know how to contact each person specifically. The aggregator would post its encrypted messages for Bob, Carol, and Dave to this secondary server. Bob, Carol, and Dave would monitor the server and attempt to decrypt every message that arrives on it. Bob, for instance, would attempt to decrypt the three messages that arrived; the one he succeeds in decrypting is the one meant for him. He then reads it and follows Alice’s request. When the aggregator later receives the homomorphically encrypted information from the sources, it does not know who supplied it.
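A sketch of the trial-decryption step using PyNaCl’s SealedBox (an assumption about tooling; any public-key encryption with an integrity check behaves the same way). Each rater tries every message posted to the shared drop server and keeps only the one that decrypts under their own key.

```python
from nacl.public import PrivateKey, SealedBox
from nacl.exceptions import CryptoError

# Each rater has a long-term key pair; Alice knows only the public halves.
bob_key, carol_key, dave_key = (PrivateKey.generate() for _ in range(3))

# Alice seals one request per rater and posts all of them to the drop server.
drop_server = [
    SealedBox(bob_key.public_key).encrypt(b"Bob: please rate question Q"),
    SealedBox(carol_key.public_key).encrypt(b"Carol: please rate question Q"),
    SealedBox(dave_key.public_key).encrypt(b"Dave: please rate question Q"),
]

def fetch_my_message(private_key, messages):
    """Trial-decrypt every posted message; only the one sealed to us succeeds."""
    box = SealedBox(private_key)
    for blob in messages:
        try:
            return box.decrypt(blob)
        except CryptoError:
            continue            # not addressed to us; decryption fails cleanly
    return None

print(fetch_my_message(bob_key, drop_server))   # b'Bob: please rate question Q'
```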
Another problem we may want to solve is to guarantee that Alice is not trying to expose Bob’s opinion (say) by assigning a trust of 0 to Carol and Dave. The raters would want assurance that Alice’s trust values are reasonable and cannot be used to back-calculate a single rater’s opinion. We could require, for instance, that the average trust value be at least 0.5, or that none of the trust values is zero. The solution proposed for this was to jumble the trust vector into many random permutations and homomorphically encrypt each one with a different key. These would be transmitted to the aggregator, which selects at random one permutation to remain sealed and receives from Alice the keys to decrypt all the others. The aggregator then decrypts those permutations and verifies that they all fulfill the mathematical requirement (average >= 0.5, or all values > 0). Since the aggregator selected the permutations to decrypt at random, it is all but certain that the remaining permutation also fulfills the requirement. Now Alice can use the remaining permutation, whose order she knows, and tell the aggregator to order its calculation accordingly. This mechanism gives a near-guarantee that the parameters have the required mathematical properties and have not been manipulated to expose a single person’s opinion.
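A sketch of this cut-and-choose verification. Ordinary symmetric encryption (Fernet, from the cryptography package) stands in for the homomorphic encryption of the full protocol, since the point being illustrated is the open-all-but-one structure; the number of copies and the sample values are arbitrary.

```python
import json
import random
from cryptography.fernet import Fernet

# Alice's real trust vector, in the order she will ultimately use.
trust_vector = [0.9, 0.5, 0.6, 0.7]
COPIES = 20

# --- Alice: commit to many random permutations, each under its own key. ---
keys, tokens = [], []
for _ in range(COPIES):
    perm = trust_vector[:]
    random.shuffle(perm)
    key = Fernet.generate_key()
    keys.append(key)
    tokens.append(Fernet(key).encrypt(json.dumps(perm).encode()))
# Alice sends all the tokens to the aggregator.

# --- Aggregator: pick, at random, one permutation to stay sealed. ---
kept = random.randrange(COPIES)

# --- Alice: reveal the keys for every permutation except the kept one. ---
revealed = {i: keys[i] for i in range(COPIES) if i != kept}

# --- Aggregator: open the revealed permutations and run the agreed check
#     (average trust at least 0.5 and no zero entries). ---
for i, key in revealed.items():
    values = json.loads(Fernet(key).decrypt(tokens[i]).decode())
    assert sum(values) / len(values) >= 0.5 and all(v > 0 for v in values)

# Every opened copy passed, so the kept permutation is almost certainly
# well-formed too; Alice now tells the aggregator to order its calculation
# according to that (still sealed) permutation.
print("unopened permutation index:", kept)
```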
This mechanism is also possible in the community ratings system and could well be necessary. The community may require a guarantee that its members’ trust values are being used reasonably; we wouldn’t want, for instance, some administrator extracting the opinion of a single citizen.
We can also consider how the aggregator can be prevented from knowing whose opinions are being aggregated. Earlier, we described a system where everyone gets a UUID to identify themselves to the aggregator along with their opinion. The aggregator then knows only the UUIDs and can match them to the opinions and the trust vector (also keyed by UUID). Now, however, the aggregator doesn’t know whether real people gave it the information or whether Alice simply generated UUIDs with fake opinions and trust values in order to isolate a specific member’s opinion. The solution is to use an identity server to prove that each UUID corresponds to a real person. The identity server sends each rater a message verifying their identity, which they can forward in their message to the aggregator. Thus the aggregator knows that the UUIDs are real but knows nothing else about them.
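A sketch of the identity-attestation step using Ed25519 signatures via PyNaCl (again an assumption about tooling): the identity server signs each rater’s UUID, and the aggregator verifies the signature without learning anything else about the rater.

```python
import uuid
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

# The identity server holds a signing key; its verify key is public knowledge.
identity_server_key = SigningKey.generate()
identity_server_verify_key = identity_server_key.verify_key

def attest(rater_uuid):
    """Identity server: after checking the rater is a real person (out of band),
    return a signed attestation binding that UUID to 'verified person'."""
    return identity_server_key.sign(rater_uuid.bytes)

def aggregator_accepts(attestation):
    """Aggregator: verify the attestation came from the identity server.
    It learns that the UUID belongs to a real person, and nothing more."""
    try:
        identity_server_verify_key.verify(attestation)
        return True
    except BadSignatureError:
        return False

# Bob generates a fresh UUID, gets it attested, and forwards the attestation
# to the aggregator along with his (separately encrypted) opinion.
bob_uuid = uuid.uuid4()
attestation = attest(bob_uuid)
print(aggregator_accepts(attestation))   # True
```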
This mechanism can also be used in the community system, but it is less clear why we would want to. The community initiates a request to the ratings system and knows, either certainly or probably, who the raters are going to be (e.g., experts in foreign policy). Since the raters are probably public information anyway, it does not seem like anyone would care that the aggregator also knows who they are.
Of course, there may be experts in foreign policy who serve the community but wish to remain as anonymous as possible. Certainly public intellectuals like this exist in our own society and have a high degree of influence while concealing important aspects of their identity. Options like the above could be used to allay their privacy concerns. In fact, a robust privacy system of this kind might encourage more bright minds to participate in public policy.
We’ve concluded in each of these scenarios that the CRS can function, technically, much as the SRS would. Again, at its most basic, the CRS is really nothing more than an SRS that a group of people decide to use. The difference is that the CRS, by default, chooses its raters and their trust publicly and disseminates its ratings publicly. In any event, the CRS would have an aggregator, an identity server, a means to hide participants through a UUID mechanism, and an information flow that parallels the SRS.