Ideas for encryption in aggregators
Main article: Privacy, identity, and fraud in the ratings system
Main article: Aggregation techniques
Deniable Authentication
While I don’t have a concrete use case in mind right now, it seems like we may encounter a use for deniable authentication. These systems allow the participants to be confident about the authenticity of the data exchanged, but the participants can later deny that they generated the data and a third party could not prove otherwise.
“Off The Record Messaging” is an implementation of deniable authentication for chat programs, available in software you may have used, like Pidgin. The Signal protocol also has this property; it’s used in Signal and WhatsApp, and in Facebook Messenger, Skype, etc., when you choose end-to-end encryption. In these protocols, the two participants chatting can be sure that the messages they receive were really generated by the person they thought they were talking to (i.e., they can treat the messages as if they had been signed by their interlocutor’s previously-trusted private key), but if someone later hacks their computer and gets a copy of their chat history ciphertext, the attacker can’t use it to prove that either person actually said what they said.
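To make the deniable part concrete, here’s a minimal sketch in Python of the core trick these protocols build on (this is not OTR or Signal itself, just the underlying idea): derive a shared key via Diffie-Hellman, then authenticate messages with an HMAC under that shared key. It assumes the pyca/cryptography package for X25519.

```python
import hmac, hashlib
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each party generates an ephemeral X25519 keypair and exchanges public keys.
alice_priv = X25519PrivateKey.generate()
bob_priv = X25519PrivateKey.generate()

# Both sides derive the same shared secret; we use it directly as the MAC key
# here (a real protocol would run it through a KDF first).
alice_key = alice_priv.exchange(bob_priv.public_key())
bob_key = bob_priv.exchange(alice_priv.public_key())
assert alice_key == bob_key

# Alice authenticates a message to Bob with an HMAC rather than a signature.
msg = b"honestly, I thought Avatar was mediocre"
tag = hmac.new(alice_key, msg, hashlib.sha256).digest()

# Bob can verify the message really came from Alice, since only the two of
# them know the key...
assert hmac.compare_digest(tag, hmac.new(bob_key, msg, hashlib.sha256).digest())

# ...but the tag proves nothing to a third party: Bob knows the same key, so
# he could have forged an identical tag for any message he liked.
forged_tag = hmac.new(bob_key, b"Avatar changed my life", hashlib.sha256).digest()
```

Because either party could have produced any tag in the transcript, a leaked chat history authenticates nothing to outsiders, which is exactly the deniability property described above.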
I can imagine it being useful for exchanging opinions; I might be happy to tell you my honest opinion on something, while at the same time not wanting you to be able to later publish a proof of what my opinion was for everyone to see.
FHE scenarios
We’ve been discussing opinion aggregators and how to protect participants’ privacy. The assumptions seem to be:
- for the people generating opinions:
- their opinions are not public, they don’t want the world at large knowing what their opinion is
- they are willing to have their opinion combined with others’ opinions to get a combined opinion, as long as that doesn’t expose their personal opinion
Trusted Aggregator
In the case of a trusted aggregator, things are pretty simple. If Alice wants to know Bob, Carol, and Dave’s collective opinion, she can ask the aggregator to collect the data and run the computation for her. She’ll send the aggregator a message that says:
- I want to know if Avatar is a good movie or not.
- Please give me the combined opinions of Bob, Carol, and Dave.
- My trust in them is as follows: Bob: 0.7, Carol: 0.8, Dave: 0.8

The aggregator will then collect opinions from Bob, Carol, and Dave. For each, it will send them a message:
- Alice wants your opinion on Avatar.
- I’ll be combining the opinion with two other people’s opinions, and only giving Alice the average

and they’ll simply reply with:

- my opinion is x
After getting all opinions, the aggregator would perform the computation and return the result to Alice. There is no real encryption here, other than basic connection-level encryption.
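As a sketch, the computation the trusted aggregator runs is just a trust-weighted average; something like the following, where the trust levels are the ones from Alice’s request and the opinion values are made up for illustration:

```python
def aggregate(trust, opinions):
    """Trust-weighted average of opinions; both dicts are keyed by person."""
    total_weight = sum(trust.values())
    return sum(trust[name] * opinions[name] for name in opinions) / total_weight

# Alice's request: how much she trusts each participant.
trust = {"Bob": 0.7, "Carol": 0.8, "Dave": 0.8}

# Raw opinions the aggregator collected (0.0 = bad movie, 1.0 = good movie).
opinions = {"Bob": 0.6, "Carol": 0.9, "Dave": 0.4}

print(aggregate(trust, opinions))  # ~0.63
```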
Note that in this scheme the aggregator may be a centralized server, and Bob, Carol, and Dave may simply store their opinions on that server.
Naturally, in this scheme, the aggregator has a lot of information. It knows what Alice is asking, who Alice is asking, and how much she trusts the others. The aggregator also knows the raw opinions of Bob, Carol, and Dave. The aggregator can also tamper with the results.
Less trusted aggregator
We can use homomorphic encryption to reduce what the aggregator sees. Let’s see how this works:
- Alice generates a public/private keypair for this session
- She constructs a vector of her trust levels in Bob, Carol, and Dave: [0.7, 0.8, 0.8], and encrypts it using the session public key
- She sends a message to the aggregator saying:
- I want to know if Avatar is a good movie or not.
- Please give me the combined opinions of Bob, Carol, and Dave.
- Here is my session pubkey
- Here is my encrypted trust vector

When the aggregator gets this, it sends a message each to Bob, Carol, and Dave, that says:
- Alice wants your opinion on Avatar
- Here is the session pubkey
- I’ll be combining the opinion with two other people’s opinions, and only giving Alice the average

And Bob, for example, would:
- encrypt his opinion using the session pubkey and send it to the aggregator
Once the aggregator receives all of the encrypted opinions, it would:
- run the averaging algorithm using the weights provided by Alice and the opinions provided by Bob, Carol, and Dave to get the encrypted result.
- send the encrypted result back to Alice

Upon receiving the encrypted result, Alice would:
- decrypt the encrypted result using her session private key
- go to Blockbuster Video and rent Avatar
- be disappointed
In this scenario, we’ve improved things a bit. The aggregator no longer knows how much Alice trusts Bob, Carol, and Dave. It no longer knows Bob, Carol, and Dave’s opinions, and it doesn’t know the combined average opinion either.
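Here’s a sketch of the homomorphic step, assuming the TenSEAL library and its CKKS scheme, which supports adding and multiplying ciphertexts, which is all the weighted sum needs. Division is awkward under homomorphic encryption, so in this sketch the aggregator returns the encrypted weighted sum and Alice divides by the total weight, which she already knows. Everything runs in one process just to show the algebra; in the protocol above, the other parties would only ever see a copy of the context without the secret key.

```python
import tenseal as ts

# Alice's session keys live inside this context; in the real protocol she
# would share a public copy (with the secret key stripped) with the others.
ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40

# Alice encrypts one trust weight per participant.
weights = [0.7, 0.8, 0.8]                       # Bob, Carol, Dave
enc_weights = [ts.ckks_vector(ctx, [w]) for w in weights]

# Bob, Carol, and Dave each encrypt their own opinion (made-up values).
opinions = [0.6, 0.9, 0.4]
enc_opinions = [ts.ckks_vector(ctx, [o]) for o in opinions]

# The aggregator computes the weighted sum without seeing any plaintext:
# ciphertext * ciphertext and ciphertext + ciphertext are both supported.
enc_result = enc_weights[0] * enc_opinions[0]
for ew, eo in zip(enc_weights[1:], enc_opinions[1:]):
    enc_result = enc_result + ew * eo

# Alice decrypts and normalizes by the total weight, which she knows.
print(enc_result.decrypt()[0] / sum(weights))   # ~0.63 (CKKS is approximate)
```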
Alice still needs to trust the aggregator
- because it could simply create three fake opinions as inputs to the algorithm and return that result
- because it learns what question Alice wants answered and whose opinions she wants aggregated
Bob, Carol, and Dave need to trust the aggregator
- When the aggregator asks them, they need to believe that Alice really initiated the request
- They need to trust that the aggregator is not colluding with Alice, and will not pass their encrypted opinions straight to Alice
- Bob, e.g., needs to trust that the aggregator is a good judge of character, and that the aggregator really believes that Carol and Dave are both good-faith actors and not colluding with Alice
- They need to trust that the aggregator isn’t running a MITM attack
Some of those could be mitigated a bit. For example:
- Alice could generate an encrypted message using Bob’s pubkey (and signed with Alice’s private key) that says:
- please give me your opinion on Avatar
- it will be aggregated with two other people’s opinions
- here is the session pubkey
- Alice would do the same for Carol and Dave
- Alice would add a bunch of random people whose opinions she’s not interested in to the list, and weight them with 0 trust.
- Then when she asks the aggregator, she doesn’t tell the aggregator the topic she’s interested in or the pubkey, she just gives the aggregator the three encrypted messages to pass along.
With those modifications, the aggregator
- couldn’t know what Alice was asking about
- couldn’t run a MITM attack
- wouldn’t know that Alice trusted Bob, Carol, and Dave. They would be somewhat obscured by the other random people Alice added.
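Here’s a sketch of the padding step from the list above (the decoy names are made up; the point is that weight-0 entries can’t affect the result):

```python
import random

# Alice's real targets and trust levels.
real = {"Bob": 0.7, "Carol": 0.8, "Dave": 0.8}

# People whose opinions Alice doesn't care about, drawn from some larger
# directory; they get weight 0 so they can't affect the result.
decoy_pool = ["Erin", "Frank", "Grace", "Heidi", "Ivan", "Judy"]
decoys = {name: 0.0 for name in random.sample(decoy_pool, 4)}

# The padded, shuffled request: the aggregator sees seven names and can't
# tell from the list alone which three Alice actually trusts.
padded = list({**real, **decoys}.items())
random.shuffle(padded)
```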
There are still plenty of attack vectors. In particular, Alice could send a list of trust factors that are all 0 except for the one person whose opinion she wanted to see. Or she could add a bunch of people she knows rated Avatar at 100% on the tomatometer, along with Bob. Then she could gauge Bob’s opinion by how far down it nudged the 100%.
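The second attack is just arithmetic. With equal weights and made-up numbers:

```python
# Alice pads her request with nine people she already knows rated Avatar 1.0,
# plus Bob, all weighted equally. Suppose Bob's (secret) opinion is 0.6.
known = [1.0] * 9
avg = (sum(known) + 0.6) / 10   # the aggregator reports 0.94

# Alice recovers Bob's opinion exactly from how far the average was nudged.
bob = 10 * avg - sum(known)     # 0.6
```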
Making guarantees about the parameters
We can use a variant of the Blinded Signature protocol to help with the first problem. Let’s say the aggregator wants to be able to promise Bob, Carol, and Dave that the average trust value applied will be at least 0.5. With that promise, Bob can know that the trust factors weren’t something like [1, 0, 0], specifically designed to expose his opinion. Without that promise, Bob may not be comfortable sharing his opinion. (In the case of 3 opinions this isn’t a lot of protection, but it gets better as the number of opinions increases.) But since the trust factors are encrypted to keep the aggregator from knowing how much Alice trusts each of the three, the aggregator can’t inspect them, and so can’t make this guarantee on its own.
We can alter the protocol this way. Let’s say Alice has trust weights of Bob: 0.5, Carol: 0.6, Dave: 0.7
- Alice generates 100 keypairs, and 100 different random permutations of the trust vector [0.5, 0.6, 0.7]
- She homomorphically encrypts each permutation using a different key
- enc(key1, [0.5, 0.6, 0.7])
- enc(key2, [0.6, 0.5, 0.7])
- enc(key3, [0.7, 0.5, 0.6])
- and sends the encrypted vectors to the aggregator
- the aggregator randomly chooses 99 vectors, and asks Alice for the keys.
- Alice sends the aggregator the keys for those vectors
- the aggregator decrypts the vectors and verifies that the contents of each fulfills its requirements (average value >= 0.5 or whatever)
- the aggregator can be 99% certain that the remaining vector also contains numbers that fulfill its requirements, but it doesn’t know the order of the values in that vector.
- Alice does know the order of values in that vector, though. Let’s say the remaining vector was [0.7, 0.5, 0.6]. Alice knows those correspond to her trust in [Dave, Bob, Carol], so she can tell the aggregator to order the inputs to its computation so that Dave’s input gets matched with Dave’s trust level, etc.
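Here’s a sketch of that cut-and-choose exchange. It uses Fernet symmetric encryption from the `cryptography` package as a stand-in for the homomorphic encryption (the keys-and-commitments structure of the argument is the same either way):

```python
import json, random
from cryptography.fernet import Fernet

N = 100
MIN_AVG = 0.5
trust = [0.5, 0.6, 0.7]        # Alice's trust in Bob, Carol, Dave

# Alice: N random permutations of the trust vector, each encrypted
# under its own fresh key.
keys, blobs = [], []
for _ in range(N):
    perm = random.sample(trust, len(trust))
    key = Fernet.generate_key()
    keys.append(key)
    blobs.append(Fernet(key).encrypt(json.dumps(perm).encode()))

# Aggregator: keep one blob sealed, demand the keys for all the others.
keep = random.randrange(N)
for i in range(N):
    if i == keep:
        continue
    vec = json.loads(Fernet(keys[i]).decrypt(blobs[i]))
    assert sum(vec) / len(vec) >= MIN_AVG   # a cheating vector is caught here

# If all 99 opened vectors pass, a cheating Alice had at most a 1-in-100
# chance of landing her one bad vector in the `keep` slot. The aggregator
# uses blobs[keep] without knowing its internal order; Alice, who does know
# it, tells the aggregator how to line the inputs up with the weights.
```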
Can we keep the aggregator from knowing whose opinions it’s aggregating?
If we assume that there is an anonymous transport layer like Tor, let’s see what we can do.
Alice could assign a different UUID to Bob, Carol, and Dave.
- Alice could generate an encrypted message using Bob’s pubkey (and signed with Alice’s private key) that says:
- please give me your opinion on Avatar
- it will be aggregated with two other people’s opinions, and the average trust factor will be at least 0.5
- here is the session pubkey to encrypt your result with
- here is the UUID you should use to identify yourself to the aggregator when you send in your encrypted opinion
- then Alice sends that message directly to Bob
- Alice sends a message to the aggregator saying:
- I need you to compute an aggregate opinion for me
- You will receive three opinions, identified with these three UUIDs
- Here are the trust factors you should apply to those opinions: {uuid1: 0.5, uuid2: 0.6, uuid3: 0.7}
- When Bob gets the message from Alice, he composes a message that says:
- This is my encrypted opinion
- This is the UUID
- You may only use this opinion if it is aggregated with two other people’s opinions, and if the average trust factor is at least 0.5
With this modification, the aggregator doesn’t know who any of the participants are. It may not even know who Alice is, though it may be reasonable to require Alice to identify herself before requesting the aggregator to do work.
But there’s a down-side here: the aggregator can no longer assert that there are three distinct real humans providing opinions. Alice could, for example, attack the system by sending in fake opinions for Carol and Dave, making it trivial to reverse-engineer Bob’s opinion. All the aggregator knows is that it has opinions identified with three different UUIDs, and Alice could easily forge any or all of them.
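For the message from Alice to Bob, here’s a minimal sketch assuming PyNaCl. A Box built from Alice’s private key and Bob’s public key is encryption only Bob can open, and opening it also convinces Bob it came from Alice; like the chat protocols discussed earlier, this authentication is deniable rather than a transferable signature (use nacl.signing instead if a real signature is wanted). The field names are made up.

```python
import json, uuid
from nacl.public import PrivateKey, Box

# Long-term keypairs; in practice these already exist and the public halves
# are mutually known.
alice_sk = PrivateKey.generate()
bob_sk = PrivateKey.generate()
session_sk = PrivateKey.generate()           # Alice's session keypair

request = {
    "ask": "please give me your opinion on Avatar",
    "aggregated_with": 2,                    # two other people's opinions
    "min_avg_trust": 0.5,
    "session_pubkey": bytes(session_sk.public_key).hex(),
    "uuid": str(uuid.uuid4()),               # how Bob identifies himself later
}

# Encrypt so only Bob can read it; the Box also authenticates Alice to Bob.
to_bob = Box(alice_sk, bob_sk.public_key).encrypt(json.dumps(request).encode())

# Bob decrypts (and implicitly authenticates) with the mirror-image Box.
opened = json.loads(Box(bob_sk, alice_sk.public_key).decrypt(to_bob))
```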
Fixing this with the help of another trusted service
Let’s say we have another service I’ll call an identity service, which can be used to prove that an actor is an individual human. To use it, Bob would contact the identity service and prove his bobness by signing a message with his private key. He would also provide the UUID he was assigned by Alice, along with something that identifies this specific aggregation (in our case it could be the session pubkey, or a UUID that Alice generated). The identity service would look Bob up in a database, or use the regular reputation-system web of trust, to establish that Bob is a human within some level of certainty; then it would sign a message that says “The fella with UUID x is a real person, and is the 2nd distinct real person to contact me regarding the aggregation that uses pubkey y”.
When Bob then sends his encrypted opinion to the aggregator, he includes this message from the identity service. The aggregator will check that each opinion-giver has provided a similar message, that their UUID matches the one the identity service signed, and that the identity messages say 1st, 2nd, and 3rd indicating that they’re unique real people.
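A sketch of the attestation itself, assuming Ed25519 signatures via PyNaCl (the message format here is made up):

```python
from nacl.signing import SigningKey

identity_sk = SigningKey.generate()       # the identity service's signing key
identity_vk = identity_sk.verify_key      # published; known to the aggregator

# Identity service: after establishing Bob is a real human, sign an
# ordinal attestation tying his UUID to this specific aggregation.
attestation = identity_sk.sign(b"uuid=UUID_bob person_number=2 aggregation=pubkey_y")

# Aggregator: verify() returns the signed bytes or raises BadSignatureError.
verified = identity_vk.verify(attestation)
fields = dict(kv.split(b"=") for kv in verified.split())

# Check the UUID matches the one on Bob's opinion; across all three opinions
# the aggregator also checks the person numbers are distinct (1, 2, 3).
assert fields[b"uuid"] == b"UUID_bob"
```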
The down-side here is that the identity server knows that Bob, Carol, and Dave are all involved in the same computation. It doesn’t know what the computation is, or that it’s for Alice. This feels like an improvement, but it would be nice to avoid this.
We might be able to do better using crypto. Homomorphic encryption may be able to help us here, or we may be able to use some ‘private set intersection’ protocol to determine that the participants are unique, and rely on the identity server only to prove that the participants are real people.
Here is the whole flow as a sequence diagram:

```mermaid
sequenceDiagram
    participant Alice
    participant Aggregator
    participant Bob
    participant Carol
    participant Dave
    participant Identity
    Note over Alice: Generates keypair, UUID_aggregation, and UUID_bob, UUID_carol, UUID_dave
    Note over Alice: Could use blinding scheme to prove to aggregator that homomorphically-encrypted weights are fair instead of sending weights in the clear
    Alice->>+Aggregator: I need an aggregate computation UUID_aggregation, weights UUID_bob: 0.5, UUID_carol: 0.6, UUID_dave: 0.7
    Alice->>+Bob: I want your opinion on Avatar, it will be averaged with 2 others' opinions. The average trust level will be at least 0.5
    Alice->>Bob: Identify yourself with UUID_bob. Encrypt with pubkey and send results to Gandalf's aggregator
    Bob->>-Alice: sure thing
    Bob->>+Identity: Hey, I'm Bob, helping out with aggregation UUID_aggregation. See, I signed this thing.
    Identity->>-Bob: Here's a signed message saying you're the first real person I've seen for UUID_aggregation
    Note over Bob: Encrypts his opinion using pubkey
    Bob->>Aggregator: Hi, I'm UUID_bob. Here's my encrypted opinion on UUID_aggregation and a message from Identity proving I'm real
    Bob->>Aggregator: Only use my opinion if it's averaged with at least 2 others with an average trust level of 0.5
    Note over Aggregator: collects opinions from Carol and Dave in the same manner
    loop Computation
        Aggregator->Aggregator: averages the results
    end
    Aggregator->>-Alice: Here's the encrypted result
    Note over Alice: decrypts the result and gets the answer
```