
Argument evaluation and scoring

From Information Rating System Wiki
Revision as of 15:23, 26 July 2024 by Pete (talk | contribs)

Some thoughts on evaluating arguments

In our last meeting Dan asked how we might evaluate an argument made by a respondent instead of simply relying on the given probability (as we've been doing so far). An argument, assuming it is made public, could then be evaluated by the questioner and others independently to find a more accurate probability. This opens up a new idea in our work, that of assessing the truth by evaluating the reasoning put forth in an opinion.

One idea for doing this starts with a simple model for argument construction. The argument consists of supporting statements which are tied together with logic to form a conclusion. The conclusion is the answer to the overall question being asked of the network. Each supporting statement and the logic can be evaluated independently to determine the extent to which the conclusion is true. The following diagram illustrates this:

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   0 [label="Answer/Conclusion, Pc"]
   1 [label="Logic, Pl"]
   2 [label="Support. Stmt. 1, Ps1"]
   3 [label="Support. Stmt. 2, Ps2"]
   4 [label="Support. Stmt. 3, Ps3"]
   0 -> 1 [dir="back"];
   1 -> 2 [dir="back"];
   1 -> 3 [dir="back"];
   1 -> 4 [dir="back"];
   }

```


The probabilities of the supporting statements can be combined in a Bayesian manner, in keeping with the Bayesian idea of modifying prior probabilities given new evidence (i.e., the supporting statements). These probabilities could be trust-modified as Sapienza proposed (https://ceur-ws.org/Vol-1664/w9.pdf), but since they are likely being assigned by the questioner, we will assume that trust is already built into them. Of more importance is the relevance of the supporting statements, which can range from completely irrelevant to completely relevant. A completely relevant statement retains the full value of the probability it was originally assigned. A completely irrelevant statement is reduced to a probability of 50%, where it has no influence on the outcome. In that sense relevance modifies the probability in the same way trust does:

$$ P_{mod} = P_{nom} + R(P - P_{nom}) $$

where

$R$ is relevance (0.0 - 1.0)

$P_{mod}$ is relevance-modified probability

$P_{nom}$ is the nominal probability (=0.5 for a predicate question)

$P$ is the unmodified probability

After the relevance-modification, each supporting statement is combined in the usual manner via Bayes. For the first two statements,

$$ P_{comb1,2} = {P_{s1}P_{s2}\over {P_{s1}P_{s2} + (1-P_{s1})(1-P_{s2})}} $$

and so on for each additional statement. Here it is to be understood that $P_{s1}$, etc. is the value after the relevance modification.
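As a minimal sketch (the function names are ours, not from any project code), the relevance modification and the pairwise Bayes combination can be written as:

```python
def relevance_modify(p, r, p_nom=0.5):
    """P_mod = P_nom + R*(P - P_nom): R=1 keeps P intact, R=0 pulls it to the neutral 0.5."""
    return p_nom + r * (p - p_nom)

def bayes_combine(probs):
    """Fold any number of probabilities together via the pairwise Bayes rule."""
    num, den = 1.0, 1.0
    for p in probs:
        num *= p
        den *= 1.0 - p
    return num / (num + den)
```

For example, `bayes_combine([0.78, 0.625])` returns roughly 0.855, matching the climate example worked out below.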

Logic will also have a probability assigned to it to represent its quality. A fully illogical argument would receive a 0, which when combined via Bayes with the supporting statements would render the probability of the entire argument 0. This makes sense because a completely illogical argument, regardless of the strength of its supporting statements, destroys itself. A fully logical argument, however, will not receive a 1 but rather a 0.5. When combined with the supporting statements a 1 would render the final probability a 1, which is not reasonable. A 0.5, however, would do nothing and the final probability would be the combined probability of the supporting statements. Thus we assume that perfect logic is neutral and less than perfect logic reduces the combined probability of the statements. Again, this seems reasonable. We expect, by default, logical arguments which then rest on the strength of their supporting statements. If we notice flaws in the logic we discount the strength of the argument accordingly.

Let's try an example with these ideas:

Question: Is man-made climate change real?
Answer / Conclusion: Man made climate change is real.
Logic: Mankind is causing climate change if we can show that the earth's temperature is changing over time and can show a human behavior that makes the temperature change.
Supporting Statements:
1. The average earth temperature has gone up by 2 deg F since the late 19th century (https://climate.nasa.gov/evidence/#footnote_4)
2. My wife complained about the heat this summer.
3. Scientists say the oceans are getting warmer, ice caps are melting, and glaciers are retreating.

We start by judging the quality of the supporting statements. 1 seems like a well substantiated statement (a high P) but is not completely relevant because it only hints at human involvement. 2 is completely true but irrelevant. 3 is a contributor but seems less substantiated than 1 and contains no human cause. We proceed by assigning probability and relevance values:

$P_{s1}=0.9$ $R_{s1}=0.7$ $P_{s1mod} = 0.5 + 0.7(0.9 - 0.5) = 0.78$

$P_{s2}=1.0$ $R_{s2}=0.0$ $P_{s2mod} = 0.5 + 0.0(1.0 - 0.5) = 0.5$

$P_{s3}=0.75$ $R_{s3}=0.5$ $P_{s3mod} = 0.5 + 0.5(0.75 - 0.5) = 0.625$

Since s2 won't count in the Bayesian calculation we can ignore it and:

$$P_{comb,s} = {(0.78)(0.625) \over {0.78(0.625)+0.22(0.375)}} = 0.855$$

The logic/conclusion in this case is reasonably strong so we will assign it a high value, say $P_l = 0.45$ (remember, out of 0.5). It could be improved by observing that the word "behavior" is too general and should be replaced by, say, "policy choice" (i.e., burning fossil fuels). We note here that logic is more than just the mathematical construction of an argument. Since we are using a human language, logic might also be flawed because it uses imprecise wording.

Putting $P_{comb,s}$ together with $P_l$ using Bayes we obtain a concluding probability:

$$P_c = {0.855(0.45) \over 0.855(0.45) + 0.145(0.55)} = 0.83$$
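The whole example can be checked with a few lines of arithmetic (a sketch; the variable names are ours):

```python
# Relevance-modify the two statements that count (statement 2 drops out at 0.5)
ps1 = 0.5 + 0.7 * (0.9 - 0.5)     # 0.78
ps3 = 0.5 + 0.5 * (0.75 - 0.5)    # 0.625

# Bayesian combination of the supporting statements
p_s = ps1 * ps3 / (ps1 * ps3 + (1 - ps1) * (1 - ps3))   # ~0.855

# Combine with the logic score P_l = 0.45 to obtain the conclusion
p_l = 0.45
p_c = p_s * p_l / (p_s * p_l + (1 - p_s) * (1 - p_l))   # ~0.83
```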

One potential pitfall of this model is that repetitive supporting statements of high probability will quickly render a combined probability near 1.0. As we've seen in the past, this is simply the result of the Bayes equation. The user would need to watch for attempts like these to distort the answer by removing repetitive statements or making them irrelevant.


Scoring of individual arguments

[Last time](More argument mapping tools and proposed ideas for our own such tool) we discussed some criteria for argument scoring:

- Veracity, $V$
- Impact & Relevance, $R$
- Clarity, $C$
- Informal Quality (extent to which argument is free of fallacies), $F$


Since Impact and Relevance are closely related concepts we will merge them into one, Relevance. The simplest method for combining the criteria is an average, or a weighted average:

$$ S = w_vV + w_rR + w_cC + w_fF $$

where

$w_x$ is the weighting for category $X$ (e.g., Veracity, Relevance, etc.)

$w_v + w_r + w_c + w_f = 1$

This seems reasonable and if we believe that certain criteria should weigh more (such as Veracity) we can easily make the weighting factors reflect this. However, intuitively it seems that a category such as Veracity should not only weigh more but have the power to take down the whole argument. After all, if the argument is a straightforward lie, it should receive a score of zero, regardless of its other attributes (such as relevance, clarity, etc):

Who is the best choice for President? Trump is the best choice because he is honest in everything he does and would be President today if the Democrats hadn't cheated him out of the 2020 election, which he won.

This argument is a lie and although it is clear, has no evident fallacies, and is relevant to the question at hand, it should be thrown out.

The same can be said of Relevance. A completely irrelevant argument should also have the power to render the whole argument moot:

Who is the best choice for President? Trump is the best choice because he likes McDonald's and so do I.

With this in mind, we can propose the following equation, which we will dub the "VRFC equation":

$$ S = VR(w_fF + w_cC) $$


Where

$S$ = Score for the argument which varies from 0-1

$w_f$ = weighting factor for Fallacies.

$w_c$ = weighting factor for Clarity.

$w_f + w_c = 1$

and each of the constituent variables ($V, R, F, C$) has a range 0-1.

In this equation either Veracity or Relevance has the power to nullify the entire argument. Similarly, a combination of fallacies and lack of clarity can do the same. However, a fallacious argument alone could still have merit, as could an argument whose only flaw is lack of clarity:

We should support Ukraine because if Russia prevails it will conquer the world

This argument commits the slippery slope fallacy but is not entirely invalid. Similarly an unclear argument can still manage to make a point:

We should support Ukraine because first the Sudetenland, then the Czechs, and soon enough it's over when all the Brits had to do was get rid of that weakling sooner.

It would seem that a fallacious argument should weigh more than an unclear one. Proposed weights might be:

$w_f = 0.7, w_c = 0.3$
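With these weights, the VRFC equation is a one-liner (a sketch; the function name is ours):

```python
def vrfc_score(v, r, f, c, w_f=0.7, w_c=0.3):
    """S = V * R * (w_f*F + w_c*C); every input ranges 0-1."""
    return v * r * (w_f * f + w_c * c)

# A clear, relevant, fallacy-free lie still scores zero:
lie = vrfc_score(v=0.0, r=1.0, f=1.0, c=1.0)        # 0.0
# A truthful, relevant argument with a slippery-slope fallacy keeps some merit:
slippery = vrfc_score(v=0.8, r=0.9, f=0.4, c=0.9)
```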

Rolling up the score of argument trees

The equation above applies to a single argument but, as we've seen, most arguments have sub-arguments below, sub-sub-arguments, and so forth. They are really trees in which each individual argument can be scored separately.

Here we develop a proposed equation for rolling up the score for an argument based on its own score and that of its sub-arguments. In doing so we emphasize that any argument can stand on its own and be scored in the absence of sub-arguments. This creates an interesting dynamic. The sub-argument may bolster or detract from the parent argument but the extent to which it does should be limited.

Furthermore, once the sub-argument becomes weaker than a certain threshold, it should stop influencing the parent argument altogether. Here, we will set this threshold at 0.5. Thus only Pro sub-arguments that score 0.5 or better will have any influence on the parent argument. For Con sub-arguments we will use the same threshold but first modify the sub-argument score by $1-S$. Thus a strong Con sub-argument, scoring say 0.9, would enter the calculation with a score of 0.1. The result is a range of scores 0-1 of which 0-0.5 is Con and 0.5-1 is Pro. Scores of exactly 0.5 are neutral.

Let's consider the case with one argument and one pro sub-argument and one Con sub-argument.

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   0 [label="Argument, s = 0.9"]
   1 [label="Pro sub-argument, xp = 0.7"]
   2 [label="Con sub-argument, xc = 0.7"]
   0 -> 1 [dir="both"];
   0 -> 2 [dir="both"];
   }

```

In this case the argument's score is 0.9, and both the Pro and Con sub-arguments score 0.7. These numbers would normally be arrived at using the VRFC equation above, but we will simply assume them for now. The first sub-argument bolsters the argument because it is a Pro argument with a score (0.7) greater than 0.5. The second sub-argument, with the same score, detracts from the argument because it is on the Con side. We emphasize that if these scores were at or below 0.5 they would have no effect on the argument.

The general equation governing this situation is as follows:


For $s \gt 0.5$ and a sub-argument whose raw score exceeds 0.5,

$$ s_{mod} = 2(1-s)fx + s - (1-s)f $$

For $s \le 0.5$ and a sub-argument whose raw score exceeds 0.5,

$$ s_{mod} = 2sfx + s - sf $$

For any sub-argument whose raw score is at or below 0.5 (Pro or Con),

$$ s_{mod} = s $$

(The thresholds apply to the sub-argument's raw score, $x_p$ or $x_c$, before the Con conversion: this is consistent with the worked examples, where a Con sub-argument scoring 0.7 enters the calculation as $x = 0.3$ and still moves the parent.)

where

$s$ = score for parent argument

$x = x_p$ = score for Pro arguments

$x = 1-x_c$ = score for Con arguments

$f$ = maximum fraction of the possible change that is allowed, 0-1

The variable $f$ is a user-selected number between 0 and 1 that represents the extent to which sub-arguments can move the parent argument's score. For example, an argument with $s = 0.9$, as in the diagram above, can be improved by at most 0.1 before reaching the maximum of 1. $f$ is the fraction of that 0.1 that we allow as movement. If $f = 0.25$, for instance, the maximum range around 0.9 is $(0.25)(0.1) = 0.025$: the highest score the argument can reach is 0.925 and the lowest is 0.875.
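A sketch of this piecewise update in Python (the name is ours; the caller is responsible for skipping sub-arguments whose raw score is at or below 0.5):

```python
def s_mod(s, x, f):
    """Move parent score s by at most f of its remaining headroom.

    x is the sub-argument score after conversion: the raw score for a
    Pro sub-argument, 1 - raw score for a Con sub-argument.
    """
    if s > 0.5:
        return 2 * (1 - s) * f * x + s - (1 - s) * f
    return 2 * s * f * x + s - s * f
```

At $x = 0.5$ both branches return $s$ unchanged, so the update is continuous at the neutral point.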

For the argument above:

$s = 0.9$

$f = 0.25$ User input

For the Pro sub-argument:

$x = x_p = 0.7$

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.9)(0.25)(0.7) + 0.9 - (1-0.9)(0.25) = 0.91$

For the Con sub-argument:

$x = (1-x_c) = (1-0.7) = 0.3$

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.9)(0.25)(0.3) + 0.9 - (1-0.9)(0.25) = 0.89$

We can see here that the Pro and Con sub-arguments exactly balance each other since they both have the same score.

The equation above is piecewise linear and can be visualized as follows:

![image](uploads/7f62b22b9eb19771503d654db29cab92/image.png)

One important property of this equation is that the stronger (or weaker) an argument becomes, the harder it is for a sub-argument to change it. This is because the maximum allowed movement is $1-s$ if $s\gt 0.5$ or simply $s$ if $s\le 0.5$. The idea behind this property is that very strong arguments should be harder to dislodge precisely because they have covered themselves well. A weaker argument, for instance one that fails to mention an obvious supporting fact, is in a position to be bolstered more by a sub-argument which mentions the fact. Similarly a very weak argument should be difficult to bolster. If the argument is a lie or irrelevant, for instance, there isn't much that can be done to rescue it.

This property has the further consequence that $s$ cannot be changed by the sub-arguments if it is 1 or 0. A truly perfect argument, $s = 1$, cannot be weakened no matter how strong its Con sub-argument. Similarly a perfectly flawed argument, $s = 0$ cannot be bolstered with any Pro sub-argument. We will discuss below a method to deal with the fact that, regardless of the quality of the argument, users may still vote to score arguments 1 or 0.

Population adjustments

The algorithm described above assumes a single vote for the argument and sub-arguments. In fact, this will rarely be the case, because multiple users will be voting on each. The effect of a sub-argument on its parent should be weighed by the population of users who voted for the sub-argument and parent argument.

Here we propose a simple modification factor, based on the ratio of users voting for each argument/sub-argument:

$s_{mod,pop} = (s_{mod} - s){p_s\over p} + s$

where

$s_{mod,pop}$ is the population modified score

$s_{mod}$ is the modified score without population modifications (see above)

$p_s$ is the population voting for the sub-argument

$p$ is the population voting for the parent argument

$s$ is the original score of the parent argument

Usually we expect that sub-arguments will receive fewer votes than parent arguments, so ${{p_s\over p} \le 1}$ in general. For the case when $p_s \gt p$ we will force ${p_s\over p} = 1$. Therefore there is no danger that a sub-argument can overwhelm a parent argument by voting power alone. This is in keeping with our philosophy that sub-arguments can have at best a limited effect on parent arguments.
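This population weighting is a one-line adjustment (a sketch; the names are ours):

```python
def pop_adjust(s, s_mod_val, p_sub, p_parent):
    """Scale the movement (s_mod - s) by the voting-population ratio.

    The ratio is capped at 1 so a heavily voted sub-argument cannot
    out-muscle its parent argument."""
    ratio = min(p_sub / p_parent, 1.0)
    return (s_mod_val - s) * ratio + s
```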

Example calculation

Let's do a problem with the following argument tree and $f=0.25$:

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   0 [label="0, Thesis Statement"]
   1 [label="1, Pro argument, s = 0.9, p = 96"]
   2 [label="2, Pro sub-argument, xp = 0.7, p = 55"]
   3 [label="3, Pro sub-argument, xp = 0.8, p = 26"]
   4 [label="4, Con sub-argument, xc = 0.6, p = 30"]
   5 [label="5, Con sub-argument, xc = 0.7, p = 43"]
   6 [label="6, Pro sub-argument, xp = 0.85, p = 19"]
   7 [label="7, Con sub-argument, xc = 0.95, p = 28"] 
   8 [label="8, Con argument, ...."]
   0 -> 1 [dir="both"];
   1 -> 2 [dir="both"];
   1 -> 5 [dir="both"];
   2 -> 3 [dir="both"];
   2 -> 4 [dir="both"];
   5 -> 6 [dir="both"];
   5 -> 7 [dir="both"];
   0 -> 8 [dir="both"];
   }

```

Our objective here is to roll up the score for the Pro side of this tree. The Con side would be calculated similarly and we will skip this for the sake of brevity. Note that $s$ stands for the score and $p$ is the population voting to produce that score. We start at the bottom, with the 2-3-4 portion of the tree, and for the sake of consistency with the above calculation we will recast $x_p$ for 2 as $s$ and label the population of the sub-arguments as $p_s$.

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   2 [label="2, Pro argument, s = 0.7, p = 55"]
   3 [label="3, Pro sub-argument, xp = 0.8, ps = 26"]
   4 [label="4, Con sub-argument, xc = 0.6, ps = 30"]
   2 -> 3 [dir="both"];
   2 -> 4 [dir="both"];
   }

```

For the Pro sub-argument, we write:

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.7)(0.25)(0.8) + 0.7 - (1-0.7)(0.25) = 0.745$

We modify this by the respective populations:

$s_{mod,pop23} = (s_{mod} - s){p_s\over p} + s = (0.745 - 0.7){26\over 55} + 0.7 = 0.721$

For the Con sub-argument we first modify its score,

$x = 1 - x_c = 0.4$

and write

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.7)(0.25)(0.4) + 0.7 - (1-0.7)(0.25) = 0.685$

and modify by the respective population,

$s_{mod,pop24} = (s_{mod} - s){p_s\over p} + s = (0.685 - 0.7){30\over 55} + 0.7 = 0.692$

These two values of $s_{mod,pop}$ can now be combined to create a new $s$ for the Pro argument:

$s_{mod,tot} = (s_{mod,pop23} - s) + (s_{mod,pop24} - s) + s = (0.721 - 0.7) + (0.692 - 0.7) + 0.7 = 0.713$

We note here that the Pro argument got a little stronger as a result of its sub-arguments. The Pro sub-argument was substantially stronger than the Con sub-argument and, although fewer people voted for it, the population difference was not large.
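The 2-3-4 calculation can be rolled into a small helper that sums the population-weighted movements from each child (a sketch with our own names; it assumes a Pro parent, so Con children are flipped — for a Con parent, as in the 5-6-7 calculation below, the Pro children are the ones flipped):

```python
def s_mod(s, x, f):
    # piecewise-linear update from a single converted sub-argument score x
    if s > 0.5:
        return 2 * (1 - s) * f * x + s - (1 - s) * f
    return 2 * s * f * x + s - s * f

def roll_up(s, f, p, pro, con):
    """pro/con: (raw score, population) pairs for the child sub-arguments."""
    total = s
    for score, p_sub in pro:
        if score > 0.5:                          # weaker children have no effect
            delta = s_mod(s, score, f) - s
            total += delta * min(p_sub / p, 1.0)
    for score, p_sub in con:
        if score > 0.5:
            delta = s_mod(s, 1.0 - score, f) - s
            total += delta * min(p_sub / p, 1.0)
    return total

# The 2-3-4 sub-tree: s = 0.7, Pro child (0.8, 26 votes), Con child (0.6, 30 votes)
new_s = roll_up(0.7, 0.25, 55.0, pro=[(0.8, 26.0)], con=[(0.6, 30.0)])  # ~0.713
```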

For the Con sub-argument 5-6-7 we have the following situation:

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   5 [label="5, Con argument, s = 0.7, p = 43"]
   6 [label="6, Pro sub-argument, xp = 0.85, ps = 19"]
   7 [label="7, Con sub-argument, xc = 0.95, ps = 28"] 
   5 -> 6 [dir="both"];
   5 -> 7 [dir="both"];
   }

```

Here, for the Pro sub-argument, we first modify its score since it is the opposite of its parent. It is as if the parent were a Pro argument and the child were a Con argument.

$x = 1 - x_p = 1 - 0.85 = 0.15$

We then proceed as usual with the calculation:

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.7)(0.25)(0.15) + 0.7 - (1-0.7)(0.25) = 0.6475$

$s_{mod,pop56} = (s_{mod} - s){p_s\over p} + s = (0.6475 - 0.7){19\over 43} + 0.7 = 0.677$

For the Con sub-argument $x = x_c = 0.95$ since the parent argument is also Con:

$s_{mod} = 2(1-s)fx + s - (1-s)f = 2(1-0.7)(0.25)(0.95) + 0.7 - (1-0.7)(0.25) = 0.7675$

$s_{mod,pop57} = (s_{mod} - s){p_s\over p} + s = (0.7675 - 0.7){28\over 43} + 0.7 = 0.744$

We combine these two results in the same manner as above:

$s_{mod,tot} = (s_{mod,pop56} - s) + (s_{mod,pop57} - s) + s = (0.677 - 0.7) + (0.744 - 0.7) + 0.7 = 0.721$

With the bottom layer of the tree calculated, we have the following situation:

```graphviz digraph G {

   fontname="Helvetica,Arial,sans-serif"
   node [fontname="Helvetica,Arial,sans-serif"]
   edge [fontname="Helvetica,Arial,sans-serif"]
   layout=dot
   0 [label="0, Thesis Statement"]
   1 [label="1, Pro argument, s = 0.9, p = 96"]
   2 [label="2, Pro sub-argument, xp = 0.713, ps = 55"]
   5 [label="5, Con sub-argument, xc = 0.721, ps = 43"]
   8 [label="8, Con argument, ...."]
   0 -> 1 [dir="both"];
   1 -> 2 [dir="both"];
   1 -> 5 [dir="both"];
   0 -> 8 [dir="both"];
   }

```

All that remains is to calculate the 1-2-5 portion, which is very similar to the 2-3-4 calculation performed above. Therefore we will skip the details and simply report the results:

$s_{mod,pop12} = 0.906$

$s_{mod,pop15} = 0.895$

$s_{mod, tot} = 0.901$

We see here that the final result is not much different from the original $s = 0.9$. This is a result of the Pro sub-arguments essentially being cancelled by the Con sub-arguments. Such a result is to be expected in many cases.

In this example, we are skipping the Con side of the overall argument (node 8 in the tree above) because it would be exactly the same as what we have shown. If it had been calculated we would then combine the result for 1 and 8 to produce an overall score for the argument.

The calculations above can be performed with the [attached snippet](https://gitlab.syncad.com/peerverity/trust-model-playground/-/snippets/164). The user input portion of the snippet is set up for the calculation we did immediately above:

```python
# 1. User input
side_parent = 'pro'  # side, pro or con, that the parent argument is on
s = 0.9              # score for the parent argument
mf = 0.25            # max fraction that the parent argument can be changed, in terms of (1-s) or (s-0)
p = 96.0             # population voting for the parent argument
x_pro_arr = [0.713]  # scores for the pro children
x_con_arr = [0.721]  # scores for the con children
ps_pro_arr = [55.0]  # population voting for each pro child sub-argument
ps_con_arr = [43.0]  # population voting for each con child sub-argument
mods_if1or0 = True   # True to modify scores of exactly 1 or 0 to near 1 or 0
                     # (otherwise they cannot be adjusted by this calculation)
```

Note that the snippet uses arrays to handle any number of child arguments. These are combined in the same way we combined the single Pro and Con sub-arguments above.

Another variable, `mods_if1or0` controls whether we allow $s$ to be modified when it is set to 1 or 0. As discussed above, arguments where $s = 1$ or $s = 0$ are perfect, or perfectly flawed, and thus cannot be changed with sub-arguments. This idea may be theoretically plausible but it wouldn't stop users from voting 1 or 0 for arguments. In such cases the `mods_if1or0` switch, when True, changes $s$ to 0.99 and 0.01 respectively.
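That clamping step can be sketched as follows (the function name is ours; 0.99 and 0.01 are the values described above):

```python
def clamp_extremes(s, mods_if1or0=True):
    """Nudge scores voted exactly 1 or 0 so sub-arguments can still move them."""
    if mods_if1or0:
        if s >= 1.0:
            return 0.99
        if s <= 0.0:
            return 0.01
    return s
```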

As a side note, this property is similar to Bayesian probabilities of 1 or 0, which also cannot be changed. We have discussed this problem in earlier posts, arguing that probabilities of 1 or 0 don't really exist because they would require an infinite sample size. In the same way, a perfect (or perfectly imperfect) argument cannot exist because it would, at some point, run into the same issues that Bayesian probabilities do.

For example, suppose we've invented a pill that cures cancer. It is one dose, costs 10 cents to make, has no side effects, has no environmental impact due to manufacture, and is certain to cure someone's cancer. The argument for a cancer patient taking the pill is, for all practical purposes, perfect. There is simply no plausible argument against it. We could score such an argument a 1 until we remember our probabilities. We only know the pill works and has no side effects on a limited population, say 100,000 patients. We don't know what effect it will have on the 100,001st patient. So the best we can say is that the drug is 0.99999 effective. Given that the argument is really predicated on the effectiveness of the drug we could say its score is also 0.99999.

[Slashdot](https://slashdot.org/)

Slashdot offers a system for content moderation summarized by the following from [Wikipedia](https://en.wikipedia.org/wiki/Slashdot):

> Slashdot's editors are primarily responsible for selecting and editing the primary stories that are posted daily by submitters. The editors provide a one-paragraph summary for each story and a link to an external website where the story originated. Each story becomes the topic for a threaded discussion among the site's users.[62] A user-based moderation system is employed to filter out abusive or offensive comments.[63] Every comment is initially given a score of −1 to +2, with a default score of +1 for registered users, 0 for anonymous users (Anonymous Coward), +2 for users with high "karma", or −1 for users with low "karma". As moderators read comments attached to articles, they click to moderate the comment, either up (+1) or down (−1). Moderators may choose to attach a particular descriptor to the comments as well, such as "normal", "offtopic", "flamebait", "troll", "redundant", "insightful", "interesting", "informative", "funny", "overrated", or "underrated", with each corresponding to a −1 or +1 rating. So a comment may be seen to have a rating of "+1 insightful" or "−1 troll".[57] Comments are very rarely deleted, even if they contain hateful remarks.[64][65]

> Starting in August 2019 anonymous comments and postings have been disabled.

> Moderation points add to a user's rating, which is known as "karma" on Slashdot. Users with high "karma" are eligible to become moderators themselves. The system does not promote regular users as "moderators" and instead assigns five moderation points at a time to users based on the number of comments they have entered in the system – once a user's moderation points are used up, they can no longer moderate articles (though they can be assigned more moderation points at a later date). Paid staff editors have an unlimited number of moderation points.[57][62][66] A given comment can have any integer score from −1 to +5, and registered users of Slashdot can set a personal threshold so that no comments with a lesser score are displayed.[62][66] For instance, a user reading Slashdot at level +5 will only see the highest rated comments, while a user reading at level −1 will see a more "unfiltered, anarchic version".[57] A meta-moderation system was implemented on September 7, 1999,[67] to moderate the moderators and help contain abuses in the moderation system.[68][unreliable source?][page needed] Meta-moderators are presented with a set of moderations that they may rate as either fair or unfair. For each moderation, the meta-moderator sees the original comment and the reason assigned by the moderator (e.g. troll, funny), and the meta-moderator can click to see the context of comments surrounding the one that was moderated.[62][66]

Slashdot's purpose is to promote high quality discussion which is somewhat similar to our purpose of promoting high quality arguments. In particular, the reputation (karma) of the moderators is an interesting concept. We could use a similar system to weight voters with a good reputation higher in their argument scoring. Another interesting idea is the use of word descriptors to match scores. In our system descriptors such as "Completely irrelevant", "somewhat irrelevant", etc. could be a useful way to break up corresponding numerical ranges in our 0-1 scoring system.