Software for argumentation -- IBM Project Debater and Carneades

Main article: Logic

IBM Project Debater API

IBM maintains an “early access program” for Project Debater which includes a Python API. You must log in as Guest and accept the agreement before getting access.

The following installation procedure was followed:

$ pip install debater-python-api
$ pip install tqdm   # not pulled in by the install above

Attempting to run one of the example problems, however, showed that an API key is required.

We reached out to IBM to request the key, introduce ourselves, and briefly describe the PeerVerity mission. They responded by directing us to the online demo page instead.

The Project Debater website provides a demo page where users can try out several of the features available through the API. The API can perform a number of tasks related to argument mining: key points analysis, claim detection, claim boundaries, evidence detection, argument quality, pro/con identification, term wikifier, term relater, clustering, theme extraction, index searcher, narrative generation, etc.

Let’s try the one for Argument Quality. The user types a topic and sentence. The sentence is then scored on structure and clarity as well as relevance to the topic.

Topic: We should further explore the development of autonomous vehicles.

Sentence: Autonomous vehicles will help the disabled move around independently.
Score: 0.9406

Sentence: Cars should only provide assisted driving, not complete autonomy
Score: 0.8043

Sentence: Autonomous vehicles are cute.
Score: 0.3778

Sentence: Jack and Jill went down the hill.
Score: 0.3338

Sentence: dksk skkal akdklfl laldktivk
Score: 0.3023

The scores clearly decline as the sentence becomes less relevant to the topic. The second-to-last sentence is irrelevant to the topic but still receives a non-zero score, perhaps because its clarity and structure are acceptable despite its poor relevance. However, the last sentence, which is not comprehensible at all, still receives a substantial non-zero score. This behavior needs further investigation.
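One way to put the non-zero floor in perspective is to treat the gibberish score as an empirical baseline and look at how far each sentence sits above it. The sketch below does this arithmetic on the scores listed above. The baseline idea and the helper name margin_over_baseline are our own assumptions, not something documented by IBM.

```python
# Argument Quality scores copied from the demo results above.
scores = [
    ("Autonomous vehicles will help the disabled move around independently.", 0.9406),
    ("Cars should only provide assisted driving, not complete autonomy", 0.8043),
    ("Autonomous vehicles are cute.", 0.3778),
    ("Jack and Jill went down the hill.", 0.3338),
    ("dksk skkal akdklfl laldktivk", 0.3023),
]


def margin_over_baseline(scored, baseline):
    """Return (sentence, score - baseline) pairs, rounded to 4 decimals."""
    return [(sentence, round(value - baseline, 4)) for sentence, value in scored]


# Assumption: the gibberish sentence approximates the lowest score
# the demo returns in practice.
baseline = scores[-1][1]
margins = margin_over_baseline(scores, baseline)

for sentence, margin in margins:
    print(f"{margin:+.4f}  {sentence}")
```

Seen this way, “Autonomous vehicles are cute.” sits less than 0.08 above pure gibberish, while the two on-topic sentences sit more than 0.5 above it.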

Another service is Pro/Con, which scores the extent to which a sentence supports or opposes a given topic.

Topic: Social media is harmful

Sentence: Social media disproportionally promotes fake news
Score: 0.9966

Sentence: Social media use among our youth is of some concern but shouldn't be our highest priority.
Score: 0.9728

Sentence: Social media seems to influence our kids.
Score: 0.9680

Sentence: Social media gives our bored kids something to do.
Score: -0.9993

Here, an attempt to provide a steadily declining gradient of support still resulted in very high scores for the first three sentences. The last sentence, which clearly favors the opposite point of view with respect to the topic, does switch to a negative number, indicating its opposite polarity. Why the first three sentences score so similarly needs to be investigated.

Further evidence that Debater doesn’t understand the sentences (or is terrible at scoring) is the following:

Sentence: Social media is neither good nor bad.
Score: 0.9885

Sentence: The latest surveys paint a mixed picture on the harmful effects of social media use.
Score: 0.9973

These sentences are neutral on the topic. They should score around 0, since the scale is -1 to 1.
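The mismatch can be made explicit with a small check. The sketch below pairs some of the reported Pro/Con scores with the stance a human reader would expect and flags the disagreements. The expected stances are our own reading, and the neutral_band cutoff of 0.33 is an arbitrary assumption chosen only for illustration.

```python
# Pro/Con scores copied from the demo results above, paired with the stance
# we would expect: +1 supports the topic, 0 is neutral, -1 opposes it.
# The expected values are our own judgment, not output from Debater.
results = [
    ("Social media disproportionally promotes fake news", 0.9966, 1),
    ("Social media gives our bored kids something to do.", -0.9993, -1),
    ("Social media is neither good nor bad.", 0.9885, 0),
    ("The latest surveys paint a mixed picture on the harmful effects "
     "of social media use.", 0.9973, 0),
]


def stance(score, neutral_band=0.33):
    """Map a score in [-1, 1] to -1, 0, or +1 (neutral_band is an arbitrary cutoff)."""
    if score > neutral_band:
        return 1
    if score < -neutral_band:
        return -1
    return 0


# Collect sentences whose scored stance disagrees with the expected one.
mismatches = [
    (sentence, score)
    for sentence, score, expected in results
    if stance(score) != expected
]

for sentence, score in mismatches:
    print(f"{score:+.4f}  {sentence}")
```

With these inputs, only the two neutral sentences are flagged: both are scored as strongly supporting the topic.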

Incidentally, on this topic, we tried using Bard (Google), ChatGPT, GPT 4.0 (through Bing), and Claude (Anthropic) to answer the following prompt:

Prompt: Let’s suppose I start with the assertion that “social media is harmful”. I then want to provide sentences that support or contradict the assertion. A sentence that fully supports it will be scored 1. A sentence that fully contradicts it will be scored -1. A neutral sentence will be scored 0. How would the sentence “Social media disproportionally promotes fake news” score?

Bard: The sentence “Social media disproportionally promotes fake news” would likely score between -0.5 and 0.5, meaning it’s close to neutral but with a slight negative lean. Here’s why: For a score of -1 (fully contradicts): It would need clear evidence that social media actively suppresses or discourages fake news, which isn’t the case. There would need to be definitive proof that other media sources are significantly better at fact-checking and promoting accurate information, which is also debatable. For a score of 1 (fully supports): The statement would need to prove that fake news makes up a significantly larger proportion of content on social media compared to other platforms. While research supports that fake news spreads faster and gets more engagement on social media, quantifying its “disproportionate” nature is challenging. It would need to demonstrate that social media platforms intentionally promote fake news, which they largely deny and actively try to combat. Therefore, the statement sits in a gray area: Negative aspect: Social media algorithms can favor sensational and emotionally charged content, including fake news, due to its higher engagement potential. This can create echo chambers and confirmation bias, making it harder for users to encounter diverse perspectives and factual information. Neutral aspect: Social media platforms also host verified news sources and provide fact-checking tools. Additionally, user awareness and responsible content consumption can mitigate the spread of misinformation. Considering these nuances, a score of -0.5 to 0.5 reflects the statement’s partial truth and lack of a definitive outcome. Ultimately, the “harmful” nature of social media is a complex issue with various contributing factors. While fake news plays a role, attributing “disproportionate” promotion solely to social media requires more thorough analysis and context.

Bard gets the wrong answer from the outset, so we will skip any analysis of its faulty explanation. It is unclear why Bard fails so spectacularly in this case. It apparently forgets that the baseline for comparison is the sentence “social media is harmful” and the subsequent scoring rules.

ChatGPT: The sentence “Social media disproportionally promotes fake news” would likely be scored as supporting the assertion that “social media is harmful.” This is because it suggests a negative impact, indicating that social media has a tendency to promote misinformation and falsehoods. Therefore, it would likely score 1 in favor of the assertion.

GPT 4.0 and Claude also get the correct answer (1.0) for largely the same reasons.

Carneades

Software for the latest version of Carneades is available here.

Requirements are Version 1.4x or newer of the Go programming language. My version is 1.17.

$ go install github.com/carneades/carneades-4/src/cmd/carneades@latest

After installation, carneades.exe will be in your $GOPATH/bin directory. On my system, for example, this is C:\Users\peter\go\bin\carneades.exe. To run an example file:

Grab the tandem.yml file from the examples folder on the Carneades GitHub site.
Put it in the same folder as carneades.exe.
Run: carneades eval -o tandem.graphml tandem.yml. This converts the tandem.yml file to tandem.graphml.
To view the graphml file, download an appropriate viewer, e.g. yEd.
Run the viewer and open the tandem.graphml file.
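The conversion step above can also be scripted. The following is a minimal sketch that only uses the eval subcommand and -o flag shown in the steps. The helper names build_eval_command and run_eval are hypothetical, and the script assumes the carneades binary is on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_eval_command(input_file, executable="carneades"):
    """Build `carneades eval -o <name>.graphml <input>` as an argument list."""
    output_file = Path(input_file).with_suffix(".graphml")
    return [executable, "eval", "-o", str(output_file), str(input_file)]


def run_eval(input_file):
    """Run the conversion if the binary is on PATH; return the output path or None."""
    cmd = build_eval_command(input_file)
    if shutil.which(cmd[0]) is None:
        return None  # carneades is not installed or not on PATH
    subprocess.run(cmd, check=True)
    return cmd[3]


print(build_eval_command("tandem.yml"))
```

Calling run_eval("tandem.yml") in the folder that holds the example file should then produce tandem.graphml, ready to open in a viewer.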

Carneades requires a pre-built structured argument file to start with, the YAML file mentioned above. Earlier versions of Carneades (Carneades 2.0) have a GUI to help with this, but the GUI cannot be used with the current version, which is a complete rewrite in a different language.

The current version has a web server which presumably allows for some graphical functionality but, after starting it, I was unable to view any contents in the browser.

An example YAML input file, an argument (with sub-arguments) about the legalization of marijuana, looks like this:

meta:
  title: Legalization of Marijuana
  notes: "A reconstruction of arguments about the pros and cons of legalizing marijuana"
  source: >
    http://www.nytimes.com/2014/08/01/opinion/Legalizing-Marijuana-Pros-and-Cons.html?_r=0#addenda
    http://www.dbp.idebate.org/en/index.php/Debate:_Legalization_of_Marijuana#Pro.2Fcon_resources

statements:
  legalize: Marijuana should be legalized.
  ¬legalize:  Marijuana should not be legalized.
  harmless: Marijuana is harmless.
  ¬harmless: Marijuana is not harmless.
  social_costs: Legalization would entail heavy social costs.
  ¬social_costs: Legalization would not entail heavy  social costs.
  reduced_consumption: Legalization would reduce the consumption of marijuana.
  ¬reduced_consumption: Legalization  would not reduce the consumption of marijuana.
  reduce_crime: Legalization would reduce crime.
  ¬reduce_crime:  Legalization would not reduce crime.
  economical: Legalization would be economical.
  ¬economical: Legalization would not be economical.
  # (statement truncated in source) ... potency.
  legal_drugs_have_higher_consumption: Legal drugs, like alcohol and tobacco, are used more than illegal drugs.
  us_drug_use_statistics: 52 percent of Americans 12 or older used alcohol, 27 percent used tobacco and only 7 percent used marijuana.
  emergency_rooms: Marijuana emergency room admissions are reported at the rate of 1,250 a day and 455,000 a year.
  highway_crashes: Highway crashes double for marijuana users.
  robbed_future: Prohibition has robbed children of their future, through imprisonment.
  prison_complex: Prohibition has led to the construction of a huge prison-industrial complex.
  harmful_for_young_brains: Marijuana is harmful for still-developing adolescent brains.
  costs_of_regulation: Legalization would create an economic burden for regulating the drug.
  costs_of_treatment:  Legalization would create an economic burden to treat new problematic users.
  increased_risk_of_substance_use_disorders: Research shows that use of marijuana, especially in early adolescence, significantly heightens risks of substance use disorders in later life.
  better_treatment_and_prevention_infrastructure: Legalization would require the provision of a much better prevention and treatment infrastructure.

issues:
  i1:
    positions: [legalize, ¬legalize]
  i2:
    positions: [harmless, ¬harmless]
  i4:
    positions: [social_costs, ¬social_costs]
  i5:
    positions: [reduced_consumption, ¬reduced_consumption]
  i8:
    positions: [reduce_crime, ¬reduce_crime]
  i10:
    positions: [economical, ¬economical]
      
arguments:
  a1:
    scheme: cumulative
    conclusion: legalize
    premises:
      - harmless
      - ¬social_costs
      - reduced_consumption
      - reduce_crime
      - economical
  a2:
    scheme: cumulative
    conclusion: ¬legalize
    premises:
      - ¬harmless
      - social_costs
      - ¬reduced_consumption
      - ¬reduce_crime
      - ¬economical
  a3:
    conclusion: ¬reduced_consumption
    premises:
      - legal_drugs_have_higher_consumption
  a4:
    conclusion: legal_drugs_have_higher_consumption
    premises:
      - us_drug_use_statistics
  a5:
    conclusion: ¬harmless
    scheme: cumulative
    premises:
      - emergency_rooms
      - highway_crashes
      - harmful_for_young_brains
      - increased_risk_of_substance_use_disorders
  a6:
    conclusion: ¬social_costs
    premises:
      - robbed_future
      - prison_complex 
  a7:
    conclusion: ¬economical
    scheme: cumulative
    premises:
      - costs_of_regulation
      - costs_of_treatment
      - better_treatment_and_prevention_infrastructure

assumptions:
  - us_drug_use_statistics
  - harmful_for_young_brains
  - emergency_rooms
  - costs_of_regulation
  - robbed_future
  - prison_complex

Running this file through Carneades,

$ carneades eval -o legalization-of-marijuana.graphml legalization-of-marijuana.yml

again produces a graphml file which is viewed in yEd:

Several arguments, a1-a7, are made. The second argument, visualized above, is that marijuana should not be legalized. The statements that support this position are shown in the green and red boxes above the argument. Green boxes are statements that have public support; red boxes are statements that do not. Since there are 3 green boxes and 2 red boxes, the strength of the conclusion is 60% (the boxes all weigh equally by default, although this is adjustable). It is unclear how public support is obtained, but the implication in associated academic papers is that there is a voting process to establish it.
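The 60% figure is simple arithmetic, sketched below. This reproduces the calculation as described in the text, not Carneades's actual implementation. The function name support_ratio and the example weight of 2.0 are hypothetical; the green/red labels for a2 are taken from the walkthrough of the five boxes in this section.

```python
def support_ratio(premises, weights=None):
    """Fraction of total premise weight carried by supported (green) premises.

    premises: dict mapping premise name -> True (green) or False (red).
    weights:  optional dict mapping premise name -> weight (default 1.0 each).
    """
    if weights is None:
        weights = {}
    total = sum(weights.get(name, 1.0) for name in premises)
    supported = sum(weights.get(name, 1.0) for name, ok in premises.items() if ok)
    return supported / total


# Premises of argument a2, labeled green (True) or red (False) as described
# in the text.
a2_premises = {
    "¬harmless": True,
    "social_costs": False,
    "¬reduced_consumption": True,
    "¬reduce_crime": False,
    "¬economical": True,
}

a2_strength = support_ratio(a2_premises)  # 3 of 5 equally weighted boxes
print(f"{a2_strength:.0%}")

# Hypothetical adjustment: count "¬harmless" twice as heavily as the others.
a2_weighted = support_ratio(a2_premises, weights={"¬harmless": 2.0})
print(f"{a2_weighted:.2f}")
```

With equal weights the ratio is 3/5; doubling the weight of one green box shifts it to 4/6, which illustrates how adjustable weights would change the strength of the conclusion.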

Note that the first green box “Marijuana is not harmless” is the result of another argument, a5:

Argument a5, in turn, is composed of 4 statements, two of which are supported and two of which are not, giving a weight of 0.5. Since 0.5 is the threshold for acceptance, “Marijuana is not harmless” is admitted as a green box in the arguments that depend on it.

The second box “Legalization would not be economical” is similar and is the result of argument a7:

The only supported assumption here is that “Legalization would create an economic burden for regulating the drug”. The supported assumptions are shown at the bottom of the YAML file above.
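To see which premises count as supported assumptions, the premises of each argument can be checked against the assumptions list at the end of the YAML file. The sketch below copies both by hand from the listing above, restricted to arguments a4 through a7, and the helper name assumed_premises is hypothetical. It only cross-references names and makes no claim about how Carneades evaluates the graph.

```python
# Assumptions copied from the end of the YAML file above.
assumptions = {
    "us_drug_use_statistics",
    "harmful_for_young_brains",
    "emergency_rooms",
    "costs_of_regulation",
    "robbed_future",
    "prison_complex",
}

# Premises of arguments a4 through a7, copied from the YAML file above.
premises = {
    "a4": ["us_drug_use_statistics"],
    "a5": ["emergency_rooms", "highway_crashes", "harmful_for_young_brains",
           "increased_risk_of_substance_use_disorders"],
    "a6": ["robbed_future", "prison_complex"],
    "a7": ["costs_of_regulation", "costs_of_treatment",
           "better_treatment_and_prevention_infrastructure"],
}


def assumed_premises(argument):
    """Return the premises of an argument that appear in the assumptions list."""
    return [p for p in premises[argument] if p in assumptions]


for name in premises:
    found = assumed_premises(name)
    print(f"{name}: {len(found)} of {len(premises[name])} premises assumed: {found}")
```

For a7 this yields only costs_of_regulation, and for a5 it yields two of the four premises, matching the description in the text.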

The third box, the statement “Legalization would not reduce crime”, has no argument to support it and was not admitted as an assumption when reviewed by the public.

The fourth box, the statement “Legalization would not reduce the consumption of marijuana” is the result of argument a3:

Within this argument, the statement “Legal drugs, like alcohol and tobacco, are used more than illegal drugs” is supported by argument a4:

Here the supporting assumption, viewed favorably by the public, is that “52 percent of Americans 12 or older used alcohol, 27 percent used tobacco and only 7 percent used marijuana”.

The fifth and last box, “Legalization would entail heavy social costs”, is not supported by the adjoining arguments. In fact, the opposite view is supported (see below).

The other side of this argument, a1, is also displayed, that “Marijuana should be legalized”:

This argument is not supported since most of its input sub-arguments/assumptions are red. The only green one is “Legalization would not entail heavy social costs” which in turn is supported by argument a6:

The assumptions which feed this argument were admitted by the public and can be seen at the end of the YAML file above.

We will not go over in detail the rest of the arguments/assumptions that feed a1 (the red ones) since we’ve already covered the details of how this is done for argument a2.

Carneades provides very little documentation (no user manual) and the last modification of the codebase occurred 2 years ago. Each new version appears to be a rewrite, subject to funding. Of the two project leaders, one is deceased.

The emphasis in Carneades is legal argumentation, for which there is a clear Pro and Con position. It does not have a mechanism to detect logical fallacies in informal arguments, although the mechanism of public review could presumably be extended to perform this function. Needless to say, it does not attempt to parse and understand natural language.

Legal argumentation can be quite different from how people normally argue. A legal argument has exactly two sides, must conclude with a resolution favoring one side or the other, and is normally resolved through a vote (by a jury or judges). It is easy to see how Carneades fits into this idea. Freeform scientific, philosophical, or political arguments between real people may not fit so neatly into this framework.

However, the notion of argument diagramming, which is principally what Carneades does, is worth exploring further. Our software could allow for breaking up arguments into their constituent parts, allowing users to add new pieces to the argument (pro, con, or otherwise), challenge statements with questions or refutations, etc. An overall score, based on this user input, could then be assigned to each side of the argument.