
GPT-3 and ChatGPT: Disrupting LegalTech?

[1] “Disclaimer: this text is written by a human being, without the help of any generative large-language model. The author did use automated tools to check spelling, grammar, clarity, conciseness, formality, inclusiveness, punctuation conventions, sensitive geopolitical references, vocabulary, synonym suggestions, and occasionally hit tab on good predictions by MS-Word.”

 

This month, OpenAI introduced ChatGPT, a new large-language model based on the latest version of GPT-3 that is capable of generating text, including code, rhyming poetry, essay drafts, and even answers that pass exams. ChatGPT also proved able to respect ethical boundaries and demonstrated the capacity to produce high-caliber text with an authoritative tone.

Nevertheless, ChatGPT and GPT-3 occasionally provide advice that is nonsensical or not factual, which presents significant risks in legal applications.

To employ these models responsibly, an understanding of their makeup and restrictions is essential. In my article “ChatGPT: the Next Step in the Natural Language Processing (NLP) Revolution? Or is it not?”, I explain extensively how the GPT-3 models (on which ChatGPT is based) work.

GPT-3 in the Legal World

But let us start with what you may not yet know: GPT-3 and other large-language models are already extensively used in the legal world:

  • If you wish to protest a traffic fine, fight corporations, beat bureaucracy, or just sue anybody, try this app, which uses GPT models to generate a protest letter: https://donotpay.com/
  • How about all these LegalTech blogs, LinkedIn posts, and web pages describing the advantages of legal tech? You never know how much of this was written by GPT models.
  • The search engine ranking copy making sure that company X tops the Google results in your area when you search for LegalTech software.
  • The sales pitch you get in your mailbox from the latest and greatest LegalTech company. Are you 100% sure that was not written using this app: https://www.gong.io/.
  • The intern who just wrote an extended essay on a legal problem: are you sure that was not partly written by GPT-3?
  • How do you know the source code of that hot new LegalTech application was not partly generated using GPT-3?

Currently, there are thousands of start-ups using GPT-3’s models for a variety of applications, including LegalTech.

After this month’s introduction of the new large-language models by OpenAI, we are all so impressed by GPT-3’s and ChatGPT’s performance that one may believe they will radically change the legal profession, if not replace a significant share of legal professionals. Is that really the case?

Hereunder, I discuss a few examples of such legal applications using GPT-3. In all cases, the text-davinci-003 model was used; this is the most advanced and most recent language model in the GPT-3 family.
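
For readers who want to reproduce the experiments, here is a minimal sketch of how such a prompt can be sent to text-davinci-003, assuming the openai Python client available at the time of writing (the author does not state which interface was actually used, and the API key is a placeholder):

```python
# Minimal sketch: sending a prompt to text-davinci-003 via the legacy
# openai Python client. Illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder: your own OpenAI key

response = openai.Completion.create(
    model="text-davinci-003",   # the most recent GPT-3 model at the time
    prompt="What is the expected jail time or fine for financial fraud "
           "in the state of NY. Give 10 case law references.",
    max_tokens=1024,
    temperature=0.7,
)
print(response.choices[0].text)
```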

Is GPT-3 a Replacement for Case Law Search Engines?

GPT-3 can be used to ask questions about case law. Here is an example about the expected jail time for financial fraud in the State of New York. GPT-3 was prompted with a request to provide an estimate, including 10 case law references.

Prompt: What is the expected jail time or fine for financial fraud in the state of NY. Give 10 case law references.

GPT-3:
The exact jail time or fine for financial fraud in the state of NY will depend on the specific details of the case and the severity of the fraud. However, some examples of cases that have been decided in the state of New York are provided below for reference.

  1. People v. Green, 8 N.Y.3d 677 (2007): Defendant received a sentence of 15 years in prison and a $30,000 fine for grand larceny and money laundering related to a real estate fraud scheme.
  2. People v. Basciano, 9 N.Y.3d 592 (2007): Defendant received a sentence of 25 years in prison and a $3.5 million restitution order for a fraud and money laundering scheme.
  3. People v. Schleifer, 16 N.Y.3d 740 (2011): Defendant received a sentence of 15 years in prison and a $2 million fine for securities fraud.
  4. People v. Ghazi, 17 N.Y.3d 856 (2011): Defendant received a sentence of 8 to 24 years in prison and a $2.5 million fine for grand larceny and fraud related to a real estate scheme.
  5. People v. Siegel, 24 N.Y.3d 902 (2014): Defendant received a sentence of 5 to 15 years in prison and a $3.3 million restitution order for fraud related to an investment scheme.
  6. People v. Gropper, 25 N.Y.3d 883 (2015): Defendant received a sentence of 6 to 12 years in prison and a $2 million fine for securities fraud.
  7. People v. Shmulevich, 28 N.Y.3d 934 (2017): Defendant received a sentence of 6 to 18 years in prison and a $2 million fine for fraud related to a securities scheme.
  8. People v. Seabrook, 33 N.Y.3d 961 (2018): Defendant received a sentence of 5 to 15 years in prison and a $2.5 million fine for fraud and bribery related to a city contract.
  9. People v. Harris, 46 N.Y.3d 1 (2018): Defendant received a sentence of 12 to 24 years in prison and a $3.5 million fine for fraud related to a real estate investment scheme.
  10. People v. McKelvey, 56 N.Y.3d 887 (2019): Defendant received a sentence of 5 to 15 years in prison and a $2 million fine for securities fraud.

Impressive! So, is GPT-3 a replacement for my search engine? Not really: you never know whether what it returns is factual. This new model can write perfectly formulated texts with a strong tone of authority, but sometimes it will include complete nonsense in its recommendations or discussions, and we never know when that is the case.

This is because the GPT models do nothing more than reproduce statistical sequences of the human language and other internet content to which they were exposed during training. With the right prompt, we can “guide” this process in a specific direction. GPT was never designed to stick only to the facts; it was also designed to be a creative text generator.

In some legal applications (such as criminal defense), creativity may be a virtue. But that is a different type of creativity than the one GPT-3 exhibits. The market for lawyers showing the same linguistic creativity as GPT-3 is probably small.

Another example of GPT-3’s non-factual behavior can be observed when asking contradictory questions. GPT-3 will then (confidently) provide contradictory answers of opposing polarity, very much like a friend telling you what you want to hear. GPT-3 will not tell you which one is the best, let alone the optimal, legal solution. See the examples at the end of this blog.

Finally, GPT-3 was trained on internet content up to the end of 2021: all case law from after that date is unknown to the model.

So: no, it is not a search engine.

GPT-3: a Contract Drafting Tool?

GPT-3 is an autoregressive generative model, trained to generate a variety of texts. It has been used to draft contract clauses, legal letters, and even entire legal agreements. Let us give that a try. Hereunder, GPT-3 is asked to draft an NDA under the laws of the State of New York:

Prompt: Draft an NDA NON-DISCLOSURE AGREEMENT of 1000 words under the jurisdiction of the state of New York, USA.

GPT-3:

NON-DISCLOSURE AGREEMENT

This Agreement is made between [Name of Disclosing Party] (“Disclosing Party”) and [Name of Receiving Party] (“Receiving Party”) on this [date].

WHEREAS, the Disclosing Party desires to disclose certain confidential information (“Confidential Information”) to the Receiving Party;

NOW, THEREFORE, in consideration of the disclosure of the Confidential Information to the Receiving Party, and other good and valuable consideration, the receipt and sufficiency of which is hereby acknowledged, the Disclosing Party and the Receiving Party agree to the following terms:

  1. Disclosure of Confidential Information

The Disclosing Party agrees to disclose certain Confidential Information to the Receiving Party. The Confidential Information may include, but is not limited to, trade secrets, processes, systems, technology, pricing information, financial information, customer information, and other confidential information related to the Disclosing Party’s business.

  2. Restrictions on Use of Confidential Information

The Receiving Party agrees to not use any of the Confidential Information disclosed by the Disclosing Party for any purpose other than the purpose agreed to by the Disclosing Party and the Receiving Party. The Receiving Party agrees not to disclose any of the Confidential Information to any third party, or use any of the Confidential Information to directly or indirectly compete with the Disclosing Party.

  3. Time Period

The restrictions on the use and disclosure of the Confidential Information shall remain in effect for a period of [number of years] years from the date of this Agreement.

  4. Ownership of Confidential Information

The Disclosing Party retains all right, title, and interest in the Confidential Information disclosed under this Agreement.

  5. Non-Solicitation

The Receiving Party agrees not to solicit any employees of the Disclosing Party for the purposes of hiring or recruiting them for employment during the term of this Agreement, and for a period of [number of years] years thereafter.

  6. Non-Circumvention

The Receiving Party agrees not to circumvent the Disclosing Party in any business dealings with any third parties, and shall not use any of the Confidential Information disclosed by the Disclosing Party to do so.

  7. Liability for Breach

If the Receiving Party breaches any of the terms of this Agreement, the Receiving Party shall be liable for any damages, costs, and expenses resulting from the breach.

  8. Governing Law

This Agreement shall be governed by and construed in accordance with the laws of the State of New York.

IN WITNESS WHEREOF, the parties have executed this Agreement on the date first above written.

[Name of Disclosing Party]

[Name of Receiving Party]

Oops, so GPT-3 is an intelligent contract drafting tool. There go all the junior lawyers’ jobs.

But is that really the case? For simple tasks, many people already search the internet for contract templates. Without fully understanding the legal meaning, they use the one that looks best and adapt it to their needs. These “internet templates” are exactly the contracts on which GPT-3’s answer is based. For more complex legal tasks (unless many examples are available on the internet for download), it is doubtful whether GPT-3 could write a full agreement.

When provided with the right prompt, it could assist lawyers in drafting individual contract clauses, but there are already many (proven and legally defensible) legal document management tools that can do the same, based on (pre-approved) libraries of contract and clause templates.

GPT-3 for Legal Advice

Maybe GPT-3 can also advise me on the steps I need to take to navigate through a complex legal process such as immigrating to the USA:

Prompt:

What are the steps required to immigrate to the USA from the Netherlands?

How many people immigrate to the USA from the Netherlands per year?

What are the odds that my immigration process succeeds?

How much will it cost me?

GPT-3:

  1. Determine Eligibility: The first step for anyone looking to immigrate to the USA from the Netherlands is to determine their eligibility. This can be done by researching the different visa categories available and understanding the requirements for each.
  2. Gather Required Documents: After determining eligibility, applicants must then gather the required documents and evidence needed to support their application. This will include any proof of financial stability, educational qualifications, and other evidence needed to support the application.
  3. Submit Application: Once the applicant has gathered all the necessary documents, they must then submit their application to the United States Citizenship and Immigration Services (USCIS) with the appropriate fees.
  4. Wait for Decision: After submitting the application, the applicant must wait for a decision from the USCIS regarding their application. This can take a few months or more depending on the complexity of the case.
  5. Attend Interview: If the application is approved, the applicant will then be required to attend an interview with the USCIS. This is to ensure that all the information provided in the application is accurate and that the applicant is eligible to enter the United States.
  6. Receive Decision: After the interview, the USCIS will provide the applicant with a decision regarding their application. If approved, the applicant will receive their visa and then proceed to the next step.

Answer to How many people immigrate to the USA from the Netherlands per year?: According to the latest data from the Department of Homeland Security, approximately 1,100 people immigrated to the United States from the Netherlands in 2019.

Answer to What are the odds that my immigration process succeeds?: The success rate of immigration processes varies depending on the individual case. Generally, the more complete and accurate the application is, the better the odds of success.

Answer to How much will it cost me?: The cost of the immigration process varies depending on the type of visa you are applying for. The USCIS website provides a fee calculator to help estimate the cost of the application.

Help! All my legal advisors will soon be out of work!

Here is another one:

Prompt: write a letter to protest a rent increase under the laws of the State of California.

GPT-3:

Dear [Landlord],

I am writing to protest the recent rent increase you imposed on my unit at [address].

Under California law, rent increases can only be issued once a year and must be within the legal limits of the state. The rent increase you have issued is not within legal limits and is also outside of the one-year period since the last rent increase. This is a violation of California law.

I am asking that you immediately cease the rent increase and return my rent back to the previous amount.

If you do not comply, I will be forced to take legal action and seek compensation for the additional money I have paid due to your illegal rent increase.

I look forward to your response.

Sincerely,

[Your Name]

Really impressive, but here too, the answers are based on blogs and other internet content providing advice on such legal questions. A simple Google search would probably return thousands of blogs and web pages providing similar legal advice.

Validation of Factuality Is Lacking for All GPT-3-Generated Content

In all the above examples, one never knows whether the text generated by GPT-3 is (legally) correct. In other words: when factuality is required, one cannot count on GPT-3. OpenAI, the developer of GPT-3, actually acknowledges this and warns about it:

  • Sam Altman, OpenAI’s CEO, stated on Twitter: “It does know a lot, but the danger is that it is confident and wrong a significant fraction of the time.”
  • OpenAI warns about the risk of counter-factual answers in their blog: “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.”

Here is what others say in less flattering words:

  • Julian Togelius, Associate Professor at the Department of Computer Science and Engineering, Tandon School of Engineering, New York University, stated: “GPT-3 often performs like a clever student who has not done their reading, trying to bull$&^! their way through an exam. Some well-known facts, some half-truths, and some straight lies, strung together in what [at] first looks like a smooth narrative.” I can definitely confirm this, both in GPT and in students’ answers 😊.
  • MIT Technology Review called GPT-3 and other large-language models “stochastic parrots”. Of Galactica, a large-language model developed by Meta (Facebook), they wrote: “Like all language models, Galactica is a mindless bot that cannot tell fact from fiction.” Recently, they added: “But increasingly, the output these models generate can easily fool us into thinking it was made by a human. And large language models in particular are confident bullshitters: they create text that sounds correct but in fact may be full of falsehoods.”
  • WIRED published a great article on this: ChatGPT’s Fluent BS Is Compelling Because Everything Is Fluent BS. As they state: “The AI chatbot was trained on text created by humans. Of course, its writing is superficially impressive and lacking in substance.”

Because of this lack of control over factuality, one should not use GPT-3, or other large-language models, for applications where factuality is required. The fact that GPT uses a strongly authoritative voice and perfect language only adds to the deception.

Using GPT-3 for advice on legal topics without human validation is very risky. One can use GPT to quickly draft a non-disclosure agreement or write a letter to protest a traffic violation, but one should not count on the content being 100% correct.

Other Risks

A full overview of risks using GPT-3 can be found here: GPT-3 and ChatGPT: the Next Step in the Natural Language Processing (NLP) Revolution? Or is it not?

Hereunder, the legally most significant risks are repeated.

Copyright and License Violations

All source code, text, and art generated by models such as GPT-3 is based upon (i) open-source information that has been developed by volunteers for free, (ii) copyrighted material, or (iii) material specifying that usage is subject to some form of license agreement. Now the question is: where do these legal restrictions end? The first lawsuits have been filed, and the outcome is uncertain.

Many companies are already using GPT to assist them in writing source code or web content, as do students (who currently cannot be detected) and artists using DALL-E to generate art for which they are paid. Now ask yourself: what will happen when using GPT is considered a copyright or license violation?

A Detector for GPT-3 Generated Text

GPT-2 is open-source. Analysing the token probabilities the model assigns allows us to detect whether a text was written by GPT-2. GPT-3 and ChatGPT will not be released as open-source models, so it is currently impossible to detect in the same way whether text has been written by GPT-3. OpenAI has a moral obligation to release tooling to detect text generated by GPT-3, but doing so would undermine their own business model.

What if it suddenly becomes possible to detect GPT-3-generated text, and Google decides that such content will be ignored (or even penalized) for Search Engine Optimization (SEO)? Then one is in trouble. The same goes for student theses written with the help of GPT-3.
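
To illustrate the GPT-2 case, here is a minimal sketch of a perplexity-based detector, assuming the open-source GPT-2 weights published on Hugging Face. Low perplexity under GPT-2 only hints that a text may be machine-generated; it is a heuristic, not proof:

```python
# Minimal sketch: score a text with GPT-2's own language-model loss.
# Texts the model finds "too predictable" (low perplexity) are suspect.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Encode the text and let GPT-2 predict it token by token.
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```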

Negations

Large-language models such as GPT have solved many linguistic problems, but they still have problems handling negations. One of the better methods to “trick” the model is to confront it with (double) negations.

This is a tough problem to tackle, as the source of these limitations touches the fundamental intuitions on which word embeddings and self-attention are based.

Lack of Transparency and Explainability, and Ultimately Legal Defensibility

We must be aware of the potential and limitations of large-language models, as well as the risks associated with them, to gain both a better scientific comprehension and a clearer picture of their overall effect on society. Transparency is the key starting point for reaching these objectives, especially if Legal Defensibility is required. See also this publication on using AI and Legal Defensibility.

Large language models could be utilized in a wide range of linguistic applications for LegalTech. Every use case has its own set of expectations, and the models need to meet these standards for accuracy, dependability, fairness, and efficiency.

We need to know what these models know, what they do not know, what they can and cannot do, and so on. Therefore, Explainable Artificial Intelligence (XAI) is currently an important research topic.

Hallucinations

Sometimes, large language models start to hallucinate. Hallucination is defined as the model generating words that are not supported by the source input: hallucinations “invent” text. Deep-learning-based generation is prone to hallucinating unintended text. These hallucinations degrade system performance and fail to meet user expectations in many real-world scenarios. We need to understand better when models are hallucinating, when they are likely to start hallucinating, and how we can prevent hallucination.

Time-Capped Understanding of the World

As GPT-3 was trained on texts written up to the end of 2021, it has no clue about anything that happened after that date. Try asking it anything about recent or upcoming sports events...

Another LegalTech Example: GPT-3 in eDiscovery and Information Governance

Now that we better understand the architecture, capabilities, and limitations of transformers and the GPT-3 models, we can also assess how GPT-3 will do in, for instance, eDiscovery and Information Governance.

Well, GPT-3 will likely fail in Legal Review for eDiscovery for the following reasons:

  • GPT-3 can ONLY generate what it has seen in training. Deviating or recent content in the email or other ESI in eDiscovery will be unknown to it.
  • Suppose we were to use email from a particular case to fine-tune GPT-3. Would that work? Well, this would require preparing all of those emails as GPT-3 training input and then feeding them into the system for basic classification. But you will never get 100% recall, as the model will not be able to classify totally new concepts. It will also be very expensive (roughly $10k in OpenAI costs for 100 million emails). Let alone the fact that GPT-3 cannot process or enrich data as we do in eDiscovery.
  • GPT-3 is not a search engine. It has no index. All processing is sequential. It will be very slow.
  • GPT-3 is a sub-optimal information extractor. Encoder-only models (e.g., BERT) are much more efficient and of higher quality for Legal Analytics.
  • There is no solid XAI framework to explain or understand exactly what GPT-3 is doing. So, legal defensibility using GPT-3 is currently a house of cards.

Where GPT-3 can add value in eDiscovery and Information Governance:

  • Query expansion for search, to come up with relevant synonyms (see the sketch after this list).
  • Text generation for the drafting of eDiscovery response letters.
  • A legal chatbot for questions on simple legal problems and advice (the same way one would now search on the internet).
  • Generating blogs and social media posts on LegalTech.
  • Explaining what is going on in Assisted Review or Legal Analytics, as a form of Explainable Artificial Intelligence (XAI) expressing the system’s decisions in natural language.
  • Helping developers write computer code faster.
  • Generating search queries for the ZyLAB or Solar search engine.
  • Generating regular expressions for the ZyLAB Insights extraction platform.
  • Having fun during your lunch break.
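
As an illustration of the query-expansion use case, here is a minimal sketch, again assuming the openai Python client and text-davinci-003; the prompt wording and the helper function are illustrative, not a ZyLAB implementation:

```python
# Minimal sketch: use GPT-3 to expand a search query with synonyms
# before sending it to a conventional search engine.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def expand_query(term: str, n: int = 10) -> list[str]:
    prompt = (
        f"List {n} synonyms or closely related legal search terms for "
        f"'{term}', one per line, without numbering."
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=150,
        temperature=0.5,
    )
    # One candidate term per non-empty line of the completion.
    return [ln.strip() for ln in resp.choices[0].text.splitlines() if ln.strip()]

print(expand_query("financial fraud"))
```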

But in all cases, we should validate the content generated by GPT-3. Legal professionals can work faster and more efficiently when drafting simple contract clauses or letters using these tools, in a similar way to how software developers use code generators such as Copilot.

But these tools are not suited to replace search engines, templates for complex contracts, eDiscovery review, legal analytics, translation, or other LegalTech tooling. Nor will they replace lawyers providing legal advice as long as the legal factuality of these models is not guaranteed.

Future Developments

A significant task in LegalTech is understanding and being able to deal with human language. This field of research is called Natural Language Processing (NLP).

Five years ago, there were many problems in NLP on which no significant progress was being made. One of them was the reliable generation of proper language, indistinguishable from that of humans. GPT-3 has solved that problem for us, and that is a major achievement.

Another problem was managing a dialog with a computer system, either goal-driven or just chatting for fun. GPT-3 also seems to have addressed that problem with an effective "human in the loop" reinforcement-learning algorithm.

Where Microsoft’s Tay and Meta’s Galactica could still be abused to enter into unethical discussions, express biased views, or be used for hate speech, ChatGPT was able to avoid most of these problems. It was still possible to generate bias in source code, but that was one of the few exceptions overlooked during the training and by the “Content Moderation Tooling”. These can now easily be fixed (if they are not already).

In general, it looks like OpenAI developed a method to teach large-language models to stay on track and behave in a more human-aligned way than previous models did, using "human in the loop" reinforcement learning and the Content Moderation Tooling.

As long as you ask reasonable and rational questions, the model responds in a similar manner. However, once you start asking nonsense, the sky is the limit. As “one fool can ask more questions in an hour than a wise man can answer in seven years”, there is a significant long tail of nonsense questions, and it will take time to address all of these. I am confident that OpenAI is already working on this.

One should double-check factuality and be careful about believing everything it generates, especially on legal topics.

I would not be surprised if several startups are working on developing a LegalGPT that understands legal problems better and sticks more closely to factuality than the current (general-purpose) language models.

If we can then also provide more transparency and explain what these models know, what they do not know, and why they provide certain answers and not others, then a wider variety of legal applications is in sight. That is also the moment when legal professionals should really get worried.

Appendix:

In this recent article on LinkedIn: GPT-3 and ChatGPT: the Next Step in the Natural Language Processing (NLP) Revolution? Or is it not?, I have written a more detailed explanation of how the GPT models work, how they resolve complex linguistic problems, how they are trained, and what their limitations are. That text also contains citations to the original research papers, as well as to more accessible publications from WIRED, MIT Technology Review, The Atlantic, and the Wall Street Journal.

In the two sections hereunder, you can find a short summary. But as Transformers are among the most complex deep-learning models around, it is impossible to understand them without digging deeper.

How Transformers (the building blocks for GPT-3) Work

GPT-3 and ChatGPT are constructed from so-called Transformers. Transformers consist of an encoder-decoder architecture designed to deal with complex sequence-to-sequence patterns that are both left- and right-context sensitive. Natural language consists of such patterns; hence the success of these models on natural language.

Natural language has many forms of linguistic ambiguity. Transformers address these very effectively with a mechanism called multi-headed self-attention, whereby tokens in input sequences are compared to each other to find hidden patterns. This allows them to deal with typical problems of (human) natural language such as morphological variation, synonyms, homonyms, syntactic ambiguity, semantic ambiguity, co-references, pronouns, negations, alignment, intent, etc.
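
To make the mechanism concrete, here is a minimal numpy sketch of single-head scaled dot-product self-attention; a real transformer adds learned projection weights for Q, K, and V and runs many heads in parallel:

```python
# Minimal sketch: single-head scaled dot-product self-attention.
import numpy as np

def self_attention(X):
    """X: (sequence_length, d_model) array of token embeddings."""
    d = X.shape[-1]
    # In a trained transformer, Q, K, and V come from learned linear
    # projections of X; we skip those weights to keep the sketch short.
    Q, K, V = X, X, X
    scores = Q @ K.T / np.sqrt(d)                   # compare every token to every other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                              # context-mixed representations

tokens = np.random.randn(5, 8)                      # 5 tokens, 8-dim embeddings
print(self_attention(tokens).shape)                 # (5, 8)
```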

By using multiple layers and multiple (parallel) heads, transformers implement this process very efficiently and effectively. They stack, for instance, 96 layers of self-attention on top of each other (in the largest GPT-3 models) and show the transformer large quantities of textual data or program code, where the model encodes the input words and learns to predict what comes next.

It has been shown that these layers capture different levels of linguistic complexity and ambiguity, starting with punctuation, then morphology, syntax, semantics, and finally more complex relations. In other words, these models automatically discovered the traditional NLP pipeline, just from being exposed to language.

BERT and the GPT models

Full encoder-decoder models (such as T5) are computationally very intensive and hard to train. That is why there are also encoder-only and decoder-only models.

Encoder-only models are classifiers: instead of learning to predict the next token, they learn to predict a classification label. This is how BERT works. Great for sentiment mining, part-of-speech tagging, named entity extraction, and many other linguistic classification techniques.
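
As a concrete illustration, here is a minimal sketch of such an encoder-style classifier, using the Hugging Face pipeline API with its default BERT-family sentiment model (the model choice here is an assumption, not a recommendation):

```python
# Minimal sketch: encoder-only classification via Hugging Face.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default BERT-family model
print(classifier("The court ruled in our favor."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```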

Decoder-only models are text generators. GPT-3 is a decoder-only architecture, trained on tons of textual data. It does not encode; it can only decode based on everything it has seen: all of Wikipedia, books from unpublished authors, and other (copyright-free) textual data.

Interestingly, GPT-3 has seen so much data that it can also be used for simple classification, sentiment mining, or named entity extraction problems (often based on zero-shot or few-shot learning to start the decoding process in the right direction). However, it will always be sub-optimal compared to BERT or full encoder-decoder models. GPT-3 can also translate, because it has seen many examples of translated text. Here too, the result will be sub-optimal compared to full encoder-decoder models.
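
To illustrate few-shot classification with a decoder-only model, here is a minimal sketch, again assuming the legacy openai client and text-davinci-003; the example sentences in the prompt are invented for illustration:

```python
# Minimal sketch: few-shot sentiment classification via text completion.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

few_shot_prompt = """Classify the sentiment of each sentence as Positive or Negative.

Sentence: The judge dismissed our motion. Sentiment: Negative
Sentence: We won the appeal. Sentiment: Positive
Sentence: The settlement exceeded our expectations. Sentiment:"""

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=few_shot_prompt,
    max_tokens=3,
    temperature=0,  # deterministic: we want a label, not creativity
)
print(resp.choices[0].text.strip())  # expected: "Positive"
```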

This is quite dangerous: as GPT-3 initially performs reasonably well on such tasks, one tends to keep fine-tuning the prompt to perform them better, but ultimately one will likely fail, wasting time and resources in the attempt.