By Jill Hubbard Bowman

I. U.S. Copyright Law Basics

Unprotectable Subject Matter [i]

The purpose of U.S. copyright law is to “promote the progress of science and useful arts.”[ii] Granting authors exclusive rights over some uses of the creative expression in their works was meant as an incentive to create more original works. The goal was not to stop the free flow and use of ideas and information. It was to encourage innovation and the broad publication and sharing of ideas. To balance competing interests carefully, the U.S. Copyright Act includes basic subject matter restrictions that limit what can be copyrighted. It also limits copyright owners’ monopoly through the flexible “fair use” doctrine, which allows use of copyrighted works in some circumstances.

No copyright protection for facts, ideas, concepts, math, or methods.

U.S. copyright law does not protect: [iii]

facts or information
ideas, concepts, or principles
discoveries or inventions
mathematical formulas and symbols
letters, words, short phrases, names, and titles
processes or procedures
systems or methods of operation
functional format or layout

Anyone can use general factual information.[iv]

The general information and abstract concepts in a news article, book, photograph, or illustration are not copyrightable. But copyright can create an information lock-in problem, because it prevents unauthorized copying of the creative expression that describes the information. To address this type of issue, the U.S. Copyright Act has an open-ended exception to copyright infringement, called “fair use,” that limits the scope of a copyright monopoly. This doctrine tries to strike a careful balance between letting a copyright restrict some uses of creative expression and allowing unprotected information to be used freely.

Short phrases and single words are not copyrightable.

It follows that word tokens, which are whole words or pieces of words, are not protected by copyright. For large language models (LLMs) like ChatGPT, text from training datasets is broken down into tokens that are roughly word-length or shorter. LLMs are trained on trillions of tokens. Each token is converted into numbers. During training, mathematical calculations on those numbers repeatedly adjust the model’s own numbers (its weights) so that they reflect statistical relationships between tokens. When generating output, the model breaks a user’s input prompt into tokens and uses the learned relationships between tokens to represent the meaning of words and concepts and to predict the next word in its response.
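To make the idea of tokens and statistical relationships more concrete, here is a deliberately tiny sketch in Python. It is not how ChatGPT or any production LLM works: real systems use subword tokenizers (such as byte-pair encoding) and neural networks with billions of weights. Here, assuming a whitespace split as the “tokenizer” and simple word-pair counts as the “statistics,” the hypothetical helpers `build_vocab`, `train_bigrams`, and `predict_next` turn words into numbers, record which number tends to follow which, and use those counts to guess the next word.

```python
from collections import Counter, defaultdict

def build_vocab(words):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {}
    for w in words:
        if w not in vocab:
            vocab[w] = len(vocab)
    return vocab

def train_bigrams(ids):
    """Count how often each token ID is followed by each other token ID."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(ids, ids[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(word, vocab, follows):
    """Return the word that most often followed `word` in training, or None."""
    inverse = {i: w for w, i in vocab.items()}
    counts = follows.get(vocab.get(word))
    if not counts:
        return None  # unknown word, or a word never followed by anything
    best_id, _ = counts.most_common(1)[0]
    return inverse[best_id]

text = "the cat sat on the mat and the cat slept"
words = text.split()              # toy "tokenizer": split on whitespace
vocab = build_vocab(words)
ids = [vocab[w] for w in words]   # tokens converted into numbers
follows = train_bigrams(ids)      # toy "statistical relationships"

print(ids)                               # [0, 1, 2, 3, 0, 4, 5, 0, 1, 6]
print(predict_next("the", vocab, follows))  # cat
```

The prediction is “cat” because “cat” followed “the” twice in the training text, while “mat” followed it only once.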

Descriptive metadata is likely not copyrightable.

A data label, whether written by a human or computer, that identifies or factually describes an object is conveying information and is not sufficiently creative to be copyrightable.

APIs may be protectable by copyright.

Short words and phrases are not supposed to be copyrightable. Yet in the Oracle v. Google copyright case, Oracle convinced the Federal Circuit Court of Appeals that application programming interfaces (APIs) should be protected by copyright.[v] That holding is still good law. Google appealed to the U.S. Supreme Court and argued that APIs were too short and functional to be copyrightable. The Supreme Court declined to decide whether APIs are copyrightable, instead assuming for argument’s sake that the copied material was protected, and decided the infringement case in Google’s favor on fair use grounds.[vi]

Some AI model licenses specifically cover APIs, which may be copyrightable.

Grammar rules and syntax are not copyrightable.

Language rules for the structure of words, or for how words or phrases should be arranged (syntax), are not copyrightable subject matter or creative expression.

An AI algorithm is not copyrightable.

An algorithm is a process or method, which is not copyrightable subject matter. When an AI developer uses the term “algorithm,” however, she may be referring to one or more types of technology, including the process for training a model, the trained model, the software for training a model, or the process used by the model. (See AI Key Terms in Resources at AI Law Maze Map for more information on definitions.) What’s confusing is that the code expressing the algorithm may be copyrightable (if it is creatively written by a human and not dictated by technical considerations), but the “algorithm” underlying the software is not.[vii]
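A short illustration of the algorithm/code distinction, using Euclid’s greatest-common-divisor method as an assumed example (it is not from the original discussion): the method itself, repeatedly replacing the pair (a, b) with (b, a mod b) until b is zero, is the unprotectable process. The two hypothetical Python functions below express that same method in different code. Whether any particular expression is creative enough to be protected is a legal question the code cannot answer, and functions this short and standard may not qualify anyway.

```python
def gcd_loop(a: int, b: int) -> int:
    """Euclid's algorithm expressed as a loop."""
    while b:
        a, b = b, a % b
    return a

def gcd_recursive(a: int, b: int) -> int:
    """The same Euclid's algorithm expressed recursively."""
    return a if b == 0 else gcd_recursive(b, a % b)

# Different code, same underlying method, same results.
for pair in [(48, 18), (270, 192), (17, 5)]:
    assert gcd_loop(*pair) == gcd_recursive(*pair)

print(gcd_loop(48, 18))  # 6
```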

Creative computer code, but not functional aspects, may be copyrightable.

AI software, including code for pre-processing, processing, and optimizing data or models in the processing pipeline, may be copyrightable if it includes creative expression written by a human.

In 1980, Congress amended the U.S. Copyright Act to make computer programs copyrightable as literary works. The Copyright Act defines “computer program” as “a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result.”[viii] This creates an inherent contradiction with subject matter eligibility, because computer code is essentially a method of operating a computer. The conundrum is mostly ignored. Courts conducting a copyright infringement analysis, however, struggle because they must filter out many unprotectable elements to identify any creative expression in a computer program that may have been infringed. There is no protection for purely functional aspects of a computer program, such as the algorithm, logic, or program design. There is also no protection for HTML generated by website design software.[ix]

Other Protection Requirements

U.S. copyright only protects human, creative expression in works of authorship.

To qualify for protection, in addition to being copyrightable subject matter, a literary work must also be: [x]

original, creative expression
authored by a human, and
in a form that can be copied.

In the U.S., only original, creative expression of ideas, concepts, and processes may be protected, not the underlying ideas, concepts, or processes that are described or explained. Not every word or phrase of a literary work is creative expression. Certainly, information about the technical structure of a word is not protected. This idea/expression dichotomy is very confusing, because drawing the line between an idea and the creative expression of that idea is difficult.

Copyright protection arises automatically when a literary work is fixed.

A work is fixed when it exists in a tangible medium of expression. For example, when a software programmer writes code and saves it on a computer, the code is fixed.

Database copyrights are “thin” and protect the selection and arrangement of data.

The U.S. does not have explicit database rights like those in the European Union, and it has rejected the “sweat of the brow” theory of protection. The U.S. requires human creativity for copyright protection of databases. Under U.S. copyright law, a database for use in a computer may be a compilation “formed by the collection and assembling of preexisting materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.”[xi]

There is no protection if the selection and arrangement is done mechanically by a computer, or if it is obvious and common in the industry. The compilation copyright in a database does not extend to the preexisting data in the database. Some databases may contain data, like photographs taken by humans, that have their own individual copyrights because those works independently meet the copyrightability requirements. In contrast, some data in a database is unprotectable subject matter.[xii] Some data is also unprotectable because it was generated autonomously by machines and lacks human creativity. According to the U.S. Supreme Court, everyone is free to use uncopyrighted data in a database; copyright infringement occurs only if the way the database data was selected and arranged is copied.[xiii] Numerous courts have found that wholesale takings of unprotected data from compilations are non-infringing.[xiv] Several courts have found that copying an entire database to obtain unprotectable data was intermediate copying and fair use.[xv] Essentially, a compilation copyright for a database in the U.S. is almost worthless.

Navigation Tip: Don’t count on copyright law to protect your database in the U.S. Instead, use contracts and Terms of Use to control access to and use of databases.

Software that can only be written one way isn’t copyrightable.[xvi]

Under the merger doctrine, copyright protection does not apply if an underlying idea can be expressed in only a limited number of ways. This prevents monopolization of ideas. If software code can be written in only one way, or in a limited number of optimal ways, to carry out a process, the idea has merged with the expression and the code isn’t copyrightable.
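As a hypothetical illustration of expression constrained by function, consider a Celsius-to-Fahrenheit conversion. The formula dictates the code: a programmer can rename the variable or reorder the terms, but there is essentially no room for expressive choice. Whether a court would actually apply the merger doctrine to a given piece of code is a fact-specific legal question; this sketch only shows what “a limited number of ways” can look like in practice.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    # The formula F = C * 9/5 + 32 dictates this line; the only "choices"
    # left are trivial ones like the variable name or the order of terms.
    return celsius * 9 / 5 + 32

print(celsius_to_fahrenheit(100))  # 212.0
print(celsius_to_fahrenheit(-40))  # -40.0
```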

Standard common expressions are not copyrightable.

Under the scènes à faire doctrine, standard, commonly used elements can’t be protected by copyright. For example, widely used programming techniques, hardware design standards, and code dictated by the computer’s specifications or by efficiency aren’t protectable. Similarly, common syntax is not copyrightable.

Artistic style is likely not copyrightable.

Some creative artists are concerned about the ability of generative AI to create new works in a style similar to their works. Although the boundary between information and expression is not a bright line, copyright law has never stretched to protect “style” as a vague expression disembodied from a specific copyrightable work. The substantial similarity test for infringement would be difficult, if not impossible, to apply when comparing one or more copyrighted works with new AI output that is in the same “style” but isn’t substantially similar to any existing registered work. Other laws, like the right of publicity or unfair competition, are better suited to protecting artists from uses of generative AI to mimic a specific artist.

Only humans can be authors.

A monkey taking a selfie,[xvii] a non-human spiritual being,[xviii] an autonomous car photographing road signs, and an AI model creating beautiful artworks can’t be authors and own copyrights. The U.S. Copyright Office refuses to register works listing AI as an author or co-author.

AI output may not be copyrightable.

The U.S. Copyright Office, in recent guidance, has clearly stated that human authorship, creativity, and control are required for AI output to be copyrightable. Some AI systems, including a few complex computer vision systems, are tightly controlled by developers. Their output may be a data container that reflects numerous creative choices by the developers, and such output is likely copyrightable.

The output from many generative AI models, however, is created by a computer with little input or control from a human, so it isn’t copyrightable. The Copyright Office has determined that an image created autonomously by an AI system, without creative contribution from a human, is not a copyrightable work.[xix] Even a complicated text prompt for a generative AI system may not be enough to make the human the author of the resulting output if the human does not control how the AI system interprets the prompt or how the creative output is generated. Interestingly, a prompt may be sufficiently creative to be copyrightable even though the resulting output is not.

The Copyright Office says it will consider whether the traditional elements of authorship were created “by man or a machine” on a case-by-case basis. It further states: “When an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship. As a result, that material is not protected by copyright and must be disclaimed in a registration application.”[xx]

AI Generated Computer Code

Millions of software programmers are now using AI systems, like GitHub Copilot, to generate new computer code. This AI generated code likely isn’t copyrightable under the U.S. Copyright Office standards. But human-written code, and human modifications to AI generated code, can be copyrightable. In practice, however, the speed with which developers using generative AI tools accept AI output and merge it into the main code base may make it difficult to identify and track uncopyrightable code.

This integration of AI generated code with human code creates new legal uncertainty about the availability of copyright protection and enforcement for the entire software program. A U.S. work generally must be registered with the Copyright Office before its copyright can be enforced in U.S. courts.[xxi] The lack of protection could also hurt the commercial viability of the mixed software.

AI generated code must be described and disclaimed when filing a U.S. copyright registration application. This isn’t practical, or even really feasible, for code that mixes human and AI contributions. It’s foreseeable that, in enforcement litigation, problems with how the AI generated code was identified in the copyright application could give defendants more defenses and lead to a lack of copyright protection, despite the initial registration.

Navigation Tip: Avoid using AI generated code for critical code your company wants to monetize by licensing and distributing. The code will need to be registered with the Copyright Office for enforcement in court in a copyright infringement action if the license is breached.

Determining what is potentially copyrightable in an AI model requires analysis.

What are the components of the model?
Does it use standard, common industry structures or features?
Was the code written by a human or generated by a computer?
What is dictated by technical considerations or efficiency?
Did a human make creative, original choices in the creation of the code?
How much control did the human maintain over the creation of the code?

Creative AI model architecture code written by a human may be copyrightable.

There are many different types of models. Model architecture and the implementing code might be creatively conceived and written by a brilliant AI engineer. But who was the first author? The owner of any copyright would be the original author who made creative choices when writing the code. Only elements that are not dictated by functional and technical considerations or standard in the industry would potentially be copyrightable. Modifications made for technical efficiency or improvement to the code may not be protectable as a copyrightable derivative work. Creative expression, where different choices are possible, is required for protection.

AI model weights may not be copyrightable.

The heart of an AI system is the model weights. Mustafa Suleyman, the CEO of Microsoft AI, has called AI model weights “the most valuable IP in the system.”[xxii] But model weights are only “IP” if they are protected by some type of intellectual property right.[xxiii] Computer code executes and interprets the weights. The code for weights, however, is typically functional rather than creative human expression, and it is dictated by technical considerations.

AI neural network models do complex math and make sophisticated statistical guesses. A full model consists of an architecture (the arrangement of mathematical functions) and weights, stored in computer-readable form. Neural network model weights, also called the parameters of the algorithmic function, are large sets of numbers. For example, GPT-4 has been reported to have about 1.76 trillion parameters, although OpenAI has not confirmed that figure. The numbers for the weights are generated automatically by a computer as the model is trained using deep learning algorithms. These numbers reflect statistical information about patterns. Some AI models store the weights in a computer file separate from the other model code, APIs, and user interfaces. That other code may have been creatively written by a human and may be copyrightable because of the human’s creative choices.
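The point that weights are generated automatically during training can be illustrated with a minimal Python sketch. Assumptions: a single linear function with two weights (`w` and `b`) stands in for a neural network, and plain gradient descent stands in for deep learning training. The programmer chooses the starting values, the learning rate, and the data, but the final weight values are produced by repeated mechanical updates.

```python
def train(data, epochs=5000, lr=0.01):
    """Fit y = w*x + b to (x, y) pairs by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # starting weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Update step: nudge each weight to reduce the prediction error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training data that follows the pattern y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(5)]
w, b = train(data)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The loop recovers the pattern in the data as two numbers; nobody typed 2 or 1 into the model.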

For large language models, words in training datasets are split into tokens, and each token is mapped to a numerical vector that represents part of a word, its meaning, and its relationship to other words in a vocabulary. It’s valuable information in a numerical format. The model reads the tokens as it learns and updates its parameters based on the accuracy of its predictions. The parameters reflect the concepts in the tokens and patterns in the statistical relationships among the tokens. The model stores only the updated parameters, not the tokens or the training dataset.
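Here is a rough Python sketch of tokens becoming vectors. Assumptions: whitespace splitting stands in for a real tokenizer, each token ID gets a small random vector (an “embedding”), and the training updates that would adjust those vectors are omitted. The hypothetical `checkpoint` shows what a saved model file would contain in this toy: lists of numbers, with no copy of the training text. Real deployments may also ship a separate tokenizer file alongside the weights.

```python
import json
import random

def init_embeddings(vocab_size, dim, seed=0):
    """Create one vector of `dim` small random numbers per token ID."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

training_text = "the cat sat on the mat"
words = training_text.split()  # stand-in tokenizer
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}  # word -> token ID
token_ids = [vocab[w] for w in words]
embeddings = init_embeddings(len(vocab), dim=4)

# The model computes with these vectors, not with the words themselves.
vectors = [embeddings[i] for i in token_ids]

# Hypothetical saved model: parameter values only.
checkpoint = json.dumps({"embeddings": embeddings})
print(len(vocab), len(vectors), "cat" in checkpoint)  # 5 6 False
```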

In the U.S., the human authorship requirement, coupled with the requirements for human creative expression and control of the output, means that computer-generated code for AI model weights may not be copyrightable. The U.S. Copyright Office will not register works “produced by a machine or mere mechanical process” without creativity from a human author. It would be difficult to find human creativity in numbers describing information and concepts about patterns in language or images. In Southco, Inc. v. Kanebridge Corp., the Third Circuit Court of Appeals rejected copyrightability of a catalog of descriptive machine part numbers dictated by the logic of the numbering system, where the numbers expressed the characteristics of the parts. The court found that “the numbers themselves are generated by a mechanical application of the rules and do not reflect even a spark of creativity.”[xxiv] Even in the U.K., where computer-generated computer code may be copyrightable, commentators in the treatise The Law of Artificial Intelligence are skeptical that AI models contain sufficient protectable human expression, especially since the neural network is “dictated by technical considerations, rules or constraints.”[xxv]
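To show what numbers “generated by a mechanical application of the rules” can look like, here is a hypothetical part-numbering scheme in Python. It is not Southco’s actual system; the product-line, length, and finish fields and the `FINISH_CODES` table are invented for illustration. Once the rules are fixed, every part number follows automatically from the part’s characteristics, leaving no room for a person to make creative choices about any individual number.

```python
# Hypothetical numbering rules: each characteristic maps to a fixed field.
FINISH_CODES = {"plain": "0", "black": "1", "chrome": "2"}

def part_number(product_line: int, length_mm: int, finish: str) -> str:
    """Build a part number by applying fixed formatting rules to part traits."""
    return f"{product_line:02d}-{length_mm:03d}-{FINISH_CODES[finish]}"

print(part_number(47, 15, "black"))  # 47-015-1
print(part_number(3, 120, "plain"))  # 03-120-0
```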

Navigation Tip: Consider whether other parts of the model like the architecture, APIs, or user interfaces have been written by a human making creative choices. Consider whether other types of IP rights related to the AI model exist, like trade secret rights or patent rights. Consider functional license grants and contractual restrictions for controlling uses of an AI model.

Endnotes

[i] Copyright law varies in significant ways around the world. The answers to the existence of copyright, scope of permissible use, and defenses to infringement also vary by country.

[ii] U.S. Const. art. I, sec. 8, cl. 8 (Congress has the power “[t]o promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”).

[iii] 17 U.S.C. § 102(b) (“In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery”); https://www.copyright.gov/circs/circ33.pdf.

[iv] Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991).

[v] Oracle Am., Inc. v. Google Inc., 872 F. Supp. 2d 974 (N.D. Cal. 2012), reversed and remanded, 750 F.3d 1339 (Fed. Cir. 2014), cert. denied, 135 S. Ct. 2887 (2015), http://cafc.uscourts.gov/sites/default/files/opinions-orders/13-1021.Opinion.5-7-2014.1.PDF.

[vi] Google LLC v. Oracle America Inc., 141 S. Ct. 1163 (2021), https://www.supremecourt.gov/opinions/20pdf/18-956_d18f.pdf.

[vii] Other IP rights may protect an algorithm. Trade secret rights may protect a secret, protected algorithm that has actual or potential economic advantage. Patent protection may be possible for an algorithm performed by software after a prosecution process in the USPTO, which usually takes years.

[viii] 17 U.S.C. § 101, https://www.govinfo.gov/content/pkg/USCODE-2011-title17/pdf/USCODE-2011-title17-chap1-sec101.pdf.

[ix] https://www.copyright.gov/circs/circ61.pdf.

[x] https://www.copyright.gov/circs/circ33.pdf.

[xi] 17 U.S.C. § 101. Copyright Registration for Automated Databases, Appendix A, https://www.copyright.gov/reports/appendix.pdf

[xii] Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991) (one of the most important Supreme Court decisions on copyrightability of compilations finding a telephone directory uncopyrightable).

[xiii] Id. at 349 and 350.

[xiv] Report on Legal Protection of Databases, p.19 https://www.copyright.gov/reports/db4.pdf.

[xv] Assessment Technologies of WI, LLC v. Wiredata, Inc., 350 F.3d 640, 643 (7th Cir. 2003) (holding that the plaintiff had no protectable interest in the data contained in its copyrighted database and copying the entire database would be intermediate copying and fair use); PhantomALERT, Inc. v. Google Inc., Case No. 15-cv-03986-JCS (N.D. Cal. Dec. 14, 2015) (Waze copied the points of interest data about traffic conditions, route information, and speed traps from a competing GPS navigation app provider. The court found that the data was not copyrightable and therefore copying “raw data” was not infringement).

[xvi] Zalewski v. Cicero Builder Dev., Inc., 754 F.3d 95, 103 (2d Cir. 2014).

[xvii] Naruto v. Slater, 888 F.3d 418, 426 (9th Cir. 2018) (holding that a monkey lacked statutory standing under the Copyright Act to sue over a photo the monkey took).

[xviii] Urantia Found. v. Kristen Maaherra, 114 F.3d 955, 957–59 (9th Cir. 1997) (holding that “some element of human creativity must have occurred in order for the Book to be copyrightable” because “it is not creations of divine beings that the copyright laws were intended to protect”).

[xix] U.S. Copyright Office Review Board, Decision Affirming Refusal of Registration of a Recent Entrance to Paradise at 2–3 (Feb. 14, 2022), https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf (determining a work “autonomously created by artificial intelligence without any creative contribution from a human actor” was “ineligible for registration”).

[xx] https://copyright.gov/ai/ai_policy_guidance.pdf.

[xxi] Some plaintiffs have found out the hard way when their copyright infringement complaint was dismissed by the court for lack of copyright registration of the allegedly infringed works.

[xxii] Mustafa Suleyman, The Coming Wave: Technology, Power, and the Twenty-first Century’s Greatest Dilemma, p. 304 (Crown Publishing Group 2023).

[xxiii] In the U.S., AI model information might be a trade secret if protected and kept secret.

[xxiv] Southco, Inc. v. Kanebridge Corp., 390 F.3d 276 (3d Cir. 2004), https://law.justia.com/cases/federal/appellate-courts/F3/390/276/506642/ (finding serial numbers lacked sufficient originality to be copyrightable).

[xxv] The Law of Artificial Intelligence, 8-134, 8-144, 8-145 (discussing copyrightability for computer-generated computer programs).

Warning & Disclaimer: This article is for education only. It is not intended as legal advice. No attorney-client relationship is created or implied with the author.

See AI Law Maze Map for more information.

Jill Hubbard Bowman is an attorney specializing in the intersection of artificial intelligence and the law. She is an intellectual property attorney with over 25 years of legal experience. For many years, Jill has advised companies on access, control, and monetization of AI technology in connected and autonomous cars and internet-of-things systems in factories, retail, and healthcare. Jill has also provided product counsel for AI applications and web platforms for AI development. She has helped companies assess complex legal risks and make smart business decisions to maximize the benefits of AI technology. Jill has also worked with corporate strategy and execution teams on ethical design and deployment of AI technology. Jill was previously Associate General Counsel, Intellectual Property for Intel Corporation and Mobileye Vision Technologies. Jill was also an IP litigation attorney at Brinks Hofer Gilson & Lione (Chicago) and Wilson Sonsini Goodrich & Rosati (Palo Alto and Austin). Jill is a registered patent attorney. Jill has her J.D. from the University of Michigan Law School.

