
The AI glossary

This glossary has been curated by data scientists and machine learning experts like you.

The Appen artificial intelligence glossary

To help those who are just learning about the nuances of AI, we have developed the Artificial Intelligence Glossary below, a list of words and terms that can help prepare you for when AI starts to become a part of your everyday conversations.

More than just robots seeking to terminate or games looking to challenge humans, artificial intelligence (AI) is the application of complex programmatic math whose outcome, combined with high-quality training data, becomes the technological advances we see in our everyday lives. From self-driving cars to finding cures for cancer, artificial intelligence applied in the real world is becoming a way of life.


A/B Testing

A controlled, real-life experiment designed to compare two variants of a system or a model, A and B.

Activation Function

In the context of Artificial Neural Networks, a function that takes the weighted sum of all of the inputs from the previous layer and generates an output value that is passed on to the next layer.
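
As a minimal sketch of this definition, the Python snippet below computes one neuron's weighted sum and passes it through an activation function. The sigmoid and the rectifier are two common choices; the function names and sample values are illustrative assumptions, not taken from any particular library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectifier: passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, fed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical inputs and weights, chosen only for illustration.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid))
print(neuron_output([1.0, 2.0], [1.0, 1.0], -1.0, relu))
```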

Active Learning (Active Learning Strategy)

A special case of Semi-Supervised Machine Learning in which a learning agent is able to interactively query an oracle (usually, a human annotator) to obtain labels for new data points.


Algorithm

An unambiguous specification of a process describing how to solve a class of problems that can perform calculations, process data and automate reasoning.


Annotation

A metadatum attached to a piece of data, typically provided by a human annotator.

Area Under the Curve (AUC)

A metric used in Machine Learning to determine which one of several models has the highest performance by measuring the area under each model's receiver operating characteristic (ROC) curve.
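
One way to make this concrete: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes AUC from that pairwise view. The function name and the sample labels and scores are illustrative assumptions, and the double loop is meant for small examples only.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for four examples.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```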

Artificial Intelligence

A broad concept encompassing machine learning, natural language processing, and other techniques, aiming to simulate human intelligence in machines.

Artificial Neural Networks

An architecture composed of successive layers of simple connected units called artificial neurons, interleaved with non-linear activation functions, which is vaguely reminiscent of the neurons in an animal brain.

Association Rule Learning

A rule-based Machine Learning method for discovering interesting relations between variables in large data sets.


Autoencoder

A type of Artificial Neural Network used to produce efficient representations of data in an unsupervised and non-linear manner, typically to reduce dimensionality.

Automated Speech Recognition

A subfield of Computational Linguistics interested in methods that enable the recognition and translation of spoken language into text by computers.


Backpropagation (Backpropagation Through Time)

A method used to train Artificial Neural Networks by computing the gradient of the error with respect to the network’s weights, which is needed to update those weights.


Batch

The set of examples used in one gradient update of model training.

Bayes’s Theorem

A famous theorem used by statisticians to describe the probability of an event based on prior knowledge of conditions that might be related to an occurrence.
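
As a worked illustration, the theorem can be written P(A|B) = P(B|A) P(A) / P(B). The sketch below applies it to a hypothetical screening test; the function name and all three input probabilities are made-up assumptions chosen only to show the arithmetic.

```python
def posterior(prior: float, true_positive_rate: float,
              false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes's theorem."""
    # P(positive) = P(pos | cond) P(cond) + P(pos | no cond) P(no cond)
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false positives.
print(posterior(prior=0.01, true_positive_rate=0.9, false_positive_rate=0.05))
```

With these assumed numbers the posterior is only about 15%, because the condition is rare to begin with.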

Bias (Inductive Bias, Confirmation Bias)

Inductive Bias: the set of assumptions that the learner uses when predicting outputs given inputs that have not been encountered yet.

Confirmation Bias: the tendency to search for, interpret, favor, and recall information in a way that confirms one’s own beliefs or hypotheses while giving disproportionately less attention to information that contradicts it.

Bias-Variance Tradeoff

A conflict arising when data scientists try to simultaneously minimize bias and variance, two sources of error that prevent supervised algorithms from generalizing beyond their training set.


Boosting

A Machine Learning ensemble meta-algorithm for primarily reducing bias and variance in supervised learning, and a family of Machine Learning algorithms that convert weak learners to strong ones.

Bounding Box

The smallest (rectangular) box fully containing a set of points or an object.
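
For a set of 2D points, this can be sketched in a few lines, assuming an axis-aligned box described by its minimum and maximum corners. The function name and the (x_min, y_min, x_max, y_max) return convention are assumptions made for this example.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle containing all (x, y) points."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical object outline.
print(bounding_box([(1, 5), (3, 2), (-1, 4)]))  # (-1, 2, 3, 5)
```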



Chatbot

A computer program or an AI designed to interact with human users through conversation.


Classification

The task of approximating a mapping function from input variables to discrete output variables, or, by extension, a class of Machine Learning algorithms that determine the classes to which specific instances belong.


Clustering

In Machine Learning, the unsupervised task of grouping a set of objects so that objects within the same group (called a cluster) are more “similar” to each other than they are to those in other groups.


Cold-Start

A potential issue arising from the fact that a system cannot infer anything for users or items for which it has not gathered a sufficient amount of information yet.

Collaborative Filtering

A method used in the context of recommender systems to make predictions about the interests of a user by collecting preferences from a larger group of users.

Computer Vision

The field of Machine Learning that studies how to gain high-level understanding from images or videos.

Confidence Interval

A type of interval estimate that is likely to contain the true value of an unknown population parameter. The interval is associated with a confidence level that quantifies the level of confidence of this parameter being in the interval.


Contributor

A human worker providing annotations on the Appen data annotation platform.

Convolutional Neural Network (CNN)

A class of Deep, Feed-Forward Artificial Neural Networks, often used in Computer Vision.

Central Processing Unit (CPU)

The electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output operations specified by the instructions.

Cross-Validation (k-fold Cross-Validation, Leave-p-out Cross-Validation)

A collection of processes designed to evaluate how the results of a predictive model will generalize to new data sets.

– k-fold Cross-Validation
– Leave-p-out Cross-Validation
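
As a rough sketch of the k-fold variant: the data is divided into k folds, and each fold serves once as the evaluation set while the remaining folds are used for training. The generator below only produces the index splits; its name is an assumption for this example, and real projects typically shuffle the data first.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(n_samples=6, k=3):
    print("train:", train, "test:", test)
```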


Data (Structured Data, Unstructured Data, Data augmentation)

The most essential ingredient to all Machine Learning and Artificial Intelligence projects.

Unstructured Data: raw, unprocessed data. Textual data is a perfect example of unstructured data because it is not formatted into specific features.

Structured Data: data processed in a way that it becomes ingestible by a Machine Learning algorithm and, in the case of Supervised Machine Learning, labeled data; data after it has been processed on the Appen data annotation platform.

Data Augmentation: the process of adding new information derived from both internal and external sources to a data set, typically through annotation.

Decision Tree

A category of Supervised Machine Learning algorithms where the data is iteratively split with respect to a given parameter or criterion.

Deep Blue

A chess-playing computer developed by IBM, best known for being the first computer chess-playing system to win both a chess game and a chess match against a reigning world champion under regular time controls.

Deep Learning (Deep Reinforcement Learning)

A broader family of Machine Learning methods based on learning data representations, as opposed to task-specific algorithms. Deep Learning can be supervised, semi-supervised or unsupervised.

Dimensionality (Dimensionality Reduction, Curse of Dimensionality)

Dimensionality Reduction: the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Also see Feature Selection.

Curse of Dimensionality: phenomena that arise when analyzing and organizing data in high-dimensional spaces due to the fact that the more the number of dimensions increases, the sparser the amount of available data becomes.


Embedding (Word Embedding)

One instance of some mathematical structure contained within another instance, such as a group that is a subgroup.

Ensemble Methods

In Statistics and Machine Learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models but typically allows for a much more flexible structure to exist among those alternatives.


Entropy

The average amount of information conveyed by a stochastic source of data.
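
For a discrete source, this average is commonly computed as H = -sum(p * log2(p)) over the outcome probabilities, measured in bits. The short sketch below assumes the probabilities are given as a list that sums to 1; the function name is illustrative.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with zero probability contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.25] * 4))   # four equally likely outcomes: 2.0 bits
```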


Epoch

In the context of training Deep Learning models, one pass of the full training data set.


Feature (Feature Selection, Feature Learning)

A variable that is used as an input to a model.

Feature Learning

An ensemble of techniques meant to automatically discover the representations needed for feature detection or classification from raw data.

False Positive

An error in which a result rejects the null hypothesis when it should not have.

False Negative

An error in which a result fails to reject the null hypothesis when it should have.

Feed-Forward (Neural) Networks

An Artificial Neural Network wherein connections between the neurons do not go backward or form a cycle.


F-Score

A measure of a model’s accuracy that considers both the precision and the recall. More specifically, the F-Score is the harmonic mean of the precision and recall; it reaches its maximal value at 1 (perfect precision and recall) and its minimum at 0.
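
To illustrate how the score combines precision and recall, the sketch below computes all three from raw counts of true positives, false positives and false negatives. The function name and the counts are hypothetical; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical classifier: 8 true positives, 2 false positives, 8 misses.
print(precision_recall_f1(tp=8, fp=2, fn=8))
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot reach a high score by maximizing precision or recall alone.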


Garbage In, Garbage Out

A principle stating that whenever the input data is flawed, it will lead to misleading results and produce nonsensical output, a.k.a. “garbage”.

General Data Protection Regulation (GDPR)

A regulation in EU law on data protection and privacy for all individuals within the European Union aiming to give control to citizens and residents over their personal data.

Genetic Algorithm

A search heuristic inspired by the Theory of Evolution that reflects the process of natural selection where the fittest individuals are selected to produce offspring of the following generation.

Generative Adversarial Networks (GANs)

A class of Artificial Intelligence algorithms used in Unsupervised Machine Learning, implemented as the combination of two Neural Networks competing with each other in a zero-sum game framework.

Graphics Processing Unit (GPU)

A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the rendering of images thanks to its parallel processing architecture, which allows it to perform multiple calculations simultaneously.

Ground Truth

A piece of information obtained through direct observation as opposed to inference.



Human-in-the-Loop

Human-in-the-loop (HITL) is a branch of artificial intelligence that leverages both human and machine intelligence to create machine learning models. In a traditional human-in-the-loop approach, people are involved in a virtuous circle where they train, tune, and test a particular algorithm.

Hyperparameter (Hyperparameter Tuning)

A configuration, external to the model and whose value cannot be estimated from data, that data scientists continuously tweak during the process of training a model.

Hyperparameter Tuning: the process of manually determining the optimal configuration to train a specific model.



ImageNet

A large visual dataset made of 14 million URLs of hand-annotated images organized in 20,000 different categories, designed for use in visual object recognition research.

Image Recognition

The problem in Computer Vision of determining whether an image contains some specific object, feature, or activity.


Inference

The process of making predictions by applying a trained model to new, unlabeled instances.

Information Retrieval

The area of Computer Science studying the process of searching for information in a document, searching for documents themselves, and also searching for metadata that describes data, and for databases of texts, images or sounds.


Layer (Hidden Layer)

A series of neurons in an Artificial Neural Network that process a set of input features, or, by extension, the output of those neurons.

Hidden Layer: a layer of neurons whose outputs are connected to the inputs of other neurons, therefore not directly visible as a network output.


Learning-to-Learn

A new direction within the field of Machine Learning investigating how algorithms can change the way they generalize by analyzing their own learning process and improving on it.


Learning-to-Rank

The application of Machine Learning to the construction of ranking models for Information Retrieval systems.

Learning Rate

A scalar value used by the gradient descent algorithm at each iteration of the training phase of an Artificial Neural Network to multiply with the gradient.
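
A minimal sketch of that multiplication, assuming a single weight and a simple quadratic error rather than a real network: each step moves the weight against the gradient, scaled by the learning rate. The function names and the toy objective are illustrative assumptions.

```python
def gradient_descent(grad, w0: float, learning_rate: float, steps: int) -> float:
    """Repeatedly apply w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def toy_gradient(w: float) -> float:
    """Gradient of the toy error (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)


print(gradient_descent(toy_gradient, w0=0.0, learning_rate=0.1, steps=100))
print(gradient_descent(toy_gradient, w0=0.0, learning_rate=1.1, steps=100))
```

On this toy problem the smaller rate converges toward 3, while the larger one overshoots further on every step and diverges.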

Logit Function

The inverse of the sigmoidal “logistic” function used in mathematics, especially in statistics.
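
Concretely, the logistic function maps a real number x to a probability 1 / (1 + e^(-x)), and the logit maps a probability p back to log(p / (1 - p)). The sketch below, with illustrative function names, shows the two undoing each other.

```python
import math


def logistic(x: float) -> float:
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Map a probability in (0, 1) back to the real line (the log-odds)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


print(logit(0.5))            # 0.0: even odds
print(logit(logistic(2.0)))  # approximately 2.0
```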

Long Short-Term Memory Networks

A variation of Recurrent Neural Network proposed as a solution to the vanishing gradient problem.


Machine Learning

The subfield of Artificial Intelligence that often uses statistical techniques to give computers the ability to “learn”, i.e., progressively improve performance on a specific task, with data, without being explicitly programmed.

Machine Learning Lifecycle Management

DevOps for Machine Learning systems.

Machine Translation

A subfield of computational linguistics that studies the use of software to translate text or speech from one language to another.


Model

A model is an abstracted representation of what a Machine Learning system has learned from the training data during the training process.

Monte Carlo

An approximate methodology that uses repeated random sampling in order to generate synthetic simulated data.
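
A classic illustration of repeated random sampling is estimating pi: draw points uniformly in the unit square and count the fraction that falls inside the quarter circle of radius 1. The sketch below is one such example; the function name, sample size and fixed seed are assumptions made so that the run is repeatable.

```python
import random


def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Estimate pi from the share of random points inside a quarter circle."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_samples


print(estimate_pi(100_000))
```

The estimate is approximate and generally improves as the number of samples grows.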

Multi-Modal Learning

A subfield of Machine Learning aiming to interpret multimodal signals together and build models that can process and relate information from multiple types of data.

Multi-Task Learning

A subfield of Machine Learning that exploits similarities and differences across tasks in order to solve multiple tasks at the same time.


Naive Bayes

A family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features.

Named Entity Recognition

A subtask of Information Extraction that seeks to identify and classify named entities in text into predetermined categories such as the names of people, organizations, locations, etc.

Natural Language Processing (NLP)

The area of Artificial Intelligence that studies the interactions between computers and human languages, in particular how to process and analyze large amounts of natural language data.

Neural Networks

See Artificial Neural Networks.

Neuron

A unit in an Artificial Neural Network processing multiple input values to generate a single output value.


Node

See Neuron.


Optical Character Recognition

The conversion of images of printed, handwritten or typed text into a machine-friendly textual format.


Optimization

The selection of the best element (with regard to some criterion) from some set of available alternatives.


Overfitting

The fact that a model unknowingly identified patterns in the noise and assumed those represented the underlying structure; the production of a model that corresponds too closely to a particular set of data, and therefore fails to generalize well to unseen observations.


Pattern Recognition

An area of Machine Learning focusing on the (supervised or unsupervised) recognition of patterns in the data.

Pooling (Max Pooling)

The process of reducing a matrix generated by a convolutional layer to a smaller matrix.
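
As a small sketch of the max pooling variant, the function below slides a non-overlapping 2x2 window over a matrix and keeps the largest value in each window, halving both dimensions. The window size, the stride of 2 and the function name are assumptions for this example; a trailing odd row or column is simply dropped.

```python
def max_pool_2x2(matrix):
    """Reduce a matrix by keeping the maximum of each 2x2 block."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 2, 1, 0],
    [3, 4, 5, 6],
]
print(max_pool_2x2(feature_map))  # [[6, 8], [9, 6]]
```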

Personally Identifiable Information

Any piece of information that can be used on its own or in combination with some other information in order to identify a particular individual.


Precision

The number of correct positive results divided by the number of all positive results returned by a classifier.


Prediction

The inferred output of a trained model provided with an input instance.


Preprocessing

The process of transforming raw data into a more understandable format.

Pre-trained Model

A model, or a component of a model, that has already been trained, generally using another data set. See also: Transfer Learning.

Principal Component Analysis

A process that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components.


Prior

The probability distribution that would represent the preexisting beliefs about a specific quantity before new evidence is considered.


Random Forest

An ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting a combined version (such as the mean or the mode) of the results of the individual trees.

Reproducibility (crisis of)

A methodological crisis in science in which scholars have found that the results of many scientific studies are difficult or impossible to replicate or reproduce on subsequent investigation, either by independent researchers or by the original researchers themselves.


Recall

The fraction of all relevant samples that are correctly classified as positive.

Rectified Linear Unit

A unit employing the rectifier function as an activation function.

Recurrent Neural Networks

A class of Artificial Neural Networks where connections between neurons form a directed graph along a sequence, allowing them to exhibit dynamic temporal behavior for a time sequence and to use their internal state (memory) to process sequential signals.

Regression (Linear Regression, Logistic Regression)

A set of statistical processes for estimating the relationships among variables.

Linear Regression: a simple type of regression taking a linear combination of features as an input, and outputting a continuous value.

Logistic Regression: a type of regression generating a probability for each possible discrete label value in a classification problem by applying a sigmoid function to a linear prediction.
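
The two variants can be sketched side by side. Below, a single-feature linear regression is fitted by ordinary least squares, and a logistic prediction applies a sigmoid to a linear combination of features. The function names, the toy data and the weights are assumptions for illustration; the logistic weights are set by hand rather than learned.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one input feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_logistic(features, weights, bias):
    """Probability of the positive label: sigmoid of a linear prediction."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy data lying exactly on y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))      # (2.0, 1.0)
# Hand-picked weights, for illustration only.
print(predict_logistic([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```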


Regressor

A feature, an explanatory variable used as an input to a model.


Regularization

The process of introducing additional information in order to prevent overfitting.

Reinforcement Learning

The subfield of Machine Learning inspired by human behavior studying how an agent should take action in a given environment to maximize some notion of cumulative reward.

Restricted Boltzmann Machines

A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.


Semi-Supervised Learning

A class of supervised learning techniques that also leverages available unlabeled data for training, typically using a small number of labeled instances in combination with a larger amount of unlabeled rows. See also Supervised Learning and Unsupervised Learning.

Sentiment Analysis

The use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.

Speech Recognition

See Automated Speech Recognition

Statistical Distribution

In statistics, an empirical distribution function is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
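
The step-function behavior described above can be sketched directly: for a sample of n points, the empirical distribution function at x is the number of observations less than or equal to x, divided by n. The function name and the sample are illustrative assumptions.

```python
import bisect


def ecdf(sample):
    """Return the empirical distribution function of a non-empty sample."""
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        # bisect_right counts how many observations are <= x.
        return bisect.bisect_right(data, x) / n

    return cdf


F = ecdf([3, 1, 2, 2])
print(F(0), F(1), F(2), F(3))  # 0.0 0.25 0.75 1.0
```

The jump at 2 is twice as large as the others because the value 2 occurs twice in the sample.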

Supervised Learning

The Machine Learning task of learning a function mapping an input to an output based on example input-output pairs.

Support Vector Machines (SVM)

A class of discriminative classifiers formally defined by a separating hyperplane: given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples.

Synthetic Data

Data generated artificially when real data cannot be collected in sufficient amounts, or when original data doesn’t meet certain requirements.



TensorFlow

An open-source library, popular among the Machine Learning community, for data flow programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.

Time Series (Time Series Data)

A sequence of data points recorded at specific times and indexed according to their order of occurrence.

Testing (Testing Data)

In the context of Supervised Machine Learning, the process of assessing the final performance of a model using hold-out data.

Testing Data: The subset of available data that a data scientist selected for the testing phase of the development of a model.

Topic Modeling

A category of Unsupervised Machine Learning algorithms that uses clustering to find hidden structures in textual data, and interpret them as topics.

Training (Training Data)

In the context of Supervised Machine Learning, the construction of algorithms that can learn from and make predictions from data.

Training Data: The subset of available data that a data scientist selected for the training phase of the development of a model.

Transfer Learning

An area of Machine Learning that focuses on taking the knowledge gained while solving a specific problem and applying it to a different but related problem.

Turing Test

A test developed by Alan Turing to evaluate a machine’s ability to exhibit intelligent behavior equivalent to that of a human. The test consists of having the machine chat with a human. If a human evaluator witnessing the conversation from outside the room where the test takes place can’t reliably tell the machine and the human apart, the machine is said to have passed the Turing test.

Type I Error

See False Positive.

Type II Error

See False Negative.



Uncertainty

A range of values likely to enclose the true value.


Underfitting

The fact that a Machine Learning algorithm fails to capture the underlying structure of the data properly, typically because the model is either not sophisticated enough, or not appropriate for the task at hand; opposite of Overfitting.

Unsupervised Learning

The area of Machine Learning that consists in inferring a function that describes the structure of unlabeled data.



Validation

The process of using hold-out data in order to evaluate the performance of a trained model; in contrast to the testing phase, which is used for the final assessment of the model’s performance, the validation phase is used to determine if any iterative modification needs to be made to the model.

Vanishing/Exploding Gradients

A major obstacle that data scientists face when training Artificial Neural Networks, recurrent networks in particular, with gradient-based learning methods and backpropagation. In each iteration of training, every weight receives an update proportional to the partial derivative of the error function with respect to that weight, and this derivative can become vanishingly small or excessively large as it is propagated back through many layers or time steps.


Variance

An error due to sensitivity to small fluctuations in the training set, computed as the expectation of the squared deviation of a random variable from its mean.
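
The statistical quantity in the second half of this definition can be sketched for a finite list of values, assuming each value is equally likely (the population variance). The function name and the sample values are illustrative.

```python
def variance(values):
    """Mean of the squared deviations from the mean (population variance)."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```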