(Work in progress. The focus is on terms that won't be part of most introductory courses since those definitions are easy to find and are usually in the [WildML glossary](http://www.wildml.com/deep-learning-glossary/).)
# Glossary
See also the [basics glossary](basics-glossary.md).
#### Contents:
- [Training methods](#training-methods)
- [Models](#models)
- [Convolution-related layers](#convolution-related-layers)
- [Sequence-related layers](#sequence-related-layers)
- [Architectures](#architectures)
- [Bayesian Inference and Approximate Inference](#bayesian-inference-and-approximate-inference)
- [Black-box optimisation](#black-box-optimisation)
- [External memory](#external-memory)
- [Other models](#other-models)
- [Metrics and Measures](#metrics-and-measures)
- [Reinforcement Learning](#reinforcement-learning)
- [Deep Reinforcement Learning](#deep-reinforcement-learning)
## Training methods
- [Backpropagation](basics-glossary.md)
- Synthetic gradients
- Result: Faster updating of parameter weights, since layers need not wait for the full backward pass
- Method: Using predicted 'synthetic gradients' (estimated from local information) instead of true backpropagated error gradients
- [[Paper (Jaderberg et al., Jul 2017)]](https://arxiv.org/pdf/1608.05343.pdf)
## Models
### Convolution-related layers
- Dilated convolutions
- Convolutions with filter cells that are regularly spaced out.
- Purpose: The receptive field grows faster with depth, so the network can merge spatial information from a wider area of the input while keeping the filter size constant.
- Skip connections
- Mappings (connections) that skip one or more layers.
- E.g. Adds (a 1x1 convolution of) an earlier layer to the most recent network layer
- (Figure omitted; image from He et al., 2015.)
- Component of a 'deep residual layer'
- Goal: help network to learn approximate identity layers (if that is what is locally optimal)
- in which case the output of the most recent network layer should be approx 0.
- Introduced by [He et al., Dec 2015](https://arxiv.org/pdf/1512.03385.pdf) as part of deep residual networks, winner of ILSVRC 2015.
- [Useful Stackexchange post](https://stats.stackexchange.com/questions/56950/neural-network-with-skip-layer-connections)
- Also called *residual connections*, *shortcut connections*.
- Decimation layer
- Down-sampling, usually either through max-pooling or average pooling
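
To make the dilated convolutions entry above concrete, here is a minimal pure-Python sketch in one dimension. The function names `dilated_conv1d` and `receptive_field` are illustrative and not from any library, and the receptive-field formula assumes stride-1 layers that all share one kernel size.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation whose filter taps are `dilation` steps apart."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers with the given dilations."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


signal = [0, 1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(signal, [1, 1, 1], dilation=1))  # taps at offsets 0, 1, 2
print(dilated_conv1d(signal, [1, 1, 1], dilation=2))  # taps at offsets 0, 2, 4
print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))  # 7 15
```

With the same three-tap filter, three undilated layers see 7 input positions, while dilations of 1, 2 and 4 see 15.
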
### Sequence-related layers
- Dilated LSTMs
- Nested LSTM
- Use nesting as an approach to constructing temporal hierarchies in memory
- **selective access to inner memories** -> frees inner memories to remember and process events on longer time scales
- [[Paper (Moniz et al., Jan 2018)]](https://arxiv.org/abs/1801.10308)
### Architectures
- R-CNN (Region-based CNN)
- Object detection model
- [[Fast pytorch implementation]](https://github.com/jwyang/faster-rcnn.pytorch) [[Paper (Faster R-CNN, Ren et al., Jan 2016)]](https://arxiv.org/abs/1506.01497) [[Explanatory blog post]](https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/)
- ResNets
- ['densely gathers features from previous layers in the network and combines them using summation'](https://arxiv.org/abs/1801.05895)
- [Paper (He et al., 2015)](https://arxiv.org/abs/1512.03385)
- DenseNets
- Dense connection structure where each layer is directly connected to all its predecessors
- -> better information flow and feature reuse
- -> BUT dense skip connections also bring problems of potential risk of overfitting, parameter redundancy and large memory consumption
- ['densely gathers features from previous layers in the network and combines them using concatenation'](https://arxiv.org/abs/1801.05895)
- [SparseNet](https://arxiv.org/abs/1801.05895)
- Synthesised from ResNets and DenseNets
- [[Paper: Sparsely Connected Convolutional Networks (Zhu and Deng et al., Jan 2018)]](https://arxiv.org/abs/1801.05895)
- Capsule Networks
- [TensorFlow implementation](https://github.com/JunYeopLee/capsule-networks)
- NADE
- MADE
- PixelCNN
- PixelCNN++
- PixelRNN
- More architectures
- VGG16
- ResNeXt
- Feature Pyramid Networks
### Bayesian Inference and Approximate Inference
- Autoencoders
- Variational Inference
- Variational Autoencoder
- Variational Lower Bound
- MCMC (Markov Chain Monte Carlo)
- Gibbs Sampling
- Monte Carlo EM
- EM
- Bayesian Neural Networks
- Laplace's approximation
- Metropolis-Hastings
- Hamiltonian Monte Carlo
### Black-box optimisation
- Evolution Strategies
- [[Basic Tutorial]](https://medium.com/@edersantana/mve-series-playing-catch-with-keras-and-an-evolution-strategy-a005b75d0505) [[OpenAI post]](https://blog.openai.com/evolution-strategies/)
- Guided ES
- uses surrogate gradients
### External memory
- [Neural Turing Machine](https://arxiv.org/abs/1410.5401)
- Neural network controller with read-write access to an external memory matrix
- [Differentiable Neural Computers](https://deepmind.com/blog/differentiable-neural-computers/)
- Neural network controller with read-write-erase access to an external memory matrix
- [Kanerva Machine](https://arxiv.org/abs/1804.01756)
- Differentiable Neural Dictionary (DND)
- from Neural Episodic Control (Pritzel et al., 2017)
- $`M_a = (K_a, V_a)`$, $`K_a, V_a`$ dynamically sized arrays of vectors, each containing the same number of vectors (1-1 correspondence, like a dictionary)
- Operations
1. Lookup: map key h to output o:
- weighted sum of values in memory, weights given by normalised closeness (kernels) between lookup key and corresponding key in memory. Closer match = higher weight.
2. Write (after query/lookup)
- key = lookup key
- value = 'application-specific', e.g. Q-value for RL
- (k, v) appended to $`K_a, V_a`$. If key already exists, entry is updated instead of being duplicated.
- Use approximations in practice: kNN-like
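
The DND lookup and write operations above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the class and method names are hypothetical, the inverse-distance kernel with a small `delta` is one possible choice of kernel, an existing key is simply overwritten on write, and the lookup is exact rather than the kNN-style approximation used in practice.

```python
class DND:
    """Toy differentiable neural dictionary: parallel lists of keys and values."""

    def __init__(self, delta=1e-3):
        self.keys, self.values = [], []
        self.delta = delta  # keeps the kernel finite for exact matches

    def _kernel(self, h, k):
        # Inverse squared-distance kernel (an assumption; other kernels also work).
        sq_dist = sum((a - b) ** 2 for a, b in zip(h, k))
        return 1.0 / (sq_dist + self.delta)

    def lookup(self, h):
        """Weighted sum of stored values; closer keys get larger weights."""
        weights = [self._kernel(h, k) for k in self.keys]
        total = sum(weights)
        return sum(w / total * v for w, v in zip(weights, self.values))

    def write(self, key, value):
        """Append (key, value), or overwrite the value if the key is already stored."""
        key = tuple(key)
        if key in self.keys:
            self.values[self.keys.index(key)] = value
        else:
            self.keys.append(key)
            self.values.append(value)


memory = DND()
memory.write((0.0, 0.0), 1.0)
memory.write((1.0, 1.0), 5.0)
q = memory.lookup((0.1, 0.0))  # dominated by the value stored at the nearer key
print(q)
```

Because the weights are normalised, the lookup result always lies between the smallest and largest stored value.
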
### Other models
- Boltzmann Machines
- Hopfield Networks
- Linear Factor Models
- Independent Component Analysis (ICA)
- Sparse Coding
- Wake-sleep
- [Finite-state machine](https://en.wikipedia.org/wiki/Finite-state_machine) (Abstract model)
- can be in one of a finite number of states $`s_t`$ in S
- can change from one state to another in response to an input
- $`s_{t+1} = f(x_t, s_t)`$, where $`s_t \in S`$ for all $`t`$ and $`|S|`$ is finite
- memory limited by number of states FSM has, so cannot do some tasks that the Turing machine can.
- [Turing machine](https://en.wikipedia.org/wiki/Turing_machine) (Abstract model)
- Infinite memory tape divided into discrete cells
- Finite table of user-specified instructions
- HEAD positioned over a cell.
- READS symbol from cell,
- LOOKS UP symbol read in finite table of user-specified instructions
- WRITES in cell
- MOVES 1 left or right
- either CARRIES OUT instruction or HALTS computation (indicated in table of user-specified instructions)
- For any algorithm, a Turing machine capable of simulating that algorithm's logic can be constructed.
- (Turing-completeness: the ability of a system of instructions to simulate a Turing machine. Such a system can in theory express every task a computer can accomplish; nearly all programming languages are Turing-complete if the limitations of finite memory are ignored.)
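
The read / look up / write / move / halt cycle of the Turing machine above can be simulated directly. The sketch below is illustrative only: `run_turing_machine` and the `increment` table are hypothetical names, halting is modelled as "no table entry for the current state and symbol", and `max_steps` is a practical guard that a true Turing machine does not have.

```python
def run_turing_machine(table, tape, state="start", head=0, blank="_", max_steps=1000):
    """Run a single-tape machine until it halts or the step budget runs out.

    `table` maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right).
    """
    cells = dict(enumerate(tape))  # sparse tape: unseen cells hold `blank`
    for _ in range(max_steps):
        symbol = cells.get(head, blank)   # READ the symbol under the head
        if (state, symbol) not in table:  # LOOK UP; a missing entry means HALT
            break
        write, move, state = table[(state, symbol)]
        cells[head] = write               # WRITE into the cell
        head += move                      # MOVE one cell left or right
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical instruction table: add 1 to a binary number,
# with the head starting on the least significant (rightmost) bit.
increment = {
    ("start", "1"): ("0", -1, "start"),  # 1 + carry -> 0, carry moves left
    ("start", "0"): ("1", -1, "done"),   # 0 + carry -> 1, finished
    ("start", "_"): ("1", -1, "done"),   # ran off the left edge: new leading 1
}

print(run_turing_machine(increment, "1011", head=3))  # 1100
```
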
### Metrics and measures
- Alpha-divergence
- Special cases:
- Alpha=0: Variational Bayes
- Alpha=1: Expectation Propagation
- TODO: what is a 'mode' of a posterior p? What does it mean by a solution that aims to cover multiple modes?
## Reinforcement Learning
- Intuition of RL:
- Loop through two steps:
- Agent performs action.
- State may change, agent may get reward.
- Agent explores the environment by taking actions.
- Actions involve time
- Don't pre-program procedures in agent, but agent knows list of actions
- Bellman Equation
- $`V(s) = \max_{a}(R(s,a)+\gamma E[V(s')])`$
- where $`\gamma`$ is the discount factor.
- Deterministic version: $`V(s) = \max_{a}(R(s,a)+\gamma V(s'))`$
- Expanded for MDPs: $`V(s) = \max_{a}(R(s,a)+\gamma \sum_{s'} P(s,a,s')V(s'))`$
- Plans vs Policies:
- Plans comprise the optimal action for each state, with no stochasticity. Policies incorporate stochasticity.
- Deterministic vs non-deterministic search:
- Deterministic search: Agent's intention maps 100% to agent's action.
- Non-deterministic search: Small chance of agent acting differently to how it intends to act
- Markov Decision Processes (MDP)
- Mathematical framework for modelling decision-making where outcomes are partly random and partly under the control of a decision-maker
- Markov Property:
- Memorylessness: the conditional distribution of the next state depends only on the present state, not on the states before it
- Associated Bellman eqn: $`V(s) = \max_{a}(R(s,a)+\gamma E[V(s')])`$
- aka $`V(s) = \max_{a}(R(s,a)+\gamma \sum_{s'} P(s,a,s')V(s'))`$
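
One standard way to solve the Bellman equation for a small MDP is value iteration: apply the right-hand side repeatedly until the values stop changing. The sketch below assumes a made-up two-state MDP; the state names, action names, rewards and transition probabilities are purely illustrative.

```python
def value_iteration(states, actions, R, P, gamma=0.9, tol=1e-9):
    """Iterate V(s) = max_a (R(s,a) + gamma * sum_s' P(s,a,s') V(s')) to a fixed point."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {
            s: max(
                R[s, a] + gamma * sum(p * V[s2] for s2, p in P[s, a].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V
        V = new_V


# Hypothetical MDP: R[s, a] is the reward, P[s, a] maps next states to probabilities.
states, actions = ["A", "B"], ["stay", "go"]
R = {("A", "stay"): 0.0, ("A", "go"): 1.0, ("B", "stay"): 2.0, ("B", "go"): 0.0}
P = {
    ("A", "stay"): {"A": 1.0},
    ("A", "go"): {"B": 0.8, "A": 0.2},  # 'go' from A fails 20% of the time
    ("B", "stay"): {"B": 1.0},
    ("B", "go"): {"A": 1.0},
}
V = value_iteration(states, actions, R, P)
print(V)
```

Here staying in B forever is worth 2 / (1 - 0.9) = 20, and the value of A follows from repeatedly trying to reach B.
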
- Q-learning
- Give values to actions $`Q(s_0,a_i)`$ instead of states
- $`Q(s,a) = R(s,a)+\gamma \sum_{s'} P(s,a,s')V(s')`$
- i.e. $`Q(s,a) = R(s,a)+\gamma \sum_{s'} P(s,a,s')\max_{a'}Q(s',a')`$
- Temporal Difference
- TODO: refine
- (Consider Q-learning under deterministic search for convenience.)
- $`TD_t(a,s) = R(s,a)+\gamma\max_{a'}Q_{t-1}(s',a') - Q_{t-1}(s,a)`$
- $`TD_t(a,s)`$ may be nonzero because of randomness (even though the equation above is written in its deterministic-search form).
- Update eqn: $`Q_t(s,a) = Q_{t-1}(s,a) + \alpha TD_t(a,s)`$
- $`\alpha`$ is the learning rate.
- Hope: algorithm will converge to the 'correct' Q-value, unless the environment is constantly changing.
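
The temporal-difference update above translates almost line for line into tabular code. A minimal sketch, assuming Q-values live in a dictionary keyed by `(state, action)` with unseen entries defaulting to 0; the function name, state labels and action labels are illustrative.

```python
def td_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the temporal difference."""
    target = reward + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td = target - Q.get((s, a), 0.0)           # how far the old estimate was off
    Q[s, a] = Q.get((s, a), 0.0) + alpha * td  # move a fraction alpha towards the target
    return td


Q = {}
actions = ["left", "right"]
first = td_update(Q, "s0", "right", 1.0, "s1", actions)   # TD = 1.0, Q becomes 0.1
second = td_update(Q, "s0", "right", 1.0, "s1", actions)  # TD = 0.9, Q becomes 0.19
print(first, second, Q)
```

Repeating the same transition shrinks the temporal difference each time, which is the convergence behaviour the entry hopes for in a stationary environment.
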
- Living penalty
- e.g. small negative reward when entering each non-terminal state to motivate agent to finish the game quickly
- Successor Representation
- Options framework
- involves abstractions over the space of actions
- at each step, the agent chooses either a one-step 'primitive' action or a 'multi-step' action policy (option). Each option defines a policy over actions (either primitive or other options) and can be terminated according to a stochastic function of $`\beta`$.
- Paper: Sutton et al. Definition from Kulkarni and Narasimhan et al. (2016)
### Deep Reinforcement Learning
- Deep Q-learning
- **Learning**: Feed in state to NN, final layer gives q-values for each action
- Compares predicted value to previous observed value: loss $`L = \sum(Q_{\text{prev observed}} - Q_{\text{pred}})^2`$
<!-- - TODO: but what if you haven't seen this state before?
- TODO: What if you've seen it multiple times?
-->
- Learning happens for each state
- **Acting**: Put final layer through softmax (or some other action selection policy, see below) and select the corresponding action.
- Experience replay
- Problem: Update after every action, so consecutive states that are similar may bias the neural network.
- Solution: Save state information. Start updating after some initial time period, and update with states drawn uniformly from memory in the interval $`(t-k_1, t-k_2)`$.
- [Schaul et al. (2016), Prioritized Experience Replay](#)
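
A replay memory of the kind described above can be sketched with the standard library alone. This is a simplified illustration, not a reference implementation: the class name and the transition tuple layout are assumptions, the buffer keeps only the most recent `capacity` transitions, and sampling is uniform rather than prioritised.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks up runs of similar consecutive states.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(2)  # two distinct transitions from the last three steps
print(len(buffer), batch)
```

A training loop would typically wait until `len(buffer)` exceeds some warm-up threshold before it starts sampling batches.
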
- Action selection policies
- Most commonly used:
- $`\epsilon`$-greedy
- Select highest q-value action $`(1-\epsilon)`$ of the time, randomly otherwise.
- Tokic (2010): can adapt $`\epsilon`$ depending on the state (smaller $`\epsilon`$ if the agent is certain about its state)
- $`\epsilon`$-soft $`(1-\epsilon)`$
- Opposite of $`\epsilon`$-greedy: select highest q-value action $`\epsilon`$ of the time, randomly otherwise.
- Softmax
- $`\sigma(\textbf{z})_j = \frac{e^{z_j}}{\sum_k e^{z_k}}`$ for $`j=1,...,K`$.
- Outputs across all actions sum to one
- Key is exploration vs exploitation
- Agent may find itself stuck in a local maximum (thinks e.g. a positive-reward action $`Q_2`$ is the best action because it hasn't found the better one $`Q_4`$.)
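
The two most common selection rules above, epsilon-greedy and softmax, can be sketched as follows. The function names are illustrative, the `rng` parameter is an assumption added so that a seeded `random.Random` can be passed in for reproducibility, and the max-subtraction in `softmax` is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick the argmax with probability 1 - epsilon, else a uniformly random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def softmax(q_values):
    """Turn q-values into probabilities that sum to one."""
    shift = max(q_values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(q - shift) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    return rng.choices(range(len(q_values)), weights=softmax(q_values))[0]


q_values = [1.0, 5.0, 2.0]
print(epsilon_greedy(q_values, epsilon=0.0))  # always the greedy action, index 1
print(softmax(q_values))
print(softmax_action(q_values, rng=random.Random(0)))
```

Softmax sampling still explores low-valued actions occasionally, which is one way out of the local-maximum problem described above.
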
- On-policy vs off-policy
- On-policy: update value with action actually taken
- Off-policy: update value with $`\max_{a'} Q(s',a')`$, i.e. no constraint on next action.
- Policy Gradient Methods
- General Challenges
- Sensitive to the choice of step size
- Often have poor sample efficiency, taking millions or billions of steps to learn simple tasks
- Approaches:
- constraining or optimising size of policy update
- Trust Region Policy Optimisation (TRPO)
- [[Implementation in PyTorch]](https://github.com/ikostrikov/pytorch-trpo)
- Pros
- Good for continuous control tasks
- Cons
- 'isn’t easily compatible with algorithms that share parameters between a policy and value function or auxiliary losses'
- Proximal Policy Optimisation (PPO)
- Tries to minimise cost while ensuring the deviation from the previous policy is relatively small
- Implementation:
- $`L^{CLIP}(\theta) = \hat{E}_t[\min(r_t(\theta)\hat{A}_t,\text{clip}(r_t(\theta),1-\epsilon,1+\epsilon)\hat{A}_t)]`$
- $`r_t(\theta) = \pi_\theta(a_t|s_t)/\pi_{\theta_{old}}(a_t|s_t)`$: ratio of the probability of the action taken under the new policy to that under the old policy
- $`\hat{A}_t`$: estimated advantage at time t
- $`\epsilon`$: hyperparameter, usually 0.1 or 0.2
- Much simpler to implement than ACER
- Trust region update compatible with SGD
- [OpenAI blog post](https://blog.openai.com/openai-baselines-ppo/)
- PPO2
- GPU-enabled implementation of PPO by OpenAI.
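
The clipped PPO objective above is easy to evaluate by hand for a few timesteps. The sketch below assumes the probability ratios and advantage estimates have already been computed and are passed in as plain lists; the function name and the sample numbers are illustrative.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean over timesteps of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, adv in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        terms.append(min(r * adv, clipped_r * adv))
    return sum(terms) / len(terms)


ratios = [1.5, 0.5, 1.0]       # new-policy / old-policy action probabilities
advantages = [1.0, -1.0, 2.0]  # estimated advantages at the same timesteps
objective = clipped_surrogate(ratios, advantages)
print(objective)  # per-step terms are 1.2, -0.8 and 2.0
```

In the first step the ratio 1.5 is clipped to 1.2, so the objective gives no extra credit for moving the policy further than the trust region allows. In the second step the minimum picks the clipped, more pessimistic term.
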
- Actor Critic with Experience Replay (ACER)
- Sample-efficient policy gradient algorithm
- Uses a replay buffer, so it can perform more than one gradient update using each piece of sampled experience, as well as a Q-function approximator trained with the Retrace algorithm.
- References:
- [OpenAI blog post on PPO](https://blog.openai.com/openai-baselines-ppo/)
- A3C (Asynchronous Advantage Actor-Critic)
- Actor-critic:
- Two outputs:
1. Actor: outputs the policy, i.e. a probability for each possible action $`a_i`$, via a softmax (these notes write the actor's per-action outputs as $`Q(s,a_i)`$)
2. Critic: outputs Value of state we're in $`V(s)`$
- Asynchronous
- Multiple agents tackling the same environment, each initialised differently (diff random seed)
- More experience to learn from
- Reduces chance of all agents being stuck in a local max
- Can combine N nets into one single net,
- where N = number of agents.
- So weights are shared.
- Agents share experience by contributing to a common critic
- Advantage
- Have two losses, one for each output (Value loss, policy loss)
- Value loss: TODO (fill in)
- Policy loss:
- Let Advantage A = Q(s,a) - V(s)
- How much better is the Q-value you're selecting compared to the 'known' V value across agents?
- Goal is to maximise advantage: encourages actions that have Q(s,a) > V.
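
The advantage A = Q(s,a) - V(s) above needs some estimate of Q(s,a). One common choice, assumed here rather than taken from these notes, is the discounted return over a short rollout, bootstrapped with the critic's value for the state after the last step. The function name and the sample numbers are illustrative.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage A_t = Q_t - V(s_t), with Q_t estimated by the discounted return."""
    advantages = []
    ret = bootstrap_value  # critic's estimate for the state after the last step
    for reward, value in zip(reversed(rewards), reversed(values)):
        ret = reward + gamma * ret      # discounted return from this step onwards
        advantages.append(ret - value)  # positive: the action did better than expected
    advantages.reverse()
    return advantages


rewards = [1.0, 0.0, 2.0]  # rewards collected along a short rollout
values = [0.5, 0.5, 1.0]   # critic outputs V(s_t) for the visited states
advantages = n_step_advantages(rewards, values, bootstrap_value=0.0, gamma=0.5)
print(advantages)  # [1.0, 0.5, 1.0]
```

Positive entries mark actions whose estimated return beat the critic's expectation, which are the actions the policy loss encourages.
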
- A2C (Synchronous A3C: Advantage Actor-Critic)
- A2C tends to be unstable due to occasional entropy collapse. (AI Safety Gridworlds, Nov 2017)
- Particularly sensitive to hyperparameter(s) relating to policy entropy
- Rainbow
- Combination of improvements in deep RL
- DQN
- Policy gradient methods
### Other RL
- Batch reinforcement learning
- Do not interact with the system during learning (Used e.g. in real-world industrial settings since unrestricted exploration can damage the system)
## References:
- RL: AI A to Z course