Regex Quantifier Tutorial: Greedy, Lazy, Possessive
Regex Quantifiers Tutorial. Explains the fine details of quantifiers, including greedy, lazy (reluctant) and possessive.
Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
[1704.01652] Greed is Good: Near-Optimal Submodular Maximization via Greedy Optimization
"It is known that greedy methods perform well for maximizing monotone submodular functions. At the same time, such methods perform poorly in the face of non-monotonicity. In this paper, we show - arguably, surprisingly - that invoking the classical greedy algorithm O(√k)-times leads to the (currently) fastest deterministic algorithm, called Repeated Greedy, for maximizing a general submodular function subject to k-independent system constraints. Repeated Greedy achieves (1 + O(1/√k))k approximation using O(nr√k) function evaluations (here, n and r denote the size of the ground set and the maximum size of a feasible solution, respectively). We then show that by a careful sampling procedure, we can run the greedy algorithm only once and obtain the (currently) fastest randomized algorithm, called Sample Greedy, for maximizing a submodular function subject to k-extendible system constraints (a subclass of k-independent system constraints). Sample Greedy achieves (k + 3)-approximation with only O(nr/k) function evaluations. Finally, we derive an almost matching lower bound, and show that no polynomial time algorithm can have an approximation ratio smaller than k + 1/2 − ε. To further support our theoretical results, we compare the performance of Repeated Greedy and Sample Greedy with prior art in a concrete application (movie recommendation). We consistently observe that while Sample Greedy achieves practically the same utility as the best baseline, it performs at least two orders of magnitude faster."
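For reference, the classical greedy algorithm that the abstract builds on can be sketched in a few lines of Python. This is a minimal sketch, not Repeated Greedy or Sample Greedy: it assumes a monotone submodular objective (set coverage over made-up example sets) and a plain cardinality budget instead of a general k-independent system. The helper names `coverage` and `greedy_max` are hypothetical.

```python
def coverage(chosen, sets):
    """Monotone submodular objective: number of distinct items covered by the chosen sets."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def greedy_max(candidates, f, budget):
    """Classical greedy: repeatedly add the candidate with the largest marginal gain f(S + c) - f(S)."""
    chosen = []
    remaining = set(candidates)
    for _ in range(budget):
        base = f(chosen)
        best, best_gain = None, 0
        for c in sorted(remaining):  # sorted() makes tie-breaking deterministic
            gain = f(chosen + [c]) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # nothing left adds value, so stop early
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Made-up example: four candidate sets, pick at most two.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
picked = greedy_max(sets, lambda S: coverage(S, sets), budget=2)
print(picked, coverage(picked, sets))  # ['C', 'A'] 7
```

Each round evaluates f about once per remaining candidate, and there are at most r rounds, so one greedy pass costs on the order of nr function evaluations; that is the unit the abstract's evaluation counts are expressed in.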
Difference between off-policy and on-policy learning - Cross Validated
The reason that Q-learning is off-policy is that it updates its Q-values using the Q-value of the next state s′ and the greedy action a′. In other words, it estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it's not following a greedy policy.

The reason that SARSA is on-policy is that it updates its Q-values using the Q-value of the next state s′ and the current policy's action a″. It estimates the return for state-action pairs assuming the current policy continues to be followed.

The distinction disappears if the current policy is a greedy policy. However, such an agent would learn poorly, since it never explores.
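The two update rules can be compared side by side in a minimal tabular sketch. It assumes a Q-table stored as a dict mapping each state to a list of per-action values, a learning rate `alpha` and a discount `gamma`; the function names are illustrative, not taken from any library. The only difference between the updates is which next action the target bootstraps from.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: the target uses the greedy action a' in s_next, whatever is taken next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: the target uses a_next, the action the current policy actually picked."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def fresh_table():
    # State 1 has a poor action (0) and a good action (1).
    return {0: [0.0, 0.0], 1: [2.0, 4.0]}


# One transition: in state 0 take action 0, receive reward 1, land in state 1.
# Suppose exploration then picks the poor action 0 in state 1.
alpha, gamma = 0.5, 0.5
Q_q = fresh_table()
q_learning_update(Q_q, 0, 0, 1.0, 1, alpha, gamma)
Q_sarsa = fresh_table()
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0, alpha, gamma)
Q_sarsa_greedy = fresh_table()
sarsa_update(Q_sarsa_greedy, 0, 0, 1.0, 1, 1, alpha, gamma)

print(Q_q[0][0])             # 1.5 (target 1 + 0.5 * max(2, 4) = 3)
print(Q_sarsa[0][0])         # 1.0 (target 1 + 0.5 * 2 = 2)
print(Q_sarsa_greedy[0][0])  # 1.5 (greedy a_next: same as Q-learning)
```

When `a_next` happens to be the greedy action, SARSA produces the same value as Q-learning, which is the case described above where the distinction disappears. With `epsilon=0.0`, `epsilon_greedy` always returns that greedy action.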
Negated Class Solution

Suppose we know that the character { will never be present between the delimiters {START} and {END}. Instead of the lazy quantifier, we can use a negated character class in our pattern:

{START}[^{]*{END}

The negated character class [^{]* greedily matches zero or more characters that are not an opening curly brace. Because it stops at the first {, which under our assumption is the opening brace of {END}, the match can never jump over the {END} delimiter. This is more direct and more efficient than the lazy version, because the engine does not have to try matching {END} after every character it consumes.
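The greedy, lazy and negated-class variants can be compared with Python's `re` module. This is a sketch on made-up input; the braces are escaped so the pattern does not rely on an engine treating a bare { as a literal, and the last two lines show what happens when the assumption that { never appears between the delimiters is violated.

```python
import re

text = "{START}alpha{END} junk {START}beta{END}"

# Greedy dot-star: runs to the LAST {END} in the string, swallowing both blocks.
greedy = re.compile(r"\{START\}(.*)\{END\}")
# Lazy dot-star: expands one character at a time, trying \{END\} after each step.
lazy = re.compile(r"\{START\}(.*?)\{END\}")
# Negated class: consumes everything except "{", so it stops right before {END}.
negated = re.compile(r"\{START\}([^{]*)\{END\}")

print(greedy.findall(text))   # ['alpha{END} junk {START}beta']
print(lazy.findall(text))     # ['alpha', 'beta']
print(negated.findall(text))  # ['alpha', 'beta']

# If a "{" does occur between the delimiters, the negated class finds nothing.
bad = "{START}a{b}{END}"
print(lazy.findall(bad))      # ['a{b}']
print(negated.findall(bad))   # []
```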
regex - How can I make my match non greedy in vim? - Stack Overflow
Instead of .* use .\{-}.


Also, see :help non-greedy
