janpeuker + ai   195

Google AI Blog: Scalable Deep Reinforcement Learning for Robotic Manipulation
Additionally, we’ve found that QT-Opt reaches this higher success rate using less training data, albeit taking longer to converge. This is especially exciting for robotics, where the bottleneck is usually collecting real robot data, rather than training time. Combining this with other data efficiency techniques (such as our prior work on domain adaptation for grasping) could open several interesting avenues in robotics. We’re also interested in combining QT-Opt with recent work on learning how to self-calibrate, which could further improve the generality.

Overall, the QT-Opt algorithm is a general reinforcement learning approach that’s giving us good results on real world robots. Besides the reward definition, nothing about QT-Opt is specific to robot grasping. We see this as a strong step towards more general robot learning algorithms, and are excited to see what other robotics tasks we can apply it to. You can learn more about this work in the short video below.
ai  research  Emergence 
5 days ago by janpeuker
Google AI Blog: Self-Supervised Tracking via Video Colorization
While we do not yet outperform heavily supervised models, the colorization model learns to track video segments and human pose well enough to outperform the latest methods based on optical flow. Breaking down performance by motion type suggests that our model is a more robust tracker than optical flow for many natural complexities, such as dynamic backgrounds, fast motion, and occlusions. Please see the paper for details.
ai  algorithm  Emergence 
5 days ago by janpeuker
Troubling Trends in Machine Learning Scholarship – Approximately Correct
Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on the following four patterns that appear to us to be trending in ML scholarship:

Failure to distinguish between explanation and speculation.
Failure to identify the sources of empirical gains, e.g. emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning.
Mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g. by confusing technical and non-technical concepts.
Misuse of language, e.g. by choosing terms of art with colloquial connotations or by overloading established technical terms.
research  psychology  ai  mathematics 
5 days ago by janpeuker
Introducing MLflow: an Open Source Machine Learning Platform - The Databricks Blog
Because of these challenges, it is clear that ML development has to evolve a lot to become as robust, predictable and widespread as traditional software development. To this end, many organizations have started to build internal machine learning platforms to manage the ML lifecycle. For example, Facebook, Google and Uber have built FBLearner Flow, TFX, and Michelangelo to manage data preparation, model training and deployment. However, even these internal platforms are limited: typical ML platforms only support a small set of built-in algorithms, or a single ML library, and they are tied to each company’s infrastructure. Users cannot easily leverage new ML libraries, or share their work with a wider community.
ai  opensource  library 
11 days ago by janpeuker
Overview - seq2seq
tf-seq2seq is a general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.
ai  library  literature 
12 days ago by janpeuker
AdamW and Super-convergence is now the fastest way to train neural nets · fast.ai
That means that we’ve seen (for the first time we’re aware of) super convergence using Adam! Super convergence is a phenomenon that occurs when training a neural net with high learning rates, growing for half the training. Before it was understood, training CIFAR10 to 94% accuracy took about 100 epochs.
In contrast to previous work, we see Adam getting about as good accuracy as SGD+Momentum on every CNN image problem we’ve tried it on, as long as it’s properly tuned, and it’s nearly always a bit faster too.
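The schedule behind super convergence — a learning rate that grows for the first half of training and decays for the second — fits in a few lines. The sketch below is an illustrative stdlib-only version of such a cyclical schedule, not fastai’s implementation; the rates and step counts are made up.

```python
# A sketch of a "grow for half the training, then decay" learning-rate
# schedule. max_lr, min_lr, and the 100-step run are illustrative values.

def one_cycle_lr(step, total_steps, max_lr=1.0, min_lr=0.1):
    half = total_steps / 2
    if step <= half:
        return min_lr + (max_lr - min_lr) * step / half        # growing phase
    return max_lr - (max_lr - min_lr) * (step - half) / half   # decaying phase

schedule = [one_cycle_lr(s, 100) for s in range(101)]
# The rate peaks exactly at the midpoint of training.
```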
ai  performance 
12 days ago by janpeuker
A New Angle on L2 Regularization
Finally we have:

minimize: R(w,b)  ⟺ (as ‖w‖ → 0)  maximize: d_adv

In words, when ‖w‖ is small, minimizing the empirical risk for the hinge loss or the softplus loss is equivalent to maximizing the adversarial distance, which can be interpreted as minimizing the phenomenon of adversarial examples.
In practice, the value of ‖w‖ can be controlled by adding a regularization term to the empirical risk, yielding the regularized loss:

L(w,b) = R(w,b) + λ‖w‖²
         (empirical risk + L2 regularization)

A small regularization parameter λ lets ‖w‖ grow unchecked, while a larger λ encourages ‖w‖ to shrink.
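The trade-off the regularization parameter controls can be sketched directly. The stdlib-only Python below is illustrative, not the article’s code; the toy data, function names, and λ values are made up. It shows how λ trades the empirical hinge risk against ‖w‖²:

```python
# Sketch: regularized empirical risk for a linear classifier with hinge loss.

def hinge(margin):
    """Hinge loss for one example; margin = y * (w.x + b)."""
    return max(0.0, 1.0 - margin)

def regularized_risk(w, b, data, lam):
    """Empirical risk (mean hinge loss) plus an L2 penalty lam * ||w||^2."""
    risk = sum(hinge(y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
               for x, y in data) / len(data)
    l2 = sum(wi * wi for wi in w)
    return risk + lam * l2

data = [([2.0, 0.0], 1), ([-2.0, 0.0], -1)]
small_w = [0.25, 0.0]  # margin 0.5 on each example: hinge loss is nonzero
large_w = [2.0, 0.0]   # margin 4 on each example: hinge loss is zero
# With lam = 0 the larger weights win; a big enough lam reverses the ordering,
# pushing ||w|| toward zero at the cost of a little empirical risk.
```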
ai  research  security 
13 days ago by janpeuker
Backpropagation demo
Backpropagation algorithm
The backpropagation algorithm is essential for training large neural networks quickly. This article explains how the algorithm works.
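As a companion to the demo, here is a minimal sketch of what backpropagation computes: the chain rule applied backwards through a tiny network. The one-hidden-unit architecture, weights, and names below are illustrative, not taken from the article.

```python
# Minimal backpropagation: one input -> sigmoid hidden unit -> linear output,
# squared-error loss. The backward pass walks the chain rule from the loss
# back to each weight.
import math

def forward_backward(x, target, w1, w2):
    # forward pass
    h_in = w1 * x
    h = 1.0 / (1.0 + math.exp(-h_in))   # sigmoid activation
    y = w2 * h
    loss = 0.5 * (y - target) ** 2
    # backward pass
    dy = y - target                     # dL/dy
    dw2 = dy * h                        # dL/dw2
    dh = dy * w2                        # dL/dh
    dh_in = dh * h * (1.0 - h)          # sigmoid derivative
    dw1 = dh_in * x                     # dL/dw1
    return loss, dw1, dw2
```

A quick finite-difference check (perturb a weight, watch the loss) confirms the analytic gradients, which is the standard way to validate a hand-written backward pass.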
learning  visualization  ai 
18 days ago by janpeuker
Twitter meets TensorFlow
What is a DataRecord?

Twitter’s data format of choice is the DataRecord. It has a long history of use for ML tasks at Twitter. DeepBird v2 recognizes data saved using this format. Below is the Thrift struct of the DataRecord:

DataRecords were originally implemented as a way to conveniently store different combinations of sparse and dense features in a single, unified struct. The format has since evolved to support more modern features like tensors and blobs.
ai  database  analytics 
24 days ago by janpeuker
Will Automation Push People Out of Architecture? - The Atlantic
As new banks go up and old airports remodel, architecture is beginning to catch up. If buildings once had been awkwardly repurposed to integrate automation, now they can be designed to streamline machine interfaces from the start.

But what does it mean to design a structure that focuses human attention on technology instead of other humans? Architecture, says Lynn, should be about “trying to make things as humane and rich and meaningful as possible”—yet increasingly people are thrust into spaces where their attention is devoted to swiping and punching and scanning devices and machines.
Architecture  society  ai 
28 days ago by janpeuker
Distributed Deep Learning with Polyaxon – Polyaxon – Medium
Distributed Tensorflow
To distribute Tensorflow experiments, the user needs to define a cluster, which is a set of tasks that participate in the distributed execution.

Tensorflow defines 3 different types of tasks: master, workers, and parameter servers.

To define a cluster in Polyaxon with a master, 2 parameter servers, and 4 workers, we need to add a tensorflow subsection to the environment section:
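Polyaxon’s own YAML syntax is not reproduced here. As an illustration of the cluster being described, this is the same master / parameter-server / worker layout in the job-name-to-address-list dictionary format that distributed TensorFlow’s ClusterSpec accepts; the hostnames and ports are placeholders.

```python
# One master, 2 parameter servers, 4 workers -- the cluster described above,
# in the dict format a tf.train.ClusterSpec would be built from.
cluster = {
    "master": ["host0:2222"],
    "ps":     ["host1:2222", "host2:2222"],
    "worker": ["host3:2222", "host4:2222", "host5:2222", "host6:2222"],
}
```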
scalability  ai 
29 days ago by janpeuker
Develop ML with JavaScript
Use flexible and intuitive APIs to build and train models from scratch using the low-level JavaScript linear algebra library or the high-level layers API
Run Existing models
Use TensorFlow.js model converters to run pre-existing TensorFlow models right in the browser or under Node.js.
Retrain Existing models
Retrain pre-existing ML models using sensor data connected to the browser, or other client-side data.
ai  javascript  nodejs 
5 weeks ago by janpeuker
AI Needs New Clichés – Molly Wright Steenson – Medium
Minority Report was advised by science advisor John Underkoffler, founder and CEO of Oblong, which builds immersive human machine interface (HMI) platforms combining screens of different scales and different modes of interaction. It’s something he’s worked on for nearly 30 years, starting with his master’s thesis at the MIT Media Lab on holograms and photographic reality in 1991, in which he investigated “the development of new techniques for calculating holographic interference patterns of objects and scenes with realistic visual characteristics.” This research was part of the MIT Holographic Video project — an agenda that was set over a decade earlier in the late 1970s, when Nicholas Negroponte and researchers at MIT’s Architecture Machine Group (the predecessor to the Media Lab) developed simulation environments that they said would be indistinguishable from reality. In 1978, Negroponte and colleagues wrote in a proposal, “We are reminded of the promo from Bell. It is the next best thing to being there. This proposal is about being there.”
ai  usability  history 
5 weeks ago by janpeuker
VISxAI Workshop at IEEE VIS 2018
Workshop on
Visualization for AI Explainability
October 22, 2018 at IEEE VIS in Berlin, Germany

The role of visualization in artificial intelligence (AI) has gained significant attention in recent years. With the growing complexity of AI models, the critical need for understanding their inner workings has increased. Visualization is potentially a powerful technique to fill such a critical need.

The goal of this workshop is to initiate a call for “explainables” that explain how AI techniques work using visualizations. We believe the VIS community can leverage their expertise in creating visual narratives to bring new insight into the often obfuscated complexity of AI systems.
ai  visualization  conference 
5 weeks ago by janpeuker
Google AI Blog: Improving Deep Learning Performance with AutoAugment
Our AutoAugment algorithm found augmentation policies for some of the most well-known computer vision datasets that, when incorporated into the training of the neural network, led to state-of-the-art accuracies. By augmenting ImageNet data we obtain a new state-of-the-art top-1 accuracy of 83.54%, and on CIFAR10 we achieve an error rate of 1.48%, a 0.83% improvement over the default data augmentation designed by scientists. On SVHN, we improved the state-of-the-art error from 1.30% to 1.02%. Importantly, AutoAugment policies are found to be transferable — the policy found for the ImageNet dataset could also be applied to other vision datasets (Stanford Cars, FGVC-Aircraft, etc.), which in turn improves neural network performance.
ai  algorithm  blog 
5 weeks ago by janpeuker
AI winter is well on its way – Piekniewski's blog
So in fact, this graph which was meant to show how well deep learning scales, indicates the exact opposite. We can't just scale up AlexNet and get correspondingly better results - we have to fiddle with specific architectures, and effectively additional compute does not buy much without order of magnitude more data samples, which are in practice only available in simulated game environments.
ai  article  scalability 
6 weeks ago by janpeuker
Meet Michelangelo: Uber's Machine Learning Platform
The data management components of Michelangelo are divided between online and offline pipelines. Currently, the offline pipelines are used to feed batch model training and batch prediction jobs and the online pipelines feed online, low latency predictions (and in the near future, online learning systems).

In addition, we added a layer of data management, a feature store that allows teams to share, discover, and use a highly curated set of features for their machine learning problems.  We found that many modeling problems at Uber use identical or similar features, and there is substantial value in enabling teams to share features between their own projects and for teams in different organizations to share features with each other.
ai  engineering  Architecture  Patterns 
6 weeks ago by janpeuker
Python as a Declarative Programming Language
Using an imperative style means that you spend too much time wading through the glue, but declaring what operations you want leads to code that's efficient and clean.

The side effect of this is that in order to be a great Python programmer, you have to learn to program in a lower level language too. All of the most popular Python data libraries have native extensions: TensorFlow, scikit-learn, NumPy, Pandas, SciPy, spaCy etc all have significant portions of their code written in a native language. If you are comfortable just using these libraries, it's enough to be just a good Python programmer; however, if you want to be the type of programmer that can produce libraries like these you really should be learning something like C++ or Cython too.
engineering  functional  ai 
8 weeks ago by janpeuker
A Deep Dive into Monte Carlo Tree Search
The idea is simple. Instead of ranking according to estimated rating, you add a bonus based on how uncertain you are about the rating. In this example, the top submission on HN has fewer upvotes than the second rank submission, but it’s also newer. So it gets a bigger uncertainty bonus. The uncertainty bonus fades over time, and that submission will fall in ranking unless it can prove its worth with more upvotes.

This is an instance of the Multi Armed Bandit problem and has a pretty extensive literature if you want to learn more.

UCT = Upper Confidence bounds applied to Trees
So how does this help us understand AlphaGoZero? Playing a game has a lot in common with the multi-armed bandit problem: when reading into a game variation, you want to balance between playing the strongest known response, and exploring new variations that could turn out to be good moves. So it makes sense that we can reuse the UCB idea.
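The uncertainty bonus described above is, in the UCB1 formulation, proportional to the square root of log(total tries) divided by the tries of that particular option. The stdlib-only sketch below is illustrative; the exploration constant and the numbers are made up, not taken from the article.

```python
# UCB1: estimated value plus an uncertainty bonus that shrinks as an option
# (an arm, a submission, a candidate move) is tried more often.
import math

def ucb1(mean_reward, times_tried, total_tries, c=1.4):
    """Estimated rating plus an exploration bonus for rarely-tried options."""
    return mean_reward + c * math.sqrt(math.log(total_tries) / times_tried)

# The newer option has a worse observed mean but a larger bonus, so it ranks
# first -- exactly the Hacker News ranking behavior described above.
total = 1000
older = ucb1(mean_reward=0.60, times_tried=900, total_tries=total)
newer = ucb1(mean_reward=0.55, times_tried=20, total_tries=total)
```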
ai  Python  algorithm 
8 weeks ago by janpeuker
Google AI Blog: Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone
At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX). To obtain its high precision, we trained Duplex’s RNN on a corpus of anonymized phone conversation data. The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks. Finally, we used hyperparameter optimization from TFX to further improve the model.
ai  google  cybernetics 
9 weeks ago by janpeuker
Theseus Maze | The MIT 150 Exhibition
Theseus Maze
Howard Gardner said Claude Shannon’s MIT master’s thesis was “possibly the most important, and also the most famous, master’s thesis of the century.” If you’ve used the word “bit” (for binary digit), then you have an idea of what he meant. More important, Claude Shannon’s early ideas proved key to the redesign of the telephone system and the development of the modern computer. During World War II, he met the famous British mathematician Alan Turing. That exchange resulted in Shannon’s pioneering analysis of cryptography systems. (A declassified version of his original 1945 memo was published in 1949.) Most notably, Shannon’s 1948 paper, A Mathematical Theory of Communication, was hailed as “the Magna Carta of the information age.”

This digital pioneer had another extremely imaginative and playful side. Shannon loved to build mechanical toys for his family. Theseus was more than an electromechanical maze in which a mouse blunders around looking for the “cheese.” Built with his wife Betty, Shannon’s maze was an elegant display of telephone switching technology. When you make a telephone call, information travels the telephone system labyrinth to find the right telephone to ring, just as the mouse in this maze searches for its cheese.

history  engineering  ai 
12 weeks ago by janpeuker
Introducing TensorFlow Probability – TensorFlow – Medium
What’s in TensorFlow Probability?
Our stack of probabilistic ML tools provides modular abstractions for probabilistic reasoning and statistical analysis in the TensorFlow ecosystem.

An overview of TensorFlow Probability. The probabilistic programming toolbox provides benefits for users ranging from Data Scientists and Statisticians to all TensorFlow Users.
Layer 0: TensorFlow. Numerical operations. In particular, the LinearOperator class enables matrix-free implementations that can exploit special structure (diagonal, low-rank, etc.) for efficient computation. It is built and maintained by the TensorFlow Probability team and is now part of tf.linalg in core TF.

Layer 1: Statistical Building Blocks

Distributions (tf.contrib.distributions, tf.distributions): A large collection of probability distributions and related statistics with batch and broadcasting semantics.
Bijectors (tf.contrib.distributions.bijectors): Reversible and composable transformations of random variables. Bijectors provide a rich class of transformed distributions, from classical examples like the log-normal distribution to sophisticated deep learning models such as masked autoregressive flows.
(See the TensorFlow Distributions whitepaper for more information.)

Layer 2: Model Building

Edward2 (tfp.edward2): A probabilistic programming language for specifying flexible probabilistic models as programs.
Probabilistic Layers (tfp.layers): Neural network layers with uncertainty over the functions they represent, extending TensorFlow Layers.
Trainable Distributions (tfp.trainable_distributions): Probability distributions parameterized by a single Tensor, making it easy to build neural nets that output probability distributions.
Layer 3: Probabilistic Inference

Markov chain Monte Carlo (tfp.mcmc): Algorithms for approximating integrals via sampling. Includes Hamiltonian Monte Carlo, random-walk Metropolis-Hastings, and the ability to build custom transition kernels.
Variational Inference (tfp.vi): Algorithms for approximating integrals via optimization.
Optimizers (tfp.optimizer): Stochastic optimization methods, extending TensorFlow Optimizers. Includes Stochastic Gradient Langevin Dynamics.
Monte Carlo (tfp.monte_carlo): Tools for computing Monte Carlo expectations.
Layer 4: Pre-made Models and Inference (analogous to TensorFlow’s pre-made Estimators)

Bayesian structural time series (coming soon): High-level interface for fitting time-series models (i.e., similar to R’s BSTS package).
Generalized Linear Mixed Models (coming soon): High-level interface for fitting mixed-effects regression models (i.e., similar to R’s lme4 package).
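To give a flavor of the inference layer — and without reproducing TFP’s actual API — here is a stdlib-only sketch of the basic Monte Carlo idea behind tfp.monte_carlo: approximate an expectation E[f(X)] by averaging f over samples of X. The function names and the uniform example are illustrative.

```python
# Monte Carlo expectation: E[f(X)] ~ (1/n) * sum of f over n samples of X.
import random

def monte_carlo_expectation(f, sampler, n=100_000, seed=0):
    rng = random.Random(seed)
    return sum(f(sampler(rng)) for _ in range(n)) / n

# For X ~ Uniform(0, 1), E[X^2] = 1/3, so the estimate should land near 0.333.
estimate = monte_carlo_expectation(lambda x: x * x, lambda rng: rng.random())
```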
algorithm  ai 
april 2018 by janpeuker
Design in the Era of the Algorithm | Big Medium
The answer machines have an overconfidence problem. It’s not only a data-science problem that the algorithm returns bad conclusions. It’s a problem of presentation: the interface suggests that there’s one true answer, offering it up with a confidence that is unjustified.

So this is a design problem, too. The presentation fails to set appropriate expectations or context, and instead presents a bad answer with matter-of-fact assurance. As we learn to present machine-originated content, we face a very hard question: how might we add some productive humility to these interfaces to temper their overconfidence?

I have ideas. Here are ten design principles for conceiving, designing, and managing data-driven products.

Favor accuracy over speed
Allow for ambiguity
Add human judgment
Advocate sunshine
Embrace multiple systems
Make it easy to contribute (accurate) data
Root out bias and bad assumptions
Give people control over their data
Be loyal to the user
Take responsibility
ai  design  usability 
april 2018 by janpeuker
Interactive supervision with TensorBoard by IBM Scientist
It’s like a liquid thinking process that fluidly adapts to the user’s definition of structure. The user gets to compose a perspective of the data that is useful.

The general effect is, predictably, that same-label samples form tighter and combined clusters, which effectively clears space in the embedding that highlights outliers and unlabeled points. This may incrementally reduce the user difficulty in applying labels to a dataset, as the embedding progressively becomes organized into compact clusters. t-SNE is extremely useful in providing an initial view of the data structure, but then supervision can be injected into its objective and iterative gradient descent can compose a user perspective of the data.
ai  engineering 
april 2018 by janpeuker
Interpretable Machine Learning
If you can ensure that the machine learning model can explain decisions, the following traits can also be checked more easily (Doshi-Velez and Kim 2017):

Fairness: Making sure the predictions are unbiased and not discriminating against protected groups (implicit or explicit). An interpretable model can tell you why it decided that a certain person is not worthy of a credit and for a human it becomes easier to judge if the decision was based on a learned demographic (e.g. racial) bias.
Privacy: Ensuring that sensitive information in the data is protected.
Reliability or Robustness: Test that small changes in the input don’t lead to big changes in the prediction.
Causality: Check that only causal relationships are picked up, meaning that a predicted change in a decision due to a change in the input values also happens in reality.
Trust: It is easier for humans to trust into a system that explains its decisions compared to a black box.
ai  learning  book 
march 2018 by janpeuker
AI and Neurography - Interalia Magazine
My work happens on various levels of detail. At the very coarse level is my interest in systems of any kind. This includes actual physical systems like machines or humans, complex systems like society or bureaucracy or belief systems like art and religion. I try to understand what makes these systems tick and how they can be manipulated or augmented. Art as a system is particularly interesting to me since it involves several subsystems like human creativity and perception on the creator’s side and the social, commercial or cultural mechanisms on the audience’s side. Some questions I try to find answers to in this context are for example “What makes one image be perceived as art and another one not?” or “What is an artist?” These questions then bring me into areas of very fine detail, like “how do you make a bunch of pixels look like an eye?” or “how can a machine create the 100,001st image and make sure that it looks different than all the previous ones and is interesting to a human spectator?”

Ultimately I do all this to satisfy my own curiosity, but since I have to live and eat and since art is not a one-way channel I try to make work that is relevant, interesting, entertaining or touching to other human beings as well.
ai  art  psychology 
march 2018 by janpeuker
A visual introduction to machine learning
machine learning
Finding patterns in data is where machine learning comes in. Machine learning methods use statistical learning to identify boundaries.

One example of a machine learning method is a decision tree. Decision trees look at one variable at a time and are a reasonably accessible (though rudimentary) machine learning method.
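The “one variable at a time” idea can be made concrete with a single threshold split. The sketch below is not the article’s code; the numbers are toy values loosely inspired by its home-elevation example. It picks the cut point on one feature that best separates two classes:

```python
# Find the single threshold on one feature that best separates two classes --
# the core decision a tree makes at each node.

def best_split(values, labels):
    """Try each midpoint between sorted values; return (threshold, accuracy)."""
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        # predict class 1 when the value exceeds the threshold
        acc = sum((v > t) == bool(y) for v, y in pairs) / len(pairs)
        best = max(best, (t, acc), key=lambda p: p[1])
    return best

# Toy elevations: low values labeled 0, high values labeled 1.
elev = [5, 10, 20, 30, 120, 150, 200, 240]
city = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, accuracy = best_split(elev, city)
```

A real tree repeats this greedy choice recursively on each side of the split, which is what makes it both accessible and rudimentary.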
ai  visualization  html5  learning 
march 2018 by janpeuker
Meet Horovod: Uber's Open Source Distributed Deep Learning Framework
In early 2017, Baidu published an article, “Bringing HPC Techniques to Deep Learning,” evangelizing a different algorithm for averaging gradients and communicating those gradients to all nodes (Steps 2 and 3 above), called ring-allreduce, as well as a fork of TensorFlow through which they demonstrated a draft implementation of this algorithm. The algorithm was based on the approach introduced in the 2009 paper “Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations” by Patarasuk and Yuan.
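Ring-allreduce itself is easy to simulate. The sketch below is a stdlib-only illustration, not Baidu’s or Horovod’s implementation: it runs the algorithm’s two phases — reduce-scatter, where chunks accumulate sums as they travel around the ring, and all-gather, where the finished chunks are distributed — over in-memory “nodes”.

```python
def ring_allreduce(grads):
    """All-reduce (elementwise sum) over n simulated nodes via a ring."""
    n = len(grads)
    chunk = len(grads[0]) // n          # assumes vector length divisible by n
    bufs = [list(g) for g in grads]
    # Phase 1, reduce-scatter: after n-1 steps, node i holds the complete
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        data = [bufs[i][c * chunk:(c + 1) * chunk] for i, c in sends]
        for (i, c), payload in zip(sends, data):
            dst = (i + 1) % n
            for k, v in enumerate(payload):
                bufs[dst][c * chunk + k] += v
    # Phase 2, all-gather: completed chunks travel once more around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        data = [bufs[i][c * chunk:(c + 1) * chunk] for i, c in sends]
        for (i, c), payload in zip(sends, data):
            bufs[(i + 1) % n][c * chunk:(c + 1) * chunk] = payload
    return bufs
```

Each node sends and receives only one chunk per step, which is why the algorithm’s bandwidth cost is independent of the number of nodes.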
ai  performance  algorithm  research 
march 2018 by janpeuker
Word Embeddings: Explaining their properties – Off the convex path
Why do Semantic Relations correspond to Directions?
Remember the striking discovery in the word2vec paper: word analogy tasks can be solved by simple linear algebra. For example, the word analogy question man : woman :: king : ?? can be solved by looking for the word w such that v_king − v_w is most similar to v_man − v_woman; in other words, the word w that minimizes ‖(v_king − v_w) − (v_man − v_woman)‖.

This strongly suggests that semantic relations —in the above example, the relation is masculine-feminine—correspond to directions in space. However, this interpretation is challenged by Levy and Goldberg who argue there is no linear algebra magic here, and the expression can be explained simply in terms of traditional connection between word similarity and vector inner product (cosine similarity). See also this related blog post.
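The analogy arithmetic is easy to demonstrate with made-up 2-D vectors; these are illustrative toy vectors, not trained word2vec embeddings.

```python
# Solve man : woman :: king : ?? by finding the word w that minimizes
# ||(v_king - v_w) - (v_man - v_woman)||, i.e. the w closest to
# v_king - (v_man - v_woman).
vec = {
    "man":   [1.0, 1.0],
    "woman": [1.0, -1.0],
    "king":  [3.0, 1.0],
    "queen": [3.0, -1.0],
    "apple": [0.0, 5.0],
}

def analogy(a, b, c):
    """Return the word w minimizing ||(vec[c] - vec[w]) - (vec[a] - vec[b])||."""
    target = [vc - (va - vb) for va, vb, vc in zip(vec[a], vec[b], vec[c])]
    def dist(w):
        return sum((t - x) ** 2 for t, x in zip(target, vec[w]))
    return min((w for w in vec if w not in (a, b, c)), key=dist)

answer = analogy("man", "woman", "king")
```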
ai  bias  diversity  research 
march 2018 by janpeuker
Car Wars | this.
In this work of speculative fiction author Cory Doctorow takes us into a near future where the roads are solely populated by self-driving cars.
ai  future  literature 
march 2018 by janpeuker
TensorFlow Wide & Deep Learning Tutorial  |  TensorFlow
In this tutorial, we'll introduce how to use the tf.estimator API to jointly train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization. It's useful for generic large-scale regression and classification problems with sparse input features (e.g., categorical features with a large number of possible feature values). If you're interested in learning more about how Wide & Deep Learning works, please check out our research paper.

The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, there are only 3 steps to configure a wide, deep, or Wide & Deep model using the tf.estimator API:

Select features for the wide part: Choose the sparse base columns and crossed columns you want to use.
Select features for the deep part: Choose the continuous columns, the embedding dimension for each categorical column, and the hidden layer sizes.
Put them all together in a Wide & Deep model (DNNLinearCombinedClassifier).
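This is not the tf.estimator API itself, but the core combination DNNLinearCombinedClassifier implements can be sketched in plain Python: the joint model’s logit is the sum of a wide (linear over sparse/crossed features) logit and a deep (MLP over dense features) logit, passed through a sigmoid. All weights and feature names below are made up for illustration.

```python
# Wide & Deep in miniature: sum a linear logit over sparse feature ids with
# an MLP logit over dense features, then apply a joint sigmoid.
import math

def wide_logit(sparse_features, weights):
    # wide part: linear model over one-hot / crossed feature ids
    return sum(weights.get(f, 0.0) for f in sparse_features)

def deep_logit(dense, w_hidden, w_out):
    # deep part: one hidden ReLU layer over continuous features
    hidden = [max(0.0, sum(w * x for w, x in zip(row, dense))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))

def wide_and_deep(sparse_features, dense, weights, w_hidden, w_out):
    logit = wide_logit(sparse_features, weights) + deep_logit(dense, w_hidden, w_out)
    return 1.0 / (1.0 + math.exp(-logit))   # sigmoid over the summed logits

p = wide_and_deep(
    ["occupation=engineer", "education_x_occupation=BS_engineer"],
    [0.5, 0.2],
    weights={"occupation=engineer": 0.8, "education_x_occupation=BS_engineer": 0.4},
    w_hidden=[[1.0, -0.5], [0.3, 0.9]],
    w_out=[0.6, -0.2],
)
```

In the real estimator both parts are trained jointly, so the wide part memorizes feature crosses while the deep part generalizes through embeddings.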
ai  library  algorithm 
march 2018 by janpeuker
AI Has a Hallucination Problem That's Proving Tough to Fix | WIRED
Solving that problem—which could challenge designers of self-driving vehicles—may require a more radical rethink of machine-learning technology. “The fundamental problem I would say is that a deep neural network is very different from a human brain,” says Li.

Humans aren’t immune to sensory trickery. We can be fooled by optical illusions, and a recent paper from Google created weird images that tricked both software and humans who glimpsed them for less than a tenth of a second to mistake cats for dogs. But when interpreting photos we look at more than patterns of pixels, and consider the relationship between different components of an image, such as the features of a person’s face, says Li.
psychology  bias  ai 
march 2018 by janpeuker
NSynth Super
As part of this exploration, they've created NSynth Super in collaboration with Google Creative Lab. It’s an open source experimental instrument which gives musicians the ability to make music using completely new sounds generated by the NSynth algorithm from 4 different source sounds. The experience prototype (pictured above) was shared with a small community of musicians to better understand how they might use it in their creative process.
music  hardware  ai 
march 2018 by janpeuker
Deep Neural Network implemented in pure SQL over BigQuery
Now let us look at the deeper implications of a distributed SQL engine in the context of deep learning. One limitation of warehouse SQL engines like BigQuery and Presto is that the query processing is performed using CPUs instead of GPUs. It would be interesting to check out the results with GPU-accelerated SQL databases like blazingdb and mapd. One straightforward approach to check out would be to perform query and data distribution using a distributed SQL engine and to perform the local computations using a GPU accelerated database.
database  analytics  ai 
march 2018 by janpeuker
Reptile: A Scalable Meta-Learning Algorithm
We’ve developed a simple meta-learning algorithm called Reptile which works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task. This method performs as well as MAML, a broadly applicable meta-learning algorithm, while being simpler to implement and more computationally efficient.
humans’ fast-learning abilities can be explained as Bayesian inference, and that the key to developing algorithms with human-level learning speed is to make our algorithms more Bayesian. However, in practice, it is challenging to develop (from first principles) Bayesian machine learning algorithms that make use of deep neural networks and are computationally feasible.

Meta-learning has emerged recently as an approach for learning from small amounts of data. Rather than trying to emulate Bayesian inference (which may be computationally intractable), meta-learning seeks to directly optimize a fast-learning algorithm, using a dataset of tasks. Specifically, we assume access to a distribution over tasks, where each task is, for example, a classification task. From this distribution, we sample a training set and a test set. Our algorithm is fed the training set, and it must produce an agent that has good average performance on the test set. Since each task corresponds to a learning problem, performing well on a task corresponds to learning quickly.
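The update rule quoted above — sample a task, run SGD on it, then move the initialization toward the adapted parameters — fits in a few lines. The stdlib-only sketch below uses toy 1-D quadratic tasks; all names and constants are illustrative, not the paper’s code.

```python
# Reptile on toy tasks: each task is "minimize (x - c)^2" for some center c.
import random

def reptile(initial, tasks, meta_steps=200, inner_steps=5, lr=0.1, meta_lr=0.5, seed=0):
    rng = random.Random(seed)
    x = initial
    for _ in range(meta_steps):
        c = rng.choice(tasks)                  # sample a task
        x_task = x
        for _ in range(inner_steps):
            x_task -= lr * 2 * (x_task - c)    # SGD on the task loss
        x += meta_lr * (x_task - x)            # move the init toward the adapted params
    return x

init = reptile(10.0, tasks=[-1.0, 3.0])
```

With tasks centered at −1 and +3, the learned initialization settles between them: a starting point from which either task is reachable in a few SGD steps, which is exactly the “fast-learning algorithm” being optimized.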
ai  learning  research  psychology 
march 2018 by janpeuker
How to Make A.I. That’s Good for People - The New York Times
Sometimes this difference is trivial. For instance, in my lab, an image-captioning algorithm once fairly summarized a photo as “a man riding a horse” but failed to note the fact that both were bronze sculptures. Other times, the difference is more profound, as when the same algorithm described an image of zebras grazing on a savanna beneath a rainbow. While the summary was technically correct, it was entirely devoid of aesthetic awareness, failing to detect any of the vibrancy or depth a human would naturally appreciate.

That may seem like a subjective or inconsequential critique, but it points to a major aspect of human perception beyond the grasp of our algorithms. How can we expect machines to anticipate our needs — much less contribute to our well-being — without insight into these “fuzzier” dimensions of our experience?
ai  article  psychology 
march 2018 by janpeuker
12 Useful Things to Know about Machine Learning – James Le – Medium
10 — Simplicity Does Not Imply Accuracy
Occam’s razor famously states that entities should not be multiplied beyond necessity. In machine learning, this is often taken to mean that, given two classifiers with the same training error, the simpler of the two will likely have the lowest test error. Purported proofs of this claim appear regularly in the literature, but in fact there are many counter-examples to it, and the “no free lunch” theorems imply it cannot be true.
ai  research  bias 
march 2018 by janpeuker
The Building Blocks of Interpretability
In our view, features do not need to be flawless detectors for it to be useful for us to think about them as such. In fact, it can be interesting to identify when a detector misfires.

With regards to attribution, recent work suggests that many of our current techniques are unreliable. One might even wonder if the idea is fundamentally flawed, since a function’s output could be the result of non-linear interactions between its inputs. One way these interactions can pan out is as attribution being “path-dependent”. A natural response to this would be for interfaces to explicitly surface this information: how path-dependent is the attribution? A deeper concern, however, would be whether this path-dependency dominates the attribution.
ai  documentation  Emergence 
march 2018 by janpeuker
ND4J: N-Dimensional Arrays for Java - N-Dimensional Scientific Computing for Java
A usability gap has separated Java, Scala and Clojure programmers from the most powerful tools in data analysis, like NumPy or Matlab. Libraries like Breeze don’t support n-dimensional arrays, or tensors, which are necessary for deep learning and other tasks. Libraries like Colt and Parallel Colt are GPL-licensed or depend on GPL-licensed code, making them unsuitable for commercial use. ND4J and ND4S are used by national laboratories such as NASA JPL for tasks such as climatic modeling, which require computationally intensive simulations.
ai  java  scala  Python  library 
march 2018 by janpeuker
Notes on Gartner’s 2018 Data Science and Machine Learning MQ | ML/DL
While Apache Spark remains the go-to tool for data engineering and application development, interest among data scientists peaked a year or so ago. TensorFlow is now the cool kid on the block. We’re also seeing renewed interest in Caffe/Caffe2, due to the hot market for image classification and recognition.

Yeah, I know. I forgot PyTorch.

Apache Flink has solid use cases in stream processing, but its champions no longer bother to say it’s a tool for machine learning. Here’s a bye-ku for Flink:

Ten guys in Berlin

Thought Flink would eat the world, but

Budding users yawned

We can also drop Mahout and Pig from the chart. And now that Neo4J has a Spark backend, you can stick a fork in GraphX. Please.
analytics  ai  opensource 
february 2018 by janpeuker
Preparing for Malicious Uses of AI
AI is a technology capable of immensely positive and immensely negative applications. We should take steps as a community to better evaluate research projects for perversion by malicious actors, and engage with policymakers to understand areas of particular sensitivity. As we write in the paper: “Surveillance tools can be used to catch terrorists or oppress ordinary citizens. Information content filters could be used to bury fake news or manipulate public opinion. Governments and powerful private actors will have access to many of these AI tools and could use them for public good or harm.” Some potential solutions to these problems include pre-publication risk assessments for certain bits of research, selectively sharing some types of research with a significant safety or security component among a small set of trusted organizations, and exploring how to embed norms into the scientific community that are responsive to dual-use concerns.
ai  security  future  research 
february 2018 by janpeuker
t-SNE – Laurens van der Maaten
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. The technique can be implemented via Barnes-Hut approximations, allowing it to be applied on large real-world datasets. We applied it on data sets with up to 30 million examples. The technique and its variants are introduced in the following papers:
visualization  ai  algorithm 
february 2018 by janpeuker
The Benjamin Franklin Method of Reading Programming Books | Path-Sensitive
This process is a little bit like being a human autoencoder. An autoencoder is a neural network that tries to produce output the same as its input, but passing through an intermediate layer which is too small to fully represent the data. In doing so, it’s forced to learn a more compact representation. Here, the neural net in question is that den of dendrons in your head.

K. Anders Ericsson likens it to how artists practice by trying to imitate some famous work. Mathematicians are taught to attempt to prove most theorems themselves when reading a book or paper --- even if they can’t, they’ll have an easier time compressing the proof to its basic insight. I used this process to get a better eye for graphical design; it was like LASIK.

But the basic idea, applied to programming books, is particularly simple yet effective.

Here’s how it works:
Read your programming book as normal. When you get to a code sample, read it over.

Then close the book.

Then try to type it up.
learning  book  ai 
february 2018 by janpeuker
Attention and Augmented Recurrent Neural Networks
Neural Turing Machines [2] combine a RNN with an external memory bank. Since vectors are the natural language of neural networks, the memory is an array of vectors:

Memory is an array of vectors.
Network A writes and reads from this memory each step.
But how does reading and writing work? The challenge is that we want to make them differentiable. In particular, we want to make them differentiable with respect to the location we read from or write to, so that we can learn where to read and write. This is tricky because memory addresses seem to be fundamentally discrete. NTMs adopt a very clever solution: every step, they read and write everywhere, just to different extents.

As an example, let’s focus on reading. Instead of specifying a single location, the RNN outputs an “attention distribution” that describes how we spread out the amount we care about different memory positions. As such, the result of the read operation is a weighted sum.
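The weighted-sum read can be sketched in a few lines of Python. This is a toy illustration, not the article's code: the memory contents, the attention weights, and the `soft_read` helper are all made up.

```python
# Toy sketch of an NTM-style "soft" read: instead of indexing one memory
# slot, take a weighted sum of every slot under an attention distribution,
# which keeps the read differentiable in the attention weights.

def soft_read(memory, attention):
    """memory: list of equal-length vectors; attention: weights summing to 1."""
    assert abs(sum(attention) - 1.0) < 1e-9
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(attention, memory))
            for j in range(width)]

memory = [[1.0, 0.0],
          [0.0, 1.0],
          [2.0, 2.0]]

# Mostly attend to slot 0, a little to slot 2.
read = soft_read(memory, [0.8, 0.0, 0.2])
# read is 0.8 * memory[0] + 0.2 * memory[2], i.e. roughly [1.2, 0.4]
```

Because every slot contributes, gradients flow back through the attention weights, which is exactly what makes "where to read" learnable.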
ai  algorithm 
february 2018 by janpeuker
Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models | Blog | Explosion AI
An embedding table maps long, sparse, binary vectors into shorter, dense, continuous vectors. For example, imagine we receive our text as a sequence of ASCII characters. There are 256 possible values, so we can represent each value as a binary vector with 256 dimensions. The value for a will be a vector of 0s, with a 1 at column 97, while the value for b will be a vector of zeros with a 1 at column 98. This is called the "one hot" encoding scheme. Different values receive entirely different vectors.

Most neural network models begin by tokenising the text into words, and embedding the words into vectors. Other models extend the word vector representation with other information. For instance, it's often useful to pass forward a sequence of part-of-speech tags, in addition to the word IDs. You can then learn tag embeddings, and concatenate the tag embedding to the word embedding. This lets you push some amount of position-sensitive information into the word representation. However, there's a much more powerful way to make the word representations context-specific.
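The one-hot scheme and the embedding lookup described above can be illustrated in a few lines. This is a sketch, not the article's code: the embedding table here is random and untrained, and the dimension of 4 is arbitrary.

```python
import random

# One-hot encoding of an ASCII byte: a 256-dimensional binary vector with a
# single 1 at the character's code point.
def one_hot(char, size=256):
    vec = [0] * size
    vec[ord(char)] = 1
    return vec

a, b = one_hot('a'), one_hot('b')
assert a[97] == 1 and b[98] == 1  # 'a' is column 97, 'b' is column 98

# A random, untrained embedding table: row i is the dense vector for byte i.
random.seed(0)
table = [[random.random() for _ in range(4)] for _ in range(256)]

# Multiplying the one-hot vector by the table is equivalent to reading the
# table row directly -- which is how embedding lookups are implemented.
embedded = [sum(x * t[j] for x, t in zip(a, table)) for j in range(4)]
assert embedded == table[97]
```

In practice the matrix multiply is skipped entirely and the row is indexed directly, but the two views are equivalent.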

Step 2: Encode
Given a sequence of word vectors, the encode step computes a representation that I'll call a sentence matrix, where each row represents the meaning of each token in the context of the rest of the sentence.

The technology used for this purpose is a bidirectional RNN. Both LSTM and GRU architectures have been shown to work well for this. The vector for each token is computed in two parts: one part by a forward pass, and another part by a backward pass. To get the full vector, we simply stick the two together.
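The bidirectional idea can be sketched with a toy recurrence standing in for a real LSTM or GRU. The weights, inputs, and recurrence below are all made up for illustration.

```python
import math

def rnn_pass(xs, w=0.5, u=0.5):
    """One directional pass: a toy recurrence producing one state per token."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * h + u * x)
        states.append(h)
    return states

tokens = [0.1, 0.9, -0.4]
forward = rnn_pass(tokens)
# Run the same recurrence right-to-left, then flip back into token order.
backward = list(reversed(rnn_pass(list(reversed(tokens)))))

# Row t of the "sentence matrix": the forward state concatenated with the
# backward state, so each row sees context from both directions.
sentence_matrix = [[f, b] for f, b in zip(forward, backward)]
```

Each row now depends on tokens both before and after it, which is the point of running the two passes.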
algorithm  ai 
february 2018 by janpeuker
Understanding LSTM Networks -- colah's blog
Attention isn’t the only exciting thread in RNN research. For example, Grid LSTMs by Kalchbrenner, et al. (2015) seem extremely promising. Work using RNNs in generative models – such as Gregor, et al. (2015), Chung, et al. (2015), or Bayer & Osendorfer (2015) – also seems very interesting. The last few years have been an exciting time for recurrent neural networks, and the coming ones promise to only be more so!
ai  algorithm  mathematics 
february 2018 by janpeuker
How to build your own AlphaZero AI using Python and Keras
This file contains the Residual_CNN class, which defines how to build an instance of the neural network.

It uses a condensed version of the neural network architecture in the AlphaGoZero paper — i.e. a convolutional layer, followed by many residual layers, then splitting into a value and policy head.

The depth and number of convolutional filters can be specified in the config file.

The Keras library is used to build the network, with a backend of Tensorflow.

To view individual convolutional filters and densely connected layers in the neural network, run the following inside the run.ipynb notebook:
ai  Python  howto 
january 2018 by janpeuker
Ethics in Machine Learning – Roya Pakzad – Medium
The other issue is the need for collaboration between social scientists and AI researchers. You know, you can’t expect AI researchers themselves to come up with a clear understanding of fairness. Not only do we need people in social sciences to collaborate with us in defining these words, but we also need to keep this collaboration going through to the end of product research and development.

“One very important issue is the lack of a concrete definition of fairness.”
But it’s important to note that some collaborations between AI researchers and social scientists are already underway. For example, Solon Barocas (Cornell University) and Moritz Hardt at UC Berkeley have been working on the issue of defining and modeling fairness in active collaboration with social scientists.
society  ai  philosophy 
january 2018 by janpeuker
One model to learn them all | the morning paper
We’d need to be able to support different input and output modalities (as required by the task in hand), we’d need a common representation of the learned knowledge that was shared across all of these modalities, and we’d need sufficient ‘apparatus’ such that tasks which need a particular capability (e.g. attention) are able to exploit it. ‘One model to rule them all’ introduces a MultiModel architecture with exactly these features, and it performs impressively well.
ai  Architecture 
january 2018 by janpeuker
Turning Design Mockups Into Code With Deep Learning - FloydHub Blog
LSTMs are a lot heavier for my cognition compared to CNNs. When I unrolled all the LSTMs they became easier to understand. Fast.ai’s video on RNNs was super useful. Also, focus on the input and output features before you try understanding how they work.
Building a vocabulary from the ground up is a lot easier than narrowing down a huge vocabulary. This includes everything from fonts, div sizes, hex colors to variable names and normal words.
Most of the libraries are created to parse text documents and not code. In documents, everything is separated by a space, but in code, you need custom parsing.
You can extract features with a model that’s trained on Imagenet. This might seem counterintuitive since Imagenet has few web images. However, the loss is 30% higher compared to a pix2code model, which is trained from scratch. It would be interesting to use a pre-trained Inception-ResNet-type model based on web screenshots.
ai  design  engineering 
january 2018 by janpeuker
Google and Others Are Building AI Systems That Doubt Themselves - MIT Technology Review
The work reflects the realization that uncertainty is a key aspect of human reasoning and intelligence. Adding it to AI programs could make them smarter and less prone to blunders, says Zoubin Ghahramani, a prominent AI researcher who is a professor at the University of Cambridge and chief scientist at Uber.

This may prove vitally important as AI systems are used in ever more critical scenarios. “We want to have a rock-solid framework for deep learning, but make it easier for people to represent uncertainty,” Ghahramani told me recently over coffee one morning during a major AI conference in Long Beach, California.

Pyro is a new programming language released by Uber that merges deep learning with probabilistic programming.
library  ai  psychology  Emergence 
january 2018 by janpeuker
A gentle introduction to genetic algorithms | sausheong's space
More technically speaking, mutations get us out of a local maximum in order to find the global maximum. If we look at genetic algorithms as a mechanism to find the optimal solution, if we don’t have mutation, once a local maximum is found the mechanism will simply settle on that and never moves on to find the global maximum. Mutations can jolt the population out of a local maximum and therefore provide an opportunity for the algorithm to continue looking for the global maximum.
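The local-versus-global-maximum point can be sketched with a toy one-dimensional search. Everything below is illustrative: the fitness function, mutation rate, and population size are made up, and crossover is omitted to keep the focus on mutation.

```python
import math
import random

random.seed(42)

def f(x):
    # Local peak near x = -1; global peak (twice as high) near x = 3.
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 3) ** 2)

# Start the whole population inside the local basin around x = -1.
pop = [random.uniform(-2, 0) for _ in range(20)]
best0 = max(pop, key=f)

for _ in range(200):
    pop.sort(key=f, reverse=True)
    parents = pop[:10]                 # selection: keep the fittest half
    children = []
    for p in parents:
        child = p
        if random.random() < 0.3:      # mutation: a random jolt
            child = p + random.gauss(0, 2.0)
        children.append(child)
    pop = parents + children           # elitism: the best so far survives

best = max(pop, key=f)
# Elitism guarantees fitness never decreases; mutation is what makes
# escaping the x = -1 basin possible at all.
assert f(best) >= f(best0)
```

Without the mutation step, every individual stays in the starting basin and the search settles on the local peak, exactly as the passage describes.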
algorithm  Emergence  ai 
january 2018 by janpeuker
LeCun vs Rahimi: Has Machine Learning Become Alchemy?
LeCun agreed with Rahimi’s views on pedagogy, saying “Simple and general theorems are good… but it could very well be that we won’t have ‘simple’ theorems that are more specific to neural networks, for the same reasons we don’t have analytical solutions of Navier-Stokes or the 3-body problem.”

The Rahimi — LeCun debate grew into a wide-ranging discussion at NIPS and on the internet. Dr. Yiran Chen, Director of the Duke Center of Evolutionary Lab, attempted to make peace, suggesting LeCun had overreacted, and that the opposing positions were actually not so contradictory.
philosophy  ai  research 
january 2018 by janpeuker
Transfer Learning - Machine Learning's Next Frontier
In the real world, however, we would like an agent to be able to deal with tasks that gradually become more complex by leveraging its past experience. To this end, we need to enable a model to learn continuously without forgetting. This area of machine learning is known as learning to learn [36], meta-learning, life-long learning, or continuous learning.
ai  psychology  research 
january 2018 by janpeuker
Multivariate Linear Regression, Gradient Descent in JavaScript - RWieruch
Multivariate Gradient Descent (Vectorized) in JavaScript
Now it is time to implement the gradient descent algorithm to train the theta parameters of the hypothesis function. The hypothesis function can be used later on to predict future housing prices by their number of bedrooms and size. If you recall from the introductory article about gradient descent, the algorithm takes a learning rate alpha and an initial definition of the theta parameters for the hypothesis. After an amount of iterations, it returns the trained theta parameters.
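The loop the article walks through can be sketched as follows, in Python rather than the article's JavaScript. The data and hyperparameters are made up, and the features are pre-centered (a stand-in for the feature scaling such tutorials usually apply).

```python
def gradient_descent(X, y, alpha=0.1, iterations=500):
    """Batch gradient descent for linear regression; X rows include a bias 1."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Prediction error h(x_i) - y_i for every training example.
        errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi
                  for xi, yi in zip(X, y)]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

# Toy rows: [bias, centered size, centered bedrooms],
# with targets generated from y = 1 + 2*size + 3*bedrooms.
X = [[1, -1.5, -1], [1, -0.5, 1], [1, 0.5, 1], [1, 1.5, -1]]
y = [1 + 2 * s + 3 * b for _, s, b in X]
theta = gradient_descent(X, y)
# theta converges to approximately [1, 2, 3]
```

After training, `sum(t * x for t, x in zip(theta, [1, size, beds]))` predicts the price for new inputs, which is the "predict future housing prices" step the excerpt mentions.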
javascript  howto  ai 
january 2018 by janpeuker
Neuroevolution: A different kind of deep learning - O'Reilly Media
deep learning traditionally focuses on programming an ANN to learn, while the concern in neuroevolution focuses on the origin of the architecture of the brain itself, which may encompass what is connected to what, the weights of those connections, and (sometimes) how those connections are allowed to change. There is, of course, some overlap between the two fields—an ANN still needs connection weights suited to its task, whether evolved or not, and it's possible that evolved ANNs might leverage the methods used in deep learning (for instance, stochastic gradient descent) to obtain those weights. In fact, deep learning might even be viewed as a sibling of neuroevolution that studies how weights are learned within either an evolved or preconceived architecture.

However, it's also conceivable that the mechanism of learning itself could be evolved, potentially transcending or elaborating the conventional techniques of deep learning as well. In short, the brain—including its architecture and how it learns—is a product of natural evolution, and neuroevolution can probe all the factors that contribute to its emergence, or borrow some from deep learning and let evolution determine the rest.
psychology  ai  Emergence 
december 2017 by janpeuker
How Adversarial Attacks Work
The simplest yet still very efficient algorithm is known as Fast Gradient Step Method (FGSM). The core idea is to add some weak noise on every step of optimization, drifting towards the desired class — or, if you wish, away from the correct one. Sometimes we will have to limit the amplitude of noise to keep the attack subtle — for example, in case a human might be investigating our shenanigans. The amplitude in our case means the intensity of a pixel’s channel — limiting it ensures that the noise will be almost imperceptible, and in the most extreme case will look like an overly compressed JPEG.
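A minimal FGSM-style step can be shown on a toy logistic model rather than a real image classifier. The weights and input below are made up; the point is only the core move: perturb each input feature by epsilon in the direction of the sign of the loss gradient.

```python
import math

w = [2.0, -1.0, 0.5]          # fixed "trained" weights (illustrative)
x = [0.3, -0.2, 0.8]          # a correctly classified input, label y = 1
y = 1.0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(x):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -math.log(p)       # cross-entropy for label 1

# For this model, d(loss)/dx_i = (p - y) * w_i.
p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
grad = [(p - y) * wi for wi in w]

# One FGSM step: eps caps the per-feature amplitude, keeping the noise weak.
eps = 0.1
x_adv = [xi + eps * (1 if g > 0 else -1) for xi, g in zip(x, grad)]
assert loss(x_adv) > loss(x)  # the step drifts away from the correct class
```

Capping the per-pixel amplitude at `eps` is what the excerpt means by keeping the attack subtle: the change per channel is bounded, so the noise stays nearly imperceptible.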
security  ai  research 
december 2017 by janpeuker
Feature Visualization
Neural feature visualization has made great progress over the last few years. As a community, we’ve developed principled ways to create compelling visualizations. We’ve mapped out a number of important challenges and found ways of addressing them.

In the quest to make neural networks interpretable, feature visualization stands out as one of the most promising and developed research directions. By itself, feature visualization will never give a completely satisfactory understanding. We see it as one of the fundamental building blocks that, combined with additional tools, will empower humans to understand these systems.
ai  visualization  research  Emergence 
december 2017 by janpeuker
Crapularity Hermeneutics
Compared with 1970s/1980s database dragnets, contemporary big data analytics have only become even more speculative, since their focus is no longer on drawing conclusions for the present from the past, but on guessing the future, and since they no longer target people based on the fact that their data matches other database records but instead based on more speculative statistical probabilities of environmental factors and behavioral patterns. Whether or not human-created (and hence human-tainted) data is to be blamed for discrimination, or for the hidden assumptions hard-coded into algorithms that are employed for processing this data – or whether machine-generated data can even be biased – they all confirm Cayley’s observation that language is “easy to capture but difficult to read”;
The “open society” is now better known under the name coined by Popper’s Mont Pelerin Society collaborator Alexander Rüstow, “neoliberalism”,90 which has historically proven to be able to falsify anything but itself.

This explains the resurgence of fascism and other forms of populism in the context of the crapularity. On the basis of Carl Schmitt’s political theology, populism offers a more honest alternative to the existing regime: against equilibrium promises and crapular reality, the proposed antidote is the state of exception; against invisible hands, the remedy is decision-making as a virtue in itself, what Schmitt referred to as “decisionism”.91 In other words, the states of exception and decisionism that various “systems” (from international political treaties to big data analytics) and post-democratic powers currently conceal, seem to become tangible and accountable again through populist re-embodiment.
philosophy  ai  cybernetics  analytics 
december 2017 by janpeuker
How Cargo Cult Bayesians encourage Deep Learning Alchemy
Perhaps there is an equivalent to this in deep learning? “Every time you fire a statistician or Bayesian, the performance of your deep learning system goes up.” ;-) The insinuation of Jelinek’s quote is that premature ideas of how a complex system works can be detrimental to its performance. We understand this in computer science as premature optimization: if we prematurely optimize a subcomponent, it can become a performance bottleneck later.
The legendary Isaac Newton was in fact very involved in alchemy. Here’s an image of his manuscript on the subject of transmutation for gold:
mathematics  algorithm  ai  history  Emergence 
december 2017 by janpeuker
Understanding Hinton’s Capsule Networks. Part I: Intuition.
Inspired by this idea, Hinton argues that brains, in fact, do the opposite of rendering. He calls it inverse graphics: from visual information received by eyes, they deconstruct a hierarchical representation of the world around us and try to match it with already learned patterns and relationships stored in the brain. This is how recognition happens. And the key idea is that representation of objects in the brain does not depend on view angle.
ai  psychology  research 
december 2017 by janpeuker
Machine Learning for Creativity and Design | NIPS 2017 Workshop, Long Beach, California, USA
We will look at algorithms for generation and creation of new media and new designs, engaging researchers building the next generation of generative models (GANs, RL, etc) and also from a more information-theoretic view of creativity (compression, entropy, etc). We will investigate the social and cultural impact of these new models, engaging researchers from HCI/UX communities.
conference  ai  innovation  art 
december 2017 by janpeuker
Understand Deep Residual Networks — a simple, modular learning framework that has redefined state…
Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution to the deeper model by construction: the layers are copied from the learned shallower model, and the added layers are identity mapping. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart.
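The construction argument can be made concrete with toy scalar "layers" (purely illustrative, not the paper's code): append residual blocks whose residual function is zero, and the deeper model reproduces the shallow one exactly.

```python
def shallow(x):
    return 2 * x + 1                 # stands in for a learned shallow network

def residual_block(x, F=lambda v: 0):
    # A residual block computes x + F(x); with F == 0 it is the identity.
    return x + F(x)

def deeper(x):
    h = shallow(x)
    for _ in range(5):               # five extra identity residual blocks
        h = residual_block(h)
    return h

# The deeper model matches the shallow one output-for-output, so its
# training error can be no higher.
assert deeper(3.0) == shallow(3.0)
```

This is why plain identity shortcuts matter: they make the "do nothing extra" solution trivially representable, whereas stacked non-residual layers have to learn the identity mapping.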
ai  algorithm  performance 
november 2017 by janpeuker
Solving Logistic Regression with Newton's Method
I like to think of the likelihood function as “the likelihood that our model will correctly predict any given y value, given its corresponding feature vector x̂”. It is, however, important to distinguish between probability and likelihood.

Now, we expand our likelihood function by applying it to every sample in our training data. We multiply each individual likelihood together to get the cumulative likelihood that our model is accurately predicting the y values of our training data:
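That cumulative likelihood can be sketched directly for a logistic model. The weights and data below are made up for illustration; they are not the post's example.

```python
import math

def predict(theta, x):
    """Logistic model: P(y = 1 | x) under parameters theta."""
    return 1 / (1 + math.exp(-sum(t * xi for t, xi in zip(theta, x))))

def cumulative_likelihood(theta, X, y):
    # Per-sample Bernoulli likelihood p^y * (1-p)^(1-y), multiplied over
    # the whole training set.
    L = 1.0
    for xi, yi in zip(X, y):
        p = predict(theta, xi)
        L *= p if yi == 1 else (1 - p)
    return L

X = [[1, 2.0], [1, -1.0], [1, 0.5]]   # leading 1s are the intercept column
y = [1, 0, 1]
L = cumulative_likelihood([0.1, 1.5], X, y)
assert 0.0 < L < 1.0
```

Maximizing this product (in practice, its logarithm, to avoid underflow) over theta is exactly the maximum-likelihood objective that Newton's method is then applied to.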
mathematics  howto  ai  algorithm 
november 2017 by janpeuker
TensorFlow and deep learning, without a PhD
ai  howto  library 
november 2017 by janpeuker
Colaboratory – Google
Colaboratory is a research project created to help disseminate machine learning education and research. It’s a Jupyter notebook environment that requires no setup to use. For more information, see our FAQ.
Python  visualization  ai  google 
november 2017 by janpeuker
[1711.00165] Deep Neural Networks as Gaussian Processes
In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent.
ai  algorithm  research 
november 2017 by janpeuker
[1710.08864] One pixel attack for fooling deep neural networks
73.8% of the test images can be turned into adversarial images by modifying just one pixel, with 98.7% confidence on average. In addition, it is known that investigating the robustness problem of DNNs can bring critical clues for understanding the geometrical features of the DNN decision map in high-dimensional input space.
security  visualization  ai 
october 2017 by janpeuker
NeuroEvolution with MarI/O
Seth’s implementation (in Lua) is based on the concept of NeuroEvolution of Augmenting Topologies (or NEAT). NEAT is a type of genetic algorithm which generates efficient artificial neural networks (ANNs) from a very simple starting network. It does so rather quickly too (compared to other evolutionary algorithms).
ai  games  Emergence 
october 2017 by janpeuker
Edward – Home
A library for probabilistic modeling, inference, and criticism.

Edward is a Python library for probabilistic modeling, inference, and criticism. It is a testbed for fast experimentation and research with probabilistic models, ranging from classical hierarchical models on small data sets to complex deep probabilistic models on large data sets. Edward fuses three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming.

It supports modeling with

Directed graphical models
Neural networks (via libraries such as Keras and TensorFlow Slim)
Implicit generative models
Bayesian nonparametrics and probabilistic programs
ai  engineering  mathematics  model 
october 2017 by janpeuker
Machine Learning FAQ
Random Forests vs. SVMs

I would say that random forests are probably THE “worry-free” approach - if such a thing exists in ML: There are no real hyperparameters to tune (maybe except for the number of trees; typically, the more trees we have the better). On the contrary, there are a lot of knobs to be turned in SVMs: Choosing the “right” kernel, regularization penalties, the slack variable, …

Both random forests and SVMs are non-parametric models (i.e., the complexity grows as the number of training samples increases). Training a non-parametric model can thus be more expensive, computationally, compared to a generalized linear model, for example. The more trees we have, the more expensive it is to build a random forest. Also, we can end up with a lot of support vectors in SVMs; in the worst-case scenario, we have as many support vectors as we have samples in the training set. Although there are multi-class SVMs, the typical implementation for multi-class classification is One-vs.-All; thus, we have to train an SVM for each class, in contrast to decision trees or random forests, which can handle multiple classes out of the box.

To summarize, random forests are much simpler to train for a practitioner; it’s easier to find a good, robust model. The complexity of a random forest grows with the number of trees in the forest, and the number of training samples we have. In SVMs, we typically need to do a fair amount of parameter tuning, and in addition to that, the computational cost grows linearly with the number of classes as well.
ai  howto  algorithm 
october 2017 by janpeuker
Research Blog: TensorFlow Lattice: Flexibility Empowered by Prior Knowledge
We take advantage of the look-up table’s structure, which can be keyed by multiple inputs to approximate an arbitrarily flexible relationship, to satisfy monotonic relationships that you specify in order to generalize better. That is, the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions
ai  google  library  analytics 
october 2017 by janpeuker
CS231n Convolutional Neural Networks for Visual Recognition
These notes accompany the Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition.
For questions/concerns/bug reports contact Justin Johnson regarding the assignments, or contact Andrej Karpathy regarding the course notes. You can also submit a pull request directly to our git repo.
We encourage the use of the hypothes.is extension to annotate comments and discuss these notes inline.
ai  howto 
october 2017 by janpeuker
The Unreasonable Effectiveness of Recurrent Neural Networks
Viewed this way, RNNs essentially describe programs. In fact, it is known that RNNs are Turing-Complete in the sense that they can simulate arbitrary programs (with proper weights). But similar to universal approximation theorems for neural nets you shouldn’t read too much into this. In fact, forget I said anything.
ai  learning  engineering  Emergence  reference 
october 2017 by janpeuker
DAOs, DACs, DAs and More: An Incomplete Terminology Guide - Ethereum Blog
an AI is completely autonomous, whereas a DAO still requires heavy involvement from humans specifically interacting according to a protocol defined by the DAO in order to operate. We can classify DAOs, DOs (and plain old Os), AIs and a fourth category, plain old robots, according to a good old quadrant chart, with another quadrant chart to classify entities that do not have internal capital thus altogether making a cube:

DAOs == automation at the center, humans at the edges. Thus, on the whole, it makes most sense to see Bitcoin and Namecoin as DAOs, albeit ones that barely cross the threshold from the DA mark.
economics  ai  blockchain  reference 
october 2017 by janpeuker
[1705.07962] pix2code: Generating Code from a Graphical User Interface Screenshot
Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that deep learning methods can be leveraged to train a model end-to-end to automatically generate code from a single input image with over 77% of accuracy for three different platforms (i.e. iOS, Android and web-based technologies).
gui  ai  research  design 
october 2017 by janpeuker
ML Algorithms addendum: Passive Aggressive Algorithms - Giuseppe Bonaccorso
Temporal - Time-based Algorithm

Crammer K., Dekel O., Keshet J., Shalev-Shwartz S., Singer Y., Online Passive-Aggressive Algorithms, Journal of Machine Learning Research 7 (2006) 551–585
ai  research 
october 2017 by janpeuker
Forget Killer Robots—Bias Is the Real AI Danger - MIT Technology Review
The problem of bias in machine learning is likely to become more significant as the technology spreads to critical areas like medicine and law, and as more people without a deep technical understanding are tasked with deploying it. Some experts warn that algorithmic bias is already pervasive in many industries, and that almost no one is making an effort to identify or correct it (see “Biased Algorithms Are Everywhere, and No One Seems to Care”).
psychology  bias  ai  analytics 
october 2017 by janpeuker