Discover Top Posts Tagged with #probabilistic programming

New Research on Probabilistic Programming

We're excited to release the latest research from our machine intelligence R&D team!

This report and prototype explore probabilistic programming, an emerging programming paradigm that makes it easier to construct and fit Bayesian inference models in code. It's advanced statistics, simplified for data scientists looking to build models fast.

Bayesian inference has been popular in scientific research for a long time. The statistical technique allows us to encode expert knowledge into a model by stating prior beliefs about what we think our data looks like. These prior beliefs are then updated in light of new data, providing not one prediction, but a full distribution of likely answers with baked-in confidence rates. This allows us to assess the risk of our decisions with more nuance.

Bayesian methods lack widespread commercial use because they're tough to implement. But probabilistic programming reduces what used to take months of thorny statistical sampling into an afternoon of work. This will further expand the utility of machine learning. Bayesian models aren't black boxes, a criterion for regulated industries like healthcare. Unlike deep learning networks, they don't require large, clean data sets or large amounts of GPU processing power to deliver results. And they bridge human knowledge with data, which may lead to breakthroughs in areas as diverse as anomaly detection and music analysis.
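To make the idea of updating prior beliefs with new data concrete, here is a minimal sketch using a Beta prior and binomial data, where the update has a closed form. The conversion-rate scenario, the prior parameters, and the function names are illustrative assumptions, not taken from the report.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief: the rate is probably around 20% (Beta(2, 8)).
prior = (2.0, 8.0)

# Hypothetical new data: 30 successes in 100 trials.
posterior = update_beta(*prior, successes=30, failures=70)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)
print(f"prior mean {prior_mean:.3f} -> posterior mean {posterior_mean:.3f}")
```

The posterior is still a full distribution, so it can be summarized by more than its mean. Most real models have no closed-form update, which is where the sampling machinery of a probabilistic programming language comes in.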

Our work on probabilistic programming includes two prototypes and a report that teaches you:

How Bayesian inference works and where it's useful

Why probabilistic programming is becoming possible now

When to use probabilistic programming and what the code looks like

What tools and languages exist today and how they compare

Which vendors offer probabilistic programming products

Finally, as in all our research, we predict where this technology is going, and applications for which it will be useful in the next couple of years.

Probabilistic Real Estate Prototype

One powerful feature of probabilistic programming is the ability to build hierarchical models, which allow us to group observations together and learn from their similarities. This is practical in contexts like user segmentation: individual users often share tastes with other users of the same sex, age group, or location, and hierarchical models provide more accurate predictions about individuals by leveraging learnings from the group.

We explored using probabilistic programming for hierarchical models in our Probabilistic Real Estate prototype. This prototype predicts future real estate prices across the New York City boroughs. It enables you to input your budget (say $1.6 million) and shows you the probability of finding properties in that price range across different neighborhoods and future time periods.

Hierarchical models helped make predictions in neighborhoods with sparse pricing data. In our model, we declared that apartments are in neighborhoods and neighborhoods are in boroughs; on average, apartments in one neighborhood are more similar to others in the same location than elsewhere. By modeling this way, we could learn about the West Village not only from the West Village, but also from the East Village and Brooklyn. That means, with little data about the West Village, we could use data from the East Village to fill in the gaps!
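The way a hierarchical model borrows strength from the group can be sketched with a simple shrinkage formula. This is a simplified illustration, not the prototype's model: it assumes the within-neighborhood and between-neighborhood variances are known, whereas a probabilistic program would infer them. All numbers (prices in $ millions) are hypothetical.

```python
def partial_pool(sample_mean, n_obs, group_mean, within_var, between_var):
    """Precision-weighted blend of a neighborhood's own mean and its borough's mean.

    Assumes normal likelihood and prior with known variances (an assumption
    made for this sketch only).
    """
    data_precision = n_obs / within_var
    prior_precision = 1.0 / between_var
    weighted = data_precision * sample_mean + prior_precision * group_mean
    return weighted / (data_precision + prior_precision)


BOROUGH_MEAN = 1.5  # hypothetical borough-level average price

# Same observed average price, very different amounts of data.
sparse_estimate = partial_pool(2.4, n_obs=2, group_mean=BOROUGH_MEAN,
                               within_var=0.64, between_var=0.16)
dense_estimate = partial_pool(2.4, n_obs=200, group_mean=BOROUGH_MEAN,
                              within_var=0.64, between_var=0.16)

print(f"2 sales:   {sparse_estimate:.2f}")
print(f"200 sales: {dense_estimate:.2f}")
```

With only two sales the estimate is pulled strongly toward the borough average; with two hundred it stays close to the neighborhood's own data.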

Many companies suffer from imperfect, incomplete data. These types of inferences can be invaluable to improve predictions based on real-world dependencies.

Play around with the prototype! You'll see how the color gradients give you an intuitive sense for what probability distributions look like in practice.

How to Access our Reports & Prototypes

We're offering our research on probabilistic programming in a few ways:

Single Report & Prototype (digital and physical copies)

Annual Research Subscription (access to all our research)

Subscription & Advising (research & time with our team)

Special Projects (dedicated help to build a great data product)

Write to us at [email protected] if you'd like to learn more!

#probabilistic programming #announcements #data science #bayesian statistics

Thomas Wiecki on Probabilistic Programming with PyMC3

A rolling regression with PyMC3: instead of the regression coefficients being constant over time (the points are daily stock prices of 2 stocks), this model assumes they follow a random-walk and can thus slowly adapt them over time to fit the data best.

Probabilistic programming is coming of age. While normal programming languages denote procedures, probabilistic programming languages denote models and perform inference on these models. Users write code to specify a model for their data, and the languages run sampling algorithms across probability distributions to output answers with confidence rates and levels of uncertainty across a full distribution. These languages, in turn, open up a whole range of analytical possibilities that have historically been too hard to implement in commercial products.
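To give a feel for what "running sampling algorithms" means, here is a hand-written random-walk Metropolis sampler for a toy model: an unknown mean with a wide normal prior and normally distributed data. This is a sketch of the general idea, not how PyMC3 works internally (it uses more sophisticated samplers). The data, prior width, and step size are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior: Normal(0, prior_sd) prior, Normal(mu, noise_sd) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_steps=20000, step_sd=0.5, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept with probability min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step_sd)
        proposed = log_posterior(proposal, data)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu, current = proposal, proposed
        samples.append(mu)
    return samples[n_steps // 4:]  # drop the first quarter as burn-in


data = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical observations
posterior_samples = metropolis(data)
posterior_mean = sum(posterior_samples) / len(posterior_samples)
print(f"posterior mean of mu is roughly {posterior_mean:.2f}")
```

In a probabilistic programming language the user would write only the model (the prior and the likelihood); the sampling loop is supplied by the language.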

One sector where probabilistic programming will likely have significant impact is financial services. Be it when predicting future market behavior or loan defaults, when analyzing individual credit patterns or anomalies that might indicate fraud, financial services organizations live and breathe risk. In that world, a tool that makes it easy and fast to predict future scenarios while quantifying uncertainty could have tremendous impact. That’s why Thomas Wiecki, Director of Data Science for the crowdsourced investment management firm Quantopian, is so excited about probabilistic programming and the new release of PyMC3 3.0.

We interviewed Dr. Wiecki to get his thoughts on why probabilistic programming is taking off now and why he thinks it’s important. Check out his blog, and keep reading for highlights!

#probabilistic programming #data science #interview #bayesian statistics

Derivatives and differential equations

If y = f(x) is a function, then y′ is its slope, defined as the rate of change of y with respect to x.

We know that the equation of a line is y = mx + c, where m is the slope.

m = Δy / Δx

Therefore, if the function passes through two points (x1, y1) and (x2, y2), then

y1 = f(x1) and y2 = f(x2)

If we draw a secant through these points, the slope of that secant will be

m = (y2 − y1) / (x2 − x1)

m = f(x2) – f(x1) /…
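The secant slope can be checked numerically. In this small sketch the example function f(x) = x² and the step sizes are assumptions chosen for illustration; as the second point moves closer to the first, the secant slope approaches the derivative at that point.

```python
def secant_slope(f, x1, x2):
    """Slope of the secant through (x1, f(x1)) and (x2, f(x2))."""
    return (f(x2) - f(x1)) / (x2 - x1)


def f(x):
    return x ** 2


# Secant slopes at x1 = 1 for shrinking gaps h = x2 - x1.
slopes = [secant_slope(f, 1.0, 1.0 + h) for h in (1.0, 0.1, 0.001)]
print(slopes)  # the values approach 2, the derivative of x**2 at x = 1
```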


#derivatives #differential equations #probabilistic programming

Why should you become skilled at Probabilistic Programming Language?

A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and to perform inference in those models. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible.

Probabilistic programming creates systems that help make decisions in the face of uncertainty. Probabilistic reasoning combines knowledge of a situation with the laws of probability. Probabilistic programming is a new approach that makes probabilistic reasoning systems easier to build and more widely applicable. One such program used inverse graphics as the basis of its inference.

It is important to know that probabilistic reasoning is used for predicting stock prices, recommending movies, diagnosing computers, detecting cyber intrusions, and image detection. Probabilistic programming is also regarded as necessary for AI/AGI.

For decades, scientists developed probabilistic models in various fields without the benefit of dedicated programming languages or deep neural networks. Since these models involve Bayesian inference, which often requires intractable integrals, they sap the productivity of experts and are beyond the reach of non-experts. In a PPL, the compiler checks the program for type errors and translates it into a form suitable for an inference procedure, which uses observed output data to fit the latent distributions. Probabilistic models show great promise: they explicitly represent uncertainty and have been demonstrated to enable explainable machine learning even in the important but difficult case of small training data.

Probabilistic programming has been slowly gaining momentum over the past few years. There are arguments for "intuitive physics", "inverse graphics", and structured generative models more generally. Traction in industry has grown as a result, with Uber releasing its own probabilistic programming library on top of PyTorch.

Students who want to learn probabilistic programming can attend an institute, but much of the material is now available online. Online platforms have become popular with both students and teachers: teachers can schedule their time and give lectures accordingly, and students can learn from anywhere in the world. Online learning is convenient and affordable, saves travel time and expense, and lets you revisit the videos or notes as often as you like.

#probabilistic programming #ppl

Gain some Knowledge about the Effectiveness of Probabilistic Programming

Over the last few years, the machine learning and programming languages communities have developed a shared research interest under the banner of probabilistic programming. The idea is to export powerful PL concepts such as abstraction and reuse to statistical modeling, which is currently an arduous and arcane task.

What and Why

Probabilistic Programming

Probabilistic programming is not about writing software that behaves probabilistically. For example, if your program calls rand(3) as part of the work it is meant to do (as in a cryptographic key generator, an ASLR implementation in an OS kernel, or a simulated annealing optimizer for circuit designs), that is all well and good, but it is not what this topic is about.

It is best not to think of it as writing software at all. By way of analogy, traditional languages such as C++, Python, and Haskell are very different in philosophy, but you could use any of them to write, say, a cataloging system for cat pictures or an alternative to LaTeX. One of them may be better for a given domain, but they are all workable. That is not the case with probabilistic programming languages (PPLs). A PPL is more like Prolog: it is certainly a programming language, but not one you would reach for to write general-purpose software.

Why Choose Probabilistic Programming?

Probabilistic programming is, at its core, a tool for statistical modeling. The idea is to borrow lessons from programming languages and apply them to the problem of designing and using statistical models. Experts already construct statistical models by hand, in mathematical notation on paper, but it is an expert-only process that is hard to support with mechanical reasoning. The key insight in PP is that statistical modeling, when you do it often enough, starts to feel a lot like programming. Once you use a real language for modeling, many new tools become feasible, and you can start to automate tasks that previously had to be worked out on paper for each instance.

A probabilistic programming language is an ordinary programming language with rand and a big pile of related tools that help you understand the program's statistical behavior.

Both of these definitions are accurate. They emphasize different angles on the same core idea, and which one makes sense to you depends on what you want to use PP for. But don't get distracted by the fact that PPL programs look much like ordinary software implementations, where the job is to run the program and get some kind of output. The main focus of PP is analysis, not execution.
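The difference between running a random program and analyzing its statistical behavior can be shown with a toy example. This sketch (a two-dice program, chosen here purely for illustration) never executes the program with random inputs; instead it enumerates every possible input and computes the exact distribution over outputs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def program(die_a, die_b):
    """An ordinary function; its 'random' inputs are passed in explicitly."""
    return die_a + die_b


def output_distribution():
    """Exact distribution over outputs, obtained by enumerating all 36 dice outcomes."""
    counts = Counter(program(a, b) for a, b in product(range(1, 7), repeat=2))
    total = sum(counts.values())
    return {value: Fraction(n, total) for value, n in sorted(counts.items())}


dist = output_distribution()
p_at_least_10 = sum(p for value, p in dist.items() if value >= 10)
print(dist[7], p_at_least_10)
```

Enumeration only works for tiny discrete programs. Real probabilistic programming systems answer the same kind of question with sampling or approximate inference.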

Modeling the Problem

The way to approach this problem with a machine is to model the situation using random variables, some of which are latent. Latent random variables are the parts of the model that are not observed directly but that help explain the situation. Allowing latent variables in the model makes it much easier to reason about the problem directly.
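As a minimal sketch of reasoning about a latent variable, consider a hypothetical setup (not from the post): one of two coins was picked at random, either fair or biased toward heads, and we only see the flips. The coin's identity is the latent variable, and Bayes' rule gives its distribution after the observations.

```python
def flip_likelihood(flips, heads_prob):
    """Probability of a sequence of flips ('H' or 'T') given the coin's heads probability."""
    result = 1.0
    for flip in flips:
        result *= heads_prob if flip == "H" else 1.0 - heads_prob
    return result


def posterior_over_latent(prior, heads_probs, flips):
    """Bayes' rule by enumeration over the latent coin identity."""
    joint = {coin: prior[coin] * flip_likelihood(flips, heads_probs[coin])
             for coin in prior}
    evidence = sum(joint.values())
    return {coin: p / evidence for coin, p in joint.items()}


# Hypothetical model: each coin is equally likely to have been picked.
prior = {"fair": 0.5, "biased": 0.5}
heads_probs = {"fair": 0.5, "biased": 0.9}

posterior = posterior_over_latent(prior, heads_probs, flips="HHH")
print(posterior)
```

Three heads in a row shift most of the probability toward the biased coin, but the fair coin is not ruled out.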

#Probabilistic Programming

Under the Hood of the Variational Autoencoder (in Prose and Code)

The Variational Autoencoder (VAE) neatly synthesizes unsupervised deep learning and variational Bayesian methods into one sleek package. In Part I of this series, we introduced the theory and intuition behind the VAE, an exciting development in machine learning for combined generative modeling and inference—“machines that imagine and reason.”

To recap: VAEs put a probabilistic spin on the basic autoencoder paradigm—treating their inputs, hidden representations, and reconstructed outputs as probabilistic random variables within a directed graphical model. With this Bayesian perspective, the encoder becomes a variational inference network, mapping observed inputs to (approximate) posterior distributions over latent space, and the decoder becomes a generative network, capable of mapping arbitrary latent coordinates back to distributions over the original data space.

The beauty of this setup is that we can take a principled Bayesian approach toward building systems with a rich internal “mental model” of the observed world, all by training a single, cleverly-designed deep neural network.

These benefits derive from an enriched understanding of data as merely the tip of the iceberg—the observed result of an underlying causative probabilistic process.

The power of the resulting model is captured by Feynman’s famous chalkboard quote: “What I cannot create, I do not understand.” When trained on MNIST handwritten digits, our VAE model can parse the information spread thinly over the high-dimensional observed world of pixels, and condense the most meaningful features into a structured distribution over reduced latent dimensions.

Having recovered the latent manifold and assigned it a coordinate system, it becomes trivial to walk from one point to another along the manifold, creatively generating realistic digits all the while:

In this post, we’ll take a look under the hood at the math and technical details that allow us to optimize the VAE model we sketched in Part I.

Along the way, we’ll show how to implement a VAE in TensorFlow—a library for efficient numerical computation using data flow graphs, with key features like automatic differentiation and parallelizability (across clusters, CPUs, GPUs…and TPUs if you’re lucky). You can find (and tinker with!) the full implementation here, along with a couple pre-trained models.

#code #deep learning #probabilistic programming

Introducing Variational Autoencoders (in Prose and Code)

Effective machine learning means building expressive models that sift out signal from noise—that simplify the complexity of real-world data, yet accurately intuit and capture its subtle underlying patterns.

Whatever the downstream application, a primary challenge often boils down to this: How do we represent, or even synthesize, complex data in the context of a tractable model?

This challenge is compounded when working in a limited data setting—especially when samples are in the form of richly-structured, high-dimensional observations like natural images, audio waveforms, or gene expression data.

Cue the Variational Autoencoder, a fascinating development in unsupervised machine learning that marries probabilistic Bayesian inference with deep learning.

Benefiting from advances in both research communities, the Variational Autoencoder addresses these challenges by leveraging innovative deep learning techniques grounded in a solid Bayesian theoretical framework...and can be explained through mesmerizing GIFs:

(Read on, and all will become clear...)

Intro

Traditional autoencoders are models (usually multilayer artificial neural networks) designed to output a reconstruction of their input. Specifically, autoencoders sequentially deconstruct input data into hidden representations, then use these representations to sequentially reconstruct outputs that resemble the originals. Fittingly, this process of teasing out a mapping from input to hidden representation is called representation learning.

The appeal of this setup is that the model learns its own definition of a "meaningful" representation based only on the data—no human-derived heuristics or labels! This approach stands in contrast to the majority of deep learning systems in production today, which rely on expensive-to-obtain labeled data ("This image is a kitten; this image is a panda."). Alternatives to such supervised learning frameworks provide a way to benefit from a world brimming with valuable raw data.

Though trained holistically, autoencoders are often built for the part instead of the whole: researchers might exploit the data-to-representation mapping for semantic embeddings, or the representation-to-output mapping for extraordinarily complex generative modeling.

But an autoencoder with unlimited capacity is doomed to the role of a wonky, computationally-expensive Xerox machine. To ensure that the transformations to or from the hidden representation are useful, we impose some type of regularization or constraint. As a tradeoff for some loss in fidelity, such impositions push the model to distill the most salient features from a cacophonous real-world dataset.

Variational Autoencoders (VAEs) incorporate regularization by explicitly learning the joint distribution over data and a set of latent variables that is most compatible with observed datapoints and some designated prior distribution over latent space. The prior informs the model by shaping the corresponding posterior, conditioned on a given observation, into a regularized distribution over latent space (the coordinate system spanned by the hidden representation).

As a result, VAEs are an excellent tool for manifold learning—recovering the "true" manifold in lower-dimensional space along which the observed data lives with high probability mass—and generative modeling of complex datasets like images, text, and audio—conjuring up brand new examples, consistent with the observed training set, that do not exist in nature.

Building on other informative posts, this is the first installment of a guide to Variational Autoencoders: the lovechild of Bayesian inference and unsupervised deep learning.

In this post, we'll sketch out the model and provide an intuitive context for the math- and code-flavored follow-up. In Post II, we'll walk through a technical implementation of a VAE (in TensorFlow and Python 3). In Post III, we'll venture beyond the popular MNIST dataset using a twist on the vanilla VAE.

The Variational Autoencoder Setup

An end-to-end autoencoder (input to reconstructed input) can be split into two complementary networks: an encoder and a decoder. The encoder maps input $x$ to a latent representation, or so-called hidden code, $z$. The decoder maps the hidden code to reconstructed input value $\tilde x$.

Whereas a vanilla autoencoder is deterministic, a Variational Autoencoder is stochastic—a mashup of:

a probabilistic encoder $q_\phi(z|x)$, approximating the true (but intractable) posterior distribution $p(z|x)$, and

a generative decoder $p_\theta(x|z)$, which notably does not rely on any particular input $x$.

Both the encoder and decoder are artificial neural networks (i.e. hierarchical, highly nonlinear functions) with tunable parameters $\phi$ and $\theta$, respectively.

Learning these conditional distributions is facilitated by enforcing a plausible mathematically-convenient prior over the latent variables, generally a standard spherical Gaussian: $z \sim \mathcal{N}(0, I)$.

Given this conjugate prior, the encoder's job is to supply the mean and variance of the Gaussian posterior over each latent space dimension corresponding to a given input. Latent $z$ is sampled from this distribution, then passed to the decoder to be transformed back into a distribution over the original data space.
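The sampling step can be sketched in a few lines of plain Python. This is an illustration, not the series' TensorFlow implementation: it assumes the encoder outputs a mean and a log-variance per latent dimension (a common parameterization, but an assumption here), and draws $z$ as the mean plus scaled standard-normal noise, which is often called the reparameterization trick.

```python
import math
import random
import statistics


def sample_latent(mean, log_var, rng):
    """Draw z = mean + sd * eps with eps ~ N(0, 1), independently per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


rng = random.Random(0)

# Hypothetical encoder outputs for one input: two latent dimensions.
mean = [1.0, -2.0]
log_var = [0.0, math.log(0.25)]  # standard deviations 1.0 and 0.5

draws = [sample_latent(mean, log_var, rng) for _ in range(20000)]
first_dim_mean = statistics.fmean(z[0] for z in draws)
second_dim_sd = statistics.pstdev(z[1] for z in draws)
print(first_dim_mean, second_dim_sd)
```

Writing the sample as a deterministic function of the encoder outputs plus independent noise is what lets gradients flow back through the sampling step during training.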

In other words, a VAE represents a directed probabilistic graphical model, in which approximate inference is performed by the encoder and optimized alongside an easy-to-sample generative decoder. For this reason, these complementary halves are also known as the inference (or recognition) network and the generative network. By reformulating this graphical model as a differentiable neural net with a single, pithy cost function (derived from the variational lower bound), the whole package can be trained by stochastic gradient descent (SGD) thanks to the "amusing" universe we live in.
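One piece of the cost function derived from the variational lower bound is a regularization term: the KL divergence between the encoder's Gaussian posterior and the standard Gaussian prior. For diagonal Gaussians it has a closed form, sketched below. The function name and the log-variance parameterization are assumptions for this example; the full cost would add a reconstruction term.

```python
import math


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))


# The penalty is zero when the posterior equals the prior...
kl_same = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# ...and grows as the posterior mean moves away from zero
kl_shifted = kl_to_standard_normal([1.0], [0.0])
# or as its variance moves away from one.
kl_wide = kl_to_standard_normal([0.0], [math.log(4.0)])
print(kl_same, kl_shifted, kl_wide)
```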

Bayes, Meet Neural Networks

In fact, many developments in deep learning research can also be understood through a probabilistic, or Bayesian, lens. Some of these analogies are more theoretical, whereas others share a parallel mathematical interpretation. For example, $\ell_2$-regularization can be viewed as imposing a Gaussian prior over neural network weights, and reinforcement learning can be formalized through variational inference.
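The $\ell_2$ example can be verified numerically. The sketch below (with made-up weight vectors and prior width) checks that the negative log-density of a zero-mean Gaussian prior differs from an $\ell_2$ penalty only by a constant, provided the penalty strength is set to $\lambda = 1 / (2\sigma^2)$.

```python
import math


def neg_log_gaussian_prior(weights, prior_sd):
    """Negative log-density of independent N(0, prior_sd^2) priors on each weight."""
    quadratic = sum(w * w for w in weights) / (2.0 * prior_sd ** 2)
    constant = len(weights) * math.log(prior_sd * math.sqrt(2.0 * math.pi))
    return quadratic + constant


def l2_penalty(weights, lam):
    """Standard L2 regularization term."""
    return lam * sum(w * w for w in weights)


prior_sd = 2.0
lam = 1.0 / (2.0 * prior_sd ** 2)

# Two hypothetical weight vectors of the same length.
w_a = [0.5, -1.0, 2.0]
w_b = [0.1, 0.0, -0.3]

# The constant cancels in differences, so both objectives rank weights identically.
prior_gap = neg_log_gaussian_prior(w_a, prior_sd) - neg_log_gaussian_prior(w_b, prior_sd)
penalty_gap = l2_penalty(w_a, lam) - l2_penalty(w_b, lam)
print(prior_gap, penalty_gap)
```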

VAEs exemplify a case where this relationship is made explicit and elegant, and variational Bayesian inference is the guiding principle shaping the model's cost function and intrinsic architecture.

Why does this setup make sense?

In the Bayesian worldview, datapoints are observations drawn from some data-generating distribution: (observed) variable $x \sim p(x)$. So, the MNIST dataset of handwritten digits describes a random variable with an intricate set of dependencies among all 28*28 pixels. Each MNIST image offers a glimpse into one arrangement of 784 pixel values with high probability—whereas a 28*28 block of white noise, or the Jolly Roger, (theoretically) occupy low probability mass under the distribution.

It would be a headache to model the conditional dependencies in 784-dimensional pixel space. Instead, we make the simplifying assumption that the distribution over these observed variables is the consequence of a distribution over some set of hidden variables: $z \sim p(z)$. Intuitively, this paradigm is analogous to how scientists study the natural world, by working backwards from observed phenomena to recover the unifying hidden laws that govern them. In the case of MNIST, these latent variables could represent concepts like number identity and tiltedness, whereas more complex natural images like the Frey faces could have latent dimensions for facial expression and azimuth.

Inference is the process of disentangling these rich real-world dependencies into simplified latent dependencies, by predicting $p(z|x)$, the distribution over one set of variables (the latent variables) conditioned on another variable (the observed data). (This is where Bayes' theorem enters the picture.)

With this Bayesian frame-of-mind, training a generative model is the same as learning the joint distribution over the data and latent variables: $p(x, z)$. This approach lends itself well to small datasets, since inference relies on the data-generating distribution rather than individual datapoints per se. It also lets us bake prior knowledge into the model by imposing simplifying a priori distributions over variables.

Classical (iterative, non-learned) approaches to inference are often inefficient and do not scale well to large datasets. With a few theoretical and mathematical tricks, we can train a neural network to do the dirty work of both variational inference and generative modeling...while reaping the additional benefits deep learning provides (universal approximating power, cheap test-time evaluation, minibatched SGD, advances like batch normalization and dropout, etc).

The next post in the series will delve into these theoretical and mathematical tricks and show how to implement them in TensorFlow (a toolbox for efficient numerical computation with data flow graphs).

MNIST

For now, we will take our VAE model for a spin using handwritten MNIST digits.

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    import vae  # this is our model - to be explored in the next post

    IMG_DIM = 28

    ARCHITECTURE = [IMG_DIM**2,  # 784 pixels
                    500, 500,    # intermediate encoding
                    50]          # latent space dims
    # (and symmetrically back out again)

    HYPERPARAMS = {
        "batch_size": 128,
        "learning_rate": 1E-3,
        "dropout": 0.9,
        "lambda_l2_reg": 1E-5,
        "nonlinearity": tf.nn.elu,
        "squashing": tf.nn.sigmoid
    }

    mnist = input_data.read_data_sets("mnist_data")
    v = vae.VAE(ARCHITECTURE, HYPERPARAMS)
    v.train(mnist, max_iter=20000)

Let's verify the model by eye, by plotting how well it parses random MNIST inputs (top) and reconstructs them (bottom):

Note that these inputs are from the test set, so the model has never seen them before. Not bad!

For latent space visualizations, we can train a VAE with 2-D latent variables (though this space is generally too small for the intrinsic dimensionality of real-world data). Picturing this compressed latent space lets us see how the model has disentangled complex raw data into abstract higher-order features.

We'll visualize the latent manifold over the course of training in two ways, to see the complementary evolution of the encoder and decoder over (logarithmic) time.

This is how the encoder/inference network learns to map the training set from the input data space to the latent space...

...and this is how the decoder/generative network learns to map latent coordinates into reconstructions of the original data space:

Here we are sampling evenly-spaced percentiles along the latent manifold and plotting their corresponding output from the decoder, with the same axis labels as above.
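One way to pick evenly spaced percentiles of a 2-D latent space is to push them through the inverse CDF of the standard Gaussian prior. The sketch below shows that step only; the percentile range, the grid size, and the function name are assumptions, and the resulting coordinates would then be fed to the decoder (not shown).

```python
from statistics import NormalDist


def latent_grid(n_per_axis, lo=0.05, hi=0.95):
    """Grid of 2-D latent coordinates at evenly spaced percentiles of N(0, 1) per axis."""
    step = (hi - lo) / (n_per_axis - 1)
    percentiles = [lo + i * step for i in range(n_per_axis)]
    coords = [NormalDist().inv_cdf(p) for p in percentiles]
    # Row-major order: one row per y coordinate.
    return [(x, y) for y in coords for x in coords]


grid = latent_grid(5)
print(len(grid), grid[0], grid[12])
```

Spacing the grid by percentile rather than by raw coordinate places more points where the prior puts more probability mass.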

Looking at both plots side-by-side clarifies how optimizing the encoder and decoder in tandem enables efficient pairing of inference and generation:

This tableau highlights the overall smoothness of the latent manifold—and how any "unrealistic" outputs from the generative decoder correspond to apparent discontinuities in the variational posterior of the encoder (e.g. between the "7-space" and the "1-space"). These gaps could probably be improved by experimenting with model hyperparameters.

Whereas the original data dotted a sparse landscape in 784 dimensions, where "realistic" images were few and far between, this 2-dimensional latent manifold is densely populated with such samples. Beyond its inherent visual coolness, latent space smoothness shows the model's ability to leverage its "understanding" of the underlying data-generating process to generalize beyond the training set.

Smooth interpolation within and between digits—in contrast to the spotty latent space characteristic of many autoencoders—is a direct result of the variational regularization intrinsic to VAEs.

Take-aways

Bayesian methods provide a framework for reasoning about uncertainty. Deep learning provides an efficient way to approximate arbitrarily complex functions, and ripe opportunities to probe uncertainty (over parameters, hyperparameters, data, model architectures...).

While differences in language can obscure overlapping ideas, recent research has revealed not just the power of cross-validating theories across fields (interesting in itself), but also a productive new methodology through a unified synthesis of the two.

This research becomes ever more relevant as we seek to leverage today's most interesting real-world data, which is often high-dimensional and rich in structure, yet limited in number and wholly or partially unlabeled.

(But don't take my word for it.)

Variational Autoencoders are:

A reminder that productive sparks fly when deep learning and Bayesian methods are not treated as alternatives, but combined.

Just the beginning of creative applications for deep learning.

Stay tuned for more technical details (math and code!) in Part II.

- Miriam

#code #deep learning #probabilistic programming

Probabilistic Programming for Anomaly Detection

The Fast Forward Labs research team is developing our next prototype, which will demonstrate an application of probabilistic programming. Probabilistic programming languages are a set of high-level languages that lower the barrier to entry for Bayesian data analysis.

Bayesian data analysis is often seen as the best approach to machine learning. Models derived by this process are highly interpretable, in contrast to other modern models like neural networks and support vector machines. Transparency like this is crucial in industries - such as healthcare and financial services - that have a legal or ethical duty to ensure safety or fairness.

On top of that transparency, the results of Bayesian modeling are complete probability distributions, which means their predictions come with meaningful confidence intervals. Confidence is an important part of interpretability, but is also a key ingredient for deciding whether to act on a prediction immediately or incur the cost of obtaining more data (as in active learning).
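As a sketch of how a full posterior supports the act-or-gather-more-data decision, the code below computes an interval from posterior samples and acts only when that interval is narrow enough. The Beta posteriors, the 90% level, and the width threshold are hypothetical choices made for illustration.

```python
import random


def credible_interval(samples, mass=0.9):
    """Central interval containing roughly `mass` of the posterior samples."""
    ordered = sorted(samples)
    tail = (1.0 - mass) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


def decide(samples, max_width):
    """Act only when the interval is narrow enough; otherwise ask for more data."""
    lo, hi = credible_interval(samples)
    return "act" if hi - lo <= max_width else "collect more data"


rng = random.Random(1)

# Hypothetical posteriors over the same rate after 10 and after 1000 observations.
few_obs = [rng.betavariate(4, 8) for _ in range(5000)]
many_obs = [rng.betavariate(301, 701) for _ in range(5000)]

decision_few = decide(few_obs, max_width=0.1)
decision_many = decide(many_obs, max_width=0.1)
print(decision_few, "|", decision_many)
```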

Interpretability and confidence have made Bayesian inference very popular in experimental science, where the explicit goal is interpreting a model in the context of data and obtaining more data can be expensive. But Bayesian inference was little used outside academia until recently: as it turns out, the practical engineering challenges of applying it in businesses are enormous.

Probabilistic programming languages are changing the game. The algorithms used in Bayesian inference are baked into these languages as primitives, and the syntax is optimized to permit precise and concise specification of complex models. Thanks to recent algorithmic advances, users don’t even have to set tuning parameters: they simply state the structure of the model, feed in the data, and let the language take care of the rest.

To illustrate the power of probabilistic programming, we developed an iPython notebook that shows how it simplifies and improves anomaly detection. In it, we show a traditional approach to anomaly detection, notice where that approach starts to fail, and show how probabilistic programming provides a more rigorous and robust approach.
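For readers without the notebook at hand, here is one common traditional approach and one way it can fail. This is an assumed example, not necessarily what the notebook shows: a z-score rule flags points more than three standard deviations from the mean, but the anomalies themselves inflate the standard deviation and can hide each other. The data are made up.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]


normal_readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9]

# A single extreme reading is caught...
one_outlier = zscore_anomalies(normal_readings + [25.0])
# ...but two extreme readings inflate the standard deviation and mask each other.
two_outliers = zscore_anomalies(normal_readings + [25.0, 26.0])

print(one_outlier, two_outliers)
```

A model-based approach can avoid this by stating explicitly how normal and anomalous readings are generated, rather than relying on a fixed threshold computed from contaminated data.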

- Noam

#code #whitepaper #probabilistic programming #anomaly detection
