Machine Vision and Intelligence @robovis - Tumblr Blog

the RNN time series data transformation

Face mask detection with Georgios Ouzounis. The method uses the Yolo face detector and custom NN models for real-time edge-based analytics

#facemask #georgiosouzounis #yolo

super-aidev

Snagging Parking Spaces with Mask R-CNN and Python ☞ https://school.geekwall.in/p/SJRJn01Nr/snagging-parking-spaces-with-mask-r-cnn-and-python

#ai #deeplearning

super-aidev

Intelligent Scanning Using Deep Learning for MRI ☞ https://school.geekwall.in/p/HyIrsP3VPE/intelligent-scanning-using-deep-learning-for-mri

#ai #tensorflow

super-aidev

Deep Learning for Computer Vision with TensorFlow and Keras ☞ http://bit.ly/2vqYXPKa

#ai #TensorFlow

illustrationinphysics-blog-blog

Complex analysis - the perfect subject for combining visual and analytic thinking. I just read “Visual Complex Analysis” by Tristan Needham, a basic book on complex analysis: lots of historical references, uncompromising explanations, lots of problem solving and plenty of beautiful illustrations!

shadowpeoplearejerks

I have this book. It is very good.

youtalk-deactivated20200611

YOLO, which has made a name for itself as a deep-learning object detection algorithm, has reportedly been merged into OpenCV. That should make it easier to use.

(via YOLO v2 - YouTube)

astromanes

Robot Readable World

tensorflow4u

Coming up, we’re covering updates to TensorFlow Datasets, helping models and data to play nice. #TFDevSummit

Watch here → https://t.co/L1MWwrvpR9 https://t.co/QzZfb7aSxX

tensorflow4u

A new Boosted Trees model is available in TensorFlow 2.0! #TFDevSummit

Check out the article and tutorials to learn more → https://t.co/XnTV6F7ctg https://t.co/9s0sjAwu1z

datascience-from-the-trenches

Viewing Matrices & Probability as Graphs

Source: https://www.math3ma.com/blog/matrices-probability-graphs

aiweirdness

It takes a bot to know one?

A couple of weeks ago, I wrote about GPT-2, a text-generating algorithm whose huge size and long-term analysis abilities mean that it can generate text with an impressive degree of coherence. So impressive, in fact, that its programmers at OpenAI have only released a mini version of the model for now, worried that people may abuse the full-size model’s easy-to-generate, almost-plausibly-human text.

(below: some text generated by mini-GPT-2, in response to the prompt in italics)

This was a fantastic recipe for chocolate cake with raspberry sauce! I only made a couple of changes to the recipe. First, I added vanilla candles instead of meringues for a more mild and exotic fragrance. Once again, I only used 1 tsp of vanilla syrup for clarity. Second, the chocolate cake whipped cream was tempered by an additional 1 tsp of canola oil. The regular vegan whipped cream is soothing and makes it pleasing to the hungry healthiest person I know!

In the meantime, as OpenAI had hoped, people are working on ways to automatically detect GPT-2’s text. Using a bot to detect another bot is a strategy that can work pretty well for detecting fake logins, video, or audio. And now, a group from MIT-IBM Watson AI lab and Harvard NLP has come up with a way of detecting fake text, using GPT-2 itself as part of the detection system.

The idea is fairly simple: GPT-2 is better at predicting what a bot will write than what a human will write. So if GPT-2 is great at predicting the next word in a bit of text, that text was probably written by an algorithm - maybe even by GPT-2 itself.

There’s a web demo that they’re calling Giant Language model Test Room (GLTR), so naturally I decided to play with it.

First, here’s some genuine text generated by GPT-2 (the full-size model, thanks to the OpenAI team being kind enough to send me a sample). Green words are ones that GLTR thought were very predictable, yellow and red words are less predictable, and purple words are ones the algorithm definitely didn’t see coming. There are a couple of mild surprises here, but mostly the AI knew what would be generated. Seeing all this green, you’d know this text is probably AI-generated.
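The ranking idea behind those colors can be sketched in a few lines. This is a toy illustration, not GLTR's actual code: a bigram counter stands in for GPT-2, and the rank cut-offs (`GREEN_MAX`, `YELLOW_MAX`, `RED_MAX`) and all function names are assumptions chosen so the tiny example is readable.

```python
from collections import Counter, defaultdict

# Hypothetical rank cut-offs for the color buckets (assumed for this toy example).
GREEN_MAX, YELLOW_MAX, RED_MAX = 1, 2, 3


def train_bigram_model(corpus):
    """Count which word follows which; a toy stand-in for a real language model."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def rank_of_next_word(model, prev, actual):
    """1-based rank of `actual` among the model's guesses after `prev`, or None if unseen."""
    ranked = [word for word, _ in model.get(prev, Counter()).most_common()]
    return ranked.index(actual) + 1 if actual in ranked else None


def color_for_rank(rank):
    """Map a prediction rank to a color: low rank means the model saw it coming."""
    if rank is None or rank > RED_MAX:
        return "purple"
    if rank <= GREEN_MAX:
        return "green"
    if rank <= YELLOW_MAX:
        return "yellow"
    return "red"


def colorize(model, text):
    """Color every word after the first by how predictable it was to the model."""
    words = text.lower().split()
    return [(nxt, color_for_rank(rank_of_next_word(model, prev, nxt)))
            for prev, nxt in zip(words, words[1:])]


model = train_bigram_model("the cat sat on the mat the cat ran")
predictable = colorize(model, "the cat sat")
surprising = colorize(model, "the mat flew")
```

Text the model predicts well comes back mostly green, while continuations it never saw come back purple. A real detector would use the full model's ranked vocabulary rather than bigram counts.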

Here, on the other hand, is how GLTR analyzed some human-written text, the opening paragraph of the Murderbot diaries. There’s a LOT more purple and red. It found this human writer to be more unpredictable.

But can GLTR detect text generated by another AI, not just text that GPT-2 generates? It turns out it depends. Here’s text generated by another AI, the Washington Post’s Heliograf algorithm that writes up local sports and election results into simple but readable articles. Sure enough, GLTR found Heliograf’s articles to be pretty predictable. Maybe GPT-2 had even read a lot of Heliograf articles during training.

However, here’s what it did with a review of Avengers: Infinity War that I generated using an algorithm Facebook trained on Amazon reviews. It’s not an entirely plausible review, but to GLTR it looks a lot more like the human-written text than the AI-generated text. Plenty of human-written text scores in this range.

And here’s how GLTR rated another Amazon review by that same algorithm. A human might find this review to be a bit suspect, but, again, the AI didn’t score this as bot-written text.

What about an AI that’s really, really bad at generating text? How does that rate? Here’s some output from a neural net I trained to generate Dungeons and Dragons biographies. Whatever GLTR was expecting, it wasn’t fuse efforts and grass tricks.

But I generated that biography with the creativity setting turned up high, so my algorithm was TRYING to be unpredictable. What if I turned the D&D bio generator’s creativity setting very low, so it tries to be predictable instead? Would that make it easier for GLTR to detect? Only slightly. It still looks like unpredictable human-written text to GLTR.
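A "creativity" setting in text generators is commonly implemented as a sampling temperature. Assuming that is what is meant here, a minimal sketch of how temperature reshapes next-word probabilities follows; the scores and the function name are made up for illustration.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; low temperature sharpens, high flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
scores = [2.0, 1.0, 0.1]
low = softmax_with_temperature(scores, 0.2)   # "creativity" turned down
high = softmax_with_temperature(scores, 2.0)  # "creativity" turned up
```

At the low setting almost all of the probability lands on the top-scoring word, so the output is predictable by the generator's own standards. That is not the same as being predictable to a different model such as GPT-2.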

GLTR is still pretty good at detecting text that GPT-2 generates - after all, it’s using GPT-2 itself to do the predictions. So, it’ll be a useful defense against GPT-2 generated spam.

But, if you want to build an AI that can sneak its text past a GPT-2 based detector, try building one that generates laughably incoherent text. Apparently, to GPT-2, that sounds all too human.

For more laughably incoherent text, I trained a neural net on the complete text of Black Beauty, and generated a long rambling paragraph about being a Good Horse. To read it, and GLTR’s verdict, enter your email here and I’ll send it to you.

nvidia-deactivated20190911

Wondering what color you should go with? Trained on our GPUs, ModiFace’s new AI colorizes hair in real time.

ModiFace’s team trained their collaborative neural networks on 220,000 carefully annotated hair images — the largest such database in the world. Their networks detect hair in each video frame and adjust the coloration of hair in a photorealistic way.

intentandoseringeniero

A while ago 3Blue1Brown made a series of videos on neural networks that I wish I had discovered sooner. I’m only halfway through and I’m loving it, because it develops the concept quite well without getting into great complexity. I had a general idea of what neural networks and machine learning do, but now I’m also getting an idea of how they do it :D

danlowe-blog

The Future of Game Animation

Recently Ninja Theory Senior Animator Chris Goodall posed a question on Twitter: What do people think the future of game animation is going to be?

This is one of my favorite topics to think about, and so I was eager to share some thoughts.

Short Term: Motion Matching

GDC 2016 was Motion Matching’s big coming out party. The core ideas had been floating around the world of academic research for years before that, but this was the first time that actual game studios were starting to show this tech in practical scenarios. Two presentations were made: one by Kristjan Zadziuk, about prototypes in development at Ubisoft Toronto, and another by Simon Clavet, about his work on For Honor. The buzz at the conference was palpable, and since then there have been rumors circulating in the industry that a lot of other AAA teams are starting to build their own Motion Matching technology.

For those who aren’t familiar, Motion Matching is a method of automatically picking which piece of animation should play next on a character. The system makes its own choices rather than relying on Stateflow logic, which is the current, manually crafted method of deciding which animations should play.

The Motion Matching system makes these choices based on high-level goals that you feed into the system. So one of these high-level goals might be “2 seconds from now, I want the character to be in this position, and this facing direction”, which the system gets by predicting the future position of the character based on player inputs. Another common high level goal is, “match the position and velocity of the feet and the hips, as closely as possible to what was already happening in the previous frame”.
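As a rough sketch of how those two goals can be combined, the search below scores every candidate frame by how well it reaches the desired future position and facing, and by how closely its pose matches the current one, then picks the cheapest. The `Candidate` fields, the weights, and the tiny database are assumptions for illustration, not any studio's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One frame in a hypothetical animation database."""
    name: str
    future_pos: tuple      # (x, z) position the clip reaches about 2 s later
    future_facing: float   # facing direction at that time, in radians
    pose: tuple            # flattened foot/hip positions and velocities


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def facing_difference(a, b):
    """Shortest signed angle between two headings, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def cost(candidate, goal_pos, goal_facing, current_pose,
         trajectory_weight=1.0, pose_weight=1.0):
    """Weighted sum of the future-trajectory mismatch and the pose mismatch."""
    trajectory = (squared_distance(candidate.future_pos, goal_pos)
                  + facing_difference(candidate.future_facing, goal_facing) ** 2)
    pose = squared_distance(candidate.pose, current_pose)
    return trajectory_weight * trajectory + pose_weight * pose


def pick_next_frame(database, goal_pos, goal_facing, current_pose, **weights):
    """Return the candidate with the lowest combined cost."""
    return min(database,
               key=lambda c: cost(c, goal_pos, goal_facing, current_pose, **weights))


database = [
    Candidate("walk_forward", (0.0, 2.0), 0.0, (0.1, 0.0, 1.0)),
    Candidate("turn_left", (-1.0, 1.0), 1.57, (0.1, 0.0, 1.0)),
    Candidate("run_forward", (0.0, 6.0), 0.0, (0.5, 0.2, 3.0)),
]
best = pick_next_frame(database, (0.0, 2.2), 0.0, (0.1, 0.0, 1.1))
```

The weights are the tuning knob: raising `pose_weight` favors smooth continuity over responsiveness to the player's input, and lowering it does the opposite.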

The end result is that Motion Matching has the potential to dramatically reduce the amount of work required when creating animation systems. It also tends to produce very high quality results: since transitions from one move to the next take hip and foot position and velocity into account, you tend to get really smooth blending, which is sometimes not the case with a traditional State Machine approach.

I expect that in the next few years, we’ll start to see Motion Matching used more and more in games. Of course, it doesn’t have to be an all or nothing switch from traditional systems; you can embed a Motion Matching system into a traditional State Machine, so for a while, you’ll see a kind of hybrid approach, where some moves will be using Motion Matching (e.g. locomotion), and others might use a more traditional implementation (e.g. scripted events). But I think gradually Motion Matching will replace the majority of moves that we see in games.

The initial response from some animators towards Motion Matching was concern that the ease with which you can create systems might reduce the need for animators. From what I’ve experienced so far, this is absolutely not the case: Motion Matching systems typically still benefit from the usual clipping down of data (or otherwise tagging data), and of course, that data is still better if it is cleaned up animation, rather than raw mocap.

The initial vision for Motion Matching was that you would be able to just throw a bunch of unstructured mocap into a Motion Matching database and the system would do everything for you, but it turns out this kind of approach doesn’t produce good results. Technically it does still work, but the system often makes unwanted choices (e.g. sometimes deciding that rather than playing a run cycle, it’s going to play the last two footsteps of an Idle to Start over and over and considers that a run), and so a lot of teams are finding that curating your animation data can give better results.

So in short, there will still be plenty for animators to do, in a Motion Matching world.

Short Term: Script Based Automation

At GDC 2016, I presented a new animation tool that Zach Hall and I had developed when I was working at Ubisoft Montreal. The tool automatically processed raw motion capture data into shippable quality animation. Before building this tool, we did an analysis of how our mocap animators were working, which showed that an estimated 50-80% of the tasks they were doing were things that could be automated. So, we set about automating those things.

In a way, what we did wasn’t particularly revolutionary: every studio writes scripts to automate repetitive tasks, and the only difference in our case was the degree to which we were willing to do it. I’d also say that a key point was that we were really looking closely at what the animators were actually doing, whereas sometimes technical animators can think they know the problems animators are facing, but they’re actually building solutions for things that aren’t necessarily the most important things.

I got a very positive response to the GDC talk, though I’m yet to hear of other studios trying a similar approach.

I would hope that in future, more teams start to look seriously at automation and pipeline efficiency, because it really is a huge opportunity. A single technical animator can potentially save the work of many, many animators, if they’re aimed towards the right things. It’s just unfortunate, that it seems like more often than not, people tend to rely on what they’re familiar with, and so a manager might prefer to hire 10 more animators to brute force the work, rather than assign a technical animator to focus purely on improving efficiency.

I’m hopeful though that things will happen in this area.

Short to Mid-Term: Neural Networks - Motion Generation at Runtime

If you haven’t seen Daniel Holden et al.’s paper on Phase-Functioned Neural Networks, drop what you’re doing and watch this now. This is the future of game animation, right here.

In my view, Neural Networks and Deep Learning are going to change everything (not just about game animation, not just about game development: everything). While we may not see Neural Network based animation systems shipping in games for a while, some developers are already doing experiments using something similar to Daniel’s approach.

Studios will begin to use animation data to train neural networks, and those networks will then be able to generate animation at runtime. Just as with Motion Matching, the data they generate is based on high-level goals, which makes this a natural successor to some of the work that’s being done with Motion Matching.

There are a number of benefits to Neural Network (NN) based animation systems over a Motion Matching approach…

They’re cheaper memory-wise: You only store the trained network weights, and not actual animation data.

Motion Matching is picking from a pre-existing set of animation data. NNs, on the other hand, can generate poses that weren’t in the original data, as long as they make sense in the context of that data. This allows for far more adaptive characters. So for example, if you want your character to run past a table and pick up an object from that table, the position of the object doesn’t have to perfectly match what was in the training data; there just need to be enough examples of picking up objects from tables while moving, correlated with appropriate high-level goals, for the system to understand how that type of action works. Then when you’re generating animation at runtime, you can set goals that never existed exactly that way in the training data (like different object positions on the table, different speeds, etc.), and it should be able to deal with that.

NNs need to be fed lots of training data, but one approach to creating this data is to do offline procedural adjustments to your mocap (the kind of adjustments that might normally be inappropriate to use at runtime), and then use the result as training data for the NN. This essentially gives you something similar to a runtime version of that offline process. So for example, Adjustment Blending is a method of adjusting animation, that produces high quality results, but is most suitable for offline processing. This is because it relies on knowledge of what the character is going to do in the future. However, you could use Adjustment Blending to create lots of examples of adjusted data, and then use that adjusted data to train the NN. This would essentially give you similar results to Adjustment Blending, but at runtime. Another example of this type of approach is the uneven terrain example used in Daniel’s PFNN paper.
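The offline-adjust-then-train idea can be sketched as a data-generation step. The adjustment below is a plain linear ramp used only as a stand-in (it is not Adjustment Blending), and the clip values, function names, and pair layout are assumptions for illustration. What it shares with the real thing is that it needs to know where the clip ends, which is why it only works offline.

```python
def adjust_clip_offline(clip, target_end):
    """Spread the offset needed to hit `target_end` on the last frame across the clip.

    This requires knowing the whole clip in advance, so it is an offline-only step.
    """
    if len(clip) < 2:
        raise ValueError("clip needs at least two frames")
    offset = target_end - clip[-1]
    last = len(clip) - 1
    return [value + offset * (i / last) for i, value in enumerate(clip)]


def build_training_pairs(clip, targets):
    """Build (input, output) examples: input is (goal, previous value), output is next value."""
    pairs = []
    for target in targets:
        adjusted = adjust_clip_offline(clip, target)
        for prev, nxt in zip(adjusted, adjusted[1:]):
            pairs.append(((target, prev), nxt))
    return pairs


# Hypothetical captured clip: hand height in meters over five frames.
captured = [0.0, 0.2, 0.4, 0.6, 0.8]
adjusted = adjust_clip_offline(captured, 1.0)
pairs = build_training_pairs(captured, targets=[0.6, 0.8, 1.0])
```

A network trained on pairs like these only ever sees the goal and the previous frame, so at runtime it can be asked to step toward a goal without knowing the future frames that the offline adjustment needed.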

There are some challenges with NNs too, that the industry will need to work through…

NNs are currently slow to train. You can’t see the results of your changes until hours later. This will hopefully get faster as time goes on, but it’s currently an issue.

NNs are even more of a black box than Motion Matching. If the NN does something you don’t want it to do, it can be incredibly difficult to figure out why.

NNs rely on being fed a lot of example data. The more data, and the higher quality the data, the better. With this in mind, it’s likely only going to be appropriate for mocap, at least at first. You’ll also have “style transfer”, which will help us to produce more stylized animation, but it’ll be a long time before we’re able to generate high-quality, Pixar style animation, because there isn’t enough of that animation in the world to train the system.

Short to Mid-Term: Animation Capture - Quality and Volume

As mentioned, NNs need to be fed vast amounts of data, and you generally need this data to be consistent and high-quality. Part of the reason that Deep Learning has made such rapid advancements in the last few years, is because of the vast amounts of data available on the Internet.

With this in mind, I see there being huge benefits to focussing on animation capture quality, and methods for capturing large amounts of animation data, very quickly. The amounts of data that we’re talking about here are so large, that it would be too much for an animator to clean up manually, so ideally we’ll need to use the raw data that comes out of the capture system, or treat the data in some sort of automated way.

Improvements in synchronized, body, finger, and facial motion capture will certainly help. Longer term I would expect to see far more full body 4D capture, and a focus on surfaces and muscles rather than bones and traditional skinning methods.

One area that I expect to get very good in the next few years is the ability for NNs to generate motion data from a single video source, rather than dedicated capture systems. Researcher Michael Black and his team are already working on this kind of thing, and I’m guessing that very soon, the results will start to be as good or better than optical systems.

If this happens, it’ll be an absolute game changer: Teams will be able to source their data from any video footage, so imagine the entire wealth of movies, TV, CCTV footage, people’s home videos, etc. all being sources for mocap data. Moreover, depending on the fidelity of video footage, and the quality of the NN system, you’ll likely be able to derive more than just skeletal data from this footage. You’ll eventually be able to estimate fingers, facial, muscle, subcutaneous layers, skin, etc: All things that are useful, and usable.

Long Term - The Incredible and Scary Future - Semantics

Some NNs are already able to derive semantic information from photos and video footage, and this is where things really start to get crazy. These systems are able to make accurate guesses about who and what are in images, what the relationships are, and so on. These types of systems are continuing to improve at a super-fast rate.

So say for example, you build an NN that can look at video footage and not only generate the motion of the person in the footage, but also accurately guess whether the person in the footage is male or female, guess how old they are, guess their ethnicity, maybe even guess their personality traits, their level of education, how wealthy they are, what type of job they do, what their political stance is, etc. Imagine that you then associate all that information with the generated motion, and then use that as part of the motion generation training data for the NN that generates animation on-the-fly.

So now you can set character traits as high-level goals for the system. So maybe your game director can simply say: “Create me a character that moves like a 50 year old, overweight man, who is shy, and is recovering from an injured ankle.” The system sets those parameters as part of its goals, and so when it generates the motion, it generates it with those parameters in mind.

I’ve just been talking about animation so far, but the same advancement is happening in other game development disciplines, and so by this point, there will also be systems to generate faces, bodies, clothing, etc. and so these same parameters can be applied in those systems to generate character meshes that are also context appropriate.

So you’re now able to build any type of character just by asking.

Let’s get more crazy…

What if you derive semantic understanding from scenes and places? For example: what if you use CCTV footage, along with the NN that analyzes people, to get a semantic understanding of city demographics? What type of people travel through what type of areas? What type of people live in what types of apartment buildings? What type of people drive vs. use public transport? Which people go to Starbucks vs. the artisanal local coffee chain? What type of people give money to the homeless? And so on.

Now you feed this information into an NN that generates city neighborhoods, or whole cities, or whole continents full of cities. First, it generates a set of demographics, then it uses the character and animation NNs, to populate each city in appropriate ways.

So maybe now all the game director has to say is “Make me a city like London circa 1975”, and as long as there are enough data sources for what a city like that should be like, the system will generate an appropriate city, with appropriate people, who have appropriate behaviour.

Want to get even crazier…

Maybe at this point the game director who’s asking for all of this, isn’t even a game director anymore; maybe it’s just the player, asking directly for what they want.

“I want to play a game in the style of James Bond, but set it in the 1800s.”

“I want to play a brand new Star Wars story, from the perspective of Chewbacca.”

Eventually, we tie this in to devices that track emotional responses as the player is playing e.g. cameras that look at facial responses, or wearables like smart watches that track heart rate. Maybe you don’t even ask for a subject matter, maybe you say how you want to feel.

“I want to play an experience that makes me feel happy.”

“I want to have an experience that gives me a sense of family and belonging.”

“I want to experience a story that gives me the same sense of childish wonder as when I first read the Harry Potter books.”

Maybe in the next step you don’t even ask the system for anything. Maybe the system scans you as soon as you enter your door, understands what mood you’re in, and generates a complementary experience.

At this point you start to delve into philosophical questions about what it even is to be human and whether the human experience means anything, if you’re just having your every whim automatically appeased, so maybe I should leave things there.

So yeah, that’s my road-map for the crazy future of game animation and game development as a whole.

Oh also I guess at some point we’ll get good full body IK.

automatic damage detection of roof shingles using the max-tree algorithm

#max-tree #shingles #damage-detection

Odometer reading segmentation using the max-tree algorithm and discrete distance transform

#max-tree #odometer #segmentation
