Learn with Google AI

#batman#dc comics#bruce wayne#dc#tim drake#batfamily#batfam#dc fanart#dick grayson




seen from Türkiye
seen from United States
seen from Argentina
seen from United Kingdom
seen from China
seen from United Kingdom
seen from Greece
seen from United States

seen from Malaysia

seen from United Kingdom
seen from United States

seen from United States
seen from United States

seen from United States
seen from United States
seen from United States

seen from Sudan
seen from Argentina

seen from Sweden

seen from Greece
Learn with Google AI
Dyn & Max #AiClass
ansatsu... aikatsu...... those names are way too similiar
Professor Sebastian Thrun quits Stanford to teach people
The text of his homepage today:
One of the most amazing things I've ever done in my life is to teach a class to 160,000 students. In the Fall of 2011, Peter Norvig and I decided to offer our class "Introduction to Artificial Intelligence" to the world online, free of charge. We spent endless nights recording ourselves on video, and interacting with tens of thousands of students. Volunteer students translated some of our classes into over 40 languages; and in the end we graduated over 23,000 students from 190 countries. In fact, Peter and I taught more students AI, than all AI professors in the world combined. This one class had more educational impact than my entire career. Just watch this video.
Now that I saw the true power of education, there is no turning back. It's like a drug. I won't be able to teach 200 students again, in a conventional classroom setting. I've just peeked through a window into an entire new world, and I am determined to get there.
(and yes, I gave up my tenured position at Stanford)
I could not be more impressed with Sebastian. From a story about this:
... the physical class at Stanford... dwindled from 200 students to 30 students because the online course was more intimate and better at teaching..."
Using nltk on the AI-Class mini-shredder problem
The Free On-line Stanford AI Class didn't have the programming problems that the in-person class did, due to lack of grading resources - but we did get a simple, optional, mini shredder challenge where it was suggested that we use NLP techniques to reassemble 19 vertical (2 character wide) slices of a paragraph back together in the right order. The popular approach was to simply produce some letter statistics (usually bigram) on english words, and then glue stripes together picking the addition that improves the score most.
Most of the approaches discussed afterwards on the aiclass Reddit involved getting a big chunk correct using machine techniques and then fixing up the last chunks by hand; my personal approach found "Claude" and "Shannon" recognizably, leaving very little hand-work. (Note that the actual shredder challenge was won by a team using a hybrid computer-vision/nlp/human approach, so one could argue that this is a state-of-the-art method.)
In my example, I grabbed /usr/share/dict/words and chewed it into statistics by hand; not very sentence like, but by pairing spaces with initial and terminal letters I got good word-assembly out of it. Later, I figured that a good example of "coding well with others" would be to actually use the widely acclaimed NLTK Natural Language Toolkit, especially if that showed that the additional effort of understanding an existing solution would have some measurable benefit towards executing that solution.
Corpus
The first approach was to replace my dictionary extractor:
ascii_words = [word.lower().strip() for word in file("/usr/share/dict/words") if word.strip() and not (set(word.strip()) - set(string.letters))]
with code that pulled the words from the Project Gutenberg sample:
import nltk nltk.download("gutenberg") ascii_words = nltk.corpus.gutenberg.words()
Then I reran the code (as a benchmark to see how much of a difference having "realistic" words-in-sentences probabilities would make.) Inconveniently (but educationally), it made so much of a difference that the first run produced a perfect reassembly of the 19 slices, with no need for further human work (either on the output data or on the code...) Still, there are a bunch of ad-hoc pieces of code that could be replaced by pre-existing functions - which should in theory be more robust, and perhaps also easier to share with others who are familiar with NLTK without forcing them to unravel the ad-hoc code directly.
Bigrams and Trigrams
A thread on nltk-users points out the master identifier index where one can find a bunch of tools that relate to word-pair bigrams, but also the basic nltk.util.bigrams that we're looking for. In fact, our existing code was a somewhat repetitive use of re.findall:
for ascii_word in ascii_words: for bi in re.findall("..", ascii_word): in_word_bigrams[bi] += 1 for bi in re.findall("..", ascii_word[1:]): in_word_bigrams[bi] += 1
which turns into
for ascii_word in ascii_words: for bi in nltk.util.bigrams(ascii_word): in_word_bigrams["".join(bi)] += 1
(in_word_bigrams is a defaultdict(int); the join is necessary to keep this a piecewise-local change - otherwise I'd have to look at the later scoring code which uses substrings and change it to use tuples.)
The use of nltk.util.trigrams is nearly identical, replacing three more re.findall loops.
Statistics
It turns out we can go another step (perhaps a questionable one) and replace the defaultdict above with nltk.collocations.FreqDist() giving this:
in_word_bigrams = nltk.collocations.FreqDist() for ascii_word in ascii_words: for bi in nltk.util.bigrams(ascii_word): in_word_bigrams.inc("".join(bi))
In practice, most Python developers should be more familiar with defaultdict (those with long term experience might even have a bit of jealousy about it, as it wasn't introduced until Python 2.5.) One benefit of using FreqDist is that it includes a useful diagnostic method, .tabulate, which lets us confirm that the statistics are what we expect from english-language text:
print "In Word Bigrams:" in_word_bigrams.tabulate(15) In Word Bigrams: th he an in er nd re ha ou at en hi on of or 310732 284501 153567 147211 140285 134364 112683 103110 95761 88376 84756 82225 81422 77028 76968
and similarly for the other tables. Even better, adding one more line:
all_bigrams.plot()
uses matplotlib to show us a nice graph that again confirms our suspicion that bigram statistics have a tail, without having to get distracted by the mechanics of doing the plotting ourselves - some past nltk developer decided that frequency distributions were sometimes worth visualizing, and we get to take advantage of that.
Probabilities
Since the actual task involves coming up with the probability that a given partial combination of strips of text is in english, as the completely unscrambled one will be - bigram frequency counts aren't directly useful, we need to normalize them into probabilities. Again, this isn't much Python code:
bi_total_weight = sum(all_bigrams.values()) norm_bigrams = dict(((k, v/bi_total_weight) for k,v in all_bigrams.items()))
but the nltk.probability module has a class that performs this conversion directly:
norm_bigrams_prob = nltk.probability.DictionaryProbDist(all_bigrams, normalize=True)
A quick loop to print out the top ten from both shows that
my ad-hoc implementation does actually implement the same thing other people are talking about when they say probability distribution
I've correctly understood what DictionaryProbDist is implementing
There is an interface difference; instead of dictionary lookup, DictionaryProbDist provides .prob() and .logprob() methods. Nothing else that leaps to the eye as helpful to this particular problem - but it's useful to notice the .generate() method, which is proportional sampling with replacement - which is basically the core of the Particle Filter algorithm.
Laplace Smoothing
For the final step, we need to apply our bigram (and trigram) probabilities to our partially assembled bits of text, so that we can greedily pick the combinations that are approaching english the most quickly. However, we have bigrams that appear in the target that don't appear in the model corpus; as explained in Unit 5, Section 21, if you assign those missing values a probability of zero, they falsely dominate the entire result. (If you use the model that way anyhow, it does in fact come up with a poor combination of strips.)
In the original implementation, I assigned an arbitrary and non-rigorous "10% of a single sample" value, which was at least effective:
for i in range(len(word) - 1): # go back and do the smoothing right! s *= bigrams.get(word[i:i+2], 0.1/bi_total_weight) return math.log(s)
It turns out that nltk.probability has some more useful surprises in store - we can simply use LaplaceProbDist directly, since k=1 is sufficient. It turns out that the general form is also called Lidstone smoothing so we could use LidstoneProbDist(gamma=k) if we needed to experiment further with smoothing; in the mean time,
norm_bigrams = nltk.probability.LaplaceProbDist(all_bigrams) for i in range(len(word) - 1): s += bigrams.logprob(word[i:i+2]) return s
simply (and convincingly) "multiplies" probabilities (adding logs, just to get numbers that are easier to handle - that is, easier to read in debugging output.)
Final Output
One interesting result of looking at the logs is that while the chosen strips are typically 2**15 to 2**90 more likely than the second best choice, once we've assembled "Claude Shannon founded" it jumps out that the best next choice (by a factor of "merely" 5) is actually to (incorrectly) add a strip on the left... so at least with the gutenberg corpus, the assemble-to-the-right bias actually helps reach the desired answer, so there's more room to improve the method. The numbers also tell us that it's a path problem, not a result problem - the evaluation function gives a 2**94 higher probability for the correct result than for the (still slightly-scrambled) result that the symmetric (but still greedy) version produces.
Conclusions
While it was certainly faster to code this up in raw Python the first time around - note that this was a single instance problem, with new but thoroughly explained material, where the code itself was not judged and would not be reused. Once you add any level of reuse - such as a second problem set on the same topic, other related NLP work, or even vaguely related probability work - or any level of additional complexity to the original problem, such as a result that couldn't be judged "by eye" - digging in to NLTK pays off rapidly, with dividends like
directly applicable interfaces (LaplaceProbDist)
incidentally useful tools (tabulate, plot)
broader field-related knowledge (recognizing DictionaryProbDist.generate as relevant to Particle Filters.)
clear distinction between doing "special" things (hand-written code) and "convential/normal things" (LaplaceProbDist, util.bigrams)
The last one is perhaps the most relevant benefit, because even though the code may perform identically, and the line counts are the same, the version that uses the NLTK-supplied abstractions should communicate more clearly to the next human reader of the code - an important "coding well with others" principle.
Stanford AI Class - 2 - Problem Solving
- Talking about problems whose complexity comes from the number of possible states
Defining a problem
The example - route finding - how to get from Arad to Bucharest?
1. Initial State, S (agent is in Arad)
2. Actions, A(S) = {a1, a2, a3, ...}
- Returns a possible set of actions agent can execute from state S
- Here, able to travel to any of the neighbouring cities
3. Result, S' = R(S, A)
- A new state is produced when a particular action is chosen from a preceding state
- Could end up in any city neighbouring Arad
4. Goal Test, T(S) = T|F
- Are we in Bucharest? If we are, returns True.
5. Path Cost, N(S1, a1, S2, a2, S3)
- Usually additive; the sum of the costs of the Individual steps
5.(a) Step Cost, n(S, a, S')
- Route finding: could be number of miles travelled/number of minutes/number of journeys
- As we explore outwards through the state space, by taking actions, the furthest points on these paths are called the frontier
Exploration Techniques
1. Tree Search
- Superimposes a search tree over the state space
- Start state at the root node, children correspond to successor states
- Expand out nodes to explore states. Want to expand as few nodes as possible to reach the goal state. Maintain a frontier of unexpanded nodes.
- NB nodes in search trees represent the path (sequence of actions) to get to that state, not true of nodes in state space graphs
Ways of expanding the frontier:
i. Breadth-First Search. Shallowest node in the tree first. Guaranteed to find the solution, given it can be reached in a finite # of steps
ii. Depth-First Search. Deepest node in the tree first. Potentially quicker, but has problems with very deep/infinite trees
- BUT, tree search does not recognise when it returns to a state it has already visited. Need to fix this, creates a lot of unnecessary work
2. Graph Search
- Keep a list of explored nodes, that we won't expand in subsequent encounters. Solves the performance problem
Introduce cost to our ways of exploring the space:
iii. Uniform Cost Search: Expand the cheapest node first (to give the cheapest total cost along the path). But, explores in every direction. IF we have info. about the direction of the goal, we should be able to speed this up.
iv. Greedy Best First Search: Expand the node that s nearest to the goal. But, this can lead us astray!
v. Preferred technique: A* search: seeks to minimize the sum f(n) = g(n) + h(n), with g=path cost & h=distance to goal
- Only stop search when we take the goal off the frontier - want to ensure we have found the lowest cost way to get there
- Need estimates of distance (h) to be lower than the actual goal distance in order for A* to find the optimal solution. So that if we reach the goal through a sub-optimal path, the optimal path will always appear more attractive - since h underestimates the distance (at this point, we know the true distance for the sub-optimal path)
Conditions
Problem solving only works when:
- Domain is fully observable, and we know the set of available actions
- We have a discrete domain, and a finite # of actions available
- Domain is static, and deterministic. Only our own actions change it, and they do so in a predictable way.
I should really be watching the lectures instead of making pictures.