To Come: Language & Maps
Do you know what Chico or Huapi means?
Or what are the origins of Juneau and Sitka (formerly Novoarkhangelsk)?
To Come(r): El Bulli
So close to heaven....
An American History Quiz
I love maps, and historical maps even more. For that reason I have collected some old maps. The image below is a piece of an old cloth travel map of North America:
So here is the quiz:
1. How many countries are in the map?
2. Name them.
3. In which country and year was this map made?
I am American
"I am American", he said. "Me too", I answered, while he knit his eyebrows in surprise.
We have to keep our culture, our language. This does not mean promoting nationalism, just expressing well what we really mean (Spanish version)
A Bit of Spanish
How many times do we hear the term americanos in Spanish in reference to U.S. citizens? The precise designation is estadounidense (literally, "United States-ian"). But aren't we also Americans? Reducing America to just the United States goes beyond titles for nationality. This influence can be seen in incorrect translations (see table below which does not include many other technical terms), the dominance of fast food (which wouldn't be a bad thing if the food were decent), violent movies and TV (which feature horrendous acts of gruesome bloodshed, but carefully avoid explicit sexual acts, which are natural and beautiful), satanic music, etc.
Without disregarding the good things of U.S. culture, its enslaving economic invasion comes packed with a Ptolemaic vision of the world: news media centered on national news, with the scant international coverage often portraying only disasters, and local sports with "world" series. All this creates ignorance of general knowledge (recall the famous National Geographic survey from many years ago and the one done in 2002). In this sense, Canada is an interesting country, which tries to mix the best of the US with the social benefits of Europe.
A Geography Lesson
The use of American has a deeper connotation. United States of America is really a bad name, because America is more than just the original 13 British colonies. In English, many people use America to mean the US. A partial way to resolve this ambiguity is the six continents postulate, where North and South America are different continents, and then we can say the Americas. Let's assume that this is true. Then:
Why does the Olympic flag have only five circles?
To which continent do Central America and the Caribbean belong?
Note that these historical views were defined before Antarctica was considered another continent (and then we have six or seven continents). I have seen America divided into North and Latin America, to imply that Mexico is not part of North America (although that is difficult to deny after NAFTA). Do not allow the usurpation of our continent. There is just one America and we can all share it. Hence, we have to reply with pride that we are also Americans. Even more, Americans in the continental sense, although the true Americans are the few natives whose ancestors lived here before we arrived.
Note: this is a revised version of something that I wrote more than 20 years ago and that has always been available on my personal website.
Beware of Averages!
We use averages all the time without really thinking about what we are doing. We believe in them and, worse, we make decisions based on them. But most of the time averages can mislead you, as in this famous joke:
Once upon a time there was a statistician with his head in a freezer and his legs in a fire. A person worried that he was in danger of dying asked him: how do you feel? He answered: on average I am fine.
Unless you have data that is pretty homogeneous, averages are not enough. For example, parents are obsessed with their kid’s grade average. What is better: A- or B? Well, as with most things in life, it depends. If the A- is a kid with all As and one D, I guess I would prefer my kid to have all Bs!
So, when you look at uni-dimensional data, e.g. school grades, you need to look at more:
- The maximum and the minimum: these will tell you the spread of your data and quickly point to outliers, data errors, or missing data.
- The median: the value that splits your data into two sets of the same size. Many times this value is more meaningful than an average.
- The most frequent value: this is called the mode and makes sense if you are dealing with a finite set of values (that is, you have a discrete distribution). The most common value may point to another data issue (e.g. a default value).
If you know a bit of Excel, the best would be to plot the distribution. For that it is important to bucket the data well, so you can get a nice histogram. Beware here that bucketing data using equally spaced segments of the possible range of values only makes sense for uniform distributions. Otherwise, the buckets should have the same volume. That is, each bucket has the same number of samples and then you can take the average of those values as the value for that bucket.
Let us look at an example. Consider the grade sets for two students, A = {5,5,6,6,6,6,6,6,7,7} and B = {3,3,3,3,5,5,8,10,10,10}. I give them ordered for simplicity, but in general data will come in any order. Notice also that the order can be different for each student, so we are comparing all the grades as a whole and not the grade in each subject.
Well, A and B have the same average, 6; however, B is failing 4 subjects while A is passing everything. The minimum will point out this difference right away. The maximum, on the other hand, tells us that B excels in some subjects, so he might be very successful in them (however our school systems force all kids to have a minimum to pass to the following grade, why?). The median of A is 6 while the median of B is 5, so in this case the median seems more informative than the average. The mode for A is 6 while for B it is 3, showing the differences even more.
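The numbers above are easy to check. Here is a minimal sketch in Python, using only the standard library; the names grades_a, grades_b and summarize are mine, chosen just for illustration:

```python
from statistics import mean, median, mode

# The two grade sets from the example above.
grades_a = [5, 5, 6, 6, 6, 6, 6, 6, 7, 7]
grades_b = [3, 3, 3, 3, 5, 5, 8, 10, 10, 10]

def summarize(grades):
    """Return the average together with the statistics that complement it."""
    return {
        "mean": mean(grades),
        "min": min(grades),
        "max": max(grades),
        "median": median(grades),
        "mode": mode(grades),
    }

summary_a = summarize(grades_a)
summary_b = summarize(grades_b)

# Print the statistics side by side for both students.
for name in ("mean", "min", "max", "median", "mode"):
    print(f"{name:>6}  A: {summary_a[name]}  B: {summary_b[name]}")
```

Both students get the same mean, while the minimum, maximum, median and mode all differ, which is the whole point.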
One way to plot the distribution is a direct graph of the ten values as:
Here we can see that A is above in 6 values and below in 4. However, as the subject order might be different for each student, we can only be sure that A is above in 4 values (B's four 3s), because the two 5s of B might be ties. One solution would be to order the data of B by the order induced by A (that is, the subject order of A, sorted by grade). Another way to see this is to do a histogram that counts the number of subjects with each grade like this:
This histogram highlights the mode of each student (the tallest bars in each color). Here clearly B has a lower grade in 4 subjects and a higher grade in the same number of subjects.
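If you prefer code to Excel, the same counts can be obtained with a few lines of Python. This is just a rough text version of such a histogram, a sketch where the variable names and the "#" bars are my own choices:

```python
from collections import Counter

grades_a = [5, 5, 6, 6, 6, 6, 6, 6, 7, 7]
grades_b = [3, 3, 3, 3, 5, 5, 8, 10, 10, 10]

# Count how many subjects each student has with each grade.
counts_a = Counter(grades_a)
counts_b = Counter(grades_b)

# Only grades that actually occur are listed; the full grading scale is not assumed.
for grade in sorted(set(grades_a) | set(grades_b)):
    bar_a = "#" * counts_a[grade]  # a Counter returns 0 for a missing grade
    bar_b = "#" * counts_b[grade]
    print(f"{grade:>2}  A: {bar_a:<6}  B: {bar_b}")
```

The longest bar of each student is the mode.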
When we have too much data to understand, we can collapse the data into buckets, and here it is important to have the same number of values per bucket to be able to compare the two data sets fairly. For example, if we use groups of two values we obtain the following side-by-side comparison:
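The bucketing with the same number of values per bucket can be sketched in Python as follows. The function bucket_averages is a hypothetical helper of mine, not a library function, and it assumes that the data is sorted before it is cut into groups:

```python
from statistics import mean

grades_a = [5, 5, 6, 6, 6, 6, 6, 6, 7, 7]
grades_b = [3, 3, 3, 3, 5, 5, 8, 10, 10, 10]

def bucket_averages(values, bucket_size):
    """Sort the values, cut them into buckets of bucket_size values each,
    and represent every bucket by the average of its values.
    The last bucket is smaller if the sizes do not divide evenly."""
    ordered = sorted(values)
    return [
        mean(ordered[start:start + bucket_size])
        for start in range(0, len(ordered), bucket_size)
    ]

buckets_a = bucket_averages(grades_a, 2)
buckets_b = bucket_averages(grades_b, 2)

# Side-by-side comparison, one row per bucket.
for value_a, value_b in zip(buckets_a, buckets_b):
    print(f"A: {value_a:>4}   B: {value_b:>4}")
```

With groups of two, A collapses to 5, 6, 6, 6, 7 and B to 3, 3, 5, 9, 10, so the two students can be compared bucket by bucket.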
I am a Maphead!
Recently I received as a birthday gift from a dear old friend the Maphead book by Ken Jennings. I devoured the first chapter! I completely identified with the author, and I kept thinking of all the geographical knowledge that I had while learning many things that I never knew. The best was the concept of a triple island. That is, an island in a lake that is in an island in a lake that is in an island! As a computer scientist I find it surprising to see 3 levels of recursion in something so real. The largest island that has such a configuration is Victoria Island in the north of Canada, with Luzon in the Philippines being the second largest.
Then I did the test at the end to see if I was a Maphead and I answered 34 correctly, so I am a Maphead (at least 31 of 40 correct). I could argue in my defense that 4 wrong answers were on North America questions and only 2 on the rest of the world. Considering that 10 questions were about North America, the test is biased (bias, a good topic for the data part of the blog).
Below is my own three-question test to find out if you are a Maphead; a couple of the questions were influenced by another Maphead, my former PhD supervisor, Gaston Gonnet:
1. Name at least 5 countries that have two separate unconnected pieces of land in the same piece of land (that is, an island does not count unless there are two separate pieces of the same country on that island; if you find this particular case, you get extra credit).
2. Which animal is two different countries in two different languages? (Yes, this implies knowledge of more than one language, which is biased against Americans, and it even works with 3 different countries & languages.) A simple variation of this question is: name a country that is a food in another country.
3. Make groups with all the cities of the same name (e.g. all Londons are a group, all Parises are another group, etc.). Now sort the groups by the population of the second largest city in each group. Which city wins?
Topics & Keywords
To save your time and mine, today I will just list topics & keywords that I want to cover in the three different areas:
Data: statistics, machine learning, power laws, bias, spam, diversity, scalability, information, wisdom of crowds, popularity, long tail, digital desert
Applied Geography: maps, old maps, strange maps, history, historical atlases, etymology, languages, traveling, special people, special places, anecdotes
Food & Wine: El Bulli, restaurants, chefs, good wines, strange food, anecdotes
Special places: Torres del Paine, Easter Island, Mount Athos, Galapagos Islands, Machu Picchu, Lady Elliot Island, .....
Preface
I always wanted to write a blog, although a blog is worse than most prison sentences. Indeed, a prison sentence would typically be a few years and one day. Blogs are like a life sentence.
So here we are, thinking that I am writing to the world while most probably nobody will read this. Yes, another piece of content that will be part of what I define as the “digital desert” (I need to write about this in a future entry :-).
I love maps and I guess that has marked my life. Why? Because if you like maps you like data and information, as maps are one of the best ways that we have to represent lots of data and information. So I love data and my work is related to data science research (search and data mining).
If you love maps you then love geography and traveling. Traveling is what I call applied geography. As I also love research, traveling implies learning and trying new things and my favorite research is food & wine research.
Now I wonder where to start .....
Talk at UCSD, March 12, 2015.