Developify @developify-blog-blog - Tumblr Blog

Posts

goo.gl URL shortener

goo.gl url shortener is an extension which allows you to shorten the current website URL with the Google URL Shortener service http://goo.gl/. It's trivial to use.

● Auto copy to clipboard
● goo.gl history
● Keyboard shortcut
● Context menu
● QR Code
● Highly customizable
● Incognito mode
● Share with your default mail client
● Share with many different services: Blogger, Delicious, Digg, Evernote, Facebook, FriendFeed, Gmail, Google Bookmarks, Google Buzz, Google Reader, Hotmail, Instapaper, LinkedIn, Mail, MySpace, Netlog, Orkut, Ping.fm, Posterous, Reddit, Read It Later, StumbleUpon, Technorati, Tumblr, Twitter, Yahoo! Bookmarks, Yahoo! Mai

If you are not interested in the keyboard shortcuts, or you are worried about your privacy, you can install the lite version: http://goo.gl/kZiQ

#goo.gl URL shortener

Howto: Install VirtualBox on Ubuntu

This tutorial shows you how to install Oracle's VirtualBox on a Ubuntu desktop.

With VirtualBox, you can create and run guest operating systems ("virtual machines") such as Linux and Windows on top of a host operating system.

In order to run VirtualBox on your machine, you need:

Reasonably powerful x86 hardware. Any recent Intel or AMD processor should do.

Memory. Depending on what guest operating systems you want to run, you will need at least 512 MB of RAM (probably more, and the more the better). Basically, you will need whatever your host operating system needs to run comfortably, plus the amount that the guest operating system needs. For example, if you want to run Windows XP on Windows XP, you probably won't enjoy the experience much with less than 1 GB of RAM. If you want to try out Windows Vista as a guest, it will refuse to install if it is given less than 512 MB RAM. So you'll need to set aside that much RAM for the guest operating system, plus the RAM your host operating system normally consumes.

Hard disk space. While VirtualBox itself is very lean (a typical installation will only need about 30 MB of hard disk space), the virtual machines will require fairly huge files on disk to represent their own hard disk storage. So, to install Windows XP, for example, you will need a file that will easily grow to several GB in size.

A supported host operating system. Presently, VirtualBox supports Windows (XP and later), many Linux distributions, Mac OS X, Solaris and OpenSolaris.

A supported guest operating system. An up-to-date list is here.

There are two ways to install VirtualBox: from pre-compiled binaries that are available for some distributions and come under the PUEL license, and from the sources that are released under the GPL. This tutorial will demonstrate both ways.

1. Installing VirtualBox From Precompiled Binaries

The VirtualBox binaries can be downloaded directly from http://www.virtualbox.org/wiki/Downloads, if the PUEL license applies.

To install the VirtualBox .deb package, open up a terminal window and become root:

sudo su

Then install some prerequisites for VirtualBox:

apt-get install bcc iasl xsltproc xalan libxalan110-dev uuid-dev zlib1g-dev libidl-dev libsdl1.2-dev libxcursor-dev libqt3-headers libqt3-mt-dev libasound2-dev libstdc++5 linux-headers-`uname -r` build-essential

Next, go to the VirtualBox download page. Pick the right .deb package for your Ubuntu version and download it to your system:

cd /tmp
wget http://download.virtualbox.org/virtualbox/4.0.4/virtualbox-4.0_4.0.4-70112~Ub...

After the download is finished, you can install VirtualBox like this:

dpkg -i virtualbox-4.0_4.0.4-70112~Ubuntu~maverick_i386.deb

Voilà! Now you can find VirtualBox under Applications > System Tools.

2. Installing VirtualBox from the Sources

If VirtualBox's PUEL license doesn't work for you, you prefer the GPL, or there's no .deb package for your Ubuntu version, you can compile VirtualBox from the sources. The sources are released under the GPL.

To install VirtualBox from the source, open up a terminal window and become root:

sudo su

Then install some prerequisites for VirtualBox:

The latest VirtualBox source can be checked out from VirtualBox's SVN repository. To do so, we need to install Subversion first:

apt-get install subversion

Next, check out the VirtualBox source to the /usr/src/virtualbox directory:

mkdir /usr/src/virtualbox
cd /usr/src/virtualbox
svn co http://virtualbox.org/svn/vbox/trunk vbox

Then, compile VirtualBox like this:

cd vbox
./configure
source ./env.sh
kmk all
cd out/linux.x86/release/bin/src
make
make install

Load the vboxdrv kernel module, and copy the VirtualBox files to appropriate locations on the disk:

cd ../
modprobe vboxdrv
echo vboxdrv >> /etc/modules

cp -prf *.so /usr/lib/
mkdir /usr/local/virtualbox
cp -prf * /usr/local/virtualbox/
ln -s /usr/local/virtualbox/VirtualBox /usr/local/bin/VirtualBox
ln -s /usr/local/virtualbox/VBoxSVC /usr/local/bin/VBoxSVC

Next, create the group vboxusers and add a desktop user (e.g. guest) to it:

groupadd vboxusers
usermod -G vboxusers -a guest

Change the permissions of /dev/vboxdrv, so that it can be accessed by the vboxusers group:

chmod 660 /dev/vboxdrv
chgrp vboxusers /dev/vboxdrv

We don't want the permissions of /dev/vboxdrv to be reset at each boot time, so we edit /etc/udev/rules.d/40-permissions.rules:

gedit /etc/udev/rules.d/40-permissions.rules

Add the following line to the end of the above file:

[...]

KERNEL=="vboxdrv", GROUP="vboxusers", MODE="0660"

We're almost there! Now all that's left to do is create a menu entry for VirtualBox. Right-click on Applications and select Edit Menus:

In the window that pops up, select System Tools and then click on New Item:

In the Create Launcher window, fill in the following details:

Type: Application

Name: VirtualBox

Command: VirtualBox (pay attention to upper/lower cases -- this is a command. If you don't type it right, the application won't start)

Comment: You can fill in anything you like or leave it blank.

If you like, you can also select an icon for the new application. This is optional.

VirtualBox should now be found in the list of items under the System Tools menu. Click on Close to leave the window:

Then, you can find VirtualBox under Applications > System Tools:

This is what it looks like when VirtualBox is started:

#Install VirtualBox on Ubuntu

worshiptheglitch

I work at Apple as a manager at one of its stores in Japan. The earthquake hit while I was working on the first floor of one of their stores. As the entire building swayed, the staff calmly led people from the top 5 floors down to the first floor, and under the ridiculously strong wooden tables that hold up the display computers. 7 hours and 118 aftershocks later, the store was still open. Why? Because with the phone and train lines down, taxis stopped, and millions of people stuck in the Tokyo shopping district scared, with no access to television, hundreds of people were swarming into Apple stores to watch the news on USTREAM and contact their families via Twitter, Facebook, and email. The young did it on their mobile devices, while the old clustered around the macs. There were even some Android users there. (There are almost no free wifi spots in Japan besides Apple stores, so even Android users often come to the stores.) You know how in disaster movies, people on the street gather around electronic shops that have TVs in the display windows so they can stay informed with what is going on? In this digital age, that’s what the Tokyo Apple stores became. Staff brought out surge protectors and extension cords with 10s of iOS device adapters so people could charge their phones & pads and contact their loved ones. Even after we finally had to close 10pm, crowds of people huddled in front of our stores to use the wifi into the night, as it was still the only way to get access to the outside world.

#urbancomputing

XXXXX, Great Tohoku Earthquake Survivor 2011 (via ericmortensen)

Do Frameworks Spur Adoption of Programming Languages?

Original Post

Obviously, Ruby and Rails are tied together, but seeing the growth trends starting to separate is a good sign for Ruby as a standalone language.

What I did not realize was that many people considered Groovy and Grails to be similarly tied. In my experience, Groovy has been used as a scripting language, but I have worked in heavy Java shops where Spring or Struts ruled the MVC world. After hearing some of the comments that outside of my world, Groovy is heavily tied to Grails, I did some digging. More than anything, it is obvious that I was missing a data point.

To illustrate, look at the job trends for Ruby, Rails, Groovy and Grails from Indeed.com:

As you can see, there are similarities between the Ruby/Rails pairing and the Groovy/Grails pairing. So, my apologies to all those people who complained; I was obviously wrong. Even though Groovy is tied to Grails, it is interesting that Groovy job trends are starting to distance themselves from their Grails heritage. So, if the job trends are starting to separate, what does the relative growth look like?

This graph is interesting for two reasons. First, the Ruby growth and Rails growth trends are almost identical. I am not saying this proves my point, but the growth definitely looks correlated. Second, there is a disparity of growth between Groovy and Grails. What surprised me most is that the Grails growth is ridiculously steep, while the Groovy growth is still showing a very rapid rise. This tells me that Groovy and Grails are being considered separately, unlike the Ruby/Rails combination. This trend is something that should be watched over the next several months.

Of course, this comparison made me wonder about some of the other languages reviewed, specifically PHP and Python. Before I start looking at those trends, there is an excellent Wikipedia page that lists a bunch of web frameworks for various languages. So, I first looked at some of the popular PHP web frameworks, Zend, CakePHP and CodeIgniter. I noticed that there was no correlation between those frameworks and the growth of PHP. I found this a little odd until I reviewed the list of web frameworks on Wikipedia again. They listed Drupal and Joomla as frameworks, but they are really web content management systems (CMS). So, I decided to graph the job trends of the frameworks and the PHP-based web CMS applications, WordPress, Drupal and Joomla:

Now that shows a little more correlation. As you can see, the web frameworks have limited demand when compared to WordPress, Drupal and Joomla. You can also see that there are similar growth trends between PHP and the web CMS applications. Granted, this is not a strong correlation, but there are some similarities. So, what about Python?

Here is what Indeed has for Python, Django, Zope and Pylons:

As you can see, there is no relationship between the web frameworks and Python growth. Django seems to be the only framework gaining significant adoption, but it still does not show the same type of growth pattern as Python.

Overall, watching the growth of these languages is interesting. Seeing the growth of Ruby and Groovy slowly trend away from their web frameworks is a good sign for adoption of those languages. This means that people are starting to see them as being more of a general purpose language than the language used for a specific framework. The contrasting trends of these languages and those of PHP and Python show how the market can move in different situations as well. The growth of CMS applications like WordPress, Drupal and Joomla is due to the rapid growth of PHP itself, as well as the maturity of those platforms. So, sometimes a framework drives language adoption and sometimes the language can drive platform adoption.

#groovy on grails #programming language frameworks #programming language job trend #ruby on rails

Lift vs. other web frameworks (e.g. Spring MVC, Grails)

I've been using Grails for the past few months, and I really like it, especially GORM. Grails is currently on top of the latest cloud-friendly technologies, with features like native RabbitMQ messaging support, and turnkey GORM support for MongoDB and Redis. However, Groovy on Grails isn't as popular in the NYC tech community as I thought. 90% of the developers I've come across in NYC use Ruby on Rails. Once I asked a web developer, "Why don't you use Groovy on Grails instead of Ruby on Rails?" Guess what, the answer was, "What's that? I've never heard of it."

I started to have the feeling that Grails is far from reaching a critical mass, and it still remains obscure (in the past few months I had the opportunity to work with middle-sized companies and IT startups who were working mostly with the JVM stack, and only one person knew and used Grails). One of the comments I heard was "I tend to go back to Java, even reimplementing every app that needs further development in a plain Java framework - I think wicket and Seam."

Recently, I got interested in Scala and Lift. The fact that Twitter and Foursquare are using them now gives Scala and Lift even more presence in the blogosphere with lots of lovers and haters making the buzz. It is famous for being fast and scalable. However, the language seems somewhat hard to understand and learn for lots of developers (so maybe it will never gain mainstream status). Lift is little known and I have read some reports that it is better suited for small apps (less than 20 domain classes).

So I searched the web to get more opinions and takes on Lift.

Lift vs. Other Web Frameworks

There is no best web framework around. It all depends on your needs. Get the right tool for the job.

Lift is one of the most secure web frameworks around because, by default, it's resistant to replay attacks, cross site scripting, cross site request forgeries, etc.

Lift's comet support is the best around.

Lift's Ajax support is easier than any other web framework that I know of.

Performance

Lift is said to be faster than Grails (Groovy is slow), but no benchmarks have been done against Spring MVC or Grails to confirm this.

Usability

Lift apps are going to be smaller and more maintainable than either Grails or Spring MVC apps, and because Scala is statically typed, the compiler will help you.

Architecture

Keeping stuff around in XML rather than Strings and associating HTML elements with functions is considered by some people to be vastly architecturally superior to what Grails and Spring MVC do.

Deployment

No difference. A WAR file is a WAR file.

Miscellaneous

Where Lift doesn't do better than other web frameworks is generally CRUD-related. Doing CRUD apps in Lift is not the simplest thing in the world. But then, Lift is not tied to a single persistence mechanism (like Grails).

+++

I have yet to look at Scala/Lift myself, but for now here is a JVM Web Framework Matrix, followed by some analysis, which are both interesting to look at.

#Lift vs. Grails #Lift vs. other web frameworks #lift vs. Spring

Everyone loves Shazam, a mobile app that can name any tune you play to it. It's the most miraculous app ever created since sliced bread. I just had to find out, once and for all, what on earth is the...

How does Shazam work?

About Shazam

Shazam is an application which you can use to analyze/match music. When you install it on your phone, and hold the microphone to some music for about 20 to 30 seconds, it will tell you which song it is.

When I first used it, I was instantly intrigued by its magic. “How did it do that!?” Even after using it a lot, it still has a bit of magical feel to it.

There are a couple of ways to use Shazam, but one of the more convenient is to install their free app onto an iPhone. Just hit the “tag now” button, hold the phone’s mic up to a speaker, and it will usually identify the song and provide artist information, lyrics, as well as a link to purchase the album and share on the social network.

What is so remarkable about the service, is that it works on very obscure songs and will do so even with extraneous background noise. I’ve gotten it to work while sitting down inside a crowded coffee shop and pizzeria.

(ARTICLE FROM SLATE.COM)

Shazam is the closest a cell phone can come to magic. Say you’re in a restaurant, a song comes on, and you can’t quite place the tune. In the past, your options were limited; you could try asking your spouse or the waiter for a clue, but that approach risked revealing your ignorance. (That’s “Sex Machine,” dumb ass.) Shazam—which launched in the United Kingdom in 2002 as a call-in service and became widely known in the United States last year when it hit the iPhone—solves the dilemma in a few clicks. Press a button on your phone, and in seconds you’ll get the artist and song title. Other than playing video games, it’s the most useful thing you can do on your phone.

Last week, Shazam announced that more than 50 million people worldwide have used the service—up from 35 million at the start of the year. The company also said that it’s received an undisclosed investment from the fabled Silicon Valley venture-capital firm KPCB. Shazam’s success seems justified—it’s the one app you can show to iPhone skeptics to get them to reconsider their position (though Shazam is also available on Android, BlackBerry, Windows Mobile, and pretty much any other phone). Yet for all the acclaim it garners, Shazam’s inner workings are pretty mysterious. How does it actually ID your song? How does the company make money? (Here’s one hint: iPhone users should expect to see a pay version soon.) And what are the long-term prospects for a firm whose sole purpose is satisfying an acute, very occasional need?

First, a short explanation of how Shazam works. The company has a library of more than 8 million songs, and it has devised a technique to break down each track into a simple numeric signature—a code that is unique to each track. “The main thing here is creating a ‘fingerprint’ of each performance,” says Andrew Fisher, Shazam’s CEO. When you hold your phone up to a song you’d like to ID, Shazam turns your clip into a signature using the same method. Then it’s just a matter of pattern-matching—Shazam searches its library for the code it created from your clip; when it finds that bit, it knows it’s found your song.

OK, but how does Shazam make these fingerprints? As Avery Wang, Shazam’s chief scientist and one of its co-founders, explained to Scientific American in 2003, the company’s approach was long considered computationally impractical—there was thought to be too much information in a song to compile a simple signature. But as he wrestled with the problem, Wang had a brilliant idea: What if he ignored nearly everything in a song and focused instead on just a few relatively “intense” moments? Thus Shazam creates a spectrogram for each song in its database—a graph that plots three dimensions of music: frequency vs. amplitude vs. time. The algorithm then picks out just those points that represent the peaks of the graph—notes that contain “higher energy content” than all the other notes around it, as Wang explained in an academic paper he published to describe how Shazam works (PDF). In practice, this seems to work out to about three data points per second per song.

You’d think that ignoring nearly all of the information in a song would lead to inaccurate matches, but Shazam’s fingerprinting technique is remarkably immune to disturbances—it can match songs in noisy environments over bad cell connections. Fisher says that the company has also recently found a way to match music that has been imperceptibly sped up (as club DJs sometimes do to match a specific tempo or as radio DJs do to fit in a song before an ad break). And it can tell the difference between different versions of the same song. I just tried it on three different versions of “Landslide”—the original by Fleetwood Mac and covers by the Smashing Pumpkins and the Dixie Chicks—and it nailed each one.

Fisher declined to tell me Shazam’s overall hit-and-miss rate. All he would say is that the service is good enough to keep people coming back for more—the average user looks for songs eight times a month. The most common reason Shazam fails to identify a song is that it doesn’t have enough data. The system needs at least five seconds of music to make a match, and sometimes people turn it on just as the song is ending. There are also frequently errors when people look up live performances—if you hold up your phone to your TV during the musical segment on Saturday Night Live, Shazam will most probably fail to ID the song. (If you do get a match from SNL, you’re probably watching that episode with Ashlee Simpson—Shazam is a great way to catch lip-syncers in the act.) Fisher says that Shazam is technically capable of working on live performances, but they’ve turned off that ability for what he terms “business reasons.” “Right now people trust the brand—trying to match live songs wouldn’t get very high accuracy,” he says. (If you’ve got a tune stuck in your head, try using Midomi, a rival of Shazam’s that can ID songs based on your humming or singing.)

Shazam’s iPhone version has been a blockbuster, but it still represents just 20 percent of the service’s customer base, which spans more than 150 countries and pretty much every mobile carrier in the world. The iPhone version also marked a departure for the company—it was the first version that Shazam offered for free. Fisher says this proved to be a good idea; it brought Shazam instant renown, and the company now has enough of a customer base that it can make decent money through in-app ads and by getting a cut of each song purchase people make through the app. But staying fully free forever isn’t sustainable, Fisher says. The company recently unveiled a Windows Mobile version of its app that operates under a “freemium” pricing model—users who download the free version can search for five songs a month, while a premium version that goes for a one-time fee of $5 will allow unlimited song searches. Fisher says that the $5 version for the iPhone (and most other platforms) will launch by the end of the year.

The company is also planning to add a lot more services to its apps—a recommendations engine, a way to let you share your musical tastes with your friends, and charts that show the songs that people are searching for. Every Monday, Shazam sends out its charts to record labels, and execs have been known to sign artists based on the data. This has led to a new way for artists to break into the mainstream: getting featured in TV ads. In 2005, for instance, Volkswagen ran an ad in Europe for the Golf GTI that featured a remixed version of “Singin’ in the Rain” by Mint Royale. The song inspired a lot of searching on Shazam—and prompted the band’s label to release the track, which then shot to the top of the European charts. “We probably see that at least once a month around the world,” Fisher says. In other words, Shazam doesn’t only help an audience find music. Sometimes it helps music find an audience.

Luckily, I found a paper written by one of the developers explaining just how Shazam works. It's worth checking out. Of course, they leave out some of the details, but the basic idea is exactly what you would expect: it relies on fingerprinting music based on the spectrogram.

Here are the basic steps:

1. Beforehand, Shazam fingerprints a comprehensive catalog of music, and stores the fingerprints in a database.
2. A user “tags” a song they hear, which fingerprints a 10 second sample of audio.
3. The Shazam app uploads the fingerprint to Shazam’s service, which runs a search for a matching fingerprint in their database.
4. If a match is found, the song info is returned to the user, otherwise an error is returned.

Here’s how the fingerprinting works:

You can think of any piece of music as a time-frequency graph called a spectrogram. On one axis is time, on another is frequency, and on the 3rd is intensity. Each point on the graph represents the intensity of a given frequency at a specific point in time. Assuming time is on the x-axis and frequency is on the y-axis, a horizontal line would represent a continuous pure tone and a vertical line would represent an instantaneous burst of white noise. Here’s one example of how a song might look:

Spectrogram of a song sample with peak intensities marked in red. Wang, Avery Li-Chun. An Industrial-Strength Audio Search Algorithm. Shazam Entertainment, 2003. Fig. 1A,B.

The Shazam algorithm fingerprints a song by generating this 3d graph, and identifying frequencies of “peak intensity.” For each of these peak points it keeps track of the frequency and the amount of time from the beginning of the track. Based on the paper’s examples, I’m guessing they find about 3 of these points per second. [Update: A commenter below notes that in his own implementation he needed more like 30 points/sec.] So an example of a fingerprint for a 10 seconds sample might be:

Frequency in Hz    Time in seconds
823.44             1.054
1892.31            1.321
712.84             1.703
. . .              . . .
819.71             9.943
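To make the peak-picking idea a bit more concrete, here is a minimal Python sketch. It assumes the spectrogram already exists as a small grid of intensities indexed by time frame and frequency bin; the function name find_peaks, the radius and min_intensity parameters, and the toy grid are all made up for illustration and are not taken from Wang's paper. A real implementation would work on actual audio and convert bins and frames to Hz and seconds.

```python
def find_peaks(spectrogram, radius=1, min_intensity=0.0):
    """Return (frequency_bin, time_frame) pairs that are local intensity maxima.

    spectrogram[t][f] is the intensity at time frame t and frequency bin f.
    A cell counts as a peak if it is strictly louder than every neighbour
    within `radius` cells in both time and frequency (an assumed criterion).
    """
    peaks = []
    n_t = len(spectrogram)
    n_f = len(spectrogram[0]) if n_t else 0
    for t in range(n_t):
        for f in range(n_f):
            value = spectrogram[t][f]
            if value <= min_intensity:
                continue  # ignore silence
            neighbours = [
                spectrogram[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(value > n for n in neighbours):
                peaks.append((f, t))
    return peaks


# Toy 4x4 "spectrogram": rows are time frames, columns are frequency bins.
grid = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.2, 0.3],
]
peaks = find_peaks(grid)
print(peaks)  # [(1, 1), (3, 2)]
```

Only the two loudest, well-separated cells survive, which is the point of the technique: almost all of the signal is thrown away and a short list of (frequency, time) pairs is kept.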

Shazam builds their fingerprint catalog out as a hash table, where the key is the frequency. When Shazam receives a fingerprint like the one above, it uses the first key (in this case 823.44), and it searches for all matching songs. Their hash table might look like the following:

Frequency in Hz    Time in seconds, song information
823.43             53.352, “Song A” by Artist 1
823.44             34.678, “Song B” by Artist 2
823.45             108.65, “Song C” by Artist 3
. . .              . . .
1892.31            34.945, “Song B” by Artist 2

[Some extra detail: They do not just mark a single point in the spectrogram; rather, they mark a pair of points: the "peak intensity" plus a second "anchor point". So their key is not just a single frequency, it is a hash of the frequencies of both points. This leads to fewer hash collisions, which in turn speeds up catalog searching by several orders of magnitude by allowing them to take greater advantage of the table's constant-time (O(1)) look-up. There are many interesting things to say about hashing, but I'm not going to go into them here, so just read around the links in this paragraph if you're interested.]
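Here is a rough Python sketch of the paired-key idea, using an ordinary dictionary as the hash table. Everything in it is an assumption for illustration: the names pair_key, peak_pairs, build_index and lookup, the rounding used to quantize frequencies, the fan_out of 2, and the toy catalog. I also fold the time gap between the two peaks into the key, which goes slightly beyond the description above. It is not Shazam's actual code.

```python
from collections import defaultdict


def pair_key(f1, f2, dt):
    """Build one hashable key from two peak frequencies (Hz) and their time gap (s).

    The rounding is an assumed quantization so that slightly different
    measurements of the same peaks still produce the same key.
    """
    return (round(f1), round(f2), round(dt, 1))


def peak_pairs(peaks, fan_out=2):
    """Yield (key, time_of_first_peak) for each peak paired with the next few peaks."""
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1 : i + 1 + fan_out]:
            yield pair_key(f1, f2, t2 - t1), t1


def build_index(catalog):
    """catalog maps a song name to its (frequency, time) peaks, sorted by time."""
    index = defaultdict(list)
    for song, peaks in catalog.items():
        for key, t1 in peak_pairs(peaks):
            index[key].append((song, t1))
    return index


def lookup(index, sample_peaks):
    """Return (song, time_in_song, time_in_sample) for every key the sample shares."""
    return [
        (song, song_t, sample_t)
        for key, sample_t in peak_pairs(sample_peaks)
        for song, song_t in index.get(key, [])
    ]


# Toy catalog: both songs contain a peak near 823 Hz, so a key made of a
# single frequency would hit both of them.
catalog = {
    "Song A": [(823.43, 53.352), (1500.00, 53.900), (640.20, 54.400)],
    "Song B": [(823.44, 34.678), (1892.31, 34.945), (712.84, 35.327)],
}
sample = [(823.44, 1.054), (1892.31, 1.321), (712.84, 1.703)]

hits = lookup(build_index(catalog), sample)
for hit in hits:
    print(hit)
```

With single-frequency keys the 823 Hz peak would point at both songs. With paired keys, all three hits land on "Song B" only, which is the collision reduction described above.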

Top graph: Songs and sample have many frequency matches, but they do not align in time, so there is no match. Bottom Graph: frequency matches occur at the same time, so the song and sample are a match. Wang, Avery Li-Chun. An Industrial-Strength Audio Search Algorithm. Shazam Entertainment, 2003. Fig. 2B.

If a specific song is hit multiple times (based on examples in the paper I think it needs about 1 frequency hit per second), it then checks to see if these frequencies correspond in time. They actually have a clever way of doing this. They create a 2D plot of frequency hits: on one axis is the time from the beginning of the track at which those frequencies appear in the song, and on the other axis is the time at which those frequencies appear in the sample. If there is a temporal relation between the sets of points, then the points will align along a diagonal. They use another signal processing method to find this line, and if it exists with some certainty, then they label the song a match.
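One simple way to look for such a diagonal is to note that points on it all share the same difference between song time and sample time, so you can count how often each difference occurs. The Python sketch below does exactly that with a histogram of offsets. The function name best_alignment, the 0.1 second bin width, and the toy hits are my own assumptions for illustration, and this is not necessarily the method Shazam runs in production.

```python
from collections import Counter


def best_alignment(hits, bin_width=0.1):
    """Find the (song, offset) pair that most frequency hits agree on.

    hits is a list of (song, time_in_song, time_in_sample) tuples.
    Returns (song, offset_in_seconds, votes), or None if there are no hits.
    """
    votes = Counter()
    for song, song_t, sample_t in hits:
        # Hits on the same diagonal share the same time difference,
        # so they fall into the same histogram bin.
        offset_bin = round((song_t - sample_t) / bin_width)
        votes[(song, offset_bin)] += 1
    if not votes:
        return None
    (song, offset_bin), count = votes.most_common(1)[0]
    return song, offset_bin * bin_width, count


# Toy data: "Song B" hits line up at one offset, "Song C" hits are scattered.
hits = [
    ("Song B", 34.678, 1.054),
    ("Song B", 34.945, 1.321),
    ("Song B", 35.327, 1.703),
    ("Song C", 108.65, 1.054),
    ("Song C", 20.000, 1.321),
    ("Song C", 75.200, 1.703),
]
song, offset, count = best_alignment(hits)
print(song, round(offset, 1), count)
```

Both songs have three frequency hits, but only the "Song B" hits agree on a single offset (about 33.6 seconds into the track), so only "Song B" would be labelled a match. A real matcher would also require the winning bin to clear some vote threshold.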

#Avery Wang #computer science #data structure #hash #how shazam works #music frequency #spectrogram

Say you have a massive database of customers' mailing addresses. A data entry person puts an address in as "Two N Boulevard." Later, someone else searches for it using "2 North Blvd." Unfortunately,...

A simple PHP code to standardize U.S. mailing addresses

This script takes a Delivery Address Line entered by a user and reformats it in conformity with the United States Postal Service's Addressing Standards as outlined in Publication 28, dated November 1997. It makes things a lot easier and neater when you search for records by address.

(By the way, standardization also improves data quality, which helps with obtaining lower bulk mailing rates.)

For example, I have a test file containing the following addresses:

335 Madison Avenue, New york, NY 10017

Eleven Fifth Avenue, Apartment Six, New York, NY 10022

334 Glory Lane, Building nineteen, Providence, Rhode Island

Thirteen Sunshine Boulevard, Cape Hatteras, North Carolina

Then I run my PHP code:

The output is also written to a text file called output.txt. It looks just fine.

A few days ago, I came across an Address Standardization package. It's written in object-oriented PHP, so in order to use it for my own test data, I need to create an instance of the class and pass in each line of the address. (I'm a PHP newbie, so please feel free to critique or comment. I'd love to know how to make the code better!) Here is my initial code:

<?php
/**
 * @author Justine Leng
 * @date Feb 2011
 *
 * Require the auto loader.
 * Use dirname(__FILE__) because "./" can be stripped by PHP's safety
 * settings and __DIR__ was introduced in PHP 5.3.
 */
require dirname(__FILE__) . '/autoload.php';

$Address = new AddressStandardizationSolution;

echo "\n ***** Standardizing a batch of addresses from a file named test.txt.\n\n";

$file = 'test.txt';
$handle = @fopen($file, "r") or die("could not open file!");
$fp = @fopen("output.txt", "w") or die("could not open output file!");

// fgets() reads a single line and returns false at end-of-file, which is
// good for looping through data of unknown length.
// Max length of a line in this case is 4096 characters. Adjust based on needs.
while (($line = fgets($handle, 4096)) !== false) {
    // Strip the line ending so it is not passed on as part of the address.
    $line = rtrim($line, "\r\n");
    echo $line, "\n";

    $result = $Address->AddressLineStandardization($line);
    echo $result."\n";
    fwrite($fp, $result."\n");
}

echo "\n ***** Standardized addresses are saved to a file named output.txt.\n\n";

fclose($handle);
fclose($fp);

Soon, I started to realize that addresses are often stored with other data. It might sound like a pain to sift through all the data just to extract the address column/field. So I modified the above code, which can now take in all the data and process only the addresses:

<?php
/**
 * @author Justine Leng
 * @date February 2011
 *
 * Require the auto loader.
 * Use dirname(__FILE__) because "./" can be stripped by PHP's safety
 * settings and __DIR__ was introduced in PHP 5.3.
 */
require dirname(__FILE__) . '/autoload.php';

$Address = new AddressStandardizationSolution;

echo "\n***** Standardizing a batch of addresses from a file named testData.csv.\n\n";
$file = 'testData.csv';
$handle = @fopen($file, "r") or die("could not open file!");
$fp = @fopen("output.txt", "w") or die("could not open output file!");

// Skip the header for processing, but write it to the output file:
$line = rtrim(fgets($handle, 4096), "\r\n");
echo "\nOpening test file and writing out the column headers to the output file:\n";
echo $line, "\n\n";
fwrite($fp, $line."\n");

// fgets() reads a single line and returns false at end-of-file, which is
// good for looping through data of unknown length.
// Max length of a line in this case is 4096 characters. Adjust based on needs.
while (($line = fgets($handle, 4096)) !== false) {
    $line = rtrim($line, "\r\n");

    if (preg_match('/^.+,.+,.+,.+$/', $line) > 0) {
        echo "\nThe line before splitting off the address:\n";
        echo $line, "\n\n";

        // Split the string into two parts and store the result in an array
        $array = explode(",\"", $line);

        echo "\nThe address before standardization:\n";
        echo $array[1]."\n\n";

        // The 2nd item in the list is the address, so you want to standardize it.
        // The index for the 2nd item is 1. Drop the closing double quote first
        // so it is not treated as part of the address.
        $array[1] = $Address->AddressLineStandardization(rtrim($array[1], "\""));

        echo "\nThe address after standardization:\n";
        echo $array[1]."\n\n";

        // Add the double quotes back to the address
        $array[1] = "\"".$array[1]."\"";

        echo "\nThe address after adding the enclosing double quotes back:\n";
        echo $array[1]."\n\n";

        // Recreate the comma separated list after standardizing the address
        $result = implode(",", $array);

        echo "\nThe line before writing to the output file:\n";
        echo $result."\n\n";

        fwrite($fp, $result."\n");
    }
}

echo "\n ***** Standardized addresses are saved to a file named output.txt.\n\n";
fclose($handle);
fclose($fp);

It can read both CSV and TXT input files.

Here’s my input data:

Name,ID,Phone,Address

Abc Abc,123456,212-232-3432,"Fifty Six Ave, Building Six, New York, NY 10024"

Xyz Xyz,97532543,353-452-4624,"334 Glory Lane, Building nineteen, Providence, Rhode Island"

Opq Opq,534545422,234-535-2352,"Thirteen Sunshine Boulevard, Cape Hatteras, North Carolina"

Rst Rst,795736,327-687-4573,"Eleven Fifth Avenue, Apartment Six, New York, NY 10022"

And here's the output in the text file:

Name,ID,Phone,Address

Abc Abc,123456,212-232-3432,FIFTY 6TH AVE,BLDG 6,NEW YORK,NY 10024

Xyz Xyz,97532543,353-452-4624,334 GLORY LN,BLDG 19,PROVIDENCE,RHODE IS

Opq Opq,534545422,234-535-2352,13 SUNSHINE BLVD,CPE HATTERAS,N CAROLINA,

Rst Rst,795736,327-687-4573,11 5TH AVE,APT 6,NEW YORK,NY 10022
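In the output shown above the address is no longer enclosed in quotes, so its commas read as extra columns. One way around this is to let PHP's own CSV functions handle the quoting instead of splitting on `,"` by hand. The following is only a minimal sketch under that assumption, not the code from the post: the function name `standardizeCsvColumn` and its parameters are made up for illustration, and `$standardize` stands for any callable that takes and returns a string.

```php
<?php
/**
 * Sketch only: run one column of a CSV file through a standardizer.
 * The name and signature are hypothetical. $standardize is any callable
 * that takes a string and returns a string.
 * Returns the number of data rows whose column was standardized.
 */
function standardizeCsvColumn($inPath, $outPath, $column, $standardize)
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Could not open $inPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Could not open $outPath");
    }

    // Copy the header row through unchanged.
    $header = fgetcsv($in, 0, ',', '"', '\\');
    if ($header !== false) {
        fputcsv($out, $header, ',', '"', '\\');
    }

    $count = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        // fgetcsv() reports a blank line as array(null); skip those.
        if ($row === array(null)) {
            continue;
        }
        // Rows that are too short are written back untouched.
        if (isset($row[$column])) {
            $row[$column] = call_user_func($standardize, $row[$column]);
            $count++;
        }
        // fputcsv() adds the enclosing quotes again wherever they are needed.
        fputcsv($out, $row, ',', '"', '\\');
    }

    fclose($in);
    fclose($out);
    return $count;
}
```

With the class from the post, a call could look like `standardizeCsvColumn('testData.csv', 'output.csv', 3, array($Address, 'AddressLineStandardization'));`, where 3 is the index of the Address column in the sample data.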

Please feel free to tailor the code, reuse it, and contribute to it. Of course, I'd appreciate knowing how I could've made it better.

#PHP code to standardize U.S. mailing addresses

The Dark Side of NoSQL

Original Post

There is a dark side to most of the current NoSQL databases. People rarely talk about it. They talk about performance, about how easy schemaless databases are to use, about nice APIs. The people talking are mostly developers, not operations staff or system administrators. No one asks those, but that is where the rubber hits the road.

The three problems no one talks about (almost no one; I had a good talk with the Infinispan lead [1]) are:

ad hoc data fixing – either no query language available or no skills

ad hoc reporting – either no query language available or no in-house skills

data export – sometimes no API way to access all data

In an insightful comment to my blog post “Essential storage tradeoff: Simple Reads vs. Simple Writes”, Eric Z. Beard, VP Engineering at Loop, wrote:

My application relies on hundreds of queries that need to run in real-time against all of that transactional data – no offline cubes or Hadoop clusters. I’m considering a jump to NoSql, but the lack of ad-hoc queries against live data is just a killer. I write probably a dozen ad-hoc queries a week to resolve support issues, and they normally need to run “right now!” I might be analyzing tens of millions of records in several different tables or fixing some field that got corrupted by a bug in the software. How do you do that with a NoSql system?

Data export: NoSQL databases are affected by these problems to different degrees; each one is unique. With some it is easy to export all your data, mostly the non-distributed ones (CouchDB, MongoDB, Tokyo Tyrant); with others (Voldemort, Cassandra) it is more difficult. Voldemort looks especially weak here.

Ad hoc data fixing: With the non-distributed NoSQL stores, which do possess a query and manipulation language, ad hoc fixing is easier, while it is harder with distributed ones (Voldemort, Cassandra).

Ad hoc reporting: The same holds for ad hoc reporting. The better the query capabilities (CouchDB, MongoDB), the easier ad hoc reporting becomes. For some of those reporting woes Hadoop is a solution. But as the Scala Swarm author Ian Clarke notes, map/reduce does not fit every problem. Either way you need to educate customers and manage their expectations, as they have become addicted to ad hoc reporting. This is not only a technical question, but a cultural one.
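To make the ad hoc data fixing point concrete, here is a minimal sketch in PHP of what such a fix can look like against a store that only supports get and put by key. Everything in it is made up for illustration: `TinyKeyValueStore` is an in-memory stand-in rather than any real NoSQL product's API, and the mixed-case e-mail field is an invented example of a field corrupted by a bug.

```php
<?php
// Hypothetical stand-in for a key-value store that can get and put by key
// and list its keys. Real stores differ, and some cannot list keys at all.
class TinyKeyValueStore
{
    private $data = array();

    public function put($key, $json)
    {
        $this->data[$key] = $json;
    }

    public function get($key)
    {
        return isset($this->data[$key]) ? $this->data[$key] : null;
    }

    public function keys()
    {
        return array_keys($this->data);
    }
}

// Ad hoc fix: lower-case every e-mail address that was stored in mixed case.
// Without a query language, every record is read, decoded, checked, and
// written back by application code. Returns the number of records changed.
function fixEmails(TinyKeyValueStore $store)
{
    $fixed = 0;
    foreach ($store->keys() as $key) {
        $record = json_decode($store->get($key), true);
        if (!is_array($record) || !isset($record['email'])) {
            continue; // nothing to fix in this record
        }
        $clean = strtolower($record['email']);
        if ($clean !== $record['email']) {
            $record['email'] = $clean;
            $store->put($key, json_encode($record));
            $fixed++;
        }
    }
    return $fixed;
}
```

With a SQL database the same repair would be a single UPDATE statement. Here it depends on the store being able to enumerate its keys, which is the data export problem again.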

One solution is to split the data that needs to be queried or reported on (User, Login, Order, Money) from the data that needs the best performance (app data, social network data). Use a traditional SQL database for the first kind of data, and a fast, distributed NoSQL store for the second kind. Joining across the two will be difficult, you need to support more systems, and skills are an issue. But the three problems can be solved this way.

What is your NoSQL strategy? Please leave a comment, I would like to know.

[1] They plan a distributed query language for ad hoc reporting in distributed environments.

#The dark side of NoSQL

How to Install iTunes 10 on Ubuntu 10.10

Wine is a compatibility layer that lets you install Windows programs on Unix-like systems, including Ubuntu 10.10 Maverick Meerkat. Using Wine, it’s possible to install Apple’s iTunes jukebox software on an Ubuntu system. The implementation isn’t quite perfect, but it will let you play music and video files. (However, the iTunes store itself will not work.) Granted, you’re probably better off using Rhythmbox and the Ubuntu One music service, but if you insist upon iTunes, Wine will let you install it. And here’s how.

(Note: depending upon your video hardware and system configurations, these instructions may or may not work.)

First, you’ll need to lay some groundwork. You’ll need to install the ubuntu-restricted-extras package, so that iTunes will be able to play MP3 and AAC files. (If you clicked the option to install the MP3 codec during the Ubuntu install, you can skip this step.) You can do so with this terminal command:

sudo apt-get install ubuntu-restricted-extras

Enter your password to authenticate, and apt-get will download and install the ubuntu-restricted-extras package for you.

Once that is done, you’ll need to install Wine itself.

sudo apt-get install wine

Enter your password to authenticate, and apt-get will download and install Wine. The combined packages come to a little over a hundred megabytes, so it might take a while to install depending on your connection speed.

After the installation is finished, you’ll have a new Wine category added to your Applications menu. The Wine category will have four subcategories:

Programs, which lets you browse the installed Wine programs on your system.

Browse C: Drive, which lets you browse the C: drive structure Wine emulates for Windows programs.

Configure Wine, which lets you tweak settings for both individual programs and Wine as a whole.

Uninstall Wine software, which lets you remove Windows programs installed via Wine.

To install Windows software in Wine, you need to right-click on the installer file and select “Open With Wine Windows Program Loader”.

You can also install Windows software with Wine through the command line, which is often the easier way to do it. For instance, to run an installer named INSTALL.EXE from your Downloads folder, you would use this command:

wine ~/Downloads/INSTALL.EXE

If the application did not install, you’ll probably have to change the Wine settings. The best course is probably to browse the Wine application database and see if you can find the correct settings there.

After Wine is installed, download the iTunes installer from Apple. Once the download is complete, go to your Downloads folder, right click on the iTunesSetup.exe file, and click “Open With Wine Windows Program Loader”. After a few moments you should see the installer program for iTunes appear. Follow the default prompts to install iTunes.

Depending on your video hardware, your screen might go black during the installation. You can usually get it back by clicking the mouse and moving the pointer around.

After the installer has completed (which should take several minutes), let iTunes run for the first time, and follow the default settings. (If iTunes should freeze up during the first run, reboot your computer, and it should resume normally the next time you launch iTunes).

iTunes will then gather data about any music files in your Music folder, and add them to your iTunes library. From that point on, iTunes should be functioning, though depending upon your video hardware and configuration, you might continue to see graphical artifacts on the screen. And note also that the iTunes store will probably not work.

After you close iTunes, you can launch it again by going to the Applications menu, to the Wine category, to Programs, to iTunes, and clicking on the iTunes icon.

Happy iTuning!

#How to Install iTunes 10 On Ubuntu 10.10 #Install iTunes 10 On Ubuntu 10.10 With Wine
