All social media engagements from Big Data Week - powered by eventifier
The event of this week is from London, UK. If you're not serious about data - you're missing out. When James Bond's arch enemy owns a data centre in the middle of the desert in Morocco, it's time even you start taking Big Data seriously.
Big Data Week is a global platform of interconnected community events focusing on the social, political and technological impacts of Big Data (Twitter: @bigdataweek, #bdw15).
The festival connects a number of global cities through locally hosted meetups, events, networking functions, data visualization demos, debates, discussions and hackathons. Here are the top Twitter handles during the event.
Big Data Week Leaderboard - by Eventifier.
They're working to connect communities of specialists from various backgrounds (Data Science, Data Technology, Data Visualization and Data Business) across major industry sectors: Media and Entertainment, Health and Science, Financial Sector, Retail and FMCG, Public and Government, and Social and Personal. That's pretty much the whole of human civilization, don't you think?
Make sure you check BDW's regular updates on Twitter and other channels, and play a significant part in this data-driven change in our society. Showcase your own event at eventifier.
It's Big Data Week this week, with a wide range of data-related events taking place in cities around the world.
My wife is travelling this week, so I'm staying close to home. Luckily, the excellent C4DI just down the road in Hull is hosting two events and I can make one of them...
Get out there, and support an event close to you...
Big Data Week Day 2: How Argonne is using supercomputer Mira to crunch mega-sized data to create visualizations from the formation of galaxies to aneurysms
(A visualization by the Argonne Leadership Computing Facility utilizing real patient MRI data displays blood flow of a brain aneurysm.)
Joseph Insley, principal software development specialist at Argonne National Laboratory, gave a presentation about how the lab is using its supercomputer Mira to create visualizations that crunch data at a galactic size and monstrous computation speeds.
Insley works with the Argonne Leadership Computing Facility (ALCF), which allows outside groups to apply for time to use the facilities. (It's really competitive!)
It's that processing power that allows the ALCF to process and store massive amounts of data.
Processing the data is not enough; displaying it can require a bit of horsepower as well. Insley said that's where "Tukey" comes into the picture. Tukey is the compute cluster that processes the images for visualizations.
To provide some context, a typical MacBook Pro utilizes a single NVIDIA GeForce GT 650M graphics chip. Tukey contains 96 AMD dual-Opteron compute nodes, each containing 16 CPU cores, 64 GB of RAM and two NVIDIA Tesla M2070 GPUs with 6 GB of RAM each.
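To put those per-node figures in aggregate terms, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above; the variable and function names are illustrative and are not taken from Argonne.

```python
# Per-node figures as quoted in the article; all names here are illustrative.
NODES = 96
CPUS_PER_NODE = 16
RAM_GB_PER_NODE = 64
GPUS_PER_NODE = 2
GPU_RAM_GB_PER_GPU = 6


def cluster_totals():
    """Multiply the per-node figures out across the whole cluster."""
    total_gpus = NODES * GPUS_PER_NODE
    return {
        "cpus": NODES * CPUS_PER_NODE,
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "gpus": total_gpus,
        "gpu_ram_gb": total_gpus * GPU_RAM_GB_PER_GPU,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

That works out to 1,536 CPUs, 6,144 GB of system RAM and 192 GPUs carrying 1,152 GB of graphics memory between them, against the single graphics chip in the laptop.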
It's that visualization tonnage paired with Mira's computing power that allows ALCF to tackle projects in climate, nuclear energy, medicine and astrophysics.
One of the projects Insley has worked on is the visualization of arterial blood flow, to accurately model physical and biological systems.
Insley said this requires "simulating on multiple scales, [which] results in very large, complex data sets."
"Problems like this and many others that run on our biggest machines, visualization is often the best and sometimes the only way to really make sense of all this data."
In the animation example below, they used real patient MRI data to study an aneurysm in the brain, analyzing large scale blood flow.
The visualization is essentially figuring out how a person's blood flow on a large scale affects one tiny region of the brain on the particle level.
Another example that Insley worked on visualizing was a cosmology simulation, which sought to accomplish a little task like "the evolution of the universe."
According to Insley, the job is so large that it is an ongoing simulation that occasionally pauses to collect data before resuming.
Insley said the code that powers it tracks "the behavior and interactions of individual particles, actually 1.1 trillion of them."
He said it's the largest simulation of its kind to date.
Insley said there's certainly a limit to the amount of data they can process. Writing the data to disk becomes more difficult as its size grows.
He said the ALCF actively looks at ways to do analysis and visualization on the data before it gets written to disk.
When asked about the importance of visualization for scientists, Insley said it can be used for validation. "They expect things to be one way, and the visualization can reaffirm that."
He said at some point in the future you can take the scans and immediately be able to run a simulation of what's happening in a patient.
For now, the tricorders and starship scanners may be on hold.
Big Data Week Day 2: Cook County seeks to use Big Data and analytics to cut costs — and implement Obamacare
(Dr. Bala Hota of Cook County Health and Hospitals System, and Lydia Murray, CIO of Cook County)
Patient records are one of the largest pillars of Big Data to be tackled.
While the politics of the Patient Protection and Affordable Care Act, often dubbed ObamaCare, are far from over, there were several key provisions that mandated a switch to electronic records.
The integration of patient records into broader federal government reforms has been a big task, one that has strained rural hospitals but jump-started cost-cutting at larger ones.
In Illinois, the task of getting Cook County's records on the grid falls on Dr. Bala Hota.
Hota heads up the effort to digitize medical records and streamline workflows at Cook County Health and Hospitals System.
He has a master's in public health and specializes in infectious diseases, having done a residency at Rush University Medical Center.
He and Lydia Murray, Cook County's chief information officer, made a presentation and took online questions for Big Data Week.
Murray was appointed to that position by county President Toni Preckwinkle last July. She is a former deputy chief of staff under Mayor Richard M. Daley.
In an odd role reversal, Murray actually interviewed Hota, with host Steve Boyce occasionally popping in with questions.
"The Affordable Care Act has been something really driving change at the health system, pretty incredibly," Hota said.
"Cook County Health and Hospitals System is one of the first sites in the country to be able to get access to the expanded Medicaid population that is going to be present in 2014 for most of the country," he said, referring to a waiver granted last year.
It's that waiver that he says the county is using as a launchpad to streamline electronic records, essentially forcing the Cook County Hospital to assume the financial risk of not adapting fast enough.
For Cook County residents who are U.S. citizens living at the poverty line, "Cook County Health and Hospitals System will assume the risk for care for those patients," Hota said.
"In other words, we will receive a per-member-per-month payment from the federal government, and in exchange, we need to improve the outcomes for those patients and decrease costs."
Like businesses, Cook County is seeking a quick return on investment by consolidating data warehouses and applying analytics to measure performance in hopes of streamlining care.
Hota also cited the need for the system to comply with the HITECH Act, which passed in 2009 and mandates the "meaningful use of health information technology."
What does this all come down to?
This essentially requires Cook County to adopt a massive IT overhaul, while retraining staff to adhere to new practices.
This can take the form of doctors being required to fill out fields and notes while observing patients.
(This slide is an example of the information doctors will have available when observing a patient.)
The end result takes the form of interfaces that feed into a database, which allows administrators to measure the performance of doctors and, they hope, the quality of care.
And doctors, staff and others using the system will have a report card of sorts documenting how well they're inputting data, as shown in the image below.
Hota said that utilizing metrics could better assist in preventative care, being able to use a database history of medications, previous visits and measurement of symptoms to help staff diagnose and better treat patients.
From an administrative point of view, it's about managing resources, reducing paperwork and expediting scheduling.
The end result would look like this: reducing the number of patients who don't show up to appointments, or preempting an appointment with analytics to better inform doctors of a patient's health history.
As with all Big Data systems, privacy is a central concern for health records.
The same day that Hota gave his presentation for Big Data Week, the Chicago Tribune published an article about breaches in security with patient records in Illinois, including three at the Cook County Health and Hospitals System.
Hota said their new systems will include security permissions that ensure only the care providers (doctors) who need access to patient records will have it.
The Health Insurance Portability and Accountability Act (HIPAA) is the law that requires health care providers to ensure patient records are kept private.
However, for the purposes of addressing public health problems on a larger scale, Hota said the data collected can be used in collaboration with outside research institutions.
Hota said there are opportunities for the information to be released, but it would have to be done very carefully and within guidelines.
When asked what type of information would need to be stripped out for the data to remain private, but useful, Hota gave a few examples.
"Personally identifiable information, personal health information, the security level around that data, it's essential that that be preserved."
According to Hota, HIPAA has language that describes what information should be private, such as name, date of birth, date of care, address and Social Security number.
Releasing information to outside groups or institutions that may use the data to study infectious diseases, for instance, may require the use of a "limited data set."
Hota said a limited data set might be "the removal of a name, use of the age, and not the date of birth; removal of all dates to the resolution of the month and year – and use of zip, not the single point address."
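As a rough illustration of the transformation Hota describes, the sketch below reduces one patient record to a "limited" version. The field names, the sample record and the function name are hypothetical, and this is not a statement of what HIPAA requires; it only mirrors the examples he gave (drop the name, keep age instead of date of birth, reduce dates to month and year, keep the zip code instead of the street address).

```python
from datetime import date


def to_limited_record(record: dict) -> dict:
    """Reduce a patient record along the lines Hota described.

    All field names are hypothetical. Fields not copied below (name,
    street address, Social Security number) are dropped.
    """
    dob = record["date_of_birth"]
    care = record["date_of_care"]
    # Age in whole years at the time of care, instead of the date of birth.
    had_birthday = (care.month, care.day) >= (dob.month, dob.day)
    age = care.year - dob.year - (0 if had_birthday else 1)
    return {
        "age": age,
        # Dates reduced to the resolution of month and year.
        "care_month": f"{care.year:04d}-{care.month:02d}",
        # Zip code is kept; the single-point address is not.
        "zip": record["zip"],
        "diagnosis": record["diagnosis"],
    }


if __name__ == "__main__":
    sample = {
        "name": "Jane Doe",  # made-up example data
        "ssn": "000-00-0000",
        "address": "1 Example St.",
        "zip": "60612",
        "date_of_birth": date(1970, 6, 15),
        "date_of_care": date(2013, 4, 23),
        "diagnosis": "influenza",
    }
    print(to_limited_record(sample))
```

Building the output from an explicit list of allowed fields, instead of deleting known identifiers from a copy, means a new identifying field added to the source records later is left out by default.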
Big Data Week Day 2: Translating Web metrics into English with Quill
By Elliott Ramos
Does your site use Google Analytics, but you have no idea what the results mean?
Larry Adams (@larryadams), Vice President of Product for Narrative Science, is hoping Quill can help deal with the Big Data of websites: metrics.
While news sites and big sales sites such as Amazon regularly employ analytics experts who parse site performance and find out how readers and customers are using their sites, many businesses and even smaller governments don't know what their users need.
The most basic of these tools is the Google Analytics platform, which can tell a small business owner how well their site is performing. For larger sites, the industry standard is Adobe Analytics, formerly Omniture.
It's nice if a company can afford resources dedicated to a large-scale site, but often a person managing a particular department or small municipal website has to figure out the performance of a product or project, only to have a webmaster hand over the credentials to the Google webmaster account and say: 'Figure it out.'
Adams touts Quill as a potential solution for people with the following three needs:
Content-oriented sites (After an audience)
Lead generation (After contacts)
Sales-oriented sites (After dollars)
Adams said they wanted to "have a little more empathy with our own customers."
He said Quill will automatically analyze key metrics and generate user-friendly reports for website owners.
He said Quill will also find trends and make comparisons across one year of data.
While Google is considered the mass market option for many in the tech world, for those who can't even pick apart HTML, it may as well be Latin.
Adams said Quill hasn't built in functionality for the public sector yet, but is seeking input from nonprofits, and that there has been a heavy focus on making the reports mobile friendly.
"A lot of companies have been using big data for years and years, but we just didn't call it that."
That's what Russell Lankenau (@RussLankenau) outlined in his online presentation "Beyond Social Media. Big Data for Business."
Lankenau is a Central Solutions Architect for MapR Solutions.
His company helps other firms take existing infrastructure and make it more adaptable.
Last month, the Wall Street Journal reported on Big Data solutions that are being used by product development teams and even HR groups.
While predicting presidential elections and scary futuristic advertising that knows your needs are the more attention-grabbing parts of Big Data, it is also the infrastructure that supports the likes of Twitter, bitly, Craigslist and Facebook.
Many of those companies use a framework called Hadoop, which is the same framework that powers the city of Chicago's data portal website — at no cost. (Still paying for Socrata, though!)
He said businesses should address the three V's when it comes to Big Data: Velocity, variety and volume.
Velocity of data:
He used Facebook and Twitter as examples of data having velocity, with millions of users contributing. He said, however, that velocity brings problems: typically a business's toolset is too slow to cope with big data.
The key problem you're looking at is: "I have data, I need to process it and it takes this long." It's a problem with your technology and the way you're processing it.
Variety of data:
Lankenau said this is when businesses have a lot of different types of data. (For media outlets, this would be tantamount to combining and making sense of multiple data sets to tell a single story.) Lankenau said companies typically develop "point solutions" for each problem, but using multiple data sets can spell trouble because the tools can't adapt. They work in only one way. "It can take months to build out new data. Single-purpose tools are not flexible. It's not easy to add new data or new tasks," he said.
Volume:
"Why do we have so much more data than we do before? We don't. New uses of existing data. To do that, they need to store more data," Lankenau said. This is where Hadoop comes in.
He said this allows for "one platform to do all of your analysis needs... a single source of truth for our data."
In corporate terms, Lankenau is talking about moving old-school methods of storing business information, or in the case of a city, records on police computers, CTA computers and others, onto a single platform that doesn't crash or require expensive software and massive supercomputers.
"Existing applications can tie directly into your Hadoop infrastructure. Make it easy to manage via a single Web interface," he said.
His company specializes in upgrading firms to be in sync with Web 2.0. "Single point solutions" can take the form of an expensive piece of software that IBM will sell a business or government to address a particular need, but when a new problem or hurdle arises, it can be hard for that software to do the job, requiring yet another piece of software or hardware that will be outdated by the time it's implemented.
When asked about the expensive solutions, he said they can run into the millions of dollars. He added that security with Hadoop, while always a concern, is only now becoming a focus, and that private firms will often try to restrict access as much as possible.