As I was walking through the exhibits of O’Reilly’s “Making Data Work” Conference a few weeks ago, a vendor stepped in my path:
Vendor: “What is your organization’s Hadoop strategy?”
Having done a metric crap-ton of events as a vendor, I was sympathetic to what he was trying to do. However, his premise that a tool was the One True Technology (OTT) my business should have embraced as a strategy struck me as absurd. He might as well have asked what we’re doing about Microsoft PowerPoint or vim.
Jim: “We’ve been using it the last five years as a floor wax. However, I recently discovered it also makes an excellent dessert topping.”
Flummoxed by a non sequitur his Solution Selling course could not prepare him for, he disengaged, letting me go about my business. There seemed to be a lot of businesses offering Hadoop Business Strategy Optimization.
It was great being out of the office again, talking to people working on similar types of data problems but in completely different environments. This is one of the things I’ve missed most since leaving Tecplot.
I’m still compiling my notes for an internal presentation, but several of the keynote presentations have been posted. My top three:
- David McRaney, Survivorship Bias and the Psychology of Luck, which is a redux of his podcast, but still a great talk. The premise is that when failures become invisible, you tend to focus more on successes, not realizing that you’re missing some vital pieces of information.
- David Epstein, Small Data in Sports: Little Differences That Mean Big Outcomes – this was timely given the Olympics starting. The performance gap between the winner and the runner-up is typically less than 0.5%. While there are efforts to gather lots and lots of data, there are successful applications using small data, reducing a sport to a small handful of things an athlete can affect. (His longer talk went into this in more depth.) The punchline: the 10,000-hour rule is missing the +/- 10,000 hours.
- Rodney Mullen, The Art of Good Practice. What I liked most about this was the meta-message that “Everyone from the community comes with their own backgrounds, own attributes, something you don’t have.” Though it would have been easy to gloss over the presentation because of the skateboarding vernacular (“bracketing the feeble grind”), he offered some interesting ideas about focusing the type of practice.
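To make McRaney’s survivorship-bias premise concrete, here is a minimal Python sketch. It is my own toy illustration, not something from the talk: the function names (`simulate_ventures`, `survivors`), the `cutoff` parameter, and the assumption that outcomes are pure luck drawn from a normal distribution are all hypothetical.

```python
import random
from statistics import mean


def simulate_ventures(n=10_000, seed=42):
    """Return n outcomes driven purely by luck (assumed: normal, mean 0)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def survivors(outcomes, cutoff=0.0):
    """Keep only the outcomes we would still see: those above the cutoff."""
    return [x for x in outcomes if x > cutoff]


outcomes = simulate_ventures()
visible = survivors(outcomes)

# The failures are dropped from 'visible', so its average is pulled upward
# even though no venture had any skill at all.
print(f"all ventures:   n={len(outcomes)}, mean outcome {mean(outcomes):+.2f}")
print(f"survivors only: n={len(visible)}, mean outcome {mean(visible):+.2f}")
```

The survivors look better than the population they came from purely because the failures were filtered out before anyone looked, which is the missing information the talk warns about.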
The tutorials were generally good. I’d planned to sit in on the MLBASE track but, at the last minute, switched to John Foreman’s Dissecting Data Science Algorithms Using Spreadsheets, based on his book, Data Smart, in which he gives an overview of a handful of important algorithms using Excel.
While you’d be unlikely to use Excel for any non-trivial problem, it lets you focus on learning the underlying algorithm (so you can apply it in the appropriate business context) rather than on programming. For the non-trivial case, you’d likely use R, Weka, or the Berkeley Data Analytics Stack (BDAS).