Friday, 11 March 2016

What (or who) is Kaggle?

Introduction

I met a good friend for lunch today who is well-informed on almost anything (I really mean abso-bloomin’-lutely everything) in the IT arena. I brought up the subject of Kaggle in conversation and he looked completely blank. I tried to explain what it is, but I don’t think I did a very good job, so I thought I’d put that to rights here.
Kaggle, as per their website (https://www.kaggle.com/), is a “vibrant community [which] comprises experts from many quantitative fields and industries (science, statistics, econometrics, math, physics). They come from over 100 countries and 200 universities. In addition to prize money & data, they use Kaggle to learn, network, and collaborate with experts from related fields.” In other words, it opens up real-life data to a vast audience who might otherwise have no access to that kind of project, even though they may be exceptionally good at it.
The website also explains why: “Many organizations don't have access to the advanced machine learning that provides the maximum predictive power from their data. Meanwhile, data scientists and statisticians crave real-world data to develop their techniques. Kaggle offers companies a cost-effective way to harness this 'cognitive surplus' of the world's best data scientists.”
I have to say, at this stage, that I was only recently introduced to Kaggle myself, and I think it’s a great idea. It’s an excellent way to learn data analytics and to receive feedback, along with the recognition due to anyone who takes part in its many competitions. The main languages used are R and Python, although SQLite and Julia scripts also appear in the scripts section of the site.

History

Anthony Goldbloom founded Kaggle in 2010. Early in 2014, according to Inc.com, Kaggle had over 140,000 members. The article calls them data scientists, but the membership includes many leading lights in the data analytics field, IT people who want to know more (like me), students and a whole plethora of other interested parties. Goldbloom’s idea was to create a way of solving problems through data science: companies could post their problems on the website, and any interested statistician could then submit a solution that would be scored against all the other entries. He arrived at the name by writing an algorithm to see which one-word names he could still get a URL for.

The site

Most, if not all, of the competitions cannot be solved by one area of expertise alone. The member or team needs a distinctive blend of skills and resources. Because there can be many ways to tackle a given challenge, the aim of each competition is to find the solution that fits best. On many websites and blogs the same words appear over and over again: perseverance and persistence. Both are required, as there is a lot of trial and error involved in reaching the finishing line.
Most competitors are motivated not by money but by the challenge of the competition itself. Many members use their own algorithms and develop them continuously, so there is no such thing as resting on your laurels. The competitions seem to be regarded as an ongoing development process rather than as one-off challenges.

That’s a very high-level overview which doesn’t come anywhere near the depth of what the site offers; it’s only a taster. The competitions are free to enter and the site is well worth a visit.
