Introduction
Today I met a good friend for lunch who is well-informed on almost
anything (I really mean abso-bloomin’-lutely everything) in the IT arena. I brought up the subject of Kaggle in
conversation and he looked completely blank.
I tried to explain to him what it is, but I don’t think I did a very good
job, so I thought I’d put that right here.
Kaggle, as per their website (https://www.kaggle.com/), is a “vibrant community [which] comprises experts from many
quantitative fields and industries (science, statistics, econometrics, math,
physics). They come from over 100 countries and 200 universities. In addition
to prize money & data, they use Kaggle to learn, network, and collaborate
with experts from related fields.” In
other words, it brings the opportunity of working with real-life data to
a vast audience who might not otherwise have access to projects of this kind,
and who may be exceptionally good at them.
The website also explains
the why: “Many organizations don't have
access to the advanced machine learning that provides the maximum predictive
power from their data. Meanwhile, data scientists and statisticians crave
real-world data to develop their techniques. Kaggle offers companies a
cost-effective way to harness this 'cognitive surplus' of the world's best data
scientists.”
I have to say, at this stage, that I was only recently
introduced to Kaggle myself, and I think it’s a great idea. It’s a great way to learn data analytics and
also to receive feedback and the recognition that anyone who takes part in
the many competitions deserves. The main
languages used are R and Python, although SQLite and Julia scripts are also
shown in the scripts section of the site.
History
Anthony Goldbloom founded Kaggle in 2010. Early in 2014, according to Inc.com, Kaggle
had over 140,000 members (the article calls the members data scientists, but the
list includes many leading lights in the data analytics field, IT people who
want to know more (like me), students and a whole plethora of other interested
parties). Goldbloom’s idea was
to create a way of solving problems
through data science. Companies could post their problems on the website, and
then any statistician who was interested could submit a solution that would be scored against all the other
entries. He arrived at the name by writing
an algorithm to see which one-word names he could get a URL for.
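To make the "scored against the other entries" idea a little more concrete, here is a minimal sketch in Python. It is not Kaggle's actual scoring code: the metric (root-mean-square error), the function names `rmse` and `rank_entries`, and the team data are all assumptions made up for illustration, and real competitions each define their own metric.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error: lower means closer to the hidden answers."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and answer lengths differ")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(actual))


def rank_entries(entries, actual):
    """Return (name, score) pairs sorted from best (lowest error) to worst."""
    scored = [(name, rmse(preds, actual)) for name, preds in entries.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up data for illustration only (assumed, not from any real competition).
hidden_answers = [3.0, 5.0, 2.0]
entries = {
    "team_a": [2.5, 5.5, 2.0],
    "team_b": [3.0, 5.0, 2.0],
    "team_c": [4.0, 4.0, 3.0],
}

leaderboard = rank_entries(entries, hidden_answers)
for position, (name, score) in enumerate(leaderboard, start=1):
    print(position, name, round(score, 3))
```

Every entry is measured against the same hidden answers, so the ranking depends only on how well each entry predicts, not on who submitted it or how they got there.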
The site
Most, if not all, of the
competitions cannot be solved by one area of expertise alone. The member or team must have a unique blend
of skills and several different factors available to them. The aim of the competition is to find the
solution that is the best fit, as there can be many ways to solve the proposed
challenge. On many websites and blogs
the same words appear over and over again: perseverance and persistence. Both are required, as there is a lot of trial
and error involved in getting to the finishing line.
Most competitors are motivated not by money but by the challenge
of the competition itself. Many
members use their own algorithms and develop them on a continuous basis, so
there is no such thing as resting on your laurels. Members seem to see the competitions as an
ongoing development process rather than as one-off challenges.
That’s a very high-level overview that doesn’t come anywhere
near covering the site in depth; it is only a
taster. The competitions are free to
enter and the site is well worth a visit.