Horse with a Pointy Hat

Musings of a data scientist and recovering astrophysicist

Why "Horse with a Pointy Hat"?

We keep hearing that big data is the new oil and that data scientist is the sexiest job of the 21st century. What I knew, in 2012-13, was that I was living in the heart of Silicon Valley, seeing a lot of cool tech start-ups and hearing about lots of cool new big data techniques and algorithms. At the time I was an astrophysicist working at Stanford; I'd always been closer to the data than the theory and had had to incorporate disparate datasets. I had also recently started dabbling in machine learning, so all the buzz in the valley sounded really exciting: there were lots of new challenges out there for which the skillset I'd developed since my PhD was hugely suited. And I was looking for new challenges...

However, the first part of that challenge was how to make the transition from an academic environment to an industry one. Part of it, I knew, was skilling up in certain areas of data science that my research career had not fully prepared me for, but just as important was getting some real experience and building a strong network. Fortunately I saw a post from Kim Nilsson; she had founded a company, Pivigo, and was planning to run a data science bootcamp called Science 2 Data Science (S2DS). Kim was looking for insight into what current academics would look for in a programme to aid the transition into a data science role. Because I was already exploring these questions myself, we had a good exchange of ideas, which led directly to my (successful) application to join the first S2DS London cohort back in August 2014.

At any rate I had a great experience and had the opportunity to do data science in a business context, all of which contributed to my decision to leave academia. The details of the experience aren't really the point of this post (perhaps a future edition); what it's been working towards is why this blog is called what it is. We often hear about people searching for data science "unicorns", mythical beings that don't really exist. This was frequently referenced during S2DS, so when I gave the class speech at the graduation dinner I effectively said that while we might not be unicorns, we are at the very least horses with pointy hats. Some friends requested it, so I've included the full text of the speech below:

Good evening everyone, it is an honour to be standing here before you and speaking on behalf of all the S2DS participants: despite having graduated far too many times, this is the first time I’ve been asked to speak. "May you live in interesting times" is an apocryphal quote that, regardless of its origins, is somewhat apt in the era of "Big Data". And it has certainly been "interesting times" for my fellow participants and me over the past five weeks. The volume, velocity and variety of the information that has been thrown at us in the lectures and projects has been pretty intense, and I just hope that the veracity was there too!

All of us who have grasped hold of the opportunity to join S2DS have come from advanced academic backgrounds as well as from diverse subject areas and specialties. One of the greatest things I have found is that we all bring unique ideas and approaches to the table, which indicates to me the sheer wealth of power that we all can leverage in an industry that needs innovative, analytic problem solvers.

I think that one thing that, without doubt, everyone in S2DS will now whole-heartedly agree on is that the old data science adage that 80% of your time is spent cleaning and wrangling your data is 100% true. And even then the chances that any single data sample will be complete are pretty low, which reminds me:

"There are two types of people in this world: those who can extrapolate from incomplete data and those that ..."; I hope that we are all in the former category.

I come from an Astrophysics background and have been fortunate enough to work around the globe and in some large space telescope collaborations. It was while I was working at Stanford in California that I first encountered the term "data scientist" and that I saw friends and colleagues moving into the sector. And what surprised me was how exciting and "research-like" the work they were doing was and how it required the skills I’d been building in academia, was innovative and had great discovery potential. Plus the salaries were rather nice too! I just hope that the data science salaries in the UK and London markets start catching up with Silicon Valley.

But this is meant to be a speech on behalf of the S2DS participants, not simply my thoughts and experiences, so, taking a leaf from Kickstarter, I crowdsourced part of this speech, although the only reward I offered was anonymity!

“The Spark tutorial: Beauty of Scala + Machine Learning = Awesome!”

“A gargantuan multitude of great people, buzzwords, and non stop discussion.”

“How to chew the cud, and the nipple dance”

“Being in London with all these great people has really helped me decide what I want to do in life... Which probably isn't academia!”

“I finally graduate having worked with the best team I've ever had.”

"I came to S2DS and I didn't even get a lousy T-shirt"

“I already knew that working with data is both fun and useful. The point is that thanks to this programme now I know how to do it properly.”

Despite this being a small sample, I think I can safely say that the sentiment analysis was resoundingly positive! From my perspective, this has been a great opportunity to meet some very interesting people, both working and aspiring data scientists. The practical experience of working with commercial data and interacting with business problems has been a fantastic challenge in applying skills and techniques that we have been developing over our academic careers in a context that is out of our comfort zone, and with the realisation that we all can do it successfully. And if I’ve learned nothing else these past five weeks, it’s that most data scientists we’ve met seem to HATE the label "Big Data".

So let me draw to a close by offering a big thank you for all of the support, both financial and technical, from all of the S2DS sponsors and mentors; and remember if you’re looking to employ some data science unicorns you are currently sat in a room full of, at the very least, data science horses with pointy hats on.

A tremendous thank you to Kim and Jason for organising and running the programme and giving us all the opportunity to develop our practical data science skills. And all that remains is for me to congratulate everyone once again for making it to graduation; good luck in the future, and stay in touch.

