On Building A Successful Big Data Analytics Organization
Nowadays, Big Data Science and Analytics are some of the hottest areas in the fields of science, technology, and business. Most business leaders and executives have realized by now that they need some “Data Scientists” or “Big Data Analysts” to do something special, new, and beyond traditional techniques and tools. Business leaders have started to understand that they need to pay more attention to data capital, which should be treated on par with financial and human capital. Data is becoming a valuable asset for those companies that can take advantage of it. To do so, one needs the right people and needs them now.
Where is one to find all those folks who know Hadoop and Spark, SAS and R, and have expertise with random forests and support vector machines? Specifically, how does one find them quickly when everyone else is also looking for them? How does one hire them without breaking the bank? Also, how does one avoid hiring the wrong people: those who cleverly use big data jargon to mask an absence of true depth?
Finding the right data scientist is not easy to do because most of these folks are either hired already or are too expensive. Thus, hiring great data scientists is becoming quite difficult as the supply and demand balance is heavily skewed towards demand.
With this in mind, I would like to share some tips from my personal experience in building such organizations in the last 3-4 years. Judging by our results at Seagate, these rules have led to several successful organizations. However, this process might not be suitable for everyone.
Rule number one: we only hire PhDs. The justification for this is simple: we want to have people who have undergone the rigorous “training” that a PhD program provides. Such individuals have not only obtained advanced knowledge and skills in some relevant field but have also learned to work independently, analyze literature and all the known facts, summarize their findings, identify main current problems, propose possible solutions and defend them with data, collaborate well, and execute under the pressure of tight deadlines. Also, they were trained to write papers and reports and to give clear and to-the-point presentations, including public speeches (at conferences, etc.). From what I’ve seen, the difference in compensation between new hires with a PhD and those with an MS is no more than 10-20 percent, which is a small price to pay for the extra 3-4 years of training.
Clearly, the above rule does not apply in cases when we need a software developer or someone else with other specific skills.
Rule number two: all candidates (with some rare exceptions) need to have some math/numerical simulation/modeling experience. This will almost universally require strong programming skills, although not necessarily with the languages that are the most relevant to Big Data Analytics, such as R and Python. They can learn those languages later and quickly (in our experience, in less than 3 months). Don’t be concerned when you hear “Fortran” or “C++”. I haven’t seen a single case when a candidate hasn’t learned the “new language of analytics” quickly.
Rule number three (and this is the most important rule): a demonstrated ability to learn quickly is critical. Data science is a rapidly evolving field, so experience with a particular technique is not as important as a demonstrated ability to learn. In addition to candidates from pure data science, consider outstanding individuals from various science and engineering fields with strong analytical requirements. Over time, candidates meeting these requirements have proven to be the “superstars” we hoped they would be. In other words, once a star, always a star. Remember: talented people can always learn new skills, which is especially important in such a dynamic and fast-changing field as modern data science.
Rule number four: you still need a few very experienced folks in the group, such as experts in machine learning or data mining. However, assuming that rules 1-3 are followed closely, the organization becomes so strong and capable that you need no more than 10-20 percent of your folks to be those experts early on; they can guide and mentor the rest of the team. Finding a few data science experts and many more superstars is a much simpler and more realistic thing to accomplish than finding all superstar experts.
Rule number five is a universal rule I personally use for hiring: always ask a candidate to start his/her live interview with an hour-long presentation (this time should include a Q&A session). The presentation may be given on any relevant subject chosen by the candidate. Tell the candidate upfront that this is his/her chance to show off, to demonstrate his/her abilities, and to present knowledge and skills in the best possible light. Believe it or not, most of our hiring decisions are made by the end of this hour. We still follow it up with 5-6 thirty-minute 1:1 interviews but, historically, they simply solidify the initial conclusion we reached at the end of the presentation.
Rule number six: be patient with your new hires. Expect them to spend 2-4 months adapting and learning new things before they start delivering at 100 percent (this would include all the new programming languages and tools from rule number two). Most attempts to hire a perfect candidate ASAP, the one who will “hit the ground running,” will likely result in major delays and might force you to make your final decisions in a hurry when you run out of time.
All in all, in my opinion, it is not terribly difficult to build a world-class Big Data Analytics organization. Just follow the six simple rules listed above.