Everyone wants a data scientist, yet most have no idea what that would mean – it’s the cool thing to want and even cooler thing to sell. The problem with data science is that it will take time to evolve technology, business and industry to make the discipline and role effective. Until then you actually want something even more rare than the data scientist, a software engineer who knows data.
The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data. http://www.mckinsey.com/features/big_data
I am bullish on the data science concept as a field of study. I am hoping that at least half of the people drawn to it decide to get their master's in something different, because diversity is still more important than a specific skill set. The focus on data as an important domain of understanding is not new, but it does appear that many people have gotten by without it – imagine how much better everything would have been if we knew what it was we were looking at or what questions we should have been asking!
As we enter the trough of disillusionment for big data, it would seem a side effect is the increasing demand for people who understand data, ask the right questions, design the right approaches and help build the right solutions. Instead of waiting to buy data scientists, we should build them.
First, it is going to be hard to hire the best data scientists. If you can’t source the most talented, I would argue there is a significant drop in ability to the point that additional strategies are required.
Second, most of the data scientists you will recruit will be lacking domain knowledge. It takes time to impart this information, especially if they have less experience, since that means they have less to build upon.
Third, I am no longer certain that what you want is a data scientist, at least not initially. I liken them to the recent and desperate need for Hadoop. Enterprises made significant investments in a subpar distributed file system and a framework for distributed processing. Most of those enterprises lack the attributes that drove those innovations and it is easy to argue the costs outweighed the benefits. Nonetheless, everyone needs to be “innovative” and so it goes, millions to build out data lakes without a lifeguard on duty. Data scientists require leaders and environments that build on their unique strengths – aspirational visions and tangible impact.
Instead of trying so hard to hire the data scientists from “MIT,” consider that it might make more sense to grow your current organization in understanding a little more data science. Your current top talent has domain expertise and if you have software engineers, the skills to deliver. With some help on the basic techniques and algorithms you actually have a practical strategy. Maybe you hire that MIT graduate as an advisor to the software engineers required to effectively build production data analysis?!
Data scientists are going to be a rarity for most enterprises and in some cases entire industries. This capability will be available through industry-focused consulting, offering the edge of having the knowledge without the expense of sourcing, leading, managing and retaining the talent. It is more important to be able to build the first order techniques and algorithms against a data set you understand, because most of the time that will be good enough, more than enough. Moreover, if you are in a domain where more is required, then you have already hired data scientists – I am not directing this at you. Software engineers who know enough about data science are invaluable, since they are able to operationalize that data thinking at scale. Without them, you might end up with a team of data scientists only able to offer solutions limited to the supporting technology they know – a limiting factor unless all you want is presentations and demos.
We have a world filled with educated individuals often performing tasks unrelated to their learning. Many have proven their ability to be highly successful, which underscores how slight a role the specifics have played in their success. I think it is fair to say that more is required than the degree(s) and involvement – those events required an individual in tune enough to make something of those experiences to achieve great things. Data science, be it the scientist or engineer, is going to need more than the obvious indicators to make a go of it. As such, I can only recommend that we help people grow in diversity in their involvement, application and creation of unexpected things.