Over the past year, I’ve switched my focus from SQL Server and Teradata to Hadoop. As someone who has spent the majority of my professional career focused on SQL Server and who has been recognized as a Microsoft Most Valuable Professional (MVP) in SQL Server for 4 consecutive years, it comes as no surprise that I often get asked:
“Why are you switching to Hadoop? Is it better than SQL Server?”
I’ll save you the suspense of a long post and answer the second question first: No, it’s not.
SQL Server is Still Relevant
Here’s why. SQL Server does what it does *extremely* well. I would not hesitate to suggest SQL Server in numerous scenarios, such as the database backend for an OLTP application, a data store for small-to-medium sized data marts or data warehouses, or an OLAP solution for building and serving cubes. Honestly, with few exceptions, it remains my go-to solution over MySQL and Oracle.
Now that we’ve cleared that up, let’s go back to the first question. If SQL Server is still a valid and effective solution, why did I switch my focus to Hadoop?
Excellent question, dear reader! I’m glad you asked. 🙂
Before I get to the reason behind my personal decision, let’s discuss arguably the biggest challenge we face in the data industry.
Yes, Data Really Is Exploding
We’re in the midst of a so-called Data Explosion. You’ve probably heard about this… it’s one of the few technical topics that has actually made it into mainstream media. But I still think it’s important to understand just how quickly the volume of data is growing.
Every year, EMC sponsors a study called The Digital Universe, which “is the only study to quantify and forecast the amount of data produced annually.” I’ve reviewed each of their studies and taken the liberty of preparing the following graphic* based on past performance and future predictions. It’s also worth noting that EMC has historically tended to be conservative in its data growth estimates.
* Feel free to borrow this graphic with credit to: Michelle Ufford & EMC’s The Digital Universe
Take a moment and just really absorb this graphic. They say a picture is worth a thousand words. My hope is that this picture explains why the concept of Big Data is so important to all data professionals. DBAs, ETL developers, data warehouse engineers, BI analysts, and more are affected by the fact that data is growing at an alarming rate, and the majority of that data growth is coming in the form of unstructured and semi-structured data.
Throughout my career, I have been focused on using data to do really cool things for the business. I have built systems to personalize marketing offers, predict customer behaviors, and improve the customer experience in our applications. There is no doubt in my mind that Hadoop is absolutely critical to the ability of an enterprise to perform these types of activities.
The Bottom Line
SQL Server isn’t going away. Arguably, the most valuable raw data in an enterprise, such as inventory, customer information, and order data, will still be managed in a SQL Server database.
So again: why did I make the decision to focus on Hadoop over the past year?
I once had the pleasure of working for a serial entrepreneur. One day over lunch, he gave me a piece of advice that resonated with me and would come to influence my whole career: “Michelle, to be successful in whatever you do, you need to find the point where your heart and the money intersect.”
My heart is in data, the money is in the ability to effectively consume data, and Hadoop is where they intersect.