Using a spoon to tunnel through the Big-Data Mountain
Today’s large datasets are difficult to analyze in Excel but much easier to handle with R or Python.
You would think that after reaching a middle or upper management position, a manager would be doing lots of cool things in their day-to-day activities. However, in our experience in the GCC, most managers spend a lot of time and energy struggling with Excel, trying to analyze data for business insights and decision making.
The fact is, junior employees do not understand the business well enough to analyze or explore data for new insights. Hence the onus of analyzing data, finding insights, creating recommendations and advising seniors lies in every manager’s hands. So begins the journey outlined below!
Excel is becoming obsolete in today’s data-rich culture
Once upon a time, Excel was good enough for the small datasets that professionals typically worked with. Today we work with large datasets that are cumbersome to analyze in Excel, and most managers mention the same struggles.
How many times have you faced these familiar situations?
- When working with large datasets, the screen freezes as you scroll and each step in the analysis process takes time to execute.
- Data Cleaning gives you nightmares because it requires significant and meticulous manual effort. For example:
- You have different names for the same thing in a column. For instance, DXB, Dubai, Dubay and Dubia (the last two are typos) all mean “Dubai” in your City column.
- Entries in the date and currency columns are not standardized.
So you grin and bear it and start fixing each problem one by one. An analysis that you expected to complete in minutes turns into hours spent on data cleaning alone.
- Once the cleaning is over and you want to work with different data files, joining them is a struggle.
- For example, you have an HR file of all sales employees and their demographics (age, nationality, gender, income, etc.) and a Sales file covering the last year of sales, with a column showing which employee made each sale. You want to see how the age or nationality of an employee relates to sales in different locations. How do you go about it in Excel?
- What if the scenario becomes more complicated and you have more files to join? Say you have a Training file that lists the training programs each sales employee completed in the last year, and you want to see which training vendors do the most to improve sales, broken down by employee age or nationality.
- Data Manipulation in Excel is very cumbersome too. You make new columns for small analysis steps and copy-paste entire sheets so that you do not lose old data, only to realize that you missed a small step two hours ago and your current sheet is incorrect!
- Data Mining is very limited. You have a few capabilities such as filters, sorting, pivot tables and lookups, but you want to look at your data from different angles and new dimensions, ask interesting questions, and explore and drill down because you are curious. You don’t know whether that is even possible in Excel or, if it is, how to get Excel to do it. So you accept this as a technology limitation and stop asking those types of questions.
- Graphical Visualization is limited. You see graphs on the internet such as these and wonder in awe which “programmer” could make them for you.
- And the WORST PART OF EXCEL is that you have to repeat all of the above steps every time new data comes in: every month or quarter, and sometimes every week!
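To make the contrast concrete, here is a minimal Python sketch of the cleaning and joining steps described above, written as a function so that it can simply be rerun whenever new data arrives. It uses only the standard library. The column names (employee_id, city, date, amount, nationality), the alias table and the list of date layouts are assumptions made up for illustration, not taken from any real file. A data library such as pandas would typically do the same in fewer lines.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from known spellings to one canonical city name.
CITY_ALIASES = {"dxb": "Dubai", "dubai": "Dubai", "dubay": "Dubai", "dubia": "Dubai"}

# Hypothetical date layouts that appear in the raw Sales file.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def clean_city(raw):
    """Map a raw city entry to its canonical name; unknown values are only trimmed."""
    trimmed = raw.strip()
    return CITY_ALIASES.get(trimmed.lower(), trimmed)


def parse_date(raw):
    """Try each known layout in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_sale(row):
    """Return a standardized copy of one raw Sales row."""
    return {
        "employee_id": row["employee_id"].strip(),
        "city": clean_city(row["city"]),
        "date": parse_date(row["date"]),
        "amount": float(row["amount"]),
    }


def sales_by_nationality_and_city(hr_rows, sales_rows):
    """Clean the Sales rows, join them to HR on employee_id, and total the amounts."""
    hr_by_id = {row["employee_id"]: row for row in hr_rows}
    totals = defaultdict(float)
    for raw in sales_rows:
        sale = clean_sale(raw)
        employee = hr_by_id.get(sale["employee_id"])
        if employee is None:
            continue  # skip sales whose employee is missing from the HR file
        totals[(employee["nationality"], sale["city"])] += sale["amount"]
    return dict(totals)


# Tiny made-up samples standing in for the HR and Sales files.
hr = [
    {"employee_id": "E1", "age": "34", "nationality": "Indian"},
    {"employee_id": "E2", "age": "41", "nationality": "Emirati"},
]
sales = [
    {"employee_id": "E1", "city": "DXB", "date": "2023-01-15", "amount": "100"},
    {"employee_id": "E1", "city": " Dubay", "date": "15/02/2023", "amount": "50"},
    {"employee_id": "E2", "city": "Dubia", "date": "03.03.2023", "amount": "200"},
]

report = sales_by_nationality_and_city(hr, sales)
print(report)
```

The in-memory lists are placeholders; in practice the rows could be read from CSV exports with csv.DictReader. Once the steps live in a function, next month's file goes through exactly the same cleaning and joining with a single call, with no copy-pasting of sheets.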