Dead, so young – Microsoft Azure Data Lake Analytics

My first touch with “ADLA” was one of familiarity (lo, ye olde YARN and a DAG implementation, I salute you) and a thought of “you guys will come up with something better, soon”. Why the skepticism? Well, the combination of SQL and C# gave programmability…but for whom?

IMHO: ADLA was mostly a quick fix from Azure to serve data engineers shoveling data – the analytics API was not for the masses. Let me explain.

The average data scientist is a Pythonista or an R-head, with maybe Scala in the back pocket (because they have been playing with Spark on their own since it came out, got bored waiting for PySpark to catch up on the APIs, and were thus forced to 😉) and MAYBE Julia if you are a real geek…but just SQL and C# for “analytics”? The counts and averages offered by ANSI SQL are not the analytics you are looking for in 2020, so the “something better” was bound to come up soon. Enter the contemporary stack: Azure Data Lake Store as the organized data layer and Azure Databricks (= Spark as a Service) on top of that as the data preparation and analytics layer. The fate of “ADLA”? See left.

What tells you that Spark really is mainstream? Someone pronouncing it obsolete already. When Hadoop stacks had Oozie orchestrating data operations from ingestion to data lake insertion, posts about Hadoop being dead appeared. So, Spark? I heard a bad-ass programmer proclaim “Spark is dead” at Big Data Spain in Madrid years ago (cough…Chris…cough).

What is the next thing? According to the same programmer, it seems to be deep learning on GPUs in hybrid multi-clouds. But honestly, that really is not the bulk of the work on the table just yet. There is still data to integrate, clean and monitor for quality before even plugging it in as an un-modeled flat source for Power BI. Guiding decisions with bad data may be worse than guiding them without data, and a Machine Learning model trained with uselessly crappy data will be uselessly crappy.
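To make the “monitor for quality” part a bit more concrete, here is a minimal sketch in plain Python of the kind of check I mean: count rows with missing required fields and duplicate keys, then decide whether the batch is fit to pass on. Everything here is made up for illustration (the `QualityReport` class, the `check_quality` and `is_fit_for_use` helpers, the column names and the 5 % threshold); it is not an Azure or Databricks API, and on a real data lake you would probably run the same idea on Spark.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class QualityReport:
    """Hypothetical summary of a single batch of records."""
    total_rows: int
    rows_with_missing: int
    duplicate_keys: int

    @property
    def missing_ratio(self) -> float:
        # Share of rows with at least one empty required field.
        return self.rows_with_missing / self.total_rows if self.total_rows else 0.0


def check_quality(rows: Iterable[Dict[str, Any]], key: str,
                  required: List[str]) -> QualityReport:
    """Count rows with missing required values and repeated key values."""
    seen = set()
    total = missing = duplicates = 0
    for row in rows:
        total += 1
        # None and the empty string both count as "missing" in this sketch.
        if any(row.get(col) in (None, "") for col in required):
            missing += 1
        key_value = row.get(key)
        if key_value in seen:
            duplicates += 1
        else:
            seen.add(key_value)
    return QualityReport(total, missing, duplicates)


def is_fit_for_use(report: QualityReport, max_missing_ratio: float = 0.05) -> bool:
    """Assumed rule: no duplicate keys and at most 5 % incomplete rows."""
    return report.duplicate_keys == 0 and report.missing_ratio <= max_missing_ratio


# Made-up sample batch: one duplicate order_id and two incomplete rows.
sample = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "", "amount": 75.5},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "initech", "amount": None},
]
report = check_quality(sample, key="order_id", required=["customer", "amount"])
print(report, is_fit_for_use(report))
```

The point of gating on a report like this is simply that a failing batch never reaches the dashboard or the model training job; what counts as “fit” is a business decision, not something the code can settle.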

So, what is the contemporary stack on Azure then? Analytics with an easy interface (Databricks’ very much Jupyter-looking notebooks) on top of a scalable data store (Azure Data Lake), with an orchestration engine (well…Azure Data Factory with Databricks jobs).

So long, ADLA. Like replicants, you too might have an expiration date sooner rather than later.

Time will tell.
