Published by Marin Ferara at

12

Making Life Easier with Databricks

 

As a consultant who uses PySpark on a daily basis, I always ask Data Scientists and Data Engineers whether they have ever tried it. The usual response: “It’s nice once you get it running.” Installing PySpark on Hadoop can be a lengthy process (check out our guide to installing PySpark on Windows), especially for someone without much Hadoop knowledge. And that assumes one already has access to a Hadoop cluster.

Enter Databricks. With a few clicks and a few parameters, you can spin up a cluster of EC2 instances in a few minutes. You are greeted by a notebook (similar to Jupyter Notebook) where your Spark context is already initialized, and you can start developing right away.
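
To give a feel for what "a few parameters" means, here is a minimal sketch of a cluster definition written out as a plain Python dictionary. The field names follow the Databricks Clusters REST API as we understand it; the helper `build_cluster_spec` and all the values (cluster name, node type, runtime version) are placeholders chosen for illustration, so check them against the official documentation before using them.

```python
import json


def build_cluster_spec(name, num_workers, node_type_id, spark_version,
                       autotermination_minutes=60):
    """Assemble a cluster definition as a plain dict (hypothetical helper)."""
    if num_workers < 0:
        raise ValueError("num_workers must not be negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,    # Databricks runtime version string
        "node_type_id": node_type_id,      # EC2 instance type of each node
        "num_workers": num_workers,        # number of worker nodes
        # Shut the cluster down after this many idle minutes to save cost.
        "autotermination_minutes": autotermination_minutes,
    }


# Placeholder values, assumed for illustration only.
spec = build_cluster_spec(
    name="demo-cluster",
    num_workers=2,
    node_type_id="i3.xlarge",
    spark_version="13.3.x-scala2.12",
)
print(json.dumps(spec, indent=2))
```

The same handful of settings is what the cluster creation form in the Databricks UI asks for, which is why the setup takes minutes rather than days.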

As a company living by the motto “From idea to production,” it only seemed logical to start a partnership with Databricks. We have already started using it for building sales packs, most notably a convolutional neural network (CNN) that can recognize a car model with a whopping 84% accuracy. Databricks played a very important role here; one could call it a miracle. Evan, our Data Scientist, was training the model on his own laptop. A single epoch of training took roughly 1.5 hours, with the laptop getting hotter and hotter, and the model was achieving around 50% accuracy. At that rate, 50 epochs would have taken about 75 hours. After gaining access to our Databricks Suite, Evan spun up a GPU-enabled cluster. A GPU makes the training process much faster: indeed, 50 epochs of training were done in a matter of hours. But the biggest surprise was that the accuracy of the model jumped to 84%. We are in close discussion with Databricks technical staff to understand this phenomenon.

We are looking forward to close cooperation with Databricks in making our customers happier. Get in contact with us if you are interested in giving Databricks a try (marin@datainsights.de), or give the official documentation a read: https://docs.databricks.com/index.html. If you wonder what more there is to like, check out these Slack messages from our colleagues:

 

[Images: two Slack messages from colleagues]