Once again it is the time of the year when the days are long, people are in a good mood, and everyone is excited to spend the next few days at … the Spark + AI Summit! This year, due to the Covid-19 pandemic, the Summit is taking place online and free of charge for the first time. I’m thrilled about it, since it means that anyone with an internet connection can participate. If this is your first Spark + AI Summit, let me walk you through what you should not miss.
If there are two talks you should attend every day, they are the keynotes. Keynotes are packed with new feature announcements and success stories from the frontline. This year’s keynote lineup consists of Databricks founders Ali Ghodsi, Matei Zaharia, and Reynold Xin; PyTorch maintainer Adam Paszke; Vish Subramanian, director of Data and Analytics Engineering at Starbucks; Kim Hazelwood, West Coast Head of Engineering of FAIR (Facebook AI Research); and many more. What I particularly like about the keynotes is that the speakers not only talk about new features, but also demonstrate them.
What topics can you expect this year, you might ask? First, Spark 3.0 was released a few days ago, with new features that make it faster and with added support for Scala 2.12 and Java 11. Here is a good blog post describing the most important ones: https://databricks.com/blog/2020/06/18/introducing-apache-spark-3-0-now-available-in-databricks-runtime-7-0.html. Next, there have been many new developments in Delta Lake (currently at version 0.7.0), which brings ACID transactions to Spark (Spark on ACID) and is at the center of the Data Lakehouse architecture. The field I’m most excited about is MLOps. Data science teams are increasingly realizing that maintaining ML models is not trivial, especially when deploying hundreds of models to production. The solution that Databricks is developing for MLOps teams is MLflow, but there are many other frameworks around, such as Netflix’s Metaflow, or Kubeflow, built by Google, Cisco, and others.
Apart from the keynotes, there is a wide range of interesting talks, from scaling deep learning to demonstrations of Spark 3.0’s brand new features. Just take a look at the agenda and join any talk that looks interesting to you. The best thing about the Summit taking place online is that you can freely switch between talks without having to rush back and forth between halls.
The parties at the Summit are amazing. I was at the Spark + AI Summit Europe in Amsterdam last year, and there was something for everyone: a DJ, free drinks, retro arcade games, ping pong… This year, the organizers are promising two parties with a DJ and games. Party time is 1 am for Central Europeans, so stock up on coffee and energy drinks 😉.
Conferences would not be much fun without the people. You will be able to chat with others on the Summit’s platform. I am very interested to see how the organizers have prepared for this since, in my opinion, it’s very hard to recreate the in-person experience of mingling. I will be happy to take notes about it, and maybe use some ideas to improve the webinars we’ve been organizing.
If this is your first Spark + AI Summit, I wish you a lot of fun. If you are a returning participant, you already know how much fun it will be. And if you are reading this blog post a few months later, you can check out all the talks online. Have a very sparkly adventure!
If you haven’t registered yet, you can do so here: https://databricks.com/sparkaisummit/north-america-2020