Doing Dask Powered Data Science In The Saturn Cloud

The Python Podcast.__init__ - A podcast by Tobias Macey


Summary

A perennial problem of doing data science is that it works great on your laptop, until it doesn’t. Another problem is being able to recreate your environment to collaborate on a problem with colleagues. Saturn Cloud aims to help with both of those problems by providing an easy-to-use platform for creating reproducible environments that you can use to build data science workflows and scale them easily with a managed Dask service. In this episode Julia Signell, head of open source at Saturn Cloud, explains how she is working with the product team and PyData community to reduce the points of friction that data scientists encounter as they are getting their work done.

Announcements

Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.

When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Your host as usual is Tobias Macey and today I’m interviewing Julia Signell about building distributed processing workflows in Python through the power of Dask.

Interview

Introductions
How did you get introduced to Python?
Can you describe what you are building at Saturn Cloud?
Who are your target users and how does that inform the features and priorities that you build into your platform?
What are the road blocks that data scientists typically encounter when working on their laptop/workstation?
How does open source factor into the Saturn product?
What are some of the projects that you are collaborating with/contributing to as part of your work at Saturn?
How has your experience at Anaconda informed your work at Saturn?
Can you describe how the Saturn Cloud platform is architected?
How has it changed or evolved since it was first launched?
Can you describe the learning curve that data scientists go through when adopting Dask?
What are some examples of projects or workflows that Dask enables which are not possible/practical to do locally?
How would you characterize the overall awareness/adoption of Dask in the Python data science community?
What are the most interesting, innovative, or unexpected ways that you have seen Saturn Cloud used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Saturn Cloud?
When is Saturn Cloud the wrong choice?
What do you have planned for the future of Saturn Cloud?

Keep In Touch

@jsignell on Twitter
jsignell on GitHub

Picks

Tobias: Peter Rabbit 2
Julia: PawPaw Fruit

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast, for the latest on modern data management.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at pythonpodcast.com/chat

Links

Saturn Cloud
Dask (Podcast Episode)
Pangeo
XArray
Conda
Mamba
Holoviz
Dash
Anaconda (Podcast Episode)
Kubernetes
Tornado (Podcast Episode)
Prefect (Podcast Episode)
Dagster (Podcast Episode)
Airflow
Ray (Podcast Episode)

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
