Interview: Dan Tofan for this week’s data science webinar

Posted by Paul Everitt

In the past few years, Python has made a big push into data science, and so has PyCharm. Years ago we added Jupyter Notebook integration, then 2017.3 introduced Scientific Mode for workflows that felt more like an IDE. In 2019.1 we reinvented our Jupyter support so that it, too, feels more like a professional tool.

PyCharm and data science are thus a hot topic. Dan Tofan very recently published a Pluralsight course on using PyCharm for data science and we invited him for a webinar next week.

To help set the stage, below is an interview with Dan.

  • Thursday, April 25
  • 7PM GMT+3, 9AM Pacific
  • Register here
  • Aimed at new and intermediate data scientists

Let’s start with the key point: what does PyCharm bring to data scientists?

PyCharm brings a productivity boost to data scientists, by helping them explore data, debug Python code, write better Python code, and understand Python code faster. As a PyCharm user, I experienced and benefited from these productivity boosters, which I distilled into my first Pluralsight course, so that data scientists can make the most out of PyCharm in their activities.

For the webinar: who is it for and what can people expect you to cover?

If you are a data scientist who has dabbled with PyCharm, then this webinar is for you. I will cover the PyCharm features most relevant to data science: Scientific Mode and the completely rewritten Jupyter support. I will show how these features interplay with other PyCharm features, such as refactoring code from Jupyter cells. I will use easy-to-understand code examples with popular data science libraries.

Now, back to the start: tell us a little about yourself.

Currently, I am a senior backend developer for Dimensions – a research data platform that uses data science and links data on more than 140 million publications, grants, patents and clinical trials. I’ve always been curious, which led me to do my PhD studies at the University of Groningen (Netherlands) and learn more about statistics and data analysis.

Do Python data scientists feel like programmers first and data scientists second, or the reverse?

In my opinion, data science is a melting pot of skills from three complementary backgrounds: programmers, statisticians and business analysts. At the start of your data science journey, you are going to rely on the skills from your main background, and – as your skills expand – you are going to feel more and more like a data scientist.

Your course has a bunch of sections on software development practices and IDE tips. How important are these practices to “professional” data science?

As part of the melting pot, programmers bring a lot of value with their experiences ranging from software development practices to IDE tips. Data scientists from a programming background are already familiar with most of these, and those from other backgrounds benefit immensely.

Think of a code base that starts to grow: how do you write better code? How do you refactor the code? How can a new team member understand that code faster? These are some of the questions that my course helps with.

The course also covers three major facilities in PyCharm Professional: Scientific Mode, Jupyter support, and the Database tool. How do these fit in?

All of them are data centric, so they are very relevant to data scientists. These facilities are integrated nicely with other PyCharm capabilities such as debugging and refactoring. Overall, after watching the course and getting familiar with these capabilities, data scientists get a nice productivity boost.

This webinar is good timing. You just released the course and we just re-invented our Jupyter support. What do you think of the new, IDE-centric Jupyter integration?

I think the new Jupyter integration is an excellent step in the right direction, because you can use both Jupyter and PyCharm features such as debugging and code completion. Joel Grus gave an insightful and entertaining talk about Jupyter limitations at JupyterCon 2018. I think the new Jupyter integration in PyCharm can eventually help solve some Jupyter pain points raised by Joel, such as hidden state.
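To make the hidden-state pain point concrete, here is a minimal sketch. It is not how Jupyter or PyCharm work internally. It only assumes that notebook cells share one namespace, which it imitates with a plain dictionary. The `run_cells` helper and the three toy cells are hypothetical names made up for illustration.

```python
def run_cells(cells, order):
    """Run notebook-like cells in the given order, sharing one namespace.

    Hypothetical helper: a plain dict stands in for the kernel's state.
    """
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


# Three toy "cells", as they would appear from top to bottom in a notebook.
cells = [
    "threshold = 10",
    "threshold = threshold * 2",
    "result = threshold + 1",
]

# Running every cell once, top to bottom.
top_to_bottom = run_cells(cells, order=[0, 1, 2])["result"]

# Accidentally running the second cell twice before the last one.
rerun_second_cell = run_cells(cells, order=[0, 1, 1, 2])["result"]

print(top_to_bottom, rerun_second_cell)  # prints "21 41"
```

The cell text is identical in both runs, yet the result differs because it depends on the execution history, which the notebook itself does not show.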

What’s one big problem or pain point in Jupyter that could benefit from new ideas or tooling?

Reproducibility is problematic with Jupyter and it is important for data science. For example, it’s easy to share a notebook on GitHub, then someone else tries to run it and gets different results. Perhaps the solution is a mix of discipline and better tools.
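As one illustration of the "discipline" side, here is a minimal sketch that uses only the Python standard library. It assumes that two common causes of diverging results are unseeded randomness and differing environments. The `run_analysis` and `environment_summary` functions are hypothetical examples, not part of any course material or PyCharm feature.

```python
import platform
import random
import sys


def run_analysis(seed):
    """Hypothetical analysis step: draw a random sample and return its mean."""
    rng = random.Random(seed)  # local generator, so no global random state is touched
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(sample) / len(sample)


def environment_summary():
    """Collect interpreter details worth saving next to the results."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": sys.platform,
    }


first = run_analysis(seed=42)
second = run_analysis(seed=42)
print(first == second)  # same seed, same interpreter: identical result
print(environment_summary())
```

Passing the seed explicitly and storing the environment summary alongside a shared notebook gives the next person a fair chance of spotting why their numbers differ. A real project would also pin library versions, which this sketch leaves out.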

23 Responses to Interview: Dan Tofan for this week’s data science webinar

  1. Eswar Vandanapu says:

    August 7, 2013

    Is there a way to see the UML diagram for multiple Python source files? We have a hierarchy of classes spread across a package. How can I see the diagram for the entire package?

    • Dmitry Filippov says:

      August 12, 2013

      Unfortunately no. However, we have the ticket http://youtrack.jetbrains.com/issue/PY-10517 . Vote!

      • Eswar Vandanapu says:

        August 20, 2013

        Seems it is fixed in the latest build. This is awesome. Thanks for listening.

  2. UML tutorial says:

    August 9, 2013

    Is there a similar tool which generates class diagrams for Java code?

  3. Creately says:

    February 16, 2015

    Can this re-factor without a problem?

    • Dmitry Filippov says:

      February 17, 2015

      You can safely do refactoring from a diagram.

  4. Jibay says:

    May 25, 2015

    Hello,

    Do we need the Professional version to create a UML diagram?

    Regards

    • Dmitry Filippov says:

      May 25, 2015

      yes. This functionality is supported only in the professional edition.

      • gbosetti says:

        October 3, 2019

        Hello Dmitry, I have the professional version thanks to the University, but I can’t see the diagrams. Does the students-teacher version have this feature? Thanks!

  5. Akshay Akin says:

    June 4, 2015

    Have seen many UML diagram before, however, the way to put it all together and characterize them is very important. Very nice way to represent and identify them. Thank You

  6. Pierre-Henry Moreau - Functional Analyst says:

    August 2, 2015

    Thank you very much for this useful tool ! It takes some time to draw UML diagrams from the code.
    I’m going to test it, thanks a lot !

  7. di says:

    April 13, 2017

    Hi,
    I just dipped my toes into PyCharm’s UML diagram generation. A couple of things that I couldn’t get around:
    1. Right Click on diagram -> Add Class is always grayed out. What is needed to activate that?
    2. Is there a way to hide/make invisible classes that show on diagrams?
    3. Related to 2. With Python 3.4, diagrams always show the “object” and “collections.Hashable” classes, which are part of Python 3.4 and not of the project per se. Is there a way to hide those “system” classes?

    I use Professional PyCharm 2017.1.1 on Linux with python 3.4
    Many thanks!

  8. Jitesh Pathak says:

    August 10, 2017

    I have tested this . Excellent tool .

    Thanks a lot .

  9. Mandy says:

    October 13, 2017

    How can I integrate the UML diagram with Visual Paradigm, or import/open it in another tool?

  10. Yoni says:

    November 7, 2017

    Not working well !!!
    I have PyCharm Pro 2017.2.4, and when generating a diagram I don’t see all class names, and not all “extends” arrows are shown.

    • Paul Everitt says:

      January 15, 2018

      Sorry for the delay in replying, we didn’t spot this in the moderation queue. Is this still an issue with the latest PyCharm?

  11. Warren Lynch says:

    January 4, 2018

    Try the code engineering feature: reverse engineering the code into a class diagram could be helpful.

    How to Reverse Engineer UML from Python?

    https://www.visual-paradigm.com/support/documents/vpuserguide/276/277/27943_reverseengin.html

    All other major programming languages are supported for code reverse engineering:

    https://www.visual-paradigm.com/support/documents/vpuserguide/276/277_instantrever.html

  12. Rude says:

    September 29, 2018

    Does it work in the Community version?

  13. Kaiser Chavez says:

    May 26, 2019

    How can I integrate the UML diagram with Visual Paradigm, or import/open it in another tool? Please help.

  14. Murray W. Greer C. says:

    March 11, 2020

    Will it be possible at some point to create diagrams from scratch and, based on those diagrams, generate the structure of our projects?
