
Hour of Code: Top 5 Data Science and Machine Learning Projects

Today is the last day of the Hour of Code celebration with JetBrains Academy. To end our lineup of posts on an exciting note, we have prepared a special treat for our more experienced learners: a look at machine learning and data science. And, as always, you can find all the articles we’ve published this week listed in the Join Us for Hour of Code at JetBrains Academy post.

Machine learning and data science have become hot topics over the past decade. Streaming services with built-in recommendation algorithms, AI digital assistants, and online maps showing optimal routes to the destination – it is difficult to imagine today’s world without these technologies.

Data science is a field that studies and analyzes data. Machine learning is a subfield of AI that explores how machines can learn and adapt through statistical models and algorithms. The two fields share many topics: math, statistics, probability, different ways of working with data, and programming (the most widely used languages are Python, SQL, and Java).

There are quite a few data science and machine learning tracks to choose from at JetBrains Academy. To make your learning journey even more exciting, check out our top 5 most popular Data Science and Machine Learning projects.

Generating Randomness (Medium)

It has been shown many times that humans are quite bad at generating random sequences. We invite you to see the evidence for yourself and create a simple program that learns to predict “random” user actions. With this project, you will get familiar with probability theory and the math behind it, learn about geometric probability, and see how simple statistics can predict a particular outcome. On the programming side, you will refresh your Python basics and sharpen your skills with the NumPy library.
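
To give a feel for how simple statistics can predict a “random” choice, here is a minimal sketch. It assumes the user types a string of 0s and 1s and that the program counts which symbol followed each three-character context; the function names and the choice of three characters are illustrative assumptions, not the project's actual specification, and the sketch uses only the standard library rather than NumPy.

```python
from collections import defaultdict


def train(history: str, order: int = 3) -> dict:
    """Count how often '0' and '1' follow each context of `order` symbols."""
    counts = defaultdict(lambda: {"0": 0, "1": 0})
    for i in range(len(history) - order):
        context = history[i:i + order]
        counts[context][history[i + order]] += 1
    return counts


def predict_next(counts: dict, context: str) -> str:
    """Predict the symbol seen most often after `context` (ties and unseen contexts give '0')."""
    seen = counts.get(context, {"0": 0, "1": 0})
    return "1" if seen["1"] > seen["0"] else "0"


# A very "non-random" input: the user simply alternates symbols.
history = "0101010101010101"
model = train(history)
print(predict_next(model, "010"))  # the predictor has only ever seen '1' after '010'
```

The more a person falls into habits like alternating symbols, the more lopsided these counts become, and the easier the next symbol is to guess.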

Data Analysis for Hospitals (Hard)

This project takes you into healthcare: you will help local hospitals sort through and analyze a database of their patients. You will load datasets, deal with missing and incorrectly entered data, find the main statistical characteristics, and visualize your data. All in all, you will conduct a comprehensive data study using the pandas library – from loading data and correcting errors in the CSV files to simple data visualization.
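
The cleaning steps described above might look roughly like the sketch below. The CSV content, column names, and fill strategies are made up for illustration and are not taken from the project's real datasets; visualization is left out to keep the example short.

```python
import io

import pandas as pd

# Hypothetical CSV content; columns and values are invented for this sketch.
raw = io.StringIO(
    "hospital,gender,age,diagnosis\n"
    "general,man,33,cold\n"
    "general,female,,stomach\n"
    "sports,f,20,sprain\n"
    "sports,male,26,\n"
)

df = pd.read_csv(raw)

# Fix inconsistent data entry: map every spelling to a single code.
df["gender"] = df["gender"].replace({"man": "m", "male": "m", "female": "f"})

# Handle omissions: fill missing ages with the median age,
# and missing diagnoses with an explicit placeholder label.
df["age"] = df["age"].fillna(df["age"].median())
df["diagnosis"] = df["diagnosis"].fillna("unknown")

# Main statistical characteristics: patient counts and mean age per hospital.
patients_per_hospital = df["hospital"].value_counts()
mean_age = df.groupby("hospital")["age"].mean()
print(patients_per_hospital)
print(mean_age)
```

Whether the median is the right fill value depends on the data; it is used here only because it is a common, simple default.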

Linear Regression from Scratch (Challenging)

Linear regression is one of the most popular machine learning algorithms. This project will teach you to implement the linear regression algorithm with Python’s classes, methods, functions, and the NumPy library. You will also learn about linear algebra and matrix operations and implement the fit, predict, and score methods. In the end, you will compare the performance of your model with that of the scikit-learn linear regression algorithm.
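
As a rough illustration of what a class with fit, predict, and score methods can look like, here is a minimal sketch. The class name is hypothetical, the fit uses NumPy's least-squares solver (one of several possible approaches, not necessarily the one the project asks for), and score is assumed to be the R² coefficient, as in scikit-learn.

```python
import numpy as np


class LinearRegressionSketch:
    """Ordinary least squares with an intercept term (illustrative only)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Prepend a column of ones so the first coefficient is the intercept.
        X1 = np.column_stack([np.ones(len(X)), X])
        # Solve min ||X1 @ beta - y||^2 with NumPy's least-squares solver.
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        self.intercept_, self.coef_ = beta[0], beta[1:]
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_

    def score(self, X, y):
        """R^2: one minus residual sum of squares over total sum of squares."""
        y = np.asarray(y, dtype=float)
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - y.mean()) ** 2)
        return 1.0 - residual / total


# Toy data generated from y = 2x + 1, so the fit should recover those values.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegressionSketch().fit(X, y)
print(model.intercept_, model.coef_, model.score(X, y))
```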

Classification of Handwritten Digits (Challenging)

We all know at least one person with almost illegible handwriting. It is basically impossible to understand anything they write. Well, if you urgently need to decipher such a person’s writing, this project will be of great help. Here, you are going to explore the main classification algorithms and learn how to find and train the best possible model for the classification of handwritten digits. As a result, you will get hands-on experience with the Keras dataset, train a variety of classification algorithms, and find the best one using scikit-learn tools.
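
Comparing several classifiers and keeping the best one can be sketched as follows. Note the substitution: instead of the Keras dataset the project uses, this example loads the small 8x8 digits dataset bundled with scikit-learn so that it runs without a download. The two candidate models, the split size, and the random seed are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: 8x8 digit images bundled with scikit-learn (not the Keras data).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=40
)

# Candidate models, chosen only as examples of classification algorithms.
candidates = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=40),
}

# Train each model and record its accuracy on the held-out test set.
scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```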

Logistic Regression from Scratch (Challenging)

Logistic regression is another popular model used in data science. Like linear regression, logistic regression estimates the relationship between different variables and is commonly used for classification and predictive analysis. The main idea of this project is to implement gradient descent for two different cost functions, devise a method to predict the probability that a given sample belongs to a certain class, and analyze training errors. You will understand the math behind logistic regression and learn about two kinds of cost functions: Mean Squared Error and Log Loss.
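
The sketch below shows one way gradient descent for the two cost functions could be written. The function names, learning rate, number of epochs, and toy data are assumptions made for this example and are not taken from the project; it uses batch gradient descent, which is only one of the possible variants.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, y, cost="log_loss", lr=0.1, n_epochs=1000):
    """Batch gradient descent; `cost` is either 'log_loss' or 'mse'."""
    X1 = np.column_stack([np.ones(len(X)), X])  # leading column of ones = bias
    w = np.zeros(X1.shape[1])
    for _ in range(n_epochs):
        p = sigmoid(X1 @ w)
        if cost == "log_loss":
            # Gradient of the mean log loss with respect to w.
            grad = X1.T @ (p - y) / len(y)
        else:
            # Gradient of the mean squared error: the chain rule adds
            # the sigmoid derivative p * (1 - p).
            grad = 2 * X1.T @ ((p - y) * p * (1 - p)) / len(y)
        w -= lr * grad
    return w


def predict_proba(X, w):
    """Probability that each sample belongs to class 1."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return sigmoid(X1 @ w)


# Toy one-feature data: negative values are class 0, positive values are class 1.
X = np.array([[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
for cost in ("log_loss", "mse"):
    w = fit_logistic(X, y, cost=cost)
    labels = (predict_proba(X, w) >= 0.5).astype(int)
    print(cost, labels)
```

The only difference between the two branches is the extra p * (1 - p) factor in the Mean Squared Error gradient, which shrinks the updates when predictions are close to 0 or 1.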

Thank you for being with us this week! Did you find your favorite projects in our posts? If not, share them with us on social media using the hashtags #JetBrainsAcademy and #HourOfCode.

And, of course, we haven’t forgotten about our presents! If you’ve solved all seven Problems of the day this week, check your mailbox on December 15 – your 25% discount will be waiting for you there. Keep in mind that the discount code will be valid only until the end of the year.

Happy Hour of Code!
Your JetBrains Academy team