10 Tips for Working With Data in Datalore
Greetings from the Datalore team! In this blog post we’ll show you 10 tips for working more productively with data files in Datalore.
Before we start
In Datalore, files are persistently attached to notebooks. After you create a notebook and upload some data, you will have access to the data even after you restart the kernel or close and reopen Datalore.
Tip №1: Drag and drop files and folders
Quickly add new files and folders in Datalore by opening the Attached files sidebar tab and dragging and dropping your files and folders there.
Tip №2: Unzip files using the GUI
Uploading files and folders as a compressed archive is faster because the total file size is smaller. After uploading the archive, click the “Unpack” button, and Datalore will create a folder and extract the files into it. Datalore supports archives with the .tar.gz file extension.
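If you prefer to unpack an archive from notebook code instead of the GUI, the same step can be sketched with Python's standard tarfile module. The function name unpack_archive and the paths passed to it are hypothetical, and this sketch does not describe how the “Unpack” button works internally.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path: str, target_dir: str) -> list[str]:
    """Extract a .tar.gz archive into target_dir and return the member names.

    Hypothetical helper for illustration; only unpack archives you trust.
    """
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target)
    return names
```

For example, unpack_archive("dataset.tar.gz", "dataset") would extract a hypothetical dataset.tar.gz into a dataset folder next to it.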
Tip №3: Sharing notebooks with attached data
When you share a notebook with collaborators, the datasets are shared automatically. You don’t need to give any extra access rights to your colleagues.
Tip №4: Moving and cloning notebooks
To organize your work, you can move, copy, or clone notebooks to different folders and workspaces from the file system. The attached data travels with the notebook automatically, so you won’t need to re-upload anything.
Tip №5: Workspace files
Workspace files help you work with the same data files across several different notebooks without having to upload the files multiple times.
To get started with your workspace, follow these steps:
- Upload files to the whole workspace in the Workspace files tab in the File system.
- Go to the Workspace tab inside the Attached files sidebar menu and click Attach workspace files.
- Don’t forget to follow the prompt to restart the kernel.
- Access files in the notebook code from the /data/workspace_files/ directory or select the file you need and click Copy file path to clipboard.
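Once the workspace files are attached, the last step above can be sketched in notebook code as follows. The /data/workspace_files/ path comes from the steps above; the helper name list_workspace_files is hypothetical and only lists whatever files happen to be in that directory.

```python
from pathlib import Path

# Directory where attached workspace files appear, as described above.
WORKSPACE_DIR = Path("/data/workspace_files")


def list_workspace_files(root: Path = WORKSPACE_DIR) -> list[Path]:
    """Return all files under root, sorted by path.

    Returns an empty list if the directory does not exist, for example
    when workspace files have not been attached to the notebook yet.
    """
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())
```

Calling list_workspace_files() in a notebook cell gives you full paths that you can pass to whichever reader you use for your data.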
Tip №6: Extending your cloud storage
In Datalore you can mount S3 buckets to extend internal cloud storage. There’s a detailed guide about how to do it in this blog post.
Tip №7: Upload files by URL
If you have a direct link to a file hosted publicly on the internet, you can upload it to a notebook using its URL. Make sure you use the direct link to the data file and not to an .html page. Open Attached files, click the dropdown on the “Upload” button, and choose the “Upload by link” option.
Tip №8: Create files inside Datalore
In Datalore you can create files by clicking “New file” in the upper left corner of the Attached files sidebar. This lets you quickly create files and paste content into them.
Tip №9: Preview and edit files inside Datalore
You can preview and edit small text files (less than 100 KB) right inside the editor. Double-click the file and its contents will open in a separate editor tab.
Tip №10: Download files using the urllib library
You can easily download files from a specified URL. The urllib library is already pre-installed in Datalore so you can import it and execute code like this:
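A minimal sketch of such a download, using only the standard library, might look like the following. The function name download_file, the example URL, and the destination path are all hypothetical placeholders, not values from Datalore.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Download the resource at url and save it to destination.

    Hypothetical helper for illustration; it streams the response to disk
    so the whole file does not have to fit in memory.
    """
    target = Path(destination)
    # Make sure the folder for the downloaded file exists.
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

In a notebook cell you could then call download_file("https://example.com/data.csv", "data.csv"), replacing the placeholder URL with the direct link to your own dataset.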
When you want to publish a notebook and allow others to edit a copy of it, we recommend that you download the dataset inside the code. This will help others to reproduce your calculations in their own notebook copies.
We hope these tips for working with data were helpful and gave you some ideas to speed up your current workflow. Let us know in the comments if you have any other tips for working with data in Datalore!
The Datalore team