One of the coolest projects I work on at IBM is called the Data Scientist Workbench. It’s a free all-in-one solution for people interested in performing data analysis.
Naturally, we created it for people who are interested in doing Data Science, but the open source tools provided can be used for any statistical application (and to a lesser degree for other mathematical purposes as well).
The cloud-based collection of tools can be accessed for free (no catch) directly from your browser. You simply sign up and, after a short setup period (usually less than an hour), you’ll receive a fully configured environment to work in.
Behind the scenes, you are operating a decently spec’d virtual instance (e.g., 16 GB RAM / 100 GB disk), so performance tends to be pretty good and often better than that of many laptops. And with the Data Scientist Workbench (DSWB), you don’t need to install anything.
The Data Scientist Workbench includes:
- OpenRefine to clean up messy data.
- Jupyter notebooks supporting Python, R, and Scala (with access to Apache Spark for Big Data processing).
- Apache Zeppelin notebooks.
- RStudio in your browser.
I suspect R (and to a lesser extent Python) will be the most relevant to my audience here. In particular, I recommend RStudio for statistical analysis, and Jupyter/IPython notebooks for those who have more of a programming background or who need to add a machine learning component.
Watch the tour below, and consider signing up:
If a library you need is not included, you can generally install it yourself (e.g., !pip install
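Before installing anything, it can help to check which libraries are already available in your environment. Here is a minimal sketch in Python that you could run in a notebook cell. The helper `missing_packages` and the package names in `wanted` are hypothetical and only for illustration; they are not part of the Workbench itself.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found.

    Only top-level names (no dots) are expected here, since looking up a
    dotted name requires its parent package to be importable first.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical wish list: swap in the libraries your analysis needs.
wanted = ["json", "statistics", "surely_not_a_real_package_xyz"]
print(missing_packages(wanted))
```

Anything the function reports as missing is a candidate for installing yourself. Note that the name you import is not always the name you install, so treat the result as a hint rather than a ready-made install list.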
I also like data analysis and have taken it up as a hobby. Thanks.
You lost me at R…
R is such a hodge-podge, an uncoordinated mess.
One package may be well-done, the next not so much…
Minitab is pricey, but it’s organized, coordinated with consistent syntax etc. It’s a classic case of you get what you pay for.
Surely your bosses at IBM would agree, especially when it comes to selling their stuff LOL.