From Social Scientist To Data Scientist

Dropbox for Sharing Code

I learned a lot in my two years at INSEAD getting a master’s in Organizational Behavior. Many of the skills and knowledge I gained there will be beneficial as a data scientist, from defining a problem to carrying out a thoughtful analysis to communicating to a range of audiences. But one skill I wasn’t developing was working in an industry-level coding environment.

This hit home one day this spring. In one of my classes, we had been writing code in Stata. The professor wanted us to be able to see the code the other students had written, so, naturally, he suggested we do it through a shared Dropbox folder. I don’t think he had ever heard of GitHub, let alone tried to use it.

This was one of the downsides of being in a stand-alone business school, especially one in a small foreign town. We didn’t have a Statistics or Computer Science program, and there was no community of programmers outside the school either. A few PhD students did use Python extensively in their research, and two of them even put together a two-day beginner Python workshop for professors and PhD students. On the first day, the student instructor showed how you could use pygeocoder to get the locations of a list of universities. A professor sitting in on the class exclaimed, “I’ve spent five hours doing that before!” While learning programming at INSEAD is becoming more common and encouraged, almost all of it was still self-taught, with no best practices, formal curriculum, or expert professors available.

Starting off in R

I was fortunate to attend Rice University and take classes taught and designed by Hadley Wickham, the most prominent developer of packages for R (he has even been referred to as the man who revolutionized R). I didn’t fully appreciate the amazing opportunity I was receiving at the time, despite my brother David telling me how jealous he was that I personally knew the great Hadley Wickham. Now David has gotten to know Hadley as well through his creation of the broom package, which tidies the output of R functions like linear regressions and t-tests into data frames. Broom has even become a core part of what Hadley would like us to call “the tidyverse” (instead of the “Hadleyverse,” as is the current practice). But at least I can always say I knew Hadley first.

Taking courses designed by Hadley means that I’ve been programming in R for almost five years. I also took the introductory computer science course, which used Python, but I never really kept up with it. While I still have a lot to learn in R, especially functional programming, I feel comfortable working with and growing in the “tidyverse.”

The same could not be said for Python. I took some online courses, but mostly I just kept using R. I decided what I needed was an immersive experience where I could work with Python while learning machine learning, web scraping, D3, cloud computing, and more. Coding bootcamps have proliferated over the last few years, with whole websites dedicated to helping people choose one. Metis’s data science bootcamp is a 12-week program that helps students transition into careers as data scientists, and it seemed like the perfect environment for me.

Metis So Far

Well, it’s one week and one day in, and so far it’s been a great experience. The group of 25 students is amazingly diverse. Experience ranges from one year out of school to 15 or even 20 years in industry. Some people have mathematics or computer science degrees, a few have PhDs, and others learned to code mostly on their own. Each day we start off with 45 minutes of pair programming. We’re randomly paired with a different person each day, so one day I might be guiding someone through accessing elements of a pandas DataFrame and the next learning the tricks of Jupyter Notebook.

I’m excited to continue my journey at Metis over the next few months. I’m certain I’ll never have to worry about being bored!
