After benefiting for years from great resources on math, programming, and data science, I thought it might be a good idea to start returning the favor. One gap I intend to help fill lies at the intersection of Data Science and DevOps.
“Data Science” is a bit of a catch-all term for many kinds of analytics, machine learning, and artificial intelligence techniques. The field includes Netflix’s movie recommendations, Amazon’s “You may also want to buy…”, Google’s autonomous vehicles, Philips’ detection of decompensating patients (e.g., CareSage), and on and on. There’s no shortage of great sites exploring the latest data science trends (many linked from the right panel, and more here).
“DevOps”, a compound of Development and Operations, is a practice that unifies the various IT cultures involved in product development and deployment, streamlining workflows and raising quality standards; it similarly has no shortage of content online. For decades, the people creating applications (developers), the people checking for quality (testers, quality assurance), and the people responsible for deploying, scaling, and maintaining them (operations) were separate groups, throwing code over the wall to one another. This worked fine when software was shipped in boxes every few years, but the paradigm broke down as the industry moved to faster and faster release cycles; many companies now ship new code multiple times per day! To make this possible, the groups had to integrate. In doing so they created a panoply of tools and techniques to streamline systems and shorten the path from innovation to end user.
How do these worlds relate to one another? One connection is reproducible data science.
The foundations of science are experimental hypothesis testing and reproducibility.
Libraries and tools can influence the results that come out of an analysis, confounding otherwise thorough experiments and sometimes making reproducibility impossible. Open source toolsets (e.g., Python, R) have helped create a free ecosystem of analytics environments, but the particular libraries and modules installed can still affect results, and it can be tricky to truly recreate another scientist’s environment to reproduce them. Borrowing techniques from DevOps, such as version control, unit testing, and containerization, can help solve these issues. Data scientists looking to deploy or scale analytics can borrow even more from the DevOps side, including practices for data ingestion and processing, high availability, and more.
I hope you enjoy the explorations, and I look forward to your comments!
About the author
Eric has a background in Electrical Engineering and Computer Science, starting with his Bachelor’s degree from the University of Michigan, where he was electronics team leader for the U of M Solar Car Team. He attended Johns Hopkins University for graduate school and received a PhD in Biomedical Engineering in 2011. His thesis detailed the processing that occurs in the primate visual cortex, and demonstrated that an artificial neural network optimizing for sparseness develops tuning properties similar to those he observed in area V4 of the anterior visual system. Eric’s postdoctoral work in retinal prosthetics at Weill Cornell Medical School was featured in a TED Talk in 2012.
From 2013 through 2017, Eric worked at Philips Research. There he created models to predict patient deterioration across care settings and worked with care providers to integrate those models into clinical workflow. He also developed systems to collect, clean, and process patient data and to score the models in real time. While at Philips, Eric had the pleasure of collaborating with a stellar group of colleagues: physicians, nurses, professors, data scientists, programmers, and designers from around the globe. Much of this work is now embedded in Philips products and has resulted in many patents and publications.
Since 2017, Eric has worked at Goldman Sachs in New York, New York, as Vice President of Quantitative Investment Strategies. Eric leads the data science platform team and serves as chief architect of the group’s next-generation research platform. He also participates in investment research, contributing primarily to natural language processing and medical domain topics.