Posts

Showing posts from July, 2020

From Zero to AWS EC2 for Data Science

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. (source)

As data scientists, we occasionally find that our personal laptops do not pack enough computational punch for certain tasks. Fortunately, we can rely on cloud compute resources such as Amazon Web Services (AWS) to support our workflow. By tapping into cloud computational power, we can expand the amount of data that we can process for our data science needs. In addition to EC2, AWS also offers file storage (S3) and distributed computing (EMR), among other services; detailed information about relevant AWS offerings can be found in this excellently written article, or on the AWS front page.

Here’s an agenda of the steps we’ll need to take:

1. Set up an AWS account
2. Set up an EC2 instance
3. SSH into the EC2 instance
4. Set up the EC2 instance
5. Upload data into EC2
6. Open a Ju