Using boto3 (Python) with R (reticulate) to push to S3

January 20, 2020 in Python, Reticulate, R

Using Python with R

The motivation for this post is that it is now easier than ever to use Python with R via the reticulate package.

It makes a lot of sense, too! I upload client data to S3 buckets quite a bit, using R scripts and RStudio Connect for automation, and I rely on packages to do it. Unfortunately, the aws.s3 R package has been wonky and does not seem to be maintained.

Enter boto3, a Python package that lets you interface with AWS S3. It is cleaner and easier to set up.

Example

You first set up your R script like you usually would (assuming you have Python installed and set up), with the reticulate library added.

library(tidyverse)
library(reticulate)

## Do stuff and write to file

path <- "./file_name.csv"
s3_path <- "directory/file_name.csv"
  
ACCESS_KEY <- "AWS-KEY"
SECRET_KEY <- "AWS-SECRET"

Once you’re done processing data and writing the results to a file, you move on to the Python section. You might have even seen the function below in some shape or form before.


import boto3
from botocore.exceptions import NoCredentialsError


def upload_to_aws(local_file, bucket, s3_file):
    # Build an S3 client from the key/secret pair defined in the R session
    s3 = boto3.client('s3', aws_access_key_id=r.ACCESS_KEY,
                      aws_secret_access_key=r.SECRET_KEY)

    try:
        # Upload the file and give the bucket owner full control of the object
        s3.upload_file(local_file, bucket, s3_file,
                       ExtraArgs={'ACL': 'bucket-owner-full-control'})
        print("Upload Successful")
        return True
    except FileNotFoundError:
        print("The file was not found")
        return False
    except NoCredentialsError:
        print("Credentials not available")
        return False


# r.path and r.s3_path are the file paths defined in the R code above
uploaded = upload_to_aws(r.path, 'bucket_name', r.s3_path)

The beauty of this is that you can reference R objects from Python simply by prepending r. to their names. That lets you feed not only your key/secret pair into the Python script, but also the file paths for both the local file and the S3 destination.
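If you would rather not have the Python function reach into the R session for its credentials, one option is to pass everything in as arguments. What follows is only a minimal sketch under a few assumptions, not the exact function from this post: the access_key, secret_key, and client parameters and the RecordingClient stub are hypothetical names added for illustration. The stub stands in for a real boto3 client so the function can be dry-run without touching AWS. For brevity, the sketch only handles the missing-file case.

```python
from typing import Any, Optional


def upload_to_aws(local_file: str, bucket: str, s3_file: str,
                  access_key: Optional[str] = None,
                  secret_key: Optional[str] = None,
                  client: Any = None) -> bool:
    """Upload local_file to the given bucket under the key s3_file."""
    if client is None:
        # Imported lazily so a dry run with a stub client does not need boto3
        import boto3
        client = boto3.client("s3",
                              aws_access_key_id=access_key,
                              aws_secret_access_key=secret_key)
    try:
        client.upload_file(local_file, bucket, s3_file,
                           ExtraArgs={"ACL": "bucket-owner-full-control"})
    except FileNotFoundError:
        print("The file was not found")
        return False
    print("Upload successful")
    return True


class RecordingClient:
    """Hypothetical stand-in for an S3 client; it only records the calls."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key, ExtraArgs=None):
        self.calls.append((Filename, Bucket, Key, ExtraArgs))


# Dry run: nothing is sent to AWS, the stub just remembers the arguments
stub = RecordingClient()
ok = upload_to_aws("./file_name.csv", "bucket_name",
                   "directory/file_name.csv", client=stub)
```

Saved to a file (say, a hypothetical upload.py), a function like this could be loaded with reticulate's source_python() and then called from R with the path, bucket name, S3 path, and key/secret pair as ordinary arguments, instead of going through r. inside Python.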

That’s it!!


Matthias Raess, Ph.D.

