
Introduction to APIs
=======================

Prepared by [Stephanie L. DeMora](http://stephaniedemora.com) for the [Graduate Quantitative Methods Center](https://gradquant.ucr.edu/) at UCR!

In this notebook we take a look at the Twitter API and the New York Times API! Sadly, both require access keys.

### Objectives

-   Become comfortable using Jupyter Notebooks
-   Settle in with some basic Python code
-   Understand the purpose of APIs
-   Understand and become familiar with "access keys"
-   Gain familiarity with Twitter, data scraping, etc.
-   Explore data collection possibilities...

### Basic Installation

We will utilize a common set of modules throughout this workshop:

-   pandas
-   tweepy
-   nytimesarticle

However, we don't have these installed yet, and we need to install several prerequisite modules as well. While we might typically call "pip install tweepy" from the command line, Jupyter notebooks work a little differently. To install these modules, we can call pip through the shell, using `sys.executable` so that everything lands in the same Python that this notebook is running, by running the following cell:

In [None]:
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install pysocks
!{sys.executable} -m pip install tweepy
!{sys.executable} -m pip install nytimesarticle
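
If you're curious whether a module is already available before installing it, here is a minimal sketch using only the standard library. The helper names `is_installed` and `pip_install_command` are my own inventions for illustration, not part of pip or Jupyter.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if this interpreter can find a top-level module by that name."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name):
    """Build the pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", package_name]


for name in ["pandas", "tweepy", "nytimesarticle"]:
    status = "already installed" if is_installed(name) else "missing"
    # Only prints the command; nothing is installed by this sketch.
    print(name, "->", status, pip_install_command(name))
```

The printed command is the same thing the `!{sys.executable} -m pip install ...` lines above hand to the shell.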

### Basic Imports

Now that we have these modules installed, we have to load them. If you're an R user, think of this as running "library(tweepy)" after "install.packages("tweepy")". We will utilize a common set of imports throughout this workshop and call them by running the following cell:

In [None]:
import pandas as pd
import tweepy

### Authorization!?

Before we can continue, we have to get a series of access keys from Twitter. This allows Twitter to monitor our behavior and ensure that we're abiding by their rules. So, let's head over to the [Twitter Developers' Page](https://developer.twitter.com)!

Once we have this taken care of, go ahead and replace the four placeholder strings below with your own API Key, Secret Key, Access Token, and Token Secret.

In [None]:
auth = tweepy.OAuthHandler('API_KEY', 'API_SECRET_KEY')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

api = tweepy.API(auth)
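
Typing keys straight into a notebook makes it easy to share them by accident. One option, sketched below, is to read them from environment variables instead. The variable names (`TWITTER_API_KEY` and friends) and the helper `load_twitter_keys` are hypothetical; use whatever names you set on your own machine.

```python
import os

# Hypothetical variable names; nothing in Twitter or Tweepy requires these.
REQUIRED_KEYS = [
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]


def load_twitter_keys(env=None):
    """Collect the four credentials from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


# Demonstration with a stand-in mapping, so no real secrets are needed here.
fake_env = {name: "dummy-value" for name in REQUIRED_KEYS}
keys = load_twitter_keys(fake_env)
print(sorted(keys))
```

With real environment variables in place, the four values could be passed to `tweepy.OAuthHandler` and `set_access_token` exactly as in the cell above.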

### We're ready to go now!

So now let's take a look at someone's timeline-- how about Steph's!? Or maybe your own!

In [None]:
steph_tweets = api.user_timeline('SLDeMora')

In [None]:
steph_tweets

### Extracting the important stuff...

That data is a bit wild... Let's take a closer look at the tweet text on Steph's timeline:

In [None]:
for tweet in steph_tweets:
    print(tweet.text)

Or perhaps we really want to know who's tweeting on Steph's timeline...? (No surprises here... Just Steph):

In [None]:
for tweet in steph_tweets:
    print(tweet.user.name)

Maybe we need to extract data on WHEN Steph is posting to Twitter!

In [None]:
for tweet in steph_tweets:
    print(tweet.created_at)
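
Once we have timestamps, we can start summarizing them. Here is a rough sketch that counts tweets per hour of the day. It assumes `created_at` arrives as a Python `datetime`, and it uses made-up stand-in tweets so that it runs without any API access; `tweets_per_hour` is a name I invented for this example.

```python
from collections import Counter
from datetime import datetime
from types import SimpleNamespace

# Stand-in tweets with made-up timestamps (not real data from anyone's timeline).
fake_tweets = [
    SimpleNamespace(created_at=datetime(2019, 2, 11, 9, 15)),
    SimpleNamespace(created_at=datetime(2019, 2, 11, 9, 40)),
    SimpleNamespace(created_at=datetime(2019, 2, 12, 21, 5)),
]


def tweets_per_hour(tweets):
    """Count how many tweets fall in each hour of the day (0-23)."""
    return Counter(tweet.created_at.hour for tweet in tweets)


hour_counts = tweets_per_hour(fake_tweets)
for hour, count in sorted(hour_counts.items()):
    print(f"{hour:02d}:00  {count} tweet(s)")
```

In principle the same function could be pointed at `steph_tweets` in place of `fake_tweets`.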

In [None]:
for tweet in steph_tweets:
    print(tweet.favorite_count)

### Using Python and the API to tweet to your account?

I have the following commented out. To run it, remove the "#" at the beginning of the line. :-)

In [None]:
#api.update_status(status = "Hey, I'm taking an awesome API's workshop from @SLDeMora at GradQuant!!! This tweet was sent with Python!")

### Can we extract data about Steph other than what she posts?

We sure can! Let's use the "followers" call to check out Steph's followers: 

In [None]:
stephs_fans = api.followers('SLDeMora')
for fan in stephs_fans:
    print(fan.name)

In [None]:
for fan in stephs_fans:
    print(fan.screen_name)

In [None]:
for fan in stephs_fans:
    print(fan.description)

In [None]:
for fan in stephs_fans:
    print(fan.name, fan.description)

In [None]:
#API.search(q[, lang][, locale][, rpp][, page][, since_id][, geocode][, show_user])
results = api.search('UCR Students')

In [None]:
for tweet in results:
    print(tweet.text)

In [None]:
results = api.search(q="GradQuant",
                     since="2014-02-14",
                     until="2019-02-11",
                     lang="en")

In [None]:
for tweet in results:
    print(tweet.text)
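
So what is Tweepy doing for us here? Roughly speaking, an API call is a web address plus some parameters. The sketch below builds the kind of URL a search like this turns into, using Twitter's v1.1 search endpoint. It is only an illustration: `build_search_url` is my own helper, and a real request also needs the signed authorization headers that Tweepy adds for us.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Twitter's standard (v1.1) search endpoint, used here for illustration only.
SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(**params):
    """Attach the search parameters to the endpoint as a URL query string."""
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url(q="UCR Students", lang="en", until="2019-02-11")
print(url)

# Reading the parameters back out shows that nothing was lost in the encoding.
print(parse_qs(urlsplit(url).query))
```

Notice how the space in "UCR Students" gets encoded as a plus sign so that it can travel safely inside a URL.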

### ...but Steph, all of this is such a mess... Can't I just export this or open it with pandas?

Yeah, of course! I'm glad you asked. Let's go ahead and transform this data. Twitter sends it to us in JSON format, which Tweepy wraps up as Python objects, but if you don't like that, we can turn it into a pandas DataFrame fairly easily.

In [None]:
# vars() turns each tweet object into a dictionary of its attributes (one row per tweet)
tweets_df = pd.DataFrame([vars(tweet) for tweet in results])

In [None]:
tweets_df.columns
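
Where do those column names come from? They are the attribute names that `vars()` pulls off each tweet object. Here is a small sketch with a stand-in class (`FakeStatus` is made up for this example and is much simpler than Tweepy's real tweet objects) showing what `vars()` returns.

```python
class FakeStatus:
    """Stand-in for a tweet object; not Tweepy's real class."""

    def __init__(self, text, lang, favorite_count):
        self.text = text
        self.lang = lang
        self.favorite_count = favorite_count


# Made-up tweets for illustration.
fake_results = [
    FakeStatus("Hello GradQuant", "en", 3),
    FakeStatus("Hola GradQuant", "es", 0),
]

# vars(obj) returns the object's attributes as a dictionary.
rows = [vars(status) for status in fake_results]
print(rows[0])
print(list(rows[0].keys()))
```

A list of dictionaries like `rows` is a shape that `pd.DataFrame` accepts directly: each dictionary becomes a row and each key becomes a column.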

In [None]:
tweets_df.lang

In [None]:
tweets_df.head()

### Don't forget to save your data!!

In [None]:
# define file path (string) to save csv file to
FILE_PATH = 'results.csv'

### We don't really need all of that information... Let's slice the data up.
We can choose the specific attributes that we want to keep like this:

In [None]:
# define attributes you want
tweet_atts = [
'text', 'created_at', 'favorite_count'
#, 'lang', 'retweet_count', 'source',
#'in_reply_to_user_id_str', 'retweeted',
#'id'
]

In [None]:
# subset dataframe
tweets_df = tweets_df[tweet_atts]

In [None]:
tweets_df.head()

In [None]:
# save resulting df to csv (pass index=False if you don't want the row index as a column)
tweets_df.to_csv(FILE_PATH)

# Fun with NYT

### Friendly reminder to...

-   Import the previously installed modules like this.
-   Go to the [New York Times Developers' Page](https://developer.nytimes.com/)! Set up an account and get your access keys.

In [None]:
from nytimesarticle import articleAPI

Replace "XXXX" with your access key:

In [None]:
# Heads up: this reuses the name "api", so it replaces the Twitter client from earlier
api = articleAPI('XXXX')

In [None]:
articles = api.search( q = 'Obama')

In [None]:
articles.keys()

In [None]:
df = pd.DataFrame(articles['response']['docs'])
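
Some of the fields in each article are themselves dictionaries, which can look messy inside a table. Below is a rough sketch of flattening a few of them by hand. The response here is a made-up stand-in, and I am assuming each document keeps its headline under `headline` and then `main`, along with `pub_date` and `web_url` fields; check `articles.keys()` and one real document before relying on this. `flatten_doc` is a hypothetical helper, not part of nytimesarticle.

```python
# Stand-in response with made-up articles, shaped like the result of api.search().
fake_articles = {
    "status": "OK",
    "response": {
        "docs": [
            {
                "headline": {"main": "Example headline one"},
                "pub_date": "2019-02-11T10:00:00+0000",
                "web_url": "https://example.com/one",
            },
            {
                "headline": {"main": "Example headline two"},
                "pub_date": "2019-02-12T08:30:00+0000",
                # This document has no web_url, to show how gaps are handled.
            },
        ]
    },
}


def flatten_doc(doc):
    """Pull a few fields out of one article; missing fields become None."""
    return {
        "headline": doc.get("headline", {}).get("main"),
        "pub_date": doc.get("pub_date"),
        "web_url": doc.get("web_url"),
    }


flat_rows = [flatten_doc(doc) for doc in fake_articles["response"]["docs"]]
for row in flat_rows:
    print(row)
```

A list like `flat_rows` could be handed to `pd.DataFrame` in the same way as the full list of documents.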

In [None]:
df.head()

In [None]:
df.to_csv("NYT.csv")