GitHub is a Git repository hosting service that adds many features of its own, such as a web-based graphical interface for managing repositories, access control, wikis, organizations, gists, and more.
As you may already know, GitHub exposes a ton of data through its API. If you are also interested in automating other services, see our tutorials on using the Google Drive API and the Gmail API in Python.
In this tutorial, you will learn how to use GitHub API v3 in Python with either the requests or the PyGithub library.
To get started, let's install the dependencies:
$ pip3 install PyGithub requests
Since GitHub API v3 is pretty straightforward to use, you can make a simple GET request to a specific URL and retrieve the results:
import requests
from pprint import pprint
# github username
username = "x4nth055"
# url to request
url = f"https://api.github.com/users/{username}"
# make the request and return the json
user_data = requests.get(url).json()
# pretty print JSON data
pprint(user_data)
Here I used my account; here is a part of the returned JSON (you can see it in the browser as well):
{'avatar_url': 'https://avatars3.githubusercontent.com/u/37851086?v=4',
'bio': None,
'blog': 'https://www.thepythoncode.com',
'company': None,
'created_at': '2018-03-27T21:49:04Z',
'email': None,
'events_url': 'https://api.github.com/users/x4nth055/events{/privacy}',
'followers': 93,
'followers_url': 'https://api.github.com/users/x4nth055/followers',
'following': 41,
'following_url': 'https://api.github.com/users/x4nth055/following{/other_user}',
'gists_url': 'https://api.github.com/users/x4nth055/gists{/gist_id}',
'gravatar_id': '',
'hireable': True,
'html_url': 'https://github.com/x4nth055',
'id': 37851086,
'login': 'x4nth055',
'name': 'Rockikz',
<..SNIPPED..>
That's a lot of data, and picking fields out of raw JSON by hand with the requests library alone quickly gets tedious. This is where PyGithub comes to the rescue.
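If you do want to stay with plain requests for a quick lookup, one option is to keep only the keys you care about. The following is a minimal sketch: summarize_user and its default field list are illustrative names of my own, not part of requests or the GitHub API, and the sample dictionary is just a trimmed copy of the response shown above.

```python
# Hypothetical helper: keep only selected keys from the user JSON.
DEFAULT_FIELDS = ("login", "name", "blog", "followers", "following", "created_at")


def summarize_user(user_data, fields=DEFAULT_FIELDS):
    """Return a smaller dict with just the requested fields.

    Keys missing from the response are reported as None instead of raising.
    """
    return {field: user_data.get(field) for field in fields}


# Trimmed copy of the response shown above, so the example runs offline.
sample = {
    "login": "x4nth055",
    "name": "Rockikz",
    "blog": "https://www.thepythoncode.com",
    "followers": 93,
    "following": 41,
    "created_at": "2018-03-27T21:49:04Z",
    "hireable": True,
}

for key, value in summarize_user(sample).items():
    print(f"{key}: {value}")
```

With the earlier snippet, you would call summarize_user(user_data) on the dictionary returned by requests.get(url).json().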
Let's get all the public repositories of that user using the PyGithub library we just installed:
import base64
from github import Github
from pprint import pprint
# Github username
username = "x4nth055"
# pygithub object
g = Github()
# get that user by username
user = g.get_user(username)
for repo in user.get_repos():
    print(repo)
Here is my output:
Repository(full_name="x4nth055/aind2-rnn")
Repository(full_name="x4nth055/awesome-algeria")
Repository(full_name="x4nth055/emotion-recognition-using-speech")
Repository(full_name="x4nth055/emotion-recognition-using-text")
Repository(full_name="x4nth055/food-reviews-sentiment-analysis")
Repository(full_name="x4nth055/hrk")
Repository(full_name="x4nth055/lp_simplex")
Repository(full_name="x4nth055/price-prediction")
Repository(full_name="x4nth055/product_recommendation")
Repository(full_name="x4nth055/pythoncode-tutorials")
Repository(full_name="x4nth055/sentiment_analysis_naive_bayes")
Alright, so I made a simple function to extract some useful information from this Repository object:
def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    for content in repo.get_contents(""):
        print(content)
    try:
        # repo license (the API returns it Base64-encoded)
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except Exception:
        # the repository has no license file, so skip it
        pass
The Repository object has a lot of other fields. I suggest you use dir(repo) to find the fields you want to print. Let's iterate over the repositories again and use the function we just wrote:
# iterate over all public repositories
for repo in user.get_repos():
    print_repo(repo)
    print("="*100)
This will print some information about each public repository of this user:
====================================================================================================
Full name: x4nth055/pythoncode-tutorials
Description: The Python Code Tutorials
Date created: 2019-07-29 12:35:40
Date of last push: 2020-04-02 15:12:38
Home Page: https://www.thepythoncode.com
Language: Python
Number of forks: 154
Number of stars: 150
--------------------------------------------------
Contents:
ContentFile(path="LICENSE")
ContentFile(path="README.md")
ContentFile(path="ethical-hacking")
ContentFile(path="general")
ContentFile(path="images")
ContentFile(path="machine-learning")
ContentFile(path="python-standard-library")
ContentFile(path="scapy")
ContentFile(path="web-scraping")
License: MIT License
<..SNIPPED..>
I've truncated the output, as the script prints every repository and its information. Notice that we used the repo.get_contents("") method to retrieve the files and folders at the root of the repository. PyGithub parses each entry into a ContentFile object; use dir(content) to see its other useful fields.
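Since dir() also returns private and dunder names, it can help to filter them out. Here is a small sketch; public_fields is a hypothetical helper name, and FakeContent is only a stand-in so the example runs without calling the GitHub API.

```python
def public_fields(obj):
    """List attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in class so the example runs offline; with PyGithub you would
# pass a Repository or ContentFile object instead.
class FakeContent:
    def __init__(self):
        self.path = "README.md"
        self.type = "file"
        self._raw = {}


print(public_fields(FakeContent()))  # ['path', 'type']
```

Calling public_fields(repo) or public_fields(content) inside the loops above should give you a shorter list to browse than the raw dir() output.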
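Also, if you have private repositories, you can access them by authenticating with PyGithub. Note that GitHub no longer accepts account passwords for API requests, so use a personal access token instead:
# the Auth module is available in recent PyGithub releases (1.59 or newer)
from github import Auth
# personal access token (create one in your GitHub account settings)
token = "your-personal-access-token"
# authenticate to github
g = Github(auth=Auth.Token(token))
# get the authenticated user
user = g.get_user()
for repo in user.get_repos():
    print_repo(repo)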
GitHub also recommends authenticating your requests. Unauthenticated requests have a much lower rate limit (60 requests per hour at the time of writing, compared to 5,000 for authenticated ones), and PyGithub raises a RateLimitExceededException once you exceed it.
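If you are using requests directly, GitHub reports the current quota in the X-RateLimit-Remaining and X-RateLimit-Reset response headers. The sketch below reads them from any header mapping; rate_limit_status is a hypothetical helper name, and the header values are made up so the example runs without a network call.

```python
import time


def rate_limit_status(headers, now=None):
    """Summarize GitHub's rate-limit headers.

    `headers` is any mapping of response headers, for example
    requests.get(url).headers. Returns (remaining, seconds_until_reset).
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    # the reset time is given as UTC epoch seconds
    reset_at = int(headers["X-RateLimit-Reset"])
    return remaining, max(0, int(reset_at - now))


# Made-up header values, so the example does not hit the API.
example_headers = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000600",
}

remaining, wait = rate_limit_status(example_headers, now=1700000000)
if remaining == 0:
    print(f"Rate limit reached; retry in about {wait} seconds")
```

A plain dict is case-sensitive, whereas the headers object returned by requests is not, so the exact header casing only matters for hand-built dictionaries like the one above.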
You can also download any file from any repository you want. To do that, I'm editing the print_repo() function to search for Python files in a given repository. If one is found, we build an appropriate file name and write the file's content using the content.decoded_content attribute. Here's the edited version of the print_repo() function:
import os

# make a directory to save the Python files
if not os.path.exists("python-files"):
    os.mkdir("python-files")

def print_repo(repo):
    # repository full name
    print("Full name:", repo.full_name)
    # repository description
    print("Description:", repo.description)
    # the date of when the repo was created
    print("Date created:", repo.created_at)
    # the date of the last git push
    print("Date of last push:", repo.pushed_at)
    # home website (if available)
    print("Home Page:", repo.homepage)
    # programming language
    print("Language:", repo.language)
    # number of forks
    print("Number of forks:", repo.forks)
    # number of stars
    print("Number of stars:", repo.stargazers_count)
    print("-"*50)
    # repository content (files & directories)
    print("Contents:")
    try:
        for content in repo.get_contents(""):
            # check if it's a Python file
            if content.path.endswith(".py"):
                # save the file
                filename = os.path.join("python-files", f"{repo.full_name.replace('/', '-')}-{content.path}")
                with open(filename, "wb") as f:
                    f.write(content.decoded_content)
            print(content)
        # repo license (the API returns it Base64-encoded)
        print("License:", base64.b64decode(repo.get_license().content.encode()).decode())
    except Exception as e:
        print("Error:", e)
After you run the code again, you'll notice a folder named python-files has been created, containing the Python files from the different repositories of that user.
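Note that get_contents("") only lists the root of a repository, so Python files inside sub-folders are not picked up by the function above. One way to reach them is to walk the directories, following the pattern from the PyGithub documentation. This is a sketch under that assumption: walk_files is a hypothetical helper name, and FakeRepo is a stand-in so the example runs without the API.

```python
from collections import namedtuple


def walk_files(repo, path=""):
    """Yield every file under `path`, descending into sub-directories.

    `repo` only needs a get_contents(path) method whose items expose
    `.type` ("file" or "dir") and `.path`, which is how PyGithub's
    Repository and ContentFile objects behave.
    """
    pending = list(repo.get_contents(path))
    while pending:
        item = pending.pop(0)
        if item.type == "dir":
            # queue the directory's entries for later
            pending.extend(repo.get_contents(item.path))
        else:
            yield item


# Stand-in objects so the example runs offline.
Item = namedtuple("Item", "path type")


class FakeRepo:
    tree = {
        "": [Item("README.md", "file"), Item("general", "dir")],
        "general": [Item("general/app.py", "file")],
    }

    def get_contents(self, path):
        return self.tree[path]


python_files = [f.path for f in walk_files(FakeRepo()) if f.path.endswith(".py")]
print(python_files)  # ['general/app.py']
```

Keep in mind that every directory costs one API request, and that nested paths contain "/" characters, so you would want to replace them in the file name (as the function already does for repo.full_name) before saving.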
The GitHub API is quite rich; you can search for repositories by a specific query just like you do on the website:
# search repositories by name
for repo in g.search_repositories("pythoncode tutorials"):
    # print repository details
    print_repo(repo)
At the time of writing, this returned 9 repositories and their information.
You can also search by programming language or topic:
# search by programming language
for i, repo in enumerate(g.search_repositories("language:python")):
    print_repo(repo)
    print("="*100)
    # stop after the first 10 results
    if i == 9:
        break
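The enumerate-and-break pattern above can also be written with itertools.islice from the standard library, which stops after a given number of items. A minimal sketch, where first_n is a hypothetical helper name and the generator stands in for real search results:

```python
from itertools import islice


def first_n(results, n):
    """Return at most n items from any iterable, without consuming the rest."""
    return list(islice(results, n))


# Stand-in generator; with PyGithub you would pass
# g.search_repositories("language:python") instead.
fake_results = (f"repo-{i}" for i in range(1000))
print(first_n(fake_results, 3))  # ['repo-0', 'repo-1', 'repo-2']
```

Because islice only relies on iteration, it should stop requesting further result pages once it has collected enough items.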
To search for a particular topic, you simply pass something like "topic:machine-learning" to the search_repositories() method.
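GitHub's search syntax also lets you combine free-text keywords with several qualifiers by separating them with spaces. Here is a small sketch of a query builder; build_query is a hypothetical helper of my own, not a PyGithub function.

```python
def build_query(keywords="", **qualifiers):
    """Join free-text keywords and qualifier:value pairs into one query string."""
    parts = [keywords] if keywords else []
    parts += [f"{name}:{value}" for name, value in qualifiers.items()]
    return " ".join(parts)


print(build_query("tutorials", language="python", topic="machine-learning"))
# tutorials language:python topic:machine-learning
```

The resulting string can then be passed to g.search_repositories() just like the hand-written queries above.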
If you're using the authenticated version, you can also create, update and delete files very easily using the API:
# searching for my repository
repo = g.search_repositories("pythoncode tutorials")[0]
# create a file, then commit and push it
repo.create_file("test.txt", "commit message", "content of the file")
# delete that created file
contents = repo.get_contents("test.txt")
repo.delete_file(contents.path, "remove test.txt", contents.sha)
The above code is a simple use case: I searched for a particular repository, added a new file called test.txt with some content in it, and made a commit. After that, I grabbed the content of that new file and deleted it (which counts as a git commit as well).
And sure enough, after running the above lines of code, the commits were created and pushed to the repository.
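For the curious, file creation and updates map to GitHub's REST "contents" endpoint (PUT /repos/{owner}/{repo}/contents/{path}), which expects the file content Base64-encoded. The sketch below only builds the JSON body for such a request; contents_payload is a hypothetical helper name, and actually sending the body would still require an authenticated PUT request.

```python
import base64


def contents_payload(message, text, sha=None, branch=None):
    """Build the JSON body for PUT /repos/{owner}/{repo}/contents/{path}."""
    payload = {
        "message": message,
        # the API expects the file content Base64-encoded
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
    }
    if sha is not None:
        # the blob SHA is required when updating an existing file
        payload["sha"] = sha
    if branch is not None:
        payload["branch"] = branch
    return payload


print(contents_payload("commit message", "content of the file"))
```

This also shows why delete_file() above needs contents.sha: GitHub uses the blob SHA to make sure you are changing the version of the file you think you are.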
We have just scratched the surface of the GitHub API. There are a lot of other functions and methods you can use, and obviously, we can't cover all of them here. Please use dir(g) to list the other methods, and check the PyGithub documentation or the GitHub API reference for detailed information.
Happy Coding ♥