How to: Find a list of GitHub repositories that contain a string

The GitHub web UI search tool, with its paging, is not ideal for getting a list of repositories that contain a given string. The techniques below can help.

The following examples show how to get a list of edx repositories containing the string "edx-drf-extensions".

GitHub API using Python

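This approach handles paging automatically, in addition to sorting and de-duplicating the results. Because the script requests every page of results, a search with many hits can exhaust the search rate limit; in that case the script prints a warning and stops with incomplete results.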


  1. In a virtualenv, pip install PyGithub:

    # Also see https://pygithub.readthedocs.io/en/latest/introduction.html
    pip install PyGithub
  2. Use a simple script like the following:

#!/usr/bin/env python3
from github import Github

# Set this to a personal access token.
# - Select "repo" for the OAuth scopes.
# See https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
g = Github('REPLACE_WITH_YOUR_ACCESS_TOKEN')

# Use a set so each repository is listed once, however many files match.
repositories = set()

# Note: The search API is rate limited; with too many hits the results are incomplete.
content_files = g.search_code(query='org:edx edx-drf-extensions')
for content in content_files:
    repositories.add(content.repository.full_name)
    # Stop early, with a warning, once the search quota is used up.
    rate_limit = g.get_rate_limit()
    if rate_limit.search.remaining == 0:
        print('WARNING: Rate limit on searching was reached.  Results are incomplete.')
        break

for repo in sorted(repositories):
    print(repo)

# Report the remaining search quota.
rate_limit = g.get_rate_limit()
print('Search rate limit:')
print(rate_limit.search)
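
The script above stops as soon as the search quota is used up. An alternative is to pause until the quota resets and then continue. The following is a minimal sketch of the wait calculation only, assuming the reset time is available as a Unix timestamp in UTC seconds (the form GitHub uses in its X-RateLimit-Reset response header). The helper name seconds_until_reset and the slack parameter are hypothetical, and the sketch takes a plain number so that it does not depend on how any particular client library exposes the reset time.

```python
import math
import time


def seconds_until_reset(reset_epoch, now_epoch=None, slack=1):
    """Return how many seconds to sleep before the rate limit window resets.

    reset_epoch: reset time as a Unix timestamp (UTC seconds).
    now_epoch:   current time as a Unix timestamp; defaults to time.time().
    slack:       extra seconds added to avoid retrying a moment too early.
    """
    if now_epoch is None:
        now_epoch = time.time()
    remaining = reset_epoch - now_epoch
    if remaining <= 0:
        # The window has already reset, so there is nothing to wait for.
        return 0
    return math.ceil(remaining) + slack


# Example with fixed, made-up timestamps: the reset is 60 seconds away.
print(seconds_until_reset(1700000060, now_epoch=1700000000))
```

A caller could pass the result to time.sleep() before resuming the search, rather than breaking out of the loop.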

GitHub API from Command Line

  1. From the command-line, use the following:

    # Supply username to search private repos.
    # At the password prompt, enter a personal access token (GitHub no longer accepts account passwords for API requests).
    curl --user "REPLACE_WITH_GITHUB_USERNAME" "https://api.github.com/search/code?q=edx-drf-extensions+org:edx"
    curl --user "REPLACE_WITH_GITHUB_USERNAME" "https://api.github.com/search/code?q=REPLACE_WITH_SEARCH_TERM+org:edx"
    
    # Skip username to quickly search public repos
    curl "https://api.github.com/search/code?q=edx-drf-extensions+org:edx"
    
    # Check the "last" link in the response headers to see how many pages of results there are.
    curl -sI "https://api.github.com/search/code?q=edx-drf-extensions+org:edx" | grep 'rel="last"'
    # Or, just add "&page=2", etc., and see if you get results.
    # Quote the URL so the shell does not treat "&" as a background operator.
    curl "https://api.github.com/search/code?q=edx-drf-extensions+org:edx&page=2"
  2. If you have jq installed (e.g. brew install jq), you can get a sorted/filtered list using the following:

    # Pipe results to jq to get a filtered list of repositories
    curl -s "https://api.github.com/search/code?q=edx-drf-extensions+org:edx" | jq '[.items[].repository.full_name] | unique'
    
    # Note: Add '--user "REPLACE_WITH_GITHUB_USERNAME"' like above to search private repos.
    # Note: Remember to get additional pages of results if there are any (see above).
  3. Or, use jq-play online to filter the output:
    1. Copy the search output into the JSON field in https://jqplay.org/
    2. Enter the following filter in jq-play:

      .items[].repository.full_name
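
The "last" link mentioned in step 1 can also be read programmatically instead of by eye. The following is a minimal sketch, assuming the Link header uses the usual `<url>; rel="name"` form with a `page` query parameter. The function name last_page and the sample header value are made up for illustration.

```python
import re
from urllib.parse import parse_qs, urlparse


def last_page(link_header):
    """Return the page number of the rel="last" link, or None if there is none."""
    # Each entry in the header looks like: <https://...&page=5>; rel="last"
    for url, rel in re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', link_header):
        if rel == "last":
            pages = parse_qs(urlparse(url).query).get("page")
            if pages:
                return int(pages[0])
    return None


# Made-up header value in the same shape as a paginated API response.
sample = (
    '<https://api.github.com/search/code?q=edx-drf-extensions+org%3Aedx&page=2>; rel="next", '
    '<https://api.github.com/search/code?q=edx-drf-extensions+org%3Aedx&page=5>; rel="last"'
)
print(last_page(sample))
```

A result of None means the header carried no "last" link, which is expected when all results fit on one page or when the request was already for the final page.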

GitHub API from Browser

Using the GitHub API from the browser is a quick-and-dirty approach, with some limitations.

NOTE: This only searches public repos. You also need to remember to page through the results if there is more than one page.

  1. In the browser, use a URL like the following:
    1. https://api.github.com/search/code?q=edx-drf-extensions+org:edx
    2. https://api.github.com/search/code?q=REPLACE_WITH_SEARCH_TERM+org:edx
    3. NOTE: To check for more results, either add '&page=2', etc., to the URL, or look for the "last" link in the response headers.
  2. Copy the search output into the JSON field in https://jqplay.org/
  3. Enter the following filter in jq-play:

    .items[].repository.full_name
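
If the search output has been saved page by page, the same filtering can be done locally instead of in jq-play. The following is a minimal sketch that mirrors the `.items[].repository.full_name` filter and then de-duplicates and sorts across pages. The function name repository_names and the sample repository names are made up, and the sample responses are trimmed to the fields the filter uses.

```python
import json


def repository_names(pages):
    """Return the sorted, unique repository full names from parsed search pages."""
    names = set()
    for page in pages:
        # Each page is one parsed search response; matches are under "items".
        for item in page.get("items", []):
            names.add(item["repository"]["full_name"])
    return sorted(names)


# Made-up, heavily trimmed responses standing in for two saved pages.
page_1 = json.loads(
    '{"items": [{"repository": {"full_name": "edx/example-b"}},'
    ' {"repository": {"full_name": "edx/example-a"}}]}'
)
page_2 = json.loads('{"items": [{"repository": {"full_name": "edx/example-a"}}]}')

for name in repository_names([page_1, page_2]):
    print(name)
```

Collecting the names in a set means a repository with matches on several pages is still listed only once.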