How to: Find a list of GitHub repositories that contain a string
The GitHub web UI search tool, along with its paging, is not well suited to getting a list of repositories that contain a given string. The following techniques can help.
The following examples show how to get a list of edx repositories containing the string "edx-drf-extensions".
GitHub API using Python
This script pages through the search results automatically, then de-duplicates and sorts the repository names. Because it pages automatically, it can hit the search API rate limit. If that happens, it prints a warning and the list it prints is incomplete.
In a virtualenv, pip install PyGithub:
# Also see https://pygithub.readthedocs.io/en/latest/introduction.html
pip install PyGithub
Use a simple script like the following:
#!/usr/bin/env python3
from github import Github

# Set this to a personal access token.
# - Select "repo" for the OAuth scopes.
# See https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
g = Github('REPLACE_WITH_YOUR_ACCESS_TOKEN')

repositories = set()

# Note: The search API is rate limited, so a search with many hits may stop early.
content_files = g.search_code(query='org:edx edx-drf-extensions')
for content in content_files:
    repositories.add(content.repository.full_name)
    # Stop paging once the search rate limit is used up.
    rate_limit = g.get_rate_limit()
    if rate_limit.search.remaining == 0:
        print('WARNING: Rate limit on searching was reached. Results are incomplete.')
        break

for repo in sorted(repositories):
    print(repo)

rate_limit = g.get_rate_limit()
print('Search rate limit:')
print(rate_limit.search)
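If you would rather not install PyGithub, you can page through the results with only the Python standard library by following the rel="next" URL in each response's Link header. The following is a minimal sketch, not a tested tool. The function names are hypothetical, the token is assumed to be a personal access token sent as a Bearer header, and rate-limit errors (which urllib raises as HTTPError) are not handled.

```python
import json
import re
import urllib.request

# Matches one entry of a Link header, e.g. <https://...&page=2>; rel="next"
LINK_ENTRY = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(header):
    """Map each rel name ("next", "last", ...) in a Link header to its URL."""
    if not header:
        return {}
    return {rel: url for url, rel in LINK_ENTRY.findall(header)}


def fetch_page(url, token=None):
    """Fetch one page of search results and return (parsed JSON, Link header)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        # Assumption: token is a personal access token.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def collect_repositories(first_url, fetch=fetch_page):
    """Follow rel="next" links from first_url and return sorted, unique repo names."""
    repositories = set()
    url = first_url
    while url:
        data, link_header = fetch(url)
        for item in data.get("items", []):
            repositories.add(item["repository"]["full_name"])
        # The last page has no rel="next" link, which ends the loop.
        url = parse_link_header(link_header).get("next")
    return sorted(repositories)
```

For example, collect_repositories("https://api.github.com/search/code?q=edx-drf-extensions+org:edx", fetch=lambda url: fetch_page(url, token="REPLACE_WITH_YOUR_ACCESS_TOKEN")) should return the sorted repository names. Passing fetch in as a parameter keeps the paging logic separate from the network call, so you can exercise the paging logic with canned responses.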
GitHub API from the Command Line
From the command line, use the following:
# Supply your username to search private repos. curl prompts for a password;
# enter a personal access token, since GitHub no longer accepts account passwords for the API.
curl --user "REPLACE_WITH_GITHUB_USERNAME" "https://api.github.com/search/code?q=edx-drf-extensions+org:edx"
curl --user "REPLACE_WITH_GITHUB_USERNAME" "https://api.github.com/search/code?q=REPLACE_WITH_SEARCH_TERM+org:edx"

# Skip the username to quickly search public repos
curl "https://api.github.com/search/code?q=edx-drf-extensions+org:edx"

# Check the "last" link in the response headers to see how many pages of results there are.
curl -sI "https://api.github.com/search/code?q=edx-drf-extensions+org:edx" | grep 'rel="last"'

# Or, just add "&page=2", etc., and see whether you get results.
# Quote the URL so the shell does not treat "&" as the background operator.
curl "https://api.github.com/search/code?q=edx-drf-extensions+org:edx&page=2"
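The grep command prints the whole Link header. The number of pages is the page parameter of the rel="last" URL. Here is a small Python sketch that extracts it, assuming the <url>; rel="..." header format that GitHub uses for paginated responses. The function name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def last_page_number(link_header):
    """Return the page number of the rel="last" link in a Link header.

    Intended for the headers of the first page: there, a missing "last"
    link means all results fit on one page, so 1 is returned.
    """
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', link_header or ""):
        if rel == "last":
            page = parse_qs(urlparse(url).query).get("page", ["1"])[0]
            return int(page)
    return 1
```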
If you have jq installed (e.g. brew install jq), you can get a sorted, de-duplicated list of repository names using the following:
# Pipe results to jq to get a sorted list of unique repository names
curl -s "https://api.github.com/search/code?q=edx-drf-extensions+org:edx" | jq "[.items[].repository.full_name] | unique"
# Note: Add '--user "REPLACE_WITH_GITHUB_USERNAME"' as above to search private repos.
# Note: Remember to fetch additional pages of results if there are any (see above).
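If jq is not available, the same filter takes only a few lines of Python. This minimal sketch handles one page of search output at a time. The function name is hypothetical.

```python
import json


def unique_repositories(search_json):
    """Return sorted, de-duplicated repository names from one page of search results.

    Mirrors the jq filter: [.items[].repository.full_name] | unique
    """
    data = json.loads(search_json)
    names = {item["repository"]["full_name"] for item in data.get("items", [])}
    # jq's unique also sorts its output, so sort here to match.
    return sorted(names)
```

For example, you can pass it the text of a saved curl response, or sys.stdin.read() when you pipe curl output into a script.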
Or, use jq-play online to filter the output:
- Copy the search output into the JSON field at https://jqplay.org/
- Enter the following filter in jq-play:
  .items[].repository.full_name
GitHub API from the Browser
Using the GitHub API from the browser is a quick and dirty approach, with some limitations.
NOTE: This only searches public repos, and you need to page through the results yourself.
- In the browser, use a URL like the following:
  - https://api.github.com/search/code?q=edx-drf-extensions+org:edx
  - https://api.github.com/search/code?q=REPLACE_WITH_SEARCH_TERM+org:edx
  - NOTE: To check for more results, either add '&page=2', etc., to the URL, or look for the "last" link in the Link response header (visible in the browser's developer tools).
- Copy the search output into the JSON field at https://jqplay.org/
- Enter the following filter in jq-play:
  .items[].repository.full_name
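The browser approach relies on editing the URL by hand, so it can help to generate all the page URLs up front. This sketch assumes you already know the number of pages (for example, from the "last" link). The function name is hypothetical. It builds the query with urlencode, which percent-encodes it (the colon becomes %3A and the space becomes +). The API should decode this the same way as the hand-written URLs above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/code"


def search_page_urls(term, org, pages):
    """Return code-search URLs for pages 1..pages, scoped to one organization."""
    urls = []
    for page in range(1, pages + 1):
        # urlencode percent-encodes the query, so terms containing spaces are safe.
        query = urlencode({"q": f"{term} org:{org}", "page": page})
        urls.append(f"{SEARCH_URL}?{query}")
    return urls
```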