PyGitGraph

I built PyGitGraph, an open-source tool that uses Python and GraphQL to extract the full details of thousands of GitHub Issues and export them to CSV or JSON for analysis.

GitHub Issues

GitHub Issues let you track tasks, manage code defects, plan projects, and more. Here are some examples of how they help manage complex work.

Features of GitHub Issues

Track your tasks on a Kanban board.

Track your tasks in a tabular view.

Visualize your project roadmap in a Gantt chart.

Precious Information

Over time, a lot of critical information about your project or product is captured within GitHub issues.

What if you could extract all this information and put it neatly in a CSV or JSON?

Such an extract would let you analyze:

  • the most frequently logged issues,
  • the average duration for issue closure,
  • the sentiment of comments inside issues, and more.
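
As a rough illustration of the first two questions, here is a minimal pandas sketch. The tiny data set and the column names (`label`, `created_at`, `closed_at`) are assumptions made up for this example; they are not necessarily the columns PyGitGraph produces.

```python
import pandas as pd

# Hypothetical extract: column names and values are illustrative assumptions.
issues = pd.DataFrame(
    {
        "number": [1, 2, 3, 4],
        "label": ["bug", "bug", "docs", "bug"],
        "created_at": ["2023-01-02", "2023-01-03", "2023-01-05", "2023-01-09"],
        "closed_at": ["2023-01-04", "2023-01-10", None, "2023-01-10"],
    }
)

# Parse timestamps; a missing closed_at becomes NaT (issue still open).
issues["created_at"] = pd.to_datetime(issues["created_at"])
issues["closed_at"] = pd.to_datetime(issues["closed_at"])

# Most frequently logged issues, grouped by label.
label_counts = issues["label"].value_counts()

# Average time to close, ignoring issues that are still open.
closed = issues.dropna(subset=["closed_at"])
mean_days_to_close = (closed["closed_at"] - closed["created_at"]).dt.days.mean()

print(label_counts)
print(f"Mean days to close: {mean_days_to_close:.1f}")
```

With a real extract, the same two steps would run on a DataFrame loaded from the exported CSV or JSON instead of the inline sample.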

With that data, you can build graphs like these:

Graphs of GitHub Issue details

A heatmap showing the number of issues opened each day.

A line chart showing the number of issues created and closed in January 2023.

A bar chart showing the number of issues logged for each label.

A bar chart displaying the average time taken to close issues for each label.

A word cloud highlighting words from the titles of GitHub Issues.

Extraction Challenges

Building graphs or training machine learning models requires a large amount of data. However, extracting substantial amounts of data about GitHub Issues is not easy.

GitHub provides two options for data extraction: the REST API and the GraphQL API.

GraphQL has two main advantages over the REST API:

  • Precise data retrieval: With GraphQL, you receive exactly the fields you request, in a response that mirrors the shape of your query. In contrast, a REST endpoint returns a fixed format that often contains more data than you need.
  • Efficiency of retrieval: GraphQL can often return all the data you need in a single request, whereas the REST API typically needs several requests to assemble the same data. Fewer round trips to the server make your application faster and more efficient.
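
To make the first point concrete, the sketch below builds a request payload for GitHub's GraphQL endpoint that asks only for a handful of issue fields. The helper name `build_issues_payload` and the choice of fields are assumptions for illustration, not PyGitGraph's actual query.

```python
import json

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Only the fields listed under "nodes" come back in the response.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $pageSize: Int!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: $pageSize, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        number
        title
        state
        createdAt
        closedAt
        labels(first: 10) { nodes { name } }
      }
    }
  }
}
"""


def build_issues_payload(owner, name, page_size=100, cursor=None):
    """Return the JSON body for one page of issues (hypothetical helper)."""
    return {
        "query": ISSUES_QUERY,
        "variables": {
            "owner": owner,
            "name": name,
            "pageSize": page_size,
            "cursor": cursor,
        },
    }


payload = build_issues_payload("octocat", "Hello-World")
print(json.dumps(payload["variables"], indent=2))
```

Sending it is a single POST, for example with the `requests` library: `requests.post(GITHUB_GRAPHQL_URL, json=payload, headers={"Authorization": f"Bearer {token}"})`, where `token` is a personal access token you supply.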

GitHub details additional benefits of using GraphQL over the REST API here.

The GraphQL API provides an efficient way to extract information from GitHub. However, tools integrating with the GitHub GraphQL API are scarce.

That gap inspired me to build a suite of Jupyter notebooks designed to help you extract, create, close, and delete GitHub issues using GraphQL.
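
For a sense of what creating, closing, and deleting look like at the API level, here is a hedged sketch of the three GraphQL mutations involved. The helper functions are hypothetical and only build request bodies; the notebooks may structure this differently. Note that these mutations take global node IDs (fetched via the `id` field), not issue numbers.

```python
CREATE_ISSUE = """
mutation($repositoryId: ID!, $title: String!, $body: String) {
  createIssue(input: {repositoryId: $repositoryId, title: $title, body: $body}) {
    issue { id number url }
  }
}
"""

CLOSE_ISSUE = """
mutation($issueId: ID!) {
  closeIssue(input: {issueId: $issueId}) {
    issue { number state }
  }
}
"""

DELETE_ISSUE = """
mutation($issueId: ID!) {
  deleteIssue(input: {issueId: $issueId}) {
    repository { nameWithOwner }
  }
}
"""


def create_issue_payload(repository_id, title, body=None):
    """Request body for creating an issue (hypothetical helper)."""
    variables = {"repositoryId": repository_id, "title": title, "body": body}
    return {"query": CREATE_ISSUE, "variables": variables}


def close_issue_payload(issue_id):
    """Request body for closing an issue by its node ID (hypothetical helper)."""
    return {"query": CLOSE_ISSUE, "variables": {"issueId": issue_id}}


def delete_issue_payload(issue_id):
    """Request body for deleting an issue by its node ID (hypothetical helper)."""
    return {"query": DELETE_ISSUE, "variables": {"issueId": issue_id}}


# Placeholder IDs for illustration only.
example = create_issue_payload("R_placeholder", "Login page crashes", "- [ ] reproduce")
print(example["variables"]["title"])
```

Each payload would be POSTed to the GraphQL endpoint with an authenticated token; deleting issues additionally requires sufficient permissions on the repository.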

Features of PyGitGraph

  • Manage GitHub Issues

    • Use Jupyter notebooks
    • Extract full details of GitHub issues
    • Export to CSV or JSON
    • Filter issues with pandas
    • Create issues from Jupyter notebooks
    • Include labels, body, assignee, etc.
    • Full markdown support (lists, checkboxes, etc.)
    • Work with private/public repositories
    • Close issues with the click of a button
    • Delete issues easily
    • Get confirmation of closure/deletion
    • Act on multiple issues simultaneously
  • Analyze historical trends

    • Download thousands of GitHub issues
    • Paginate 100 records at a time
    • Flatten nested structures (JSON to CSV)
    • Create historical data extracts
    • Build visualizations using bulk extracts
    • Visualize trends, build heatmaps
    • Issue closure rates, issue duration, etc.
    • Use tools of your choice
  • Easy, but Secure

  • Many use cases

    • Issue trend analysis and forecasting
    • Contributor activity and performance metrics
    • Label-based issue categorization and prioritization
    • Sentiment analysis with NLP models
    • Build forecasting models
    • Identify when most issues are logged
    • Track issue closure rates
    • Measure how long issues stay open
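
Two of the features above, paginating 100 records at a time and flattening nested JSON into CSV, can be sketched roughly as follows. This is not PyGitGraph's code: `run_query` is a hypothetical callable standing in for the real API call, and the fake pages exist only so the example runs without a network connection.

```python
import pandas as pd


def fetch_all_issues(run_query, page_size=100):
    """Collect issue nodes page by page using cursor-based pagination.

    run_query(page_size, cursor) is a hypothetical callable that returns the
    decoded "issues" connection: {"pageInfo": {...}, "nodes": [...]}.
    """
    nodes, cursor = [], None
    while True:
        page = run_query(page_size, cursor)
        nodes.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return nodes
        cursor = page["pageInfo"]["endCursor"]


def flatten_issues(nodes):
    """Turn nested issue nodes into a flat table suitable for CSV export."""
    rows = []
    for node in nodes:
        row = {key: value for key, value in node.items() if key != "labels"}
        # Collapse the nested label list into one delimited string.
        row["labels"] = ";".join(label["name"] for label in node["labels"]["nodes"])
        rows.append(row)
    # json_normalize expands nested dicts, e.g. author -> "author.login".
    return pd.json_normalize(rows)


# Fake two-page response, keyed by cursor, used instead of a live API call.
_FAKE_PAGES = {
    None: {
        "pageInfo": {"hasNextPage": True, "endCursor": "c1"},
        "nodes": [
            {"number": 1, "title": "Crash on start", "author": {"login": "alice"},
             "labels": {"nodes": [{"name": "bug"}]}},
        ],
    },
    "c1": {
        "pageInfo": {"hasNextPage": False, "endCursor": None},
        "nodes": [
            {"number": 2, "title": "Fix typo", "author": {"login": "bob"},
             "labels": {"nodes": [{"name": "docs"}, {"name": "good first issue"}]}},
        ],
    },
}


def fake_run_query(page_size, cursor):
    return _FAKE_PAGES[cursor]


frame = flatten_issues(fetch_all_issues(fake_run_query))
csv_text = frame.to_csv(index=False)
print(csv_text)
```

In real use, `run_query` would POST a query to the GraphQL endpoint, passing `cursor` as the `after` argument, and `frame.to_csv("issues.csv", index=False)` would write the extract to disk.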

Get Involved

The entire codebase is MIT licensed. Find it here.

Connect with me on LinkedIn or drop me an email at hello@kunal-pathak.com.