
Summarizing data with Actions

note

XetHub Actions is in early preview with known limitations. Help us improve by filing bugs!

Actions compute is flexible and has many applications. One XetHub feature that pairs well with Actions is custom views, which let you render up-to-date visualizations of your data in your repository UI. Use Actions to produce one or more summary files for each commit, save the results back to your repository, then use the summary information to generate a custom view. Collaborators then see saved views alongside every data and model change without any extra effort, making it easy to understand what has changed since the last commit.

  • Supported data types: any
  • Supported formats: any

How it works

Set up your Actions workflow to call a script that produces summary files for the contents of the repository at that commit. The summary files can be saved anywhere within the repository and can be displayed by a custom view scoped to either the summary files themselves or the original data files.

With each trigger of the Action:

  • Your custom code runs server-side on the repository to generate summaries.
  • Summaries are committed to the repository alongside the original files.
  • Custom views are then displayed in relevant places for that commit.
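
The generate-and-save part of these steps can be sketched in Python. This is a minimal sketch under stated assumptions, not XetHub's implementation: the function names (`find_data_files`, `summary_path_for`, `write_summaries`), the `summaries/` output directory, and the `.summary.json` suffix are all hypothetical choices, and the "summary" here is only a byte count and a line count. Committing the written files back to the repository is left to the workflow itself.

```python
import json
from pathlib import Path

DATA_SUFFIXES = {".csv", ".tsv"}  # assumption: file types worth summarizing
SUMMARY_DIR = "summaries"         # assumption: where summary files are saved


def find_data_files(repo_root):
    """Return data files under repo_root, skipping hidden dirs and SUMMARY_DIR."""
    root = Path(repo_root)
    found = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if any(part.startswith(".") for part in rel.parts):
            continue  # skip .git, .xethub, and other hidden paths
        if rel.parts[0] == SUMMARY_DIR:
            continue  # never summarize our own output
        if path.is_file() and path.suffix.lower() in DATA_SUFFIXES:
            found.append(path)
    return found


def summary_path_for(repo_root, data_file):
    """Mirror the data file's relative path under SUMMARY_DIR."""
    rel = Path(data_file).relative_to(repo_root)
    return Path(repo_root) / SUMMARY_DIR / rel.with_name(rel.name + ".summary.json")


def write_summaries(repo_root):
    """Write one small JSON summary per data file and return the paths written."""
    written = []
    for data_file in find_data_files(repo_root):
        with data_file.open("rb") as handle:
            line_count = sum(1 for _ in handle)
        summary = {
            "source": data_file.relative_to(repo_root).as_posix(),
            "bytes": data_file.stat().st_size,
            "lines": line_count,
        }
        out_path = summary_path_for(repo_root, data_file)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(json.dumps(summary, indent=2))
        written.append(out_path)
    return written
```

Keeping summaries in a separate directory is only one option; since summaries can live anywhere in the repository, writing them next to each data file would work equally well.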

What is a summary?

A summary is any data that helps you visualize and understand the contents of a file. Many file types don't need to be summarized in order to be visualized: a small text file, a single image, etc. Other file types require summarization to provide a useful view, typically because they are too large to display in full: a whole book as text, or a large dataset. Useful summaries include per-column statistics for a tabular data file, a report on data quality, or model evaluation results. Summaries can be represented in any data format (text or binary).
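
As one illustration of per-column statistics for a tabular file, here is a minimal sketch using only the Python standard library. The function name `summarize_columns` and the choice of statistics (count, missing, min/max/mean for numeric columns, distinct count otherwise) are assumptions made for this example, not a format that custom views require.

```python
import csv
import io
import json
import statistics


def summarize_columns(rows):
    """Compute simple per-column statistics from an iterable of row dicts."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name is None:
                continue  # csv.DictReader stores overflow fields under None
            columns.setdefault(name, []).append(value)

    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v not in (None, "")]
        stats = {"count": len(values), "missing": len(values) - len(present)}
        try:
            numbers = [float(v) for v in present]
        except (TypeError, ValueError):
            numbers = []  # at least one value is not numeric
        if numbers:
            stats["min"] = min(numbers)
            stats["max"] = max(numbers)
            stats["mean"] = statistics.fmean(numbers)
        else:
            stats["distinct"] = len(set(present))
        summary[name] = stats
    return summary


if __name__ == "__main__":
    sample = "name,score\nada,90\nbob,\ncy,70\n"
    reader = csv.DictReader(io.StringIO(sample))
    print(json.dumps(summarize_columns(reader), indent=2))
```

A column is treated as numeric only if every non-empty value parses as a float; a stricter or more lenient rule may suit your data better.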

Update and customize

To create your own summaries, duplicate our tabular-data-summary-custom-view and update the following:

  • .xethub/workflows/main.yml - This example triggers on push events to the main branch of the repo. Customize the trigger, script, or any other setup you’d like around the Actions workflow.
  • run.sh - This bash script is a wrapper around your code that generates summaries.
    • The example assumes CSV, TSV, or Parquet data files, view-generating code in run.py, and dependencies in requirements.txt. Update the script with your own file types, code files, and dependencies, in any language of your choice.

After customizing the above and pushing your changes from your terminal, manually trigger your Action by going to the Actions tab and clicking Run.
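
When adapting the summary code to your own file types, one approach is to choose a reader based on the file extension. The sketch below is hypothetical and is not the contents of the example's run.py: `DELIMITERS` and `read_rows` are names invented for illustration, and only delimited text files are handled. Parquet is omitted because the Python standard library has no Parquet reader; it would need a third-party package listed in requirements.txt.

```python
import csv
from pathlib import Path

# Assumption: map each supported extension to the delimiter used to parse it.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_rows(path):
    """Return the rows of a delimited text file as a list of dicts.

    Raises ValueError for file types that have no registered delimiter.
    """
    path = Path(path)
    delimiter = DELIMITERS.get(path.suffix.lower())
    if delimiter is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Adding a new delimited format then only requires a new entry in `DELIMITERS`; formats that are not delimited text need their own reader function.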