README.md

MetaPathPredict

The MetaPathPredict Python module utilizes deep learning models to predict the presence or absence of KEGG metabolic modules in bacterial genomes recovered from environmental sequencing efforts.

Installation

To run MetaPathPredict, download this repository and install it as a Python module (see download and installation instructions below):

XetHub install:

NOTE: Conda is required for this installation.

  1. Download the Git-Xet version control extension here: https://xethub.com/assets/docs/getting-started/installation#install-the-git-xet-extension

    1. NOTE: Git v2.29 or higher is required for the Git-Xet version control extension. Install and/or upgrade Git here, if necessary: https://git-scm.com/downloads

    2. NOTE: It is recommended to set your Git username and email address. If these are not set, in a Terminal (MacOS/Linux)/Command Prompt (Windows) run git config --global user.name <YOUR USERNAME> to set your Git username, and git config --global user.email <YOUR_EMAIL@example.com> to set your Git email address. Replace <YOUR USERNAME> and <YOUR_EMAIL@example.com> with the desired username and email address.

  2. Install the downloaded file.

  3. Open a Terminal/Command Prompt window and run the following command to download the XetHub repository to the desired location (note: change your current working directory first to the desired download location, e.g., ~/Downloads on MacOS):
    git xet clone https://xethub.com/dgellermcgrath/MetaPathPredict.git

  4. In a Terminal/Command Prompt window, run the following commands from the parent directory the MetaPathPredict repository was cloned to:

conda create -n MetaPathPredict python=3.10.6 scikit-learn=1.3.0 tensorflow=2.10.0 numpy=1.23.4 pandas=1.5.2 keras=2.10.0 git=2.40.1

NOTE: You will be prompted (y/n) to confirm creating this conda environment. Now activate it:

conda activate MetaPathPredict

Once complete, pip install MetaPathPredict:

pip install MetaPathPredict/package

Note: MetaPathPredict is now installed in the MetaPathPredict conda environment. Activate the conda environment prior to any use of MetaPathPredict.

GitHub install:

Follow the instructions from the MetaPathPredict GitHub repository.

pip install:

[not available yet]


Functions

The following functions can be implemented to run MetaPathPredict on the command line:

  • MetaPathPredict parses one or more input KEGG Ortholog gene annotation datasets (currently only bacterial genome data is supported) and predicts the presence or absence of KEGG Modules. This function takes as input the .tsv output files from the KofamScan and DRAM gene annotation tools as well as the KEGG KOALA online annotation platforms blastKOALA, ghostKOALA, and kofamKOALA. Run any of these tools first and then use one or more of their output .tsv files as input to MetaPathPredict.

    • A single file or multiple space-separated files can be specified to the --input parameter, or use a wildcard (e.g., /results/*.tsv). Include full or relative paths to the input file(s). A sample of each annotation file format that MetaPathPredict can process is included in this repository in the annotatation_examples folder. The sample annotation files in annotatation_examples can optionally be used as input to test the installation.
    • The format of the gene annotation files (kofamscan, kofamkoala, dram, or koala) that is used as input must be specified with the --annotation-format parameter. Currently, only one input type can be specified at a time.
    • The full or relative path to the desired destination for MetaPathPredict's output .tsv file must be specified, as well as a name for the file. The output file path and name can be specified using the --output parameter. By default, MetaPathPredict does not create any default output directory nor does the output have a default file name.
    • To specify a specific KEGG module or modules to reconstruct and predict, include the module identifier (e.g., M00001) or identifiers as a space-separated list to the argument --kegg-modules.
  • To view which KEGG modules MetaPathPredict can reconstruct and make predictions for, run the following on the command line: MetaPathModules.


Basic usage

# predict method for making KEGG module presence/absence predictions on input gene annotations

usage: MetaPathPredict [-h] --input INPUT [INPUT ...] --annotation-format ANNOTATION_FORMAT
                       [--kegg-modules KEGG_MODULES [KEGG_MODULES ...]] --output OUTPUT

options:
  -h, --help            show this help message and exit
  --input INPUT [INPUT ...], -i INPUT [INPUT ...]
                        input file path(s) and name(s) [required]
  --annotation-format ANNOTATION_FORMAT, -a ANNOTATION_FORMAT
                        annotation format (kofamscan, kofamkoala, dram, or koala) [default:
                        kofamscan]
  --kegg-modules KEGG_MODULES [KEGG_MODULES ...], -k KEGG_MODULES [KEGG_MODULES ...]
                        KEGG modules to predict [default: MetaPathPredict KEGG modules]
  --output OUTPUT, -o OUTPUT
                        output file path and name [required]

Examples with sample datasets

# One KofamScan gene annotation dataset
MetaPathPredict -i /path/to/kofamscan_annotations_1.tsv -a kofamscan -o /results/predictions.tsv

# Three KofamScan gene annotation datasets, with predictions for modules M00001 and M00003
MetaPathPredict \
-i kofamscan_annotations_1.tsv kofamscan_annotations_2.tsv kofamscan_annotations_3.tsv \
-a kofamscan \
-k M00001 M00003 \
-o /results/predictions.tsv

# Multiple KofamScan datasets in a directory
MetaPathPredict -i annotations/*.tsv -a kofamscan -o /results/predictions.tsv

# One DRAM gene annotation dataset
MetaPathPredict -i dram_annotation.tsv -a dram -o /results/predictions.tsv

# Multiple DRAM datasets in a directory
MetaPathPredict -i annotations/*.tsv -a dram -o /results/predictions.tsv

Understanding the output

The output of running MetaPathPredict is a table. The first column, file, displays the full file name of each input gene annotation file. The remaining columns give the class predictions (module present = 1; module absent = 0) of KEGG modules. Each KEGG module occupies a single column in the table and is labelled by its module identifier. See a sample output below of four KEGG module predictions for three input annotation files:

file M00001 M00002 M00003 M00004
/path/to/kofamscan_annotations_1.tsv 1 1 0 1
/path/to/kofamscan_annotations_2.tsv 0 1 0 0
/path/to/kofamscan_annotations_3.tsv 1 0 0 0

Developer usage

# training method for MetaPathPredict's internal models

usage: MetaPathTrain [-h] --train-targets TRAIN_TARGETS --train-features TRAIN_FEATURES
                     [--num-epochs NUM_EPOCHS] --model-out MODEL_OUT [--use-gpu]
                     [--num-cores NUM_CORES] [--num-hidden-layers NUM_HIDDEN_LAYERS]
                     [--hidden-nodes-per-layer HIDDEN_NODES_PER_LAYER]
                     [--num-features NUM_FEATURES] [--threshold THRESHOLD]

options:
  -h, --help            show this help message and exit
  --train-targets TRAIN_TARGETS
                        training targets file
  --train-features TRAIN_FEATURES
                        training features
  --num-epochs NUM_EPOCHS
                        number of epochs
  --model-out MODEL_OUT, -m MODEL_OUT
                        model file name output
  --use-gpu             use GPU if available
  --num-cores NUM_CORES
                        Number of cores for parallel processing

Neural Net parameters:
  --num-hidden-layers NUM_HIDDEN_LAYERS
                        number of hidden layers
  --hidden-nodes-per-layer HIDDEN_NODES_PER_LAYER
                        number of nodes in each hidden layer
  --num-features NUM_FEATURES
                        number of features to retain from training data
  --threshold THRESHOLD
                        threshold for SelectKBest feature selection
                        ```
File List Total items: 8
Name Last Commit Size Last Modified
annotatation_examples updated file reading functions 10 months ago
package added MetaPathPredictE 10 months ago
.gitattributes Initial commit 79 B 12 months ago
.gitignore updated .gitignore 34 B 12 months ago
README.html added kofamkoala data read-in function, updated readme 619 KiB 12 months ago
README.md updated install instructions 8.5 KiB 11 months ago
metapathpredict.log added metapathmodules functionality 662 B 11 months ago
metapathpredict.yaml added install env yaml file; updated install instructions 4.7 KiB 12 months ago

Repository Size

Loading repo size...

Commits 64 commits

File Types