Formatting (VIOLIN.formatting)

This page details the formatting functions of VIOLIN, used during model and reading input.

The formatting step is important, as it:

  • identifies duplicate interactions in the reading output,

  • counts the number of times an interaction was found in the reading (Evidence Score),

  • converts the variable representation of the model regulators into their common names.

The formatting functions also import models that are not in the BioRECIPES format and machine reading output that is not in the REACH format.

Functions

VIOLIN.formatting.evidence_score(reading_df, col_names)[source]

This function merges duplicate interactions and calculates the evidence score of each literature-extracted event (LEE).

Parameters
  • reading_df (pd.DataFrame) – The dataframe of the machine reading output

  • col_names (list) – The column headings used to determine whether interactions are identical

Returns

counted_reading – A new dataframe with the evidence count and PMCID list for each interaction

Return type

pd.DataFrame
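
The merge-and-count behaviour described above can be illustrated with a minimal pandas sketch. This is not VIOLIN's implementation: the function name evidence_score_sketch, the column names, and the output column names Evidence_Score and PMCIDs are all assumptions made for the example.

```python
import pandas as pd

def evidence_score_sketch(reading_df, col_names):
    """Hypothetical stand-in: merge rows that agree on col_names,
    count them, and collect the PMCIDs they came from."""
    grouped = reading_df.groupby(col_names, as_index=False, sort=False)
    return grouped.agg(
        Evidence_Score=("PMCID", "count"),
        PMCIDs=("PMCID", lambda ids: ",".join(sorted(set(ids)))),
    )

# Toy reading output; the column names are assumptions, not REACH headers.
reading = pd.DataFrame({
    "Regulator Name": ["AKT", "AKT", "TP53"],
    "Regulated Name": ["MTOR", "MTOR", "MDM2"],
    "Sign": ["positive", "positive", "positive"],
    "PMCID": ["PMC1", "PMC2", "PMC3"],
})

counted = evidence_score_sketch(
    reading, ["Regulator Name", "Regulated Name", "Sign"]
)
print(counted)
```

In this sketch the two AKT/MTOR rows collapse into one interaction with a count of 2, while the TP53/MDM2 row keeps a count of 1. Which columns go into col_names decides how strict the notion of "identical" is.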

VIOLIN.formatting.add_regulator_names_id(model_df)[source]

This function converts the model regulator lists from BioRECIPE variables to the common element names and database identifiers

Parameters

model_df (pd.DataFrame) – The model dataframe (in BioRECIPE format)

Returns

model_df – A new dataframe with added columns containing the positive and negative regulators listed by their Element Names and IDs

Return type

pd.DataFrame
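
As a rough illustration of the variable-to-name conversion, the sketch below builds lookup tables from a toy model and translates the regulator lists. It assumes, for simplicity, that regulators are stored as comma-separated variable names and that the columns are called Variable, Element Name, and Element ID; the helper add_names_sketch, the output column names, and the placeholder IDs are hypothetical and may not match VIOLIN.

```python
import pandas as pd

def add_names_sketch(model_df):
    """Hypothetical stand-in: add columns listing regulators by
    element name and by ID instead of by model variable."""
    var_to_name = dict(zip(model_df["Variable"], model_df["Element Name"]))
    var_to_id = dict(zip(model_df["Variable"], model_df["Element ID"]))

    def translate(cell, lookup):
        # Empty or missing regulator lists stay empty.
        if not isinstance(cell, str) or not cell.strip():
            return ""
        variables = [v.strip() for v in cell.split(",")]
        # Unknown variables are passed through unchanged.
        return ",".join(lookup.get(v, v) for v in variables)

    out = model_df.copy()
    for sign in ("Positive", "Negative"):
        col = f"{sign} Regulators"
        out[f"{sign} Regulator Names"] = [translate(c, var_to_name) for c in out[col]]
        out[f"{sign} Regulator IDs"] = [translate(c, var_to_id) for c in out[col]]
    return out

# Toy model with placeholder IDs (not real database identifiers).
model = pd.DataFrame({
    "Variable": ["AKT_cyto", "MTOR_cyto", "PTEN_cyto"],
    "Element Name": ["AKT", "MTOR", "PTEN"],
    "Element ID": ["ID_AKT", "ID_MTOR", "ID_PTEN"],
    "Positive Regulators": ["", "AKT_cyto", ""],
    "Negative Regulators": ["PTEN_cyto", "", ""],
})

named = add_names_sketch(model)
print(named[["Element Name", "Positive Regulator Names", "Negative Regulator IDs"]])
```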

VIOLIN.formatting.convert_to_biorecipes(model, att_list=[], separate=True)[source]

This function imports a model which is NOT in the BioRECIPES format, such as a model formatted as a node-edge list. Regulators may be represented in the REACH format, separated by regulator sign, or unseparated, with a specified column for regulator sign.

Parameters
  • model (str) – Directory and filename of the file containing the model spreadsheet. Accepted files: .txt, .csv, .tsv, .xlsx

  • model_cols (list) – Column names of the model file. Default names are found in required_model

  • att_list (list) – List of Element attributes (in addition to Name, ID, and Type). Default is no additional attributes

  • separate (Boolean) – Whether the model presents regulators in separate Positive/Negative columns (True) or in a single column with a Regulator Sign attribute (False). Default is True

Returns

new_model – Formatted model dataframe

Return type

pd.DataFrame
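
To make the node-edge case concrete, the sketch below turns a small signed edge list into one row per element with separate positive and negative regulator lists. It only illustrates the idea; the function edges_to_element_table and the column names Source, Target, and Sign are assumptions, not VIOLIN's API or required headers.

```python
import pandas as pd

def edges_to_element_table(edges):
    """Hypothetical stand-in: one row per element, with incoming
    edges split into positive and negative regulator lists."""
    elements = sorted(set(edges["Source"]) | set(edges["Target"]))
    rows = []
    for name in elements:
        incoming = edges[edges["Target"] == name]
        pos = incoming.loc[incoming["Sign"] == "positive", "Source"]
        neg = incoming.loc[incoming["Sign"] == "negative", "Source"]
        rows.append({
            "Element Name": name,
            "Positive Regulators": ",".join(pos),
            "Negative Regulators": ",".join(neg),
        })
    return pd.DataFrame(rows)

# Toy signed edge list (single regulator column plus a sign column).
edges = pd.DataFrame({
    "Source": ["AKT", "PTEN", "TP53"],
    "Target": ["MTOR", "AKT", "MDM2"],
    "Sign": ["positive", "negative", "positive"],
})

table = edges_to_element_table(edges)
print(table)
```

Elements that appear only as sources (here PTEN and TP53) still get a row, with empty regulator lists.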

VIOLIN.formatting.convert_reading(reading, action, atts=[])[source]

This function formats the machine reading output in one of two ways: it either separates regulator names and attributes into ‘positive’ and ‘negative’ columns to match REACH formatting, or combines regulator names and attributes without regulator sign distinction and adds a ‘regulator sign’ column. The machine reading can be passed either as a filename or as an already loaded dataframe.

Parameters
  • reading (str or pd.DataFrame) – Machine reading output, either as file location string or dataframe

  • action (str) – Action to be performed by the function. Accepts only ‘combine’ or ‘separate’ as input

  • atts (list) – List of attributes associated with each regulator. Default list is [‘Type’, ‘ID’]. The list should not include regulator signs (where applicable)

Returns

reading_df – A dataframe with the specified formatting completed

Return type

pd.DataFrame
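
The difference between the two actions can be sketched on a toy table. The helpers combine_sketch and separate_sketch below are hypothetical and handle regulator names only (no Type or ID attributes); the column names are assumptions chosen for the example and may differ from what VIOLIN expects.

```python
import pandas as pd

def combine_sketch(reading_df):
    """Hypothetical 'combine': one regulator column plus a sign column."""
    rows = []
    for _, row in reading_df.iterrows():
        for sign in ("Positive", "Negative"):
            name = row[f"{sign} Regulator Name"]
            if isinstance(name, str) and name:
                rows.append({
                    "Regulated Name": row["Regulated Name"],
                    "Regulator Name": name,
                    "Regulator Sign": sign.lower(),
                })
    return pd.DataFrame(rows)

def separate_sketch(reading_df):
    """Hypothetical 'separate': split regulators into sign-specific columns."""
    out = reading_df[["Regulated Name"]].copy()
    for sign in ("Positive", "Negative"):
        mask = reading_df["Regulator Sign"] == sign.lower()
        # Keep the name where the sign matches, otherwise leave the cell empty.
        out[f"{sign} Regulator Name"] = reading_df["Regulator Name"].where(mask, "")
    return out

separated = pd.DataFrame({
    "Regulated Name": ["MTOR", "AKT"],
    "Positive Regulator Name": ["AKT", ""],
    "Negative Regulator Name": ["", "PTEN"],
})

combined = combine_sketch(separated)
round_trip = separate_sketch(combined)
print(combined)
print(round_trip)
```

For this toy input, separating the combined table reproduces the original rows, which is a convenient sanity check when writing a converter of this kind.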

Dependencies

Python: pandas and NumPy libraries, as well as the os.path module

VIOLIN: none

Usage

This module is used during file input in the input/output module. For an example of using the convert functions, see Tutorial 4: Alternative Input.