Subsystem Manipulation

Any manipulation of subsystem data should happen through this interface. This ensures that the relational tables that mirror and index the subsystem data are kept up to date with the canonical version of the subsystem information, which lives in the flat files under $FIG_Config::data/Subsystems.

Objects

We define the following Perl objects:

Subsystem: represents a subsystem. It can be read from disk and written to disk, and manipulated via its methods when in memory.

If we were completely on the OO side of the world, we would also define the following set of objects. However, we are not, so they are objects only in a conceptual sense. They are implemented using the basic Perl datatypes.

Role: represents a single role. A role has a name and an abbreviation.

RoleSubset: represents a subset of available roles. A subset has a name and a list of role names that comprise the subset.

Thoughts on locking

It is currently dangerous for multiple users to modify spreadsheets at once. It will likely remain dangerous while the subsystem backend is fairly stateless, as it is with the CGI mechanism.

We'd like to make this a little safer. One mechanism might be to allow one user to open a subsystem for modification, and others for read-only access. For this to work we have to be able to tell which user is allowed to modify; the current implementation uses the curator of the subsystem for this purpose.

NB: This module does not currently attempt to handle locking or exclusion. It is up to the caller (user application, CGI script, etc) to do so. It does attempt to use locking internally where appropriate.
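Since exclusion is left to the caller, one possible approach is an advisory lock held around any read-modify-write sequence. The following is only a minimal sketch under stated assumptions: the helper with_subsystem_lock and the lock file name LOCK are hypothetical and are not part of this module, and a temporary directory stands in for a real subsystem directory.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive advisory lock
# on a lock file inside the subsystem directory. The file name "LOCK" is
# an assumption made for this sketch.
sub with_subsystem_lock {
    my ($dir, $code) = @_;
    my $lock_file = "$dir/LOCK";
    open(my $fh, '>>', $lock_file) or die "Cannot open $lock_file: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $lock_file: $!";
    # Run the caller's code, remembering any error so the lock is
    # always released before the error is re-thrown.
    my @result = eval { $code->() };
    my $err = $@;
    flock($fh, LOCK_UN);
    close($fh);
    die $err if $err;
    return @result;
}

# Stand-in for a real subsystem directory.
my $lock_demo_dir = tempdir(CLEANUP => 1);
my ($status) = with_subsystem_lock($lock_demo_dir, sub { return 'modified' });
print "$status\n";
```

Advisory locks only protect against other processes that take the same lock, so every writer would have to go through a wrapper like this one.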

Data structures

We maintain the following data structures (all members of %$self).

dir

Directory in which the subsystem is stored.

notes

The current notes contents for the subsystem.

version

Current subsystem version.

exchangable

1 if the subsystem is exchangeable, 0 otherwise.

roles

List of role names.

role_index

Hash that maps from role name to index.

role_abbrs

List of role abbreviations.

abbr

Hash that maps from role abbreviation to role name.

col_subsets

List of column subset names.

col_subset_members

Hash that maps from column subset name to subset members.

col_active_subset

Currently active column subset.

row_active_subset

Currently active row subset.

genome

List of genome IDs.

variant_code

List of variant codes.

genome_index

Hash mapping from genome ID to genome index.

spreadsheet

Spreadsheet data. Structured as a list of rows, each of which is a list of entries. An entry is a list of PEG numbers.

spreadsheet_inv

Inverted structure of spreadsheet - list of columns, each of which is a list of rows.
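The relationship between the list members and their index hashes, and between spreadsheet and spreadsheet_inv, can be sketched with plain Perl data. This is an illustration only: the role names, genome IDs, and PEG numbers are made up, and the loops show one plausible way to derive the lookup structures, not necessarily how the module builds them.

```perl
use strict;
use warnings;

# Illustrative data only: two roles (columns) and two genomes (rows).
my $self = {
    roles        => ['Role A', 'Role B'],
    role_abbrs   => ['RA', 'RB'],
    genome       => ['83333.1', '224308.1'],
    variant_code => ['1', '2'],
    # spreadsheet->[row][col] is a list of PEG numbers.
    spreadsheet  => [
        [ [1, 2], [3] ],
        [ [],     [7] ],
    ],
};

# Index hashes map a name to its position in the corresponding list.
my @roles   = @{ $self->{roles} };
my @genomes = @{ $self->{genome} };
$self->{role_index}{ $roles[$_] }     = $_ for 0 .. $#roles;
$self->{genome_index}{ $genomes[$_] } = $_ for 0 .. $#genomes;

# abbr maps each abbreviation back to the full role name.
$self->{abbr}{ $self->{role_abbrs}[$_] } = $roles[$_] for 0 .. $#roles;

# spreadsheet_inv->[col][row] refers to the same cell (the same array
# reference) as spreadsheet->[row][col].
for my $r (0 .. $#{ $self->{spreadsheet} }) {
    my $row = $self->{spreadsheet}[$r];
    for my $c (0 .. $#$row) {
        $self->{spreadsheet_inv}[$c][$r] = $row->[$c];
    }
}

# PEG numbers for the first genome in the 'Role B' column.
my $col = $self->{role_index}{'Role B'};
print join(',', @{ $self->{spreadsheet_inv}[$col][0] }), "\n";
```

Because both views share the same cell references in this sketch, changing the contents of a cell through one view is visible through the other.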

Public Methods

new

    my $sub = Subsystem->new($subName, $fig, $createFlag);

Load the subsystem. If it does not exist, and $createFlag is true, create a new, empty subsystem.

subName

Name of the desired subsystem.

fig

FIG object for accessing the SEED data store.

createFlag

TRUE if an empty subsystem should be created with the given name, else FALSE. If a subsystem with the name already exists, this parameter has no effect.

all_functions

    my $pegRoles = $sub->all_functions();

Return a hash of all the features in the subsystem. The hash maps each feature ID to its functional assignment.

create_subsystem

Create a new subsystem. This creates the subsystem directory in the correct place ($FIG_Config::data/Subsystems), and populates it with the correct initial data.

get_diagrams

    my @list = $sub->get_diagrams();

Return a list of the diagrams associated with this subsystem. Each diagram is represented in the return list as a 4-tuple [diagram_id, diagram_name, page_link, img_link] where

diagram_id

ID code for this diagram.

diagram_name

Displayable name of the diagram.

page_link

URL of an HTML page containing information about the diagram.

img_link

URL of an HTML page containing an image for the diagram.

Note that the URLs are in fact for CGI scripts with parameters that point them to the correct place.

get_diagram

    my ($name, $pageURL, $imgURL) = $sub->get_diagram($id);

Get the information (if any) for the specified diagram. The diagram corresponds to a subdirectory of the subsystem's diagrams directory. For example, if the diagram ID is d03, the diagram's subdirectory would be $dir/diagrams/d03, where $dir is the subsystem directory. The diagram's name is extracted from a tiny file containing the name, and then the links are computed using the subsystem name and the diagram ID. The parameters are as follows.

id

ID code for the desired diagram.

RETURN

Returns a three-element list. The first element is the diagram name, the second a URL for displaying information about the diagram, and the third a URL for displaying the diagram image.

in_genome($genome,$peg)

    if ($sub->in_genome($genome, $peg))
    {
        # process a PEG from the genome or region of a genome
    }

Return TRUE if the PEG falls within the genome or the specified region of a genome.

genome

Either \d+\.\d+ (a typical genome ID) or \d+\.\d+:Contig_Beg_End, where Contig is a string of non-whitespace characters and Beg and End are integers.

peg

ID of the peg we are checking

RETURN

Returns TRUE if the PEG lies within the genome or region, else FALSE.
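The two accepted forms of the genome parameter can be told apart with a single pattern. The sketch below only parses the specifier; it does not check where a PEG lies. The function name parse_genome_spec is hypothetical, and the example contig name is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a genome specifier into its parts.
# Returns a hash reference, or nothing if the string is malformed.
sub parse_genome_spec {
    my ($spec) = @_;
    # Contig may itself contain underscores; the greedy \S+ backtracks
    # until the last two underscore-separated integers are Beg and End.
    if ($spec =~ /^(\d+\.\d+)(?::(\S+)_(\d+)_(\d+))?\z/) {
        my %parsed = (genome => $1);
        @parsed{qw(contig beg end)} = ($2, $3, $4) if defined $2;
        return \%parsed;
    }
    return;
}

my $whole  = parse_genome_spec('83333.1');
my $region = parse_genome_spec('83333.1:NC_000913_100_2000');
print "$region->{genome} $region->{contig} $region->{beg} $region->{end}\n";
```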

get_peg_roles

    my @cols = $sub->get_peg_roles($peg);

Return the column numbers in which the specified PEG appears.

peg

ID of the feature whose roles are desired.

RETURN

Returns a list of the column numbers in which the peg appears, or an empty list if it is not found.

get_all_pegs

    my @pegs = $sub->get_all_pegs();

Return all pegs appearing in the subsystem.

write_subsystem

Write the subsystem to disk. This updates the on-disk data with notes, etc., and performs backups when necessary.

write_spreadsheet

    $sub->write_spreadsheet($fh);

Write the spreadsheet for this subsystem to filehandle $fh.

get_genomes

    my @genomeList = $sub->get_genomes();

Return a list of the genome IDs for this subsystem. Each genome corresponds to a row in the subsystem spreadsheet. Indexing into this list returns the ID of the genome in the specified row.

get_variant_codes

    my @codes = $sub->get_variant_codes();

Return a list of the variant codes for each genome, in row index order. The variant code indicates which variation of the subsystem is used by the given genome.

get_variant_code

    my $code = $sub->get_variant_code($gidx);

Return the variant code for the specified genome. Each subsystem has multiple variants which involve slightly different chemical reactions, and each variant has an associated variant code. When a genome is connected to the spreadsheet, the subsystem variant used by the genome must be specified.

gidx

Row index for the genome whose variant code is desired.

RETURN

Returns the variant code for the specified genome.

get_roles

    my @roles = $sub->get_roles();

Return a list of the subsystem's roles. Each role corresponds to a column in the subsystem spreadsheet. The list entry at a specified position in the list will contain the ID of that column's role.

get_abbr_for_role

    my $abbr = $sub->get_abbr_for_role($name);

Return the abbreviation for the given role name.

get_roles_for_genome

    my @roles = $sub->get_roles_for_genome($genome_id);

Return the list of roles for which the given genome has nonempty cells.

get_genome_index

    my $idx = $sub->get_genome_index($genome);

Return the row index for the genome with the specified ID.

genome

ID of the genome whose row index is desired.

RETURN

Returns the row index for the genome with the specified ID, or an undefined value if the genome does not participate in the subsystem.

get_role_index

    my $idx = $sub->get_role_index($role);

Return the column index for the role with the specified ID.

role

ID (full name) of the role whose column index is desired.

RETURN

Returns the column index for the role with the specified name.

get_role_abbr

    my $abbr = $sub->get_role_abbr($ridx);

Return the abbreviation for the role in the specified column. The abbreviation is a shortened identifier that is not necessarily unique, but is more likely to fit in a column heading.

ridx

Column index for the role whose abbreviation is desired.

RETURN

Returns an abbreviated name for the role corresponding to the indexed column.

set_pegs_in_cell

    $sub->set_pegs_in_cell($genome, $role, $peg_list);

Set the cell for the given genome and role to $peg_list.

get_pegs_from_cell

    my @pegs = $sub->get_pegs_from_cell($rowstr, $colstr);

Return a list of the peg IDs for the features in the specified spreadsheet cell.

rowstr

Genome row, specified either as a row index or a genome ID.

colstr

Role column, specified either as a column index, a role name, or a role abbreviation.

RETURN

Returns a list of PEG IDs. The PEGs in the list belong to the genome in the specified row and perform the role in the specified column. If the indicated row or column does not exist, returns an empty list.
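The flexible row and column specifiers could be resolved against the index hashes roughly as follows. This is a sketch over made-up in-memory data, not the module's code: the function names and the resolution order (integer index first, then role name, then abbreviation) are assumptions, and the sketch returns the stored cell entries rather than converting them to full PEG IDs.

```perl
use strict;
use warnings;

# Illustrative in-memory data; the keys follow the data structures
# documented for the Subsystem object.
my $ss = {
    genome_index => { '83333.1' => 0, '224308.1' => 1 },
    role_index   => { 'Role A' => 0, 'Role B' => 1 },
    abbr         => { RA => 'Role A', RB => 'Role B' },
    spreadsheet  => [ [ [1, 2], [3] ], [ [], [7] ] ],
};

# A plain integer is taken as a row index; anything else as a genome ID.
sub resolve_row {
    my ($self, $rowstr) = @_;
    return $rowstr if $rowstr =~ /^\d+\z/;
    return $self->{genome_index}{$rowstr};    # undef if unknown
}

# A plain integer is a column index; otherwise try the role name,
# then the role abbreviation.
sub resolve_col {
    my ($self, $colstr) = @_;
    return $colstr if $colstr =~ /^\d+\z/;
    return $self->{role_index}{$colstr} if exists $self->{role_index}{$colstr};
    my $role = $self->{abbr}{$colstr};
    return defined $role ? $self->{role_index}{$role} : undef;
}

# Return the entries of one cell, or an empty list if it does not exist.
sub pegs_from_cell {
    my ($self, $rowstr, $colstr) = @_;
    my $row = resolve_row($self, $rowstr);
    my $col = resolve_col($self, $colstr);
    return () unless defined $row && defined $col;
    my $cells = $self->{spreadsheet}[$row] or return ();
    my $cell  = $cells->[$col]             or return ();
    return @$cell;
}

my @by_name  = pegs_from_cell($ss, '83333.1', 'RB');
my @by_index = pegs_from_cell($ss, 0, 0);
print "@by_name | @by_index\n";
```

Genome IDs contain a dot, so they never collide with a plain integer row index in this scheme.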

get_subset_namesC

    my @subsetNames = $sub->get_subset_namesC();

Return a list of the names for all the column (role) subsets. Given a subset name, you can use the get_subsetC_roles method to get the roles in the subset.

get_subsetC_roles

    my @roles = $sub->get_subsetC_roles($subname);

Return the names of the roles contained in the specified role (column) subset.

subname

Name of the role subset whose roles are desired.

RETURN

Returns a list of the role names for the columns in the named subset.

get_subsetR

    my @genomes = $sub->get_subsetR($subName);

Return the genomes in the row subset indicated by the specified subset name.

subName

Name of the desired row subset, or All to get all of the rows.

RETURN

Returns a list of genome IDs corresponding to the named subset.

load_row_subsets_by_kv

Load a row subset based on a key/value pair. This takes a single key/value pair and shows only the matching subset.

It is just a modification of load_row_subsets that deals with key/value pairs.

It takes one required argument, the key that the genome must have, and an optional second argument, the value that the key must hold.

set_subsetC

    $sub->set_subsetC($name, $members);

Create a subset with the given name and members.

$members is a list of role names.

_set_subset

Create a subset with the given name and members.

Internal version - here, members is a list of role indices.

set_roles

    $sub->set_roles($role_list);

Set the list of roles. $role_list is a list of tuples [$role_name, $abbreviation].

If a role already exists, it is used. If it does not exist, it is created empty.

add_role($role, $abbr)

Add the given role to the spreadsheet.

This causes a new column to be added, with empty values in each cell.

We do nothing if the role is already present.

Return the index of the new role.
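The effect of adding a role on the in-memory structures can be sketched as below. This is an illustration on made-up data, not the module's implementation; the function name add_role_sketch is hypothetical, and returning nothing when the role already exists is an assumption, since the text above does not say what is returned in that case.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_role, operating on a plain hash that
# uses the documented member names.
sub add_role_sketch {
    my ($ss, $role, $abbr) = @_;
    # Do nothing if the role is already present (return value assumed).
    return if exists $ss->{role_index}{$role};

    push @{ $ss->{roles} },      $role;
    push @{ $ss->{role_abbrs} }, $abbr;
    my $idx = $#{ $ss->{roles} };
    $ss->{role_index}{$role} = $idx;
    $ss->{abbr}{$abbr}       = $role;

    # Every genome row gains one empty cell in the new column.
    $_->[$idx] = [] for @{ $ss->{spreadsheet} };
    # The inverted view gains one column that shares those cells.
    $ss->{spreadsheet_inv}[$idx] = [ map { $_->[$idx] } @{ $ss->{spreadsheet} } ];
    return $idx;
}

# One role, two genome rows.
my $sheet = {
    roles           => ['Role A'],
    role_abbrs      => ['RA'],
    role_index      => { 'Role A' => 0 },
    abbr            => { RA => 'Role A' },
    spreadsheet     => [ [ [1] ], [ [] ] ],
    spreadsheet_inv => [ [ [1], [] ] ],
};

my $new_idx = add_role_sketch($sheet, 'Role B', 'RB');
my $again   = add_role_sketch($sheet, 'Role B', 'RB');
print "new column index: $new_idx\n";
```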

change_role( $oldrole, $newrole )

Change just the function of a role.

change_role_in_column_of_file($oldrole, $newrole, $colnum, $file, $backup_file)

Copy $file to $backup_file, then rewrite $file, replacing any occurrences of $oldrole with $newrole in column $colnum. The file is assumed to be tab-separated.

Used for remapping the various reactions files.

$file and $backup_file are paths relative to the subsystem directory.
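The backup-then-rewrite pattern described above can be sketched in a few lines. This is not the module's code: the function name rewrite_role_in_column, the explicit $dir argument, the file names reactions and reactions~, the zero-based column number, and the whole-field comparison are all assumptions made for the example.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical sketch: back up $file, then rewrite it with $old replaced
# by $new in column $colnum (assumed zero-based, whole-field match).
sub rewrite_role_in_column {
    my ($dir, $old, $new, $colnum, $file, $backup_file) = @_;
    my ($path, $backup) = ("$dir/$file", "$dir/$backup_file");
    copy($path, $backup) or die "Cannot back up $path: $!";

    # Read from the backup and write the updated lines to the original.
    open(my $in,  '<', $backup) or die "Cannot read $backup: $!";
    open(my $out, '>', $path)   or die "Cannot write $path: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        $cols[$colnum] = $new
            if defined $cols[$colnum] && $cols[$colnum] eq $old;
        print {$out} join("\t", @cols), "\n";
    }
    close($in);
    close($out) or die "Cannot finish writing $path: $!";
    return;
}

# Demonstration in a temporary directory standing in for a subsystem.
my $work_dir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$work_dir/reactions") or die "Cannot create file: $!";
print {$fh} "Role A\tR00001\n", "Role B\tR00002\n";
close($fh);

rewrite_role_in_column($work_dir, 'Role A', 'Role A2', 0, 'reactions', 'reactions~');
```

Reading from the backup while writing the original means the untouched copy is still on disk if the rewrite is interrupted.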

remove_role

Remove the role from the spreadsheet.

We do nothing if the role is not present.

add_genome($genome, $abbr)

Add the given genome to the spreadsheet.

This causes a new row to be added, with empty values in each cell.

We do nothing if the genome is already present.

Return the index of the new genome.

remove_genome

Remove the genome from the spreadsheet.

We do nothing if the genome is not present.

get_notes

    my $text = $sub->get_notes();

Return the descriptive notes for this subsystem.

get_description

    my $text = $sub->get_description();

Return the description for this subsystem.

get_variants

    my $text = $sub->get_variants();

Return the variants for this subsystem.

get_literature

    my $text = $sub->get_literature();

Return the literature for this subsystem.

get_reactions

    my $reactHash = $sub->get_reactions();

Return a reference to a hash that maps each role ID to a list of the reactions catalyzed by the role.

get_curator

    my $userName = $sub->get_curator();

Return the name of this subsystem's official curator.

add_to_subsystem($subsystem_name, $columns, $notes_flag)

Merge the given columns from $subsystem_name into this subsystem. Append the notes from the subsystem if $notes_flag is true.

functional_role_instances

    my @role_instances = $sub->functional_role_instances($role);

Returns the set of genes for a functional role that belong to genomes with functional variants (variant code greater than 0).

If the flag $strict is set to true, an additional check for the correct function assignment is performed. If the name of the functional role does not occur exactly in the latest function assignment of the PEG, the PEG is not included in the returned array. A simple index check is done.
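The "simple index check" can be pictured as a substring test with Perl's built-in index function. The sketch below is an assumption about the shape of that check, using made-up feature IDs and assignments; function_matches_role is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the role name occurs verbatim somewhere
# in the PEG's current function assignment.
sub function_matches_role {
    my ($function, $role) = @_;
    return index($function, $role) >= 0;
}

# Made-up assignments for three features.
my %assignment = (
    'fig|83333.1.peg.1' => 'Role A',
    'fig|83333.1.peg.2' => 'Role A / Role B',
    'fig|83333.1.peg.3' => 'hypothetical protein',
);

my @strict_hits = sort
    grep { function_matches_role($assignment{$_}, 'Role A') }
    keys %assignment;
print "$_\n" for @strict_hits;
```

A substring test accepts multi-function assignments such as "Role A / Role B", but it would also accept a longer role name that merely begins with the same text.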

get_dir_from_name

    my $dirName = Subsystem::get_dir_from_name($name);

Return the name of the directory containing the SEED data for the specified subsystem.

name

Name of the subsystem whose directory is desired.

RETURN

Returns the fully-qualified directory name for the subsystem.

Static Utilities

These are internal static methods used by the Sprout Subsystem object (SproutSubsys.pm). They ensure that common functions are implemented with common code.

GetDiagramIDs

    my @diagramIDs = Subsystem::GetDiagramIDs($subDir);

Return a list of the subsystem diagram IDs. The parameters are

subDir

Fully-qualified directory name for the subsystem.

RETURN

Returns a list of the diagram IDs for this subsystem. Each diagram ID corresponds to a diagram subdirectory in the subsystem's directory.
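Given the layout described for get_diagram ($dir/diagrams/d03), collecting the diagram IDs amounts to listing subdirectories. The following is a minimal sketch under that assumption, run against a temporary directory; list_diagram_ids is a hypothetical name and the real method may filter or order the IDs differently.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical sketch: every subdirectory of "$sub_dir/diagrams" is
# treated as one diagram, and its name as the diagram ID.
sub list_diagram_ids {
    my ($sub_dir) = @_;
    my $diagram_dir = "$sub_dir/diagrams";
    opendir(my $dh, $diagram_dir) or return ();
    my @ids = sort grep { !/^\./ && -d "$diagram_dir/$_" } readdir($dh);
    closedir($dh);
    return @ids;
}

# Temporary stand-in for a subsystem directory with two diagrams
# and one stray plain file that should be ignored.
my $demo_dir = tempdir(CLEANUP => 1);
make_path("$demo_dir/diagrams/d01", "$demo_dir/diagrams/d03");
open(my $stray, '>', "$demo_dir/diagrams/README") or die "Cannot create file: $!";
close($stray);

my @diagram_ids = list_diagram_ids($demo_dir);
print "@diagram_ids\n";
```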

GetDiagramName

    my $name = Subsystem::GetDiagramName($subDir, $diagramID);

Return the name of the subsystem diagram with the specified ID.

subDir

Subsystem directory name.

diagramID

ID of the diagram whose name is desired.

RETURN

Returns the name of the specified diagram, or undef if the diagram does not exist.

ComputeDiagramURLs

    my ($link, $imgLink) = Subsystem::ComputeDiagramURLs($self, $ssName, $diagramID, $sprout);

This is an internal static method that computes the URLs for a subsystem diagram. It ensures that both SEED and Sprout use the same rules for generating the diagram URLs. The parameters are as follows.

self

Relevant subsystem object.

ssName

Name of the relevant subsystem.

diagramID

ID of the relevant diagram.

sprout (optional)

If specified, indicates this should be a Sprout URL.

RETURN

Returns a two-element list, the first element of which is a link to the diagram page, and the second of which is a link to the diagram image.

Method Listing

index_cell

Create the subsystem_index entries for the given cell. (NEW).

delete_role(name)

Delete the given role.

add_role(name, abbr)

Add a new role.

get_subset(name)

A deprecated form of get_subsetC.

get_subsetC(name)

Returns a given subset. A subset is an object, implemented as a blessed array of roles.

add_genome(genome_id, variant_code)
remove_genome(genome_id)