Any manipulation of subsystem data should happen through this interface. This allows us to assure ourselves that the relational tables that mirror and index the subsystem data are kept up to date with the canonical version of the subsystem information in the flat-files kept in $FIG_Config::data/Subsystems.
We define the following perl objects:
Subsystem: represents a subsystem. It can be read from disk and written to disk, and manipulated via its methods when in memory.
If we were completely on the OO side of the world, we would also define the following set of objects. However, we are not, so they are only objects in a conceptual sense. They are implemented using the basic perl datatypes.
Role: represents a single role. A role has a name and an abbreviation.
RoleSubset: represents a subset of available roles. A subset has a name and a list of role names that comprise the subset.
It is currently dangerous for multiple users to modify spreadsheets at once. It will likely remain dangerous while the subsystem backend is fairly stateless, as it is with the CGI mechanism.
We'd like to make this a little safer. One mechanism might be to allow one user to open a subsystem for modification while others open it for read-only access. For this to work we have to be able to tell which user is allowed to modify it; the current implementation uses the curator of the subsystem for this purpose.
NB: This module does not currently attempt to handle locking or exclusion. It is up to the caller (user application, CGI script, etc) to do so. It does attempt to use locking internally where appropriate.
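Since exclusion is left to the caller, one possible approach is an advisory file lock held around each modification. The following is only a sketch under stated assumptions: the helper name with_exclusive_lock and the lock file path are hypothetical and are not part of this module, and flock protects only against other processes that take the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical caller-side helper: run $code while holding an exclusive
# advisory lock on $lockfile, then release the lock even if $code dies.
sub with_exclusive_lock {
    my ($lockfile, $code) = @_;
    open(my $fh, '>>', $lockfile) or die "Cannot open $lockfile: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $lockfile: $!";
    my @result = eval { $code->() };
    my $err = $@;
    flock($fh, LOCK_UN);
    close $fh;
    die $err if $err;      # re-throw after the lock has been released
    return @result;
}
```

A caller would wrap its load, modify, and write steps in the code reference passed to this helper, so that two CGI requests cannot interleave their updates.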
We maintain the following data structures (all members of %$self).
Directory in which the subsystem is stored.
The current notes contents for the subsystem
Current subsystem version.
1 if the subsystem is exchangeable, 0 otherwise.
List of role names.
hash that maps from role name to index
list of role abbreviations
hash mapping from role abbreviation to role name
list of column subset names
hash that maps from column subset name to subset members
currently-active column subset
currently-active row subset
List of genome IDs.
List of variant codes.
Hash mapping from genome ID to genome index.
Spreadsheet data. Structured as a list of rows, each of which is a list of entries. An entry is a list of PEG numbers.
Inverted structure of spreadsheet - list of columns, each of which is a list of rows.
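To make the shape of these structures concrete, here is a small hand-built sketch. The variable names, role names, and PEG numbers are illustrative assumptions rather than the module's actual member names or real data.

```perl
use strict;
use warnings;

# Hypothetical example data.
my @roles   = ('Role A', 'Role B');
my @genomes = ('83333.1', '224308.1');

# Index hashes: role name or genome ID -> position in the list above.
my %role_index   = map { $roles[$_]   => $_ } 0 .. $#roles;
my %genome_index = map { $genomes[$_] => $_ } 0 .. $#genomes;

# Spreadsheet: a list of rows (one per genome); each row is a list of
# cells (one per role); each cell is a list of PEG numbers.
my @spreadsheet = (
    [ [ 1, 7 ], [ 12 ] ],   # row for 83333.1
    [ [],       [ 3 ]  ],   # row for 224308.1
);

# Inverted spreadsheet: a list of columns, each of which is a list of rows.
# The cells are shared references, so both views see the same PEG lists.
my @spreadsheet_inv;
for my $r (0 .. $#genomes) {
    for my $c (0 .. $#roles) {
        $spreadsheet_inv[$c][$r] = $spreadsheet[$r][$c];
    }
}

# Look up a cell by genome ID and role name.
my $cell = $spreadsheet[ $genome_index{'83333.1'} ][ $role_index{'Role B'} ];
print "PEGs: @$cell\n";   # prints "PEGs: 12"
```

Keeping both the row-major and the column-major view lets per-genome and per-role queries each walk a single list.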
my $sub = Subsystem->new($subName, $fig, $createFlag);
Load the subsystem. If it does not exist, and $createFlag is true, create a new, empty subsystem.
$subName: Name of the desired subsystem.
$fig: FIG object for accessing the SEED data store.
$createFlag: TRUE if an empty subsystem should be created with the given name, else FALSE. If a subsystem with the name already exists, this parameter has no effect.
my $pegRoles = $sub->all_functions();
Return a hash of all the features in the subsystem. The hash maps each feature ID to its functional assignment.
Create a new subsystem. This creates the subsystem directory in the correct place ($FIG_Config::data/Subsystems), and populates it with the correct initial data.
my @list = $sub->get_diagrams();
Return a list of the diagrams associated with this subsystem. Each diagram
is represented in the return list as a 4-tuple [diagram_id, diagram_name,
page_link, img_link] where
diagram_id: ID code for this diagram.
diagram_name: Displayable name of the diagram.
page_link: URL of an HTML page containing information about the diagram.
img_link: URL of an HTML page containing an image for the diagram.
Note that the URLs are in fact for CGI scripts with parameters that point them to the correct place.
my ($name, $pageURL, $imgURL) = $sub->get_diagram($id);
Get the information (if any) for the specified diagram. The diagram corresponds
to a subdirectory of the subsystem's diagrams directory. For example, if the
diagram ID is d03, the diagram's subdirectory would be $dir/diagrams/d03,
where $dir is the subsystem directory. The diagram's name is extracted from
a tiny file containing the name, and then the links are computed using the
subsystem name and the diagram ID. The parameters are as follows.
ID code for the desired diagram.
Returns a three-element list. The first element is the diagram name, the second a URL for displaying information about the diagram, and the third a URL for displaying the diagram image.
in_genome($genome,$peg)
if ($sub->in_genome($genome, $peg))
{
    # process a PEG from the genome or region of a genome
}
Return a boolean that is true if the PEG falls within the genome or region of a genome.
$genome: either \d+\.\d+ (a typical genome ID) or \d+\.\d+:Contig_Beg_End, where Contig is a string of non-whitespace characters, Beg is an integer, and End is an integer.
$peg: ID of the peg we are checking.
Returns a boolean
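The two accepted forms of the genome argument can be told apart with a pair of regular expressions. The sketch below is an illustration only: parse_genome_spec and peg_in_region are hypothetical helpers, the PEG location is passed in explicitly because this document does not show how the module looks it up, and "falls within" is assumed to mean full containment.

```perl
use strict;
use warnings;

# Parse a genome specifier. Returns (genome_id) for a whole genome,
# (genome_id, contig, beg, end) for a region, or an empty list if the
# string matches neither form.
sub parse_genome_spec {
    my ($spec) = @_;
    return ($1) if $spec =~ /^(\d+\.\d+)$/;
    # Greedy \S+ lets the contig name itself contain underscores.
    return ($1, $2, $3, $4) if $spec =~ /^(\d+\.\d+):(\S+)_(\d+)_(\d+)$/;
    return;
}

# Hypothetical containment test. The PEG is described by its genome ID,
# contig name, and two coordinates (assumed inputs).
sub peg_in_region {
    my ($spec, $peg_genome, $peg_contig, $peg_beg, $peg_end) = @_;
    my ($genome, $contig, $beg, $end) = parse_genome_spec($spec);
    return 0 unless defined $genome && $genome eq $peg_genome;
    return 1 unless defined $contig;           # whole-genome specifier
    return 0 unless $contig eq $peg_contig;
    my ($lo, $hi)   = sort { $a <=> $b } ($beg, $end);
    my ($plo, $phi) = sort { $a <=> $b } ($peg_beg, $peg_end);
    return ($plo >= $lo && $phi <= $hi) ? 1 : 0;
}
```

Sorting each coordinate pair first means the check does not depend on strand orientation, which is a design choice of this sketch rather than documented behavior.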
my @cols = $sub->get_peg_roles($peg);
Return the column numbers in which the specified PEG appears.
ID of the feature whose roles are desired.
Returns a list of the column numbers in which the peg appears, or an empty list if it is not found.
my @pegs = $sub->get_all_pegs();
Return all pegs appearing in the subsystem.
Write the subsystem to the disk. Updates on-disk data with notes, etc. Perform backups when necessary.
$sub->write_spreadsheet($fh);
Write the spreadsheet for this subsystem to filehandle $fh.
my @genomeList = $sub->get_genomes();
Return a list of the genome IDs for this subsystem. Each genome corresponds to a row in the subsystem spreadsheet. Indexing into this list returns the ID of the genome in the specified row.
my @codes = $sub->get_variant_codes();
Return a list of the variant codes for each genome, in row index order. The variant code indicates which variation of the subsystem is used by the given genome.
my $code = $sub->get_variant_code($gidx);
Return the variant code for the specified genome. Each subsystem has multiple variants which involve slightly different chemical reactions, and each variant has an associated variant code. When a genome is connected to the spreadsheet, the subsystem variant used by the genome must be specified.
Row index for the genome whose variant code is desired.
Returns the variant code for the specified genome.
my @roles = $sub->get_roles();
Return a list of the subsystem's roles. Each role corresponds to a column in the subsystem spreadsheet. The list entry at a specified position in the list will contain the ID of that column's role.
my $abbr = $sub->get_abbr_for_role($name);
Return the abbreviation for the given role name.
my @roles = $sub->get_roles_for_genome($genome_id);
Return the list of roles for which the given genome has nonempty cells.
my $idx = $sub->get_genome_index($genome);
Return the row index for the genome with the specified ID.
ID of the genome whose row index is desired.
Returns the row index for the genome with the specified ID, or an undefined value if the genome does not participate in the subsystem.
my $idx = $sub->get_role_index($role);
Return the column index for the role with the specified ID.
ID (full name) of the role whose column index is desired.
Returns the column index for the role with the specified name.
my $abbr = $sub->get_role_abbr($ridx);
Return the abbreviation for the role in the specified column. The abbreviation is a shortened identifier that is not necessarily unique, but is more likely to fit in a column heading.
Column index for the role whose abbreviation is desired.
Returns an abbreviated name for the role corresponding to the indexed column.
$sub->set_pegs_in_cell($genome, $role, $peg_list);
Set the cell for the given genome and role to $peg_list.
my @pegs = $sub->get_pegs_from_cell($rowstr, $colstr);
Return a list of the peg IDs for the features in the specified spreadsheet cell.
$rowstr: Genome row, specified either as a row index or a genome ID.
$colstr: Role column, specified either as a column index, a role name, or a role abbreviation.
Returns a list of PEG IDs. The PEGs in the list belong to the genome in the specified row and perform the role in the specified column. If the indicated row or column does not exist, returns an empty list.
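Because the column can be given in three different ways, the specifier has to be resolved to an index before the cell is read. One way this could work is sketched below; the lookup tables, the helper name resolve_column, and the order of the checks (index, then role name, then abbreviation) are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical lookup tables standing in for the subsystem's role data.
my @roles      = ('Glucokinase', 'Phosphoglucomutase');
my %role_index = ('Glucokinase' => 0, 'Phosphoglucomutase' => 1);
my %abbr_role  = ('GK' => 'Glucokinase', 'PGM' => 'Phosphoglucomutase');

# Resolve a column specifier to a column index, or undef if it is unknown.
sub resolve_column {
    my ($colstr) = @_;
    if ($colstr =~ /^\d+$/) {
        # Numeric specifier: accept it only if it is a valid index.
        return $colstr < @roles ? $colstr : undef;
    }
    return $role_index{$colstr} if exists $role_index{$colstr};
    return $role_index{ $abbr_role{$colstr} } if exists $abbr_role{$colstr};
    return undef;
}
```

A caller that gets undef back would return the empty list, matching the behavior described above for a missing column.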
my @subsetNames = $sub->get_subset_namesC();
Return a list of the names for all the column (role) subsets. Given a subset name, you can use the get_subsetC_roles method to get the roles in the subset.
my @roles = $sub->get_subsetC_roles($subname);
Return the names of the roles contained in the specified role (column) subset.
Name of the role subset whose roles are desired.
Returns a list of the role names for the columns in the named subset.
my @genomes = $sub->get_subsetR($subName);
Return the genomes in the row subset indicated by the specified subset name.
Name of the desired row subset, or All to get all of the rows.
Returns a list of genome IDs corresponding to the named subset.
Load a row subset based on a key/value pair. Only the genomes that match the pair are shown.
This is a modification of load_row_subsets that handles key/value pairs.
It takes one required argument, the key that the genome must have, and an optional second argument, the value that the key must hold.
$sub->set_subsetC($name, $members);
Create a subset with the given name and members.
$members is a list of role names.
Create a subset with the given name and members.
Internal version - here, members is a list of role indices.
$sub->set_roles($role_list);
Set the list of roles. $role_list is a list of tuples [$role_name, $abbreviation].
If a role already exists, it is used. If it does not exist, it is created empty.
Add the given role to the spreadsheet.
This causes a new column to be added, with empty values in each cell.
We do nothing if the role is already present.
Return the index of the new role.
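The column-append behavior described above can be sketched on a small stand-in structure. The hash layout and the name add_role_sketch are illustrative assumptions, as is returning the existing index when the role is already present.

```perl
use strict;
use warnings;

# Minimal stand-in for the subsystem state (field names are illustrative).
my %ss = (
    roles       => ['Role A'],
    role_index  => { 'Role A' => 0 },
    spreadsheet => [ [ [ 1, 2 ] ], [ [] ] ],   # two genome rows, one column
);

sub add_role_sketch {
    my ($ss, $role) = @_;
    # Already present: leave the spreadsheet untouched.
    return $ss->{role_index}{$role} if exists $ss->{role_index}{$role};
    push @{ $ss->{roles} }, $role;
    my $idx = $#{ $ss->{roles} };
    $ss->{role_index}{$role} = $idx;
    # Give every existing row a new, empty cell in the new column.
    $_->[$idx] = [] for @{ $ss->{spreadsheet} };
    return $idx;
}
```

Adding a genome would be the mirror image: push one new row containing an empty cell for every existing role.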
Change just the function of a role
Copy $file to $backup_file, then rewrite $file, replacing any occurrences of $oldrole with $newrole in column $colnum. The file is assumed to be tab-separated.
Used for remapping the various reactions files.
$file and $backup_file are paths relative to the subsystem directory.
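The backup-then-rewrite step might look roughly like the following. This is a sketch, not the module's code: remap_column is a hypothetical name, the column number is treated as 0-based, full paths are passed in rather than paths relative to the subsystem directory, and a field is replaced only when it equals the old role exactly.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Copy $file to $backup, then rewrite $file so that every field in
# column $colnum equal to $old becomes $new.
sub remap_column {
    my ($file, $backup, $colnum, $old, $new) = @_;
    copy($file, $backup) or die "Cannot back up $file: $!";
    open(my $in,  '<', $backup) or die "Cannot read $backup: $!";
    open(my $out, '>', $file)   or die "Cannot write $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        $fields[$colnum] = $new
            if defined $fields[$colnum] && $fields[$colnum] eq $old;
        print {$out} join("\t", @fields), "\n";
    }
    close $in;
    close $out or die "Cannot close $file: $!";
    return;
}
```

Reading from the backup while writing the original means the untouched copy is always available if the rewrite is interrupted.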
Remove the role from the spreadsheet.
We do nothing if the role is not present.
Add the given genome to the spreadsheet.
This causes a new row to be added, with empty values in each cell.
We do nothing if the genome is already present.
Return the index of the new genome.
Remove the genome from the spreadsheet.
We do nothing if the genome is not present.
my $text = $sub->get_notes();
Return the descriptive notes for this subsystem.
my $text = $sub->get_description();
Return the description for this subsystem.
my $text = $sub->get_variants();
Return the variants for this subsystem.
my $text = $sub->get_literature();
Return the literature for this subsystem.
my $reactHash = $sub->get_reactions();
Return a reference to a hash that maps each role ID to a list of the reactions catalyzed by the role.
my $userName = $sub->get_curator();
Return the name of this subsystem's official curator.
Merge the given columns from $subsystem_name into this subsystem. Append the notes from the subsystem if $notes_flag is true.
my @role_instances = $sub->functional_role_instances($role);
Returns the set of genes for a functional role that belong to genomes with functional variants (> 0).
If the flag $strict is set to true, an additional check for the correct function assignment is performed. If the name of the functional role does not occur exactly in the latest function assignment of the PEG, the PEG is not included in the returned array. A simple index check is done.
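The "simple index check" presumably refers to Perl's built-in index function, which performs a literal substring search. A minimal sketch, using the hypothetical helper name role_in_function:

```perl
use strict;
use warnings;

# True if $role occurs verbatim somewhere in $function.
# index() returns -1 when the substring is absent.
sub role_in_function {
    my ($role, $function) = @_;
    return index($function, $role) >= 0 ? 1 : 0;
}
```

Note that a substring test of this kind also accepts a role name embedded in a longer word or in a multi-role assignment, so it is a looser check than string equality.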
my $dirName = Subsystem::get_dir_from_name($name);
Return the name of the directory containing the SEED data for the specified subsystem.
Name of the subsystem whose directory is desired.
Returns the fully-qualified directory name for the subsystem.
These are internal static methods used by the Sprout Subsystem object (SproutSubsys.pm). They ensure that common functions are implemented with common code.
my @diagramIDs = Subsystem::GetDiagramIDs($subDir);
Return a list of the subsystem diagram IDs. The parameters are
Fully-qualified directory name for the subsystem.
Returns a list of the diagram IDs for this subsystem. Each diagram ID corresponds to a diagram subdirectory in the subsystem's directory.
my $name = Subsystem::GetDiagramName($subDir, $diagramID);
Return the name of the subsystem diagram with the specified ID.
$subDir: Subsystem directory name.
$diagramID: ID of the diagram whose name is desired.
Returns the name of the specified diagram, or undef if the diagram does
not exist.
my ($link, $imgLink) = Subsystem::ComputeDiagramURLs($self, $ssName, $diagramID, $sprout);
This is an internal static method that computes the URLs for a subsystem diagram. It ensures that both SEED and Sprout use the same rules for generating the diagram URLs. The parameters are as follows.
$self: Relevant subsystem object.
$ssName: Name of the relevant subsystem.
$diagramID: ID of the relevant diagram.
$sprout: If specified, indicates this should be a Sprout URL.
Returns a two-element list, the first element of which is a link to the diagram page, and the second of which is a link to the diagram image.
Create the subsystem_index entries for the given cell. (NEW).
Delete the given role.
Add a new role.
A deprecated form of get_subsetC
Returns a given subset. A subset is an object, implemented as a blessed array of roles.