Title: | Predict Transmembrane Protein Topology |
---|---|
Description: | Proteins reside in either the cell plasma or in the cell membrane. A membrane protein goes through the membrane at least once. Given the amino acid sequence of a membrane protein, the tool 'PureseqTM' (<https://github.com/PureseqTM/pureseqTM_package>, as described in "Efficient And Accurate Prediction Of Transmembrane Topology From Amino acid sequence only.", Wang, Qing, et al (2019), <doi:10.1101/627307>), can predict the topology of a membrane protein. This package allows one to use 'PureseqTM' from R. |
Authors: | Richèl J.C. Bilderbeek [aut, cre] |
Maintainer: | Richèl J.C. Bilderbeek <[email protected]> |
License: | GPL-3 |
Version: | 1.4 |
Built: | 2024-10-31 04:21:13 UTC |
Source: | https://github.com/richelbilderbeek/pureseqtmr |
Are the sequences transmembrane helices?
are_tmhs(protein_sequences, folder_name = get_default_pureseqtm_folder())
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
a vector of booleans of the same length as the number of sequences. The ith element is TRUE if the ith protein sequence is a transmembrane helix
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  sequences <- c(
    "QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLM",
    "VVIILTIRGNILVIMAVSLE"
  )
  are_tmhs(sequences)
}
Determine if these are all valid protein sequences, as can be used in topology prediction
are_valid_protein_sequences(protein_sequences, verbose = FALSE)
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
verbose |
set to TRUE for more output |
TRUE if the protein sequence is valid
Calculate the distance for each amino acid to the center of the TMH
calc_distance_to_tmh_center_from_topology(topology)
topology |
the topology as a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains the respective topologies as strings. |
a tibble with the columns 'name', 'position' and 'distance_to_tmh_center'
Richèl J.C. Bilderbeek
Calculate the distance for each amino acid to the center of the TMH
calc_distance_to_tmh_center_from_topology_str(topology_str)
topology_str |
the topology as a string,
for example |
a tibble with the columns 'position' and 'distance_to_tmh_center'
Richèl J.C. Bilderbeek
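To make the idea of a distance to the TMH center concrete, here is a minimal base R sketch. It is not the package's implementation: the function name distance_to_tmh_center_sketch is hypothetical, and it assumes that the distance is the absolute number of positions to the center of the nearest run of ones, with NA when the topology has no TMH. The package may define these details differently.

```r
# Hypothetical sketch, not part of pureseqtmr.
# Assumption: distance = absolute number of positions to the center
# of the nearest run of ones; NA if the topology has no run of ones.
distance_to_tmh_center_sketch <- function(topology_str) {
  states <- strsplit(topology_str, "", fixed = TRUE)[[1]]
  positions <- seq_along(states)
  # Find the first and last position of each run of identical characters
  runs <- rle(states)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Centers of the runs of ones (a half position for even run lengths)
  centers <- ((starts + ends) / 2)[runs$values == "1"]
  if (length(centers) == 0) {
    distances <- rep(NA_real_, length(positions))
  } else {
    distances <- vapply(
      positions,
      function(p) min(abs(p - centers)),
      numeric(1)
    )
  }
  data.frame(position = positions, distance_to_tmh_center = distances)
}

# One TMH at positions 3 to 5, so its center is at position 4
distance_to_tmh_center_sketch("0011100")
```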
Use Rcpp to calculate the distance to a TMH center
calc_distance_to_tmh_center_from_topology_str_cpp_stl(topology_str)
topology_str |
a topology as a string |
a vector with distances
Will stop if the protein sequence is invalid, with a helpful error message.
check_protein_sequence(protein_sequence)
protein_sequence |
a protein sequence, with
the amino acids as capitals, for
example |
A protein sequence is invalid if:
it has zero, two or more sequences
the sequence contains zero, one or two amino acids
the sequence contains characters that are not in the
amino acid uppercase alphabet,
that is ACDEFGHIKLMNPQRSTVWY
nothing. Will stop if the protein sequence is invalid, with a helpful error message.
check_protein_sequence("FAMILYVW")
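The three rules listed above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name is_valid_sketch is hypothetical, it returns TRUE or FALSE instead of stopping with an error message, and it is not the code that check_protein_sequence actually runs.

```r
# Hypothetical helper, not part of pureseqtmr.
# Returns TRUE only if all three documented rules are satisfied.
is_valid_sketch <- function(protein_sequence) {
  # Rule 1: there must be exactly one sequence
  if (!is.character(protein_sequence) || length(protein_sequence) != 1) {
    return(FALSE)
  }
  # Rule 2: the sequence must have at least three amino acids
  if (is.na(protein_sequence) || nchar(protein_sequence) < 3) {
    return(FALSE)
  }
  # Rule 3: only the uppercase amino acid alphabet is allowed
  grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", protein_sequence)
}

is_valid_sketch("FAMILYVW")                 # valid
is_valid_sketch("FA")                       # too short
is_valid_sketch("familyvw")                 # lowercase
is_valid_sketch(c("FAMILYVW", "FAMILYVW"))  # two sequences
```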
Will stop if the protein sequence is invalid, with a helpful error message.
check_protein_sequences(protein_sequences)
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
A protein sequence is invalid if:
it has zero, two or more sequences
the sequence contains zero, one or two amino acids
the sequence contains characters that are not in the
amino acid uppercase alphabet,
that is ACDEFGHIKLMNPQRSTVWY
nothing. Will stop at the first invalid protein sequence, with a helpful error message.
check_protein_sequences(c("FAMILYVW", "FAMILYVW"))
Checks the installation of PureseqTM. Throws a helpful error message if incomplete, else does nothing
check_pureseqtm_installation(folder_name = get_default_pureseqtm_folder())
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
Nothing. Will stop with a helpful error message if PureseqTM is not installed.
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  check_pureseqtm_installation()
}
Check if the argument is of the same type as a predicted topology, as can be created with predict_topology. Will stop if not.
check_topology(topology)
topology |
the topology as a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains the respective topologies as strings. |
Nothing. Will stop with a helpful error message if the topology is invalid.
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  topology <- predict_topology(fasta_filename)
  check_topology(topology)
}
Check if the topology string is valid. Will stop if not.
check_topology_str(topology_str)
topology_str |
the topology as a string,
for example |
Nothing. Will stop with a helpful error message if the topology is invalid.
Richèl J.C. Bilderbeek
check_topology_str("0000000000000000000000000011111111111111111100000")
Convert a TMHMM topology to a PureseqTM topology
convert_tmhmm_to_pureseqtm_topology(tmhmm_topology)
tmhmm_topology |
topology as used by |
a tibble with column names name and topology, as can be checked by check_topology
Richèl J.C. Bilderbeek
tmhmm_topo_filename <- system.file(
  "extdata", "UP000005640_9606_no_u.tmhmm", package = "pureseqtmr"
)
tmhmm_topology <- load_topology_file_as_tibble(tmhmm_topo_filename)
convert_tmhmm_to_pureseqtm_topology(tmhmm_topology)
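As a rough idea of what such a conversion could look like on the level of a single topology string, consider the base R sketch below. It rests on an assumption that is not stated in this manual: that a TMHMM-style topology string marks membrane residues with M (or m) and uses other letters, such as i and o, for residues outside the membrane. The helper name tmhmm_to_binary_sketch is hypothetical and this is not the package's own code.

```r
# Hypothetical sketch, not part of pureseqtmr.
# Assumption: 'M' or 'm' marks a residue within the membrane,
# any other character marks a residue outside of it.
tmhmm_to_binary_sketch <- function(tmhmm_topology_strs) {
  # First turn every non-membrane character into a zero ...
  binary <- gsub("[^Mm]", "0", tmhmm_topology_strs)
  # ... then turn the remaining membrane markers into ones
  gsub("[Mm]", "1", binary)
}

tmhmm_to_binary_sketch("iiiMMMooo")
```

The result has the same length as the input and only holds zeroes and ones, which is the shape that check_topology_str expects.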
Count the number of TMHs in a topology
count_n_tmhs(topology_strs)
topology_strs |
the topologies as zero, one or more strings,
for example |
count_n_tmhs("00000000000000000000000000")
count_n_tmhs("00000000001111100000000000")
count_n_tmhs(c("0", "1"))
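Counting TMHs in a topology string boils down to counting the runs of consecutive ones. The base R sketch below shows one way to do that with run-length encoding. The function name count_tmhs_sketch is hypothetical, and whether count_n_tmhs treats very short runs (such as a single one) as a TMH is not documented here, so the sketch simply counts every run.

```r
# Hypothetical sketch, not part of pureseqtmr.
# Assumption: every run of consecutive ones counts as one TMH.
count_tmhs_sketch <- function(topology_strs) {
  vapply(
    topology_strs,
    function(topology_str) {
      # Run-length encode the characters, then count the runs of ones
      runs <- rle(strsplit(topology_str, "", fixed = TRUE)[[1]])
      sum(runs$values == "1")
    },
    integer(1),
    USE.NAMES = FALSE
  )
}

count_tmhs_sketch("00000000001111100000000000")
count_tmhs_sketch(c("0", "0110011"))
```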
Create the five PureseqTM output files, by running PureseqTM.
create_pureseqtm_files( fasta_filename, folder_name = get_default_pureseqtm_folder(), temp_folder_name = tempfile(pattern = "pureseqt_") )
fasta_filename |
path to a FASTA file |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
temp_folder_name |
path of a temporary folder. The folder does not need to exist. Files that are put in this folder are not automatically deleted, which is not a problem, as the default path given by tempdir is automatically cleaned by the operating system |
full path to the files created
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  create_pureseqtm_files(fasta_filename)
}
Create the output file of a PureseqTM proteome run
create_pureseqtm_proteome_file( fasta_filename, topology_filename = tempfile(fileext = ".top"), folder_name = get_default_pureseqtm_folder() )
fasta_filename |
path to a FASTA file |
topology_filename |
name of the file to save a protein's topology to |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
the filename
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  create_pureseqtm_proteome_file(fasta_filename)
}
This function does nothing. It exists so that other functions can inherit its parameters' documentation.
default_params_doc( download_url, fasta_filename, fasta_file_text, folder_name, protein_sequence, protein_sequences, pureseqtm_filename, pureseqtm_proteome_text, pureseqtm_result, pureseqtm_url, temp_fasta_filename, temp_folder_name, tmhmm_topology, topology, topology_filename, topology_str, topology_strs, verbose )
download_url |
the URL to download PureseqTM from |
fasta_filename |
path to a FASTA file |
fasta_file_text |
text of a FASTA file |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
protein_sequence |
a protein sequence, with
the amino acids as capitals, for
example |
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
pureseqtm_filename |
filename to write the PureseqTM results to |
pureseqtm_proteome_text |
the output of a call
to |
pureseqtm_result |
the result of a PureseqTM run |
pureseqtm_url |
URL of the PureseqTM git repository |
temp_fasta_filename |
temporary FASTA filename, which will be deleted after usage |
temp_folder_name |
path of a temporary folder. The folder does not need to exist. Files that are put in this folder are not automatically deleted, which is not a problem, as the default path given by tempdir is automatically cleaned by the operating system |
tmhmm_topology |
topology as used by |
topology |
the topology as a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains the respective topologies as strings. |
topology_filename |
name of the file to save a protein's topology to |
topology_str |
the topology as a string,
for example |
topology_strs |
the topologies as zero, one or more strings,
for example |
verbose |
set to TRUE for more output |
This is an internal function, so it should be marked with @noRd. This is not done, as that would prevent all other functions from finding the documented parameters
Richèl J.C. Bilderbeek
Get the path to the folder where this package installs PureseqTM by default
get_default_pureseqtm_folder()
the path to the folder where this package installs PureseqTM by default
Richèl J.C. Bilderbeek
get_default_pureseqtm_folder()
Get the full path to a PureseqTM example file. If the filename specified is not a PureseqTM example file, this function will stop
get_example_filename(filename, folder_name = get_default_pureseqtm_folder())
filename |
name of the example file, without the path |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
the full path to a PureseqTM example file
Richèl J.C. Bilderbeek
use get_example_filenames to get all PureseqTM example filenames
if (is_pureseqtm_installed()) {
  get_example_filename("1bhaA.fasta")
}
Get the full path to all PureseqTM example files
get_example_filenames(folder_name = get_default_pureseqtm_folder())
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
a character vector with all PureseqTM example files
Richèl J.C. Bilderbeek
use get_example_filename to get the full path to a PureseqTM example file
if (is_pureseqtm_installed()) {
  get_example_filenames()
}
Get the URL of the PureseqTM source code
get_pureseqtm_url()
a URL as a character vector of one element
Richèl J.C. Bilderbeek
get_pureseqtm_url()
Get the PureseqTM version
get_pureseqtm_version(folder_name = get_default_pureseqtm_folder())
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
a version number as a character vector of one element,
for example v0.10
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  get_pureseqtm_version()
}
Install PureseqTM to a local folder
install_pureseqtm( folder_name = get_default_pureseqtm_folder(), pureseqtm_url = get_pureseqtm_url() )
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
pureseqtm_url |
URL of the PureseqTM git repository |
Nothing.
Richèl J.C. Bilderbeek
## Not run:
install_pureseqtm()
## End(Not run)
Determines if the environment is AppVeyor
is_on_appveyor()
TRUE if run on AppVeyor, FALSE otherwise
Richèl J.C. Bilderbeek
if (is_on_appveyor()) {
  message("Running on AppVeyor")
}
Determines if the environment is a continuous integration service
is_on_ci()
TRUE if run on AppVeyor or Travis CI, FALSE otherwise
Richèl J.C. Bilderbeek
if (is_on_ci()) {
  message("Running on a continuous integration service")
}
Determines if the environment is GitHub Actions
is_on_github_actions()
TRUE if run on GitHub Actions, FALSE otherwise
Richèl J.C. Bilderbeek
if (is_on_github_actions()) {
  message("Running on GitHub Actions")
}
Determines if the environment is Travis CI
is_on_travis()
TRUE if run on Travis CI, FALSE otherwise
Richèl J.C. Bilderbeek
if (is_on_travis()) {
  message("Running on Travis CI")
}
Is the line of text the name of a protein, as used within a FASTA file?
is_protein_name_line(line)
line |
line of text from a FASTA file |
TRUE if the line can be the name of a protein in a FASTA file
Richèl J.C. Bilderbeek
is_protein_name_line(">5H2A_CRIGR")
Determine if PureseqTM is installed locally
is_pureseqtm_installed(folder_name = get_default_pureseqtm_folder())
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
TRUE if PureseqTM is installed locally, FALSE otherwise
Richèl J.C. Bilderbeek
is_pureseqtm_installed()
Determine if the protein sequence contains at least one transmembrane helix.
is_tmh(protein_sequence, folder_name = get_default_pureseqtm_folder())
protein_sequence |
a protein sequence, with
the amino acids as capitals, for
example |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
TRUE if the protein sequence contains at least one transmembrane helix
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  # This sequence is a TMH
  is_tmh("QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLM")
  # This sequence is not a TMH
  is_tmh("VVIILTIRGNILVIMAVSLE")
}
Is the line of text the topology, as used within a FASTA file? In this context, a topology is a string of zeroes and ones, in which a one denotes that the amino acid at that position is within the membrane.
is_topology_line(line)
line |
line of text from a FASTA file |
TRUE if the line can be the text of a topology in a FASTA file.
Richèl J.C. Bilderbeek
# This is a valid topology
is_topology_line("000010101011")
# This is an invalid topology
is_topology_line("invalid")
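The two line checks, is_protein_name_line and is_topology_line, can be pictured with the base R sketch below. The helper names are hypothetical and the rules are assumptions inferred from the examples in this manual (a name line starts with ">", a topology line consists of zeroes and ones only). The package's own checks may be stricter or more lenient.

```r
# Hypothetical helpers, not part of pureseqtmr.

# Assumption: a protein name line starts with ">"
looks_like_protein_name_line <- function(line) {
  startsWith(line, ">")
}

# Assumption: a topology line holds only zeroes and ones
looks_like_topology_line <- function(line) {
  grepl("^[01]+$", line)
}

looks_like_protein_name_line(">5H2A_CRIGR")
looks_like_topology_line("000010101011")
looks_like_topology_line("invalid")
```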
Determine if this is a valid protein sequence, as can be used in topology prediction
is_valid_protein_sequence(protein_sequence, verbose = FALSE)
protein_sequence |
a protein sequence, with
the amino acids as capitals, for
example |
verbose |
set to TRUE for more output |
TRUE if the protein sequence is valid
Parse a FASTA file to a table with a name and sequence column
load_fasta_file_as_tibble(fasta_filename)
fasta_filename |
path to a FASTA file |
a tibble with a name and sequence column
Use load_fasta_file_as_tibble_cpp to directly call the C++ function that does the actual work. Use load_fasta_file_as_tibble_r to call the (approximately ten thousand times slower) R function
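To show what parsing a FASTA file into a name and sequence table involves, here is a small base R sketch that works on lines of text that have already been read. It is not the package's R or C++ implementation. The function name parse_fasta_lines_sketch is hypothetical, it returns a data.frame instead of a tibble to stay free of dependencies, and stripping the leading ">" from the names is an assumption.

```r
# Hypothetical sketch, not part of pureseqtmr.
# Assumptions: name lines start with ">", the ">" is stripped from the
# name, and a sequence may be spread over multiple lines.
parse_fasta_lines_sketch <- function(lines) {
  lines <- lines[nzchar(lines)]
  is_name <- startsWith(lines, ">")
  # Each name line starts a new record
  record <- cumsum(is_name)
  names <- sub("^>", "", lines[is_name])
  # Paste the sequence lines of each record together
  sequences <- vapply(
    split(
      lines[!is_name],
      factor(record[!is_name], levels = seq_along(names))
    ),
    paste0,
    character(1),
    collapse = "",
    USE.NAMES = FALSE
  )
  data.frame(name = names, sequence = sequences, stringsAsFactors = FALSE)
}

parse_fasta_lines_sketch(c(">A", "FAMILYVW", ">B", "VWFA", "MILY"))
```

To apply this to a file, the lines can be obtained with readLines(fasta_filename).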
Parse a FASTA file to a table with a name and sequence column
load_fasta_file_as_tibble_cpp(fasta_filename)
fasta_filename |
path to a FASTA file |
a tibble with a name and sequence column
Use Rcpp to load a FASTA file
load_fasta_file_as_tibble_cpp_raw(fasta_filename)
fasta_filename |
FASTA filename |
a list with two character vectors, named 'name' and 'sequence'
Parse a FASTA file to a table with a name and sequence column
load_fasta_file_as_tibble_r(fasta_filename)
fasta_filename |
path to a FASTA file |
a tibble with a name and sequence column
Parse a topology (.topo) file to a table with a name and topology column
load_topology_file_as_tibble(topology_filename)
topology_filename |
name of the file to save a protein's topology to |
a tibble with a name and topology column, as can be checked by check_topology
topology_filename <- system.file(
  "extdata", "100507436.topo", package = "pureseqtmr"
)
load_topology_file_as_tibble(topology_filename)
Do a mock prediction directly on a protein sequence, as can be useful in testing. Use predict_topologies_from_sequences for doing a real prediction.
mock_predict_topologies_from_sequences(protein_sequences)
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
a topology as a string of zeroes and ones, where a one denotes that the corresponding amino acid is located within the membrane.
Richèl J.C. Bilderbeek
protein_sequence <- paste0(
  "QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLM",
  "SLAIADMLLGFLVMPVSMLTILYGYRWP"
)
mock_predict_topologies_from_sequences(protein_sequence)
Use predict_topology for doing a real prediction
mock_predict_topology(fasta_filename)
fasta_filename |
path to a FASTA file |
a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains all respective topologies.
Richèl J.C. Bilderbeek
fasta_filename <- tempfile()
save_tibble_as_fasta_file(
  t = tibble::tibble(
    name = c("A", "B"),
    sequence = c("FAMILYVW", "VWFAMILY")
  ),
  fasta_filename = fasta_filename
)
mock_predict_topology(fasta_filename)
Parse the output of a call to PureseqTM_proteome.sh
parse_pureseqtm_proteome_text(pureseqtm_proteome_text)
pureseqtm_proteome_text |
the output of a call
to |
Plot the topology
plot_topology(topology)
topology |
the topology as a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains the respective topologies as strings. |
a ggplot that displays the topology of one or more proteins
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed() && is_on_ci()) {
  fasta_filename <- get_example_filename("test_proteome.fasta")
  topology <- predict_topology(fasta_filename)
  plot_topology(topology)
}
Run PureseqTM directly on one or more protein sequences
predict_topologies_from_sequences( protein_sequences, folder_name = get_default_pureseqtm_folder(), temp_fasta_filename = tempfile(fileext = ".fasta") )
protein_sequences |
one or more protein sequences,
each sequence with the amino acids as capitals, for
example |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
temp_fasta_filename |
temporary FASTA filename, which will be deleted after usage |
a topology as a string of zeroes and ones, where a one denotes that the corresponding amino acid is located within the membrane.
Richèl J.C. Bilderbeek
use mock_predict_topologies_from_sequences to mock the prediction of protein sequences, as can be useful in testing
if (is_pureseqtm_installed()) {
  protein_sequence <- paste0(
    "QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLM",
    "SLAIADMLLGFLVMPVSMLTILYGYRWP"
  )
  predict_topology_from_sequence(protein_sequence)
}
Predict the topology of zero, one or more proteins, of which the names and sequences are stored in the FASTA format
predict_topology( fasta_filename, folder_name = get_default_pureseqtm_folder(), topology_filename = tempfile(fileext = ".top") )
fasta_filename |
path to a FASTA file |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
topology_filename |
name of the file to save a protein's topology to |
a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains all respective topologies.
Unlike PureseqTM, the topologies predicted are returned in the same order as the original sequences. A bug report is posted at the PureseqTM GitHub repository at https://github.com/PureseqTM/PureseqTM_Package/issues/11
Richèl J.C. Bilderbeek
use mock_predict_topology to do a mock prediction, as can be useful in testing
if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  predict_topology(fasta_filename)
}
Will stop if the protein sequence is shorter than three amino acids.
predict_topology_from_sequence( protein_sequence, folder_name = get_default_pureseqtm_folder(), temp_fasta_filename = tempfile(fileext = ".fasta") )
protein_sequence |
a protein sequence, with
the amino acids as capitals, for
example |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
temp_fasta_filename |
temporary FASTA filename, which will be deleted after usage |
a topology as a string of zeroes and ones, where a one denotes that the corresponding amino acid is located within the membrane.
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  protein_sequence <- paste0(
    "QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLM",
    "SLAIADMLLGFLVMPVSMLTILYGYRWP"
  )
  predict_topology_from_sequence(protein_sequence)
}
Proteins reside in either the cell plasma or in the cell membrane. A membrane protein goes through the membrane at least once. There are multiple ways to span this hydrophobic layer. One common structure is the transmembrane (alpha) helix (TMH). Given the amino acid sequence of a membrane protein, this package predicts which parts of the protein are TMHs.
Richèl J.C. Bilderbeek
if (is_pureseqtm_installed()) {
  # Obtain an example filename
  fasta_filename <- get_example_filename("1bhaA.fasta")
  # Get the topology as a tibble
  topology <- predict_topology(fasta_filename)
  # Show the topology
  plot_topology(topology)
}
Create a pureseqtmr report, to be used when reporting bugs
pureseqtmr_report(folder_name = get_default_pureseqtm_folder())
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
Nothing.
Richèl J.C. Bilderbeek
pureseqtmr_report()
Run PureseqTM on a proteome
run_pureseqtm_proteome( fasta_filename, folder_name = get_default_pureseqtm_folder(), topology_filename = tempfile(fileext = ".top") )
fasta_filename |
path to a FASTA file |
folder_name |
superfolder of PureseqTM.
The superfolder's name is |
topology_filename |
name of the file to save a protein's topology to |
the topology of the proteome, using the same output as PureseqTM. Use predict_topology to get the topology as a tibble
Richèl J.C. Bilderbeek
Use predict_topology to predict the topology of a proteome
Use create_pureseqtm_files to only create the PureseqTM output files
if (is_pureseqtm_installed()) {
  fasta_filename <- get_example_filename("1bhaA.fasta")
  run_pureseqtm_proteome(fasta_filename)
}
Save the first two columns of a tibble as a FASTA file
save_tibble_as_fasta_file(t, fasta_filename)
t |
a tibble |
fasta_filename |
path to a FASTA file |
Richèl J.C. Bilderbeek
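Saving the first two columns of a table as a FASTA file can be sketched in base R as shown below. This is an illustration only: the function name save_as_fasta_sketch is hypothetical, and it assumes that the first column holds the names (written with a leading ">") and the second column holds the sequences (written on one line each). The package's own file layout may differ.

```r
# Hypothetical sketch, not part of pureseqtmr.
# Assumptions: column 1 holds names, column 2 holds sequences,
# and each sequence is written on a single line.
save_as_fasta_sketch <- function(t, fasta_filename) {
  # rbind puts names in row 1 and sequences in row 2;
  # as.vector reads the matrix column by column, which interleaves them
  lines <- as.vector(rbind(paste0(">", t[[1]]), t[[2]]))
  writeLines(lines, fasta_filename)
}

fasta_filename <- tempfile(fileext = ".fasta")
save_as_fasta_sketch(
  data.frame(
    name = c("A", "B"),
    sequence = c("FAMILYVW", "VWFAMILY"),
    stringsAsFactors = FALSE
  ),
  fasta_filename
)
readLines(fasta_filename)
```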
Count the number of transmembrane helices in a topology
tally_tmhs(topology)
topology |
the topology as a tibble with the columns 'name' and 'topology', where the 'name' column holds all the proteins' names, and 'topology' contains the respective topologies as strings. |
a tibble with the number of TMHs per protein
if (is_pureseqtm_installed()) {
  tally_tmhs(
    predict_topology(
      get_example_filename("1bhaA.fasta")
    )
  )
}
Uninstall PureseqTM
uninstall_pureseqtm(folder_name = get_default_pureseqtm_folder())
folder_name |
name of the folder
where the PureseqTM files are installed.
The name of the PureseqTM binary file will be at
|
Nothing.
Richèl J.C. Bilderbeek