DataSet¶
#include <pbbam/DataSet.h>
-
class
PacBio::BAM::
DataSet
¶ The DataSet class represents a PacBio analysis dataset (e.g. from XML).
It provides resource paths, filters, and metadata associated with a dataset under analysis.
DataSet Type
-
enum
TypeEnum
¶ This enum defines the currently-supported DataSet types.
Values:
-
GENERIC
= 0¶
-
ALIGNMENT
¶
-
BARCODE
¶
-
CONSENSUS_ALIGNMENT
¶
-
CONSENSUS_READ
¶
-
CONTIG
¶
-
HDF_SUBREAD
¶
-
REFERENCE
¶
-
SUBREAD
¶
-
-
static DataSet::TypeEnum
NameToType
(const std::string &typeName)¶ Converts printable dataset type to type enum.
- Return
- dataset type enum
- Parameters
typeName
: printable dataset type
- Exceptions
std::runtime_error
: if typeName is unknown
-
static std::string
TypeToName
(const DataSet::TypeEnum &type)¶ Converts dataset type enum to printable name.
- Return
- printable dataset type
- Parameters
type
: dataset type enum
- Exceptions
std::runtime_error
: if type is unknown
-
std::string
TypeName
() const¶ Fetches the dataset’s type.
- Return
- printable dataset type
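The name/enum round trip described above can be sketched without pbbam as follows. This is only an illustration of the documented contract (unknown input raises std::runtime_error), not the library's implementation. The stand-in enum covers a subset of the values, and the printable names other than "SubreadSet" are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for DataSet::TypeEnum (subset of the values listed above).
enum class TypeEnum { GENERIC = 0, ALIGNMENT, SUBREAD };

// Lookup table shared by both conversions.
// Assumption: only "SubreadSet" is taken from this page; the other names are illustrative.
static const std::pair<TypeEnum, const char*> kTypeNames[] = {
    {TypeEnum::GENERIC, "DataSet"},
    {TypeEnum::ALIGNMENT, "AlignmentSet"},
    {TypeEnum::SUBREAD, "SubreadSet"},
};

// Printable name -> enum; throws if the name is not in the table.
TypeEnum NameToType(const std::string& typeName)
{
    for (const auto& entry : kTypeNames)
        if (typeName == entry.second) return entry.first;
    throw std::runtime_error("unknown dataset type name: " + typeName);
}

// Enum -> printable name; throws if the enum value is not in the table.
std::string TypeToName(TypeEnum type)
{
    for (const auto& entry : kTypeNames)
        if (type == entry.first) return entry.second;
    throw std::runtime_error("unknown dataset type enum");
}
```

Keeping a single table for both directions means the two conversions cannot drift apart.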
Constructors & Related Methods
-
static DataSet
FromXml
(const std::string &xml)¶ Creates a DataSet from “raw” XML data.
- Parameters
xml
: DataSetXML text
-
DataSet
(const DataSet::TypeEnum type)¶ Constructs an empty DataSet of the type specified.
- Parameters
type
: dataset type
- Exceptions
std::runtime_error
: if type is unknown
-
DataSet
(const BamFile &bamFile)¶ Constructs a DataSet from a BAM file.
This currently defaults to a SubreadSet, with an ExternalResource pointing to BamFile::Filename.
- Parameters
bamFile
: BamFile object
-
DataSet
(const std::string &filename)¶ Loads a DataSet from a file.
filename may be one of the following types, indicated by its extension:
- BAM (“*.bam”)
- FOFN (“*.fofn”)
- FASTA (“*.fa” or “*.fasta”)
- DataSetXML (“*.xml”)
- Parameters
filename
: input filename
- Exceptions
std::runtime_error
: if filename has an unsupported extension, or if a valid DataSet could not be created from its contents
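The extension-based dispatch listed for this constructor can be illustrated with a small standalone sketch. InputKind, EndsWith, and ClassifyByExtension are hypothetical names, not pbbam functions. The sketch also assumes a case-sensitive suffix match, which the documentation above does not specify.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical classification of the input kinds listed above.
enum class InputKind { BAM, FOFN, FASTA, XML };

// Returns true if 'name' ends with 'suffix' (case-sensitive).
static bool EndsWith(const std::string& name, const std::string& suffix)
{
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Maps a filename to an input kind by extension; throws on anything else,
// mirroring the std::runtime_error documented for unsupported extensions.
InputKind ClassifyByExtension(const std::string& filename)
{
    if (EndsWith(filename, ".bam")) return InputKind::BAM;
    if (EndsWith(filename, ".fofn")) return InputKind::FOFN;
    if (EndsWith(filename, ".fa") || EndsWith(filename, ".fasta")) return InputKind::FASTA;
    if (EndsWith(filename, ".xml")) return InputKind::XML;
    throw std::runtime_error("unsupported extension: " + filename);
}
```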
-
DataSet
(const std::vector<std::string> &filenames)¶ Constructs a DataSet from a list of files.
- Parameters
filenames
: input filenames
- Exceptions
std::runtime_error
: if a DataSet could not be created from filenames
-
~DataSet
()¶
Operators
Serialization
Attributes
-
const std::string &
Attribute
(const std::string &name) const¶ Fetches the value of a DataSet root element’s attribute.
These are the attributes attached to the root dataset element:
<SubreadSet foo="x" bar="y" />
Built-in accessors exist for the standard attributes (e.g. CreatedAt) but additional attributes can be used as well via these generic Attribute methods.
- Return
- const reference to attribute’s value (empty string if not present)
- Parameters
name
: root element’s attribute name
-
const std::string &
CreatedAt
() const¶ Fetches the value of dataset’s CreatedAt attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
Format
() const¶ Fetches the value of dataset’s Format attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
MetaType
() const¶ Fetches the value of dataset’s MetaType attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
ModifiedAt
() const¶ Fetches the value of dataset’s ModifiedAt attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
Name
() const¶ Fetches the value of dataset’s Name attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
ResourceId
() const¶ Fetches the value of dataset’s ResourceId attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
Tags
() const¶ Fetches the value of dataset’s Tags attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
TimeStampedName
() const¶ Fetches the value of dataset’s TimeStampedName attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
UniqueId
() const¶ Fetches the value of dataset’s UniqueId attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
const std::string &
Version
() const¶ Fetches the value of dataset’s Version attribute.
- Return
- const reference to attribute’s value (empty string if not present)
-
std::string &
Attribute
(const std::string &name)¶ Fetches the value of a DataSet root element’s attribute.
These are the attributes attached to the root dataset element:
<SubreadSet foo="x" bar="y" />
Built-in accessors exist for the standard attributes (e.g. CreatedAt) but additional attributes can be used as well via these generic methods.
A new attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
- Parameters
name
: root element’s attribute name
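The difference between the const and non-const Attribute overloads (empty string when absent versus create-on-access) can be sketched with a plain std::map. AttributeHolder and Has are hypothetical names used only for this illustration. pbbam's internal storage may differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the root element's attribute storage.
class AttributeHolder
{
public:
    // const overload: never modifies the container; returns a shared
    // empty string when the attribute is absent.
    const std::string& Attribute(const std::string& name) const
    {
        static const std::string empty;
        const auto it = attributes_.find(name);
        return it == attributes_.end() ? empty : it->second;
    }

    // non-const overload: std::map::operator[] inserts an empty value
    // if the attribute does not exist yet, then returns a reference to it.
    std::string& Attribute(const std::string& name) { return attributes_[name]; }

    // Helper for inspection only (not part of the documented interface).
    bool Has(const std::string& name) const { return attributes_.count(name) != 0; }

private:
    std::map<std::string, std::string> attributes_;
};
```

A practical consequence: reading through a const reference never adds attributes, while reading through a non-const object can leave empty attributes behind.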
-
std::string &
CreatedAt
()¶ Fetches the value of dataset’s CreatedAt attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
Format
()¶ Fetches the value of dataset’s Format attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
MetaType
()¶ Fetches the value of dataset’s MetaType attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
ModifiedAt
()¶ Fetches the value of dataset’s ModifiedAt attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
Name
()¶ Fetches the value of dataset’s Name attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
ResourceId
()¶ Fetches the value of dataset’s ResourceId attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
Tags
()¶ Fetches the value of dataset’s Tags attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
TimeStampedName
()¶ Fetches the value of dataset’s TimeStampedName attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
UniqueId
()¶ Fetches the value of dataset’s UniqueId attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
std::string &
Version
()¶ Fetches the value of dataset’s Version attribute.
This attribute will be created if it does not yet exist.
- Return
- non-const reference to attribute’s value (empty string if this is a new attribute)
-
DataSet &
Attribute
(const std::string &name, const std::string &value)¶ Sets this dataset’s XML attribute name to value.
These are the attributes attached to the root dataset element:
<SubreadSet foo="x" bar="y" />
Built-in accessors exist for the standard attributes (e.g. CreatedAt) but additional attributes can be used as well via these generic methods.
The attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
name
: root element’s attribute name
value
: new value for the attribute
-
DataSet &
CreatedAt
(const std::string &createdAt)¶ Sets this dataset’s CreatedAt attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
createdAt
: new value for the attribute
-
DataSet &
Format
(const std::string &format)¶ Sets this dataset’s Format attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
format
: new value for the attribute
-
DataSet &
MetaType
(const std::string &metatype)¶ Sets this dataset’s MetaType attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
metatype
: new value for the attribute
-
DataSet &
ModifiedAt
(const std::string &modifiedAt)¶ Sets this dataset’s ModifiedAt attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
modifiedAt
: new value for the attribute
-
DataSet &
Name
(const std::string &name)¶ Sets this dataset’s Name attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
name
: new value for the attribute
-
DataSet &
ResourceId
(const std::string &resourceId)¶ Sets this dataset’s ResourceId attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
resourceId
: new value for the attribute
-
DataSet &
Tags
(const std::string &tags)¶ Sets this dataset’s Tags attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
tags
: new value for the attribute
-
DataSet &
TimeStampedName
(const std::string &timeStampedName)¶ Sets this dataset’s TimeStampedName attribute.
This attribute will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
timeStampedName
: new value for the attribute
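Because each setter above returns a reference to the dataset object, calls can be chained. The following standalone sketch shows the pattern. FluentDataSet, Get, and Set are hypothetical names and the class is not the pbbam type.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in showing the "return *this" setter style.
class FluentDataSet
{
public:
    FluentDataSet& Name(const std::string& name) { return Set("Name", name); }
    FluentDataSet& Tags(const std::string& tags) { return Set("Tags", tags); }
    FluentDataSet& Attribute(const std::string& name, const std::string& value)
    {
        return Set(name, value);
    }

    // Read-back helper for the example; returns an empty string when absent.
    std::string Get(const std::string& name) const
    {
        const auto it = attributes_.find(name);
        return it == attributes_.end() ? std::string() : it->second;
    }

private:
    // Creates or overwrites the attribute, then returns this object for chaining.
    FluentDataSet& Set(const std::string& name, const std::string& value)
    {
        attributes_[name] = value;
        return *this;
    }

    std::map<std::string, std::string> attributes_;
};
```

With this shape, `ds.Name("my_subreads").Tags("demo").Attribute("foo", "x");` updates three attributes on the same object in one expression.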
Child Elements
-
const PacBio::BAM::Extensions &
Extensions
() const¶ Fetches the dataset’s Extensions element.
- Return
- const reference to child element
- Exceptions
std::runtime_error
: if element does not exist
-
const PacBio::BAM::ExternalResources &
ExternalResources
() const¶ Fetches the dataset’s ExternalResources element.
- Return
- const reference to child element
- Exceptions
std::runtime_error
: if element does not exist
-
const PacBio::BAM::Filters &
Filters
() const¶ Fetches the dataset’s Filters element.
- Return
- const reference to child element
-
const PacBio::BAM::DataSetMetadata &
Metadata
() const¶ Fetches the dataset’s DataSetMetadata element.
- Return
- const reference to child element
-
const PacBio::BAM::SubDataSets &
SubDataSets
() const¶ Fetches the dataset’s DataSets element.
- Return
- const reference to child element
-
PacBio::BAM::Extensions &
Extensions
()¶ Fetches the dataset’s Extensions element.
This element will be created if it does not yet exist.
- Return
- non-const reference to child element
-
PacBio::BAM::ExternalResources &
ExternalResources
()¶ Fetches the dataset’s ExternalResources element.
This element will be created if it does not yet exist.
- Return
- non-const reference to child element
-
PacBio::BAM::Filters &
Filters
()¶ Fetches the dataset’s Filters element.
This element will be created if it does not yet exist.
- Return
- non-const reference to child element
-
PacBio::BAM::DataSetMetadata &
Metadata
()¶ Fetches the dataset’s DataSetMetadata element.
This element will be created if it does not yet exist.
- Return
- non-const reference to child element
-
PacBio::BAM::SubDataSets &
SubDataSets
()¶ Fetches the dataset’s DataSets element.
This element will be created if it does not yet exist.
- Return
- non-const reference to child element
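For child elements, the documented contrast is that the const Extensions and ExternalResources accessors throw when the element is missing, while the non-const accessors create it. A minimal sketch of that contract, using hypothetical types (ExtensionsSketch, DataSetSketch) rather than the pbbam classes:

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical child element.
struct ExtensionsSketch
{
    std::vector<std::string> items;
};

// Hypothetical owner that creates the child lazily.
class DataSetSketch
{
public:
    // const access: cannot create the element, so a missing element is an error.
    const ExtensionsSketch& Extensions() const
    {
        if (!extensions_) throw std::runtime_error("Extensions element does not exist");
        return *extensions_;
    }

    // non-const access: creates the element on first use.
    ExtensionsSketch& Extensions()
    {
        if (!extensions_) extensions_.reset(new ExtensionsSketch);
        return *extensions_;
    }

private:
    std::unique_ptr<ExtensionsSketch> extensions_;
};
```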
-
DataSet &
Extensions
(const PacBio::BAM::Extensions &extensions)¶ Sets this dataset’s Extensions element.
This element will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
extensions
: new value for the element
-
DataSet &
ExternalResources
(const PacBio::BAM::ExternalResources &resources)¶ Sets this dataset’s ExternalResources element.
This element will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
resources
: new value for the element
-
DataSet &
Filters
(const PacBio::BAM::Filters &filters)¶ Sets this dataset’s Filters element.
This element will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
filters
: new value for the element
-
DataSet &
Metadata
(const PacBio::BAM::DataSetMetadata &metadata)¶ Sets this dataset’s DataSetMetadata element.
This element will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
metadata
: new value for the element
-
DataSet &
SubDataSets
(const PacBio::BAM::SubDataSets &subdatasets)¶ Sets this dataset’s DataSets element.
This element will be created if it does not yet exist.
- Return
- reference to this dataset object
- Parameters
subdatasets
: new value for the element
Resource Handling
-
std::vector<std::string>
AllFiles
() const¶ Returns all of this dataset’s resource files, with relative filepaths already resolved.
Includes both primary resources (e.g. subread BAM files), as well as all secondary or child resources (e.g. index files, scraps BAM, etc).
- Return
- vector of (resolved) filepaths
- See
- DataSet::ResolvedResourceIds
-
std::vector<BamFile>
BamFiles
() const¶ Returns this dataset’s primary BAM resources, with relative filepaths already resolved.
Primary resources are those listed as top-level ExternalResources, not associated files (indices, references, scraps BAMs, etc.).
- Return
- vector of BamFiles
- See
- DataSet::ResolvedResourceIds
-
std::vector<std::string>
FastaFiles
() const¶ Returns this dataset’s primary FASTA resources, with relative filepaths already resolved.
Primary resources are those listed as top-level ExternalResources, not associated files (indices, references, scraps BAMs, etc.).
- Return
- vector of filepaths to FASTA resources
- See
- DataSet::ResolvedResourceIds
-
std::vector<std::string>
ResolvedResourceIds
() const¶ Returns all primary external resource filepaths, with relative paths resolved.
Primary resources are those listed as top-level ExternalResources, not associated files (indices, references, scraps BAMs, etc.).
- See
- ResolvePath
- Return
- vector of resolved resource IDs (filepaths)
-
std::string
ResolvePath
(const std::string &originalPath) const¶ Resolves a filepath (that may be relative to the dataset).
A DataSet’s resources may be described using absolute or relative filepaths. Absolute paths are returned unchanged. Relative paths are resolved using the DataSet’s own path as a starting point. A DataSet’s own path will be one of:
1. the location of its XML or BAM input file, e.g. when created using DataSet(“foo.xml”) or DataSet(“foo.bam”)
2. the application’s current working directory, for all other DataSet construction methods: DataSet(), DataSet(type), DataSet(“foo.fofn”)
- Return
- resolved path
- Parameters
originalPath
: input file path (absolute or relative)
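The resolution rule described above can be sketched as a standalone function. ResolveAgainst is a hypothetical helper, not part of pbbam. The sketch assumes POSIX-style paths and does no normalization (for example of ".." segments), so the library's actual behavior may differ in those respects.

```cpp
#include <string>

// Hypothetical helper: resolves 'originalPath' against 'datasetDir',
// the directory taken as the dataset's own path.
std::string ResolveAgainst(const std::string& datasetDir, const std::string& originalPath)
{
    // Absolute paths (POSIX-style, assumption) are returned unchanged.
    if (!originalPath.empty() && originalPath.front() == '/') return originalPath;

    // With no base directory there is nothing to prepend.
    if (datasetDir.empty()) return originalPath;

    // Join base and relative path, avoiding a doubled separator.
    if (datasetDir.back() == '/') return datasetDir + originalPath;
    return datasetDir + '/' + originalPath;
}
```

For a dataset loaded from an XML file in /data/run1, a resource listed as "subreads.bam" would resolve to "/data/run1/subreads.bam" under this sketch.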
-
std::set<std::string>
SequencingChemistries
() const¶ Fetches the sequencing chemistries used in this dataset.
- Return
- sequencing chemistry info for all read groups in this dataset
- See
- ReadGroupInfo::SequencingChemistry
XML Namespace Handling
-
const NamespaceRegistry &
Namespaces
() const¶ Access this dataset’s namespace info.
- Return
- const reference to dataset’s NamespaceRegistry
-
NamespaceRegistry &
Namespaces
()¶ Access this dataset’s namespace info.
- Return
- non-const reference to dataset’s NamespaceRegistry
-
enum