Class: Data Set (DataSet)
A collection of related data items or records that are organized together in a common format or structure, to enable their computational manipulation as a unit.
Comments
- Instances of this class represent a specific version of a given dataset. Examples include the gnomAD version 3.1.2 dataset carrying allele population frequencies, a specific version of a VCF file that describes variations observed in a particular patient and various annotations made on them, or a SIFT dataset of computational predictions of functional impact for a set of variants.
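As a rough illustration of the "specific version" idea, the sketch below builds one DataSet record for the gnomAD 3.1.2 example. The `DataSet` dataclass is a hypothetical in-memory stand-in, not a class generated from the schema; the slot names are taken from this page, and the randomly generated `id` is an assumption made for the example.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import uuid


@dataclass
class DataSet:
    """Hypothetical stand-in for a sepio_linkml:DataSet record (not generated code)."""
    id: str                             # required logical id, unique within the system
    type: str = "DataSet"               # required; label of a concrete class in the model
    label: Optional[str] = None         # primary name
    version: Optional[str] = None       # version as assigned by the data set's creator
    releaseDate: Optional[str] = None   # string-valued in the schema
    license: Optional[str] = None       # URL where possible, otherwise free text
    identifiers: List[str] = field(default_factory=list)


# One instance per dataset version: a later gnomAD release would be a separate record.
gnomad = DataSet(id=str(uuid.uuid4()), label="gnomAD", version="3.1.2")

# Drop unset optional slots before serializing.
record = {k: v for k, v in asdict(gnomad).items() if v not in (None, [])}
print(record)
```

Slots left unset (here `releaseDate`, `license`, and `identifiers`) are simply omitted from the serialized record, which matches their optional cardinality.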
URI: sepio_linkml:DataSet
Inheritance
- Entity
- InformationEntity
- DataSet
Slots
Direct slots
subtype --> Coding [0..1]
A specific type of data set the DataSet instance represents (e.g. a 'clinical data set', a 'sequencing data set', a 'gene expression data set', a 'genome annotation data set').
Implementation Guidance
* This attribute can be used to report a specific type for the DataSet, in cases where a model does not define DataSet subclasses for this purpose. Implementers can define their own set of data set type codes/terms, to match the needs of the domain or application.
releaseDate --> String [0..1]
Indicates when a version of a DataSet was formally released.
Implementation Guidance
* This attribute may apply to version and distribution-level DataSet representations. It refers to when the data set was released, which may differ from when the data set was generated.
version --> String [0..1]
The version of the DataSet, as assigned by its creator.
Implementation Guidance
* This property can capture version information for resources such as data sets and documents that get published and released as a unit for community use. These may go through rounds of revisions that add or modify content, but don't change the identity of the resource. Use this attribute in cases where the version is not reflected in an identifier associated with the data set.
license --> String [0..1]
A specific license that dictates legal permissions for how a data set can be used (by whom, where, for what purposes, with what additional requirements, etc.).
Implementation Guidance
* Where possible, provide a URL pointing to the license or a description of it (e.g. "https://creativecommons.org/licenses/by/4.0/"). Otherwise, provide a free-text name or description of the license (e.g. "CC-BY").
Inherited slots
isAbout --> String [*]
An entity or concept in the world that the information entity describes/is about.
Inherited from: InformationEntity
Implementation Guidance
* In the context of a Statement object, this attribute may be used to indicate the entities/concepts the Statement is about, in lieu of providing a more precise description of what the Statement asserts to be true using subject, predicate, object, and qualifier properties. For example, the Statement that "BRCA2 c.8023A>G is pathogenic for Breast Cancer" might be annotated to be about the variant 'BRCA2 c.8023A>G' and the disease 'Breast Cancer'.
contributions --> Contribution [*]
Specific actions taken by an Agent toward the creation, modification, validation, or deprecation of an Information Entity.
Inherited from: InformationEntity
Implementation Guidance
* This attribute holds one or more Contribution objects, which provide structured descriptions of a contribution made to the Information Entity by a particular agent.
dateAuthored --> String [0..1]
Indicates when the information content expressed in the Information Entity was generated.
Inherited from: InformationEntity
Implementation Guidance
* The term 'authored' as used in the model refers to the generation of 'information content' in the abstract sense, as opposed to a concrete encoding of this information in a specific language or format. For a Statement, for example, this attribute captures when the information content expressed in the Statement was first generated by an agent. Information about when a particular concrete encoding of this information was created (e.g. as a row in a table, or an object in a JSON document) would live in a RecordMetadata object attached to the Information Entity.
specifiedBy --> String [*]
A specification that describes all or part of the process that led to creation of the Information Entity.
Inherited from: InformationEntity
Implementation Guidance
* Examples - an experimental protocol or data analysis specification that describes how data were generated, or an evidence interpretation guideline that describes steps taken to interpret data in making a variant pathogenicity classification.
* Note that this attribute captures specific *instances* of specifications/methods (e.g. the specific electron microscopy method described in https://doi.org/10.1002/cpz1.1045) - as opposed to reporting a *type* of method applied (e.g. "Transmission Electron Microscopy").
supportingMethods --> String [*]
Specific methods that were executed to directly or indirectly support creation of the Information Entity.
Inherited from: InformationEntity
Implementation Guidance
* These may include methods that directly produced the Information Entity, or upstream/accessory methods that indirectly support creation of the Information Entity - e.g. methods used to produce data that was interpreted as evidence to generate a Statement of knowledge.
* This field captures terms representing specific INSTANCES of methods applied, vs the 'supportingMethodTypes' attribute which captures TYPES of methods used.
supportingMethodTypes --> Coding [*]
Types of methodological approaches that were executed to directly or indirectly support creation of the Information Entity.
Inherited from: InformationEntity
Implementation Guidance
* This field captures terms representing TYPES of methods applied, vs the 'specifiedBy' or 'supportingMethods' attributes which capture specific INSTANCES of methods used. These may include types of methods that directly produced the Information Entity, or upstream/accessory methods that indirectly support creation of the Information Entity.
* Implementers should define a relevant source or set of method type codes/terms to use here, based on the needs of the domain or application.
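The instance-versus-type distinction across these three slots can be sketched as follows. This is only an illustration: the dict layout, the `check_method_slots` helper, and the use of a bare `label` key to stand in for a Coding object are assumptions, not part of the model.

```python
from typing import Any, Dict, List


def check_method_slots(entity: Dict[str, Any]) -> List[str]:
    """Report values whose shape does not match the slot ranges listed on this page."""
    problems = []
    # specifiedBy and supportingMethods reference specific INSTANCES (string range).
    for slot in ("specifiedBy", "supportingMethods"):
        for value in entity.get(slot, []):
            if not isinstance(value, str):
                problems.append(f"{slot}: expected a string, got {type(value).__name__}")
    # supportingMethodTypes carries TYPES of methods (Coding range).
    for value in entity.get("supportingMethodTypes", []):
        if not isinstance(value, dict):
            problems.append(
                f"supportingMethodTypes: expected a Coding object, got {type(value).__name__}"
            )
    return problems


data_set = {
    # A specific, citable method instance (the DOI quoted in the guidance above).
    "specifiedBy": ["https://doi.org/10.1002/cpz1.1045"],
    # The general kind of method, as a Coding-like object (structure assumed).
    "supportingMethodTypes": [{"label": "Transmission Electron Microscopy"}],
}
print(check_method_slots(data_set))
```

Putting the type label directly into `specifiedBy` would still pass this shape check, so the distinction ultimately relies on the data provider following the guidance.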
derivedFrom --> InformationEntity [*]
Another Information Entity from which this Information Entity is derived, in whole or in part.
Inherited from: InformationEntity
reportedIn --> String [*]
A document in which the Information Entity is reported.
Inherited from: InformationEntity
Implementation Guidance
* This attribute is used specifically to reference documents/publications where the Information Entity is expressed or reported. For a Statement, this might be a publication where the authors express the statement in text. For a Data Item, this might be a publication with a table or figure that reports the value of the data.
sources --> String [*]
A document or other information resource in which the information entity, or evidence supporting it, is reported.
Inherited from: InformationEntity
Implementation Guidance
* This attribute is more general than InformationEntity.reportedIn (which is used to reference a Document that directly reports the information), and Statement.hasEvidenceFromSources (which is used to reference resources that provided evidence used to generate the knowledge expressed in a Statement). It can be used to cover both cases, in situations where a data provider does not know which is the case, or does not wish to make the distinction.
informationQuality --> Coding [0..1]
A qualitative term indicating the scientific rigor or reliability with which the information was generated/collected.
Inherited from: InformationEntity
Implementation Guidance
* This is typically based on the quality of design and execution of the study or curation activity that generated it (e.g. were relevant controls assessed to show instruments were working, were all samples taken care of and handled identically, are methods sound and well documented, etc.).
* The quality of information is intrinsic to the information itself, and not to a particular application of the information (e.g. as evidence for making an Assertion).
* The quality of information is one factor that goes into the confidence we have in the information's veracity (i.e. that it is an accurate reflection of the reality it intends to measure or describe). Other factors informing confidence may include who did it (we may just not trust some Agents), when (if data were created 500 years ago, we may have less confidence in it), and for Assertions, the relevance and abundance of supporting evidence.
* Implementers should define a relevant source of codes or terms to use here, based on the needs of the domain or application.
recordMetadata --> RecordMetadata [0..1]
Provenance metadata about a specific concrete record of information as encoded/serialized in a particular data set or object (as opposed to provenance about the abstract information content the encoding carries).
Inherited from: InformationEntity
Implementation Guidance
* This attribute holds a structured RecordMetadata object, which can be used to capture when, how, and by whom a record serialization was generated or modified; what upstream resources it was derived or retrieved from; and record-level administrative information such as versioning and lifecycle status.
id --> String [1]
The 'logical' identifier of the entity in the system of record, e.g. a UUID. This 'id' is unique within a given system, but may or may not be globally unique outside the system. It is used within a system to reference one object from another.
Inherited from: Entity
Implementation Guidance
* Note that it is common for implementers to create their own internal logical ids - typically a serially or randomly generated value like a UUID that is assigned to the data object as it is created in a system. But an implementer may choose to reuse an existing, globally unique id from an external system or authority for this purpose (e.g. an HGNC id for a Gene object) - as long as it is unique within the implementing system, and can be used to reference the identified object in this context.
identifiers --> String [*]
A globally-unique 'business' identifier or accession number for the real-world entity represented by a data object. These are typically assigned by an external system or authority, and used to connect entities and share content across different systems.
Inherited from: Entity
Implementation Guidance
* Preferred values for this attribute are CURIEs or URIs - so the system that provisioned the identifier is clear.
* A given real world entity - e.g. a genetic variant - may have many business identifiers defined by different systems, which can be captured in the "identifiers" property to indicate that they represent the same thing.
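A minimal sketch of how `id` and `identifiers` might sit side by side, with a loose check for the preferred CURIE/URI form. The regular expression is a simplified approximation rather than the full CURIE grammar, and the `EXAMPLE:` prefix, the `example.org` URL, and the helper name are placeholders invented for this illustration.

```python
import re
import uuid
from urllib.parse import urlparse

# Simplified CURIE shape: a prefix, a colon, then a non-empty local part.
CURIE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*:\S+$")


def looks_like_curie_or_uri(value: str) -> bool:
    """Loose check that an identifier names the system that provisioned it."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return True
    return bool(CURIE_PATTERN.match(value))


data_set = {
    # Logical id: generated locally, only guaranteed unique within this system.
    "id": str(uuid.uuid4()),
    "type": "DataSet",
    # Business identifiers: assigned externally; several may denote the same data set.
    "identifiers": ["EXAMPLE:12345", "https://example.org/datasets/12345"],
}

for identifier in data_set["identifiers"]:
    print(identifier, looks_like_curie_or_uri(identifier))
```

A free-text value such as a bare dataset name would fail this check, which may be a useful warning but is not an error according to the schema, since the slot's range is a plain string.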
type --> String [1]
The name of the class that is instantiated by a data object representing the Entity.
Inherited from: Entity
Implementation Guidance
* MUST be the label of a concrete class from the data model.
label --> String [0..1]
A primary name for the Entity.
Inherited from: Entity
alternativeLabels --> String [*]
Alternative name(s) for the Entity.
Inherited from: Entity
description --> String [0..1]
A free text description of the Entity.
Inherited from: Entity
extensions --> Extension [*]
A list of extensions to the Entity, that allow for capture of information not directly supported by elements defined in the model.
Inherited from: Entity
Implementation Guidance
* Extension objects have a key-value data structure that allows definition of custom fields in the data itself. Extensions are not expected to be natively understood, but may be used for pre-negotiated exchange of message attributes between systems.
Usages
used by | used in | type | used |
---|---|---|---|
StudyResult | sourceDataSet | range | DataSet |
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/sepio-framework/sepio-linkml
Mappings
Mapping Type | Mapped Value |
---|---|
self | sepio_linkml:DataSet |
native | sepio_linkml:DataSet |
LinkML Source
Direct
name: DataSet
description: A collection of related data items or records that are organized together
in a common format or structure, to enable their computational manipulation as a
unit.
title: Data Set
comments:
- Instances of this class represent a *specific version* of a given dataset. Examples
include the gnomAD version 3.1.2 dataset carrying allele population frequencies,
or a specific version of a VCF file that describes variations observed in a particular
patient and various annotations made on them, or a SIFT dataset of computational
predictions functional impact for a set of variants.
from_schema: https://w3id.org/sepio-framework/sepio-linkml
status: Draft
is_a: InformationEntity
attributes:
subtype:
name: subtype
description: A specific type of data set the DataSet instance represents (e.g.
a 'clinical data set', a 'sequencing data set', a 'gene expression data set',
a 'genome annotation data set').
comments:
- This attribute can be used to report a specific type for the DataSet, in cases
where a model does not define DataSet subclasses for this purpose. Implementers
can define their own set of data set type codes/terms, to match the needs of
the domain or application.
from_schema: https://w3id.org/sepio-model
status: Informative
domain_of:
- Method
- Document
- DataItem
- DataSet
- Activity
- Agent
- EvidenceLine
range: Coding
required: false
multivalued: false
releaseDate:
name: releaseDate
description: Indicates when a version of a DataSet was formally released.
comments:
- This attribute may apply to version and distribution-level DataSet representations.
It refers to when the data set was released, which may be different than the
data set was generated.
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
domain_of:
- DataSet
range: string
required: false
multivalued: false
version:
name: version
description: The version of the DataSet, as assigned by its creator.
comments:
- This property can capture version information for resources such as data sets
and documents that get published and released as a unit for community use. These
may go through rounds of revisions that add or modify content, but don't change
the identity of the resource. Use this attribute in cases where version is not
reflected in an identifier associated with the data set
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
domain_of:
- DataSet
range: string
required: false
multivalued: false
license:
name: license
description: A specific license that dictates legal permissions for how a data
set can be used (by whom, where, for what purposes, with what additional requirements,
etc.).
comments:
- Where possible, provide a URL pointing to the license or a description of it
(e.g. "https://creativecommons.org/licenses/by/4.0/"). Otherwise, provide a
free-text name or description of the license (e.g. "CC-BY").
from_schema: https://w3id.org/sepio-model
status: Informative
domain_of:
- Document
- DataSet
range: string
required: false
multivalued: false
Induced
name: DataSet
description: A collection of related data items or records that are organized together
in a common format or structure, to enable their computational manipulation as a
unit.
title: Data Set
comments:
- Instances of this class represent a *specific version* of a given dataset. Examples
include the gnomAD version 3.1.2 dataset carrying allele population frequencies,
or a specific version of a VCF file that describes variations observed in a particular
patient and various annotations made on them, or a SIFT dataset of computational
predictions functional impact for a set of variants.
from_schema: https://w3id.org/sepio-framework/sepio-linkml
status: Draft
is_a: InformationEntity
attributes:
subtype:
name: subtype
description: A specific type of data set the DataSet instance represents (e.g.
a 'clinical data set', a 'sequencing data set', a 'gene expression data set',
a 'genome annotation data set').
comments:
- This attribute can be used to report a specific type for the DataSet, in cases
where a model does not define DataSet subclasses for this purpose. Implementers
can define their own set of data set type codes/terms, to match the needs of
the domain or application.
from_schema: https://w3id.org/sepio-model
status: Informative
alias: subtype
owner: DataSet
domain_of:
- Method
- Document
- DataItem
- DataSet
- Activity
- Agent
- EvidenceLine
range: Coding
required: false
multivalued: false
releaseDate:
name: releaseDate
description: Indicates when a version of a DataSet was formally released.
comments:
- This attribute may apply to version and distribution-level DataSet representations.
It refers to when the data set was released, which may be different than the
data set was generated.
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: releaseDate
owner: DataSet
domain_of:
- DataSet
range: string
required: false
multivalued: false
version:
name: version
description: The version of the DataSet, as assigned by its creator.
comments:
- This property can capture version information for resources such as data sets
and documents that get published and released as a unit for community use. These
may go through rounds of revisions that add or modify content, but don't change
the identity of the resource. Use this attribute in cases where version is not
reflected in an identifier associated with the data set
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: version
owner: DataSet
domain_of:
- DataSet
range: string
required: false
multivalued: false
license:
name: license
description: A specific license that dictates legal permissions for how a data
set can be used (by whom, where, for what purposes, with what additional requirements,
etc.).
comments:
- Where possible, provide a URL pointing to the license or a description of it
(e.g. "https://creativecommons.org/licenses/by/4.0/"). Otherwise, provide a
free-text name or description of the license (e.g. "CC-BY").
from_schema: https://w3id.org/sepio-model
status: Informative
alias: license
owner: DataSet
domain_of:
- Document
- DataSet
range: string
required: false
multivalued: false
isAbout:
name: isAbout
description: An entity or concept in the world that the information entity describes/is
about.
comments:
- e.g. In the context of a Statement object, this attribute may be used to indicate
entities/concepts it is about, in lieu of providing a more precise description
of what the Statement asserts to be true using subject, predicate, object, and
qualifier properties. e.g. the Statement that "BRCA2 c.8023A>G is pathogenic
for Breast Cancer" might be annotated to be about the variant 'BRCA2 c.8023A>G',
and the disease 'Breast Cancer'.
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: isAbout
owner: DataSet
domain_of:
- InformationEntity
range: string
required: false
multivalued: true
contributions:
name: contributions
description: Specific actions taken by an Agent toward the creation, modification,
validation, or deprecation of an Information Entity.
comments:
- This attribute holds one or more Contribution objects, which provide structured
descriptions of a contribution made to the Information Entity by a particular
agent.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: contributions
owner: DataSet
domain_of:
- InformationEntity
- RecordMetadata
range: Contribution
required: false
multivalued: true
dateAuthored:
name: dateAuthored
description: Indicates when the information content expressed in the Information
Entity was generated.
comments:
- The term 'authored' as used in the model refers to the generation of 'information
content' in the abstract sense, as opposed to a concrete encoding of this information
in a specific language or format. e.g. for a Statement, this attribute captures
when the information content expressed in the Statement was first generated
by an agent. Information about when a particular concrete encoding of this
information was created (e.g. as row in a table, or object in a json document)
would live in a RecordMetadata object attached to the Information Entity).
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: dateAuthored
owner: DataSet
domain_of:
- InformationEntity
range: string
required: false
multivalued: false
specifiedBy:
name: specifiedBy
description: A specification that describes all or part of the process that led
to creation of the Information Entity.
comments:
- Examples - an experimental protocol or data analysis specification that describe
how data were generated, or an evidence interpretation guideline that describes
steps taken to interpret data in making a variant pathogenicity classification.
- Note that this attribute captures specific *instances* of specifications/methods
(e.g. the specific electron microscopy method described in
https://doi.org/10.1002/cpz1.1045) - as opposed to reporting a *type* of method applied (e.g. "Transmission
Electron Microscopy").
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: specifiedBy
owner: DataSet
domain_of:
- InformationEntity
- Activity
range: string
required: false
multivalued: true
supportingMethods:
name: supportingMethods
description: Specific methods that were executed to directly or indirectly support
creation of the Information Entity.
comments:
- These may include methods that directly produced the Information Entity, or
upstream/accessory methods that indirectly support creation of the Information
Entity - e.g. methods used to produce data that was interpreted as evidence
to generate a Statement of knowledge.
- This field captures terms representing specific INSTANCES of methods applied,
vs the 'supportingMethodTypes' attribute which captures TYPES of methods used.
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: supportingMethods
owner: DataSet
domain_of:
- InformationEntity
range: string
required: false
multivalued: true
supportingMethodTypes:
name: supportingMethodTypes
description: Types of methodological approaches that were executed to directly
or indirectly support creation of the Information Entity.
comments:
- This field captures terms representing TYPES of methods applied, vs the 'specifiedBy'
or 'supportingMethods' attributes which capture specific INSTANCES of methods
used. These may include types of methods that directly produced the Information
Entity, or upstream/accessory methods that indirectly support creation of the
Information Entity
- Implementers should define a relevant source or set of method type codes/terms
to use here, based on the needs of the domain or application.'
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: supportingMethodTypes
owner: DataSet
domain_of:
- InformationEntity
range: Coding
required: false
multivalued: true
derivedFrom:
name: derivedFrom
description: Another Information Entity from which this Information Entity is
derived, in whole or in part.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: derivedFrom
owner: DataSet
domain_of:
- InformationEntity
range: InformationEntity
required: false
multivalued: true
reportedIn:
name: reportedIn
description: A document in which the Information Entity is reported.
comments:
- This attribute is used specifically to reference documents/publications where
the Information Entity is expressed or reported. For a Statement, this might
be a publication where the authors express the statement in text. For a Data
Item, this might be a publication with a table or figure that reports the value
of the data.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: reportedIn
owner: DataSet
domain_of:
- InformationEntity
range: string
required: false
multivalued: true
sources:
name: sources
description: A document or other information resource in which the information
entity, or evidence supporting it, is reported.
comments:
- This attribute is more general than InformationEntity.reportedIn (which is used
to reference a Document that directly reports the information), and Statement.hasEvidenceFromSources
(which is used to reference resources that provided evidence used to generate
the knowledge expressed in a Statement). It can be used to cover both cases,
in situations where a data provider does not know which is the case, or does
not wish to make the distinction.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: sources
owner: DataSet
domain_of:
- InformationEntity
range: string
required: false
multivalued: true
informationQuality:
name: informationQuality
description: A qualitative term indicating the scientific rigor or reliability
with which the information was generated/collected.
comments:
- This is typically based on the quality of design and execution of the study
or curation activity that generated it (e.g. were relevant controls assessed
to show instruments were working, were all samples taken care of and handled
identically, are methods sound and well documented, etc.).
- The quality of information is intrinsic to the information itself, and not to
a particular application of the information (e.g. as evidence for making an
Assertion)
- The quality of information is one factor that goes into the confidence we have
in the information''s veracity (i.e. that it is an accurate reflection of reality
it intends to measure or describe). Other factors informing confidence may include
who did it (we may just not trust some Agents), when (if data created 500 years
ago, we may have less confidence in it), and for Assertions, the relevance and
abundance of supporting evidence.
- Implementers should define a relevant source of codes or terms to use here,
based on the needs of the domain or application.'
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: informationQuality
owner: DataSet
domain_of:
- InformationEntity
range: Coding
required: false
multivalued: false
recordMetadata:
name: recordMetadata
description: Provenance metadata about a specific concrete record of information
as encoded/serialized in a particular data set or object (as opposed to provenance
about the abstract information content the encoding carries).
comments:
- This attribute holds a structured RecordMetadata objects, which can be used
to capture when, how, and by whom a record serialization was generated or modified;
what upstream resources it was derived or retrieved from; and record-level administrative
information such as versioning and lifecycle status.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: recordMetadata
owner: DataSet
domain_of:
- InformationEntity
range: RecordMetadata
required: false
multivalued: false
id:
name: id
description: The 'logical' identifier of the entity in the system of record, e.g.
a UUID. This 'id' is unique within a given system, but may or may not be globally
unique outside the system. It is used within a system to reference one object
from another.
comments:
- Note that it is common for implementers to create their own internal logical
ids - typically a serially or randomly generated value like a UUID that is assigned
to the data object as it is created in a system. But an implementer may choose
to reuse an existing, globally unique id from an external system or authority
for this purpose (e.g. an HGNC id for a Gene object) - as long as it is unique
within the implementing system, and can be used to reference the identified
object in this context.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: id
owner: DataSet
domain_of:
- Entity
range: string
required: true
multivalued: false
identifiers:
name: identifiers
description: A globally-unique 'business' identifier or accession number for the
real-world entity represented by a data object. These are typically assigned
by an external system or authority, and used to connect entities and share content
across different systems.
comments:
- Preferred values for this attribute are CURIEs or URIs - so the system that
provisioned the identifier is clear.
- A given real world entity - e.g. a genetic variant - may have many business
identifiers defined by different systems, which can be captured in the "identifiers"
property to indicate that they represent the same thing.
from_schema: https://w3id.org/sepio-model
status: Informative
rank: 1000
alias: identifiers
owner: DataSet
domain_of:
- Entity
range: string
required: false
multivalued: true
type:
name: type
description: The name of the class that is instantiated by a data object representing
the Entity.
comments:
- MUST be the label of a concrete class from the data model.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: type
owner: DataSet
domain_of:
- Entity
range: string
required: true
multivalued: false
label:
name: label
description: A primary name for the Entity.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: label
owner: DataSet
domain_of:
- Entity
- Coding
range: string
required: false
multivalued: false
alternativeLabels:
name: alternativeLabels
description: Alternative name(s) for the Entity.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: alternativeLabels
owner: DataSet
domain_of:
- Entity
range: string
required: false
multivalued: true
description:
name: description
description: A free text description of the Entity.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: description
owner: DataSet
domain_of:
- Entity
range: string
required: false
multivalued: false
extensions:
name: extensions
description: A list of extensions to the Entity, that allow for capture of information
not directly supported by elements defined in the model.
comments:
- Extension objects have a key-value data structure that allows definition of
custom fields in the data itself. Extensions are not expected to be natively
understood, but may be used for pre-negotiated exchange of message attributes
between systems.
from_schema: https://w3id.org/sepio-model
status: Draft
rank: 1000
alias: extensions
owner: DataSet
domain_of:
- Entity
range: Extension
required: false
multivalued: true
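The `required` and `multivalued` flags in the induced definitions above translate into simple cardinality rules. The sketch below hard-codes a subset of those flags and checks a plain dict against them; `SLOT_RULES` and `check_cardinality` are hypothetical names for this example, and a real project would more likely rely on LinkML's own validation tooling.

```python
from typing import Any, Dict, List, Tuple

# Subset of the induced slots listed above, as (required, multivalued).
SLOT_RULES: Dict[str, Tuple[bool, bool]] = {
    "id": (True, False),
    "type": (True, False),
    "version": (False, False),
    "releaseDate": (False, False),
    "license": (False, False),
    "identifiers": (False, True),
    "derivedFrom": (False, True),
}


def check_cardinality(record: Dict[str, Any]) -> List[str]:
    """Return a message for each slot that violates its required/multivalued flag."""
    problems = []
    for slot, (required, multivalued) in SLOT_RULES.items():
        if slot not in record:
            if required:
                problems.append(f"{slot}: required slot is missing")
            continue
        is_list = isinstance(record[slot], list)
        if multivalued and not is_list:
            problems.append(f"{slot}: expected a list")
        elif not multivalued and is_list:
            problems.append(f"{slot}: expected a single value")
    return problems


print(check_cardinality({"id": "ds-1", "type": "DataSet", "identifiers": ["EXAMPLE:1"]}))
print(check_cardinality({"version": ["3.1.2"]}))
```

The first call reports no problems; the second reports the two missing required slots and the list supplied for the single-valued `version` slot.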