Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Title Required
| Name | title |
| Description | The title for your dataset. This will be displayed when search results including your data are shown. Often this will be the same as an associated publication. |
| Example | SARS-CoV-2 drug repurposing - Caco2 cell line |
| Reference | # |
| Regex | ^.{25,}$ |
| Namespace | rembi:title |
Description Required
| Name | description |
| Description | Use this field to describe your dataset. This can be the abstract of an accompanying publication. |
| Example | High-throughput screening of repurposed drugs against SARS-CoV-2 in Caco-2 cells |
| Reference | http://purl.org/dc/terms/description |
| Regex | ^.{25,}$ |
| Namespace | rembi:description |
Release Date Required
| Name | private_until_date |
| Description | The date until which the data remains private and embargoed. |
| Example | 2027-06-01 |
| Reference | http://purl.obolibrary.org/obo/SLSO_0001056 |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | rembi:private_until_date |
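The release-date regex checks only the shape of the value, so it accepts impossible dates such as 2027-02-30. The following is a minimal Python sketch that pairs the regex with a calendar check. It assumes validation happens in a submission script, and the function name `is_valid_release_date` is hypothetical.

```python
import re
from datetime import date

# Regex copied from the private_until_date definition above.
RELEASE_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_release_date(value: str) -> bool:
    """Return True if value is a real calendar date in YYYY-MM-DD form."""
    if not RELEASE_DATE_RE.match(value):
        return False  # wrong shape, e.g. a time component was included
    try:
        date.fromisoformat(value)  # rejects dates such as 2027-02-30
    except ValueError:
        return False
    return True


print(is_valid_release_date("2027-06-01"))           # True
print(is_valid_release_date("2027-06-01T00:00:00"))  # False
print(is_valid_release_date("2027-02-30"))           # False
```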
Keywords Required
| Name | keywords |
| Description | Keywords describing your data that can be used to aid search and classification. |
| Example | CRISPR |
| Reference | http://schema.org/keywords |
| Namespace | rembi:keywords |
Licence
| Name | licence |
| Description | The license under which the data are available. |
| Example | MIT License |
| Reference | http://purl.org/dc/terms/license |
| Namespace | rembi:licence |
| Allowed Values | Apache License 2.0; Creative Commons Attribution 4.0 International; Creative Commons Attribution Share Alike 4.0 International; Creative Commons Zero v1.0 Universal; GNU General Public License v3.0 or later; MIT License |
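This field lists licences by their full names. The study-level `licence` field further on uses short identifiers instead (Apache-2.0, CC-BY-4.0, and so on). Those short forms are the SPDX licence identifiers for the same six licences, so a lookup table can translate between the two fields. The following is a minimal sketch; the dictionary and function names are illustrative and not part of the schema.

```python
# Hypothetical lookup from the full licence names allowed here to SPDX identifiers.
SPDX_BY_NAME = {
    "Apache License 2.0": "Apache-2.0",
    "Creative Commons Attribution 4.0 International": "CC-BY-4.0",
    "Creative Commons Attribution Share Alike 4.0 International": "CC-BY-SA-4.0",
    "Creative Commons Zero v1.0 Universal": "CC0-1.0",
    "GNU General Public License v3.0 or later": "GPL-3.0-or-later",
    "MIT License": "MIT",
}


def to_spdx(licence_name: str) -> str:
    """Map an allowed full licence name to its SPDX identifier."""
    try:
        return SPDX_BY_NAME[licence_name.strip()]
    except KeyError:
        raise ValueError(f"not an allowed licence: {licence_name!r}") from None


print(to_spdx("MIT License"))  # MIT
```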
Funding Statement
| Name | funding_statement |
| Description | A description of how the data generation was funded. |
| Example | Data generation for this study was supported by a grant from the BBSRC, which funded annotation and analysis activities. |
| Reference | http://purl.obolibrary.org/obo/IAO_0000623 |
| Namespace | rembi:funding_statement |
Acknowledgements
| Name | acknowledgements |
| Description | Any people or groups that should be acknowledged as part of the dataset. |
| Example | We acknowledge the contributions of the field research team at the University of Edinburgh, the sequencing support from the Earlham Institute, and funding provided by the BBSRC. Special thanks to local conservation volunteers for assistance in sample collection. |
| Reference | http://purl.obolibrary.org/obo/IAO_0000324 |
| Namespace | rembi:acknowledgements |
REMBI Version Required
| Name | rembi_version |
| Description | The version of REMBI. The current version to be used is 1.5. |
| Example | 1.5 |
| Reference | # |
| Regex | ^1\.5$ |
| Namespace | rembi:rembi_version |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Identifier Required
| Name | identifier |
| Description | The identifier for the grant. |
| Example | 12345 |
| Reference | http://purl.org/dc/terms/identifier |
| Namespace | rembi:identifier |
Funder Required
| Name | funder |
| Description | The funding body providing support. |
| Example | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Reference | https://schema.org/funder |
| Namespace | rembi:funder |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Title Required
| Name | title |
| Description | Title of the associated publication. |
| Example | High-throughput drug screening identifies potential SARS-CoV-2 inhibitors in Caco2 cells |
| Reference | http://purl.org/dc/terms/title |
| Namespace | rembi:title |
DOI
| Name | doi |
| Description | A Digital Object Identifier (DOI) is a persistent identifier assigned to a digital object, such as a journal article or dataset. It resolves to the object's current location online and supports reliable citation. Enter the bare DOI in the standard format (e.g. 10.1234/example.doi), without a resolver prefix such as https://doi.org/. It should point to the original publication or data referenced. |
| Example | 10.1038/s41586-020-2577-1 |
| Reference | http://purl.obolibrary.org/obo/ONTOAVIDA_00000015 |
| Regex | ^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$ |
| Namespace | rembi:doi |
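The DOI regex is anchored on `10.`, so it rejects DOIs pasted as resolver URLs (for example `https://doi.org/10.1038/...`). The following minimal sketch assumes submitters may paste either form. It strips a resolver prefix and then applies the same pattern. The prefix list and the function name `normalise_doi` are assumptions, not part of the schema.

```python
import re

# Pattern copied from the doi definition above.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")

# Common prefixes that may precede a bare DOI (an assumption, not exhaustive).
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Return the bare DOI, or raise ValueError if it does not match the pattern."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_RE.match(doi):
        raise ValueError(f"not a valid DOI: {raw!r}")
    return doi


print(normalise_doi("https://doi.org/10.1038/s41586-020-2577-1"))  # 10.1038/s41586-020-2577-1
```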
Year
| Name | year |
| Description | Year of publication. |
| Example | 2025 |
| Reference | http://rs.tdwg.org/dwc/terms/year |
| Regex | ^(19|20)\d{2}$ |
| Namespace | rembi:year |
PubMed ID
| Name | pubmed_id |
| Description | PubMed identifier for the publication. |
| Example | 32726801 |
| Reference | http://purl.obolibrary.org/obo/MS_1000879 |
| Regex | ^\d{1,8}$ |
| Namespace | rembi:pubmed_id |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Link URL Required
| Name | link_url |
| Description | The URL of a link relevant to the dataset. |
| Example | https://example.org/zebrafish-embryo |
| Reference | # |
| Namespace | rembi:link_url |
Link Type
| Name | link_type |
| Description | The type of the link. |
| Example | Dataset |
| Reference | # |
| Namespace | rembi:link_type |
Link Description
| Name | link_description |
| Description | The description of the linked content. |
| Example | Image analysis code |
| Reference | # |
| Namespace | rembi:link_description |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Study Component ID Required
| Name | study_component_id |
| Description | A unique alphanumeric identifier for the study component |
| Example | STUDYCOMP001 |
| Reference | # |
| Namespace | ei:study_component_id |
Name Required
| Name | name |
| Description | The name of your study component. |
| Example | Confocal images |
| Reference | # |
| Namespace | rembi:name |
Description Required
| Name | description |
| Description | An explanation of your study component. |
| Example | Stitched max-projected fluorescent confocal images |
| Reference | # |
| Namespace | rembi:description |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Annotation ID Required
| Name | annotation_id |
| Description | A unique alphanumeric identifier for the image annotation record. |
| Example | ANNOT001 |
| Reference | # |
| Namespace | rembi:annotation_id |
Annotation Overview Required
| Name | annotation_overview |
| Description | Short descriptive summary indicating the type of annotation and how it was generated |
| Example | Cell nuclei marked using DAPI staining. |
| Reference | # |
| Namespace | rembi:annotation_overview |
File Type
| Name | file_type |
| Description | The format of the annotation file. |
| Example | gff |
| Reference | http://purl.obolibrary.org/obo/SLSO_0001157 |
| Namespace | rembi:file_type |
Annotation Type
| Name | annotation_type |
| Description | Defines the type of annotation (e.g., class_labels, bounding_boxes, counts, derived_annotations). |
| Example | geometrical_annotations |
| Reference | http://purl.obolibrary.org/obo/NCIT_C89919 |
| Namespace | rembi:annotation_type |
| Allowed Values | bounding_boxes class_labels counts derived_annotations geometrical_annotations graphs other point_annotations segmentation_mask tracks weak_annotations |
Annotation Method Required
| Name | annotation_method |
| Description | Description of how the annotations were created, including any protocols used for consensus and quality assurance. |
| Example | crowdsourced |
| Reference | # |
| Namespace | rembi:annotation_method |
Annotation Criteria
| Name | annotation_criteria |
| Description | Rules used to generate annotations |
| Example | only nuclei in focus were segmented |
| Reference | # |
| Namespace | rembi:annotation_criteria |
Annotation Coverage
| Name | annotation_coverage |
| Description | The proportion of images from the dataset that were annotated. |
| Example | All data that satisfied the Annotation Criteria were annotated. |
| Reference | # |
| Namespace | rembi:annotation_coverage |
Annotation Confidence Level
| Name | annotation_confidence_level |
| Description | Confidence in the accuracy of the annotations. |
| Example | more than 95% pixel consensus where multiple annotators independently segmented the same object |
| Reference | # |
| Namespace | rembi:annotation_confidence_level |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Person ID Required
| Name | person_id |
| Description | A unique alphanumeric identifier for the author. |
| Example | PERSON001 |
| Reference | # |
| Namespace | ei:person_id |
Annotation ID Required
| Name | annotation_id |
| Description | A unique alphanumeric identifier for the image annotation record. |
| Example | ANNOT001 |
| Reference | # |
| Namespace | ei:annotation_id |
Author First Name Required
| Name | givenName |
| Description | A first name (or given name) is the personal name given to an individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Author Last Name Required
| Name | familyName |
| Description | A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
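The two name patterns are stricter than they may look:

- They accept only unaccented ASCII letters.
- The trailing `[a-z]+` means a name must be at least two characters long and must end in a lowercase letter.
- The givenName pattern allows hyphens and whitespace between parts, while the familyName pattern allows hyphens only.

The following short Python check makes these edge cases visible. The sample names are illustrative.

```python
import re

# Patterns copied from the givenName and familyName definitions above.
GIVEN_NAME_RE = re.compile(r"^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$")
FAMILY_NAME_RE = re.compile(r"^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$")

# Illustrative inputs covering hyphens, spaces, length, case and non-ASCII letters.
samples = ["Jane", "Mary-Jane", "Mary Jane", "J", "JO", "José", "O'Brien"]
for name in samples:
    given = bool(GIVEN_NAME_RE.match(name))
    family = bool(FAMILY_NAME_RE.match(name))
    print(f"{name!r}: givenName={given}, familyName={family}")
```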
Email Address
| Name | email |
| Description | The email address of the author. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | rembi:email |
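The Email Address pattern works in two stages:

1. Two lookaheads reject consecutive dots or consecutive hyphens anywhere in the address.
2. The rest of the pattern requires a local part, an `@`, a domain and an alphabetic top-level domain of at least two letters.

Because `+` is not in `[\w.-]`, the pattern also rejects plus-addressed mailboxes such as `jane+lab@example.com`. The following quick check uses Python's `re` module, which supports these lookaheads. The sample addresses are illustrative.

```python
import re

# Pattern copied from the Email Address definition above.
EMAIL_RE = re.compile(r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$")

for address in [
    "jane.doe@example.com",   # accepted
    "jane..doe@example.com",  # consecutive dots
    "jane.doe@example",       # no top-level domain
    "jane+lab@example.com",   # '+' is not allowed in the local part
]:
    print(address, bool(EMAIL_RE.match(address)))
```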
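ORCID iD
| Name | orcid_id |
| Description | A 16-character ORCID iD that uniquely identifies a researcher, written as four hyphen-separated groups of four. The last character is a check digit and may be X. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |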
| Namespace | rembi:orcid_id |
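A regex can check only the layout of an ORCID iD. The final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, where a value of 10 is written as X, and it catches most typing errors. The following is a minimal Python sketch of that check. It accepts a trailing X, and the function name `orcid_checksum_ok` is illustrative.

```python
import re

# Layout check that allows X as the final check character.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the layout and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_RE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
print(orcid_checksum_ok("0000-0002-1694-233X"))  # True
print(orcid_checksum_ok("0000-0002-1825-0098"))  # False
```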
Affiliation or Institution Required
| Name | affiliation |
| Description | A URL to a public registry containing organisation information or the name of the organisation. A Research Organisation Registry (ROR) URL is recommended if a URL is provided. |
| Example | https://ror.org/018cxtf62 |
| Reference | https://schema.org/affiliation |
| Namespace | rembi:affiliation |
Role
| Name | role |
| Description | The author's role in the study. Separate multiple roles with a pipe character. |
| Example | Senior Bioinformatician |
| Reference | http://www.w3.org/2006/vcard/ns#role |
| Namespace | rembi:role |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique alphanumeric identifier for this sample |
| Example | SAMP001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism Required
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ei:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier (usually from a recognized taxonomy database such as NCBI Taxonomy) that corresponds to the organism’s scientific name. It must match the provided scientific_name to maintain consistency and traceability in biological records. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ei:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ei:biosampleAccession |
Biological Entity Required
| Name | biological_entity |
| Description | What is being imaged |
| Example | Drosophila endoderm |
| Reference | # |
| Namespace | rembi:biological_entity |
Common Name
| Name | common_name |
| Description | The common (vernacular) name of the organism. |
| Example | rock worm |
| Reference | # |
| Namespace | rembi:common_name |
Description
| Name | description |
| Description | High level description of sample. |
| Example | Bronchial epithelial cell culture |
| Reference | # |
| Namespace | rembi:description |
Intrinsic Variables
| Name | intrinsic_variables |
| Description | Intrinsic (e.g. genetic) alteration if applicable |
| Example | stable overexpression of HIST1H2BJ-mCherry and LMNA |
| Reference | # |
| Namespace | rembi:intrinsic_variables |
Extrinsic Variables
| Name | extrinsic_variables |
| Description | External sample treatment (e.g. reagent) if applicable |
| Example | 2-(9-oxoacridin-10-yl)acetic acid |
| Reference | # |
| Namespace | rembi:extrinsic_variables |
Experimental Variables
| Name | experimental_variables |
| Description | What is intentionally varied (e.g. time) between multiple entries in this study component |
| Example | Time |
| Reference | # |
| Namespace | rembi:experimental_variables |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Specimen ID Required
| Name | specimen_id |
| Description | A unique alphanumeric identifier for this specimen |
| Example | SPEC001 |
| Reference | # |
| Namespace | ei:specimen_id |
Sample ID Required
| Name | sample_id |
| Description | A unique alphanumeric identifier for this sample |
| Example | SAMP001 |
| Reference | # |
| Namespace | ei:sample_id |
Study Component ID Required
| Name | study_component_id |
| Description | A unique alphanumeric identifier for the study component |
| Example | STUDYCOMP001 |
| Reference | # |
| Namespace | ei:study_component_id |
Sample Preparation Required
| Name | sample_preparation |
| Description | How the sample was prepared for imaging. |
| Example | Cells were cultured on poly-L-lysine treated coverslips. Culture media was aspirated, and coverslips were washed once with PBS. Cells were fixed by incubating for 10 min with 4 % formaldehyde/PBS, washed twice with PBS, and permeabilized by incubating (>3 h, -20°C) in 70 % ethanol. Cells were rehydrated by incubating (5 min, RT) with FISH wash buffer (10 % formamide, 2x SSC). For hybridization, coverslips were placed cell-coated side down on a 48μl drop containing 100 nM Quasar570-labelled probes complementary to one of REV-ERBα, CRY2, or TP53 transcripts (Biosearch Technologies) (see Table S6 for probe sequences), 0.1 g/ml dextran sulfate, 1 mg/ml E. coli tRNA, 2 mM VRC, 20 μg/ml BSA, 2x SSC, 10 % formamide and incubated (37°C, 20 h) in a sealed parafilm chamber. Coverslips were twice incubated (37°C, 30 min) in pre-warmed FISH wash buffer, then in PBS containing 0.5 μg/ml 4’,6-diamidino-2-phenylindole (DAPI) (5 min, RT), washed twice with PBS, dipped in water, air-dried, placed cell-coated side down on a drop of ProLong Diamond Antifade Mountant (Life Technologies), allowed to polymerize for 24 h in the dark and then sealed with nail varnish. |
| Reference | # |
| Namespace | rembi:sample_preparation |
Growth Protocol
| Name | growth_protocol |
| Description | How the specimen was grown, e.g. cell line cultures, crosses or plant growth. |
| Example | Cells grown on coverslips were fixed in ice-cold methanol at -20 °C for 10 min. After blocking in 0.2% gelatine from cold-water fish (Sigma) in PBS (PBS/FSG) for 15 min, coverslips were incubated with primary antibodies in blocking solution for 1 h. Following washes with 0.2% PBS/FSG, the cells were incubated with a 1:500 dilution of secondary antibodies for 1 h (donkey anti-mouse/rabbit/goat/sheep conjugated to Alexa 488 or Alexa 594; Molecular Probes or donkey anti-mouse conjugated to DyLight 405, Jackson ImmunoResearch). The cells were counterstained with 1 μg/ml Hoechst 33342 (Sigma) to visualize chromatin. After washing with 0.2% PBS/FSG, the coverslips were mounted on glass slides by inverting them into mounting solution (ProLong Gold antifade, Molecular Probes). The samples were allowed to cure for 24-48 h. |
| Reference | # |
| Namespace | rembi:growth_protocol |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Image Acquisition ID Required
| Name | image_acquisition_id |
| Description | A unique alphanumeric identifier for the image acquisition |
| Example | IMGACQ001 |
| Reference | # |
| Namespace | ei:image_acquisition_id |
Specimen ID Required
| Name | specimen_id |
| Description | A unique alphanumeric identifier for this specimen |
| Example | SPEC001 |
| Reference | # |
| Namespace | ei:specimen_id |
Image Method Required
| Name | image_method |
| Description | What method was used to capture images. |
| Example | secondary_electron imaging |
| Reference | FBbi:00000222 |
| Namespace | ei:image_method |
Imaging Instrument Required
| Name | imaging_instrument |
| Description | Description of the instrument used to capture the images. |
| Example | DeltaVision OMX V3 Blaze system (GE Healthcare) equipped with a 60x/1.42 NA PlanApo oil immersion objective (Olympus), pco.edge 5.5 sCMOS cameras (PCO) and 405, 488, 593 and 640 nm lasers |
| Reference | # |
| Namespace | rembi:imaging_instrument |
Image Acquisition Parameters Required
| Name | image_acquisition_parameters |
| Description | How the images were acquired, including instrument settings/parameters. |
| Example | Embryos were imaged on a Luxendo MuVi SPIM light-sheet microscope, using 30x magnification setting on the Nikon 10x/0.3 water objective. The 488 nm laser was used to image nuclei (His-GFP), and the 561 nm laser was used to image transcriptional dots (MCP-mCherry), both at 5% laser power. Exposure time for the green channel was 55 ms and exposure for the red channel was 70 ms. The line illumination tool was used to improve background levels and was set to 40 pixels. |
| Reference | # |
| Namespace | rembi:image_acquisition_parameters |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Image Analysis ID Required
| Name | image_analysis_id |
| Description | A unique alphanumeric identifier for the image analysis |
| Example | IMGANAL001 |
| Reference | # |
| Namespace | ei:image_analysis_id |
Study Component ID Required
| Name | study_component_id |
| Description | A unique alphanumeric identifier for the study component |
| Example | STUDYCOMP001 |
| Reference | # |
| Namespace | ei:study_component_id |
Analysis Overview Required
| Name | analysis_overview |
| Description | How image analysis was carried out. |
| Example | Each 3D-SIM image contained one nucleus (in a small number of cases multiple nuclei were present, which did not affect the analysis). The image analysis pipeline contained six main steps: bivalent skeleton tracing, trace fluorescence intensity quantification, HEI10 peak detection, HEI10 foci identification, HEI10 foci intensity quantification, and total bivalent intensity quantification. Note that the normalization steps used for foci identification differ from those used for foci intensity quantification; the former was intended to robustly identify foci from noisy traces, whilst the latter was used to carefully quantify foci HEI10 levels. |
| Reference | # |
| Namespace | rembi:analysis_overview |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
Image Correlation ID Required
| Name | image_correlation_id |
| Description | A unique alphanumeric identifier for the image correlation |
| Example | IMGCORR001 |
| Reference | # |
| Namespace | ei:image_correlation_id |
Image Analysis ID Required
| Name | image_analysis_id |
| Description | A unique alphanumeric identifier for the image analysis |
| Example | IMGANAL001 |
| Reference | # |
| Namespace | ei:image_analysis_id |
Spatial and Temporal Alignment Required
| Name | spatial_and_temporal_alignment |
| Description | Method used to correlate images from different modalities (e.g. manual overlay or an alignment algorithm). |
| Example | Alignment algorithm |
| Reference | # |
| Namespace | rembi:spatial_and_temporal_alignment |
Fiducials Used Required
| Name | fiducials_used |
| Description | Features from correlated datasets used for colocalisation |
| Example | Fluorescent bead markers |
| Reference | # |
| Namespace | rembi:fiducials_used |
Transformation Matrix or Other Information Required
| Name | transformation_matrix |
| Description | The transformations used to correlate the images, e.g. a transformation matrix. |
| Example | Translation and rotation matrix applied using ImageJ plugin |
| Reference | # |
| Namespace | rembi:transformation_matrix |
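The example above describes a rigid alignment (a rotation plus a translation). Such an alignment can be recorded as a 3×3 homogeneous matrix. The following minimal sketch builds that matrix and applies it to a fiducial position. It assumes 2D image coordinates and plain Python, and the helper names are hypothetical and not tied to any ImageJ plugin.

```python
import math
from typing import List, Tuple


def rigid_matrix(theta_deg: float, tx: float, ty: float) -> List[List[float]]:
    """3x3 homogeneous matrix: rotate by theta_deg about the origin, then translate by (tx, ty)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [
        [c, -s, tx],
        [s, c, ty],
        [0.0, 0.0, 1.0],
    ]


def apply(matrix: List[List[float]], x: float, y: float) -> Tuple[float, float]:
    """Map a point (x, y) through the homogeneous matrix (the implicit third coordinate is 1)."""
    px = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    py = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return px, py


m = rigid_matrix(90, 2.0, 3.0)
print(apply(m, 1.0, 0.0))  # approximately (2.0, 4.0)
```

Storing the full matrix, rather than the angle and offsets separately, lets the same field also hold scaled or affine transforms if a later alignment needs them.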
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:file_id |
Study Component ID Required
| Name | study_component_id |
| Description | A unique alphanumeric identifier for the study component |
| Example | STUDYCOMP001 |
| Reference | # |
| Namespace | ei:study_component_id |
Annotation ID Required
| Name | annotation_id |
| Description | A unique alphanumeric identifier for the image annotation record. |
| Example | ANNOT001 |
| Reference | # |
| Namespace | rembi:annotation_id |
Image File name Required
| Name | source_image_id |
| Description | The file name of the image, including the extension. Common extensions include .tif/.tiff, .jpeg, .png, .gif, .bmp and .ome.tiff. |
| Example | file001.png |
| Reference | # |
| Namespace | rembi:source_image_id |
Transformations
| Name | transformations |
| Description | Any preprocessing or transformations applied to the image. |
| Example | z-stack flattening |
| Reference | # |
| Namespace | rembi:transformations |
Spatial Information
| Name | spatial_information |
| Description | Spatial resolution, scale, or coordinate info related to the image. |
| Example | pixel_size=0.5µm |
| Reference | # |
| Namespace | rembi:spatial_information |
Annotation Creation Time
| Name | annotation_creation_time |
| Description | Timestamp of when the annotation was created. |
| Example | 2025-05-15T14:32:00Z |
| Reference | # |
| Regex | ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$ |
| Namespace | rembi:annotation_creation_time |
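Python's standard library can produce a timestamp in exactly the annotation_creation_time form (UTC, whole seconds, trailing Z). The following is a minimal sketch; the function name `utc_timestamp` is illustrative.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Pattern copied from the annotation_creation_time definition above.
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def utc_timestamp(moment: Optional[datetime] = None) -> str:
    """Format a moment (default: now) as YYYY-MM-DDTHH:MM:SSZ in UTC.

    Naive datetimes are treated as local time by astimezone().
    Microseconds are dropped by the format string.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


stamp = utc_timestamp(datetime(2025, 5, 15, 14, 32, tzinfo=timezone.utc))
print(stamp)  # 2025-05-15T14:32:00Z
```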
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Title Required
| Name | title |
| Description | A name given to the study or project, such as the title of a grant proposal or a publication. The title should be fewer than 30 words. |
| Example | Spatial Transcriptomics FISH of Human Lung Tissue |
| Reference | http://purl.org/dc/terms/title |
| Namespace | dcterms:title |
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Spatial Transcriptomics |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics |
Licence
| Name | licence |
| Description | Specifies the terms under which the data associated with the study can be used, shared or reused. It tells users how they may legally reference, distribute or build upon the study. Common choices include Creative Commons licences such as CC BY 4.0, which requires attribution to the original authors when the data are reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
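ORCID iD
| Name | orcid_id |
| Description | A 16-character ORCID iD that uniquely identifies a researcher, written as four hyphen-separated groups of four. The last character is a check digit and may be X. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |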
| Namespace | ei:orcid_id |
First Name Required
| Name | givenName |
| Description | A first name (or given name) is the personal name given to an individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | The email address of the person. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMP001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier (usually from a recognized taxonomy database such as NCBI Taxonomy) that corresponds to the organism’s scientific name. It must match the provided scientific_name to maintain consistency and traceability in biological records. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
Platform Required
| Name | platform |
| Description | The platform used to isolate the cells. |
| Example | Illumina NovaSeq |
| Reference | # |
| Namespace | ei:platform |
Instrument Required
| Name | instrument |
| Description | The instrument used to isolate the cells. |
| Example | Illumina NovaSeq 6000 |
| Reference | # |
| Namespace | ei:instrument |
Target Probe Code Required
| Name | target_probe_code |
| Description | The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell. |
| Example | Oligo-dT |
| Reference | # |
| Namespace | ei:target_probe_code |
Section Thickness (µm)
| Name | section_thickness_µm |
| Description | The thickness of the tissue section in micrometres. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_µm |
Section Thickness Measurement Method
| Name | section_thickness_measurement_method |
| Description | The method used to measure tissue section thickness. |
| Example | Microtome |
| Reference | # |
| Namespace | ei:section_thickness_measurement_method |
Section Thickness Temperature
| Name | section_thickness_temperature |
| Description | The temperature, in degrees Celsius, at which the section was cut. |
| Example | 22 |
| Reference | # |
| Regex | ^-?\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_temperature |
Is Pathological
| Name | is_pathological |
| Description | Whether the sample is pathological, i.e. abnormal and having a destructive effect on living tissue. |
| Example | No |
| Reference | # |
| Namespace | ei:is_pathological |
| Allowed Values | No Yes |
Photobleaching Duration In Hours
| Name | photobleaching_duration_in_hours |
| Description | The duration of photobleaching in hours |
| Example | 2 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:photobleaching_duration_in_hours |
Clearing with ProteinaseK Required
| Name | clearing_with_proteinasek |
| Description | The duration of clearing at 47°C with Proteinase K. |
| Example | 24 hrs |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_with_proteinasek |
Clearing without Proteinase K Required
| Name | clearing_without_proteinasek |
| Description | The duration of tissue clearing at 37°C without Proteinase K. |
| Example | 4.5 days |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_without_proteinasek |
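Both clearing fields mix a number with a unit word, so downstream tools usually need to normalise them. A minimal sketch, assuming Python's `datetime.timedelta` as the target type; the unit mapping mirrors the alternatives in the regex, and `parse_clearing_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Same language as the clearing regex, with the number and unit captured as groups.
DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(hrs?|days?|mins?|seconds?)$")

# Map each unit spelling the regex accepts to a timedelta keyword argument.
UNIT_TO_KWARG = {
    "hr": "hours", "hrs": "hours",
    "day": "days", "days": "days",
    "min": "minutes", "mins": "minutes",
    "second": "seconds", "seconds": "seconds",
}


def parse_clearing_duration(value: str) -> timedelta:
    """Convert a value such as '24 hrs' or '4.5 days' into a timedelta."""
    match = DURATION_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{UNIT_TO_KWARG[unit]: float(amount)})
```

Spellings outside the regex, such as `24 hours` or `30 minutes`, are rejected, so values should use the `hrs`, `days`, `mins` or `seconds` forms.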
Instrument User Guide Required
| Name | instrument_user_guide |
| Description | The user guide for the instrument used. |
| Example | User Guide |
| Reference | # |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ei:instrument_user_guide |
Instrument User Guide Revision Required
| Name | instrument_user_guide_revision |
| Description | The revision of the instrument user guide. |
| Example | 1.2 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:instrument_user_guide_revision |
Sample Preparation Guide Required
| Name | sample_preparation_guide |
| Description | The guide used for sample preparation. |
| Example | example_guide_v1.0.pdf |
| Reference | # |
| Regex | ^[A-Za-z0-9._-]*[a-z]+$ |
| Namespace | ei:sample_preparation_guide |
Sample Preparation Guide Revision Required
| Name | sample_preparation_guide_revision |
| Description | The revision of the sample preparation guide. |
| Example | 1.0 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:sample_preparation_guide_revision |
Deviations From Official Protocol Required
| Name | deviations_from_official_protocol |
| Description | Any deviations from the official protocol. Separate individual deviations with '|'. |
| Example | Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours |
| Reference | # |
| Namespace | ei:deviations_from_official_protocol |
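Since individual deviations are separated with `|`, a consumer can recover the list by splitting on that character. A minimal sketch; `split_deviations` is a hypothetical helper, and it assumes no single deviation contains a `|` itself.

```python
from typing import List


def split_deviations(value: str) -> List[str]:
    """Split a '|'-separated deviations string into trimmed, non-empty entries."""
    return [part.strip() for part in value.split("|") if part.strip()]


example = "Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours"
print(split_deviations(example))
```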
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Namespace | ei:file_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
File Name Required
| Name | file_name |
| Description | A name that uniquely identifies a data file belonging to the study. File names typically end with an extension such as .tiff, .jpeg, .png, .gif, .bmp or .ome.tiff. |
| Example | file001.tiff |
| Reference | # |
| Namespace | ei:file_name |
File Type Required
| Name | file_type |
| Description | The format of the data file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff. |
| Example | tiff |
| Reference | # |
| Namespace | ei:file_type |
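A simple consistency check is to confirm that the declared `file_type` agrees with the extension of `file_name`. The sketch below assumes the file types listed in the descriptions above; the alias table (`tif`, `jpg`) and both function names are hypothetical and should be adapted to local conventions.

```python
from pathlib import PurePath
from typing import Optional

# File types listed in the file_name and file_type descriptions.
KNOWN_TYPES = {"tiff", "jpeg", "png", "gif", "bmp", "ome-tiff"}

# Hypothetical aliases for common alternative extensions.
ALIASES = {"tif": "tiff", "jpg": "jpeg"}


def file_type_from_name(file_name: str) -> Optional[str]:
    """Infer a file_type value from a file_name, or return None if unrecognised."""
    suffixes = [s.lstrip(".").lower() for s in PurePath(file_name).suffixes]
    if not suffixes:
        return None
    # OME-TIFF files are conventionally named *.ome.tif or *.ome.tiff.
    if len(suffixes) >= 2 and suffixes[-2] == "ome" and suffixes[-1] in {"tif", "tiff"}:
        return "ome-tiff"
    ext = ALIASES.get(suffixes[-1], suffixes[-1])
    return ext if ext in KNOWN_TYPES else None


def file_record_is_consistent(file_name: str, file_type: str) -> bool:
    """True when the declared file_type agrees with the file_name extension."""
    return file_type_from_name(file_name) == file_type.lower()
```

Checking the last two suffixes first matters because `scan.ome.tif` would otherwise be classified as plain `tiff`.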
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Project Name Required
| Name | project_name |
| Description | The official name of the study or project, such as the title of a grant proposal or publication. It should be fewer than 30 words. |
| Example | Spatial Transcriptomics FISH of Human Lung Tissue |
| Reference | https://w3id.org/mixs/0000092 |
| Namespace | mixs:project_name |
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Spatial Transcriptomics |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics |
Licence
| Name | licence |
| Description | The terms under which the data associated with the study can be used, shared or reused. It tells users how they may legally cite, redistribute or build upon the data. For example, the Creative Commons Attribution licence (CC-BY-4.0) requires attribution to the original authors whenever the data are cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
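Several workflow values contain commas, so a workflow cell must be compared as a whole string rather than split on commas. A minimal sketch of a study-sheet check, assuming records are dicts keyed by the field names above; `validate_study` is a hypothetical helper. The licence values appear to be SPDX licence identifiers, but that is an observation, not something this schema states.

```python
import re
from typing import Dict, List

STUDY_ID_RE = re.compile(r"^[a-zA-Z0-9]+$")

# Allowed values copied from the Workflow table; each entry is one complete value.
WORKFLOWS = {
    "Laser microdissection",
    "Laser microdissection, Culturing",
    "Laser microdissection, Culturing, Sequencing",
    "Laser microdissection, Sequencing",
    "Microfluidics, Facs, Culturing",
    "Microfluidics, Facs, Culturing, Sequencing",
    "Microfluidics, Facs, Sequencing",
    "Spatial Transcriptomics",
}
LICENCES = {"Apache-2.0", "CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0", "GPL-3.0-or-later", "MIT"}


def validate_study(record: Dict[str, str]) -> List[str]:
    """Check required fields and controlled vocabularies for one study row."""
    errors = []
    if not STUDY_ID_RE.fullmatch(record.get("study_id", "")):
        errors.append("study_id must be non-empty and alphanumeric")
    if not record.get("project_name"):
        errors.append("project_name is required")
    workflow = record.get("workflow")
    if workflow is not None and workflow not in WORKFLOWS:
        errors.append(f"workflow {workflow!r} is not an allowed value")
    licence = record.get("licence")
    if licence is not None and licence not in LICENCES:
        errors.append(f"licence {licence!r} is not an allowed value")
    return errors
```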
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Orcid ID
| Name | orcid_id |
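| Description | A 16-digit ORCID iD that uniquely identifies a researcher, written as four hyphen-separated blocks of four digits; the final character is a check digit and may be 'X'. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |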
| Namespace | ei:orcid_id |
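A format regex alone cannot catch mistyped ORCID iDs, because the final character is an ISO 7064 MOD 11-2 checksum over the first 15 digits. The sketch below uses the pattern `^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$` (the form used in another ORCID row of this document) and then verifies the checksum; the function names are hypothetical.

```python
import re

ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the value has ORCID format and a correct check character."""
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-1234-5678-9012"))  # True: the example above has a valid checksum
print(is_valid_orcid("0000-1234-5678-9013"))  # False: wrong check digit
```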
First Name Required
| Name | givenName |
| Description | The first (given) name of the individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | The last name (surname or family name) of the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | The email address of the individual conducting the study. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
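The email regex is stricter than the full email address grammar, which can surprise submitters. The sketch below applies the pattern verbatim to a few sample addresses; `is_valid_email` is a hypothetical wrapper.

```python
import re

# Pattern copied verbatim from the Email Address row.
EMAIL_RE = re.compile(r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$")


def is_valid_email(address: str) -> bool:
    return EMAIL_RE.fullmatch(address) is not None


print(is_valid_email("jane.doe@example.com"))   # True
print(is_valid_email("jane..doe@example.com"))  # False: consecutive dots
print(is_valid_email("jane+lab@example.com"))   # False: '+' is not in [\w.-]
print(is_valid_email("jane.doe@example"))       # False: no top-level domain
```

In particular, plus-addressed mailboxes such as `jane+lab@example.com` are rejected even though mail servers commonly accept them.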
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study. |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMP001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier, usually from a recognised taxonomy database such as NCBI Taxonomy, for the organism named in the scientific_name field. It must match the provided scientific name to keep biological records consistent and traceable. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
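Because the taxon ID must correspond to the scientific name, the two fields can be cross-checked. The sketch below uses a small hypothetical in-memory lookup table in place of a real NCBI Taxonomy query; the three pairs shown are the organisms named in the descriptions above.

```python
import re

TAXON_ID_RE = re.compile(r"^[0-9]+$")

# Hypothetical local lookup table; in practice this would come from NCBI Taxonomy.
TAXON_NAMES = {
    "9606": "Homo sapiens",
    "3702": "Arabidopsis thaliana",
    "8036": "Salvelinus alpinus",
}


def taxon_matches(taxon_id: str, scientific_name: str) -> bool:
    """True if taxon_id is well formed, known, and maps to scientific_name."""
    if not TAXON_ID_RE.fullmatch(taxon_id):
        return False
    expected = TAXON_NAMES.get(taxon_id)
    return expected is not None and expected == scientific_name.strip()
```

Unknown IDs return `False` here; a real pipeline would need to distinguish "not in the local table" from "mismatch".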
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample when it is submitted to a public database such as NCBI BioSample or the ENA. It serves as a permanent reference to that sample, allowing researchers to retrieve its metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
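The sample ID must be unique within a dataset, and the BioSample accession should look like an archive accession. A minimal sketch: the accession pattern `^SAM[NED][A-Z]?\d+$` (covering prefixes such as SAMEA, SAMN and SAMD) is an assumption about archive formats, not part of this schema, and `check_samples` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed accession shape; confirm against the archive you submit to.
BIOSAMPLE_RE = re.compile(r"^SAM[NED][A-Z]?\d+$")


def check_samples(records: List[Dict[str, str]]) -> List[str]:
    """Report missing or duplicate sample_id values and unexpected accessions."""
    errors = []
    counts = Counter(r.get("sample_id") for r in records)
    for sample_id, n in counts.items():
        if not sample_id:
            errors.append("missing sample_id")
        elif n > 1:
            errors.append(f"duplicate sample_id {sample_id!r} ({n} rows)")
    for r in records:
        accession = r.get("biosampleAccession", "")
        if not BIOSAMPLE_RE.fullmatch(accession):
            errors.append(f"{r.get('sample_id')!r}: unexpected biosampleAccession {accession!r}")
    return errors
```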
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
Platform Required
| Name | platform |
| Description | The platform used to isolate the cells. |
| Example | Illumina NovaSeq |
| Reference | # |
| Namespace | ei:platform |
Instrument Required
| Name | instrument |
| Description | The instrument used to isolate the cells. |
| Example | Illumina NovaSeq 6000 |
| Reference | # |
| Namespace | ei:instrument |
Target Probe Code Required
| Name | target_probe_code |
| Description | The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell. |
| Example | Oligo-dT |
| Reference | # |
| Namespace | ei:target_probe_code |
Section Thickness (µm)
| Name | section_thickness_µm |
| Description | The thickness of the tissue section in micrometres. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_µm |
Section Thickness Measurement Method
| Name | section_thickness_measurement_method |
| Description | The method used to measure tissue section thickness. |
| Example | Microtome |
| Reference | # |
| Namespace | ei:section_thickness_measurement_method |
Section Thickness Temperature
| Name | section_thickness_temperature |
| Description | The temperature, in degrees Celsius, at which the section was cut. |
| Example | 22 |
| Reference | # |
| Regex | ^-?\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_temperature |
Is Pathological
| Name | is_pathological |
| Description | Whether the tissue is pathological, i.e. abnormal and having a destructive effect on living tissue. |
| Example | No |
| Reference | # |
| Namespace | ei:is_pathological |
| Allowed Values | No Yes |
Photobleaching Duration In Hours
| Name | photobleaching_duration_in_hours |
| Description | The duration of photobleaching, in whole hours. |
| Example | 2 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:photobleaching_duration_in_hours |
Clearing with Proteinase K Required
| Name | clearing_with_proteinasek |
| Description | The duration of clearing at 47°C with Proteinase K. |
| Example | 24 hrs |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_with_proteinasek |
Clearing without Proteinase K Required
| Name | clearing_without_proteinasek |
| Description | The duration of tissue clearing at 37°C without Proteinase K. |
| Example | 4.5 days |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_without_proteinasek |
Instrument User Guide Required
| Name | instrument_user_guide |
| Description | The user guide for the instrument used. |
| Example | User Guide |
| Reference | # |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ei:instrument_user_guide |
Instrument User Guide Revision Required
| Name | instrument_user_guide_revision |
| Description | The revision of the instrument user guide. |
| Example | 1.2 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:instrument_user_guide_revision |
Sample Preparation Guide Required
| Name | sample_preparation_guide |
| Description | The guide used for sample preparation. |
| Example | example_guide_v1.0.pdf |
| Reference | # |
| Regex | ^[A-Za-z0-9._-]*[a-z]+$ |
| Namespace | ei:sample_preparation_guide |
Sample Preparation Guide Revision Required
| Name | sample_preparation_guide_revision |
| Description | The revision of the sample preparation guide. |
| Example | 1.0 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:sample_preparation_guide_revision |
Deviations From Official Protocol Required
| Name | deviations_from_official_protocol |
| Description | Any deviations from the official protocol. Separate individual deviations with '|'. |
| Example | Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours |
| Reference | # |
| Namespace | ei:deviations_from_official_protocol |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Namespace | ei:file_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
File Name Required
| Name | file_name |
| Description | A name that uniquely identifies a data file belonging to the study. File names typically end with an extension such as .tiff, .jpeg, .png, .gif, .bmp or .ome.tiff. |
| Example | file001.tiff |
| Reference | # |
| Namespace | ei:file_name |
File Type Required
| Name | file_type |
| Description | The format of the data file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff. |
| Example | tiff |
| Reference | # |
| Namespace | ei:file_type |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Title
| Name | title |
| Description | A name given to the study or project, such as the title of a grant proposal or publication. It should be fewer than 30 words. |
| Example | Spatial Transcriptomics FISH of Human Lung Tissue |
| Reference | http://purl.org/dc/terms/title |
| Namespace | dcterms:title |
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Spatial Transcriptomics |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics |
Licence
| Name | licence |
| Description | The terms under which the data associated with the study can be used, shared or reused. It tells users how they may legally cite, redistribute or build upon the data. For example, the Creative Commons Attribution licence (CC-BY-4.0) requires attribution to the original authors whenever the data are cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Orcid ID
| Name | orcid_id |
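| Description | A 16-digit ORCID iD that uniquely identifies a researcher, written as four hyphen-separated blocks of four digits; the final character is a check digit and may be 'X'. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |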
| Namespace | ei:orcid_id |
First Name Required
| Name | givenName |
| Description | The first (given) name of the individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | The last name (surname or family name) of the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | The email address of the individual conducting the study. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study. |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMP001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier, usually from a recognised taxonomy database such as NCBI Taxonomy, for the organism named in the scientific_name field. It must match the provided scientific name to keep biological records consistent and traceable. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample when it is submitted to a public database such as NCBI BioSample or the ENA. It serves as a permanent reference to that sample, allowing researchers to retrieve its metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
Platform Required
| Name | platform |
| Description | The platform used to isolate the cells. |
| Example | Illumina NovaSeq |
| Reference | # |
| Namespace | ei:platform |
Instrument Required
| Name | instrument |
| Description | The instrument used to isolate the cells. |
| Example | Illumina NovaSeq 6000 |
| Reference | # |
| Namespace | ei:instrument |
Target Probe Code Required
| Name | target_probe_code |
| Description | The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell. |
| Example | Oligo-dT |
| Reference | # |
| Namespace | ei:target_probe_code |
Section Thickness (µm)
| Name | section_thickness_µm |
| Description | The thickness of the tissue section in micrometres. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_µm |
Section Thickness Measurement Method
| Name | section_thickness_measurement_method |
| Description | The method used to measure tissue section thickness. |
| Example | Microtome |
| Reference | # |
| Namespace | ei:section_thickness_measurement_method |
Section Thickness Temperature
| Name | section_thickness_temperature |
| Description | The temperature, in degrees Celsius, at which the section was cut. |
| Example | 22 |
| Reference | # |
| Regex | ^-?\d+(\.\d+)?$ |
| Namespace | ei:section_thickness_temperature |
Is Pathological
| Name | is_pathological |
| Description | Whether the tissue is pathological, i.e. abnormal and having a destructive effect on living tissue. |
| Example | No |
| Reference | # |
| Namespace | ei:is_pathological |
| Allowed Values | No Yes |
Photobleaching Duration In Hours
| Name | photobleaching_duration_in_hours |
| Description | The duration of photobleaching, in whole hours. |
| Example | 2 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:photobleaching_duration_in_hours |
Clearing with Proteinase K Required
| Name | clearing_with_proteinasek |
| Description | The duration of clearing at 47°C with Proteinase K. |
| Example | 24 hrs |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_with_proteinasek |
Clearing without Proteinase K Required
| Name | clearing_without_proteinasek |
| Description | The duration of tissue clearing at 37°C without Proteinase K. |
| Example | 4.5 days |
| Reference | # |
| Regex | ^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$ |
| Namespace | ei:clearing_without_proteinasek |
Instrument User Guide Required
| Name | instrument_user_guide |
| Description | The user guide for the instrument used. |
| Example | User Guide |
| Reference | # |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ei:instrument_user_guide |
Instrument User Guide Revision Required
| Name | instrument_user_guide_revision |
| Description | The revision of the instrument user guide. |
| Example | 1.2 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:instrument_user_guide_revision |
Sample Preparation Guide Required
| Name | sample_preparation_guide |
| Description | The guide used for sample preparation. |
| Example | example_guide_v1.0.pdf |
| Reference | # |
| Regex | ^[A-Za-z0-9._-]*[a-z]+$ |
| Namespace | ei:sample_preparation_guide |
Sample Preparation Guide Revision Required
| Name | sample_preparation_guide_revision |
| Description | The revision of the sample preparation guide. |
| Example | 1.0 |
| Reference | # |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ei:sample_preparation_guide_revision |
Deviations From Official Protocol Required
| Name | deviations_from_official_protocol |
| Description | Any deviations from the official protocol. Separate individual deviations with '|'. |
| Example | Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours |
| Reference | # |
| Namespace | ei:deviations_from_official_protocol |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Namespace | ei:file_id |
Imaging Protocol ID Required
| Name | imaging_protocol_id |
| Description | A unique alphanumeric identifier for the imaging protocol. |
| Example | IMGPRO001 |
| Reference | # |
| Namespace | ei:imaging_protocol_id |
File Name Required
| Name | file_name |
| Description | A name that uniquely identifies a data file belonging to the study. File names typically end with an extension such as .tiff, .jpeg, .png, .gif, .bmp or .ome.tiff. |
| Example | file001.tiff |
| Reference | # |
| Namespace | ei:file_name |
File Type Required
| Name | file_type |
| Description | The format of the data file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff. |
| Example | tiff |
| Reference | # |
| Namespace | ei:file_type |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Title Required
| Name | title |
| Description | A name given to the study or project, such as the title of a grant proposal or publication. It should be fewer than 30 words. |
| Example | Study of single cells in the human body |
| Reference | http://purl.org/dc/terms/title |
| Namespace | dcterms:title |
Description Required
| Name | description |
| Description | A detailed description of the project, including its research goals and experimental approach, such as an abstract from a grant application or publication. It should be fewer than 300 words. |
| Example | This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine. |
| Reference | http://purl.org/dc/terms/description |
| Namespace | dcterms:description |
Bibliographic Citation Required
| Name | bibliographicCitation |
| Description | A citation for the study resource, following a standard format. |
| Example | Doe J., et al. (2024). Single Cell Transcriptomic Analysis of Human Liver Cells. Journal of Cellular Biology. |
| Reference | http://purl.org/dc/terms/bibliographicCitation |
| Namespace | dcterms:bibliographicCitation |
Created Required
| Name | created |
| Description | The date when the study was created or registered. |
| Example | 2024-10-14 |
| Reference | http://purl.org/dc/terms/created |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | dcterms:created |
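The `created` regex checks the shape of an ISO 8601 date but still accepts impossible dates such as `2024-04-31`. A minimal sketch that pairs the regex with a calendar check via Python's `datetime.date.fromisoformat`; `is_valid_created` is a hypothetical helper.

```python
import re
from datetime import date

# Regex copied from the Created table.
CREATED_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_created(value: str) -> bool:
    """True only for YYYY-MM-DD strings that are also real calendar dates."""
    if not CREATED_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # e.g. 2023-02-29 or 2024-04-31 pass the regex but do not exist
    return True
```

Running the regex first keeps the accepted format strict, since newer Python versions let `fromisoformat` parse additional ISO 8601 variants.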
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Laser microdissection |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics |
Technology Required
| Name | technology |
| Description | The sorting or visualisation technology used. |
| Example | Vizgen |
| Reference | # |
| Namespace | ei:technology |
Licence
| Name | licence |
| Description | The terms under which the data associated with the study can be used, shared or reused. It tells users how they may legally cite, redistribute or build upon the data. For example, the Creative Commons Attribution licence (CC-BY-4.0) requires attribution to the original authors whenever the data are cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Orcid ID
| Name | orcid_id |
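| Description | A 16-digit ORCID iD that uniquely identifies a researcher, written as four hyphen-separated blocks of four digits; the final character is a check digit and may be 'X'. |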
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |
| Namespace | ei:orcid_id |
First Name Required
| Name | givenName |
| Description | The first (given) name of the individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | The last name (surname or family name) of the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | The email address of the individual conducting the study. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Regex | ^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$ |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study. |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier, usually from a recognised taxonomy database such as NCBI Taxonomy, for the organism named in the scientific_name field. It must match the provided scientific name to keep biological records consistent and traceable. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample when it is submitted to a public database such as NCBI BioSample or the ENA. It serves as a permanent reference to that sample, allowing researchers to retrieve its metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
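The Regex rows above can be applied mechanically before submission. The sketch below is a minimal illustration. It assumes a record is a plain Python dict keyed by the Name values. `FIELD_PATTERNS` and `invalid_fields` are hypothetical helpers, not part of any published validator. The sketch does not check that the taxon ID actually corresponds to the scientific name, because that needs a lookup against a taxonomy such as NCBI Taxonomy.

```python
import re

# Patterns copied from the Regex rows of the study and sample fields above.
FIELD_PATTERNS = {
    "study_id": r"^[a-zA-Z0-9]+$",
    "scientific_name": r"^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$",
    "taxon_id": r"^[0-9]+$",
}


def invalid_fields(record: dict) -> list:
    """Return the names of fields that are missing or do not match their pattern."""
    bad = []
    for field, pattern in FIELD_PATTERNS.items():
        value = record.get(field)
        # re.fullmatch must consume the whole value, so leading/trailing whitespace fails.
        if value is None or re.fullmatch(pattern, value) is None:
            bad.append(field)
    return bad


sample = {"study_id": "STUDY001", "scientific_name": "Salvelinus alpinus", "taxon_id": "8036"}
print(invalid_fields(sample))                               # []
print(invalid_fields({**sample, "study_id": "STUDY-001"}))  # ['study_id']
```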
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Protocol Name Required
| Name | protocol_name |
| Description | A descriptive name of the protocol used for single-cell sequencing. |
| Example | 10X Genomics Single Cell 3' Library Prep |
| Reference | # |
| Namespace | ei:protocol_name |
Dissociation Description Required
| Name | dissociation_description |
| Description | A free-text description of the process used to separate cells from tissues or cell aggregates. |
| Example | Tissue was enzymatically dissociated using collagenase for 30 minutes. |
| Reference | # |
| Namespace | ei:dissociation_description |
Enrichment Markers
| Name | enrichment_markers |
| Description | Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms. |
| Example | CD45 |
| Reference | # |
| Namespace | faang:enrichment_markers |
Isolation Kit
| Name | isolation_kit |
| Description | The kit used to isolate the cells. |
| Example | 10x Nuclei Isolation Kit |
| Reference | # |
| Namespace | ei:isolation_kit |
| Allowed Values | 10x Nuclei Isolation Kit 3' standard throughput kit Custom |
Literature Source Reference
| Name | literature_source_reference |
| Description | Reference to literature sources that describe the protocol or methods used. |
| Example | Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview' |
| Reference | # |
| Namespace | ei:literature_source_reference |
Protocols IO Reference
| Name | protocols_io_reference |
| Description | Reference link to protocols.io for additional details on the protocol. |
| Example | https://www.protocols.io/view/sample-protocol-b2ubqesn |
| Reference | # |
| Regex | ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)*)*$ |
| Namespace | ei:protocols_io_reference |
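The Regex row accepts several URLs separated by ` | ` (space, pipe, space). The minimal sketch below splits such a value, assuming that separator. It then applies a looser per-URL check with `urllib.parse` rather than re-running the full pattern. That pattern contains a nested quantifier, `(?:[...]*)+`, which can make backtracking regex engines slow to reject malformed input. `split_url_list` is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_url_list(value: str, sep: str = " | ") -> list:
    """Split a multi-URL field on `sep` and check each part looks like an http(s) URL."""
    urls = value.split(sep)
    for url in urls:
        parsed = urlparse(url)
        # Require an http/https scheme and a dotted host name, loosely mirroring the Regex row.
        if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
    return urls


print(split_url_list("https://www.protocols.io/view/sample-protocol-b2ubqesn"))
```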
WorkflowHub SOP Reference
| Name | workflow_hub_sop_reference |
| Description | Reference to the Standard Operating Procedure (SOP) in WorkflowHub. |
| Example | https://workflowhub.eu/works/12345 |
| Reference | # |
| Namespace | ei:workflow_hub_sop_reference |
Dissociation Protocol Method
| Name | dissociation_protocol_method |
| Description | The method used to dissociate tissues into single cells. |
| Example | Mechanical and enzymatic dissociation |
| Reference | # |
| Namespace | ei:dissociation_protocol_method |
Single Cell Quality Metric
| Name | single_cell_quality_metric |
| Description | Metrics used to assess the quality of single cells before sequencing. |
| Example | Cell viability percentage |
| Reference | # |
| Namespace | ei:single_cell_quality_metric |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension derived from the sample |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Suspension Type Required
| Name | suspension_type |
| Description | The type of material in the suspension during processing: whole cells, nuclei or protoplasts. |
| Example | Cell |
| Reference | # |
| Namespace | ei:suspension_type |
| Allowed Values | Cell Nuclei Protoplast |
Cell Count
| Name | cell_count |
| Description | A number representing the number of cells in the sequencing library. |
| Example | 10000 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cell_count |
Cell Viability
| Name | cell_viability |
| Description | The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis. |
| Example | 95 |
| Reference | # |
| Namespace | ei:cell_viability |
Cell Viability Assessment Method
| Name | cell_viability_assessment_method |
| Description | The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques. |
| Example | Trypan Blue Exclusion |
| Reference | # |
| Namespace | ei:cell_viability_assessment_method |
Cell Size
| Name | cell_size |
| Description | The size of the cell, typically measured in micrometres. |
| Example | 10 |
| Reference | # |
| Namespace | ei:cell_size |
Suspension Volume (µL)
| Name | suspension_volume_µl |
| Description | The volume of the cell suspension in microlitres (µL). |
| Example | 100 |
| Reference | # |
| Namespace | ei:suspension_volume_µl |
Suspension Concentration (Cells per µL)
| Name | suspension_concentration_cells_per_µl |
| Description | The concentration of cells in the suspension, in cells per microlitre (cells/µL). |
| Example | 1000 |
| Reference | # |
| Namespace | ei:suspension_concentration_cells_per_µl |
Suspension Dilution
| Name | suspension_dilution |
| Description | The dilution factor of the cell suspension. |
| Example | 1:10 |
| Reference | # |
| Namespace | ei:suspension_dilution |
Loading Volume (µL)
| Name | loading_volume_µl |
| Description | The volume of the cell suspension, in microlitres (µL), loaded into the single-cell RNA-sequencing system for analysis. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:loading_volume_µl |
Suspension Dilution Buffer
| Name | suspension_dilution_buffer |
| Description | A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing. |
| Example | PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin) |
| Reference | # |
| Namespace | ei:suspension_dilution_buffer |
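The concentration is given in cells per µL and the loading volume in µL, so their product estimates how many cells were loaded. The sketch below assumes the concentration was measured in the suspension actually loaded, that is, after any dilution. `cells_loaded` is a hypothetical helper. The number of cells captured in the library is typically lower than the number loaded.

```python
def cells_loaded(concentration_cells_per_ul: float, loading_volume_ul: float) -> float:
    """Estimate cells loaded: concentration (cells/µL) multiplied by volume (µL)."""
    if concentration_cells_per_ul < 0 or loading_volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_cells_per_ul * loading_volume_ul


# Example values from the fields above: 1000 cells/µL and a 10 µL loading volume.
print(cells_loaded(1000, 10))  # 10000
```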
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension used in the library preparation. |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Library Preparation Kit Required
| Name | library_prep_kit |
| Description | Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids. |
| Example | 10X Genomics Single Cell 3' v3 |
| Reference | https://w3id.org/mixs/0001145 |
| Namespace | mixs:library_prep_kit |
Library Preparation Kit Version Required
| Name | library_prep_kit_version |
| Description | The version number of the library preparation kit used for sequencing. |
| Example | 2 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ontology:library_prep_kit_version |
Amplification Method
| Name | amplification_method |
| Description | The method used to amplify the Complementary DNA (cDNA). |
| Example | PCR |
| Reference | # |
| Namespace | ei:amplification_method |
cDNA Amplification Cycles
| Name | cdna_amplification_cycles |
| Description | The number of cycles used during the Complementary DNA (cDNA) amplification process. |
| Example | 12 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cdna_amplification_cycles |
Average Size Distribution
| Name | average_size_distribution |
| Description | The average length of RNA fragments in base pairs (bp) after library preparation, indicating the quality and suitability of the RNA for sequencing. |
| Example | 350 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:average_size_distribution |
Library Construction Method
| Name | lib_construction_method |
| Description | The library construction method (including version) that was used. |
| Example | Smart-Seq2 |
| Reference | # |
| Namespace | ei:lib_construction_method |
Input Molecule
| Name | input_molecule |
| Description | The specific fraction of biological macromolecule from which the sequencing library is derived. |
| Example | RNA |
| Reference | # |
| Namespace | ei:input_molecule |
Primer
Primeness Required
| Name | primeness |
| Description | The end from which the molecule was sequenced. |
| Example | 5' |
| Reference | # |
| Namespace | ei:primeness |
| Allowed Values | 3' 5' Both |
End Bias
| Name | end_bias |
| Description | The end bias of the library, i.e. whether read coverage is concentrated towards the 3' or 5' end of transcripts. |
| Example | 3 |
| Reference | # |
| Namespace | ei:end_bias |
| Allowed Values | 3 5 |
Library Strand
| Name | library_strand |
| Description | The Complementary DNA (cDNA) strand of the library from which the reads are derived: sense (first), antisense (second), both, or none (unstranded). |
| Example | Antisense |
| Reference | # |
| Namespace | ei:library_strand |
| Allowed Values | Antisense Both Sense Unstranded |
Spike In Required
| Name | spike_in |
| Description | External RNA added to the sample as a control to assess technical variability and normalization in RNA-sequencing. State whether spike-in was used. |
| Example | Yes |
| Reference | # |
| Namespace | ei:spike_in |
| Allowed Values | No Yes |
Spike Type
| Name | spike_type |
| Description | The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA. |
| Example | Synthetic RNA |
| Reference | # |
| Namespace | ei:spike_type |
Spike In Dilution Or Concentration
| Name | spike_in_dilution_or_concentration |
| Description | The final concentration or dilution (for commercial sets) of the spike-in mix. |
| Example | 1:1000 |
| Reference | # |
| Namespace | ei:spike_in_dilution_or_concentration |
i5 Index Required
| Name | i5_index |
| Description | Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing. |
| Example | ATCACG |
| Reference | # |
| Namespace | ei:i5_index |
i7 Index Required
| Name | i7_index |
| Description | Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs. |
| Example | CGATGT |
| Reference | # |
| Namespace | ei:i7_index |
Dual or Single Index Required
| Name | dual_single_index |
| Description | Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing. |
| Example | Dual |
| Reference | # |
| Namespace | ei:dual_single_index |
| Allowed Values | Dual Single |
i5 Sequence Required
| Name | i5_sequence |
| Description | The nucleotide sequence of the i5 index used in multiplexing during sequencing. |
| Example | ATCGTAGC |
| Reference | # |
| Namespace | ei:i5_sequence |
i7 Sequence Required
| Name | i7_sequence |
| Description | The specific nucleotide sequence of the i7 index used for a sample. |
| Example | TGCATGCA |
| Reference | # |
| Namespace | ei:i7_sequence |
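The dual/single flag can be cross-checked against the i5 and i7 sequences. The sketch below assumes two rules that are not stated in this schema. First, index sequences use only upper-case A/C/G/T/N. Second, "Single" means exactly one of i5 or i7 is given. The sketch also includes a reverse-complement helper. Instruments differ in whether they read the i5 index in forward or reverse-complement orientation, so a recorded i5 sequence may need converting before demultiplexing. `check_indices` and `reverse_complement` are hypothetical names.

```python
import re
from typing import Optional

INDEX_PATTERN = re.compile(r"[ACGTN]+")  # assumed alphabet: upper-case bases plus N
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def check_indices(mode: str, i5: Optional[str], i7: Optional[str]) -> None:
    """Raise ValueError if the index sequences are inconsistent with dual_single_index."""
    if mode not in ("Dual", "Single"):
        raise ValueError(f"dual_single_index must be 'Dual' or 'Single', got {mode!r}")
    given = [s for s in (i5, i7) if s]
    for seq in given:
        if INDEX_PATTERN.fullmatch(seq) is None:
            raise ValueError(f"not a DNA index sequence: {seq!r}")
    if mode == "Dual" and len(given) != 2:
        raise ValueError("Dual indexing requires both i5 and i7 sequences")
    if mode == "Single" and len(given) != 1:
        raise ValueError("Single indexing expects exactly one index sequence")


check_indices("Dual", "ATCGTAGC", "TGCATGCA")  # Example values above: passes silently
print(reverse_complement("ATCGTAGC"))
```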
Plate ID
| Name | plate_id |
| Description | Identifier for the 96-well plate used in sample preparation. |
| Example | PLT001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:plate_id |
Well Row
| Name | well_row |
| Description | The row identifier in a 96-well plate indicating the sample's position. |
| Example | A |
| Reference | # |
| Namespace | ei:well_row |
Well Column
| Name | well_col |
| Description | The column identifier in a 96-well plate indicating the sample's position. |
| Example | 5 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:well_col |
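The plate is described as a 96-well plate, so well_row and well_col can be range-checked. The sketch below assumes the standard 8 × 12 layout (rows A-H, columns 1-12) and zero-padded well names such as A05. `well_position` is a hypothetical helper. The padding and the row-major index are illustrative conventions, not requirements of this schema.

```python
import re

ROWS = "ABCDEFGH"  # a standard 96-well plate has 8 rows (A-H)
N_COLUMNS = 12     # and 12 columns (1-12)


def well_position(row: str, col: str) -> tuple:
    """Validate a (well_row, well_col) pair; return (well name, 0-based row-major index)."""
    r = row.strip().upper()
    if len(r) != 1 or r not in ROWS:
        raise ValueError(f"row must be one of A-H: {row!r}")
    if re.fullmatch(r"\d+", col) is None:  # same pattern as the well_col Regex row
        raise ValueError(f"column must be digits: {col!r}")
    c = int(col)
    if not 1 <= c <= N_COLUMNS:
        raise ValueError(f"column must be 1-12: {col!r}")
    return f"{r}{c:02d}", ROWS.index(r) * N_COLUMNS + (c - 1)


print(well_position("A", "5"))  # ('A05', 4)
```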
Cell Phenotype
| Name | cell_phenotype |
| Description | The cell marker used for Fluorescence-Activated Cell Sorting (FACS). |
| Example | CD41- |
| Reference | # |
| Namespace | ei:cell_phenotype |
| Allowed Values | CD41+ CD41- |
Design Description
| Name | design_description |
| Description | The design of the library including details of how it was constructed. |
| Reference | # |
| Namespace | ei:design_description |
Library Selection Required
| Name | library_selection |
| Description | The method used to select for or against, enrich, or screen the material being sequenced. |
| Example | RANDOM PCR |
| Reference | # |
| Namespace | ei:library_selection |
| Allowed Values | 5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified |
Library Source Required
| Name | library_source |
| Description | The type of source material that is being sequenced. |
| Example | GENOMIC |
| Reference | # |
| Namespace | ei:library_source |
| Allowed Values | GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA |
Library Strategy Required
| Name | library_strategy |
| Description | The sequencing technique intended for this library. |
| Example | RNA-Seq |
| Reference | # |
| Namespace | ei:library_strategy |
| Allowed Values | AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | https://w3id.org/mixs/0000016 |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ontology:sequencing_id |
Sequencing Platform Name Required
| Name | sequencing_platform_name |
| Description | The name of the sequencing platform used for the experiment. |
| Example | PacBio |
| Reference | http://purl.obolibrary.org/obo/NCIT_C172274 |
| Namespace | ontology:sequencing_platform_name |
Sequencing Instrument Model Required
| Name | sequencing_instrument_model |
| Description | The model of the instrument used for sequencing. Models vary in throughput, read length, error rate and application suitability. |
| Example | Illumina NovaSeq 6000 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Namespace | ontology:sequencing_instrument_model |
| Allowed Values | 454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100 |
Library Layout Required
| Name | lib_layout |
| Description | Whether reads are expected to be single-end, paired-end or in another configuration. |
| Example | Paired |
| Reference | https://w3id.org/mixs/0000111 |
| Namespace | mixs:lib_layout |
| Allowed Values | Other Paired Single Vector |
UMI Barcode Read
| Name | umi_barcode_read |
| Description | The type of read that contains the Unique Molecular Identifier (UMI) barcode. |
| Example | index2 |
| Reference | # |
| Namespace | ei:umi_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
UMI Barcode Offset
| Name | umi_barcode_offset |
| Description | The offset in sequence of the Unique Molecular Identifier (UMI) identifying barcode. |
| Example | 0 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_offset |
UMI Barcode Size
| Name | umi_barcode_size |
| Description | The size of the Unique Molecular Identifier (UMI) identifying barcode. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_size |
Cell Barcode Read
| Name | cell_barcode_read |
| Description | The type of read that contains the cell barcode. |
| Example | index1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010203 |
| Namespace | ontology:cell_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
Cell Barcode Offset
| Name | cell_barcode_offset |
| Description | The offset in sequence of the cell identifying barcode. |
| Example | 10 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010204 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_offset |
Cell Barcode Size
| Name | cell_barcode_size |
| Description | The size (length in bases) of the cell identifying barcode. |
| Example | 0 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010205 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_size |
cDNA Read Required
| Name | cdna_read |
| Description | The type of read that contains the Complementary DNA (cDNA) sequence. |
| Example | read1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010195 |
| Namespace | ontology:cdna_read |
| Allowed Values | index1 index2 read1 read2 |
cDNA Read Offset
| Name | cdna_read_offset |
| Description | The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences. |
| Example | 6 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010201 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_offset |
cDNA Read Size
| Name | cdna_read_size |
| Description | The size of the Complementary DNA (cDNA) read. |
| Example | 75 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010202 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_size |
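Together, the read, offset and size fields describe where each barcode and the cDNA sit in the sequenced reads. The sketch below slices those segments out, assuming 0-based offsets. The example layout places a 16-base cell barcode followed by a 12-base UMI in read1, with cDNA in read2. This resembles 10x Genomics Chromium 3' v3 chemistry but is for illustration only, and the sequences are placeholders. `Segment` and `extract` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    read: str    # one of the Allowed Values: index1, index2, read1, read2
    offset: int  # 0-based start position within that read (assumed)
    size: int    # number of bases in the segment


def extract(reads: dict, segment: Segment) -> str:
    """Return the bases of `segment` from the matching read, checking that it fits."""
    seq = reads[segment.read]
    end = segment.offset + segment.size
    if segment.offset < 0 or segment.size <= 0 or end > len(seq):
        raise ValueError(f"{segment} does not fit in a read of length {len(seq)}")
    return seq[segment.offset:end]


layout = {
    "cell_barcode": Segment("read1", 0, 16),
    "umi": Segment("read1", 16, 12),
    "cdna": Segment("read2", 0, 8),
}
reads = {"read1": "AAAACCCCGGGGTTTT" + "ACGTACGTACGT", "read2": "TTGACCATGGCA"}
for name, segment in layout.items():
    print(name, extract(reads, segment))
```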
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File Derived From
| Name | file_derived_from |
| Description | The name of the file used to generate the analysis-derived data. |
| Example | file1_sequencing.json |
| Reference | # |
| Namespace | ei:file_derived_from |
Inferred Cell Type
| Name | inferred_cell_type |
| Description | The post-analysis cell type or identity, as declared by the performer based on the expression profile or known gene function. |
| Example | type II bipolar neuron |
| Reference | # |
| Namespace | ei:inferred_cell_type |
Post Analysis Cell Well Quality
| Name | post_analysis_cell_well_quality |
| Description | A performer-defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed. |
| Example | Pass |
| Reference | # |
| Namespace | ei:post_analysis_cell_well_quality |
| Allowed Values | Fail Pass |
Other Derived Cell Attributes
| Name | other_derived_cell_attributes |
| Description | Any other cell-level measurement or annotation resulting from the analysis. |
| Example | Cluster |
| Reference | # |
| Namespace | ei:other_derived_cell_attributes |
| Allowed Values | Cluster Count Gene UMI tSNE coordinates |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Reference Genome
| Name | reference_genome |
| Description | Indicate the version and include a stable link to the genome data (or attach the genome FASTA file). |
| Example | GRCh38, https://example.org/grch38.fa |
| Reference | # |
| Namespace | ei:reference_genome |
Genome Annotation
| Name | genome_annotation |
| Description | Indicate the version and include a stable link. Also indicate whether any modification has been applied to the original annotation (e.g. 3' UTR extension), and include the modified annotation file used in the analysis. |
| Example | Ensembl v101, https://example.org/ensembl_v101.gtf |
| Reference | # |
| Namespace | ei:genome_annotation |
Annotation Filtering
| Name | annotation_filtering |
| Description | Indicate which features were filtered (e.g. protein coding, pseudogenes, TCRs). |
| Example | Filtered to include only protein-coding genes |
| Reference | # |
| Namespace | ei:annotation_filtering |
Genes vs Exons
| Name | genes_vs_exons |
| Description | Whether quantification used whole gene intervals or exons. |
| Example | Exon quantification |
| Reference | # |
| Namespace | ei:genes_vs_exons |
Library Structure
| Name | library_structure |
| Description | The structure of the sequencing library, described in seqspec format. |
| Example | Single-cell 3' library |
| Reference | # |
| Namespace | ei:library_structure |
Mapping and Demultiplexing Software
| Name | mapping_and_demultiplexing_software |
| Description | The software (and version) used for mapping and demultiplexing reads and UMIs. |
| Example | Cell Ranger 6.0.0 |
| Reference | # |
| Namespace | ei:mapping_and_demultiplexing_software |
Read Mapping Statistics
| Name | read_mapping_statistics |
| Description | Mapping statistics for reads or Unique Molecular Identifiers (UMIs). |
| Example | 80% reads mapped to reference |
| Reference | # |
| Namespace | ei:read_mapping_statistics |
Sequencing Saturation
| Name | sequencing_saturation |
| Description | The sequencing saturation achieved, which depends on the number of cells recovered (not targeted) and the technology. |
| Example | 95% sequencing saturation |
| Reference | # |
| Namespace | ei:sequencing_saturation |
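One widely used definition, the one Cell Ranger reports, is sequencing saturation = 1 - (unique reads / total reads). Here a read counts as unique the first time its (cell barcode, UMI, gene) combination is seen. The sketch below applies that definition to a list of such triples. It assumes reads have already been filtered to valid barcodes and confident gene assignments. `sequencing_saturation` is a hypothetical helper.

```python
def sequencing_saturation(read_keys) -> float:
    """1 - (unique (barcode, UMI, gene) combinations / total reads)."""
    keys = list(read_keys)
    if not keys:
        raise ValueError("no reads supplied")
    return 1 - len(set(keys)) / len(keys)


reads = [
    ("AAACCTGA", "UMI1", "GENE_A"),
    ("AAACCTGA", "UMI1", "GENE_A"),  # duplicate of the read above
    ("AAACCTGA", "UMI2", "GENE_A"),
    ("AAACCTGA", "UMI2", "GENE_A"),  # duplicate
]
print(f"{sequencing_saturation(reads):.0%}")  # 50%
```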
UMIs or Barcode Distribution QC
| Name | umis_barcode_distribution_qc |
| Description | The distribution of Unique Molecular Identifiers (UMIs) per barcode, and the threshold applied. |
| Example | Threshold: 10 UMIs per barcode |
| Reference | # |
| Namespace | ei:umis_barcode_distribution_qc |
Cell or Non-Cell Filtering Strategy
| Name | cell_non_cell_filtering_strategy |
| Description | Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells. |
| Example | Threshold: 5 UMIs for cell detection |
| Reference | # |
| Namespace | ei:cell_non_cell_filtering_strategy |
Other Quality Filters Applied
| Name | other_quality_filters_applied |
| Description | Cells/nuclei discarded based on % mitochondrial reads, % rRNA reads, etc. |
| Example | Cells with >20% mitochondrial reads discarded |
| Reference | # |
| Namespace | ei:other_quality_filters_applied |
Ambient RNA QC
| Name | ambient_rna_qc |
| Description | The percentage of UMIs in background (non-cell) barcodes, and the algorithm (if any) used to remove ambient RNA. |
| Example | Ambient RNA removed if >5% UMIs in background barcodes |
| Reference | # |
| Namespace | ei:ambient_rna_qc |
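A fixed UMI threshold, like the one in the Cell or Non-Cell Filtering Strategy example, splits barcodes into cells and background. The share of UMIs left in background barcodes is then the figure the Ambient RNA QC field asks for. The sketch below assumes per-barcode UMI totals are already available as a dict and treats the threshold as inclusive. Real pipelines often use more elaborate cell calling, for example EmptyDrops. `split_cells` is a hypothetical helper.

```python
def split_cells(umis_per_barcode: dict, threshold: int):
    """Return (cell barcodes, % of all UMIs found in non-cell background barcodes)."""
    cells = {bc for bc, n in umis_per_barcode.items() if n >= threshold}
    total = sum(umis_per_barcode.values())
    background = sum(n for bc, n in umis_per_barcode.items() if bc not in cells)
    pct_background = 100 * background / total if total else 0.0
    return cells, pct_background


counts = {"AAAC": 1200, "AAAG": 800, "CCCT": 3, "GGGA": 2, "TTTA": 0}
cells, pct = split_cells(counts, threshold=5)
print(sorted(cells), f"{pct:.2f}% of UMIs in background barcodes")
```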
Predicted Doublet Rate QC
| Name | predicted_doublet_rate_qc |
| Description | The predicted doublet rate, which depends on the number of cells recovered (not targeted) and the technology. |
| Example | Predicted doublet rate: 1.5% |
| Reference | # |
| Namespace | ei:predicted_doublet_rate_qc |
Individual Organism SNP Demultiplexing
| Name | individual_organism_snp_demultiplexing |
| Description | If carried out, the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used. |
| Example | SNP UMAP embedding using CellSNP |
| Reference | # |
| Namespace | ei:individual_organism_snp_demultiplexing |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Clustering Algorithm and Version
| Name | clustering_algorithm_and_version |
| Description | The clustering algorithm and its version, if the data were compared or integrated with existing datasets. |
| Example | Louvain 0.8.0 |
| Reference | # |
| Namespace | ei:clustering_algorithm_and_version |
Clustering Parameters
| Name | clustering_parameters |
| Description | The parameters used for clustering, if the data were compared or integrated with existing datasets. |
| Example | Resolution: 0.6, K-nearest neighbors: 10 |
| Reference | # |
| Namespace | ei:clustering_parameters |
Integration/Batch Correction
| Name | integration_batch_correction |
| Description | The integration or batch-correction method used, if the data were compared or integrated with existing datasets. |
| Example | Harmony v1.0 |
| Reference | # |
| Namespace | ei:integration_batch_correction |
Source Code
| Name | source_code |
| Description | Any newly developed code or software used in the processing and downstream analysis of the dataset. |
| Example | Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization. |
| Reference | # |
| Namespace | ei:source_code |
UMI Count Matrix
| Name | umi_count_matrix |
| Description | Gene x cell matrix with UMI counts for each gene in each cell. |
| Example | The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv. |
| Reference | # |
| Namespace | ei:umi_count_matrix |
Ensembl IDs
| Name | ensembl_ids |
| Description | Gene or transcript identifiers, listed as Ensembl IDs (or another standardised ID), with gene short names given in the metadata. |
| Example | ENSG00000139618 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:ensembl_ids |
Functional Gene Annotations
| Name | functional_gene_annotations |
| Description | Any functional annotation generated/used (gene names, GOs, structural domains, etc.). |
| Example | Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding). |
| Reference | # |
| Namespace | ei:functional_gene_annotations |
Protein Models
| Name | protein_models |
| Description | FASTA file with (or stable link to) the predicted proteins associated with the genes in the UMI count matrix, using matching IDs. |
| Example | The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID. |
| Reference | # |
| Namespace | ei:protein_models |
Cell Metadata
| Name | cell_metadata |
| Description | Table mapping cell IDs to cluster/cell type/broad cell type annotations. |
| Example | Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv. |
| Reference | # |
| Namespace | ei:cell_metadata |
Cluster-Level Normalised Expression Tables
| Name | cluster_level_normalised_expression_tables |
| Description | Expression tables that show normalised gene expression at the cluster or cell-type level. |
| Example | Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv. |
| Reference | # |
| Namespace | ei:cluster_level_normalised_expression_tables |
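A cluster-level table can be derived from the UMI count matrix and the cell metadata. As one possible normalisation, the sketch below first sums UMIs per gene across the cells of each cluster (a pseudobulk). It then scales each cluster to counts per million. The choice of CPM, the dict-based inputs and the `cluster_cpm` name are assumptions for illustration, not the method this schema prescribes.

```python
from collections import defaultdict


def cluster_cpm(counts: dict, cell_to_cluster: dict) -> dict:
    """counts: {cell: {gene: UMIs}}; returns {cluster: {gene: counts per million}}."""
    sums = defaultdict(lambda: defaultdict(int))
    for cell, genes in counts.items():
        cluster = cell_to_cluster[cell]
        for gene, n in genes.items():
            sums[cluster][gene] += n
    table = {}
    for cluster, genes in sums.items():
        total = sum(genes.values())
        # Scale so each cluster's values sum to one million (zero-count clusters stay at 0).
        table[cluster] = {g: (n * 1e6 / total if total else 0.0) for g, n in genes.items()}
    return table


counts = {
    "Cell_001": {"GENE_A": 3, "GENE_B": 1},
    "Cell_002": {"GENE_A": 1, "GENE_B": 3},
    "Cell_003": {"GENE_A": 2},
}
clusters = {"Cell_001": "Cluster_1", "Cell_002": "Cluster_1", "Cell_003": "Cluster_2"}
print(cluster_cpm(counts, clusters))
```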
Other Resource Files
| Name | other_resource_files |
| Description | Any other files necessary to re-use and interpret the data, e.g. barcode information in complex, serial multiplexing protocols (clicktags). |
| Example | Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv. |
| Reference | # |
| Namespace | ei:other_resource_files |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:sequencing_id |
Read 1 File Required
| Name | read_1_file |
| Description | The name or accession of the file that contains read 1. |
| Example | file1_r1.fastq.gz |
| Reference | # |
| Namespace | ei:read_1_file |
Read 2 File
| Name | read_2_file |
| Description | The name or accession of the file that contains read 2. |
| Example | file2_r2.fastq.gz |
| Reference | # |
| Namespace | ei:read_2_file |
Index 1 File
| Name | index_1_file |
| Description | The name of the file that contains index 1. |
| Example | file1_i1.fastq.gz |
| Reference | # |
| Namespace | ei:index_1_file |
Index 2 File
| Name | index_2_file |
| Description | The name of the file that contains index 2. |
| Example | file2_i2.fastq.gz |
| Reference | # |
| Namespace | ei:index_2_file |
Read 1 Checksum Required
| Name | read_1_file_checksum |
| Description | Result of a hash function calculated on the content of the read 1 file to verify file integrity. The value must be an MD5 checksum written as 32 lowercase hexadecimal characters. |
| Example | f8d29e41a73b5c02de9a6fb314e7c8ad |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_1_file_checksum |
Read 2 Checksum
| Name | read_2_file_checksum |
| Description | Result of a hash function calculated on the content of the read 2 file to verify file integrity. Provide a single MD5 checksum of 32 lowercase hexadecimal characters, as required by the validation pattern. |
| Example | a3f4c1b29d8e57fa41b02de6c7f9ab83 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_2_file_checksum |
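A minimal sketch of how a checksum for these fields could be produced and checked against the documented pattern `^[0-9a-f]{32}$`, using Python's standard `hashlib`. The file is read in chunks so that large FASTQ files do not need to fit in memory; the function names and the 1 MiB chunk size are illustrative assumptions.

```python
import hashlib
import re

# Validation pattern documented for the read checksum fields: 32 lowercase hex characters.
MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_checksum_value(value: str) -> bool:
    """Check a submitted checksum string against the documented pattern."""
    return MD5_PATTERN.fullmatch(value) is not None
```

For example, `md5_of_file("file1_r1.fastq.gz")` would give the value to enter in Read 1 Checksum. `hexdigest()` already returns lowercase characters, so its output satisfies the pattern directly.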
White List Barcode File
| Name | white_list_barcode_file |
| Description | A file containing the known cell barcodes in the dataset. |
| Example | barcodes.tsv |
| Reference | # |
| Namespace | ei:white_list_barcode_file |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Expression Data Process Setting ID Required
| Name | expression_data_process_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_process_setting_id |
Matrix Type
| Name | matrix_type |
| Description | The type of values stored in the expression matrix, such as raw counts or scaled values. |
| Example | raw_counts |
| Reference | # |
| Namespace | ei:matrix_type |
| Allowed Values | imputed log1p normalised pseudobulk raw_counts scaled |
Reference Genome Required
| Name | reference_genome |
| Description | URL of the associated reference genome. Multiple URLs can be given, separated by a pipe character (|) with no surrounding spaces. |
| Example | https://reference-genome-example.com |
| Reference | # |
| Regex | ^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$ |
| Namespace | ei:reference_genome |
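A minimal sketch of checking a `reference_genome` value against the documented pattern and splitting it into individual URLs. The helper name is hypothetical; because the pattern forbids `|` inside each URL, splitting on `|` after validation is safe.

```python
from __future__ import annotations

import re

# Pattern documented for reference_genome: one or more http(s)/ftp URLs
# joined by a bare pipe character, with no whitespace anywhere.
REFERENCE_GENOME_PATTERN = re.compile(
    r"^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$"
)


def parse_reference_genome(value: str) -> list[str]:
    """Validate a reference_genome value and return its individual URLs."""
    if REFERENCE_GENOME_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a pipe-separated list of URLs: {value!r}")
    return value.split("|")
```

A value such as `https://a.example | https://b.example` is rejected, because the spaces around the pipe are not allowed by the pattern.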
Annotation Version
| Name | annotation_version |
| Description | The annotation version of the associated reference genome |
| Example | GENCODE v44 |
| Reference | # |
| Namespace | ei:annotation_version |
Normalisation Method
| Name | normalisation_method |
| Description | The normalisation method applied to the expression data. |
| Example | Log Normalisation |
| Reference | # |
| Namespace | ei:normalisation_method |
| Allowed Values | Library Size Normalisation Log Normalisation SCNorm SCTransform scran |
Highly Variable Gene Selection (HVG)
| Name | highly_variable_gene_selection |
| Description | The method used to select highly variable genes, and the number of genes selected. |
| Example | seurat_v3, n=2000 |
| Reference | # |
| Namespace | ei:highly_variable_gene_selection |
Dimensionality Reduction
| Name | dimensionality_reduction |
| Description | Method used to reduce dimensionality in the expression data |
| Example | PCA |
| Reference | # |
| Namespace | ei:dimensionality_reduction |
| Allowed Values | Diffusion Map ICA NMF PCA UMAP t-SNE |
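Fields with Allowed Values appear to expect one of the listed strings exactly. A minimal sketch of a case-sensitive check for `dimensionality_reduction`, assuming exact matching is what the validator requires; the `suggest` helper, which proposes the intended value when only letter case differs, is a hypothetical convenience rather than part of the schema.

```python
from __future__ import annotations

from typing import Optional

# Allowed values for dimensionality_reduction, as listed in the table.
DIMENSIONALITY_REDUCTION_VALUES = frozenset(
    {"Diffusion Map", "ICA", "NMF", "PCA", "UMAP", "t-SNE"}
)


def check_allowed(value: str, allowed: frozenset[str]) -> bool:
    """Return True only when value exactly matches an allowed value (case-sensitive)."""
    return value in allowed


def suggest(value: str, allowed: frozenset[str]) -> Optional[str]:
    """Return the allowed value that differs from value only in letter case, if any."""
    folded = value.casefold()
    for candidate in allowed:
        if candidate.casefold() == folded:
            return candidate
    return None
```

With this approach `pca` fails `check_allowed`, and `suggest("pca", DIMENSIONALITY_REDUCTION_VALUES)` returns `PCA`.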
Number of Nearest Neighbours
| Name | n_neighbours |
| Description | Number of nearest neighbours used to calculate cluster membership |
| Example | pca:50 |
| Reference | # |
| Namespace | ei:n_neighbours |
Clustering Algorithm
| Name | clustering_algorithm |
| Description | Algorithm used to create clusters |
| Reference | # |
| Namespace | ei:clustering_algorithm |
Clustering Resolution
| Name | clustering_resolution |
| Description | The resolution parameter used by the clustering algorithm. |
| Example | 2.5 |
| Reference | # |
| Regex | ^([0-9]*[.])?[0-9]+$ |
| Namespace | ei:clustering_resolution |
Clustering Distance Metric
| Name | clustering_distance_metric |
| Description | Metric used to calculate the distance between points. |
| Example | cosine |
| Reference | # |
| Namespace | ei:clustering_distance_metric |
| Allowed Values | cosine euclidean hamming jaccard manhattan mahalanobis |
Software Versions
| Name | software_versions |
| Description | The primary software packages, and their versions, used for the analysis. |
| Reference | # |
| Namespace | ei:software_versions |
Cell Type Annotation
| Name | cell-type annotation |
| Description | Tools and databases used for cell-type annotation. |
| Reference | # |
| Namespace | ei:cell-type annotation |
Generated by Pipeline
| Name | generated_by_pipeline |
| Description | URL of the deposited pipeline used to create this data |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | ei:generated_by_pipeline |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | expression_data_file_id |
| Description | A unique alphanumeric identifier for the expression data file |
| Example | EXPFILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric identifier for library preparation |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Expression Data Process Setting ID Required
| Name | expression_data_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_setting_id |
File Name Required
| Name | expression_data_file |
| Description | Expression data file name |
| Example | exp_file.csv |
| Reference | # |
| Namespace | ei:expression_data_file |
File MD5 Checksum Required
| Name | expression_data_file_checksum |
| Description | The MD5 checksum calculated for this file. |
| Example | 9e4b7a23f6c1d0ab85f29c47e3d8a610 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:expression_data_file_checksum |
File Format Required
| Name | expression_data_file_format |
| Description | The format of the expression file, such as h5ad or rds |
| Example | csv |
| Reference | # |
| Namespace | ei:expression_data_file_format |
| Allowed Values | csv h5ad loom mtx rds |
Number of Cells
| Name | n_cells |
| Description | The number of cells represented in the expression data |
| Example | 4 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_cells |
Number of Genes
| Name | n_genes |
| Description | The number of genes represented in the expression data |
| Example | 50 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_genes |
File Size in Bytes
| Name | file_size_bytes |
| Description | Size of the file recorded in bytes |
| Example | 90 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:file_size_bytes |
Date Generated
| Name | date_generated |
| Description | Approximate date this expression data was generated |
| Example | 2024-10-14 |
| Reference | # |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | ei:date_generated |
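The pattern for this field checks the `YYYY-MM-DD` layout but still accepts impossible dates such as `2024-02-31`. A minimal sketch that applies the documented pattern and then confirms the value is a real calendar date with Python's `datetime.date.fromisoformat`; the function name is hypothetical.

```python
import re
from datetime import date

# Pattern documented for date_generated: YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def parse_date_generated(value: str) -> date:
    """Check the documented pattern, then confirm the value is a real calendar date."""
    if DATE_PATTERN.fullmatch(value) is None:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # fromisoformat raises ValueError for dates the pattern lets through,
    # such as 2024-02-31 or 2023-02-29.
    return date.fromisoformat(value)
```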
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Project Name Required
| Name | project_name |
| Description | Official name of the study or project. Project title should be fewer than 30 words, such as a title of a grant proposal or a publication. |
| Example | Study of single cells in the human body |
| Reference | https://w3id.org/mixs/0000092 |
| Namespace | mixs:project_name |
Description
| Name | description |
| Description | A detailed description of the project which includes research goals and experimental approach. Project description should be fewer than 300 words, such as an abstract from a grant application or publication. |
| Example | This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine. |
| Reference | http://purl.org/dc/terms/description |
| Namespace | dcterms:description |
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Laser microdissection |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection Laser microdissection, Culturing Laser microdissection, Culturing, Sequencing Laser microdissection, Sequencing Microfluidics, Facs, Culturing Microfluidics, Facs, Culturing, Sequencing Microfluidics, Facs, Sequencing Spatial Transcriptomics |
Technology Required
| Name | technology |
| Description | The sorting or visualisation technology used. |
| Example | Vizgen |
| Reference | # |
| Namespace | ei:technology |
Negative Control Type
| Name | neg_cont_type |
| Description | The substance or equipment used as a negative control in an investigation |
| Example | Phosphate buffer |
| Reference | https://w3id.org/mixs/0001321 |
| Namespace | mixs:neg_cont_type |
| Allowed Values | DNA-free PCR mix Distilled water Empty collection device Empty collection tube Phosphate buffer Sterile swab Sterile syringe |
Positive Control Type
| Name | pos_cont_type |
| Description | The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive |
| Example | substance1 |
| Reference | https://w3id.org/mixs/0001322 |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | mixs:pos_cont_type |
Experimental Factor
| Name | experimental_factor |
| Description | Variable aspects of an experiment design that can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI) |
| Example | EFO:0001779 |
| Reference | https://w3id.org/mixs/0000008 |
| Regex | ^[A-Z]{2,}:\d+$ |
| Namespace | mixs:experimental_factor |
Relevant Electronic Resources
| Name | associated_resource |
| Description | A related resource that is referenced, cited, or otherwise associated with the sequence. Separate multiple resources with a pipe surrounded by single spaces ( | ). |
| Example | https://arctos.database.museum/media/10520962 | https://arctos.database.museum/media/10520964 |
| Reference | https://w3id.org/mixs/0000091 |
| Regex | ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)*)*$ |
| Namespace | mixs:associated_resource |
Licence
| Name | licence |
| Description | Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common choices include Creative Commons licences such as CC BY 4.0, which requires attribution to the original authors when the data are cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
ORCID ID
| Name | orcid_id |
| Description | A 16-character identifier that uniquely identifies a researcher. The final character is a check digit, which may be X. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |
| Namespace | ei:orcid_id |
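The pattern above only checks the layout of the identifier. ORCID identifiers also end in a check character computed with the ISO 7064 MOD 11-2 algorithm described in ORCID's public documentation. A minimal sketch of verifying it, as an optional extra check beyond the documented pattern; the function names are hypothetical.

```python
import re

# Pattern documented for orcid_id: four hyphen-separated groups of four characters,
# where the final character may be the letter X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character from the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """Check the documented layout, then verify the final check character."""
    if ORCID_PATTERN.fullmatch(value) is None:
        return False
    digits = value.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

The Example value `0000-1234-5678-9012` passes this check, while changing its last digit makes it fail.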
First Name Required
| Name | givenName |
| Description | A first name (or given name) is the personal name given to an individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | A unique identifier used to send and receive electronic messages (emails) over the internet. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Regex | ^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$ |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier (usually from a recognized taxonomy database like NCBI Taxonomy) that corresponds to the organism’s scientific name. It must be accurately matched to the provided scientific name to maintain consistency and traceability in biological records. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Protocol Name Required
| Name | protocol_name |
| Description | A descriptive name of the protocol used for single-cell sequencing. |
| Example | 10X Genomics Single Cell 3' Library Prep |
| Reference | # |
| Namespace | ei:protocol_name |
Dissociation Description Required
| Name | dissociation_description |
| Description | A free-text description of the process used to separate cells from tissues or cell aggregates. |
| Example | Tissue was enzymatically dissociated using collagenase for 30 minutes. |
| Reference | # |
| Namespace | ei:dissociation_description |
Enrichment Markers
| Name | enrichment_markers |
| Description | Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms. |
| Example | CD45 |
| Reference | # |
| Namespace | faang:enrichment_markers |
Isolation Kit
| Name | isolation_kit |
| Description | The kit used to isolate the cells. |
| Example | 10x Nuclei Isolation Kit |
| Reference | # |
| Namespace | ei:isolation_kit |
| Allowed Values | 10x Nuclei Isolation Kit 3' standard throughput kit Custom |
Literature Source Reference
| Name | literature_source_reference |
| Description | Reference to literature sources that describe the protocol or methods used. |
| Example | Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview' |
| Reference | # |
| Namespace | ei:literature_source_reference |
Protocols IO Reference
| Name | protocols_io_reference |
| Description | Reference link to protocols.io for additional details on the protocol. |
| Example | https://www.protocols.io/view/sample-protocol-b2ubqesn |
| Reference | # |
| Regex | ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)*)*$ |
| Namespace | ei:protocols_io_reference |
WorkflowHub SOP Reference
| Name | workflow_hub_sop_reference |
| Description | Reference to the Standard Operating Procedure (SOP) in WorkflowHub. |
| Example | https://workflowhub.eu/works/12345 |
| Reference | # |
| Namespace | ei:workflow_hub_sop_reference |
Dissociation Protocol Method
| Name | dissociation_protocol_method |
| Description | The method used to dissociate tissues into single cells. |
| Example | Mechanical and enzymatic dissociation |
| Reference | # |
| Namespace | ei:dissociation_protocol_method |
Single Cell Quality Metric
| Name | single_cell_quality_metric |
| Description | Metrics used to assess the quality of single cells before sequencing. |
| Example | Cell viability percentage |
| Reference | # |
| Namespace | ei:single_cell_quality_metric |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the sample |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Suspension Type Required
| Name | suspension_type |
| Description | The type of suspension used to keep cells in solution during processing. |
| Example | Cell |
| Reference | # |
| Namespace | ei:suspension_type |
| Allowed Values | Cell Nuclei Protoplast |
Cell Count
| Name | cell_count |
| Description | The number of cells in the sequencing library. |
| Example | 10000 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cell_count |
Cell Viability
| Name | cell_viability |
| Description | The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis. |
| Example | 95 |
| Reference | # |
| Namespace | ei:cell_viability |
Cell Viability Assessment Method
| Name | cell_viability_assessment_method |
| Description | The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques. |
| Example | Trypan Blue Exclusion |
| Reference | # |
| Namespace | ei:cell_viability_assessment_method |
Cell Size
| Name | cell_size |
| Description | The size of the cell, typically measured in micrometres. |
| Example | 10 |
| Reference | # |
| Namespace | ei:cell_size |
Suspension Volume (µL)
| Name | suspension_volume_µl |
| Description | The volume of the cell suspension in microlitres (µL). |
| Example | 100 |
| Reference | # |
| Namespace | ei:suspension_volume_µl |
Suspension Concentration Cells Per µL
| Name | suspension_concentration_cells_per_µl |
| Description | The concentration of cells in the suspension, in cells per microlitre (cells/µL). |
| Example | 1000 |
| Reference | # |
| Namespace | ei:suspension_concentration_cells_per_µl |
Suspension Dilution
| Name | suspension_dilution |
| Description | The dilution factor of the cell suspension. |
| Example | 1:10 |
| Reference | # |
| Namespace | ei:suspension_dilution |
Loading Volume (µL)
| Name | loading_volume_µl |
| Description | The volume of the cell suspension loaded into the single-cell RNA-sequencing system for analysis. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:loading_volume_µl |
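The suspension fields are related by simple arithmetic: concentration (cells/µL) multiplied by volume (µL) gives a number of cells. With the example values above, 1000 cells/µL × 10 µL loaded is an estimated 10,000 cells loaded. A minimal sketch follows; the function name is hypothetical, and the result estimates cells loaded, which may differ from the number of cells actually captured in the library.

```python
def estimated_cells_loaded(concentration_cells_per_ul: float, loading_volume_ul: float) -> float:
    """Estimate cells loaded as concentration (cells/µL) multiplied by volume (µL)."""
    if concentration_cells_per_ul < 0 or loading_volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_cells_per_ul * loading_volume_ul


# Example values from the Suspension Concentration and Loading Volume fields.
print(estimated_cells_loaded(1000, 10))  # prints 10000
```

The same multiplication with the Suspension Volume example (100 µL) gives about 100,000 cells in the whole suspension.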
Suspension Dilution Buffer
| Name | suspension_dilution_buffer |
| Description | A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing. |
| Example | PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin) |
| Reference | # |
| Namespace | ei:suspension_dilution_buffer |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the library preparation. |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Library Preparation Kit Required
| Name | library_prep_kit |
| Description | Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids |
| Example | 10X Genomics Single Cell 3' v3 |
| Reference | https://w3id.org/mixs/0001145 |
| Namespace | mixs:library_prep_kit |
Library Preparation Kit Version Required
| Name | library_prep_kit_version |
| Description | The version number of the library preparation kit used for sequencing. |
| Example | 2 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ontology:library_prep_kit_version |
Amplification Method
| Name | amplification_method |
| Description | The method used to amplify the Complementary DNA (cDNA). |
| Example | PCR |
| Reference | # |
| Namespace | ei:amplification_method |
cDNA Amplification Cycles
| Name | cdna_amplification_cycles |
| Description | The number of cycles used during the Complementary DNA (cDNA) amplification process. |
| Example | 12 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cdna_amplification_cycles |
Average Size Distribution
| Name | average_size_distribution |
| Description | The average length of library fragments, in base pairs (bp), after library preparation, indicating the quality and suitability of the library for sequencing. |
| Example | 350 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:average_size_distribution |
Library Construction Method
| Name | lib_construction_method |
| Description | The library construction method (including version) that was used. |
| Example | Smart-Seq2 |
| Reference | # |
| Namespace | ei:lib_construction_method |
Input Molecule
| Name | input_molecule |
| Description | The specific fraction of biological macromolecule from which the sequencing library is derived. |
| Example | RNA |
| Reference | # |
| Namespace | ei:input_molecule |
Primer
Primeness Required
| Name | primeness |
| Description | The end from which the molecule was sequenced. |
| Example | 5' |
| Reference | # |
| Namespace | ei:primeness |
| Allowed Values | 3' 5' Both |
End Bias
| Name | end_bias |
| Description | The end of the transcript (3' or 5') towards which read coverage in the library is biased. |
| Example | 3 |
| Reference | # |
| Namespace | ei:end_bias |
| Allowed Values | 3 5 |
Library Strand
| Name | library_strand |
| Description | The complementary DNA (cDNA) strand from which the reads in the library are derived: sense (first), antisense (second), both, or none (unstranded). |
| Example | Antisense |
| Reference | # |
| Namespace | ei:library_strand |
| Allowed Values | Antisense Both Sense Unstranded |
Spike In Required
| Name | spike_in |
| Description | External RNA added to the sample as a control to assess technical variability and normalisation in RNA-sequencing. State whether a spike-in was used. |
| Example | Yes |
| Reference | # |
| Namespace | ei:spike_in |
| Allowed Values | No Yes |
Spike Type
| Name | spike_type |
| Description | The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA. |
| Example | Synthetic RNA |
| Reference | # |
| Namespace | ei:spike_type |
Spike In Dilution Or Concentration
| Name | spike_in_dilution_or_concentration |
| Description | The final concentration or dilution (for commercial sets) of the spike in mix. |
| Example | 1:1000 |
| Reference | # |
| Namespace | ei:spike_in_dilution_or_concentration |
i5 Index Required
| Name | i5_index |
| Description | Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing. |
| Example | ATCACG |
| Reference | # |
| Namespace | ei:i5_index |
i7 Index Required
| Name | i7_index |
| Description | Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs. |
| Example | CGATGT |
| Reference | # |
| Namespace | ei:i7_index |
Dual or Single Index Required
| Name | dual_single_index |
| Description | Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing. |
| Example | Dual |
| Reference | # |
| Namespace | ei:dual_single_index |
| Allowed Values | Dual Single |
i5 Sequence Required
| Name | i5_sequence |
| Description | The nucleotide sequence of the i5 index used in multiplexing during sequencing. |
| Example | ATCGTAGC |
| Reference | # |
| Namespace | ei:i5_sequence |
i7 Sequence Required
| Name | i7_sequence |
| Description | The specific nucleotide sequence of the i7 index used for a sample. |
| Example | TGCATGCA |
| Reference | # |
| Namespace | ei:i7_sequence |
Plate ID
| Name | plate_id |
| Description | Identifier for the 96-well plate used in sample preparation. |
| Example | PLT001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:plate_id |
Well Row
| Name | well_row |
| Description | The row identifier in a 96-well plate indicating the sample's position. |
| Example | A |
| Reference | # |
| Namespace | ei:well_row |
Well Column
| Name | well_col |
| Description | The column identifier in a 96-well plate indicating the sample's position. |
| Example | 5 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:well_col |
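A standard 96-well plate has 8 rows (A to H) and 12 columns (1 to 12). A minimal sketch of checking Well Row and Well Column against that layout, assuming the plate recorded in Plate ID is a standard 96-well plate as the descriptions state; the function name is hypothetical.

```python
import string

# Standard 96-well layout: rows A-H, columns 1-12.
ROWS_96 = string.ascii_uppercase[:8]  # "ABCDEFGH"
N_COLUMNS_96 = 12


def well_name(well_row: str, well_col: str) -> str:
    """Validate a (well_row, well_col) pair and return a label such as 'A5'."""
    # The length check rejects "" and multi-letter strings, which `in` alone would accept.
    if len(well_row) != 1 or well_row not in ROWS_96:
        raise ValueError(f"row must be one of {ROWS_96}, got {well_row!r}")
    if not well_col.isdigit() or not 1 <= int(well_col) <= N_COLUMNS_96:
        raise ValueError(f"column must be 1-{N_COLUMNS_96}, got {well_col!r}")
    return f"{well_row}{int(well_col)}"
```

With the example values, `well_name("A", "5")` returns `A5`.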
Cell Phenotype
| Name | cell_phenotype |
| Description | The cell marker for the Fluorescence-Activated Cell Sorting (FACS) of cells. |
| Example | CD41- |
| Reference | # |
| Namespace | ei:cell_phenotype |
| Allowed Values | CD41+ CD41- |
Nucleic Acid Amplification
| Name | nucl_acid_amp |
| Description | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids. The link can be a PMID, DOI or URL. |
| Example | https://phylogenomics.me/protocols/16s-pcr-protocol/ |
| Reference | https://w3id.org/mixs/0000050 |
| Regex | ^PMID:\d+$|^doi:10.\d{2,9}/.*$|^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)$|([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$ |
| Namespace | mixs:nucl_acid_amp |
Nucleic Acid Extraction
| Name | nucl_acid_ext |
| Description | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample |
| Example | https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf |
| Reference | https://w3id.org/mixs/0000038 |
| Regex | ^PMID:\d+$|^doi:10.\d{2,9}/.*$|^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)$|([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$ |
| Namespace | mixs:nucl_acid_ext |
Amount or Size of Sample Collected
| Name | samp_size |
| Description | The total amount or size (volume (ml), mass (g) or area (m2)) of sample collected |
| Example | 5 litre |
| Reference | https://w3id.org/mixs/0000037 |
| Regex | ^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?( *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)? *([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$ |
| Namespace | mixs:samp_size |
Estimated Size
| Name | estimated_size |
| Description | The estimated size of the genome prior to sequencing in base pairs (bp). Of particular importance in the sequencing of (eukaryotic) genomes, which could remain in draft form for a long or unspecified period. |
| Example | 300000 |
| Reference | https://w3id.org/mixs/0000001 |
| Namespace | mixs:estimated_size |
Sample Volume or Weight for DNA Extraction
| Name | samp_vol_we_dna_ext |
| Description | Volume (ml) or mass (g) of total collected sample processed for DNA extraction. |
| Example | 1500 milliliter |
| Reference | https://w3id.org/mixs/0000024 |
| Regex | ^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?(?: *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)? *(milliliter|gram|milligram|square centimeter)$ |
| Namespace | mixs:samp_vol_we_dna_ext |
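The pattern for this field accepts a number or numeric range followed by one of four unit words. A minimal sketch that reuses the documented pattern to extract the unit; the helper name is hypothetical. Note that only the spellings in the pattern are accepted, so `millilitre` or `liter` would be rejected.

```python
import re

# Pattern documented for samp_vol_we_dna_ext; its only capturing group is the unit.
SAMP_VOL_PATTERN = re.compile(
    r"^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
    r"(?: *- *[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)?"
    r" *(milliliter|gram|milligram|square centimeter)$"
)


def sample_unit(value: str) -> str:
    """Return the unit of a samp_vol_we_dna_ext value, or raise if it does not match."""
    match = SAMP_VOL_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised amount or unit: {value!r}")
    return match.group(1)
```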
Library Vector
| Name | lib_vector |
| Description | Cloning vector type(s) used in construction of libraries |
| Example | Bacteriophage P1 |
| Reference | https://w3id.org/mixs/0000041 |
| Namespace | mixs:lib_vector |
Adapters
| Name | adapters |
| Description | Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported in uppercase letters, separated by a semicolon (;). |
| Example | AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT |
| Reference | https://w3id.org/mixs/0000042 |
| Namespace | mixs:adapters |
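The Adapters field has no validation pattern, but the description asks for both adapters in uppercase and the example separates them with a semicolon. A minimal sketch of splitting and checking such a value, assuming a semicolon separator and an alphabet of A, C, G, T and N; both assumptions are drawn from the example rather than from a documented rule, and the helper name is hypothetical.

```python
from __future__ import annotations

import re

# Assumed alphabet: uppercase A, C, G, T and the ambiguity code N.
ADAPTER_SEQUENCE = re.compile(r"[ACGTN]+")


def parse_adapters(value: str) -> tuple[str, str]:
    """Split an adapters value into its two sequences and check each one."""
    parts = value.split(";")
    if len(parts) != 2:
        raise ValueError(f"expected two adapters separated by ';', got {len(parts)}")
    for part in parts:
        if ADAPTER_SEQUENCE.fullmatch(part) is None:
            raise ValueError(f"adapter must be uppercase nucleotides: {part!r}")
    return parts[0], parts[1]
```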
Sample Material Processing
| Name | samp_mat_process |
| Description | A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed |
| Example | filtering of seawater, storing samples in ethanol |
| Reference | https://w3id.org/mixs/0000048 |
| Namespace | mixs:samp_mat_process |
Design Description
| Name | design_description |
| Description | The design of the library including details of how it was constructed. |
| Reference | # |
| Namespace | ei:design_description |
Library Selection Required
| Name | library_selection |
| Description | The method used to select for or against, enrich, or screen the material being sequenced. |
| Example | RANDOM PCR |
| Reference | # |
| Namespace | ei:library_selection |
| Allowed Values | 5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified |
Library source Required
| Name | library_source |
| Description | The type of source material that is being sequenced. |
| Example | GENOMIC |
| Reference | # |
| Namespace | ei:library_source |
| Allowed Values | GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA |
Library strategy Required
| Name | library_strategy |
| Description | The sequencing technique intended for this library. |
| Example | RNA-Seq |
| Reference | # |
| Namespace | ei:library_strategy |
| Allowed Values | AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | https://w3id.org/mixs/0000016 |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ontology:sequencing_id |
Sequencing Platform Name Required
| Name | sequencing_platform_name |
| Description | The name of the sequencing platform used for the experiment. |
| Example | PacBio |
| Reference | http://purl.obolibrary.org/obo/NCIT_C172274 |
| Namespace | ontology:sequencing_platform_name |
Sequencing Instrument Model Required
| Name | sequencing_instrument_model |
| Description | The instrument model used for sequencing. Models differ in throughput, read length, error rate, and suitability for particular applications. |
| Example | Illumina NovaSeq 6000 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Namespace | ontology:sequencing_instrument_model |
| Allowed Values | 454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100 |
Library Layout Required
| Name | lib_layout |
| Description | Specifies whether reads are single, paired, or in another configuration. |
| Example | Paired |
| Reference | https://w3id.org/mixs/0000111 |
| Namespace | mixs:lib_layout |
| Allowed Values | Other Paired Single Vector |
UMI Barcode Read
| Name | umi_barcode_read |
| Description | The type of read that contains the Unique Molecular Identifier (UMI) barcode. |
| Example | index2 |
| Reference | # |
| Namespace | ei:umi_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
UMI Barcode Offset
| Name | umi_barcode_offset |
| Description | The offset within the read at which the Unique Molecular Identifier (UMI) barcode begins. |
| Example | 0 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_offset |
UMI Barcode Size
| Name | umi_barcode_size |
| Description | The length, in bases, of the Unique Molecular Identifier (UMI) barcode. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_size |
Cell Barcode Read
| Name | cell_barcode_read |
| Description | The type of read that contains the cell barcode. |
| Example | index1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010203 |
| Namespace | ontology:cell_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
Cell Barcode Offset
| Name | cell_barcode_offset |
| Description | The offset within the read at which the cell barcode begins. |
| Example | 10 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010204 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_offset |
Cell Barcode Size
| Name | cell_barcode_size |
| Description | The length, in bases, of the cell barcode. |
| Example | 0 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010205 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_size |
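The read, offset and size fields together describe where each barcode sits in a read. The Python sketch below shows how a consumer might slice barcodes out of a read under two assumptions: offsets are 0-based, and sizes are lengths in bases. The `BarcodeSpec` class and the layout values (a 16 bp cell barcode followed by a 12 bp UMI on read1) are hypothetical and are not taken from this schema.

```python
from dataclasses import dataclass


@dataclass
class BarcodeSpec:
    """Where a barcode sits: which read, its start offset and its size."""
    read: str    # one of: index1, index2, read1, read2
    offset: int  # 0-based start position (assumption)
    size: int    # length in bases


def extract_barcode(reads: dict[str, str], spec: BarcodeSpec) -> str:
    """Slice the barcode described by spec out of the matching read sequence."""
    sequence = reads[spec.read]
    end = spec.offset + spec.size
    if end > len(sequence):
        raise ValueError(
            f"{spec.read} is too short for offset {spec.offset} + size {spec.size}"
        )
    return sequence[spec.offset:end]


# Hypothetical layout: 16 bp cell barcode, then a 12 bp UMI, both on read1.
cell_spec = BarcodeSpec(read="read1", offset=0, size=16)
umi_spec = BarcodeSpec(read="read1", offset=16, size=12)

reads = {"read1": "AAACCCAAGAAACACT" + "GCTAGCTAGCTA" + "TTTTTTTTTT"}
print(extract_barcode(reads, cell_spec))  # AAACCCAAGAAACACT
print(extract_barcode(reads, umi_spec))   # GCTAGCTAGCTA
```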
cDNA Read Required
| Name | cdna_read |
| Description | The type of read that contains the Complementary DNA (cDNA) sequence. |
| Example | read1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010195 |
| Namespace | ontology:cdna_read |
| Allowed Values | index1 index2 read1 read2 |
cDNA Read Offset
| Name | cdna_read_offset |
| Description | The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences. |
| Example | 6 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010201 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_offset |
cDNA Read Size
| Name | cdna_read_size |
| Description | The length, in bases, of the Complementary DNA (cDNA) read. |
| Example | 75 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010202 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_size |
Library Size
| Name | lib_size |
| Description | Total number of clones in the library prepared for the project |
| Example | 50 |
| Reference | https://w3id.org/mixs/0000039 |
| Regex | ^\d+$ |
| Namespace | mixs:lib_size |
Completeness Score
| Name | compl_score |
| Description | Completeness score is typically based on either the fraction of markers found compared to a database, or the percentage of a genome found compared to a closely related reference genome. High-quality drafts should have a completeness score >90%, medium-quality drafts >50%, and low-quality drafts <50%. |
| Example | med;60% |
| Reference | https://w3id.org/mixs/0000069 |
| Namespace | mixs:compl_score |
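The example value combines a quality label and a percentage, separated by a semicolon. This small Python sketch derives such a value from a completeness percentage using the thresholds in the description. Three parts are assumptions: the labels `high` and `low` (chosen by analogy with `med`), treating exactly 50% as low (the description leaves that boundary open), and the helper name `completeness_score`.

```python
def completeness_score(percent: float) -> str:
    """Format a completeness percentage as '<category>;<percent>%'.

    Thresholds follow the field description: >90% high, >50% medium,
    otherwise low. Exactly 50% is treated as low (an assumption).
    """
    if not 0 <= percent <= 100:
        raise ValueError("completeness must be between 0 and 100")
    if percent > 90:
        category = "high"
    elif percent > 50:
        category = "med"
    else:
        category = "low"
    # The 'g' format drops a trailing ".0", so 60.0 is written as "60".
    return f"{category};{percent:g}%"


print(completeness_score(60))    # med;60%
print(completeness_score(95.5))  # high;95.5%
```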
Library Reads Sequenced
| Name | lib_reads_seq |
| Description | Total number of clones sequenced from the library |
| Example | 20 |
| Reference | https://w3id.org/mixs/0000040 |
| Regex | ^\d+$ |
| Namespace | mixs:lib_reads_seq |
Number of Contigs
| Name | number_contig |
| Description | Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG |
| Example | 40 |
| Reference | https://w3id.org/mixs/0000060 |
| Namespace | mixs:number_contig |
Number of Replicons
| Name | num_replicons |
| Description | Reports the number of replicons in the nuclear genome of a eukaryote or in the genome of a bacterium or archaeon, or the number of segments in a segmented virus. For eukaryotes, this always refers to the haploid chromosome count. |
| Example | 2 |
| Reference | https://w3id.org/mixs/0000022 |
| Regex | ^\d+$ |
| Namespace | mixs:num_replicons |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File Derived From
| Name | file_derived_from |
| Description | The name of the file that was used to generate the analysis derived data. |
| Example | file1_sequencing.json |
| Reference | # |
| Namespace | ei:file_derived_from |
Inferred Cell Type
| Name | inferred_cell_type |
| Description | Post analysis cell type or identity declaration based on expression profile or known gene function identified by the performer. |
| Example | type II bipolar neuron |
| Reference | # |
| Namespace | ei:inferred_cell_type |
Post Analysis Cell Well Quality
| Name | post_analysis_cell_well_quality |
| Description | Performer defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed. |
| Example | Pass |
| Reference | # |
| Namespace | ei:post_analysis_cell_well_quality |
| Allowed Values | Fail Pass |
Other Derived Cell Attributes
| Name | other_derived_cell_attributes |
| Description | Any other cell-level measurement or annotation resulting from the analysis. |
| Example | Cluster |
| Reference | # |
| Namespace | ei:other_derived_cell_attributes |
| Allowed Values | Cluster Count Gene UMI tSNE coordinates |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Reference Genome
| Name | reference_genome |
| Description | Indicate the version and include a stable link to the genome data (or attach the genome FASTA file). |
| Example | GRCh38, https://example.org/grch38.fa |
| Reference | # |
| Namespace | ei:reference_genome |
Genome Annotation
| Name | genome_annotation |
| Description | Indicate the version and include a stable link. Also indicate whether any modification has been applied to the original annotation (e.g. 3' UTR extension), and include the modified annotation file used in the analysis. |
| Example | Ensembl v101, https://example.org/ensembl_v101.gtf |
| Reference | # |
| Namespace | ei:genome_annotation |
Annotation Filtering
| Name | annotation_filtering |
| Description | Indicate which features were filtered (e.g. protein-coding genes, pseudogenes, TCRs). |
| Example | Filtered to include only protein-coding genes |
| Reference | # |
| Namespace | ei:annotation_filtering |
Genes vs Exons
| Name | genes_vs_exons |
| Description | Whether quantification used whole-gene intervals or exons. |
| Example | Exon quantification |
| Reference | # |
| Namespace | ei:genes_vs_exons |
Library Structure
| Name | library_structure |
| Description | The structure of the sequencing library, described in seqspec format. |
| Example | Single-cell 3' library |
| Reference | # |
| Namespace | ei:library_structure |
Mapping and Demultiplexing Software
| Name | mapping_and_demultiplexing_software |
| Description | Software (with version) used for read and UMI mapping and demultiplexing. |
| Example | Cell Ranger 6.0.0 |
| Reference | # |
| Namespace | ei:mapping_and_demultiplexing_software |
Read Mapping Statistics
| Name | read_mapping_statistics |
| Description | Statistics of the Reads or Unique Molecular Identifier (UMI). |
| Example | 80% reads mapped to reference |
| Reference | # |
| Namespace | ei:read_mapping_statistics |
Sequencing Saturation
| Name | sequencing_saturation |
| Description | Sequencing saturation achieved. Expected values depend on the number of cells recovered (not targeted) and on the technology. |
| Example | 95% sequencing saturation |
| Reference | # |
| Namespace | ei:sequencing_saturation |
UMIs or Barcode Distribution QC
| Name | umis_barcode_distribution_qc |
| Description | Distribution of Unique Molecular Identifiers (UMIs) per barcode, and the threshold applied. |
| Example | Threshold: 10 UMIs per barcode |
| Reference | # |
| Namespace | ei:umis_barcode_distribution_qc |
Cell or Non-Cell Filtering Strategy
| Name | cell_non_cell_filtering_strategy |
| Description | Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells. |
| Example | Threshold: 5 UMIs for cell detection |
| Reference | # |
| Namespace | ei:cell_non_cell_filtering_strategy |
Other Quality Filters Applied
| Name | other_quality_filters_applied |
| Description | Cells/nuclei discarded based on % mitochondrial reads, % rRNA reads, etc. |
| Example | Cells with >20% mitochondrial reads discarded |
| Reference | # |
| Namespace | ei:other_quality_filters_applied |
Ambient RNA QC
| Name | ambient_rna_qc |
| Description | Percentage of UMIs in background cell barcodes, and the algorithm (if any) used to remove ambient RNA. |
| Example | Ambient RNA removed if >5% UMIs in background barcodes |
| Reference | # |
| Namespace | ei:ambient_rna_qc |
Predicted Doublet Rate QC
| Name | predicted_doublet_rate_qc |
| Description | Predicted doublet rate, which depends on the number of cells recovered (not targeted) and on the technology. |
| Example | Predicted doublet rate: 1.5% |
| Reference | # |
| Namespace | ei:predicted_doublet_rate_qc |
Individual Organism SNP Demultiplexing
| Name | individual_organism_snp_demultiplexing |
| Description | If carried out, report the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used. |
| Example | SNP UMAP embedding using CellSNP |
| Reference | # |
| Namespace | ei:individual_organism_snp_demultiplexing |
Assembly Name
| Name | assembly_name |
| Description | Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community |
| Example | JCVI_ISG_i3_1.0 |
| Reference | https://w3id.org/mixs/0000057 |
| Namespace | mixs:assembly_name |
Extrachromosomal Elements
| Name | extrachrom_elements |
| Description | Number of extrachromosomal elements, such as plasmids of significant phenotypic consequence (e.g. those that determine virulence or antibiotic resistance), megaplasmids, or other plasmids (Borrelia has 15+ plasmids). |
| Example | 5 |
| Reference | https://w3id.org/mixs/0000023 |
| Regex | ^\d+$ |
| Namespace | mixs:extrachrom_elements |
Assembly Quality
| Name | assembly_qual |
| Description | The assembly quality category is based on sets of criteria outlined for each assembly quality category. |
| Example | High-quality draft genome |
| Reference | https://w3id.org/mixs/0000056 |
| Namespace | mixs:assembly_qual |
| Allowed Values | Finished genome Genome fragment(s) High-quality draft genome Low-quality draft genome Medium-quality draft genome |
Assembly Software
| Name | assembly_software |
| Description | Tool(s) used for assembly, including version number and parameters |
| Example | metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise |
| Reference | https://w3id.org/mixs/0000058 |
| Namespace | mixs:assembly_software |
Annotation
| Name | annot |
| Description | Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter |
| Example | prokka |
| Reference | https://w3id.org/mixs/0000059 |
| Namespace | mixs:annot |
Feature Prediction
| Name | feat_pred |
| Description | Method used to predict UViG features such as ORFs and integration sites, given as tool;version;parameters. |
| Example | Prodigal;2.6.3;default parameters |
| Reference | https://w3id.org/mixs/0000061 |
| Regex | ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$ |
| Namespace | mixs:feat_pred |
Completeness Software
| Name | compl_software |
| Description | Tool(s) used for the completeness estimate, e.g. CheckM, anvi'o, BUSCO. |
| Example | checkm |
| Reference | https://w3id.org/mixs/0000070 |
| Namespace | mixs:compl_software |
Similarity Search Method
| Name | sim_search_meth |
| Description | Tool used to compare ORFs with a database, along with the version and cutoffs used. |
| Example | HMMER3;3.1b2;hmmsearch, cutoff of 50 on score |
| Reference | https://w3id.org/mixs/0000063 |
| Regex | ^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+);([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$ |
| Namespace | mixs:sim_search_meth |
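The feat_pred and sim_search_meth fields share one pattern: three non-empty parts separated by semicolons, in the form tool;version;parameters. The Python sketch below validates a value and splits it into those parts. It assumes anything after the second semicolon belongs to the parameters. The helper name `parse_tool_entry` is hypothetical.

```python
import re

# Pattern shared by the feat_pred and sim_search_meth field definitions.
TOOL_VERSION_PARAMS = re.compile(
    r"^([^\s-]{1,2}|[^\s-]+.+[^\s-]+);"
    r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+);"
    r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)$"
)


def parse_tool_entry(value: str) -> tuple[str, str, str]:
    """Validate a 'tool;version;parameters' value and split it into its parts."""
    if not TOOL_VERSION_PARAMS.fullmatch(value):
        raise ValueError(f"not in tool;version;parameters form: {value!r}")
    # Split on the first two semicolons only; any later ones stay in parameters.
    tool, version, parameters = value.split(";", 2)
    return tool, version, parameters


print(parse_tool_entry("Prodigal;2.6.3;default parameters"))
print(parse_tool_entry("HMMER3;3.1b2;hmmsearch, cutoff of 50 on score"))
```

A value with fewer than three parts, such as `Prodigal;2.6.3`, raises `ValueError`.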
Relevant Standard Operating Procedures
| Name | sop |
| Description | Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences |
| Example | http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/ |
| Reference | https://w3id.org/mixs/0000090 |
| Namespace | mixs:sop |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Clustering Algorithm and Version
| Name | clustering_algorithm_and_version |
| Description | Clustering algorithm and version used, if the data were compared or integrated with existing datasets. |
| Example | Louvain 0.8.0 |
| Reference | # |
| Namespace | ei:clustering_algorithm_and_version |
Clustering Parameters
| Name | clustering_parameters |
| Description | Clustering parameters used, if the data were compared or integrated with existing datasets. |
| Example | Resolution: 0.6, K-nearest neighbors: 10 |
| Reference | # |
| Namespace | ei:clustering_parameters |
Integration/Batch Correction
| Name | integration_batch_correction |
| Description | Integration or batch-correction method used, if the data were compared or integrated with existing datasets. |
| Example | Harmony v1.0 |
| Reference | # |
| Namespace | ei:integration_batch_correction |
Source Code
| Name | source_code |
| Description | Location of any newly developed code or software used in the processing and downstream analysis of the dataset. |
| Example | Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization. |
| Reference | # |
| Namespace | ei:source_code |
UMI Count Matrix
| Name | umi_count_matrix |
| Description | Gene x cell matrix with UMI counts for each gene in each cell. |
| Example | The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv. |
| Reference | # |
| Namespace | ei:umi_count_matrix |
Ensembl IDs
| Name | ensembl_ids |
| Description | Gene or transcript names should be listed as Ensembl (or other standardized ID), with gene short names in metadata. |
| Example | ENSG00000139618 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:ensembl_ids |
Functional Gene Annotations
| Name | functional_gene_annotations |
| Description | Any functional annotation generated/used (gene names, GOs, structural domains, etc.). |
| Example | Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding). |
| Reference | # |
| Namespace | ei:functional_gene_annotations |
Protein Models
| Name | protein_models |
| Description | FASTA file with (or stable link to) the predicted proteins associated with genes in the UMI count matrix, with matching IDs. |
| Example | The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID. |
| Reference | # |
| Namespace | ei:protein_models |
Cell Metadata
| Name | cell_metadata |
| Description | Table mapping cell IDs to cluster/cell type/broad cell type annotations. |
| Example | Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv. |
| Reference | # |
| Namespace | ei:cell_metadata |
Cluster-Level Normalised Expression Tables
| Name | cluster_level_normalised_expression_tables |
| Description | Expression tables that show normalised gene expression at the cluster or cell-type level. |
| Example | Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv. |
| Reference | # |
| Namespace | ei:cluster_level_normalised_expression_tables |
Other Resource Files
| Name | other_resource_files |
| Description | Any additional files necessary to reuse and interpret the data, e.g. barcode information for complex, serial multiplexing protocols (ClickTags). |
| Example | Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv. |
| Reference | # |
| Namespace | ei:other_resource_files |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:sequencing_id |
Read 1 File Required
| Name | read_1_file |
| Description | The name or accession of the file that contains read 1. |
| Example | file1_r1.fastq.gz |
| Reference | # |
| Namespace | ei:read_1_file |
Read 2 File
| Name | read_2_file |
| Description | The name or accession of the file that contains read 2. |
| Example | file2_r2.fastq.gz |
| Reference | # |
| Namespace | ei:read_2_file |
Index 1 File
| Name | index_1_file |
| Description | The name of the file that contains index 1. |
| Example | file1_i1.fastq.gz |
| Reference | # |
| Namespace | ei:index_1_file |
Index 2 File
| Name | index_2_file |
| Description | The name of the file that contains index 2. |
| Example | file2_i2.fastq.gz |
| Reference | # |
| Namespace | ei:index_2_file |
Read 1 Checksum Required
| Name | read_1_file_checksum |
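| Description | Result of a hash function calculated on the contents of the read 1 file, used to verify file integrity. Commonly used algorithms include MD5 and SHA-1, and multiple checksums should be separated by a comma (,). Note that the pattern for this field accepts only a single 32-character lowercase MD5 digest. |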
| Example | f8d29e41a73b5c02de9a6fb314e7c8ad |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_1_file_checksum |
Read 2 Checksum
| Name | read_2_file_checksum |
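| Description | Result of a hash function calculated on the contents of the read 2 file, used to verify file integrity. Commonly used algorithms include MD5 and SHA-1, and multiple checksums should be separated by a comma (,). Note that the pattern for this field accepts only a single 32-character lowercase MD5 digest. |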
| Example | a3f4c1b29d8e57fa41b02de6c7f9ab83 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_2_file_checksum |
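The checksum fields accept a 32-character lowercase hexadecimal MD5 digest. This minimal Python sketch uses the standard `hashlib` module to compute one for a FASTQ file. It reads the file in chunks so large files are not loaded into memory at once. The helper names `md5_of_file` and `is_valid_md5` are hypothetical.

```python
import hashlib
import re

# Pattern copied from the read_1_file_checksum / read_2_file_checksum fields.
MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # always 32 lowercase hexadecimal characters


def is_valid_md5(value: str) -> bool:
    """Return True if value has the shape the checksum fields expect."""
    return MD5_PATTERN.fullmatch(value) is not None
```

For example, `md5_of_file("file1_r1.fastq.gz")` would produce the value for read_1_file_checksum. An uppercase digest from another tool should be lowercased before submission, since the pattern rejects uppercase letters.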
White List Barcode File
| Name | white_list_barcode_file |
| Description | A file containing the known cell barcodes in the dataset. |
| Example | barcodes.tsv |
| Reference | # |
| Namespace | ei:white_list_barcode_file |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Expression Data Process Setting ID Required
| Name | expression_data_process_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_process_setting_id |
Matrix Type
| Name | matrix_type |
| Description | The type of values stored in the expression matrix (e.g. raw counts, normalised, or scaled values). |
| Example | raw_counts |
| Reference | # |
| Namespace | ei:matrix_type |
| Allowed Values | imputed log1p normalised pseudobulk raw_counts scaled |
Reference Genome Required
| Name | reference_genome |
| Description | Stable URL(s) of the associated reference genome. Multiple URLs are separated by a pipe character. |
| Example | https://reference-genome-example.com |
| Reference | # |
| Regex | ^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$ |
| Namespace | ei:reference_genome |
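This field accepts one or more http(s) or ftp URLs, separated by a pipe character. The Python sketch below validates such a value and splits it into individual URLs. The helper name `split_reference_urls` and the URLs in the usage line are hypothetical.

```python
import re

# Pattern copied from the reference_genome field definition.
REFERENCE_GENOME_PATTERN = re.compile(
    r"^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$"
)


def split_reference_urls(value: str) -> list[str]:
    """Validate a reference_genome value and return its individual URLs."""
    if not REFERENCE_GENOME_PATTERN.fullmatch(value):
        raise ValueError(f"not a pipe-separated list of URLs: {value!r}")
    return value.split("|")


print(split_reference_urls("https://example.org/genome.fa|ftp://example.org/annotation.gtf"))
```

A bare assembly name such as `GRCh38` is rejected, because every entry must be a URL.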
Annotation Version
| Name | annotation_version |
| Description | The annotation version of the associated reference genome |
| Example | GENCODE v44 |
| Reference | # |
| Namespace | ei:annotation_version |
Normalisation Method
| Name | normalisation_method |
| Description | The normalisation method applied to the expression data. |
| Example | Log Normalisation |
| Reference | # |
| Namespace | ei:normalisation_method |
| Allowed Values | Library Size Normalisation Log Normalisation SCNorm SCTransform scran |
Highly Variable Gene Selection (HVG)
| Name | highly_variable_gene_selection |
| Description | Method used to select highly variable genes (HVGs) and the number selected. |
| Example | seurat_v3, n=2000 |
| Reference | # |
| Namespace | ei:highly_variable_gene_selection |
Dimensionality Reduction
| Name | dimensionality_reduction |
| Description | Method used to reduce dimensionality in the expression data |
| Example | PCA |
| Reference | # |
| Namespace | ei:dimensionality_reduction |
| Allowed Values | Diffusion Map ICA NMF PCA UMAP t-SNE |
Number of Nearest Neighbours
| Name | n_neighbours |
| Description | Number of nearest neighbours used to calculate cluster membership |
| Example | pca:50 |
| Reference | # |
| Namespace | ei:n_neighbours |
Clustering Algorithm
| Name | clustering_algorithm |
| Description | Algorithm used to create clusters |
| Reference | # |
| Namespace | ei:clustering_algorithm |
Clustering Resolution
| Name | clustering_resolution |
| Description | Resolution parameter passed to the clustering algorithm. |
| Example | 2.5 |
| Reference | # |
| Regex | ^([0-9]*[.])?[0-9]+$ |
| Namespace | ei:clustering_resolution |
Clustering Distance Metric
| Name | clustering_distance_metric |
| Description | Metric used to calculate the distance between points. |
| Example | cosine |
| Reference | # |
| Namespace | ei:clustering_distance_metric |
| Allowed Values | cosine euclidean hamming jaccard manhattan mahalanobis |
Software Versions
| Name | software_versions |
| Description | Primary software packages used for analysis |
| Reference | # |
| Namespace | ei:software_versions |
Cell Type Annotation
| Name | cell-type annotation |
| Description | Tools and databases used for cell-type annotation. |
| Reference | # |
| Namespace | ei:cell-type annotation |
Generated by Pipeline
| Name | generated_by_pipeline |
| Description | URL of the deposited pipeline used to create this data |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | ei:generated_by_pipeline |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | expression_data_file_id |
| Description | A unique alphanumeric identifier for the expression data file |
| Example | EXPFILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric identifier for library preparation |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Expression Data Process Setting ID Required
| Name | expression_data_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_setting_id |
File Name Required
| Name | expression_data_file |
| Description | Expression data file name |
| Example | exp_file.csv |
| Reference | # |
| Namespace | ei:expression_data_file |
File MD5 Checksum Required
| Name | expression_data_file_checksum |
| Description | The MD5 checksum calculated for this file. |
| Example | 9e4b7a23f6c1d0ab85f29c47e3d8a610 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:expression_data_file_checksum |
File Format Required
| Name | expression_data_file_format |
| Description | The format of the expression file, such as h5ad or rds |
| Example | csv |
| Reference | # |
| Namespace | ei:expression_data_file_format |
| Allowed Values | csv h5ad loom mtx rds |
Number of Cells
| Name | n_cells |
| Description | The number of cells represented in the expression data |
| Example | 4 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_cells |
Number of Genes
| Name | n_genes |
| Description | The number of genes represented in the expression data |
| Example | 50 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_genes |
File Size in Bytes
| Name | file_size_bytes |
| Description | Size of the file recorded in bytes |
| Example | 90 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:file_size_bytes |
Date Generated
| Name | date_generated |
| Description | Approximate date this expression data was generated |
| Example | 2024-10-14 |
| Reference | # |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | ei:date_generated |
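The date pattern checks only the shape YYYY-MM-DD. A value such as `2024-02-30` therefore passes it. The minimal Python sketch below adds a calendar check using the standard `datetime` module. The helper name `is_valid_date` is hypothetical.

```python
import re
from datetime import date

# Pattern copied from the date_generated field; re.ASCII limits \d to 0-9.
DATE_PATTERN = re.compile(
    r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", re.ASCII
)


def is_valid_date(value: str) -> bool:
    """Check the YYYY-MM-DD pattern, then confirm the date exists on the calendar."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        # For example, 30 February or 29 February in a non-leap year.
        return False
    return True


for candidate in ["2024-10-14", "2024-02-30", "2024-13-01"]:
    print(candidate, is_valid_date(candidate))
```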
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Title
| Name | title |
| Description | A name given to the study or project. Project title should be fewer than 30 words, such as a title of a grant proposal or a publication. |
| Example | Study of single cells in the human body |
| Reference | http://purl.org/dc/terms/title |
| Namespace | dcterms:title |
Description
| Name | description |
| Description | A detailed description of the project which includes research goals and experimental approach. Project description should be fewer than 300 words, such as an abstract from a grant application or publication. |
| Example | This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine. |
| Reference | http://purl.org/dc/terms/description |
| Namespace | dcterms:description |
Workflow
| Name | workflow |
| Description | The workflow or protocol followed during the study. |
| Example | Laser microdissection |
| Reference | # |
| Namespace | ei:workflow |
| Allowed Values | Laser microdissection Laser microdissection, Culturing Laser microdissection, Culturing, Sequencing Laser microdissection, Sequencing Microfluidics, Facs, Culturing Microfluidics, Facs, Culturing, Sequencing Microfluidics, Facs, Sequencing Spatial Transcriptomics |
Technology Required
| Name | technology |
| Description | The sorting or visualisation technology used. |
| Example | Vizgen |
| Reference | # |
| Namespace | ei:technology |
Licence
| Name | licence |
| Description | Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common licenses include Creative Commons (e.g., CC BY 4.0), which require attribution to the original authors when the data is cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Orcid ID
| Name | orcid_id |
| Description | A 16-character identifier (15 digits plus a final check character, which may be X) that uniquely identifies a researcher. |
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |
| Namespace | ei:orcid_id |
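The regex above only checks the shape of an ORCID iD. ORCID iDs also end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a value can match the regex and still be mistyped. A minimal Python sketch of that extra check follows; the function names are illustrative and not part of this schema.

```python
import re

# Shape check copied from the orcid_id regex in this schema.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value has the ORCID shape and ends in the correct check character."""
    if ORCID_PATTERN.fullmatch(value) is None:
        return False
    digits = value.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```

The example value given for this field (0000-1234-5678-9012) also passes, because its final digit happens to be a valid check digit.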
First Name Required
| Name | givenName |
| Description | A first name (or given name) is the personal name given to an individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | A unique identifier used to send and receive electronic messages (emails) over the internet. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:affiliation |
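The contact fields above each carry a regex, and only some are marked Required. The sketch below applies those patterns to one contact record in Python. The `email` key is assumed from the `schema.org:email` namespace; the function and constant names are hypothetical.

```python
import re
from typing import Dict, List

# Patterns copied from the givenName, familyName, email and affiliation fields.
FIELD_PATTERNS = {
    "givenName": r"^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$",
    "familyName": r"^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$",
    "email": r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$",
    "affiliation": r"^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$",
}
REQUIRED_FIELDS = {"givenName", "familyName", "affiliation"}


def validate_contact(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors = []
    for field, pattern in FIELD_PATTERNS.items():
        value = record.get(field, "")
        if not value:
            # Email is not marked Required, so only flag missing required fields.
            if field in REQUIRED_FIELDS:
                errors.append(f"{field}: missing required value")
            continue
        if re.fullmatch(pattern, value) is None:
            errors.append(f"{field}: {value!r} does not match the schema pattern")
    return errors


contact = {
    "givenName": "Jane",
    "familyName": "Doe",
    "email": "jane.doe@example.com",
    "affiliation": "University of Liverpool",
}
print(validate_contact(contact))  # []
```

As written, these patterns reject apostrophes, digits and accented letters, so an affiliation such as "King's College London" or a given name such as "José" would be flagged.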
Funder
| Name | funder |
| Description | A person or organization that supports (sponsors) something through some kind of financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Regex | ^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$ |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier (usually from a recognized taxonomy database like NCBI Taxonomy) that corresponds to the organism’s scientific name. It must be accurately matched to the provided scientific_name to maintain consistency and traceability in biological records. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
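Both fields above require the scientific name and taxon ID to agree. Checking this properly needs a taxonomy service such as NCBI Taxonomy; the Python sketch below assumes a small, hypothetical local lookup table instead. The Salvelinus alpinus entry reuses the example pair from these fields, and 9606 and 3702 are the NCBI Taxonomy IDs for Homo sapiens and Arabidopsis thaliana.

```python
import re
from typing import Optional

# Hypothetical local lookup table (scientific_name -> NCBI Taxonomy ID).
# A real pipeline would query NCBI Taxonomy instead of hard-coding entries.
KNOWN_TAXA = {
    "Homo sapiens": 9606,
    "Arabidopsis thaliana": 3702,
    "Salvelinus alpinus": 8036,
}


def check_taxon(scientific_name: str, taxon_id: str) -> Optional[str]:
    """Return an error message, or None if the name and ID agree."""
    if re.fullmatch(r"[0-9]+", taxon_id) is None:
        return f"taxon_id {taxon_id!r} is not a whole number"
    expected = KNOWN_TAXA.get(scientific_name)
    if expected is None:
        return f"{scientific_name!r} is not in the lookup table; cannot verify"
    if expected != int(taxon_id):
        return f"{scientific_name!r} has taxon ID {expected}, not {taxon_id}"
    return None


print(check_taxon("Salvelinus alpinus", "8036"))  # None
print(check_taxon("Homo sapiens", "8036"))  # mismatch message
```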
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
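This field has no regex. A format check is a cheap first test before querying the archive. The Python sketch below assumes the BioSample prefix convention used by the INSDC archives (SAMN at NCBI, SAMEA or SAME at EMBL-EBI, SAMD at DDBJ, each followed by digits); confirm the convention against the archive documentation before relying on it.

```python
import re

# Assumed prefix convention: SAMN (NCBI), SAMEA or SAME (EMBL-EBI), SAMD (DDBJ),
# each followed by digits.
BIOSAMPLE_PATTERN = re.compile(r"SAM(?:N|EA?|D)\d+")


def looks_like_biosample(accession: str) -> bool:
    """Format check only: it does not confirm that the accession exists."""
    return BIOSAMPLE_PATTERN.fullmatch(accession.strip()) is not None


print(looks_like_biosample("SAMEA12907823"))  # True
print(looks_like_biosample("ERS123456"))  # False: not a SAM-prefixed accession
```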
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Protocol Name Required
| Name | protocol_name |
| Description | A descriptive name of the protocol used for single-cell sequencing. |
| Example | 10X Genomics Single Cell 3' Library Prep |
| Reference | # |
| Namespace | ei:protocol_name |
Dissociation Description Required
| Name | dissociation_description |
| Description | A free-text description of the process used to separate cells from tissues or cell aggregates. |
| Example | Tissue was enzymatically dissociated using collagenase for 30 minutes. |
| Reference | # |
| Namespace | ei:dissociation_description |
Enrichment Markers
| Name | enrichment_markers |
| Description | Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms. |
| Example | CD45 |
| Reference | # |
| Namespace | faang:enrichment_markers |
Isolation Kit
| Name | isolation_kit |
| Description | The kit used to isolate the cells. |
| Example | 10x Nuclei Isolation Kit |
| Reference | # |
| Namespace | ei:isolation_kit |
| Allowed Values | 10x Nuclei Isolation Kit 3' standard throughput kit Custom |
Literature Source Reference
| Name | literature_source_reference |
| Description | Reference to literature sources that describe the protocol or methods used. |
| Example | Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview' |
| Reference | # |
| Namespace | ei:literature_source_reference |
Protocols IO Reference
| Name | protocols_io_reference |
| Description | Reference link to protocols.io for additional details on the protocol. |
| Example | https://www.protocols.io/view/sample-protocol-b2ubqesn |
| Reference | # |
| Regex | ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)*)*$ |
| Namespace | ei:protocols_io_reference |
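The regex for this field accepts one URL or several URLs separated by " | ". Its `(?:[...]*)+` group is redundant (it matches the same strings as `[...]*`), and nested quantifiers of this kind are a known source of slow backtracking on inputs that fail to match. The Python sketch below splits the value on " | " and checks each part with a single-URL pattern adapted from the schema regex; it is intended to accept the same values, assuming no URL itself contains " | ". Names are hypothetical.

```python
import re
from typing import List

# Single-URL pattern adapted from the protocols_io_reference regex, with the
# "(?:[...]*)+" group written as the equivalent "[...]*".
URL_PATTERN = re.compile(
    r"https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}"
    r"[-a-zA-Z0-9()@:%_+.~#?&/=]*"
)


def parse_protocol_refs(value: str) -> List[str]:
    """Split a ' | '-separated value and check each URL; raise on bad entries."""
    urls = value.split(" | ")
    bad = [url for url in urls if URL_PATTERN.fullmatch(url) is None]
    if bad:
        raise ValueError(f"invalid URL(s): {bad}")
    return urls


refs = parse_protocol_refs(
    "https://www.protocols.io/view/sample-protocol-b2ubqesn"
    " | https://www.protocols.io/view/another-protocol"
)
print(len(refs))  # 2
```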
Workflowhub Sop Reference
| Name | workflow_hub_sop_reference |
| Description | Reference to the Standard Operating Procedure (SOP) in WorkflowHub. |
| Example | https://workflowhub.eu/works/12345 |
| Reference | # |
| Namespace | ei:workflow_hub_sop_reference |
Dissociation Protocol Method
| Name | dissociation_protocol_method |
| Description | The method used to dissociate tissues into single cells. |
| Example | Mechanical and enzymatic dissociation |
| Reference | # |
| Namespace | ei:dissociation_protocol_method |
Single Cell Quality Metric
| Name | single_cell_quality_metric |
| Description | Metrics used to assess the quality of single cells before sequencing. |
| Example | Cell viability percentage |
| Reference | # |
| Namespace | ei:single_cell_quality_metric |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the sample |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Suspension Type Required
| Name | suspension_type |
| Description | The type of suspension used to keep cells in solution during processing. |
| Example | Cell |
| Reference | # |
| Namespace | ei:suspension_type |
| Allowed Values | Cell Nuclei Protoplast |
Cell Count
| Name | cell_count |
| Description | A whole number giving the number of cells in the sequencing library. |
| Example | 10000 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cell_count |
Cell Number
| Name | cell_number |
| Description | The number of cells in the sequencing library, given as one of the ranges listed under Allowed Values. |
| Example | 101-10000 |
| Reference | # |
| Namespace | tol:cell_number |
| Allowed Values | 1 1000000+ 100001-500000 10001-50000 101-10000 11-50 2-10 500001-1000000 50001-100000 51-100 |
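Cell Count and Cell Number record the same quantity at different granularity, so one can be derived from the other. A minimal Python sketch that maps an integer count onto the ranges above follows. The value 1,000,000 falls in both "500001-1000000" and "1000000+"; assigning it to the former is an assumption.

```python
from typing import List, Tuple

# (inclusive upper bound, label) pairs taken from the cell_number allowed values.
CELL_NUMBER_BINS: List[Tuple[int, str]] = [
    (1, "1"),
    (10, "2-10"),
    (50, "11-50"),
    (100, "51-100"),
    (10_000, "101-10000"),
    (50_000, "10001-50000"),
    (100_000, "50001-100000"),
    (500_000, "100001-500000"),
    (1_000_000, "500001-1000000"),
]


def cell_number_bin(cell_count: int) -> str:
    """Map an integer cell_count onto the cell_number range it falls in."""
    if cell_count < 1:
        raise ValueError("cell_count must be at least 1")
    for upper, label in CELL_NUMBER_BINS:
        if cell_count <= upper:
            return label
    return "1000000+"  # anything above one million


print(cell_number_bin(10000))  # 101-10000
```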
Cell Viability
| Name | cell_viability |
| Description | The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis. |
| Example | 95 |
| Reference | # |
| Namespace | ei:cell_viability |
Cell Viability Assessment Method
| Name | cell_viability_assessment_method |
| Description | The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques. |
| Example | Trypan Blue Exclusion |
| Reference | # |
| Namespace | ei:cell_viability_assessment_method |
Cell Size
| Name | cell_size |
| Description | The size of the cell, typically measured in micrometres. |
| Example | 10 |
| Reference | # |
| Namespace | ei:cell_size |
Suspension Volume (µL)
| Name | suspension_volume_µl |
| Description | The volume of the cell suspension in microlitres (µL). |
| Example | 100 |
| Reference | # |
| Namespace | ei:suspension_volume_µl |
Suspension Concentration Cells Per µL
| Name | suspension_concentration_cells_per_µl |
| Description | The concentration of cells in the suspension, in cells per microlitre (cells/µL). |
| Example | 1000 |
| Reference | # |
| Namespace | ei:suspension_concentration_cells_per_µl |
Suspension Dilution
| Name | suspension_dilution |
| Description | The dilution factor of the cell suspension. |
| Example | 1:10 |
| Reference | # |
| Namespace | ei:suspension_dilution |
Loading Volume (µL)
| Name | loading_volume_µl |
| Description | The volume of the cell suspension loaded into the single-cell RNA-sequencing system for analysis. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:loading_volume_µl |
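The suspension fields above combine into an estimate of how many cells were loaded: concentration (cells/µL) × loading volume (µL). This is the number loaded, not the number recovered. The Python sketch below also applies the suspension dilution, under two assumptions that the schema does not state: that the recorded concentration is measured before dilution, and that "1:N" means one part in N total. Function names are hypothetical.

```python
def dilution_factor(dilution: str) -> float:
    """Parse '1:N' as a factor of N (assumed meaning: 1 part in N total)."""
    part, total = (float(x) for x in dilution.split(":"))
    if part <= 0 or total < part:
        raise ValueError(f"unexpected dilution {dilution!r}")
    return total / part


def estimate_cells_loaded(
    concentration_cells_per_ul: float,
    loading_volume_ul: float,
    dilution: str = "1:1",
) -> float:
    """Cells loaded = (concentration after dilution) x (volume loaded)."""
    return concentration_cells_per_ul / dilution_factor(dilution) * loading_volume_ul


print(estimate_cells_loaded(1000, 10))  # 10000.0
print(estimate_cells_loaded(1000, 10, "1:10"))  # 1000.0
```

With the example values in these fields (1000 cells/µL and 10 µL, no dilution) the estimate is 10,000 cells.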
Suspension Dilution Buffer
| Name | suspension_dilution_buffer |
| Description | A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing. |
| Example | PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin) |
| Reference | # |
| Namespace | ei:suspension_dilution_buffer |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the library preparation. |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Library Preparation Kit Required
| Name | library_prep_kit |
| Description | Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids |
| Example | 10X Genomics Single Cell 3' v3 |
| Reference | https://w3id.org/mixs/0001145 |
| Namespace | mixs:library_prep_kit |
Library Preparation Kit Version Required
| Name | library_prep_kit_version |
| Description | The version number of the library preparation kit used for sequencing. |
| Example | 2 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ontology:library_prep_kit_version |
Amplification Method
| Name | amplification_method |
| Description | The method used to amplify the Complementary DNA (cDNA). |
| Example | PCR |
| Reference | # |
| Namespace | ei:amplification_method |
cDNA Amplification Cycles
| Name | cdna_amplification_cycles |
| Description | The number of cycles used during the Complementary DNA (cDNA) amplification process. |
| Example | 12 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cdna_amplification_cycles |
Average Size Distribution
| Name | average_size_distribution |
| Description | The average fragment length of the final library in base pairs (bp), indicating the quality and suitability of the library for sequencing. |
| Example | 350 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:average_size_distribution |
Library Construction Method
| Name | lib_construction_method |
| Description | The library construction method (including version) that was used. |
| Example | Smart-Seq2 |
| Reference | # |
| Namespace | ei:lib_construction_method |
Input Molecule
| Name | input_molecule |
| Description | The specific fraction of biological macromolecule from which the sequencing library is derived. |
| Example | RNA |
| Reference | # |
| Namespace | ei:input_molecule |
Primer
Primeness Required
| Name | primeness |
| Description | The end from which the molecule was sequenced. |
| Example | 5' |
| Reference | # |
| Namespace | ei:primeness |
| Allowed Values | 3' 5' Both |
End Bias
| Name | end_bias |
| Description | The end bias of the library. |
| Example | 3 |
| Reference | # |
| Namespace | ei:end_bias |
| Allowed Values | 3 5 |
Library Strand
| Name | library_strand |
| Description | The Complementary DNA (cDNA) strand of the library from which the reads are derived: sense (first), antisense (second), both, or none (unstranded). |
| Example | Antisense |
| Reference | # |
| Namespace | ei:library_strand |
| Allowed Values | Antisense Both Sense Unstranded |
Spike In Required
| Name | spike_in |
| Description | State whether spike-in RNA was used. Spike-ins are external RNA added to the sample as a control to assess technical variability and aid normalisation in RNA-sequencing. |
| Example | Yes |
| Reference | # |
| Namespace | ei:spike_in |
| Allowed Values | No Yes |
Spike Type
| Name | spike_type |
| Description | The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA. |
| Example | Synthetic RNA |
| Reference | # |
| Namespace | ei:spike_type |
Spike In Dilution Or Concentration
| Name | spike_in_dilution_or_concentration |
| Description | The final concentration or dilution (for commercial sets) of the spike-in mix. |
| Example | 1:1000 |
| Reference | # |
| Namespace | ei:spike_in_dilution_or_concentration |
i5 Index Required
| Name | i5_index |
| Description | Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing. |
| Example | ATCACG |
| Reference | # |
| Namespace | ei:i5_index |
i7 Index Required
| Name | i7_index |
| Description | Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs. |
| Example | CGATGT |
| Reference | # |
| Namespace | ei:i7_index |
Dual or Single Index Required
| Name | dual_single_index |
| Description | Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing. |
| Example | Dual |
| Reference | # |
| Namespace | ei:dual_single_index |
| Allowed Values | Dual Single |
i5 Sequence Required
| Name | i5_sequence |
| Description | The nucleotide sequence of the i5 index used in multiplexing during sequencing. |
| Example | ATCGTAGC |
| Reference | # |
| Namespace | ei:i5_sequence |
i7 Sequence Required
| Name | i7_sequence |
| Description | The specific nucleotide sequence of the i7 index used for a sample. |
| Example | TGCATGCA |
| Reference | # |
| Namespace | ei:i7_sequence |
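The index fields above should be consistent with each other: sequences should contain only nucleotide letters, and the number supplied should match Dual or Single. Depending on the instrument and chemistry, the i5 index may be read in the reverse-complement orientation, so it helps to know which orientation i5_sequence records. The Python sketch below checks consistency and provides a reverse complement; function names are hypothetical.

```python
import re
from typing import List, Optional

INDEX_PATTERN = re.compile(r"[ACGTN]+")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N maps to N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def check_indices(
    i5_sequence: Optional[str], i7_sequence: Optional[str], dual_single_index: str
) -> List[str]:
    """Cross-check the index sequences against the dual_single_index value."""
    errors = []
    present = []
    for name, seq in (("i5_sequence", i5_sequence), ("i7_sequence", i7_sequence)):
        if not seq:
            continue
        present.append(name)
        if INDEX_PATTERN.fullmatch(seq.upper()) is None:
            errors.append(f"{name}: {seq!r} contains characters other than A, C, G, T, N")
    if dual_single_index == "Dual" and len(present) != 2:
        errors.append("Dual indexing needs both i5 and i7 sequences")
    elif dual_single_index == "Single" and len(present) != 1:
        errors.append("Single indexing expects exactly one index sequence")
    return errors


print(check_indices("ATCGTAGC", "TGCATGCA", "Dual"))  # []
print(reverse_complement("ATCGTAGC"))  # GCTACGAT
```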
Plate ID
| Name | plate_id |
| Description | Identifier for the 96-well plate used in sample preparation. |
| Example | PLT001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:plate_id |
Well Row
| Name | well_row |
| Description | The row identifier in a 96-well plate indicating the sample's position. |
| Example | A |
| Reference | # |
| Namespace | ei:well_row |
Well Column
| Name | well_col |
| Description | The column identifier in a 96-well plate indicating the sample's position. |
| Example | 5 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:well_col |
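Plate ID, Well Row and Well Column together locate a sample on a 96-well plate (8 rows A-H by 12 columns). The Python sketch below validates the two well fields and combines them into a well name and a row-major index. The zero-padded form "A05" is a common convention, assumed here; function names are hypothetical.

```python
from typing import Union

ROWS = "ABCDEFGH"  # 96-well plate layout: rows A-H, columns 1-12
N_COLUMNS = 12


def well_name(well_row: str, well_col: Union[int, str]) -> str:
    """Combine well_row and well_col into a zero-padded name such as 'A05'."""
    row = well_row.strip().upper()
    col = int(well_col)
    if len(row) != 1 or row not in ROWS:
        raise ValueError(f"well_row {well_row!r} is not A-H")
    if not 1 <= col <= N_COLUMNS:
        raise ValueError(f"well_col {well_col!r} is not 1-12")
    return f"{row}{col:02d}"


def well_index(well_row: str, well_col: Union[int, str]) -> int:
    """Zero-based row-major index: A1 -> 0, A12 -> 11, B1 -> 12, H12 -> 95."""
    name = well_name(well_row, well_col)  # validates both parts
    return ROWS.index(name[0]) * N_COLUMNS + int(name[1:]) - 1


print(well_name("A", "5"))  # A05
print(well_index("H", 12))  # 95
```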
Cell Phenotype
| Name | cell_phenotype |
| Description | The cell marker used for Fluorescence-Activated Cell Sorting (FACS) of the cells. |
| Example | CD41- |
| Reference | # |
| Namespace | ei:cell_phenotype |
| Allowed Values | CD41+ CD41- |
Design description
| Name | design_description |
| Description | The design of the library including details of how it was constructed. |
| Reference | # |
| Namespace | ei:design_description |
Library selection Required
| Name | library_selection |
| Description | The method used to select for or against, enrich, or screen the material being sequenced. |
| Example | RANDOM PCR |
| Reference | # |
| Namespace | ei:library_selection |
| Allowed Values | 5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified |
Library source Required
| Name | library_source |
| Description | The type of source material that is being sequenced. |
| Example | GENOMIC |
| Reference | # |
| Namespace | ei:library_source |
| Allowed Values | GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA |
Library strategy Required
| Name | library_strategy |
| Description | The sequencing technique intended for this library. |
| Example | RNA-Seq |
| Reference | # |
| Namespace | ei:library_strategy |
| Allowed Values | AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | https://w3id.org/mixs/0000016 |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ontology:sequencing_id |
Sequencing Platform Name Required
| Name | sequencing_platform_name |
| Description | The name of the sequencing platform used for the experiment. |
| Example | Pacbio |
| Reference | http://purl.obolibrary.org/obo/NCIT_C172274 |
| Namespace | ontology:sequencing_platform_name |
Sequencing Instrument Model Required
| Name | sequencing_instrument_model |
| Description | This refers to the machine or platform used for sequencing, with variations in throughput, read lengths, error rates, and application suitability. |
| Example | Illumina NovaSeq 6000 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Namespace | ontology:sequencing_instrument_model |
| Allowed Values | 454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100 |
Library Layout Required
| Name | lib_layout |
| Description | Specify whether to expect single, paired, or other configuration of reads for sequencing |
| Example | Paired |
| Reference | https://w3id.org/mixs/0000111 |
| Namespace | mixs:lib_layout |
| Allowed Values | Other Paired Single Vector |
UMI Barcode Read
| Name | umi_barcode_read |
| Description | The type of read that contains the Unique Molecular Identifier (UMI) barcode. |
| Example | index2 |
| Reference | # |
| Namespace | ei:umi_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
UMI Barcode Offset
| Name | umi_barcode_offset |
| Description | The offset in sequence of the Unique Molecular Identifier (UMI) identifying barcode. |
| Example | 0 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_offset |
UMI Barcode Size
| Name | umi_barcode_size |
| Description | The size of the Unique Molecular Identifier (UMI) identifying barcode. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_size |
Cell Barcode Read
| Name | cell_barcode_read |
| Description | The type of read that contains the cell barcode. |
| Example | index1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010203 |
| Namespace | ontology:cell_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
Cell Barcode Offset
| Name | cell_barcode_offset |
| Description | The offset in sequence of the cell identifying barcode. |
| Example | 10 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010204 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_offset |
Cell Barcode Size
| Name | cell_barcode_size |
| Description | The size of the cell identifying barcode. |
| Example | 0 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010205 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_size |
cDNA Read Required
| Name | cdna_read |
| Description | The type of read that contains the Complementary DNA (cDNA) sequence. |
| Example | read1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010195 |
| Namespace | ontology:cdna_read |
| Allowed Values | index1 index2 read1 read2 |
cDNA Read Offset
| Name | cdna_read_offset |
| Description | The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences. |
| Example | 6 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010201 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_offset |
cDNA Read Size
| Name | cdna_read_size |
| Description | The size of the Complementary DNA (cDNA) read. |
| Example | 75 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010202 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_size |
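The read, offset and size fields above describe where each barcode and the cDNA sit within the reads. The Python sketch below slices a segment out of a read from those three values, assuming offsets are zero-based (as the UMI offset example of 0 suggests). The layout used (a 16-bp cell barcode followed by a 12-bp UMI in read1, cDNA in read2) resembles 10x Genomics 3' v3 but is only an illustration, and the reads are toy sequences.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadSegment:
    read: str    # index1, index2, read1 or read2 (the allowed values above)
    offset: int  # zero-based start position within that read (assumed)
    size: int    # number of bases


def extract(reads: Dict[str, str], segment: ReadSegment) -> str:
    """Return the bases covered by a segment, failing if it does not fit."""
    seq = reads[segment.read]
    end = segment.offset + segment.size
    if segment.size <= 0 or end > len(seq):
        raise ValueError(f"{segment} does not fit in {segment.read} of length {len(seq)}")
    return seq[segment.offset:end]


# Hypothetical layout: 16-bp cell barcode then 12-bp UMI in read1, cDNA in read2.
cell_barcode = ReadSegment("read1", 0, 16)
umi = ReadSegment("read1", 16, 12)
cdna = ReadSegment("read2", 0, 20)

reads = {
    "read1": "AAACCCAAGAAACACT" + "GGGTTTCCCAAA" + "TTTTTTTT",
    "read2": "ACGTACGTACGTACGTACGT",
}
print(extract(reads, cell_barcode), extract(reads, umi), extract(reads, cdna))
```

Rejecting a size of zero is deliberate: a segment of zero bases cannot identify a cell or molecule.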
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File Derived From
| Name | file_derived_from |
| Description | The name of the file that was used to generate the analysis-derived data. |
| Example | file1_sequencing.json |
| Reference | # |
| Namespace | ei:file_derived_from |
Inferred Cell Type
| Name | inferred_cell_type |
| Description | Post analysis cell type or identity declaration based on expression profile or known gene function identified by the performer. |
| Example | type II bipolar neuron |
| Reference | # |
| Namespace | ei:inferred_cell_type |
Post Analysis Cell Well Quality
| Name | post_analysis_cell_well_quality |
| Description | Performer defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed. |
| Example | Pass |
| Reference | # |
| Namespace | ei:post_analysis_cell_well_quality |
| Allowed Values | Fail Pass |
Other Derived Cell Attributes
| Name | other_derived_cell_attributes |
| Description | Any other cell-level measurement or annotation produced as a result of the analysis. |
| Example | Cluster |
| Reference | # |
| Namespace | ei:other_derived_cell_attributes |
| Allowed Values | Cluster Count Gene UMI tSNE coordinates |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Reference Genome
| Name | reference_genome |
| Description | Indicate version and include stable link to genome data (or attach genome fasta file). |
| Example | GRCh38, https://example.org/grch38.fa |
| Reference | # |
| Namespace | ei:reference_genome |
Genome Annotation
| Name | genome_annotation |
| Description | Indicate version and include stable link. Also indicate if any modification to the original annotation has been applied (e.g. 3' UTR extension) and include modified annotation file employed in the analysis. |
| Example | Ensembl v101, https://example.org/ensembl_v101.gtf |
| Reference | # |
| Namespace | ei:genome_annotation |
Annotation Filtering
| Name | annotation_filtering |
| Description | Indicate which features were filtered (e.g. protein-coding genes, pseudogenes, TCRs). |
| Example | Filtered to include only protein-coding genes |
| Reference | # |
| Namespace | ei:annotation_filtering |
Genes vs Exons
| Name | genes_vs_exons |
| Description | Quantification using whole gene intervals or exons. |
| Example | Exon quantification |
| Reference | # |
| Namespace | ei:genes_vs_exons |
Library Structure
| Name | library_structure |
| Description | The structure of the sequencing library, described in seqspec format. |
| Example | Single-cell 3' library |
| Reference | # |
| Namespace | ei:library_structure |
Mapping and Demultiplexing Software
| Name | mapping_and_demultiplexing_software |
| Description | The software, including version, used for read mapping and for demultiplexing reads/UMIs. |
| Example | Cell Ranger 6.0.0 |
| Reference | # |
| Namespace | ei:mapping_and_demultiplexing_software |
Read Mapping Statistics
| Name | read_mapping_statistics |
| Description | Statistics of the Reads or Unique Molecular Identifier (UMI). |
| Example | 80% reads mapped to reference |
| Reference | # |
| Namespace | ei:read_mapping_statistics |
Sequencing Saturation
| Name | sequencing_saturation |
| Description | The sequencing saturation achieved. The expected value depends on the number of cells recovered (not targeted) and the technology. |
| Example | 95% sequencing saturation |
| Reference | # |
| Namespace | ei:sequencing_saturation |
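Sequencing saturation is usually reported as 1 - (unique molecules ÷ reads), where a molecule is a distinct (cell barcode, UMI, gene) combination; this follows the definition described in the 10x Genomics Cell Ranger documentation. The Python sketch below computes it from one tuple per confidently mapped read; the function name and toy data are hypothetical.

```python
from typing import Iterable, Tuple


def sequencing_saturation(reads: Iterable[Tuple[str, str, str]]) -> float:
    """reads: one (cell_barcode, umi, gene) tuple per confidently mapped read."""
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    n_deduped = len(set(reads))  # unique molecules
    return 1 - n_deduped / len(reads)


toy_reads = [("C1", "U1", "G1")] * 3 + [("C1", "U2", "G1"), ("C2", "U1", "G2")]
print(f"{sequencing_saturation(toy_reads):.0%}")  # 40%
```

Five reads cover three unique molecules, so 40% of the reads were duplicates of molecules already seen.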
UMIs or Barcode Distribution QC
| Name | umis_barcode_distribution_qc |
| Description | Show Unique Molecular Identifiers (UMIs) per barcode distribution and threshold applied |
| Example | Threshold: 10 UMIs per barcode |
| Reference | # |
| Namespace | ei:umis_barcode_distribution_qc |
Cell or Non-Cell Filtering Strategy
| Name | cell_non_cell_filtering_strategy |
| Description | Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells. |
| Example | Threshold: 5 UMIs for cell detection |
| Reference | # |
| Namespace | ei:cell_non_cell_filtering_strategy |
Other Quality Filters Applied
| Name | other_quality_filters_applied |
| Description | Cells/nuclei discarded based on % mitochondrial reads, % rRNA reads, etc. |
| Example | Cells with >20% mitochondrial reads discarded |
| Reference | # |
| Namespace | ei:other_quality_filters_applied |
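The cell filtering and quality filter fields above can be illustrated with a simple two-step filter: drop barcodes below a UMI threshold, then drop those with a high mitochondrial fraction. The Python sketch below uses the example thresholds quoted in these fields (5 UMIs, 20%) and, as an assumption, measures the mitochondrial fraction in UMIs rather than reads. Many pipelines use more elaborate cell-calling algorithms than a fixed threshold; names and data are hypothetical.

```python
from typing import Dict, List


def filter_barcodes(
    umi_counts: Dict[str, int],
    mito_umi_counts: Dict[str, int],
    min_umis: int,
    max_mito_pct: float,
) -> List[str]:
    """Keep barcodes with at least min_umis UMIs and at most max_mito_pct % mitochondrial UMIs."""
    kept = []
    for barcode, total in umi_counts.items():
        if total == 0 or total < min_umis:
            continue  # too few UMIs to call a cell
        mito_pct = 100 * mito_umi_counts.get(barcode, 0) / total
        if mito_pct > max_mito_pct:
            continue  # high mitochondrial fraction, often taken as a damaged cell
        kept.append(barcode)
    return kept


umis = {"AAAC": 1200, "AAAG": 3, "AACT": 900}
mito = {"AAAC": 60, "AACT": 300}
print(filter_barcodes(umis, mito, min_umis=5, max_mito_pct=20))  # ['AAAC']
```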
Ambient RNA QC
| Name | ambient_rna_qc |
| Description | Report % UMIs in background cell barcodes, and algorithm (if any) used to remove ambient RNA |
| Example | Ambient RNA removed if >5% UMIs in background barcodes |
| Reference | # |
| Namespace | ei:ambient_rna_qc |
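The first part of this field asks for the percentage of UMIs in background barcodes, meaning those not called as cells. The Python sketch below computes that figure from per-barcode UMI counts and a set of called cells; it only reports the percentage and does not remove ambient RNA. Names and data are hypothetical.

```python
from typing import Dict, Set


def background_umi_percentage(umi_counts: Dict[str, int], cell_barcodes: Set[str]) -> float:
    """Percentage of all UMIs that fall in barcodes not called as cells."""
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("no UMIs supplied")
    background = sum(n for bc, n in umi_counts.items() if bc not in cell_barcodes)
    return 100 * background / total


print(background_umi_percentage({"A": 900, "B": 50, "C": 50}, {"A"}))  # 10.0
```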
Predicted Doublet Rate QC
| Name | predicted_doublet_rate_qc |
| Description | The predicted doublet rate, which depends on the number of cells recovered (not targeted) and the technology. |
| Example | Predicted doublet rate: 1.5% |
| Reference | # |
| Namespace | ei:predicted_doublet_rate_qc |
Individual Organism SNP Demultiplexing
| Name | individual_organism_snp_demultiplexing |
| Description | If carried out, show SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix), algorithm used |
| Example | SNP UMAP embedding using CellSNP |
| Reference | # |
| Namespace | ei:individual_organism_snp_demultiplexing |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Clustering Algorithm and Version
| Name | clustering_algorithm_and_version |
| Description | The clustering algorithm and its version (if compared/integrated with existing datasets). |
| Example | Louvain 0.8.0 |
| Reference | # |
| Namespace | ei:clustering_algorithm_and_version |
Clustering Parameters
| Name | clustering_parameters |
| Description | The parameters used for clustering (if compared/integrated with existing datasets). |
| Example | Resolution: 0.6, K-nearest neighbors: 10 |
| Reference | # |
| Namespace | ei:clustering_parameters |
Integration/Batch Correction
| Name | integration_batch_correction |
| Description | The integration or batch-correction method used (if compared/integrated with existing datasets). |
| Example | Harmony v1.0 |
| Reference | # |
| Namespace | ei:integration_batch_correction |
Source Code
| Name | source_code |
| Description | If any newly developed code/software has been used in the processing and downstream analysis of the dataset. |
| Example | Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization. |
| Reference | # |
| Namespace | ei:source_code |
UMI Count Matrix
| Name | umi_count_matrix |
| Description | Gene x cell matrix with UMI counts for each gene in each cell. |
| Example | The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv. |
| Reference | # |
| Namespace | ei:umi_count_matrix |
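The example layout, with gene IDs as rows and cell barcodes as columns, can be read using only the standard library. The sketch below sums UMI counts per cell. It assumes a comma-separated file whose first header cell labels the gene-ID column. The function name is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def umi_totals_per_cell(handle: TextIO) -> Dict[str, int]:
    """Sum UMI counts per cell barcode from a gene x cell CSV.

    Assumed layout (hypothetical): the header row is a gene-ID column label
    followed by cell barcodes; each later row is a gene ID followed by
    integer UMI counts, one per barcode.
    """
    reader = csv.reader(handle)
    header = next(reader)
    barcodes = header[1:]
    totals = {barcode: 0 for barcode in barcodes}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for barcode, count in zip(barcodes, row[1:]):
            totals[barcode] += int(count)
    return totals


example = io.StringIO(
    "gene_id,Cell_001,Cell_002\n"
    "ENSG00000139618,3,0\n"
    "ENSG00000012048,1,5\n"
)
print(umi_totals_per_cell(example))  # {'Cell_001': 4, 'Cell_002': 5}
```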
Ensembl IDs
| Name | ensembl_ids |
| Description | Gene or transcript identifiers, given as Ensembl IDs (or another standardised ID), with gene short names provided in the metadata. |
| Example | ENSG00000139618 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:ensembl_ids |
Functional Gene Annotations
| Name | functional_gene_annotations |
| Description | Any functional annotation generated/used (gene names, GOs, structural domains, etc.). |
| Example | Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding). |
| Reference | # |
| Namespace | ei:functional_gene_annotations |
Protein Models
| Name | protein_models |
| Description | A FASTA file with (or a stable link to) the predicted proteins associated with the genes in the UMI count matrix, using matching IDs. |
| Example | The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID. |
| Reference | # |
| Namespace | ei:protein_models |
Cell Metadata
| Name | cell_metadata |
| Description | Table mapping cell IDs to cluster/cell type/broad cell type annotations. |
| Example | Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv. |
| Reference | # |
| Namespace | ei:cell_metadata |
Cluster-Level Normalised Expression Tables
| Name | cluster_level_normalised_expression_tables |
| Description | Expression tables that show normalised gene expression at the cluster or cell-type level. |
| Example | Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has the expression values Cluster_1: 1200 and Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv. |
| Reference | # |
| Namespace | ei:cluster_level_normalised_expression_tables |
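The schema does not specify how cluster-level values are derived. This sketch assumes one common approach: average the normalised values of the cells that the cell metadata table assigns to each cluster. The in-memory inputs and the function name are hypothetical. Other summaries, such as summing raw counts per cluster, are equally plausible.

```python
from collections import defaultdict
from typing import Dict


def cluster_means(
    expression: Dict[str, Dict[str, float]],
    cell_to_cluster: Dict[str, str],
) -> Dict[str, Dict[str, float]]:
    """Average per-cell normalised expression within each cluster.

    expression maps gene ID -> {cell ID -> normalised value}, and
    cell_to_cluster maps cell ID -> cluster label (as in the cell metadata).
    Cells without a cluster label are skipped.
    """
    result: Dict[str, Dict[str, float]] = {}
    for gene, per_cell in expression.items():
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for cell, value in per_cell.items():
            cluster = cell_to_cluster.get(cell)
            if cluster is None:
                continue
            sums[cluster] += value
            counts[cluster] += 1
        result[gene] = {cluster: sums[cluster] / counts[cluster] for cluster in sums}
    return result
```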
Other Resource Files
| Name | other_resource_files |
| Description | Any other files necessary to reuse and interpret the data, e.g. barcode information for complex, serial multiplexing protocols (ClickTags). |
| Example | Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv. |
| Reference | # |
| Namespace | ei:other_resource_files |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:sequencing_id |
Read 1 File Required
| Name | read_1_file |
| Description | The name or accession of the file that contains read 1. |
| Example | file1_r1.fastq.gz |
| Reference | # |
| Namespace | ei:read_1_file |
Read 2 File
| Name | read_2_file |
| Description | The name or accession of the file that contains read 2. |
| Example | file2_r2.fastq.gz |
| Reference | # |
| Namespace | ei:read_2_file |
Index 1 File
| Name | index_1_file |
| Description | The name of the file that contains index 1. |
| Example | file1_i1.fastq.gz |
| Reference | # |
| Namespace | ei:index_1_file |
Index 2 File
| Name | index_2_file |
| Description | The name of the file that contains index 2. |
| Example | file2_i2.fastq.gz |
| Reference | # |
| Namespace | ei:index_2_file |
Read 1 Checksum Required
| Name | read_1_file_checksum |
| Description | Result of a hash function calculated on the content of the read 1 file to verify file integrity. The value must be a single MD5 checksum written as 32 lowercase hexadecimal characters. |
| Example | f8d29e41a73b5c02de9a6fb314e7c8ad |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_1_file_checksum |
Read 2 Checksum
| Name | read_2_file_checksum |
| Description | Result of a hash function calculated on the content of the read 2 file to verify file integrity. The value must be a single MD5 checksum written as 32 lowercase hexadecimal characters. |
| Example | a3f4c1b29d8e57fa41b02de6c7f9ab83 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_2_file_checksum |
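The read checksum fields are validated against a pattern of 32 lowercase hexadecimal characters, which is the shape of an MD5 digest. The same applies to `expression_data_file_checksum` further down. Below is a minimal sketch that produces and checks such a value with Python's `hashlib`. The helper names are hypothetical. The file is read in chunks, so large FASTQ files need not fit in memory.

```python
import hashlib
import re

# Same pattern as the checksum Regex rows: exactly 32 lowercase hex characters.
MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, recorded: str) -> bool:
    """True if recorded has the schema's MD5 shape and equals the file's digest."""
    if MD5_PATTERN.fullmatch(recorded) is None:
        return False
    return md5_of_file(path) == recorded
```

The GNU `md5sum` command prints the same lowercase hexadecimal digest, so either tool can be used to fill in these fields.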
White List Barcode File
| Name | white_list_barcode_file |
| Description | A file containing the known cell barcodes in the dataset. |
| Example | barcodes.tsv |
| Reference | # |
| Namespace | ei:white_list_barcode_file |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Expression Data Process Setting ID Required
| Name | expression_data_process_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_process_setting_id |
Matrix Type
| Name | matrix_type |
| Description | The type of values stored in the expression matrix (e.g. raw counts, normalised or scaled values). |
| Example | raw_counts |
| Reference | # |
| Namespace | ei:matrix_type |
| Allowed Values | imputed log1p nomalised pseudobulk raw_counts scaled |
Reference Genome Required
| Name | reference_genome |
| Description | The associated reference genome |
| Example | https://reference-genome-example.com |
| Reference | # |
| Regex | ^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$ |
| Namespace | ei:reference_genome |
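The `reference_genome` pattern accepts one or more http(s) or ftp URLs joined by a bare `|`, with no surrounding spaces. A sketch that validates a value and splits it into its URLs (the function name and example URLs are hypothetical):

```python
import re
from typing import List

# Pattern copied from the reference_genome Regex row.
REFERENCE_GENOME_PATTERN = re.compile(
    r"^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$"
)


def split_reference_urls(value: str) -> List[str]:
    """Validate a reference_genome value and return its individual URLs."""
    if REFERENCE_GENOME_PATTERN.fullmatch(value) is None:
        raise ValueError(f"not a '|'-separated list of URLs: {value!r}")
    return value.split("|")


print(split_reference_urls("https://a.example/genome.fa|ftp://b.example/genes.gtf"))
```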
Annotation Version
| Name | annotation_version |
| Description | The annotation version of the associated reference genome |
| Example | GENCODE v44 |
| Reference | # |
| Namespace | ei:annotation_version |
Normalisation Method
| Name | normalisation_method |
| Description | Any normalisation processing performed |
| Example | Log Normalisation |
| Reference | # |
| Namespace | ei:normalisation_method |
| Allowed Values | Library Size Normalisation, Log Normalisation, SCNorm, SCTransform, scran |
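'Log Normalisation' is commonly implemented as library-size scaling followed by a log1p transform. Seurat's `LogNormalize`, for instance, scales each cell to 10,000 total counts before taking log1p. Below is a minimal per-cell sketch of that idea. The function name and dictionary input are illustrative assumptions, and other tools default to different scale factors.

```python
import math
from typing import Dict


def log_normalise_cell(
    counts: Dict[str, int], scale_factor: float = 10_000.0
) -> Dict[str, float]:
    """Library-size scale one cell's counts, then apply log1p.

    Each count is divided by the cell's total count, multiplied by
    scale_factor, and transformed with the natural log of (1 + x).
    """
    total = sum(counts.values())
    if total == 0:
        # An empty cell has nothing to scale; return zeros rather than divide by zero.
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }
```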
Highly Variable Gene Selection (HVG)
| Name | highly_variable_gene_selection |
| Description | The method used to select highly variable genes and the number of genes selected |
| Example | seurat_v3, n=2000 |
| Reference | # |
| Namespace | ei:highly_variable_gene_selection |
Dimensionality Reduction
| Name | dimensionality_reduction |
| Description | Method used to reduce dimensionality in the expression data |
| Example | PCA |
| Reference | # |
| Namespace | ei:dimensionality_reduction |
| Allowed Values | Diffusion Map ICA NMF PCA UMAP t-SNE |
Number of Nearest Neighbours
| Name | n_neighbours |
| Description | Number of nearest neighbours used to calculate cluster membership |
| Example | pca:50 |
| Reference | # |
| Namespace | ei:n_neighbours |
Clustering Algorithm
| Name | clustering_algorithm |
| Description | Algorithm used to create clusters |
| Reference | # |
| Namespace | ei:clustering_algorithm |
Clustering Resolution
| Name | clustering_resolution |
| Description | Resolution parameter of the clustering algorithm; higher values typically produce more, smaller clusters |
| Example | 2.5 |
| Reference | # |
| Regex | ^([0-9]*[.])?[0-9]+$ |
| Namespace | ei:clustering_resolution |
Clustering Distance Metric
| Name | clustering_distance_metric |
| Description | Metric used to calculate the distance between points |
| Example | cosine |
| Reference | # |
| Namespace | ei:clustering_distance_metric |
| Allowed Values | cosine euclidean hamming jaccard manhatten mehalanobis |
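The choice of metric matters because the metrics measure different things. Cosine distance compares only the direction of two expression profiles, whereas Euclidean distance also reflects their magnitude. The sketch below implements two of the allowed metrics. The function names are hypothetical, and clustering libraries provide their own implementations.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points of equal dimension."""
    return math.dist(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine of the angle between a and b (0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

For example, the profiles [1, 2] and [2, 4] have a cosine distance of (approximately) 0 but a Euclidean distance of about 2.24.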
Software Versions
| Name | software_versions |
| Description | Primary software packages used for analysis |
| Reference | # |
| Namespace | ei:software_versions |
Cell Type Annotation
| Name | cell-type annotation |
| Description | Tools and databases used for cell-type annotation |
| Reference | # |
| Namespace | ei:cell-type annotation |
Generated by Pipeline
| Name | generated_by_pipeline |
| Description | URL of the deposited pipeline used to create this data |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | ei:generated_by_pipeline |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | expression_data_file_id |
| Description | A unique alphanumeric identifier for the expression data file |
| Example | EXPFILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric identifier for library preparation |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Expression Data Process Setting ID Required
| Name | expression_data_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_setting_id |
File Name Required
| Name | expression_data_file |
| Description | Expression data file name |
| Example | exp_file.csv |
| Reference | # |
| Namespace | ei:expression_data_file |
File MD5 Checksum Required
| Name | expression_data_file_checksum |
| Description | The MD5 checksum calculated for this file |
| Example | 9e4b7a23f6c1d0ab85f29c47e3d8a610 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:expression_data_file_checksum |
File Format Required
| Name | expression_data_file_format |
| Description | The format of the expression file, such as h5ad or rds |
| Example | csv |
| Reference | # |
| Namespace | ei:expression_data_file_format |
| Allowed Values | csv h5ad loom mtx rds |
Number of Cells
| Name | n_cells |
| Description | The number of cells represented in the expression data |
| Example | 4 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_cells |
Number of Genes
| Name | n_genes |
| Description | The number of genes represented in the expression data |
| Example | 50 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_genes |
File Size in Bytes
| Name | file_size_bytes |
| Description | Size of the file recorded in bytes |
| Example | 90 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:file_size_bytes |
Date Generated
| Name | date_generated |
| Description | Approximate date this expression data was generated |
| Example | 2024-10-14 |
| Reference | # |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | ei:date_generated |
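The date pattern used by `date_generated` (and `pool_creation_date`) checks the YYYY-MM-DD shape and the month and day ranges. However, it still accepts impossible dates such as 2024-02-30. The sketch below adds a calendar check with `datetime.date.fromisoformat`. The helper name is hypothetical.

```python
import re
from datetime import date

# Pattern copied from the date_generated Regex row.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_schema_date(value: str) -> bool:
    """True if value matches the schema pattern and is a real calendar date."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. 2024-02-30 or 2023-02-29
    except ValueError:
        return False
    return True
```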
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Description
| Name | description |
| Description | A detailed description of the project, including research goals and experimental approach. The description should be fewer than 300 words; an abstract from a grant application or publication is suitable. |
| Example | This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine. |
| Reference | http://purl.org/dc/terms/description |
| Namespace | dcterms:description |
Material Required
| Name | material |
| Description | The type of material being described. |
| Example | Organism |
| Reference | # |
| Namespace | faang:material |
| Allowed Values | Cell culture, Cell line, Cell specimen, Organism, Organoid, Pool of Specimens, Single cell specimen, Specimen from Organism |
Project Required
| Name | project |
| Description | State that the project is 'FAANG'. |
| Example | FAANG |
| Reference | # |
| Regex | ^FAANG$ |
| Namespace | faang:project |
Cell Enrichment Required
| Name | cell_enrichment |
| Description | The method by which specific cell populations are sorted or enriched, e.g. 'fluorescence-activated cell sorting (FACS)'. Please contact FAANG DCC to add more terms. |
| Example | Fluorescence-activated Cell Sorting (FACS) |
| Reference | # |
| Namespace | faang:cell_enrichment |
| Allowed Values | Bead-based sorting Cell culture Centrifugation Fluorescence-activated Cell Sorting (FACS) Magnetic levitation Raman-spectometry sorting, cell culture |
Licence
| Name | licence |
| Description | Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common licenses include Creative Commons (e.g., CC BY 4.0), which require attribution to the original authors when the data is cited or reused. |
| Example | MIT |
| Reference | # |
| Namespace | ei:licence |
| Allowed Values | Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Orcid ID
| Name | orcid_id |
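| Description | A 16-character identifier that uniquely identifies a researcher, written as four groups of four characters; the final character is a check digit and may be 'X'. |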
| Example | 0000-1234-5678-9012 |
| Reference | # |
| Regex | ^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$ |
| Namespace | ei:orcid_id |
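The ORCID pattern checks only the layout. ORCID iDs also end in a check character computed with the ISO 7064 MOD 11-2 algorithm described in ORCID's documentation, so most single-digit typos can be caught offline. A sketch follows (the helper names are hypothetical):

```python
import re

# Pattern copied from the orcid_id Regex row (restricted to ASCII digits).
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$", re.ASCII)


def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ORCID digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value has the ORCID layout and a correct final check character."""
    if ORCID_PATTERN.fullmatch(value) is None:
        return False
    digits = value.replace("-", "")  # 16 characters: 15 digits + check character
    return orcid_check_character(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Under this check, the example value 0000-1234-5678-9012 also passes.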
First Name Required
| Name | givenName |
| Description | The first (given) name of the individual conducting the study. |
| Example | Jane |
| Reference | https://schema.org/givenName |
| Regex | ^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:givenName |
Last Name Required
| Name | familyName |
| Description | The last (family) name, or surname, of the individual conducting the study. |
| Example | Doe |
| Reference | https://schema.org/familyName |
| Regex | ^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:familyName |
Email Address
| Name | email |
| Description | A unique identifier used to send and receive electronic messages (emails) over the internet. |
| Example | jane.doe@example.com |
| Reference | https://schema.org/email |
| Regex | ^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$ |
| Namespace | schema.org:email |
Affiliation or Institution Required
| Name | affiliation |
| Description | An organisation or institution that this person is associated with. |
| Example | University of Liverpool |
| Reference | https://schema.org/affiliation |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | schema.org:affiliation |
Funder
| Name | funder |
| Description | A person or organisation that supports (sponsors) the study through a financial contribution. |
| Example | BBSRC |
| Reference | https://schema.org/funder |
| Namespace | schema.org:funder |
Grant Award
| Name | funding |
| Description | A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study. |
| Example | GRAK3489 |
| Reference | https://schema.org/funding |
| Regex | ^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$ |
| Namespace | schema.org:funding |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for the study to which this sample belongs |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Scientific Name or Organism
| Name | scientific_name |
| Description | The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification. |
| Example | Salvelinus alpinus |
| Reference | http://rs.tdwg.org/dwc/terms/scientificName |
| Regex | ^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$ |
| Namespace | ontology:scientific_name |
Taxon ID Required
| Name | taxon_id |
| Description | A unique identifier (usually from a recognised taxonomy database such as NCBI Taxonomy) that corresponds to the organism’s scientific name. It must match the value given in scientific_name to maintain consistency and traceability in biological records. |
| Example | 8036 |
| Reference | http://rs.tdwg.org/dwc/terms/taxonID |
| Regex | ^[0-9]+$ |
| Namespace | ontology:taxon_id |
Biosample Accession Required
| Name | biosampleAccession |
| Description | A unique identifier assigned to a biological sample after it has been submitted to a public database such as NCBI BioSample or EBI BioSamples (used by the ENA). It serves as a permanent reference to that specific sample, allowing researchers to retrieve its metadata and link it across studies or datasets. |
| Example | SAMEA12907823 |
| Reference | http://purl.obolibrary.org/obo/T4FS_0000316 |
| Namespace | ontology:biosampleAccession |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Protocol Name Required
| Name | protocol_name |
| Description | A descriptive name of the protocol used for single-cell sequencing. |
| Example | 10X Genomics Single Cell 3' Library Prep |
| Reference | # |
| Namespace | ei:protocol_name |
Enrichment Markers
| Name | enrichment_markers |
| Description | Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms. |
| Example | CD45 |
| Reference | # |
| Namespace | faang:enrichment_markers |
Isolation Kit
| Name | isolation_kit |
| Description | The kit used to isolate the cells. |
| Example | 10x Nuclei Isolation Kit |
| Reference | # |
| Namespace | ei:isolation_kit |
| Allowed Values | 10x Nuclei Isolation Kit, 3' standard throughput kit, Custom |
Literature Source Reference
| Name | literature_source_reference |
| Description | Reference to literature sources that describe the protocol or methods used. |
| Example | Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview' |
| Reference | # |
| Namespace | ei:literature_source_reference |
Protocols IO Reference
| Name | protocols_io_reference |
| Description | Reference link to protocols.io for additional details on the protocol. |
| Example | https://www.protocols.io/view/sample-protocol-b2ubqesn |
| Reference | # |
| Regex | ^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)*)*$ |
| Namespace | ei:protocols_io_reference |
Single Cell Isolation Protocol Required
| Name | single_cell_isolation_protocol |
| Description | Link to protocol describing how the single cells were separated into a single-cell suspension. |
| Example | https://api.faang.org/files/protocols/samples/INRAE_SOP_PLUS4PIGS_EMBRYOS_DISSOCIATION_PROTO4_20240710.pdf |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | faang:single_cell_isolation_protocol |
WorkflowHub SOP Reference
| Name | workflow_hub_sop_reference |
| Description | Reference to the Standard Operating Procedure (SOP) in WorkflowHub. |
| Example | https://workflowhub.eu/works/12345 |
| Reference | # |
| Namespace | ei:workflow_hub_sop_reference |
Dissociation Protocol Method
| Name | dissociation_protocol_method |
| Description | The method used to dissociate tissues into single cells. |
| Example | Mechanical and enzymatic dissociation |
| Reference | # |
| Namespace | ei:dissociation_protocol_method |
Single Cell Quality Metric
| Name | single_cell_quality_metric |
| Description | Metrics used to assess the quality of single cells before sequencing. |
| Example | Cell viability percentage |
| Reference | # |
| Namespace | ei:single_cell_quality_metric |
Cell Type Required
| Name | cell_type |
| Description | A cell type term from the Cell Ontology (CL). |
| Example | malignant cell |
| Reference | CL:0000000 |
| Regex | ^[A-Za-z\s]*[a-z]+$ |
| Namespace | faang:cell_type |
Tissue Dissociation Required
| Name | tissue_dissociation |
| Description | The method by which tissues are dissociated into purified or single cells in suspension. Examples are 'proteolysis', 'mesh passage', 'fine needle trituration'. For blood, milk and other fluids, where there is no tissue dissociation use 'fluids'. Please contact FAANG DCC to add more terms. |
| Example | Proteolysis |
| Reference | # |
| Namespace | faang:tissue_dissociation |
| Allowed Values | Fine needle trituration, Fluids, Mechanical dissociation, Mesh passage, Proteolysis |
Derived from Required
| Name | derived_from |
| Description | Sample name or BioSample ID for a specimen record. |
| Example | SSC_INRAE_GUT_ORGANOID_100I |
| Reference | # |
| Regex | ^[A-Za-z0-9_]+$ |
| Namespace | faang:derived_from |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the sample |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Sample ID Required
| Name | sample_id |
| Description | A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique. |
| Example | SAMPLE001 |
| Reference | # |
| Namespace | ei:sample_id |
Dissociation Protocol ID Required
| Name | dissociation_protocol_id |
| Description | A unique alphanumeric code for the dissociation protocol in the study |
| Example | DISSOC001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:dissociation_protocol_id |
Suspension Type Required
| Name | suspension_type |
| Description | The type of biological particle held in suspension during processing: whole cells, nuclei or protoplasts. |
| Example | Cell |
| Reference | # |
| Namespace | ei:suspension_type |
| Allowed Values | Cell Nuclei Protoplast |
Purification Protocol Required
| Name | purification_protocol |
| Description | Link to protocol describing how the cells were purified. |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | faang:purification_protocol |
Cell Count
| Name | cell_count |
| Description | The number of cells in the sequencing library. |
| Example | 10000 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cell_count |
Cell Number
| Name | cell_number |
| Description | The number of cells in the sequencing library, given as a range category. |
| Example | 101-10000 |
| Reference | # |
| Namespace | tol:cell_number |
| Allowed Values | 1 1000000+ 100001-500000 10001-50000 101-10000 11-50 2-10 500001-1000000 50001-100000 51-100 |
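`cell_count` records an exact number, whereas `cell_number` records a range category. The sketch below derives the category from the count, with the bins transcribed from the Allowed Values above. The list does not say which bin an exact count of 1000000 belongs to. This sketch assumes it falls in 500001-1000000 and that 1000000+ means more than one million. The helper name is hypothetical.

```python
from typing import List, Optional, Tuple

# (lowest count, highest count or None if open-ended, Allowed Value label)
CELL_NUMBER_BINS: List[Tuple[int, Optional[int], str]] = [
    (1, 1, "1"),
    (2, 10, "2-10"),
    (11, 50, "11-50"),
    (51, 100, "51-100"),
    (101, 10000, "101-10000"),
    (10001, 50000, "10001-50000"),
    (50001, 100000, "50001-100000"),
    (100001, 500000, "100001-500000"),
    (500001, 1000000, "500001-1000000"),
    (1000001, None, "1000000+"),  # assumption: strictly more than one million
]


def cell_number_bin(cell_count: int) -> Optional[str]:
    """Map an exact cell_count to its cell_number category, or None if below 1."""
    for low, high, label in CELL_NUMBER_BINS:
        if cell_count >= low and (high is None or cell_count <= high):
            return label
    return None


print(cell_number_bin(10000))  # 101-10000
```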
Cell Viability
| Name | cell_viability |
| Description | The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis. |
| Example | 95 |
| Reference | # |
| Namespace | ei:cell_viability |
Cell Viability Assessment Method
| Name | cell_viability_assessment_method |
| Description | The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques. |
| Example | Trypan Blue Exclusion |
| Reference | # |
| Namespace | ei:cell_viability_assessment_method |
Cell Size
| Name | cell_size |
| Description | The size of the cell, typically measured in micrometres. |
| Example | 10 |
| Reference | # |
| Namespace | ei:cell_size |
Suspension Volume (µL)
| Name | suspension_volume_µl |
| Description | The volume of the cell suspension in microlitres (µL). |
| Example | 100 |
| Reference | # |
| Namespace | ei:suspension_volume_µl |
Suspension Concentration Cells Per µL
| Name | suspension_concentration_cells_per_µl |
| Description | The concentration of cells in the suspension, in cells per microlitre (cells/µL). |
| Example | 1000 |
| Reference | # |
| Namespace | ei:suspension_concentration_cells_per_µl |
Suspension Dilution
| Name | suspension_dilution |
| Description | The dilution factor of the cell suspension. |
| Example | 1:10 |
| Reference | # |
| Namespace | ei:suspension_dilution |
Loading Volume (µL)
| Name | loading_volume_µl |
| Description | The volume of the cell suspension, in microlitres (µL), loaded into the single-cell RNA-sequencing system for analysis. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:loading_volume_µl |
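The suspension concentration and the loading volume together determine how many cells enter the instrument: cells loaded = concentration (cells/µL) × loading volume (µL). With the example values above, 1000 cells/µL × 10 µL = 10000 cells. Below is a minimal sketch; the function names are hypothetical. Droplet-based systems typically capture fewer cells than are loaded.

```python
def cells_loaded(concentration_cells_per_ul: float, loading_volume_ul: float) -> float:
    """Number of cells loaded: concentration (cells/µL) times volume (µL)."""
    if concentration_cells_per_ul < 0 or loading_volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_cells_per_ul * loading_volume_ul


def volume_for_target(target_cells: float, concentration_cells_per_ul: float) -> float:
    """Volume in µL needed to load target_cells at the given concentration."""
    if concentration_cells_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_cells / concentration_cells_per_ul


print(cells_loaded(1000, 10))         # 10000
print(volume_for_target(5000, 1000))  # 5.0
```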
Suspension Dilution Buffer
| Name | suspension_dilution_buffer |
| Description | A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing. |
| Example | PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin) |
| Reference | # |
| Namespace | ei:suspension_dilution_buffer |
Derived from Required
| Name | derived_from |
| Description | Sample name or BioSample ID for a specimen record. |
| Example | SAMEA112465628 |
| Reference | # |
| Regex | ^[A-Za-z0-9_]+$ |
| Namespace | faang:derived_from |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Cell Suspension ID Required
| Name | cell_suspension_id |
| Description | A unique alphanumeric code for the cell suspension for the library preparation. |
| Example | CELLSUSP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:cell_suspension_id |
Library Preparation Kit Required
| Name | library_prep_kit |
| Description | Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids |
| Example | 10X Genomics Single Cell 3' v3 |
| Reference | https://w3id.org/mixs/0001145 |
| Namespace | mixs:library_prep_kit |
Library Preparation Kit Version Required
| Name | library_prep_kit_version |
| Description | The version number of the library preparation kit used for sequencing. |
| Example | 3 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Regex | ^\d+(\.\d+)?$ |
| Namespace | ontology:library_prep_kit_version |
Amplification Method
| Name | amplification_method |
| Description | The method used to amplify the Complementary DNA (cDNA). |
| Example | PCR |
| Reference | # |
| Namespace | ei:amplification_method |
cDNA Amplification Cycles
| Name | cdna_amplification_cycles |
| Description | The number of cycles used during the Complementary DNA (cDNA) amplification process. |
| Example | 12 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:cdna_amplification_cycles |
Average Size Distribution
| Name | average_size_distribution |
| Description | The average fragment length of the final library in base pairs (bp), indicating its quality and suitability for sequencing. |
| Example | 350 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:average_size_distribution |
Library Construction Method
| Name | lib_construction_method |
| Description | The library construction method (including version) that was used. |
| Example | Smart-Seq2 |
| Reference | # |
| Namespace | ei:lib_construction_method |
Input Molecule
| Name | input_molecule |
| Description | The specific fraction of biological macromolecule from which the sequencing library is derived. |
| Example | RNA |
| Reference | # |
| Namespace | ei:input_molecule |
Primeness
| Name | primeness |
| Description | The end from which the molecule was sequenced. |
| Example | 5' |
| Reference | # |
| Namespace | ei:primeness |
| Allowed Values | 3' 5' Both |
End Bias
| Name | end_bias |
| Description | The end of the transcript (3' or 5') towards which read coverage in the library is biased. |
| Example | 3 |
| Reference | # |
| Namespace | ei:end_bias |
| Allowed Values | 3 5 |
Library Strand
| Name | library_strand |
| Description | The complementary DNA (cDNA) strand from which the library reads are derived: sense (first), antisense (second), both, or none (unstranded). |
| Example | Antisense |
| Reference | # |
| Namespace | ei:library_strand |
| Allowed Values | Antisense Both Sense Unstranded |
Spike In
| Name | spike_in |
| Description | Whether external RNA was added to the sample as a spike-in control to assess technical variability and normalisation in RNA sequencing. |
| Example | Yes |
| Reference | # |
| Namespace | ei:spike_in |
| Allowed Values | No Yes |
Spike Type
| Name | spike_type |
| Description | The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA. |
| Example | Synthetic RNA |
| Reference | # |
| Namespace | ei:spike_type |
Spike In Dilution Or Concentration
| Name | spike_in_dilution_or_concentration |
| Description | The final concentration or dilution (for commercial sets) of the spike in mix. |
| Example | 1:1000 |
| Reference | # |
| Namespace | ei:spike_in_dilution_or_concentration |
i5 Index Required
| Name | i5_index |
| Description | Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing. |
| Example | ATCACG |
| Reference | # |
| Namespace | ei:i5_index |
i7 Index Required
| Name | i7_index |
| Description | Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs. |
| Example | CGATGT |
| Reference | # |
| Namespace | ei:i7_index |
Dual or Single Index Required
| Name | dual_single_index |
| Description | Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing. |
| Example | Dual |
| Reference | # |
| Namespace | ei:dual_single_index |
| Allowed Values | Dual Single |
i5 Sequence Required
| Name | i5_sequence |
| Description | The nucleotide sequence of the i5 index used in multiplexing during sequencing. |
| Example | ATCGTAGC |
| Reference | # |
| Namespace | ei:i5_sequence |
i7 Sequence Required
| Name | i7_sequence |
| Description | The specific nucleotide sequence of the i7 index used for a sample. |
| Example | TGCATGCA |
| Reference | # |
| Namespace | ei:i7_sequence |
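A minimal sketch of a consistency check across the index fields above, assuming each sample is held as a Python dict keyed by the field names. The nucleotide pattern and the rule that `Dual` requires both sequences are assumptions for illustration; the schema itself defines no regex for these fields.

```python
import re
from typing import Dict, List

# Assumed pattern: the schema defines no regex for the index sequences, so
# this accepts A, C, G, T and the ambiguity code N (case-insensitive).
NUCLEOTIDES = re.compile(r"[ACGTN]+")


def check_index_fields(record: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in one sample's index fields."""
    problems = []
    mode = record.get("dual_single_index")
    if mode not in ("Dual", "Single"):
        problems.append("dual_single_index must be 'Dual' or 'Single'")
    for field in ("i5_sequence", "i7_sequence"):
        value = record.get(field)
        if value and not NUCLEOTIDES.fullmatch(value.upper()):
            problems.append(f"{field} contains non-nucleotide characters")
    # Assumed rule: dual indexing needs both the i5 and the i7 sequence.
    if mode == "Dual" and not (record.get("i5_sequence") and record.get("i7_sequence")):
        problems.append("dual indexing requires both i5_sequence and i7_sequence")
    return problems
```

With the example values above, `check_index_fields({"dual_single_index": "Dual", "i5_sequence": "ATCGTAGC", "i7_sequence": "TGCATGCA"})` returns an empty list.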
Plate ID
| Name | plate_id |
| Description | Identifier for the 96-well plate used in sample preparation. |
| Example | PLT001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:plate_id |
Well Row
| Name | well_row |
| Description | The row identifier in a 96-well plate indicating the sample's position. |
| Example | A |
| Reference | # |
| Namespace | ei:well_row |
Well Column
| Name | well_col |
| Description | The column identifier in a 96-well plate indicating the sample's position. |
| Example | 5 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:well_col |
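The well fields above can be combined into a conventional well name or a position index. This sketch assumes the standard 96-well layout (rows A to H, columns 1 to 12); the zero-padded `A05` naming is one common convention, not something the schema prescribes.

```python
import re

ROWS = "ABCDEFGH"  # eight rows of a standard 96-well plate
N_COLS = 12        # twelve columns of a standard 96-well plate


def well_name(well_row: str, well_col: str) -> str:
    """Validate well_row/well_col and return a zero-padded name such as 'A05'."""
    if len(well_row) != 1 or well_row not in ROWS:
        raise ValueError(f"well_row must be one of A-H, got {well_row!r}")
    if not re.fullmatch(r"\d+", well_col) or not 1 <= int(well_col) <= N_COLS:
        raise ValueError(f"well_col must be between 1 and {N_COLS}, got {well_col!r}")
    return f"{well_row}{int(well_col):02d}"


def well_index(well_row: str, well_col: str) -> int:
    """0-based row-major position on the plate (A01 -> 0, A12 -> 11, H12 -> 95)."""
    well_name(well_row, well_col)  # reuse the validation above
    return ROWS.index(well_row) * N_COLS + int(well_col) - 1
```

For the example values (`A`, `5`), `well_name` returns `A05`.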
Cell Phenotype
| Name | cell_phenotype |
| Description | The cell marker used for Fluorescence-Activated Cell Sorting (FACS) of the cells. |
| Example | CD41- |
| Reference | # |
| Namespace | ei:cell_phenotype |
| Allowed Values | CD41+ CD41- |
Pool Creation Date Required
| Name | pool_creation_date |
| Description | Date on which the pool was created. |
| Example | 2025-10-24 |
| Reference | # |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | faang:pool_creation_date |
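The date pattern above checks only the shape of the value, so a string such as `2025-02-31` passes it. A minimal Python sketch that applies the same pattern and then confirms the date exists on the calendar (the same pattern is used by `date_generated`):

```python
import re
from datetime import date

# Pattern copied from the pool_creation_date field above.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_date(value: str) -> bool:
    """True if value matches the schema pattern AND is a real calendar date."""
    # fullmatch rejects a trailing newline, which `$` alone would tolerate.
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2025-02-31
    except ValueError:
        return False
    return True
```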
Pool Creation Protocol Required
| Name | pool_creation_protocol |
| Description | A link to the protocol used to create the pool of specimens. |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | faang:pool_creation_protocol |
Design Description
| Name | design_description |
| Description | The design of the library including details of how it was constructed. |
| Reference | # |
| Namespace | ei:design_description |
Library Selection Required
| Name | library_selection |
| Description | The method used to select for or against, enrich, or screen the material being sequenced. |
| Example | RANDOM PCR |
| Reference | # |
| Namespace | ei:library_selection |
| Allowed Values | 5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified |
Library Source Required
| Name | library_source |
| Description | The type of source material that is being sequenced. |
| Example | GENOMIC |
| Reference | # |
| Namespace | ei:library_source |
| Allowed Values | GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA |
Library Strategy Required
| Name | library_strategy |
| Description | The sequencing technique intended for this library. |
| Example | RNA-Seq |
| Reference | # |
| Namespace | ei:library_strategy |
| Allowed Values | AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq |
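Several allowed values for this field (and for `library_selection` and `sequencing_instrument_model`) contain spaces, for example `Tethered Chromatin Conformation Capture`, so the flattened lists cannot be recovered by splitting on whitespace. A sketch of an exact-match check, using a hand-copied subset of the values as a set; the case-insensitive hint is an illustrative addition, not schema behaviour.

```python
from typing import Set

# Hand-copied subset of the library_strategy values above. Values such as
# "Tethered Chromatin Conformation Capture" contain spaces, so the list
# is kept as explicit strings rather than split on whitespace.
LIBRARY_STRATEGY: Set[str] = {
    "AMPLICON", "ATAC-seq", "ChIP-Seq", "Hi-C", "RNA-Seq", "WGS", "WXS",
    "Tethered Chromatin Conformation Capture", "snRNA-seq", "ssRNA-seq",
}


def check_allowed(field: str, value: str, allowed: Set[str]) -> None:
    """Raise ValueError unless value exactly matches one allowed value."""
    if value in allowed:
        return
    # Illustrative extra: suggest a value that differs only in letter case.
    hint = next((a for a in sorted(allowed) if a.lower() == value.lower()), None)
    message = f"{field}: {value!r} is not an allowed value"
    if hint is not None:
        message += f" (did you mean {hint!r}?)"
    raise ValueError(message)
```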
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | https://w3id.org/mixs/0000016 |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ontology:sequencing_id |
Sequencing Platform Name Required
| Name | sequencing_platform_name |
| Description | The name of the sequencing platform used for the experiment. |
| Example | PacBio |
| Reference | http://purl.obolibrary.org/obo/NCIT_C172274 |
| Namespace | ontology:sequencing_platform_name |
Sequencing Instrument Model Required
| Name | sequencing_instrument_model |
| Description | The model of the instrument used for sequencing. Models differ in throughput, read length, error rate, and suitability for particular applications. |
| Example | Illumina NovaSeq 6000 |
| Reference | http://purl.obolibrary.org/obo/GENEPIO_0000149 |
| Namespace | ontology:sequencing_instrument_model |
| Allowed Values | 454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100 |
Library Layout Required
| Name | lib_layout |
| Description | Whether the reads are single-end, paired-end, or in another configuration. |
| Example | Paired |
| Reference | https://w3id.org/mixs/0000111 |
| Namespace | mixs:lib_layout |
| Allowed Values | Other Paired Single Vector |
UMI Barcode Read
| Name | umi_barcode_read |
| Description | The type of read that contains the Unique Molecular Identifier (UMI) barcode. |
| Example | index2 |
| Reference | # |
| Namespace | ei:umi_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
UMI Barcode Offset
| Name | umi_barcode_offset |
| Description | The offset within the read at which the Unique Molecular Identifier (UMI) barcode starts. |
| Example | 0 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_offset |
UMI Barcode Size
| Name | umi_barcode_size |
| Description | The length, in bases, of the Unique Molecular Identifier (UMI) barcode. |
| Example | 10 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:umi_barcode_size |
Cell Barcode Read
| Name | cell_barcode_read |
| Description | The type of read that contains the cell barcode. |
| Example | index1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010203 |
| Namespace | ontology:cell_barcode_read |
| Allowed Values | index1 index2 read1 read2 |
Cell Barcode Offset
| Name | cell_barcode_offset |
| Description | The offset within the read at which the cell-identifying barcode starts. |
| Example | 10 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010204 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_offset |
Cell Barcode Size
| Name | cell_barcode_size |
| Description | The length, in bases, of the cell-identifying barcode. |
| Example | 0 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010205 |
| Regex | ^\d+$ |
| Namespace | ontology:cell_barcode_size |
cDNA Read Required
| Name | cdna_read |
| Description | The type of read that contains the complementary DNA (cDNA) sequence. |
| Example | read1 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010195 |
| Namespace | ontology:cdna_read |
| Allowed Values | index1 index2 read1 read2 |
cDNA Read Offset
| Name | cdna_read_offset |
| Description | The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences. |
| Example | 6 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010201 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_offset |
cDNA Read Size
| Name | cdna_read_size |
| Description | The length, in bases, of the Complementary DNA (cDNA) read. |
| Example | 75 |
| Reference | http://www.ebi.ac.uk/efo/EFO_0010202 |
| Regex | ^\d+$ |
| Namespace | ontology:cdna_read_size |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File Derived From
| Name | file_derived_from |
| Description | The name of the file from which the analysis-derived data were generated. |
| Example | file1_sequencing.json |
| Reference | # |
| Namespace | ei:file_derived_from |
Inferred Cell Type
| Name | inferred_cell_type |
| Description | Cell type or identity assigned by the performer after analysis, based on the expression profile or known gene function. |
| Example | type II bipolar neuron |
| Reference | # |
| Namespace | ei:inferred_cell_type |
Post Analysis Cell Well Quality
| Name | post_analysis_cell_well_quality |
| Description | Performer-defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed. |
| Example | Pass |
| Reference | # |
| Namespace | ei:post_analysis_cell_well_quality |
| Allowed Values | Fail Pass |
Other Derived Cell Attributes
| Name | other_derived_cell_attributes |
| Description | Any other cell-level measurement or annotation resulting from the analysis. |
| Example | Cluster |
| Reference | # |
| Namespace | ei:other_derived_cell_attributes |
| Allowed Values | Cluster Count Gene UMI tSNE coordinates |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Reference Genome
| Name | reference_genome |
| Description | Indicate the version and include a stable link to the genome data (or attach the genome FASTA file). |
| Example | GRCh38, https://example.org/grch38.fa |
| Reference | # |
| Namespace | ei:reference_genome |
Genome Annotation
| Name | genome_annotation |
| Description | Indicate the version and include a stable link. Also indicate whether any modification to the original annotation has been applied (e.g. 3' UTR extension), and include the modified annotation file used in the analysis. |
| Example | Ensembl v101, https://example.org/ensembl_v101.gtf |
| Reference | # |
| Namespace | ei:genome_annotation |
Annotation Filtering
| Name | annotation_filtering |
| Description | Indicate which features were filtered (e.g. protein-coding genes, pseudogenes, TCRs). |
| Example | Filtered to include only protein-coding genes |
| Reference | # |
| Namespace | ei:annotation_filtering |
Genes vs Exons
| Name | genes_vs_exons |
| Description | Whether quantification used whole gene intervals or exons. |
| Example | Exon quantification |
| Reference | # |
| Namespace | ei:genes_vs_exons |
Library Structure
| Name | library_structure |
| Description | Structure of the library, described in seqspec format. |
| Example | Single-cell 3' library |
| Reference | # |
| Namespace | ei:library_structure |
Mapping and Demultiplexing Software
| Name | mapping_and_demultiplexing_software |
| Description | Software (with version) used to map reads and demultiplex reads/UMIs. |
| Example | Cell Ranger 6.0.0 |
| Reference | # |
| Namespace | ei:mapping_and_demultiplexing_software |
Read Mapping Statistics
| Name | read_mapping_statistics |
| Description | Mapping statistics for reads or Unique Molecular Identifiers (UMIs). |
| Example | 80% reads mapped to reference |
| Reference | # |
| Namespace | ei:read_mapping_statistics |
Sequencing Saturation
| Name | sequencing_saturation |
| Description | Sequencing saturation, which depends on the number of cells recovered (not targeted) and the technology. |
| Example | 95% sequencing saturation |
| Reference | # |
| Namespace | ei:sequencing_saturation |
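For reference, 10x Genomics Cell Ranger reports sequencing saturation as 1 - (n_deduped_reads / n_reads), where n_deduped_reads is the number of distinct (cell barcode, UMI, gene) combinations among confidently mapped reads. A sketch of that arithmetic, assuming each mapped read is already represented as such a triple:

```python
from typing import Iterable, Tuple


def sequencing_saturation(mapped_reads: Iterable[Tuple[str, str, str]]) -> float:
    """Return 1 - (distinct molecules / reads).

    Each item is one confidently mapped read, given as a
    (cell_barcode, umi, gene) triple; identical triples are counted as
    one molecule.
    """
    reads = list(mapped_reads)
    if not reads:
        raise ValueError("no mapped reads supplied")
    n_deduped = len(set(reads))
    return 1.0 - n_deduped / len(reads)
```

Four reads from two distinct molecules give a saturation of 0.5, i.e. 50%.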
UMIs or Barcode Distribution QC
| Name | umis_barcode_distribution_qc |
| Description | Distribution of Unique Molecular Identifiers (UMIs) per barcode, and the threshold applied. |
| Example | Threshold: 10 UMIs per barcode |
| Reference | # |
| Namespace | ei:umis_barcode_distribution_qc |
Cell or Non-Cell Filtering Strategy
| Name | cell_non_cell_filtering_strategy |
| Description | Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells. |
| Example | Threshold: 5 UMIs for cell detection |
| Reference | # |
| Namespace | ei:cell_non_cell_filtering_strategy |
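A minimal sketch of a fixed UMI threshold of the kind given in the example above. It counts distinct UMIs per barcode while ignoring the gene, which is a simplification; dedicated cell-calling algorithms are usually more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def umis_per_barcode(molecules: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs for each cell barcode from (barcode, umi) pairs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for barcode, umi in molecules:
        seen[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in seen.items()}


def call_cells(counts: Dict[str, int], min_umis: int) -> Set[str]:
    """Keep barcodes with at least min_umis UMIs (a simple fixed threshold)."""
    return {barcode for barcode, n in counts.items() if n >= min_umis}
```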
Other Quality Filters Applied
| Name | other_quality_filters_applied |
| Description | Other filters used to discard cells/nuclei, e.g. based on % mitochondrial reads or % rRNA reads. |
| Example | Cells with >20% mitochondrial reads discarded |
| Reference | # |
| Namespace | ei:other_quality_filters_applied |
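A sketch of the mitochondrial filter from the example above (cells with more than 20% mitochondrial reads discarded). It assumes counts are keyed by gene symbol and that mitochondrial genes carry the `MT-` prefix, which is the human naming convention; other organisms use different prefixes (mouse uses `mt-`).

```python
from typing import Dict


def percent_mito(gene_counts: Dict[str, int], prefix: str = "MT-") -> float:
    """Percentage of one cell's counts that come from mitochondrial genes."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def passes_mito_filter(gene_counts: Dict[str, int], max_percent: float = 20.0) -> bool:
    """True if the cell is kept, i.e. its mitochondrial share is not above max_percent."""
    return percent_mito(gene_counts) <= max_percent
```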
Ambient RNA QC
| Name | ambient_rna_qc |
| Description | Report the percentage of UMIs in background cell barcodes, and the algorithm (if any) used to remove ambient RNA. |
| Example | Ambient RNA removed if >5% UMIs in background barcodes |
| Reference | # |
| Namespace | ei:ambient_rna_qc |
Predicted Doublet Rate QC
| Name | predicted_doublet_rate_qc |
| Description | Predicted doublet rate, which depends on the number of cells recovered (not targeted) and the technology. |
| Example | Predicted doublet rate: 1.5% |
| Reference | # |
| Namespace | ei:predicted_doublet_rate_qc |
Individual Organism SNP Demultiplexing
| Name | individual_organism_snp_demultiplexing |
| Description | If carried out, show the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used. |
| Example | SNP UMAP embedding using CellSNP |
| Reference | # |
| Namespace | ei:individual_organism_snp_demultiplexing |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Clustering Algorithm and Version
| Name | clustering_algorithm_and_version |
| Description | The clustering algorithm and its version, if the dataset was compared or integrated with existing datasets. |
| Example | Louvain 0.8.0 |
| Reference | # |
| Namespace | ei:clustering_algorithm_and_version |
Clustering Parameters
| Name | clustering_parameters |
| Description | Parameters used for clustering, if the dataset was compared or integrated with existing datasets. |
| Example | Resolution: 0.6, K-nearest neighbors: 10 |
| Reference | # |
| Namespace | ei:clustering_parameters |
Integration/Batch Correction
| Name | integration_batch_correction |
| Description | The integration or batch-correction method used, if the dataset was compared or integrated with existing datasets. |
| Example | Harmony v1.0 |
| Reference | # |
| Namespace | ei:integration_batch_correction |
Source Code
| Name | source_code |
| Description | Any newly developed code or software used in the processing and downstream analysis of the dataset. |
| Example | Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization. |
| Reference | # |
| Namespace | ei:source_code |
UMI Count Matrix
| Name | umi_count_matrix |
| Description | Gene x cell matrix with UMI counts for each gene in each cell. |
| Example | The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv. |
| Reference | # |
| Namespace | ei:umi_count_matrix |
Ensembl IDs
| Name | ensembl_ids |
| Description | Gene or transcript identifiers, listed as Ensembl IDs (or another standardized ID), with gene short names provided in the metadata. |
| Example | ENSG00000139618 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:ensembl_ids |
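The pattern `^[a-zA-Z0-9]+$` rejects versioned Ensembl identifiers such as `ENSG00000139618.17`, because of the dot. If versioned IDs are present, one option (an assumption, since dropping the version loses information) is to strip the suffix before validation; any other kind of suffix raises an error in this sketch.

```python
import re

# Pattern copied from the ensembl_ids field above.
ENSEMBL_ID_PATTERN = re.compile(r"^[a-zA-Z0-9]+$")


def normalise_ensembl_id(raw_id: str) -> str:
    """Drop a trailing '.N' version suffix so the ID matches the field pattern."""
    stable_id, _, version = raw_id.partition(".")
    if version and not version.isdigit():
        raise ValueError(f"unexpected version suffix in {raw_id!r}")
    if not ENSEMBL_ID_PATTERN.fullmatch(stable_id):
        raise ValueError(f"{raw_id!r} is not a plain alphanumeric ID")
    return stable_id
```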
Functional Gene Annotations
| Name | functional_gene_annotations |
| Description | Any functional annotation generated/used (gene names, GOs, structural domains, etc.). |
| Example | Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding). |
| Reference | # |
| Namespace | ei:functional_gene_annotations |
Protein Models
| Name | protein_models |
| Description | FASTA file with (or stable link to) the predicted proteins associated with the genes in the UMI count matrix, using matching IDs. |
| Example | The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID. |
| Reference | # |
| Namespace | ei:protein_models |
Cell Metadata
| Name | cell_metadata |
| Description | Table mapping cell IDs to cluster/cell type/broad cell type annotations. |
| Example | Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv. |
| Reference | # |
| Namespace | ei:cell_metadata |
Cluster-Level Normalised Expression Tables
| Name | cluster_level_normalised_expression_tables |
| Description | Expression tables that show normalised gene expression at the cluster or cell-type level. |
| Example | Normalised gene expression data at the cluster level is provided in a CSV file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv. |
| Reference | # |
| Namespace | ei:cluster_level_normalised_expression_tables |
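One way to produce a cluster-level table is to average the per-cell normalised values within each cluster, using the cell metadata to assign cells to clusters. This sketch assumes a sparse dict-of-dicts layout in which absent entries are zero; summing raw counts per cluster (pseudobulk) is a common alternative.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_means(expression: Dict[str, Dict[str, float]],
                  cell_to_cluster: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Average per-cell normalised expression within each cluster.

    expression maps gene ID -> {cell ID -> normalised value}; cells missing
    from a gene's dict are treated as 0. cell_to_cluster comes from the
    cell metadata table.
    """
    cells_by_cluster: Dict[str, List[str]] = defaultdict(list)
    for cell, cluster in cell_to_cluster.items():
        cells_by_cluster[cluster].append(cell)
    table = {}
    for gene, per_cell in expression.items():
        table[gene] = {
            cluster: sum(per_cell.get(c, 0.0) for c in cells) / len(cells)
            for cluster, cells in cells_by_cluster.items()
        }
    return table
```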
Other Resource Files
| Name | other_resource_files |
| Description | Any other files necessary to reuse and interpret the data, e.g. barcode information for complex, serial multiplexing protocols (ClickTags). |
| Example | Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv. |
| Reference | # |
| Namespace | ei:other_resource_files |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | file_id |
| Description | A unique alphanumeric identifier for this file |
| Example | FILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing. |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Sequencing ID Required
| Name | sequencing_id |
| Description | A unique alphanumeric reference or identifier for the sequencing protocol. |
| Example | SEQ001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:sequencing_id |
Read 1 File Required
| Name | read_1_file |
| Description | The name or accession of the file that contains read 1. |
| Example | file1_r1.fastq.gz |
| Reference | # |
| Namespace | ei:read_1_file |
Read 2 File
| Name | read_2_file |
| Description | The name or accession of the file that contains read 2. |
| Example | file2_r2.fastq.gz |
| Reference | # |
| Namespace | ei:read_2_file |
Index 1 File
| Name | index_1_file |
| Description | The name of the file that contains index 1. |
| Example | file1_i1.fastq.gz |
| Reference | # |
| Namespace | ei:index_1_file |
Index 2 File
| Name | index_2_file |
| Description | The name of the file that contains index 2. |
| Example | file2_i2.fastq.gz |
| Reference | # |
| Namespace | ei:index_2_file |
Read 1 Checksum Required
| Name | read_1_file_checksum |
| Description | Result of a hash function calculated on the content of the read 1 file to verify file integrity. The field accepts a single MD5 checksum written as 32 lowercase hexadecimal characters. |
| Example | f8d29e41a73b5c02de9a6fb314e7c8ad |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_1_file_checksum |
Read 2 Checksum
| Name | read_2_file_checksum |
| Description | Result of a hash function calculated on the content of the read 2 file to verify file integrity. The field accepts a single MD5 checksum written as 32 lowercase hexadecimal characters. |
| Example | a3f4c1b29d8e57fa41b02de6c7f9ab83 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:read_2_file_checksum |
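The checksum fields accept a single lowercase MD5 digest (32 hexadecimal characters), which is the form GNU `md5sum` prints. A sketch that computes the digest of a file in chunks with Python's `hashlib` and compares it with the recorded value:

```python
import hashlib
import re

# Pattern copied from the checksum fields above.
CHECKSUM_PATTERN = re.compile(r"^[0-9a-f]{32}$")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the lowercase hex MD5 digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected: str) -> bool:
    """True if expected is a well-formed MD5 string equal to the file's digest."""
    if not CHECKSUM_PATTERN.fullmatch(expected):
        raise ValueError(f"{expected!r} is not a 32-character lowercase hex MD5")
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat for large `.fastq.gz` files.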
White List Barcode File
| Name | white_list_barcode_file |
| Description | A file containing the known cell barcodes in the dataset. |
| Example | barcodes.tsv |
| Reference | # |
| Namespace | ei:white_list_barcode_file |
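A sketch of loading the barcode whitelist and measuring how many observed barcodes match it exactly. It assumes one barcode per line, keeping only the first tab-separated column; some pipelines also correct barcodes that are one mismatch away, which is not shown here.

```python
from typing import Iterable, Set


def load_whitelist(lines: Iterable[str]) -> Set[str]:
    """Read one barcode per line, ignoring blank lines and any extra columns."""
    return {line.split("\t")[0].strip() for line in lines if line.strip()}


def fraction_on_whitelist(barcodes: Iterable[str], whitelist: Set[str]) -> float:
    """Fraction of observed barcodes that exactly match the whitelist."""
    observed = list(barcodes)
    if not observed:
        return 0.0
    return sum(bc in whitelist for bc in observed) / len(observed)
```

With the example file, the whitelist could be loaded with `with open("barcodes.tsv") as fh: whitelist = load_whitelist(fh)`.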
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
Expression Data Process Setting ID Required
| Name | expression_data_process_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_process_setting_id |
Matrix Type
| Name | matrix_type |
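| Description | The kind of values stored in the expression matrix, e.g. raw counts, normalised, log1p-transformed, scaled, imputed, or pseudobulk. |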
| Example | raw_counts |
| Reference | # |
| Namespace | ei:matrix_type |
| Allowed Values | imputed log1p normalised pseudobulk raw_counts scaled |
Reference Genome Required
| Name | reference_genome |
| Description | URL(s) for the associated reference genome; separate multiple URLs with a vertical bar character. |
| Example | https://reference-genome-example.com |
| Reference | # |
| Regex | ^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$ |
| Namespace | ei:reference_genome |
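The `reference_genome` pattern accepts one or more `http`, `https` or `ftp` URLs joined by a vertical bar. A sketch that validates the value with that pattern and splits it into individual URLs:

```python
import re
from typing import List

# Pattern copied from the reference_genome field above.
REFERENCE_GENOME_PATTERN = re.compile(
    r"^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$"
)


def split_reference_urls(value: str) -> List[str]:
    """Validate the field value and return its URLs as a list."""
    if not REFERENCE_GENOME_PATTERN.fullmatch(value):
        raise ValueError(f"{value!r} is not a '|'-separated list of URLs")
    return value.split("|")
```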
Annotation Version
| Name | annotation_version |
| Description | The annotation version of the associated reference genome |
| Example | GENCODE v44 |
| Reference | # |
| Namespace | ei:annotation_version |
Normalisation Method
| Name | normalisation_method |
| Description | The normalisation method applied to the expression data. |
| Example | Log Normalisation |
| Reference | # |
| Namespace | ei:normalisation_method |
| Allowed Values | Library Size Normalisation Log Normalisation SCNorm SCTransform scran |
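As an illustration of the `Log Normalisation` option, the scheme used by Seurat's `LogNormalize` divides each count by the cell's total, multiplies by a scale factor (10,000 by default), and takes the natural log of one plus the result. A pure-Python sketch for a single cell:

```python
import math
from typing import List


def log_normalise(counts: List[float], scale_factor: float = 10_000.0) -> List[float]:
    """Library-size normalise one cell's counts, then apply natural log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # an empty cell stays all zeros
    return [math.log1p(c / total * scale_factor) for c in counts]
```

Dividing by the total removes differences in sequencing depth between cells; the log1p step compresses the range while keeping zeros at zero.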
Highly Variable Gene Selection (HVG)
| Name | highly_variable_gene_selection |
| Description | Method used to select Highly Variable Genes (HVGs) and the number of genes selected. |
| Example | seurat_v3, n=2000 |
| Reference | # |
| Namespace | ei:highly_variable_gene_selection |
Dimensionality Reduction
| Name | dimensionality_reduction |
| Description | Method used to reduce dimensionality in the expression data |
| Example | PCA |
| Reference | # |
| Namespace | ei:dimensionality_reduction |
| Allowed Values | Diffusion Map ICA NMF PCA UMAP t-SNE |
Number of Nearest Neighbours
| Name | n_neighbours |
| Description | Number of nearest neighbours used to calculate cluster membership |
| Example | pca:50 |
| Reference | # |
| Namespace | ei:n_neighbours |
Clustering Algorithm
| Name | clustering_algorithm |
| Description | Algorithm used to create clusters |
| Reference | # |
| Namespace | ei:clustering_algorithm |
Clustering Resolution
| Name | clustering_resolution |
| Description | Resolution parameter used by the clustering algorithm. |
| Example | 2.5 |
| Reference | # |
| Regex | ^([0-9]*[.])?[0-9]+$ |
| Namespace | ei:clustering_resolution |
Clustering Distance Metric
| Name | clustering_distance_metric |
| Description | Metric used to calculate the distance between points. |
| Example | cosine |
| Reference | # |
| Namespace | ei:clustering_distance_metric |
| Allowed Values | cosine euclidean hamming jaccard manhattan mahalanobis |
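The metric determines which cells count as neighbours. Cosine distance ignores vector magnitude, so two cells whose profiles differ only by a constant factor have a cosine distance of 0, whereas their Euclidean distance is not 0. A sketch of both:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two points of the same dimension."""
    return math.dist(a, b)
```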
Software Versions
| Name | software_versions |
| Description | Primary software packages used for analysis |
| Reference | # |
| Namespace | ei:software_versions |
Cell Type Annotation
| Name | cell-type annotation |
| Description | Tools and databases used for cell-type annotation. |
| Reference | # |
| Namespace | ei:cell-type annotation |
Generated by Pipeline
| Name | generated_by_pipeline |
| Description | URL of the deposited pipeline used to create this data |
| Reference | # |
| Regex | ^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$ |
| Namespace | ei:generated_by_pipeline |
Study ID Required
| Name | study_id |
| Description | A unique alphanumeric identifier for this study |
| Example | STUDY001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:study_id |
File ID Required
| Name | expression_data_file_id |
| Description | A unique alphanumeric identifier for the expression data file |
| Example | EXPFILE001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_file_id |
Library Preparation ID Required
| Name | library_prep_id |
| Description | A unique alphanumeric identifier for library preparation |
| Example | LIBPREP001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:library_prep_id |
Expression Data Process Setting ID Required
| Name | expression_data_setting_id |
| Description | A unique alphanumeric identifier for the expression data process setting |
| Example | EXPSET001 |
| Reference | # |
| Regex | ^[a-zA-Z0-9]+$ |
| Namespace | ei:expression_data_setting_id |
File Name Required
| Name | expression_data_file |
| Description | Expression data file name |
| Example | exp_file.csv |
| Reference | # |
| Namespace | ei:expression_data_file |
File MD5 Checksum Required
| Name | expression_data_file_checksum |
| Description | Calculated MD5 checksum for this file. |
| Example | 9e4b7a23f6c1d0ab85f29c47e3d8a610 |
| Reference | # |
| Regex | ^[0-9a-f]{32}$ |
| Namespace | ei:expression_data_file_checksum |
File Format Required
| Name | expression_data_file_format |
| Description | The format of the expression file, such as h5ad or rds |
| Example | csv |
| Reference | # |
| Namespace | ei:expression_data_file_format |
| Allowed Values | csv h5ad loom mtx rds |
Number of Cells
| Name | n_cells |
| Description | The number of cells represented in the expression data |
| Example | 4 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_cells |
Number of Genes
| Name | n_genes |
| Description | The number of genes represented in the expression data |
| Example | 50 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:n_genes |
File Size in Bytes
| Name | file_size_bytes |
| Description | Size of the file recorded in bytes |
| Example | 90 |
| Reference | # |
| Regex | ^\d+$ |
| Namespace | ei:file_size_bytes |
Date Generated
| Name | date_generated |
| Description | Approximate date this expression data was generated |
| Example | 2024-10-14 |
| Reference | # |
| Regex | ^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$ |
| Namespace | ei:date_generated |