Single-cell Schemas

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Title Required

Name	title
Description	The title for your dataset. This will be displayed when search results including your data are shown. Often this will be the same as an associated publication.
Example	SARS-COV-2 drug repurposing - Caco2 cell line
Reference	#
Regex	^.{25,}$
Namespace	rembi:title

Description Required

Name	description
Description	Use this field to describe your dataset. This can be the abstract to an accompanying publication.
Example	High-throughput screening of repurposed drugs against SARS-CoV-2 in Caco-2 cells
Reference	http://purl.org/dc/terms/description
Regex	^.{25,}$
Namespace	rembi:description

Release Date Required

Name	private_until_date
Description	The date until which the data remain private and embargoed, in YYYY-MM-DD format.
Example	2027-06-01
Reference	http://purl.obolibrary.org/obo/SLSO_0001056
Regex	^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Namespace	rembi:private_until_date
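A minimal Python sketch of checking `private_until_date` against the field's regex. The regex checks only the shape of the value, so it accepts impossible dates such as 2027-02-31; pairing it with the standard library's `datetime.date.fromisoformat` rejects those as well. The function name `is_valid_release_date` is illustrative and not part of the schema.

```python
import re
from datetime import date

# Regex from the private_until_date field definition (date only, no time part).
RELEASE_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_valid_release_date(value: str) -> bool:
    """Return True if value matches the schema regex and is a real calendar date."""
    if not RELEASE_DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # raises ValueError for e.g. 2027-02-31
    except ValueError:
        return False
    return True

print(is_valid_release_date("2027-06-01"))           # True
print(is_valid_release_date("2027-06-01T00:00:00"))  # False: time part not allowed
print(is_valid_release_date("2027-02-31"))           # False: regex passes, calendar check fails
```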

Keywords Required

Name	keywords
Description	Keywords describing your data that can be used to aid search and classification.
Example	CRISPR
Reference	http://schema.org/keywords
Namespace	rembi:keywords

Licence

Name	licence
Description	The license under which the data are available.
Example	MIT License
Reference	http://purl.org/dc/terms/license
Namespace	rembi:licence
Allowed Values	Apache License 2.0; Creative Commons Attribution 4.0 International; Creative Commons Attribution Share Alike 4.0 International; Creative Commons Zero v1.0 Universal; GNU General Public License v3.0 or later; MIT License

Funding Statement

Name	funding_statement
Description	A description of how the data generation was funded.
Example	Data generation for this study was supported by a grant from the BBSRC, which funded annotation and analysis activities.
Reference	http://purl.obolibrary.org/obo/IAO_0000623
Namespace	rembi:funding_statement

Acknowledgements

Name	acknowledgements
Description	Any people or groups that should be acknowledged as part of the dataset.
Example	We acknowledge the contributions of the field research team at the University of Edinburgh, the sequencing support from the Earlham Institute, and funding provided by the BBSRC. Special thanks to local conservation volunteers for assistance in sample collection.
Reference	http://purl.obolibrary.org/obo/IAO_0000324
Namespace	rembi:acknowledgements

Rembi Version Required

Name	rembi_version
Description	The version of REMBI. The current version to be used is 1.5.
Example	1.5
Reference	#
Regex	^1\.5$
Namespace	rembi:rembi_version

Grant Reference

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Identifier Required

Name	identifier
Description	The identifier for the grant.
Example	12345
Reference	http://purl.org/dc/terms/identifier
Namespace	rembi:identifier

Funder Required

Name	funder
Description	The funding body providing support.
Example	Biotechnology and Biological Sciences Research Council (BBSRC)
Reference	https://schema.org/funder
Namespace	rembi:funder

Publication

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Title Required

Name	title
Description	Title of associated publication.
Example	High-throughput drug screening identifies potential SARS-CoV-2 inhibitors in Caco2 cells
Reference	http://purl.org/dc/terms/title
Namespace	rembi:title

Authors

Name	authors
Description	Authors of the associated publication, listed in order of contribution. Format each name as last name followed by first initial (e.g. Doe J.) and separate authors with commas.
Example	Doe J., Lee A., Gupta R., Zhao L., Thompson M.
Reference	http://purl.obolibrary.org/obo/GENEPIO_0001517
Namespace	rembi:authors
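A small Python sketch of splitting an `authors` value into (last name, initials) pairs. It assumes the layout of the field's example: entries separated by commas, with the initials as the last space-separated token of each entry. `parse_authors` is a hypothetical helper, not part of the schema.

```python
def parse_authors(value: str) -> list[tuple[str, str]]:
    """Split 'Doe J., Lee A.' into [('Doe', 'J.'), ('Lee', 'A.')].

    Assumes comma-separated entries that each end with the initials;
    multi-word last names such as 'van der Berg J.' are kept intact.
    """
    authors = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        last_name, _, initials = entry.rpartition(" ")
        authors.append((last_name, initials))
    return authors

print(parse_authors("Doe J., Lee A., Gupta R., Zhao L., Thompson M."))
# [('Doe', 'J.'), ('Lee', 'A.'), ('Gupta', 'R.'), ('Zhao', 'L.'), ('Thompson', 'M.')]
```

Because commas separate authors, a name must not itself contain a comma under this convention.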

DOI

Name	doi
Description	A Digital Object Identifier (DOI) is a persistent identifier assigned to a digital object, such as a journal article or dataset, that provides a permanent link to it and supports reliable citation and access. Give the DOI in its standard form (e.g. 10.1234/example.doi), without a resolver prefix, and make sure it points to the original source of the publication or data.
Example	10.1038/s41586-020-2577-1
Reference	http://purl.obolibrary.org/obo/ONTOAVIDA_00000015
Regex	^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$
Namespace	rembi:doi
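The DOI regex expects a bare DOI starting with `10.`, so a pasted resolver URL such as `https://doi.org/10.1038/...` fails validation. A Python sketch that strips a few common prefixes before checking the value; the prefix list and the name `normalise_doi` are assumptions for illustration.

```python
import re
from typing import Optional

# DOI pattern from the doi field definition.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")
# Hypothetical set of prefixes users commonly paste in front of a DOI.
PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")

def normalise_doi(value: str) -> Optional[str]:
    """Strip a known prefix and return the bare DOI, or None if it fails the regex."""
    value = value.strip()
    for prefix in PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value if DOI_RE.fullmatch(value) else None

print(normalise_doi("https://doi.org/10.1038/s41586-020-2577-1"))  # 10.1038/s41586-020-2577-1
print(normalise_doi("10.1038/s41586 020"))  # None: spaces are not allowed
```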

Year

Name	year
Description	Year of publication.
Example	2025
Reference	http://rs.tdwg.org/dwc/terms/year
Regex	^(19|20)\d{2}$
Namespace	rembi:year

Pubmed ID

Name	pubmed_id
Description	PubMed identifier for the publication.
Example	32726801
Reference	http://purl.obolibrary.org/obo/MS_1000879
Regex	^\d{1,8}$
Namespace	rembi:pubmed_id

Link

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Link URL Required

Name	link_url
Description	The URL of a link relevant to the dataset.
Example	https://example.org/zebrafish-embryo
Reference	#
Namespace	rembi:link_url

Link Type

Name	link_type
Description	The type of the link.
Example	Dataset
Reference	#
Namespace	rembi:link_type

Link Description

Name	link_description
Description	The description of the linked content.
Example	Image analysis code
Reference	#
Namespace	rembi:link_description

Study Component

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Study Component ID Required

Name	study_component_id
Description	A unique alphanumeric identifier for the study component
Example	STUDYCOMP001
Reference	#
Namespace	ei:study_component_id

Name Required

Name	name
Description	The name of your study component.
Example	Confocal images
Reference	#
Namespace	rembi:name

Description Required

Name	description
Description	An explanation of your study component.
Example	Stitched max-projected fluorescent confocal images
Reference	#
Namespace	rembi:description

Annotations

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Annotation ID Required

Name	annotation_id
Description	A unique alphanumeric identifier for the image annotation record.
Example	ANNOT001
Reference	#
Namespace	rembi:annotation_id

Annotation Overview Required

Name	annotation_overview
Description	Short descriptive summary indicating the type of annotation and how it was generated
Example	Cell nuclei marked using DAPI staining.
Reference	#
Namespace	rembi:annotation_overview

File Type

Name	file_type
Description	The format of the annotation file.
Example	gff
Reference	http://purl.obolibrary.org/obo/SLSO_0001157
Namespace	rembi:file_type

Annotation Type

Name	annotation_type
Description	Defines the type of annotation (e.g., class_labels, bounding_boxes, counts, derived_annotations).
Example	geometrical_annotations
Reference	http://purl.obolibrary.org/obo/NCIT_C89919
Namespace	rembi:annotation_type
Allowed Values	bounding_boxes class_labels counts derived_annotations geometrical_annotations graphs other point_annotations segmentation_mask tracks weak_annotations

Annotation Method Required

Name	annotation_method
Description	Description of how the annotations were created, including protocols used for consensus and quality assurance, if applicable.
Example	crowdsourced
Reference	#
Namespace	rembi:annotation_method

Annotation Criteria

Name	annotation_criteria
Description	Rules used to generate annotations
Example	only nuclei in focus were segmented
Reference	#
Namespace	rembi:annotation_criteria

Annotation Coverage

Name	annotation_coverage
Description	The proportion of images from the dataset that were annotated.
Example	All data that satisfied the Annotation Criteria were annotated.
Reference	#
Namespace	rembi:annotation_coverage

Annotation Confidence Level

Name	annotation_confidence_level
Description	Confidence in the accuracy of the annotations.
Example	more than 95% pixel consensus where multiple annotators independently segmented the same object
Reference	#
Namespace	rembi:annotation_confidence_level

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Person ID Required

Name	person_id
Description	A unique alphanumeric identifier for the author.
Example	PERSON001
Reference	#
Namespace	ei:person_id

Annotation ID Required

Name	annotation_id
Description	A unique alphanumeric identifier for the image annotation record.
Example	ANNOT001
Reference	#
Namespace	ei:annotation_id

Author First Name Required

Name	givenName
Description	A first name (or given name) is the personal name given to an individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Author Last Name Required

Name	familyName
Description	A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName
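A quick Python check of what the `givenName` and `familyName` regexes accept. As written they allow only unaccented ASCII letters and hyphens (plus whitespace for given names), and require at least two characters ending in a lowercase letter, so names such as "O'Brien", "Zoë" or an all-caps "DOE" are rejected. The schema does not say whether this is intended; the sample names below are illustrative.

```python
import re

# Patterns copied from the givenName and familyName field definitions.
GIVEN_NAME_RE = re.compile(r"^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$")
FAMILY_NAME_RE = re.compile(r"^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$")

for given in ["Jane", "Mary Ann", "Jean-Luc", "Zoë", "J"]:
    print("givenName ", repr(given), bool(GIVEN_NAME_RE.fullmatch(given)))

for family in ["Doe", "Smith-Jones", "O'Brien", "DOE"]:
    print("familyName", repr(family), bool(FAMILY_NAME_RE.fullmatch(family)))
```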

Email Address

Name	email
Description	A unique identifier used to send and receive electronic messages (emails) over the internet.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	rembi:email
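A Python sketch of the email check, written with `.*` inside each negative lookahead so that consecutive dots or hyphens are rejected anywhere in the address (a lookahead of `.\.{2,}` without `*` inspects only the second and third characters). This is a pragmatic format check, not a full RFC 5322 validator; the sample addresses are made up.

```python
import re

# The lookaheads reject ".." or "--" anywhere; the rest checks local@domain.tld.
EMAIL_RE = re.compile(r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$")

for address in ["jane.doe@example.com", "jane..doe@example.com",
                "jane@exa--mple.com", "jane.doe@example"]:
    print(address, bool(EMAIL_RE.fullmatch(address)))
```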

Orcid ID

Name	orcid_id
Description	A 16-character identifier (ORCID iD) that uniquely identifies a researcher. The last character is a check digit and may be X.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	rembi:orcid_id
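The final character of an ORCID iD is a check character computed with ISO 7064 MOD 11-2, where a value of 10 is written as `X`. A Python sketch that checks both the format and the check character; `orcid_checksum_ok` is an illustrative name, and the first two test iDs are the sample iDs used in ORCID's own documentation.

```python
import re

# Format check; the last character may be "X", which encodes a check value of 10.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the format and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits feed the checksum
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected

print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
print(orcid_checksum_ok("0000-0002-1694-233X"))  # True
print(orcid_checksum_ok("0000-0002-1825-0098"))  # False: wrong check digit
```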

Affiliation or Institution Required

Name	affiliation
Description	A URL to a public registry containing organisation information or the name of the organisation. A Research Organisation Registry (ROR) URL is recommended if a URL is provided.
Example	https://ror.org/018cxtf62
Reference	https://schema.org/affiliation
Namespace	rembi:affiliation

Role

Name	role
Description	The author's role in the study. Separate multiple roles with a pipe symbol (|).
Example	Senior Bioinformatician
Reference	http://www.w3.org/2006/vcard/ns#role
Namespace	rembi:role

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique alphanumeric identifier for this sample
Example	SAMP001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism Required

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ei:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognized taxonomy database like NCBI Taxonomy) that corresponds to the organism’s scientific name. It must be accurately matched to the provided scientificName to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ei:taxon_id
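The scientific name must correspond to the taxon ID. A Python sketch that applies both field regexes and then cross-checks the pair against a lookup table. `known_taxa` is a hypothetical mapping of taxon ID to scientific name (for example, built from a local copy of the NCBI Taxonomy); the only entry used here is the schema's own example pair.

```python
import re

# Patterns copied from the scientific_name and taxon_id field definitions.
SCIENTIFIC_NAME_RE = re.compile(r"^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$")
TAXON_ID_RE = re.compile(r"^[0-9]+$")

def check_organism(scientific_name: str, taxon_id: str, known_taxa: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    if not SCIENTIFIC_NAME_RE.fullmatch(scientific_name):
        problems.append(f"scientific_name {scientific_name!r} does not match the schema regex")
    if not TAXON_ID_RE.fullmatch(taxon_id):
        problems.append(f"taxon_id {taxon_id!r} is not all digits")
    elif taxon_id in known_taxa and known_taxa[taxon_id] != scientific_name:
        problems.append(f"taxon_id {taxon_id} is {known_taxa[taxon_id]!r}, not {scientific_name!r}")
    return problems

known = {"8036": "Salvelinus alpinus"}  # example pair from this schema
print(check_organism("Salvelinus alpinus", "8036", known))  # []
print(check_organism("Homo sapiens", "8036", known))
```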

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ei:biosampleAccession

Biological Entity Required

Name	biological_entity
Description	What is being imaged
Example	Drosophila endoderm
Reference	#
Namespace	rembi:biological_entity

Common Name

Name	common_name
Description	The common (vernacular) name of the organism.
Example	rock worm
Reference	#
Namespace	rembi:common_name

Description

Name	description
Description	High level description of sample.
Example	Bronchial epithelial cell culture
Reference	#
Namespace	rembi:description

Intrinsic Variables

Name	intrinsic_variables
Description	Intrinsic (e.g. genetic) alteration if applicable
Example	stable overexpression of HIST1H2BJ-mCherry and LMNA
Reference	#
Namespace	rembi:intrinsic_variables

Extrinsic Variables

Name	extrinsic_variables
Description	External sample treatment (e.g. reagent) if applicable
Example	2-(9-oxoacridin-10-yl)acetic acid
Reference	#
Namespace	rembi:extrinsic_variables

Experimental Variables

Name	experimental_variables
Description	What is intentionally varied (e.g. time) between multiple entries in this study component
Example	Time
Reference	#
Namespace	rembi:experimental_variables

Specimen

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Specimen ID Required

Name	specimen_id
Description	A unique alphanumeric identifier for this specimen
Example	SPEC001
Reference	#
Namespace	ei:specimen_id

Sample ID Required

Name	sample_id
Description	A unique alphanumeric identifier for this sample
Example	SAMP001
Reference	#
Namespace	ei:sample_id

Study Component ID Required

Name	study_component_id
Description	A unique alphanumeric identifier for the study component
Example	STUDYCOMP001
Reference	#
Namespace	ei:study_component_id
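A Specimen record links to a Sample and a Study Component through `sample_id` and `study_component_id`. A Python sketch of a referential-integrity check across those tables, assuming each table has been loaded as a list of dicts keyed by field name (for example with `csv.DictReader`). `find_dangling_references` and the IDs `SPEC002`/`SAMP002` are hypothetical.

```python
def find_dangling_references(specimens, samples, components):
    """Return (specimen_id, field, value) for each reference with no matching record."""
    known = {
        "sample_id": {row["sample_id"] for row in samples},
        "study_component_id": {row["study_component_id"] for row in components},
    }
    dangling = []
    for specimen in specimens:
        for field, valid_ids in known.items():
            if specimen[field] not in valid_ids:
                dangling.append((specimen["specimen_id"], field, specimen[field]))
    return dangling

samples = [{"study_id": "STUDY001", "sample_id": "SAMP001"}]
components = [{"study_id": "STUDY001", "study_component_id": "STUDYCOMP001"}]
specimens = [
    {"study_id": "STUDY001", "specimen_id": "SPEC001",
     "sample_id": "SAMP001", "study_component_id": "STUDYCOMP001"},
    {"study_id": "STUDY001", "specimen_id": "SPEC002",
     "sample_id": "SAMP002", "study_component_id": "STUDYCOMP001"},
]
print(find_dangling_references(specimens, samples, components))
# [('SPEC002', 'sample_id', 'SAMP002')]
```

The same pattern applies to the other ID links in these schemas, such as `specimen_id` in Image Acquisition.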

Sample Preparation Required

Name	sample_preparation
Description	How the sample was prepared for imaging.
Example	Cells were cultured on poly-L-lysine treated coverslips. Culture media was aspirated, and coverslips were washed once with PBS. Cells were fixed by incubating for 10 min with 4 % formaldehyde/PBS, washed twice with PBS, and permeabilized by incubating (>3 h, -20°C) in 70 % ethanol. Cells were rehydrated by incubating (5 min, RT) with FISH wash buffer (10 % formamide, 2x SSC). For hybridization, coverslips were placed cell-coated side down on a 48μl drop containing 100 nM Quasar570-labelled probes complementary to one of REV-ERBα, CRY2, or TP53 transcripts (Biosearch Technologies) (see Table S6 for probe sequences), 0.1 g/ml dextran sulfate, 1 mg/ml E. coli tRNA, 2 mM VRC, 20 μg/ml BSA, 2x SSC, 10 % formamide and incubated (37°C, 20 h) in a sealed parafilm chamber. Coverslips were twice incubated (37°C, 30 min) in pre-warmed FISH wash buffer, then in PBS containing 0.5 μg/ml 4’,6-diamidino-2-phenylindole (DAPI) (5 min, RT), washed twice with PBS, dipped in water, air-dried, placed cell-coated side down on a drop of ProLong Diamond Antifade Mountant (Life Technologies), allowed to polymerize for 24 h in the dark and then sealed with nail varnish.
Reference	#
Namespace	rembi:sample_preparation

Growth Protocol

Name	growth_protocol
Description	How the specimen was grown, e.g. cell line cultures, crosses or plant growth.
Example	Cells grown on coverslips were fixed in ice-cold methanol at -20 °C for 10 min. After blocking in 0.2% gelatine from cold-water fish (Sigma) in PBS (PBS/FSG) for 15 min, coverslips were incubated with primary antibodies in blocking solution for 1 h. Following washes with 0.2% PBS/FSG, the cells were incubated with a 1:500 dilution of secondary antibodies for 1 h (donkey anti-mouse/rabbit/goat/sheep conjugated to Alexa 488 or Alexa 594; Molecular Probes or donkey anti-mouse conjugated to DyLight 405, Jackson ImmunoResearch). The cells were counterstained with 1 µg ml⁻¹ Hoechst 33342 (Sigma) to visualize chromatin. After washing with 0.2% PBS/FSG, the coverslips were mounted on glass slides by inverting them into mounting solution (ProLong Gold antifade, Molecular Probes). The samples were allowed to cure for 24-48 h.
Reference	#
Namespace	rembi:growth_protocol

Image Acquisition

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Image Acquisition ID Required

Name	image_acquisition_id
Description	A unique alphanumeric identifier for the image acquisition
Example	IMGACQ001
Reference	#
Namespace	ei:image_acquisition_id

Specimen ID Required

Name	specimen_id
Description	A unique alphanumeric identifier for this specimen
Example	SPEC001
Reference	#
Namespace	ei:specimen_id

Image Method Required

Name	image_method
Description	What method was used to capture images.
Example	secondary_electron imaging
Reference	FBbi:00000222
Namespace	ei:image_method

Imaging Instrument Required

Name	imaging_instrument
Description	Description of the instrument used to capture the images.
Example	DeltaVision OMX V3 Blaze system (GE Healthcare) equipped with a 60x/1.42 NA PlanApo oil immersion objective (Olympus), pco.edge 5.5 sCMOS cameras (PCO) and 405, 488, 593 and 640 nm lasers
Reference	#
Namespace	rembi:imaging_instrument

Image Acquisition Parameters Required

Name	image_acquisition_parameters
Description	How the images were acquired, including instrument settings/parameters.
Example	Embryos were imaged on a Luxendo MuVi SPIM light-sheet microscope, using 30x magnification setting on the Nikon 10x/0.3 water objective. The 488 nm laser was used to image nuclei (His-GFP), and the 561 nm laser was used to image transcriptional dots (MCP-mCherry), both at 5% laser power. Exposure time for the green channel was 55 ms and exposure for the red channel was 70 ms. The line illumination tool was used to improve background levels and was set to 40 pixels.
Reference	#
Namespace	rembi:image_acquisition_parameters

Image Analysis

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Image Analysis ID Required

Name	image_analysis_id
Description	A unique alphanumeric identifier for the image analysis
Example	IMGANAL001
Reference	#
Namespace	ei:image_analysis_id

Study Component ID Required

Name	study_component_id
Description	A unique alphanumeric identifier for the study component
Example	STUDYCOMP001
Reference	#
Namespace	ei:study_component_id

Analysis Overview Required

Name	analysis_overview
Description	How image analysis was carried out.
Example	Each 3D-SIM image contained one nucleus (in a small number of cases multiple nuclei were present, which did not affect the analysis). The image analysis pipeline contained six main steps: bivalent skeleton tracing, trace fluorescence intensity quantification, HEI10 peak detection, HEI10 foci identification, HEI10 foci intensity quantification, and total bivalent intensity quantification. Note that the normalization steps used for foci identification differ from those used for foci intensity quantification; the former was intended to robustly identify foci from noisy traces, whilst the latter was used to carefully quantify foci HEI10 levels.
Reference	#
Namespace	rembi:analysis_overview

Image Correlation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

Image Correlation ID Required

Name	image_correlation_id
Description	A unique alphanumeric identifier for the image correlation
Example	IMGCORR001
Reference	#
Namespace	ei:image_correlation_id

Image Analysis ID Required

Name	image_analysis_id
Description	A unique alphanumeric identifier for the image analysis
Example	IMGANAL001
Reference	#
Namespace	ei:image_analysis_id

Spatial and Temporal Alignment Required

Name	spatial_and_temporal_alignment
Description	Method used to correlate images from different modalities (e.g. manual overlay, alignment algorithm).
Example	Alignment algorithm
Reference	#
Namespace	rembi:spatial_and_temporal_alignment

Fiducials Used Required

Name	fiducials_used
Description	Features from correlated datasets used for colocalisation
Example	Fluorescent bead markers
Reference	#
Namespace	rembi:fiducials_used

Transformation Matrix or Other Information Required

Name	transformation_matrix
Description	The transformations used to correlate the images (e.g. a transformation matrix).
Example	Translation and rotation matrix applied using ImageJ plugin
Reference	#
Namespace	rembi:transformation_matrix

File Level Metadata

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:file_id

Study Component ID Required

Name	study_component_id
Description	A unique alphanumeric identifier for the study component
Example	STUDYCOMP001
Reference	#
Namespace	ei:study_component_id

Annotation ID Required

Name	annotation_id
Description	A unique alphanumeric identifier for the image annotation record.
Example	ANNOT001
Reference	#
Namespace	rembi:annotation_id

Image File name Required

Name	source_image_id
Description	The file name of the image, including the extension. Common extensions include tiff, jpeg, png, gif, bmp and ome-tiff.
Example	file001.png
Reference	#
Namespace	rembi:source_image_id

Transformations

Name	transformations
Description	Any preprocessing or transformations applied to the image.
Example	z-stack flattening
Reference	#
Namespace	rembi:transformations

Spatial Information

Name	spatial_information
Description	Spatial resolution, scale, or coordinate info related to the image.
Example	pixel_size=0.5µm
Reference	#
Namespace	rembi:spatial_information

Annotation Creation Time

Name	annotation_creation_time
Description	Timestamp of when the annotation was created.
Example	2025-05-15T14:32:00Z
Reference	#
Regex	^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$
Namespace	rembi:annotation_creation_time
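The `annotation_creation_time` regex accepts only UTC timestamps written with a trailing `Z` and no fractional seconds, so an offset form such as `+00:00` is rejected. A Python sketch that validates the value and converts it to a timezone-aware `datetime`; `parse_annotation_time` is an illustrative name.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Regex from the annotation_creation_time field definition.
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")

def parse_annotation_time(value: str) -> Optional[datetime]:
    """Return an aware UTC datetime, or None if the value is not a valid timestamp."""
    if not TIMESTAMP_RE.fullmatch(value):
        return None
    try:
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:  # e.g. month 13 passes the regex but is not a real date
        return None
    return parsed.replace(tzinfo=timezone.utc)

print(parse_annotation_time("2025-05-15T14:32:00Z"))       # 2025-05-15 14:32:00+00:00
print(parse_annotation_time("2025-05-15T14:32:00+00:00"))  # None
```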

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Title Required

Name	title
Description	A name given to the study or project. Project title should be fewer than 30 words, such as a title of a grant proposal or a publication.
Example	Spatial Transcriptomics FISH of Human Lung Tissue
Reference	http://purl.org/dc/terms/title
Namespace	dcterms:title

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Spatial Transcriptomics
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused, and therefore how users may legally reference, distribute, or build upon the study. Common choices include Creative Commons licences; CC BY 4.0, for example, requires attribution to the original authors when the data are cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT
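The REMBI-based Study schema earlier in this page lists licences by full name, while this schema's `licence` field expects SPDX identifiers. A Python sketch that maps one form to the other, assuming the pairing below, which follows the names and identifiers in the SPDX License List; `to_spdx` is an illustrative helper.

```python
from typing import Optional

# Full licence names (REMBI-based Study schema) -> SPDX identifiers allowed here.
SPDX_BY_NAME = {
    "Apache License 2.0": "Apache-2.0",
    "Creative Commons Attribution 4.0 International": "CC-BY-4.0",
    "Creative Commons Attribution Share Alike 4.0 International": "CC-BY-SA-4.0",
    "Creative Commons Zero v1.0 Universal": "CC0-1.0",
    "GNU General Public License v3.0 or later": "GPL-3.0-or-later",
    "MIT License": "MIT",
}
ALLOWED_LICENCES = set(SPDX_BY_NAME.values())

def to_spdx(value: str) -> Optional[str]:
    """Return the allowed SPDX identifier for value, or None if it is not recognised."""
    value = value.strip()
    if value in ALLOWED_LICENCES:
        return value
    return SPDX_BY_NAME.get(value)

print(to_spdx("MIT License"))  # MIT
print(to_spdx("CC-BY-4.0"))    # CC-BY-4.0
print(to_spdx("CC BY 4.0"))    # None: not an exact SPDX identifier
```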

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
Description	A 16-character identifier (ORCID iD) that uniquely identifies a researcher. The last character is a check digit and may be X.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id

First Name Required

Name	givenName
Description	A first name (or given name) is the personal name given to an individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	A unique identifier used to send and receive electronic messages (emails) over the internet.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organization that supports (sponsors) something through some kind of financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study.
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMP001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognized taxonomy database like NCBI Taxonomy) that corresponds to the organism’s scientific name. It must be accurately matched to the provided scientificName to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Imaging Protocol

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

Platform Required

Name	platform
Description	The platform used to isolate the cells.
Example	Illumina NovaSeq
Reference	#
Namespace	ei:platform

Instrument Required

Name	instrument
Description	The instrument used to isolate the cells.
Example	Illumina NovaSeq 6000
Reference	#
Namespace	ei:instrument

Target Probe Code Required

Name	target_probe_code
Description	The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell.
Example	Oligo-dT
Reference	#
Namespace	ei:target_probe_code

Section Thickness (µm)

Name	section_thickness_µm
Description	The thickness of the tissue section in micrometres.
Example	10
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:section_thickness_µm

Section Thickness Measurement Method

Name	section_thickness_measurement_method
Description	The method used to measure tissue section thickness.
Example	Microtome
Reference	#
Namespace	ei:section_thickness_measurement_method

Section Thickness Temperature

Name	section_thickness_temperature
Description	The temperature at which the section was made, in degrees Celsius.
Example	22
Reference	#
Regex	^-?\d+(\.\d+)?$
Namespace	ei:section_thickness_temperature
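Both section fields take bare numbers: the unit is part of the field name (µm) or description (°C), so a value such as "10 µm" fails the regex, and only the temperature may be negative. A Python sketch that validates and converts these values; `parse_number` is an illustrative helper.

```python
import re

# Patterns from the section_thickness_µm and section_thickness_temperature fields.
THICKNESS_RE = re.compile(r"^\d+(\.\d+)?$")      # unsigned number
TEMPERATURE_RE = re.compile(r"^-?\d+(\.\d+)?$")  # may be negative

def parse_number(value: str, pattern: re.Pattern) -> float:
    """Convert a numeric field to float after checking it against the schema regex."""
    if not pattern.fullmatch(value):
        raise ValueError(f"{value!r} does not match {pattern.pattern}")
    return float(value)

print(parse_number("10", THICKNESS_RE))     # 10.0
print(parse_number("-20", TEMPERATURE_RE))  # -20.0
```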

Is Pathological

Name	is_pathological
Description	Whether the sample is pathological, i.e. abnormal and having a destructive effect on living tissue.
Example	No
Reference	#
Namespace	ei:is_pathological
Allowed Values	No Yes

Photobleaching Duration In Hours

Name	photobleaching_duration_in_hours
Description	The duration of photobleaching in hours
Example	2
Reference	#
Regex	^\d+$
Namespace	ei:photobleaching_duration_in_hours

Clearing with ProteinaseK Required

Name	clearing_with_proteinasek
Description	The duration of clearing at 47°C with Proteinase K.
Example	24 hrs
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_with_proteinasek

Clearing without ProteinaseK Required

Name	clearing_without_proteinasek
Description	The duration of tissue clearing at 37°C without Proteinase K.
Example	4.5 days
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_without_proteinasek
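The two clearing-duration fields mix units ("24 hrs", "4.5 days"), which makes them awkward to compare. A Python sketch that converts a value to hours; it uses the fields' regex with the number wrapped in a capture group (which does not change what the pattern accepts), and only the listed units are recognised, so "90 minutes" is rejected. `duration_in_hours` is an illustrative name.

```python
import re

# Same language as the field regex, with capture groups for amount and unit.
DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(hrs?|days?|mins?|seconds?)$")
SECONDS_PER_UNIT = {
    "hr": 3600, "hrs": 3600, "day": 86400, "days": 86400,
    "min": 60, "mins": 60, "second": 1, "seconds": 1,
}

def duration_in_hours(value: str) -> float:
    """Convert a clearing duration such as '4.5 days' to hours."""
    match = DURATION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    amount, unit = match.groups()
    return float(amount) * SECONDS_PER_UNIT[unit] / 3600

print(duration_in_hours("24 hrs"))    # 24.0
print(duration_in_hours("4.5 days"))  # 108.0
```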

Instrument User Guide Required

Name	instrument_user_guide
Description	The user guide for the instrument used.
Example	User Guide
Reference	#
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ei:instrument_user_guide

Instrument User Guide Revision Required

Name	instrument_user_guide_revision
Description	The revision of the instrument user guide.
Example	1.2
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:instrument_user_guide_revision

Sample Preparation Guide Required

Name	sample_preparation_guide
Description	The guide used for sample preparation.
Example	example_guide_v1.0.pdf
Reference	#
Regex	^[A-Za-z0-9._-]*[a-z]+$
Namespace	ei:sample_preparation_guide

Sample Preparation Guide Revision Required

Name	sample_preparation_guide_revision
Description	The revision of the sample preparation guide.
Example	1.0
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:sample_preparation_guide_revision

Deviations From Official Protocol Required

Name	deviations_from_official_protocol
Description	Any deviations from the official protocol. Separate individual deviations with a pipe symbol (|).
Example	Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours
Reference	#
Namespace	ei:deviations_from_official_protocol
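Pipe-separated values, as used for protocol deviations here and for author roles in the Person schema, can be split into a list before further processing. A minimal Python sketch; `split_pipe_list` is an illustrative helper, and under this convention a single item cannot itself contain a literal pipe.

```python
def split_pipe_list(value: str) -> list[str]:
    """Split a '|'-separated field into trimmed, non-empty items."""
    return [item.strip() for item in value.split("|") if item.strip()]

print(split_pipe_list(
    "Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours"
))
# ['Temperature exceeded 25°C during storage', 'Sample handling delayed by 2 hours']
```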

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Namespace	ei:file_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

File Name Required

Name	file_name
Description	The name of a data file related to the study, used to identify it uniquely. Common extensions include tiff, jpeg, png, gif, bmp and ome-tiff.
Example	file001.tiff
Reference	#
Namespace	ei:file_name

File Type Required

Name	file_type
Description	The format of the file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff.
Example	tiff
Reference	#
Namespace	ei:file_type

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Project Name Required

Name	project_name
Description	Official name of the study or project, for example the title of a grant proposal or publication. The title should be fewer than 30 words.
Example	Spatial Transcriptomics FISH of Human Lung Tissue
Reference	https://w3id.org/mixs/0000092
Namespace	mixs:project_name

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Spatial Transcriptomics
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused, and so informs users how they may legally reference, distribute, or build upon the study. Common choices include Creative Commons licences such as CC BY 4.0, which requires attribution to the original authors when the data are cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
Description	A 16-character identifier (ORCID iD) that uniquely identifies a researcher. The last character is a check digit and may be 'X'.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id
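ORCID iDs end with a check character computed with the ISO 7064 MOD 11-2 algorithm. This means a format check can be extended with a checksum test. Below is a minimal Python sketch; the function names are illustrative.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_checksum(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Check the hyphenated format, then verify the final check character."""
    if ORCID_PATTERN.fullmatch(orcid_id) is None:
        return False
    digits = orcid_id.replace("-", "")
    return orcid_checksum(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
```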

First Name Required

Name	givenName
Description	A first name (or given name) is the personal name given to an individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	The email address of this person.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email
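Here is a sketch of applying an email pattern in Python. It assumes the two look-aheads are meant to scan the whole address, written `(?!.*\.{2,})` and `(?!.*-{2,})`, so that any run of two or more dots or hyphens is rejected. `is_valid_email` is a hypothetical helper.

```python
import re

# Look-aheads scan the whole string (assumed intent): no "..", no "--".
EMAIL_PATTERN = re.compile(
    r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$"
)


def is_valid_email(value: str) -> bool:
    """Return True if value matches the email pattern in full."""
    return EMAIL_PATTERN.fullmatch(value) is not None


print(is_valid_email("jane.doe@example.com"))   # True
print(is_valid_email("jane..doe@example.com"))  # False
```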

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organisation that supports (sponsors) the study through a financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study to which this sample belongs.
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMP001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognised taxonomy database such as NCBI Taxonomy) that corresponds to the organism's scientific name. It must match the provided scientific name to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Imaging Protocol

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

Platform Required

Name	platform
Description	The platform used to isolate the cells.
Example	Illumina NovaSeq
Reference	#
Namespace	ei:platform

Instrument Required

Name	instrument
Description	The instrument used to isolate the cells.
Example	Illumina NovaSeq 6000
Reference	#
Namespace	ei:instrument

Target Probe Code Required

Name	target_probe_code
Description	The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell.
Example	Oligo-dT
Reference	#
Namespace	ei:target_probe_code

Section Thickness (µm)

Name	section_thickness_µm
Description	The thickness of the tissue section in micrometres.
Example	10
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:section_thickness_µm

Section Thickness Measurement Method

Name	section_thickness_measurement_method
Description	The method used to measure tissue section thickness.
Example	Microtome
Reference	#
Namespace	ei:section_thickness_measurement_method

Section Thickness Temperature

Name	section_thickness_temperature
Description	The temperature, in degrees Celsius, at which the section was cut.
Example	22
Reference	#
Regex	^-?\d+(\.\d+)?$
Namespace	ei:section_thickness_temperature

Is Pathological

Name	is_pathological
Description	Whether the sample is pathological, i.e. abnormal and having a destructive effect on living tissue.
Example	No
Reference	#
Namespace	ei:is_pathological
Allowed Values	No Yes

Photobleaching Duration In Hours

Name	photobleaching_duration_in_hours
Description	The duration of photobleaching, in hours.
Example	2
Reference	#
Regex	^\d+$
Namespace	ei:photobleaching_duration_in_hours

Clearing with ProteinaseK Required

Name	clearing_with_proteinasek
Description	The duration of clearing at 47°C with Proteinase K.
Example	24 hrs
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_with_proteinasek

Clearing without ProteinaseK Required

Name	clearing_without_proteinasek
Description	The duration of tissue clearing at 37°C without Proteinase K.
Example	4.5 days
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_without_proteinasek
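Both clearing fields accept a number followed by a unit. To compare durations across samples, the values can be normalised to a single unit. Below is a minimal Python sketch. It assumes that hr, day, min and second (each with an optional plural s) are the only units, as in the patterns above. `duration_in_hours` is a hypothetical helper.

```python
import re

DURATION_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(hrs?|days?|mins?|seconds?)")

# Seconds per unit stem; dividing by 3600 afterwards converts to hours.
SECONDS_PER_UNIT = {"hr": 3600, "day": 86400, "min": 60, "second": 1}


def duration_in_hours(value: str) -> float:
    """Convert strings such as '24 hrs' or '4.5 days' to hours."""
    match = DURATION_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"Unrecognised duration: {value!r}")
    amount, unit = match.groups()
    # Drop a trailing plural 's' ('hrs' -> 'hr', 'seconds' -> 'second').
    return float(amount) * SECONDS_PER_UNIT[unit.rstrip("s")] / 3600


print(duration_in_hours("24 hrs"))    # 24.0
print(duration_in_hours("4.5 days"))  # 108.0
```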

Instrument User Guide Required

Name	instrument_user_guide
Description	The user guide for the instrument used.
Example	User Guide
Reference	#
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ei:instrument_user_guide

Instrument User Guide Revision Required

Name	instrument_user_guide_revision
Description	The revision of the instrument user guide.
Example	1.2
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:instrument_user_guide_revision

Sample Preparation Guide Required

Name	sample_preparation_guide
Description	The guide used for sample preparation.
Example	example_guide_v1.0.pdf
Reference	#
Regex	^[A-Za-z0-9._-]*[a-z]+$
Namespace	ei:sample_preparation_guide

Sample Preparation Guide Revision Required

Name	sample_preparation_guide_revision
Description	The revision of the sample preparation guide.
Example	1.0
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:sample_preparation_guide_revision

Deviations From Official Protocol Required

Name	deviations_from_official_protocol
Description	Any deviations from the official protocol. Separate individual deviations with '|'.
Example	Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours
Reference	#
Namespace	ei:deviations_from_official_protocol

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Namespace	ei:file_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

File Name Required

Name	file_name
Description	The name of a data file related to the study, used to uniquely identify that file. File names typically end with an extension such as .tiff, .jpeg, .png, .gif, .bmp or .ome.tiff.
Example	file001.tiff
Reference	#
Namespace	ei:file_name

File Type Required

Name	file_type
Description	The format of the data file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff.
Example	tiff
Reference	#
Namespace	ei:file_type

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Title

Name	title
Description	A name given to the study or project, for example the title of a grant proposal or publication. The title should be fewer than 30 words.
Example	Spatial Transcriptomics FISH of Human Lung Tissue
Reference	http://purl.org/dc/terms/title
Namespace	dcterms:title

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Spatial Transcriptomics
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused, and so informs users how they may legally reference, distribute, or build upon the study. Common choices include Creative Commons licences such as CC BY 4.0, which requires attribution to the original authors when the data are cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
Description	A 16-character identifier (ORCID iD) that uniquely identifies a researcher. The last character is a check digit and may be 'X'.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id

First Name Required

Name	givenName
Description	A first name (or given name) is the personal name given to an individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	The email address of this person.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organisation that supports (sponsors) the study through a financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study to which this sample belongs.
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique alphanumeric reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMP001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognised taxonomy database such as NCBI Taxonomy) that corresponds to the organism's scientific name. It must match the provided scientific name to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Imaging Protocol

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

Platform Required

Name	platform
Description	The platform used to isolate the cells.
Example	Illumina NovaSeq
Reference	#
Namespace	ei:platform

Instrument Required

Name	instrument
Description	The instrument used to isolate the cells.
Example	Illumina NovaSeq 6000
Reference	#
Namespace	ei:instrument

Target Probe Code Required

Name	target_probe_code
Description	The type of probes used to detect and quantify specific RNA molecules in their native spatial context within a tissue or cell.
Example	Oligo-dT
Reference	#
Namespace	ei:target_probe_code

Section Thickness (µm)

Name	section_thickness_µm
Description	The thickness of the tissue section in micrometres.
Example	10
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:section_thickness_µm

Section Thickness Measurement Method

Name	section_thickness_measurement_method
Description	The method used to measure tissue section thickness.
Example	Microtome
Reference	#
Namespace	ei:section_thickness_measurement_method

Section Thickness Temperature

Name	section_thickness_temperature
Description	The temperature, in degrees Celsius, at which the section was cut.
Example	22
Reference	#
Regex	^-?\d+(\.\d+)?$
Namespace	ei:section_thickness_temperature

Is Pathological

Name	is_pathological
Description	Whether the sample is pathological, i.e. abnormal and having a destructive effect on living tissue.
Example	No
Reference	#
Namespace	ei:is_pathological
Allowed Values	No Yes

Photobleaching Duration In Hours

Name	photobleaching_duration_in_hours
Description	The duration of photobleaching, in hours.
Example	2
Reference	#
Regex	^\d+$
Namespace	ei:photobleaching_duration_in_hours

Clearing with ProteinaseK Required

Name	clearing_with_proteinasek
Description	The duration of clearing at 47°C with Proteinase K.
Example	24 hrs
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_with_proteinasek

Clearing without ProteinaseK Required

Name	clearing_without_proteinasek
Description	The duration of tissue clearing at 37°C without Proteinase K.
Example	4.5 days
Reference	#
Regex	^\d+(\.\d+)?\s*(hrs?|days?|mins?|seconds?)$
Namespace	ei:clearing_without_proteinasek

Instrument User Guide Required

Name	instrument_user_guide
Description	The user guide for the instrument used.
Example	User Guide
Reference	#
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ei:instrument_user_guide

Instrument User Guide Revision Required

Name	instrument_user_guide_revision
Description	The revision of the instrument user guide.
Example	1.2
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:instrument_user_guide_revision

Sample Preparation Guide Required

Name	sample_preparation_guide
Description	The guide used for sample preparation.
Example	example_guide_v1.0.pdf
Reference	#
Regex	^[A-Za-z0-9._-]*[a-z]+$
Namespace	ei:sample_preparation_guide

Sample Preparation Guide Revision Required

Name	sample_preparation_guide_revision
Description	The revision of the sample preparation guide.
Example	1.0
Reference	#
Regex	^\d+(\.\d+)?$
Namespace	ei:sample_preparation_guide_revision

Deviations From Official Protocol Required

Name	deviations_from_official_protocol
Description	Any deviations from the official protocol. Separate individual deviations with '|'.
Example	Temperature exceeded 25°C during storage | Sample handling delayed by 2 hours
Reference	#
Namespace	ei:deviations_from_official_protocol

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Namespace	ei:file_id

Imaging Protocol ID Required

Name	imaging_protocol_id
Description	A unique alphanumeric identifier for the imaging protocol.
Example	IMGPRO001
Reference	#
Namespace	ei:imaging_protocol_id

File Name Required

Name	file_name
Description	The name of a data file related to the study, used to uniquely identify that file. File names typically end with an extension such as .tiff, .jpeg, .png, .gif, .bmp or .ome.tiff.
Example	file001.tiff
Reference	#
Namespace	ei:file_name

File Type Required

Name	file_type
Description	The format of the data file. Common file types include tiff, jpeg, png, gif, bmp and ome-tiff.
Example	tiff
Reference	#
Namespace	ei:file_type

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Title Required

Name	title
Description	A name given to the study or project, for example the title of a grant proposal or publication. The title should be fewer than 30 words.
Example	Study of single cells in the human body
Reference	http://purl.org/dc/terms/title
Namespace	dcterms:title

Description Required

Name	description
Description	A detailed description of the project, including its research goals and experimental approach, for example an abstract from a grant application or publication. The description should be fewer than 300 words.
Example	This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine.
Reference	http://purl.org/dc/terms/description
Namespace	dcterms:description

Bibliographic Citation Required

Name	bibliographicCitation
Description	A citation for the study resource, following a standard format.
Example	Doe J., et al. (2024). Single Cell Transcriptomic Analysis of Human Liver Cells. Journal of Cellular Biology.
Reference	http://purl.org/dc/terms/bibliographicCitation
Namespace	dcterms:bibliographicCitation

Created Required

Name	created
Description	The date when the study was created or registered.
Example	2024-10-14
Reference	http://purl.org/dc/terms/created
Regex	^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Namespace	dcterms:created
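The pattern checks the shape of the date but still accepts impossible dates such as 2024-02-30. Below is a minimal Python sketch that applies the pattern and then uses the standard library's `date.fromisoformat` to confirm that the date exists. `is_valid_created_date` is a hypothetical helper.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_created_date(value: str) -> bool:
    """Apply the schema pattern, then confirm the date exists in the calendar."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(is_valid_created_date("2024-10-14"))  # True
print(is_valid_created_date("2024-02-30"))  # False
```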

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Laser microdissection
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection; Laser microdissection, Culturing; Laser microdissection, Culturing, Sequencing; Laser microdissection, Sequencing; Microfluidics, Facs, Culturing; Microfluidics, Facs, Culturing, Sequencing; Microfluidics, Facs, Sequencing; Spatial Transcriptomics
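Each allowed workflow is a fixed, comma-separated combination of steps rather than a free list, so a value should be compared as a whole. Below is a minimal Python sketch. It assumes exact, case-sensitive matching. `ALLOWED_WORKFLOWS` is our transcription of the allowed values for this field, and `is_allowed_workflow` is a hypothetical helper.

```python
ALLOWED_WORKFLOWS = frozenset({
    "Laser microdissection",
    "Laser microdissection, Culturing",
    "Laser microdissection, Culturing, Sequencing",
    "Laser microdissection, Sequencing",
    "Microfluidics, Facs, Culturing",
    "Microfluidics, Facs, Culturing, Sequencing",
    "Microfluidics, Facs, Sequencing",
    "Spatial Transcriptomics",
})


def is_allowed_workflow(value: str) -> bool:
    """Exact match against the controlled vocabulary, ignoring outer whitespace."""
    return value.strip() in ALLOWED_WORKFLOWS


print(is_allowed_workflow("Laser microdissection"))  # True
print(is_allowed_workflow("Sequencing"))             # False
```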

Technology Required

Name	technology
Description	The sorting or visualisation technology used.
Example	Vizgen
Reference	#
Namespace	ei:technology

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused, and so informs users how they may legally reference, distribute, or build upon the study. Common choices include Creative Commons licences such as CC BY 4.0, which requires attribution to the original authors when the data are cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
Description	A 16-character identifier (ORCID iD) that uniquely identifies a researcher. The last character is a check digit and may be 'X'.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id

First Name Required

Name	givenName
Description	A first name (or given name) is the personal name given to an individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	A last name (or surname) is the family name passed down from one generation to the next for the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	The email address of this person.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organisation that supports (sponsors) the study through a financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Regex	^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study to which this sample belongs.
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognised taxonomy database such as NCBI Taxonomy) that corresponds to the organism's scientific name. It must match the provided scientific name to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Dissociation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code identifying the dissociation protocol within the study.
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Protocol Name Required

Name	protocol_name
Description	A descriptive name of the protocol used for single-cell sequencing.
Example	10X Genomics Single Cell 3' Library Prep
Reference	#
Namespace	ei:protocol_name

Dissociation Description Required

Name	dissociation_description
Description	A free-text description of the process used to separate cells from tissues or cell aggregates.
Example	Tissue was enzymatically dissociated using collagenase for 30 minutes.
Reference	#
Namespace	ei:dissociation_description

Enrichment Markers

Name	enrichment_markers
Description	Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Contact the FAANG Data Coordination Centre (DCC) to request additional terms.
Example	CD45
Reference	#
Namespace	faang:enrichment_markers

Isolation Kit

Name	isolation_kit
Description	The kit used to isolate the cells.
Example	10x Nuclei Isolation Kit
Reference	#
Namespace	ei:isolation_kit
Allowed Values	10x Nuclei Isolation Kit; 3' standard throughput kit; Custom

Literature Source Reference

Name	literature_source_reference
Description	Reference to literature sources that describe the protocol or methods used.
Example	Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview'
Reference	#
Namespace	ei:literature_source_reference

Protocols IO Reference

Name	protocols_io_reference
Description	Reference link to protocols.io for additional details on the protocol.
Example	https://www.protocols.io/view/sample-protocol-b2ubqesn
Reference	#
Regex	^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+(?: \| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+)*$
Namespace	ei:protocols_io_reference
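When a field holds several links, each link can be checked separately. Below is a minimal Python sketch. It assumes that multiple protocols.io links are separated by ' | ', which is what the second branch of the pattern suggests. `URL_PATTERN` is a simplified, single-URL adaptation of that pattern, and `split_references` is a hypothetical helper.

```python
import re

# Simplified single-URL form of the schema pattern (assumption).
URL_PATTERN = re.compile(
    r"https?://(?:www\.)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}"
    r"[-a-zA-Z0-9()@:%_+.~#?&/=]*"
)


def split_references(value: str) -> list[str]:
    """Split on '|' and check that every part is an http(s) URL."""
    parts = [part.strip() for part in value.split("|")]
    bad = [part for part in parts if URL_PATTERN.fullmatch(part) is None]
    if bad:
        raise ValueError(f"Not a valid URL: {bad[0]!r}")
    return parts


print(split_references("https://www.protocols.io/view/sample-protocol-b2ubqesn"))
```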

WorkflowHub SOP Reference

Name	workflow_hub_sop_reference
Description	Reference to the Standard Operating Procedure (SOP) in WorkflowHub.
Example	https://workflowhub.eu/works/12345
Reference	#
Namespace	ei:workflow_hub_sop_reference

Dissociation Protocol Method

Name	dissociation_protocol_method
Description	The method used to dissociate tissues into single cells.
Example	Mechanical and enzymatic dissociation
Reference	#
Namespace	ei:dissociation_protocol_method

Single Cell Quality Metric

Name	single_cell_quality_metric
Description	Metrics used to assess the quality of single cells before sequencing.
Example	Cell viability percentage
Reference	#
Namespace	ei:single_cell_quality_metric

Cell Suspension

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code identifying the cell suspension prepared from the sample.
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code identifying the dissociation protocol within the study.
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Suspension Type Required

Name	suspension_type
Description	The type of biological material held in suspension during processing: whole cells, nuclei or protoplasts.
Example	Cell
Reference	#
Namespace	ei:suspension_type
Allowed Values	Cell Nuclei Protoplast

Cell Count

Name	cell_count
Description	The number of cells in the sequencing library.
Example	10000
Reference	#
Regex	^\d+$
Namespace	ei:cell_count

Cell Viability

Name	cell_viability
Description	The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis.
Example	95
Reference	#
Namespace	ei:cell_viability

Cell Viability Assessment Method

Name	cell_viability_assessment_method
Description	The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques.
Example	Trypan Blue Exclusion
Reference	#
Namespace	ei:cell_viability_assessment_method

Cell Size

Name	cell_size
Description	The size of the cell, typically measured in micrometres.
Example	10
Reference	#
Namespace	ei:cell_size

Suspension Volume (µL)

Name	suspension_volume_µl
Description	The volume of the cell suspension in microlitres (µL).
Example	100
Reference	#
Namespace	ei:suspension_volume_µl

Suspension Concentration (cells/µL)

Name	suspension_concentration_cells_per_µl
Description	The concentration of cells in the suspension, in cells per microlitre (cells/µL).
Example	1000
Reference	#
Namespace	ei:suspension_concentration_cells_per_µl

Suspension Dilution

Name	suspension_dilution
Description	The dilution factor of the cell suspension.
Example	1:10
Reference	#
Namespace	ei:suspension_dilution
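Dilution ratios are often written as a:b. Below is a minimal Python sketch. It assumes a:b means a parts of suspension in b parts total volume, so 1:10 is a ten-fold dilution. Some labs instead read the ratio as sample to diluent, which would make 1:10 an eleven-fold dilution, so check which convention applies. Both function names are hypothetical.

```python
from fractions import Fraction


def dilution_factor(ratio: str) -> Fraction:
    """Parse 'a:b' (a parts sample in b parts total) into the factor b / a."""
    left, right = (part.strip() for part in ratio.split(":"))
    return Fraction(int(right), int(left))


def diluted_concentration(cells_per_ul: float, ratio: str) -> float:
    """Concentration after dilution, in cells/µL."""
    return cells_per_ul / float(dilution_factor(ratio))


print(dilution_factor("1:10"))              # 10
print(diluted_concentration(1000, "1:10"))  # 100.0
```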

Loading Volume (µL)

Name	loading_volume_µl
Description	The volume of cell suspension, in microlitres (µL), loaded into the single-cell RNA-sequencing system for analysis.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:loading_volume_µl
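
The suspension concentration and loading volume together give an estimate of how many cells were loaded. The sketch below assumes that `suspension_concentration_cells_per_µl` describes the suspension that was actually loaded. The helper name is hypothetical. On most platforms only a fraction of the loaded cells is recovered, so `cell_count` may be lower than this figure.

```python
def cells_loaded(concentration_cells_per_ul: float, loading_volume_ul: float) -> float:
    """Estimated cells loaded = concentration (cells/µL) x loading volume (µL)."""
    if concentration_cells_per_ul < 0 or loading_volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_cells_per_ul * loading_volume_ul


# Example values from this schema: 1000 cells/µL and a 10 µL loading volume
print(cells_loaded(1000, 10))  # 10000
```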

Suspension Dilution Buffer

Name	suspension_dilution_buffer
Description	A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing.
Example	PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin)
Reference	#
Namespace	ei:suspension_dilution_buffer

Library Preparation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension used in the library preparation.
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Library Preparation Kit Required

Name	library_prep_kit
Description	Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids
Example	10X Genomics Single Cell 3' v3
Reference	https://w3id.org/mixs/0001145
Namespace	mixs:library_prep_kit

Library Preparation Kit Version Required

Name	library_prep_kit_version
Description	The version number of the library preparation kit used for sequencing.
Example	2
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Regex	^\d+(\.\d+)?$
Namespace	ontology:library_prep_kit_version

Amplification Method

Name	amplification_method
Description	The method used to amplify the Complementary DNA (cDNA).
Example	PCR
Reference	#
Namespace	ei:amplification_method

cDNA Amplification Cycles

Name	cdna_amplification_cycles
Description	The number of cycles used during the Complementary DNA (cDNA) amplification process.
Example	12
Reference	#
Regex	^\d+$
Namespace	ei:cdna_amplification_cycles

Average Size Distribution

Name	average_size_distribution
Description	The average fragment length, in base pairs (bp), of the library after library preparation, indicating its quality and suitability for sequencing.
Example	350
Reference	#
Regex	^\d+$
Namespace	ei:average_size_distribution

Library Construction Method

Name	lib_construction_method
Description	The library construction method (including version) that was used.
Example	Smart-Seq2
Reference	#
Namespace	ei:lib_construction_method

Input Molecule

Name	input_molecule
Description	The specific fraction of biological macromolecule from which the sequencing library is derived.
Example	RNA
Reference	#
Namespace	ei:input_molecule

Primer

Name	primer
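Description	The type of primer used for reverse transcription. This indicates which RNA molecules are represented in the cDNA library: oligo-dT primers anneal to the poly(A) tails of mRNA, whereas random primers can anneal along any RNA.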
Example	Random
Reference	#
Namespace	ei:primer
Allowed Values	Oligo-dT Random

Primeness Required

Name	primeness
Description	The end from which the molecule was sequenced.
Example	5'
Reference	#
Namespace	ei:primeness
Allowed Values	3' 5' Both

End Bias

Name	end_bias
Description	The end bias of the library.
Example	3
Reference	#
Namespace	ei:end_bias
Allowed Values	3 5

Library Strand

Name	library_strand
Description	The Complementary DNA (cDNA) strand of the library from which the reads are derived: sense (first), antisense (second), both, or none (unstranded).
Example	Antisense
Reference	#
Namespace	ei:library_strand
Allowed Values	Antisense Both Sense Unstranded

Spike In Required

Name	spike_in
Description	Whether a spike-in was used, i.e. external RNA added to the sample as a control to assess technical variability and aid normalization in RNA sequencing.
Example	Yes
Reference	#
Namespace	ei:spike_in
Allowed Values	No Yes

Spike Type

Name	spike_type
Description	The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA.
Example	Synthetic RNA
Reference	#
Namespace	ei:spike_type

Spike In Dilution Or Concentration

Name	spike_in_dilution_or_concentration
Description	The final concentration, or dilution (for commercial sets), of the spike-in mix.
Example	1:1000
Reference	#
Namespace	ei:spike_in_dilution_or_concentration

i5 Index Required

Name	i5_index
Description	Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing.
Example	ATCACG
Reference	#
Namespace	ei:i5_index

i7 Index Required

Name	i7_index
Description	Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs.
Example	CGATGT
Reference	#
Namespace	ei:i7_index

Dual or Single Index Required

Name	dual_single_index
Description	Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing.
Example	Dual
Reference	#
Namespace	ei:dual_single_index
Allowed Values	Dual Single

i5 Sequence Required

Name	i5_sequence
Description	The nucleotide sequence of the i5 index used in multiplexing during sequencing.
Example	ATCGTAGC
Reference	#
Namespace	ei:i5_sequence

i7 Sequence Required

Name	i7_sequence
Description	The specific nucleotide sequence of the i7 index used for a sample.
Example	TGCATGCA
Reference	#
Namespace	ei:i7_sequence
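
A submission tool could cross-check `dual_single_index` against the `i5_sequence` and `i7_sequence` values. The sketch below shows one way to do this. The function name `check_indices` is hypothetical. The rule that index sequences contain only A, C, G and T is also an assumption, because some sample sheets allow N.

```python
import re
from typing import List, Optional

# Assumption: index sequences contain only the bases A, C, G and T.
DNA = re.compile(r"^[ACGT]+$")


def check_indices(dual_single_index: str, i5: Optional[str], i7: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the fields look consistent."""
    problems = []
    for name, seq in (("i5", i5), ("i7", i7)):
        if seq and not DNA.match(seq):
            problems.append(f"{name} contains non-ACGT characters: {seq}")
    n_present = sum(1 for seq in (i5, i7) if seq)
    if dual_single_index == "Dual" and n_present != 2:
        problems.append("Dual indexing needs both an i5 and an i7 sequence")
    elif dual_single_index == "Single" and n_present != 1:
        problems.append("Single indexing expects exactly one index sequence")
    return problems


print(check_indices("Dual", "ATCGTAGC", "TGCATGCA"))  # []
print(check_indices("Dual", "ATCGTAGC", None))
```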

Plate ID

Name	plate_id
Description	Identifier for the 96-well plate used in sample preparation.
Example	PLT001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:plate_id

Well Row

Name	well_row
Description	The row identifier in a 96-well plate indicating the sample's position.
Example	A
Reference	#
Namespace	ei:well_row

Well Column

Name	well_col
Description	The column identifier in a 96-well plate indicating the sample's position.
Example	5
Reference	#
Regex	^\d+$
Namespace	ei:well_col
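
A standard 96-well plate has 8 rows (A to H) and 12 columns (1 to 12). The sketch below checks a `well_row`/`well_col` pair against that layout and combines the two fields into a single zero-padded name such as A05. Zero-padding is one common convention, not a schema requirement. Both helper names are hypothetical.

```python
ROWS = "ABCDEFGH"  # 8 rows on a standard 96-well plate
N_COLS = 12        # 12 columns


def is_valid_well(well_row: str, well_col: str) -> bool:
    """True if the pair names a position on an 8 x 12 (96-well) plate."""
    return (len(well_row) == 1 and well_row in ROWS
            and well_col.isdecimal() and 1 <= int(well_col) <= N_COLS)


def well_name(well_row: str, well_col: str) -> str:
    """Combine the two fields into a zero-padded name such as A05."""
    if not is_valid_well(well_row, well_col):
        raise ValueError(f"not a 96-well position: {well_row}{well_col}")
    return f"{well_row}{int(well_col):02d}"


print(well_name("A", "5"))  # A05
```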

Cell Phenotype

Name	cell_phenotype
Description	The cell marker used for fluorescence-activated cell sorting (FACS).
Example	CD41-
Reference	#
Namespace	ei:cell_phenotype
Allowed Values	CD41+ CD41-

Design Description

Name	design_description
Description	The design of the library including details of how it was constructed.
Reference	#
Namespace	ei:design_description

Library Selection Required

Name	library_selection
Description	The method used to select for or against, enrich, or screen the material being sequenced.
Example	RANDOM PCR
Reference	#
Namespace	ei:library_selection
Allowed Values	5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified

Library Source Required

Name	library_source
Description	The type of source material that is being sequenced.
Example	GENOMIC
Reference	#
Namespace	ei:library_source
Allowed Values	GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA

Library Strategy Required

Name	library_strategy
Description	The sequencing technique intended for this library.
Example	RNA-Seq
Reference	#
Namespace	ei:library_strategy
Allowed Values	AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq

Sequencing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	https://w3id.org/mixs/0000016
Regex	^[a-zA-Z0-9]+$
Namespace	ontology:sequencing_id

Sequencing Platform Name Required

Name	sequencing_platform_name
Description	The name of the sequencing platform used for the experiment.
Example	PacBio
Reference	http://purl.obolibrary.org/obo/NCIT_C172274
Namespace	ontology:sequencing_platform_name

Sequencing Instrument Model Required

Name	sequencing_instrument_model
Description	The model of the instrument used for sequencing. Models vary in throughput, read length, error rate, and suitability for different applications.
Example	Illumina NovaSeq 6000
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Namespace	ontology:sequencing_instrument_model
Allowed Values	454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100

Library Layout Required

Name	lib_layout
Description	Specifies whether reads are in a single, paired, or other configuration.
Example	Paired
Reference	https://w3id.org/mixs/0000111
Namespace	mixs:lib_layout
Allowed Values	Other Paired Single Vector

UMI Barcode Read

Name	umi_barcode_read
Description	The type of read that contains the Unique Molecular Identifier (UMI) barcode.
Example	index2
Reference	#
Namespace	ei:umi_barcode_read
Allowed Values	index1 index2 read1 read2

UMI Barcode Offset

Name	umi_barcode_offset
Description	The offset (start position) of the Unique Molecular Identifier (UMI) barcode within its read.
Example	0
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_offset

UMI Barcode Size

Name	umi_barcode_size
Description	The length, in bases, of the Unique Molecular Identifier (UMI) barcode.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_size

Cell Barcode Read

Name	cell_barcode_read
Description	The type of read that contains the cell barcode.
Example	index1
Reference	http://www.ebi.ac.uk/efo/EFO_0010203
Namespace	ontology:cell_barcode_read
Allowed Values	index1 index2 read1 read2

Cell Barcode Offset

Name	cell_barcode_offset
Description	The offset (start position) of the cell barcode within its read.
Example	10
Reference	http://www.ebi.ac.uk/efo/EFO_0010204
Regex	^\d+$
Namespace	ontology:cell_barcode_offset

Cell Barcode Size

Name	cell_barcode_size
Description	The length, in bases, of the cell barcode.
Example	16
Reference	http://www.ebi.ac.uk/efo/EFO_0010205
Regex	^\d+$
Namespace	ontology:cell_barcode_size

cDNA Read Required

Name	cdna_read
Description	The type of read that contains the Complementary DNA (cDNA) sequence.
Example	read1
Reference	http://www.ebi.ac.uk/efo/EFO_0010195
Namespace	ontology:cdna_read
Allowed Values	index1 index2 read1 read2

cDNA Read Offset

Name	cdna_read_offset
Description	The offset (start position) of the Complementary DNA (cDNA) sequence within its read, i.e. where the cDNA begins after any barcodes or technical sequences.
Example	6
Reference	http://www.ebi.ac.uk/efo/EFO_0010201
Regex	^\d+$
Namespace	ontology:cdna_read_offset

cDNA Read Size

Name	cdna_read_size
Description	The length, in bases, of the Complementary DNA (cDNA) read.
Example	75
Reference	http://www.ebi.ac.uk/efo/EFO_0010202
Regex	^\d+$
Namespace	ontology:cdna_read_size
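
The read, offset and size fields together describe where each segment sits in the raw reads. The sketch below slices the segments out of each read, assuming offsets are 0-based. The read layout is invented for illustration: it has a 16 bp cell barcode followed by a 10 bp UMI on read 1, similar to the 10x Chromium Single Cell 3' v2 layout, and cDNA on read 2. The helper name `extract_segment` is hypothetical.

```python
from typing import Dict


def extract_segment(reads: Dict[str, str], read_name: str, offset: int, size: int) -> str:
    """Return `size` bases starting at the 0-based `offset` of the named read."""
    seq = reads[read_name]
    if offset < 0 or size <= 0 or offset + size > len(seq):
        raise ValueError(f"{read_name} cannot hold offset={offset}, size={size}")
    return seq[offset:offset + size]


# Hypothetical reads for one fragment; sequences are invented for illustration.
reads = {
    "read1": "AAACCCAAGAAACACT" + "GCTAGCTAGC" + "TTTTTTTTTT",  # barcode + UMI + poly(T)
    "read2": "GATTACAGATTACAGATTACA",                           # cDNA
}

cell_barcode = extract_segment(reads, "read1", offset=0, size=16)  # cell_barcode_offset / _size
umi = extract_segment(reads, "read1", offset=16, size=10)          # umi_barcode_offset / _size
cdna = extract_segment(reads, "read2", offset=0, size=10)          # cdna_read_offset / _size
print(cell_barcode, umi, cdna)  # AAACCCAAGAAACACT GCTAGCTAGC GATTACAGAT
```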

Analysis Derived Data

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File Derived From

Name	file_derived_from
Description	The name of the file that was used to generate the analysis derived data.
Example	file1_sequencing.json
Reference	#
Namespace	ei:file_derived_from

Inferred Cell Type

Name	inferred_cell_type
Description	The cell type or identity assigned by the performer after analysis, based on the expression profile or known gene function.
Example	type II bipolar neuron
Reference	#
Namespace	ei:inferred_cell_type

Post Analysis Cell Well Quality

Name	post_analysis_cell_well_quality
Description	Performer-defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality checks were not passed.
Example	Pass
Reference	#
Namespace	ei:post_analysis_cell_well_quality
Allowed Values	Fail Pass

Other Derived Cell Attributes

Name	other_derived_cell_attributes
Description	Any other cell-level measurement or annotation resulting from the analysis.
Example	Cluster
Reference	#
Namespace	ei:other_derived_cell_attributes
Allowed Values	Cluster Count Gene UMI tSNE coordinates

Raw Data Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Reference Genome

Name	reference_genome
Description	The reference genome version, with a stable link to the genome data (or an attached genome FASTA file).
Example	GRCh38, https://example.org/grch38.fa
Reference	#
Namespace	ei:reference_genome

Genome Annotation

Name	genome_annotation
Description	The annotation version, with a stable link. Also indicate whether the original annotation was modified (e.g. 3' UTR extension) and, if so, include the modified annotation file used in the analysis.
Example	Ensembl v101, https://example.org/ensembl_v101.gtf
Reference	#
Namespace	ei:genome_annotation

Annotation Filtering

Name	annotation_filtering
Description	The features that were filtered (e.g. protein-coding genes, pseudogenes, TCRs).
Example	Filtered to include only protein-coding genes
Reference	#
Namespace	ei:annotation_filtering

Genes vs Exons

Name	genes_vs_exons
Description	Whether quantification used whole gene intervals or exons.
Example	Exon quantification
Reference	#
Namespace	ei:genes_vs_exons

Library Structure

Name	library_structure
Description	The structure of the library, described in seqspec format.
Example	Single-cell 3' library
Reference	#
Namespace	ei:library_structure

Mapping and Demultiplexing Software

Name	mapping_and_demultiplexing_software
Description	The software (and version) used to map reads and demultiplex reads/UMIs.
Example	Cell Ranger 6.0.0
Reference	#
Namespace	ei:mapping_and_demultiplexing_software

Read Mapping Statistics

Name	read_mapping_statistics
Description	Read or Unique Molecular Identifier (UMI) mapping statistics.
Example	80% reads mapped to reference
Reference	#
Namespace	ei:read_mapping_statistics

Sequencing Saturation

Name	sequencing_saturation
Description	The sequencing saturation achieved. The expected value depends on the number of cells recovered (not targeted) and the technology used.
Example	95% sequencing saturation
Reference	#
Namespace	ei:sequencing_saturation
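
The sketch below computes sequencing saturation using the definition reported by 10x Genomics Cell Ranger: 1 - (deduplicated reads / total reads). Under this definition, a read counts as a duplicate when its (cell barcode, UMI, gene) combination has already been seen. Other tools may define saturation differently. The function name and the input tuples are hypothetical.

```python
from typing import Iterable, Tuple

Molecule = Tuple[str, str, str]  # (cell barcode, UMI, gene) assigned to one read


def sequencing_saturation(reads: Iterable[Molecule]) -> float:
    """Fraction of reads that duplicate an already-seen molecule: 1 - unique / total."""
    n_reads = 0
    unique = set()
    for molecule in reads:
        n_reads += 1
        unique.add(molecule)
    if n_reads == 0:
        raise ValueError("no reads")
    return 1.0 - len(unique) / n_reads


reads = [("AAAC", "GCTA", "GAPDH")] * 3 + [("AAAC", "TTGC", "ACTB")]
print(f"{sequencing_saturation(reads):.0%}")  # 50%
```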

UMIs or Barcode Distribution QC

Name	umis_barcode_distribution_qc
Description	The distribution of Unique Molecular Identifiers (UMIs) per barcode, and the threshold applied.
Example	Threshold: 10 UMIs per barcode
Reference	#
Namespace	ei:umis_barcode_distribution_qc

Cell or Non-Cell Filtering Strategy

Name	cell_non_cell_filtering_strategy
Description	Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells.
Example	Threshold: 5 UMIs for cell detection
Reference	#
Namespace	ei:cell_non_cell_filtering_strategy
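
The simplest cell-calling strategy is a fixed UMI cutoff. The sketch below keeps barcodes whose total UMI count reaches that cutoff. Treating the threshold as "at least" (>=) is an assumption. Published pipelines often use more elaborate algorithms (for example EmptyDrops), and those should be described in this field instead. The function name is hypothetical.

```python
from typing import Dict, List


def call_cells(umis_per_barcode: Dict[str, int], min_umis: int) -> List[str]:
    """Keep barcodes whose total UMI count is at least `min_umis` (a fixed cutoff)."""
    return sorted(bc for bc, n in umis_per_barcode.items() if n >= min_umis)


# Hypothetical UMI totals per barcode
counts = {"AAAC": 1520, "AAAG": 4, "AACT": 5, "ACGT": 0}
print(call_cells(counts, min_umis=5))  # ['AAAC', 'AACT']
```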

Other Quality Filters Applied

Name	other_quality_filters_applied
Description	Any other filters used to discard cells or nuclei, e.g. based on % mitochondrial reads or % rRNA reads.
Example	Cells with >20% mitochondrial reads discarded
Reference	#
Namespace	ei:other_quality_filters_applied
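
The example filter above discards cells with more than 20% mitochondrial reads. The sketch below applies that filter. It assumes per-barcode mitochondrial and total read counts have already been tallied, for instance by summing genes whose names start with "MT-" in human data. The function name is hypothetical.

```python
from typing import Dict, List


def keep_low_mito(mito_reads: Dict[str, int], total_reads: Dict[str, int],
                  max_pct: float = 20.0) -> List[str]:
    """Return barcodes whose mitochondrial read percentage is at most `max_pct`."""
    kept = []
    for bc, total in total_reads.items():
        if total == 0:
            continue  # no reads at all: nothing to keep
        pct = 100.0 * mito_reads.get(bc, 0) / total
        if pct <= max_pct:
            kept.append(bc)
    return kept


total = {"c1": 1000, "c2": 1000, "c3": 500}
mito = {"c1": 50, "c2": 300, "c3": 100}   # 5%, 30% and 20% mitochondrial
print(keep_low_mito(mito, total))  # ['c1', 'c3']
```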

Ambient RNA QC

Name	ambient_rna_qc
Description	The percentage of UMIs in background (non-cell) barcodes, and the algorithm (if any) used to remove ambient RNA.
Example	Ambient RNA removed if >5% UMIs in background barcodes
Reference	#
Namespace	ei:ambient_rna_qc
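
The percentage of UMIs in background barcodes is the share of all UMIs that land in barcodes not called as cells. The sketch below computes that share. It assumes the set of cell barcodes comes from the cell-calling step. The function name is hypothetical.

```python
from typing import Dict, Set


def pct_umis_in_background(umis_per_barcode: Dict[str, int], cells: Set[str]) -> float:
    """Percentage of all UMIs that fall in barcodes not called as cells."""
    total = sum(umis_per_barcode.values())
    if total == 0:
        raise ValueError("no UMIs")
    background = sum(n for bc, n in umis_per_barcode.items() if bc not in cells)
    return 100.0 * background / total


counts = {"AAAC": 900, "AAAG": 60, "AACT": 40}
print(pct_umis_in_background(counts, cells={"AAAC"}))  # 10.0
```

With the example rule above (remove ambient RNA if more than 5% of UMIs are in background barcodes), this hypothetical sample would be flagged.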

Predicted Doublet Rate QC

Name	predicted_doublet_rate_qc
Description	The predicted doublet rate, which depends on the number of cells recovered (not targeted) and the technology used.
Example	Predicted doublet rate: 1.5%
Reference	#
Namespace	ei:predicted_doublet_rate_qc

Individual Organism SNP Demultiplexing

Name	individual_organism_snp_demultiplexing
Description	If performed, the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used.
Example	SNP UMAP embedding using CellSNP
Reference	#
Namespace	ei:individual_organism_snp_demultiplexing

Downstream Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Clustering Algorithm and Version

Name	clustering_algorithm_and_version
Description	The clustering algorithm and its version. Report if the data were compared or integrated with existing datasets.
Example	Louvain 0.8.0
Reference	#
Namespace	ei:clustering_algorithm_and_version

Clustering Parameters

Name	clustering_parameters
Description	The parameters used for clustering. Report if the data were compared or integrated with existing datasets.
Example	Resolution: 0.6, K-nearest neighbors: 10
Reference	#
Namespace	ei:clustering_parameters

Integration/Batch Correction

Name	integration_batch_correction
Description	The integration or batch-correction method used. Report if the data were compared or integrated with existing datasets.
Example	Harmony v1.0
Reference	#
Namespace	ei:integration_batch_correction

Data Availability Checklist

Source Code

Name	source_code
Description	Any newly developed code or software used in the processing and downstream analysis of the dataset, and where it can be found.
Example	Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization.
Reference	#
Namespace	ei:source_code

UMI Count Matrix

Name	umi_count_matrix
Description	Gene x cell matrix with UMI counts for each gene in each cell.
Example	The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv.
Reference	#
Namespace	ei:umi_count_matrix

Ensembl IDs

Name	ensembl_ids
Description	Gene or transcript names should be listed as Ensembl IDs (or another standardised ID), with gene short names in the metadata.
Example	ENSG00000139618
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:ensembl_ids

Functional Gene Annotations

Name	functional_gene_annotations
Description	Any functional annotation generated/used (gene names, GOs, structural domains, etc.).
Example	Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding).
Reference	#
Namespace	ei:functional_gene_annotations

Protein Models

Name	protein_models
Description	FASTA file with (or stable link to) the predicted proteins associated to genes in the UMI count matrix and matching IDs.
Example	The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID.
Reference	#
Namespace	ei:protein_models

Cell Metadata

Name	cell_metadata
Description	Table mapping cell IDs to cluster/cell type/broad cell type annotations.
Example	Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv.
Reference	#
Namespace	ei:cell_metadata

Cluster-Level Normalised Expression Tables

Name	cluster_level_normalised_expression_tables
Description	Expression tables that show normalised gene expression at the cluster or cell-type level.
Example	Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv.
Reference	#
Namespace	ei:cluster_level_normalised_expression_tables

Other Resource Files

Name	other_resource_files
Description	Any other files needed to reuse and interpret the data, e.g. barcode information for complex, serial multiplexing protocols (ClickTags).
Example	Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv.
Reference	#
Namespace	ei:other_resource_files

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:sequencing_id

Read 1 File Required

Name	read_1_file
Description	The name or accession of the file that contains read 1.
Example	file1_r1.fastq.gz
Reference	#
Namespace	ei:read_1_file

Read 2 File

Name	read_2_file
Description	The name or accession of the file that contains read 2.
Example	file1_r2.fastq.gz
Reference	#
Namespace	ei:read_2_file

Index 1 File

Name	index_1_file
Description	The name of the file that contains index 1.
Example	file1_i1.fastq.gz
Reference	#
Namespace	ei:index_1_file

Index 2 File

Name	index_2_file
Description	The name of the file that contains index 2.
Example	file1_i2.fastq.gz
Reference	#
Namespace	ei:index_2_file

Read 1 Checksum Required

Name	read_1_file_checksum
Description	The MD5 checksum of the read 1 file, used to verify file integrity: a single 32-character lowercase hexadecimal digest, as required by the Regex below.
Example	f8d29e41a73b5c02de9a6fb314e7c8ad
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_1_file_checksum

Read 2 Checksum

Name	read_2_file_checksum
Description	The MD5 checksum of the read 2 file, used to verify file integrity: a single 32-character lowercase hexadecimal digest, as required by the Regex below.
Example	a3f4c1b29d8e57fa41b02de6c7f9ab83
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_2_file_checksum
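
The Regex listed for the read checksum fields accepts a single 32-character lowercase hexadecimal digest, which matches the output of Python's `hashlib.md5(...).hexdigest()`. The sketch below streams a file through MD5 in chunks so that large FASTQ files do not need to fit in memory. The helper name `md5_of_file` and the chunk size are illustrative choices.

```python
import hashlib
import re

MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")  # the Regex listed for the checksum fields


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) file through MD5 and return the lowercase hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_of_file("file1_r1.fastq.gz")` would give the value to record in `read_1_file_checksum`, and `MD5_PATTERN.match(...)` confirms that it has the expected shape.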

White List Barcode File

Name	white_list_barcode_file
Description	A file containing the known cell barcodes in the dataset.
Example	barcodes.tsv
Reference	#
Namespace	ei:white_list_barcode_file

Expression Data Process Setting

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Expression Data Process Setting ID Required

Name	expression_data_process_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_process_setting_id

Matrix Type

Name	matrix_type
Description	The type of values stored in the expression matrix.
Example	raw_counts
Reference	#
Namespace	ei:matrix_type
Allowed Values	imputed log1p normalised pseudobulk raw_counts scaled

Reference Genome Required

Name	reference_genome
Description	One or more URLs for the associated reference genome, separated by a pipe (|).
Example	https://reference-genome-example.com
Reference	#
Regex	^((https?|ftp):\/\/[^\s|]+)(\|((https?|ftp):\/\/[^\s|]+))*$
Namespace	ei:reference_genome

Annotation Version

Name	annotation_version
Description	The annotation version of the associated reference genome
Example	GENCODE v44
Reference	#
Namespace	ei:annotation_version

Normalisation Method

Name	normalisation_method
Description	The normalisation method applied, if any.
Example	Log Normalisation
Reference	#
Namespace	ei:normalisation_method
Allowed Values	Library Size Normalisation Log Normalisation SCNorm SCTransform scran

Highly Variable Gene Selection (HVG)

Name	highly_variable_gene_selection
Description	The method used to select highly variable genes (HVGs) and the number of genes selected.
Example	seurat_v3, n=2000
Reference	#
Namespace	ei:highly_variable_gene_selection

Dimensionality Reduction

Name	dimensionality_reduction
Description	Method used to reduce dimensionality in the expression data
Example	PCA
Reference	#
Namespace	ei:dimensionality_reduction
Allowed Values	Diffusion Map ICA NMF PCA UMAP t-SNE

Number of Nearest Neighbours

Name	n_neighbours
Description	Number of nearest neighbours used to calculate cluster membership
Example	pca:50
Reference	#
Namespace	ei:n_neighbours

Clustering Algorithm

Name	clustering_algorithm
Description	Algorithm used to create clusters
Reference	#
Namespace	ei:clustering_algorithm

Clustering Resolution

Name	clustering_resolution
Description	The resolution parameter used for clustering.
Example	2.5
Reference	#
Regex	^([0-9]*[.])?[0-9]+$
Namespace	ei:clustering_resolution

Clustering Distance Metric

Name	clustering_distance_metric
Description	Metric used to calculate the distance between points.
Example	cosine
Reference	#
Namespace	ei:clustering_distance_metric
Allowed Values	cosine euclidean hamming jaccard manhattan mahalanobis

Software Versions

Name	software_versions
Description	Primary software packages and versions used for the analysis.
Reference	#
Namespace	ei:software_versions

Cell Type Annotation

Name	cell-type annotation
Description	Tools and databases used for cell-type annotation.
Reference	#
Namespace	ei:cell-type annotation

Generated by Pipeline

Name	generated_by_pipeline
Description	URL of the deposited pipeline used to create this data
Reference	#
Regex	^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	ei:generated_by_pipeline

Notes

Name	notes
Description	Any other information
Reference	#
Namespace	ei:notes

Expression Data File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	expression_data_file_id
Description	A unique alphanumeric identifier for the expression data file
Example	EXPFILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric identifier for library preparation
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Expression Data Process Setting ID Required

Name	expression_data_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_setting_id

File Name Required

Name	expression_data_file
Description	Expression data file name
Example	exp_file.csv
Reference	#
Namespace	ei:expression_data_file

File MD5 Checksum Required

Name	expression_data_file_checksum
Description	The calculated MD5 checksum for this file
Example	9e4b7a23f6c1d0ab85f29c47e3d8a610
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:expression_data_file_checksum

File Format Required

Name	expression_data_file_format
Description	The format of the expression file, such as h5ad or rds
Example	csv
Reference	#
Namespace	ei:expression_data_file_format
Allowed Values	csv h5ad loom mtx rds

Number of Cells

Name	n_cells
Description	The number of cells represented in the expression data
Example	4
Reference	#
Regex	^\d+$
Namespace	ei:n_cells

Number of Genes

Name	n_genes
Description	The number of genes represented in the expression data
Example	50
Reference	#
Regex	^\d+$
Namespace	ei:n_genes

File Size in Bytes

Name	file_size_bytes
Description	Size of the file recorded in bytes
Example	90
Reference	#
Regex	^\d+$
Namespace	ei:file_size_bytes

Date Generated

Name	date_generated
Description	Approximate date this expression data was generated
Example	2024-10-14
Reference	#
Regex	^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Namespace	ei:date_generated
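
The sketch below checks an Expression Data File record against the Regex values listed above, with `|` read as regex alternation. It uses Python's `re.fullmatch` so that the whole value must match. Fields absent from the record are skipped, because checking required fields is a separate step. The `PATTERNS` table and the function name are hypothetical.

```python
import re
from typing import Dict, List

# Regex values listed for the Expression Data File fields, with "|" read as alternation.
PATTERNS = {
    "expression_data_file_id": r"^[a-zA-Z0-9]+$",
    "expression_data_file_checksum": r"^[0-9a-f]{32}$",
    "n_cells": r"^\d+$",
    "n_genes": r"^\d+$",
    "file_size_bytes": r"^\d+$",
    "date_generated": r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$",
}


def invalid_fields(record: Dict[str, str]) -> List[str]:
    """Names of fields present in `record` whose value does not match its pattern."""
    return [field for field, pattern in PATTERNS.items()
            if field in record and re.fullmatch(pattern, record[field]) is None]


example = {
    "expression_data_file_id": "EXPFILE001",
    "expression_data_file_checksum": "9e4b7a23f6c1d0ab85f29c47e3d8a610",
    "n_cells": "4",
    "n_genes": "50",
    "file_size_bytes": "90",
    "date_generated": "2024-10-14",
}
print(invalid_fields(example))  # []
print(invalid_fields({"date_generated": "2024-13-01", "n_cells": "4.5"}))
```

The date pattern checks the format but not calendar validity, so a value such as 2024-02-31 passes. `datetime.date.fromisoformat` would reject it.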

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Project Name Required

Name	project_name
Description	Official name of the study or project. Project title should be fewer than 30 words, such as a title of a grant proposal or a publication.
Example	Study of single cells in the human body
Reference	https://w3id.org/mixs/0000092
Namespace	mixs:project_name

Description

Name	description
Description	A detailed description of the project which includes research goals and experimental approach. Project description should be fewer than 300 words, such as an abstract from a grant application or publication.
Example	This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine.
Reference	http://purl.org/dc/terms/description
Namespace	dcterms:description

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Laser microdissection
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection Laser microdissection, Culturing Laser microdissection, Culturing, Sequencing Laser microdissection, Sequencing Microfluidics, Facs, Culturing Microfluidics, Facs, Culturing, Sequencing Microfluidics, Facs, Sequencing Spatial Transcriptomics

Technology Required

Name	technology
Description	The sorting or visualisation technology used.
Example	Vizgen
Reference	#
Namespace	ei:technology

Negative Control Type

Name	neg_cont_type
Description	The substance or equipment used as a negative control in an investigation
Example	Phosphate buffer
Reference	https://w3id.org/mixs/0001321
Namespace	mixs:neg_cont_type
Allowed Values	DNA-free PCR mix Distilled water Empty collection device Empty collection tube Phosphate buffer Sterile swab Sterile syringe

Positive Control Type

Name	pos_cont_type
Description	The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive
Example	substance1
Reference	https://w3id.org/mixs/0001322
Regex	^[a-zA-Z0-9]+$
Namespace	mixs:pos_cont_type

Experimental Factor

Name	experimental_factor
Description	Variable aspects of an experiment design that can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI)
Example	EFO:0001779
Reference	https://w3id.org/mixs/0000008
Regex	^[A-Z]{2,}:\d+$
Namespace	mixs:experimental_factor
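The experimental_factor regex only checks the shape of a CURIE: an upper-case prefix, a colon and digits. Below is a minimal Python sketch of expanding such a value into a full ontology IRI. The `curie_to_iri` helper and the two IRI bases are assumptions, modelled on the EFO and OBO Foundry IRIs used as references elsewhere in this schema. Prefixes other than EFO and OBI are simply rejected.

```python
import re

# Pattern copied from the experimental_factor field.
CURIE_PATTERN = re.compile(r"^[A-Z]{2,}:\d+$")

# Assumed IRI bases for the two ontologies named in the field description.
IRI_BASES = {
    "EFO": "http://www.ebi.ac.uk/efo/EFO_",
    "OBI": "http://purl.obolibrary.org/obo/OBI_",
}


def curie_to_iri(curie: str) -> str:
    """Expand an EFO or OBI CURIE such as 'EFO:0001779' into a full IRI."""
    if not CURIE_PATTERN.fullmatch(curie):
        raise ValueError(f"not a valid CURIE: {curie!r}")
    prefix, local_id = curie.split(":")
    if prefix not in IRI_BASES:
        raise ValueError(f"unsupported ontology prefix: {prefix}")
    return IRI_BASES[prefix] + local_id
```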

Relevant Electronic Resources

Name	associated_resource
Description	A related resource that is referenced, cited, or otherwise associated with the sequence.
Example	https://arctos.database.museum/media/10520962 \| https://arctos.database.museum/media/10520964
Reference	https://w3id.org/mixs/0000091
Regex	^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+(?: \\| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]))$
Namespace	mixs:associated_resource

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common licenses include Creative Commons (e.g., CC BY 4.0), which require attribution to the original authors when the data is cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
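Description	A 16-character identifier that uniquely identifies a researcher. It is written as four groups of four characters, and the last character is a check character that may be X.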
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id
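The orcid_id regex checks only the layout of the value. The final character of an ORCID iD is an ISO 7064 MOD 11-2 check character computed from the first 15 digits, so a stricter check can also verify it. A minimal Python sketch follows. The function names are illustrative and not part of any ORCID library.

```python
import re

# Pattern copied from the orcid_id field.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value matches the field regex and ends with the correct check character."""
    if not ORCID_PATTERN.fullmatch(value):
        return False
    digits = value.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]
```

The example value in this schema, 0000-1234-5678-9012, carries a valid check character.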

First Name Required

Name	givenName
Description	The first name (given name) of the individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	The last name (surname or family name) of the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	The email address at which the individual conducting the study can be contacted.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email
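The email pattern relies on two negative lookaheads to reject doubled dots or doubled hyphens. To scan the whole address, each lookahead needs a `.*` prefix. A lookahead such as `(?!.\.{2,})` tests only one fixed position, so an address like jane..doe@example.com would slip through. Here is a minimal Python check that assumes the `.*` form is the intended one.

```python
import re

# Email pattern with ".*" inside both lookaheads, so that runs of dots or
# hyphens are rejected anywhere in the address (assumed intended form).
EMAIL_PATTERN = re.compile(
    r"^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$"
)


def is_valid_email(value: str) -> bool:
    """True if value matches the email pattern in full."""
    return EMAIL_PATTERN.fullmatch(value) is not None
```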

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organization that supports (sponsors) something through some kind of financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Regex	^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study to which this sample belongs.
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier, usually from a recognised taxonomy database such as NCBI Taxonomy, that corresponds to the organism’s scientific name. It must match the scientific name provided to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession
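This schema defines no regex for biosampleAccession. Here is a minimal Python sketch of a format check. It assumes that accessions start with one of the common INSDC BioSample prefixes, SAMN (NCBI), SAMEA (EBI) or SAMD (DDBJ), followed by digits. This is an assumption, and it may not cover every historical accession format.

```python
import re

# Assumed BioSample accession prefixes followed by digits.
BIOSAMPLE_PATTERN = re.compile(r"SAM(?:N|EA|D)\d+")


def is_biosample_accession(value: str) -> bool:
    """True if value looks like a BioSample accession under the assumed prefixes."""
    return BIOSAMPLE_PATTERN.fullmatch(value) is not None
```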

Dissociation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Protocol Name Required

Name	protocol_name
Description	A descriptive name of the protocol used for single-cell sequencing.
Example	10X Genomics Single Cell 3' Library Prep
Reference	#
Namespace	ei:protocol_name

Dissociation Description Required

Name	dissociation_description
Description	A free-text description of the process used to separate cells from tissues or cell aggregates.
Example	Tissue was enzymatically dissociated using collagenase for 30 minutes.
Reference	#
Namespace	ei:dissociation_description

Enrichment Markers

Name	enrichment_markers
Description	Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms.
Example	CD45
Reference	#
Namespace	faang:enrichment_markers

Isolation Kit

Name	isolation_kit
Description	The kit used to isolate the cells.
Example	10x Nuclei Isolation Kit
Reference	#
Namespace	ei:isolation_kit
Allowed Values	10x Nuclei Isolation Kit 3' standard throughput kit Custom

Literature Source Reference

Name	literature_source_reference
Description	Reference to literature sources that describe the protocol or methods used.
Example	Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview'
Reference	#
Namespace	ei:literature_source_reference

Protocols IO Reference

Name	protocols_io_reference
Description	Reference link to protocols.io for additional details on the protocol.
Example	https://www.protocols.io/view/sample-protocol-b2ubqesn
Reference	#
Regex	^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+(?: \\| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]))$
Namespace	ei:protocols_io_reference

WorkflowHub SOP Reference

Name	workflow_hub_sop_reference
Description	Reference to the Standard Operating Procedure (SOP) in WorkflowHub.
Example	https://workflowhub.eu/works/12345
Reference	#
Namespace	ei:workflow_hub_sop_reference

Dissociation Protocol Method

Name	dissociation_protocol_method
Description	The method used to dissociate tissues into single cells.
Example	Mechanical and enzymatic dissociation
Reference	#
Namespace	ei:dissociation_protocol_method

Single Cell Quality Metric

Name	single_cell_quality_metric
Description	Metrics used to assess the quality of single cells before sequencing.
Example	Cell viability percentage
Reference	#
Namespace	ei:single_cell_quality_metric

Cell Suspension

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the sample
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id
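A cell suspension record points back to a sample through sample_id and to a dissociation protocol through dissociation_protocol_id. Before submission it may be worth checking that every such reference resolves. Below is a minimal Python sketch. The `CellSuspension` class and the `dangling_references` helper are hypothetical, and the sketch assumes the known sample and protocol IDs are available as sets.

```python
from dataclasses import dataclass


@dataclass
class CellSuspension:
    cell_suspension_id: str
    study_id: str
    sample_id: str
    dissociation_protocol_id: str


def dangling_references(suspensions, sample_ids, protocol_ids):
    """Return (suspension ID, field, value) for each reference that matches no known record."""
    problems = []
    for s in suspensions:
        if s.sample_id not in sample_ids:
            problems.append((s.cell_suspension_id, "sample_id", s.sample_id))
        if s.dissociation_protocol_id not in protocol_ids:
            problems.append(
                (s.cell_suspension_id, "dissociation_protocol_id", s.dissociation_protocol_id)
            )
    return problems
```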

Suspension Type Required

Name	suspension_type
Description	The type of particle held in suspension during processing: whole cells, nuclei or protoplasts.
Example	Cell
Reference	#
Namespace	ei:suspension_type
Allowed Values	Cell Nuclei Protoplast

Cell Count

Name	cell_count
Description	The number of cells in the sequencing library.
Example	10000
Reference	#
Regex	^\d+$
Namespace	ei:cell_count

Cell Viability

Name	cell_viability
Description	The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis.
Example	95
Reference	#
Namespace	ei:cell_viability

Cell Viability Assessment Method

Name	cell_viability_assessment_method
Description	The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques.
Example	Trypan Blue Exclusion
Reference	#
Namespace	ei:cell_viability_assessment_method

Cell Size

Name	cell_size
Description	The size of the cell, typically measured in micrometres.
Example	10
Reference	#
Namespace	ei:cell_size

Suspension Volume (µL)

Name	suspension_volume_µl
Description	The volume of the cell suspension in microlitres (µL).
Example	100
Reference	#
Namespace	ei:suspension_volume_µl

Suspension Concentration Cells Per µL

Name	suspension_concentration_cells_per_µl
Description	The concentration of cells in the suspension, in cells per microlitre (cells/µL).
Example	1000
Reference	#
Namespace	ei:suspension_concentration_cells_per_µl

Suspension Dilution

Name	suspension_dilution
Description	The dilution factor of the cell suspension.
Example	1:10
Reference	#
Namespace	ei:suspension_dilution

Loading Volume (µL)

Name	loading_volume_µl
Description	The volume of the cell suspension loaded into the single-cell RNA-sequencing system for analysis.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:loading_volume_µl
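The concentration, loading volume and dilution fields together give a rough estimate of how many cells were loaded. With the example values of 1000 cells/µL and 10 µL, that estimate is 10 000 cells. Below is a minimal Python sketch. It assumes two things: the concentration was measured before dilution, and a dilution written "a:b" means a parts of suspension in b parts in total. The cells loaded will usually exceed the cells actually captured and recorded in cell_count.

```python
def cells_loaded(concentration_cells_per_ul: float,
                 loading_volume_ul: float,
                 dilution: str = "1:1") -> float:
    """Estimate the number of cells loaded into the device.

    Assumes the concentration was measured before dilution and that a
    dilution "a:b" means a parts of suspension in b parts in total.
    """
    parts, total = (int(x) for x in dilution.split(":"))
    if parts <= 0 or total < parts:
        raise ValueError(f"unexpected dilution: {dilution!r}")
    return concentration_cells_per_ul * loading_volume_ul * parts / total
```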

Suspension Dilution Buffer

Name	suspension_dilution_buffer
Description	A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing.
Example	PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin)
Reference	#
Namespace	ei:suspension_dilution_buffer

Library Preparation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the library preparation.
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Library Preparation Kit Required

Name	library_prep_kit
Description	Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids
Example	10X Genomics Single Cell 3' v3
Reference	https://w3id.org/mixs/0001145
Namespace	mixs:library_prep_kit

Library Preparation Kit Version Required

Name	library_prep_kit_version
Description	The version number of the library preparation kit used for sequencing.
Example	2
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Regex	^\d+(\.\d+)?$
Namespace	ontology:library_prep_kit_version

Amplification Method

Name	amplification_method
Description	The method used to amplify the Complementary DNA (cDNA).
Example	PCR
Reference	#
Namespace	ei:amplification_method

cDNA Amplification Cycles

Name	cdna_amplification_cycles
Description	The number of cycles used during the Complementary DNA (cDNA) amplification process.
Example	12
Reference	#
Regex	^\d+$
Namespace	ei:cdna_amplification_cycles

Average Size Distribution

Name	average_size_distribution
Description	The average fragment length of the final library, in base pairs (bp), indicating the quality of the library and its suitability for sequencing.
Example	350
Reference	#
Regex	^\d+$
Namespace	ei:average_size_distribution

Library Construction Method

Name	lib_construction_method
Description	The library construction method (including version) that was used.
Example	Smart-Seq2
Reference	#
Namespace	ei:lib_construction_method

Input Molecule

Name	input_molecule
Description	The specific fraction of biological macromolecule from which the sequencing library is derived.
Example	RNA
Reference	#
Namespace	ei:input_molecule

Primer

Name	primer
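Description	The type of primer used for reverse transcription. This indicates which part of each mRNA is represented in the cDNA library: oligo-dT primers anneal to the poly(A) tail, whereas random primers can prime anywhere along the transcript.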
Example	Random
Reference	#
Namespace	ei:primer
Allowed Values	Oligo-dT Random

Primeness Required

Name	primeness
Description	The end from which the molecule was sequenced.
Example	5'
Reference	#
Namespace	ei:primeness
Allowed Values	3' 5' Both

End Bias

Name	end_bias
Description	The end (3' or 5') of the transcripts towards which read coverage in the library is biased.
Example	3
Reference	#
Namespace	ei:end_bias
Allowed Values	3 5

Library Strand

Name	library_strand
Description	The Complementary DNA (cDNA) strand from which the reads were derived: sense (first), antisense (second), both, or none (unstranded).
Example	Antisense
Reference	#
Namespace	ei:library_strand
Allowed Values	Antisense Both Sense Unstranded

Spike In Required

Name	spike_in
Description	External RNA added to the sample as a control to assess technical variability and normalization in RNA-sequencing. State whether spike-in was used.
Example	Yes
Reference	#
Namespace	ei:spike_in
Allowed Values	No Yes

Spike Type

Name	spike_type
Description	The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA.
Example	Synthetic RNA
Reference	#
Namespace	ei:spike_type

Spike In Dilution Or Concentration

Name	spike_in_dilution_or_concentration
Description	The final concentration or dilution (for commercial sets) of the spike-in mix.
Example	1:1000
Reference	#
Namespace	ei:spike_in_dilution_or_concentration
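Only spike_in is marked as required, but spike_type and spike_in_dilution_or_concentration only make sense when spike_in is "Yes". Below is a minimal Python sketch of a consistency check across the three fields. The rule itself and the `spike_in_issues` helper are assumptions and are not stated by the schema.

```python
SPIKE_DETAIL_FIELDS = ("spike_type", "spike_in_dilution_or_concentration")


def spike_in_issues(record: dict) -> list:
    """List spike-in fields that contradict the spike_in answer (assumed rule)."""
    answer = record.get("spike_in")
    if answer not in ("Yes", "No"):
        return ["spike_in must be 'Yes' or 'No'"]
    issues = []
    for field in SPIKE_DETAIL_FIELDS:
        given = bool(record.get(field))
        if answer == "Yes" and not given:
            issues.append(f"{field} should be given when spike_in is 'Yes'")
        if answer == "No" and given:
            issues.append(f"{field} is given but spike_in is 'No'")
    return issues
```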

i5 Index Required

Name	i5_index
Description	Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing.
Example	ATCACG
Reference	#
Namespace	ei:i5_index

i7 Index Required

Name	i7_index
Description	Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs.
Example	CGATGT
Reference	#
Namespace	ei:i7_index

Dual or Single Index Required

Name	dual_single_index
Description	Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing.
Example	Dual
Reference	#
Namespace	ei:dual_single_index
Allowed Values	Dual Single

i5 Sequence Required

Name	i5_sequence
Description	The nucleotide sequence of the i5 index used in multiplexing during sequencing.
Example	ATCGTAGC
Reference	#
Namespace	ei:i5_sequence

i7 Sequence Required

Name	i7_sequence
Description	The specific nucleotide sequence of the i7 index used for a sample.
Example	TGCATGCA
Reference	#
Namespace	ei:i7_sequence
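The index fields define no regex. Below is a minimal Python sketch of two plausible checks: each index sequence uses only upper-case A, C, G, T or N, and a "Dual" record supplies both i5_sequence and i7_sequence. Both rules and the `index_issues` helper are assumptions, not part of the schema.

```python
import re

# Assumed alphabet for index sequences.
NUCLEOTIDES = re.compile(r"[ACGTN]+")


def index_issues(record: dict) -> list:
    """Check the index fields of one library record (assumed rules)."""
    mode = record.get("dual_single_index")
    if mode not in ("Dual", "Single"):
        return ["dual_single_index must be 'Dual' or 'Single'"]
    issues = []
    present = [f for f in ("i5_sequence", "i7_sequence") if record.get(f)]
    for field in present:
        if not NUCLEOTIDES.fullmatch(record[field]):
            issues.append(f"{field} must contain only upper-case A, C, G, T or N")
    if mode == "Dual" and len(present) != 2:
        issues.append("dual indexing needs both i5_sequence and i7_sequence")
    return issues
```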

Plate ID

Name	plate_id
Description	Identifier for the 96-well plate used in sample preparation.
Example	PLT001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:plate_id

Well Row

Name	well_row
Description	The row identifier in a 96-well plate indicating the sample's position.
Example	A
Reference	#
Namespace	ei:well_row

Well Column

Name	well_col
Description	The column identifier in a 96-well plate indicating the sample's position.
Example	5
Reference	#
Regex	^\d+$
Namespace	ei:well_col
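A standard 96-well plate has rows A to H and columns 1 to 12, so well_row and well_col can be checked together and combined into a single well name. Below is a minimal Python sketch. The `well_name` helper is hypothetical, and the sketch assumes only 96-well plates, as the field descriptions state.

```python
import re

PLATE_ROWS = "ABCDEFGH"       # rows of a 96-well plate
PLATE_COLUMNS = range(1, 13)  # columns 1-12


def well_name(well_row: str, well_col: str) -> str:
    """Combine well_row and well_col into a well name such as 'A5'."""
    row = well_row.strip().upper()
    if len(row) != 1 or row not in PLATE_ROWS:
        raise ValueError(f"row outside a 96-well plate: {well_row!r}")
    if not re.fullmatch(r"\d+", well_col) or int(well_col) not in PLATE_COLUMNS:
        raise ValueError(f"column outside a 96-well plate: {well_col!r}")
    return f"{row}{int(well_col)}"
```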

Cell Phenotype

Name	cell_phenotype
Description	The cell-surface marker phenotype used to select cells by Fluorescence-Activated Cell Sorting (FACS).
Example	CD41-
Reference	#
Namespace	ei:cell_phenotype
Allowed Values	CD41+ CD41-

Nucleic Acid Amplification

Name	nucl_acid_amp
Description	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids. The link can be a PMID, DOI or URL.
Example	https://phylogenomics.me/protocols/16s-pcr-protocol/
Reference	https://w3id.org/mixs/0000050
Regex	^PMID:\d+$\|^doi:10.\d{2,9}/.*$\|^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)$\|([^\s-]{1,2}\|[^\s-]+.+[^\s-]+)$
Namespace	mixs:nucl_acid_amp

Nucleic Acid Extraction

Name	nucl_acid_ext
Description	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample
Example	https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf
Reference	https://w3id.org/mixs/0000038
Regex	^PMID:\d+$\|^doi:10.\d{2,9}/.*$\|^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)$\|([^\s-]{1,2}\|[^\s-]+.+[^\s-]+)$
Namespace	mixs:nucl_acid_ext

Amount or Size of Sample Collected

Name	samp_size
Description	The total amount or size (volume (ml), mass (g) or area (m2)) of sample collected.
Example	5 litre
Reference	https://w3id.org/mixs/0000037
Regex	^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?( - [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)? *([^\s-]{1,2}\|[^\s-]+.+[^\s-]+)$
Namespace	mixs:samp_size

Estimated Size

Name	estimated_size
Description	The estimated size of the genome prior to sequencing, in base pairs (bp). This is of particular importance when sequencing (eukaryotic) genomes, which could remain in draft form for a long or unspecified period.
Example	300000
Reference	https://w3id.org/mixs/0000001
Namespace	mixs:estimated_size

Sample Volume or Weight for DNA Extraction

Name	samp_vol_we_dna_ext
Description	Volume (ml) or mass (g) of total collected sample processed for DNA extraction.
Example	1500 milliliter
Reference	https://w3id.org/mixs/0000024
Regex	^[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?(?: - [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)? *(milliliter\|gram\|milligram\|square centimeter)$
Namespace	mixs:samp_vol_we_dna_ext
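The samp_vol_we_dna_ext value is a number (or a range "a - b") followed by one of four units. If the integer part of each number lacks a `*` quantifier, `[0-9]\.?[0-9]+` requires at least two digits, and a value such as "5 gram" is rejected. Below is a minimal Python sketch. It assumes the MIxS-style quantity pattern with `[0-9]*` on the integer part and captures the unit as group 1.

```python
import re

# Number with optional sign, decimal part and exponent (assumed "[0-9]*" form).
NUMBER = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"

# A single value or a range "a - b", then one of the permitted units.
SAMP_VOL_PATTERN = re.compile(
    rf"^{NUMBER}(?: - {NUMBER})? *(milliliter|gram|milligram|square centimeter)$"
)
```

For example, `SAMP_VOL_PATTERN.fullmatch("1500 milliliter").group(1)` returns the unit "milliliter". An abbreviation such as "1500 ml" does not match.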

Library Vector

Name	lib_vector
Description	Cloning vector type(s) used in construction of libraries
Example	Bacteriophage P1
Reference	https://w3id.org/mixs/0000041
Namespace	mixs:lib_vector

Adapters

Name	adapters
Description	Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported, in uppercase letters.
Example	AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT
Reference	https://w3id.org/mixs/0000042
Namespace	mixs:adapters
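The adapters example lists two upper-case sequences separated by ";". Below is a minimal Python sketch that splits and checks such a value. It assumes ";" is the separator and that sequences use only A, C, G, T or N. The `parse_adapters` helper is hypothetical.

```python
import re

# Assumed alphabet for adapter sequences.
ADAPTER_SEQUENCE = re.compile(r"[ACGTN]+")


def parse_adapters(value: str) -> tuple:
    """Split an adapters value into its two sequences (assumed ';' separator)."""
    parts = value.split(";")
    if len(parts) != 2:
        raise ValueError("expected exactly two adapter sequences separated by ';'")
    for seq in parts:
        if not ADAPTER_SEQUENCE.fullmatch(seq):
            raise ValueError(f"adapter must be upper-case A/C/G/T/N: {seq!r}")
    return parts[0], parts[1]
```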

Sample Material Processing

Name	samp_mat_process
Description	A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed
Example	filtering of seawater, storing samples in ethanol
Reference	https://w3id.org/mixs/0000048
Namespace	mixs:samp_mat_process

Design description

Name	design_description
Description	The design of the library including details of how it was constructed.
Reference	#
Namespace	ei:design_description

Library selection Required

Name	library_selection
Description	The method used to select for or against, enrich, or screen the material being sequenced.
Example	RANDOM PCR
Reference	#
Namespace	ei:library_selection
Allowed Values	5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified

Library source Required

Name	library_source
Description	The type of source material that is being sequenced.
Example	GENOMIC
Reference	#
Namespace	ei:library_source
Allowed Values	GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA

Library strategy Required

Name	library_strategy
Description	The sequencing technique intended for this library.
Example	RNA-Seq
Reference	#
Namespace	ei:library_strategy
Allowed Values	AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq

Sequencing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	https://w3id.org/mixs/0000016
Regex	^[a-zA-Z0-9]+$
Namespace	ontology:sequencing_id

Sequencing Platform Name Required

Name	sequencing_platform_name
Description	The name of the sequencing platform used for the experiment.
Example	Pacbio
Reference	http://purl.obolibrary.org/obo/NCIT_C172274
Namespace	ontology:sequencing_platform_name

Sequencing Instrument Model Required

Name	sequencing_instrument_model
Description	The model of the instrument used for sequencing. Instrument models differ in throughput, read length, error rate and suitability for particular applications.
Example	Illumina NovaSeq 6000
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Namespace	ontology:sequencing_instrument_model
Allowed Values	454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100

Library Layout Required

Name	lib_layout
Description	Specify whether to expect single, paired, or other configuration of reads for sequencing
Example	Paired
Reference	https://w3id.org/mixs/0000111
Namespace	mixs:lib_layout
Allowed Values	Other Paired Single Vector

UMI Barcode Read

Name	umi_barcode_read
Description	The type of read that contains the Unique Molecular Identifier (UMI) barcode.
Example	index2
Reference	#
Namespace	ei:umi_barcode_read
Allowed Values	index1 index2 read1 read2

UMI Barcode Offset

Name	umi_barcode_offset
Description	The offset (start position) of the Unique Molecular Identifier (UMI) barcode within its read.
Example	0
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_offset

UMI Barcode Size

Name	umi_barcode_size
Description	The length, in bases, of the Unique Molecular Identifier (UMI) barcode.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_size

Cell Barcode Read

Name	cell_barcode_read
Description	The type of read that contains the cell barcode.
Example	index1
Reference	http://www.ebi.ac.uk/efo/EFO_0010203
Namespace	ontology:cell_barcode_read
Allowed Values	index1 index2 read1 read2

Cell Barcode Offset

Name	cell_barcode_offset
Description	The offset (start position) of the cell barcode within its read.
Example	10
Reference	http://www.ebi.ac.uk/efo/EFO_0010204
Regex	^\d+$
Namespace	ontology:cell_barcode_offset

Cell Barcode Size

Name	cell_barcode_size
Description	The length, in bases, of the cell barcode.
Example	0
Reference	http://www.ebi.ac.uk/efo/EFO_0010205
Regex	^\d+$
Namespace	ontology:cell_barcode_size

cDNA Read Required

Name	cdna_read
Description	The type of read that contains the Complementary DNA (cDNA) sequence.
Example	read1
Reference	http://www.ebi.ac.uk/efo/EFO_0010195
Namespace	ontology:cdna_read
Allowed Values	index1 index2 read1 read2

cDNA Read Offset

Name	cdna_read_offset
Description	The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences.
Example	6
Reference	http://www.ebi.ac.uk/efo/EFO_0010201
Regex	^\d+$
Namespace	ontology:cdna_read_offset

cDNA Read Size

Name	cdna_read_size
Description	The size of the Complementary DNA (cDNA) read.
Example	75
Reference	http://www.ebi.ac.uk/efo/EFO_0010202
Regex	^\d+$
Namespace	ontology:cdna_read_size
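The read, offset and size fields for the UMI, cell barcode and cDNA together describe where each segment sits within the reads. Below is a minimal Python sketch of slicing segments out of a read. It assumes 0-based offsets and a hypothetical read 1 layout: a 16-base cell barcode followed by a 12-base UMI, similar to common droplet-based 3' chemistries. The `Segment` class is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    offset: int  # 0-based start position within the read (assumption)
    size: int    # number of bases in the segment

    def extract(self, read: str) -> str:
        """Return the bases of this segment from a read sequence."""
        if self.offset < 0 or self.size < 0:
            raise ValueError("offset and size must be non-negative")
        if self.offset + self.size > len(read):
            raise ValueError("segment runs past the end of the read")
        return read[self.offset:self.offset + self.size]


# Hypothetical read 1 layout: cell barcode at 0-15, UMI at 16-27.
cell_barcode = Segment(offset=0, size=16)
umi = Segment(offset=16, size=12)
```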

Library Size

Name	lib_size
Description	Total number of clones in the library prepared for the project
Example	50
Reference	https://w3id.org/mixs/0000039
Regex	^\d+$
Namespace	mixs:lib_size

Completeness Score

Name	compl_score
Description	The completeness score, typically based either on the fraction of markers found compared with a database or on the percentage of a genome found compared with a closely related reference genome. High-quality drafts should have a completeness above 90%, medium-quality drafts above 50%, and low-quality drafts below 50%.
Example	med;60%
Reference	https://w3id.org/mixs/0000069
Namespace	mixs:compl_score
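The compl_score example "med;60%" pairs a quality category with a percentage, and the description gives the thresholds for each category. Below is a minimal Python sketch that checks whether the two agree. It assumes the category labels "high", "med" and "low" and the ";" separator seen in the example. The description does not cover exactly 50%, and this sketch treats that case as low.

```python
def expected_category(percent: float) -> str:
    """Category implied by the thresholds in the description."""
    if percent > 90:
        return "high"
    if percent > 50:
        return "med"
    return "low"  # also covers exactly 50% (assumption)


def parse_compl_score(value: str) -> tuple:
    """Split a value such as 'med;60%' into its category and percentage."""
    category, percent = value.split(";")
    return category.strip(), float(percent.strip().rstrip("%"))


def is_consistent(value: str) -> bool:
    """True if the stated category matches the stated percentage."""
    category, percent = parse_compl_score(value)
    return category == expected_category(percent)
```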

Library Reads Sequenced

Name	lib_reads_seq
Description	Total number of clones sequenced from the library
Example	20
Reference	https://w3id.org/mixs/0000040
Regex	^\d+$
Namespace	mixs:lib_reads_seq

Number of Contigs

Name	number_contig
Description	Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG
Example	40
Reference	https://w3id.org/mixs/0000060
Namespace	mixs:number_contig

Number of Replicons

Name	num_replicons
Description	Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote
Example	2
Reference	https://w3id.org/mixs/0000022
Regex	^\d+$
Namespace	mixs:num_replicons

Analysis Derived Data

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File Derived From

Name	file_derived_from
Description	The name of the file that was used to generate the analysis derived data.
Example	file1_sequencing.json
Reference	#
Namespace	ei:file_derived_from

Inferred Cell Type

Name	inferred_cell_type
Description	Post analysis cell type or identity declaration based on expression profile or known gene function identified by the performer.
Example	type II bipolar neuron
Reference	#
Namespace	ei:inferred_cell_type

Post Analysis Cell Well Quality

Name	post_analysis_cell_well_quality
Description	Performer defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed.
Example	Pass
Reference	#
Namespace	ei:post_analysis_cell_well_quality
Allowed Values	Fail Pass

Other Derived Cell Attributes

Name	other_derived_cell_attributes
Description	Any other cell-level measurement or annotation produced as a result of the analysis.
Example	Cluster
Reference	#
Namespace	ei:other_derived_cell_attributes
Allowed Values	Cluster Count Gene UMI tSNE coordinates

Raw Data Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Reference Genome

Name	reference_genome
Description	The reference genome version, with a stable link to the genome data (or an attached genome FASTA file).
Example	GRCh38, https://example.org/grch38.fa
Reference	#
Namespace	ei:reference_genome

Genome Annotation

Name	genome_annotation
Description	The genome annotation version, with a stable link. Also indicate whether the original annotation was modified (e.g. 3' UTR extension) and, if so, include the modified annotation file used in the analysis.
Example	Ensembl v101, https://example.org/ensembl_v101.gtf
Reference	#
Namespace	ei:genome_annotation

Annotation Filtering

Name	annotation_filtering
Description	The features that were filtered (e.g. protein-coding genes, pseudogenes, TCRs).
Example	Filtered to include only protein-coding genes
Reference	#
Namespace	ei:annotation_filtering

Genes vs Exons

Name	genes_vs_exons
Description	Whether quantification used whole gene intervals or exons.
Example	Exon quantification
Reference	#
Namespace	ei:genes_vs_exons

Library Structure

Name	library_structure
Description	The structure of the sequencing library, described in seqspec format.
Example	Single-cell 3' library
Reference	#
Namespace	ei:library_structure

Mapping and Demultiplexing Software

Name	mapping_and_demultiplexing_software
Description	The software, with version, used for read mapping and for demultiplexing reads by cell barcode and UMI.
Example	Cell Ranger 6.0.0
Reference	#
Namespace	ei:mapping_and_demultiplexing_software

Read Mapping Statistics

Name	read_mapping_statistics
Description	Mapping statistics for the reads or Unique Molecular Identifiers (UMIs), such as the percentage of reads mapped to the reference.
Example	80% reads mapped to reference
Reference	#
Namespace	ei:read_mapping_statistics

Sequencing Saturation

Name	sequencing_saturation
Description	The sequencing saturation achieved. Expected values depend on the number of cells recovered (not targeted) and on the technology.
Example	95% sequencing saturation
Reference	#
Namespace	ei:sequencing_saturation
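One widely used definition of sequencing saturation is the one reported by 10x Genomics Cell Ranger. It is 1 minus the ratio of unique (cell barcode, UMI, gene) combinations to confidently mapped reads, so it measures the fraction of reads that re-sample a molecule already seen. Below is a minimal Python sketch that assumes this definition. The input format of one tuple per read is hypothetical.

```python
def sequencing_saturation(reads) -> float:
    """Fraction of reads that repeat an already-seen (cell barcode, UMI, gene) combination.

    `reads` holds one (cell_barcode, umi, gene) tuple per confidently mapped read.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    unique_molecules = len(set(reads))
    return 1 - unique_molecules / len(reads)
```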

UMIs or Barcode Distribution QC

Name	umis_barcode_distribution_qc
Description	The distribution of Unique Molecular Identifiers (UMIs) per barcode and the threshold applied to it.
Example	Threshold: 10 UMIs per barcode
Reference	#
Namespace	ei:umis_barcode_distribution_qc

Cell or Non-Cell Filtering Strategy

Name	cell_non_cell_filtering_strategy
Description	Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells.
Example	Threshold: 5 UMIs for cell detection
Reference	#
Namespace	ei:cell_non_cell_filtering_strategy

Other Quality Filters Applied

Name	other_quality_filters_applied
Description	Cells/nuclei discarded based on % mitochondrial reads, % rRNA reads, etc.
Example	Cells with >20% mitochondrial reads discarded
Reference	#
Namespace	ei:other_quality_filters_applied
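The mitochondrial-read filter in the example can be sketched as a per-cell percentage followed by a cutoff. The sketch makes two assumptions. First, it identifies mitochondrial genes by the `MT-` prefix used in human gene symbols; other organisms or ID schemes need an explicit gene list. Second, it uses the 20% cutoff from the example above. The helper names are hypothetical.

```python
from typing import Dict

def percent_mito(gene_counts: Dict[str, int], prefix: str = "MT-") -> float:
    """Percentage of a cell's UMIs assigned to genes whose name starts with `prefix`."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0  # empty cells are better removed by the UMI threshold
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total

def passes_mito_filter(gene_counts: Dict[str, int], max_percent: float = 20.0) -> bool:
    """True if the cell is kept, i.e. at most `max_percent` of its UMIs are mitochondrial."""
    return percent_mito(gene_counts) <= max_percent

cell_a = {"MT-CO1": 10, "MT-ND1": 5, "ACTB": 60, "GAPDH": 25}  # 15% mitochondrial
cell_b = {"MT-CO1": 40, "ACTB": 30, "GAPDH": 30}               # 40% mitochondrial
print(passes_mito_filter(cell_a), passes_mito_filter(cell_b))  # True False
```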

Ambient RNA QC

Name	ambient_rna_qc
Description	Percentage of UMIs in background cell barcodes, and the algorithm (if any) used to remove ambient RNA.
Example	Ambient RNA removed if >5% UMIs in background barcodes
Reference	#
Namespace	ei:ambient_rna_qc
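The percentage of UMIs in background barcodes is the share of all UMIs that fall outside the barcodes called as cells. The sketch below computes it and then applies the 5% cutoff from the example above. The barcodes and counts are hypothetical. The sketch does not remove ambient RNA itself; tools built for that purpose handle removal.

```python
from typing import Dict, Set

def percent_umis_in_background(umi_counts: Dict[str, int], cell_barcodes: Set[str]) -> float:
    """Percentage of all UMIs that fall in barcodes not called as cells."""
    total = sum(umi_counts.values())
    if total == 0:
        return 0.0
    background = sum(n for bc, n in umi_counts.items() if bc not in cell_barcodes)
    return 100.0 * background / total

umi_counts = {"CELL1": 900, "CELL2": 850, "EMPTY1": 30, "EMPTY2": 20}
cells = {"CELL1", "CELL2"}
pct = percent_umis_in_background(umi_counts, cells)
needs_correction = pct > 5.0  # cutoff taken from the field's example
print(f"{pct:.2f}% of UMIs in background barcodes; correct: {needs_correction}")
# 2.78% of UMIs in background barcodes; correct: False
```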

Predicted Doublet Rate QC

Name	predicted_doublet_rate_qc
Description	Predicted doublet rate. This depends on the number of cells recovered (not targeted) and the technology.
Example	Predicted doublet rate: 1.5%
Reference	#
Namespace	ei:predicted_doublet_rate_qc
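On droplet platforms, multiplet rates are usually quoted as rising roughly linearly with the number of cells recovered. The sketch below assumes such a linear model. The per-1,000-cell rate depends on the chemistry and should be taken from the platform's documentation. The 0.75% used here is a placeholder chosen to reproduce the example value, not a published figure.

```python
def expected_doublet_rate(n_cells_recovered: int, rate_per_1000_cells: float) -> float:
    """Expected doublet percentage under an assumed linear model.

    `rate_per_1000_cells` (percent per 1,000 recovered cells) is chemistry-specific;
    no default is assumed.
    """
    if n_cells_recovered < 0 or rate_per_1000_cells < 0:
        raise ValueError("inputs must be non-negative")
    return rate_per_1000_cells * n_cells_recovered / 1000.0

print(expected_doublet_rate(2000, rate_per_1000_cells=0.75))  # 1.5
```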

Individual Organism SNP Demultiplexing

Name	individual_organism_snp_demultiplexing
Description	If carried out, the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used.
Example	SNP UMAP embedding using CellSNP
Reference	#
Namespace	ei:individual_organism_snp_demultiplexing

Assembly Name

Name	assembly_name
Description	Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community
Example	JCVI_ISG_i3_1.0
Reference	https://w3id.org/mixs/0000057
Namespace	mixs:assembly_name

Extrachromosomal Elements

Name	extrachrom_elements
Description	Number of extrachromosomal elements. Do plasmids of significant phenotypic consequence exist (e.g. ones that determine virulence or antibiotic resistance)? Megaplasmids? Other plasmids (Borrelia has 15+ plasmids)?
Example	5
Reference	https://w3id.org/mixs/0000023
Regex	^\d+$
Namespace	mixs:extrachrom_elements

Assembly Quality

Name	assembly_qual
Description	The assembly quality category is based on sets of criteria outlined for each assembly quality category.
Example	High-quality draft genome
Reference	https://w3id.org/mixs/0000056
Namespace	mixs:assembly_qual
Allowed Values	Finished genome Genome fragment(s) High-quality draft genome Low-quality draft genome Medium-quality draft genome

Assembly Software

Name	assembly_software
Description	Tool(s) used for assembly, including version number and parameters
Example	metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise
Reference	https://w3id.org/mixs/0000058
Namespace	mixs:assembly_software

Annotation

Name	annot
Description	Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter
Example	prokka
Reference	https://w3id.org/mixs/0000059
Namespace	mixs:annot

Feature Prediction

Name	feat_pred
Description	Method used to predict UViG features such as ORFs, integration sites, etc.
Example	Prodigal;2.6.3;default parameters
Reference	https://w3id.org/mixs/0000061
Regex	^([^\s-]{1,2}\|[^\s-]+.+[^\s-]+);([^\s-]{1,2}\|[^\s-]+.+[^\s-]+);([^\s-]{1,2}\|[^\s-]+.+[^\s-]+)$
Namespace	mixs:feat_pred

Completeness Software

Name	compl_software
Description	Tools used for completeness estimation, e.g. CheckM, anvi'o, BUSCO
Example	checkm
Reference	https://w3id.org/mixs/0000070
Namespace	mixs:compl_software

Similarity Search Method

Name	sim_search_meth
Description	Tool used to compare ORFs with database, along with version and cutoffs used
Example	HMMER3;3.1b2;hmmsearch, cutoff of 50 on score
Reference	https://w3id.org/mixs/0000063
Regex	^([^\s-]{1,2}\|[^\s-]+.+[^\s-]+);([^\s-]{1,2}\|[^\s-]+.+[^\s-]+);([^\s-]{1,2}\|[^\s-]+.+[^\s-]+)$
Namespace	mixs:sim_search_meth
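The Feature Prediction and Similarity Search Method patterns expect three semicolon-separated parts (tool;version;parameters). None of the parts may start or end with whitespace or a hyphen. The sketch below validates a value and then splits it. It copies the pattern from above with the `\|` written as a plain alternation `|`, which assumes the backslash is table escaping. `parse_tool_spec` is a hypothetical helper.

```python
import re
from typing import NamedTuple

# One part: 1-2 characters, or a longer string that neither starts nor ends
# with whitespace or a hyphen.
PART = r"([^\s-]{1,2}|[^\s-]+.+[^\s-]+)"
TOOL_VERSION_PARAMS = re.compile(rf"^{PART};{PART};{PART}$")

class ToolSpec(NamedTuple):
    tool: str
    version: str
    parameters: str

def parse_tool_spec(value: str) -> ToolSpec:
    """Validate a 'tool;version;parameters' string and split it into its parts."""
    if not TOOL_VERSION_PARAMS.match(value):
        raise ValueError(f"not in tool;version;parameters form: {value!r}")
    tool, version, parameters = value.split(";", 2)
    return ToolSpec(tool, version, parameters)

print(parse_tool_spec("Prodigal;2.6.3;default parameters"))
# ToolSpec(tool='Prodigal', version='2.6.3', parameters='default parameters')
```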

Relevant Standard Operating Procedures

Name	sop
Description	Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
Example	http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/
Reference	https://w3id.org/mixs/0000090
Namespace	mixs:sop

Downstream Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Clustering Algorithm and Version

Name	clustering_algorithm_and_version
Description	The clustering algorithm and version used, if the data were compared or integrated with existing datasets.
Example	Louvain 0.8.0
Reference	#
Namespace	ei:clustering_algorithm_and_version

Clustering Parameters

Name	clustering_parameters
Description	The clustering parameters used, if the data were compared or integrated with existing datasets.
Example	Resolution: 0.6, K-nearest neighbors: 10
Reference	#
Namespace	ei:clustering_parameters

Integration/Batch Correction

Name	integration_batch_correction
Description	The integration or batch-correction method (and version) used, if the data were compared or integrated with existing datasets.
Example	Harmony v1.0
Reference	#
Namespace	ei:integration_batch_correction

Data Availability Checklist

Source Code

Name	source_code
Description	Any newly developed code or software used in the processing and downstream analysis of the dataset.
Example	Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization.
Reference	#
Namespace	ei:source_code

UMI Count Matrix

Name	umi_count_matrix
Description	Gene x cell matrix with UMI counts for each gene in each cell.
Example	The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv.
Reference	#
Namespace	ei:umi_count_matrix
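A gene x cell matrix in the CSV layout shown in the example can be written with Python's standard `csv` module. The sketch assumes the counts are held in a hypothetical nested dictionary. The `gene_id` header cell and the output file name are illustrative. Sparse formats are usually more practical for large matrices; CSV is shown only because it matches the example.

```python
import csv
import os
import tempfile
from typing import Dict, List

def write_umi_matrix(path: str, counts: Dict[str, Dict[str, int]], cells: List[str]) -> None:
    """Write a gene x cell UMI count matrix: one row per gene, one column per cell barcode.

    counts[gene_id][cell] holds the UMI count; absent entries are written as 0.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene_id", *cells])  # hypothetical name for the first column
        for gene_id in sorted(counts):
            row = counts[gene_id]
            writer.writerow([gene_id, *(row.get(cell, 0) for cell in cells)])

counts = {
    "ENSG00000139618": {"Cell_001": 3, "Cell_002": 0},
    "ENSG00000141510": {"Cell_001": 1},
}
path = os.path.join(tempfile.gettempdir(), "umi_count_matrix.csv")
write_umi_matrix(path, counts, ["Cell_001", "Cell_002"])
```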

Ensembl IDs

Name	ensembl_ids
Description	Gene or transcript names should be listed as Ensembl IDs (or another standardised ID), with gene short names in the metadata.
Example	ENSG00000139618
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:ensembl_ids

Functional Gene Annotations

Name	functional_gene_annotations
Description	Any functional annotation generated/used (gene names, GOs, structural domains, etc.).
Example	Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding).
Reference	#
Namespace	ei:functional_gene_annotations

Protein Models

Name	protein_models
Description	FASTA file with (or a stable link to) the predicted proteins associated with the genes in the UMI count matrix, using matching IDs.
Example	The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID.
Reference	#
Namespace	ei:protein_models

Cell Metadata

Name	cell_metadata
Description	Table mapping cell IDs to cluster/cell type/broad cell type annotations.
Example	Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv.
Reference	#
Namespace	ei:cell_metadata

Cluster-Level Normalised Expression Tables

Name	cluster_level_normalised_expression_tables
Description	Expression tables that show normalised gene expression at the cluster or cell-type level.
Example	Normalised gene expression data at the cluster level is provided in a CSV file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv.
Reference	#
Namespace	ei:cluster_level_normalised_expression_tables
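One simple way to build a cluster-level table is to average the cell-level normalised values within each cluster, using the cell-to-cluster mapping from the cell metadata. Summing raw counts per cluster (pseudobulk) is another common choice. This is a sketch under stated assumptions: the input structures and the helper name are hypothetical, and every cell in the expression data must appear in the metadata mapping.

```python
from collections import defaultdict
from typing import Dict

def cluster_mean_expression(
    expression: Dict[str, Dict[str, float]],  # gene -> cell -> normalised value
    cell_to_cluster: Dict[str, str],          # cell -> cluster label (cell metadata)
) -> Dict[str, Dict[str, float]]:
    """Mean normalised expression of each gene within each cluster.

    Cells absent from a gene's entry are treated as 0 (sparse input).
    """
    cluster_sizes: Dict[str, int] = defaultdict(int)
    for cluster in cell_to_cluster.values():
        cluster_sizes[cluster] += 1
    result: Dict[str, Dict[str, float]] = {}
    for gene, per_cell in expression.items():
        totals: Dict[str, float] = defaultdict(float)
        for cell, value in per_cell.items():
            totals[cell_to_cluster[cell]] += value
        result[gene] = {c: totals[c] / n for c, n in sorted(cluster_sizes.items())}
    return result

expression = {"ENSG00000139618": {"Cell_001": 2.0, "Cell_002": 4.0, "Cell_003": 1.0}}
cell_to_cluster = {"Cell_001": "Cluster_1", "Cell_002": "Cluster_1",
                   "Cell_003": "Cluster_2", "Cell_004": "Cluster_2"}
print(cluster_mean_expression(expression, cell_to_cluster))
# {'ENSG00000139618': {'Cluster_1': 3.0, 'Cluster_2': 0.5}}
```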

Other Resource Files

Name	other_resource_files
Description	Any other files necessary to reuse and interpret the data, e.g. barcode information in complex, serial multiplexing protocols (ClickTags).
Example	Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv.
Reference	#
Namespace	ei:other_resource_files

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:sequencing_id

Read 1 File Required

Name	read_1_file
Description	The name or accession of the file that contains read 1.
Example	file1_r1.fastq.gz
Reference	#
Namespace	ei:read_1_file

Read 2 File

Name	read_2_file
Description	The name or accession of the file that contains read 2.
Example	file2_r2.fastq.gz
Reference	#
Namespace	ei:read_2_file

Index 1 File

Name	index_1_file
Description	The name of the file that contains index 1.
Example	file1_i1.fastq.gz
Reference	#
Namespace	ei:index_1_file

Index 2 File

Name	index_2_file
Description	The name of the file that contains index 2.
Example	file2_i2.fastq.gz
Reference	#
Namespace	ei:index_2_file

Read 1 Checksum Required

Name	read_1_file_checksum
Description	Result of a hash function calculated on the content of the read 1 file to verify file integrity. Commonly used algorithms include MD5 and SHA-1. The checksums should be separated by a comma (,).
Example	f8d29e41a73b5c02de9a6fb314e7c8ad
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_1_file_checksum
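The checksum can be computed by streaming the file through `hashlib.md5`, which keeps memory use low for large FASTQ files. Note that the listed pattern accepts exactly one 32-character lowercase hexadecimal (MD5) digest. A SHA-1 digest (40 characters) or a comma-separated list would not pass it as written. A minimal sketch, in which a small temporary file stands in for a real read file:

```python
import hashlib
import os
import re
import tempfile

MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")  # Regex listed for the checksum fields

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) file through MD5 and return the lowercase hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Stand-in for a real read file such as file1_r1.fastq.gz.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"@read1\nACGT\n+\nIIII\n")
checksum = md5_of_file(tmp.name)
os.remove(tmp.name)
print(checksum, bool(MD5_PATTERN.match(checksum)))
```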

Read 2 Checksum

Name	read_2_file_checksum
Description	Result of a hash function calculated on the content of the read 2 file to verify file integrity. Commonly used algorithms include MD5 and SHA-1. The checksums should be separated by a comma (,).
Example	a3f4c1b29d8e57fa41b02de6c7f9ab83
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_2_file_checksum

White List Barcode File

Name	white_list_barcode_file
Description	A file containing the known cell barcodes in the dataset.
Example	barcodes.tsv
Reference	#
Namespace	ei:white_list_barcode_file

Expression Data Process Setting

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Expression Data Process Setting ID Required

Name	expression_data_process_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_process_setting_id

Matrix Type

Name	matrix_type
Description	The type of values stored in the expression matrix.
Example	raw_counts
Reference	#
Namespace	ei:matrix_type
Allowed Values	imputed log1p normalised pseudobulk raw_counts scaled

Reference Genome Required

Name	reference_genome
Description	URL of the associated reference genome
Example	https://reference-genome-example.com
Reference	#
Regex	^((https?\|ftp):\/\/[^\s\|]+)(\\|((https?\|ftp):\/\/[^\s\|]+))*$
Namespace	ei:reference_genome

Annotation Version

Name	annotation_version
Description	The annotation version of the associated reference genome
Example	GENCODE v44
Reference	#
Namespace	ei:annotation_version

Normalisation Method

Name	normalisation_method
Description	The normalisation method applied to the expression data
Example	Log Normalisation
Reference	#
Namespace	ei:normalisation_method
Allowed Values	Library Size Normalisation Log Normalisation SCNorm SCTransform scran
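The sketch below shows one common route from a `raw_counts` matrix to a `log1p` matrix: scale each cell's counts to a fixed total, then take the natural log of 1 + x. The target sum of 10,000 is an assumed, commonly used but arbitrary parameter, and the helper name is hypothetical. SCTransform and scran use different models and are not shown.

```python
import math
from typing import Dict

def log1p_normalise(cell_counts: Dict[str, int], target_sum: float = 1e4) -> Dict[str, float]:
    """Scale one cell's counts so they sum to `target_sum`, then apply ln(1 + x)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {gene: math.log1p(n * target_sum / total) for gene, n in cell_counts.items()}

cell = {"ACTB": 30, "GAPDH": 20, "CD3E": 0}
normalised = log1p_normalise(cell)
print({gene: round(value, 3) for gene, value in normalised.items()})
```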

Highly Variable Gene Selection (HVG)

Name	highly_variable_gene_selection
Description	Method used and number of highly variable genes (HVGs) selected
Example	seurat_v3, n=2000
Reference	#
Namespace	ei:highly_variable_gene_selection

Dimensionality Reduction

Name	dimensionality_reduction
Description	Method used to reduce dimensionality in the expression data
Example	PCA
Reference	#
Namespace	ei:dimensionality_reduction
Allowed Values	Diffusion Map ICA NMF PCA UMAP t-SNE

Number of Nearest Neighbours

Name	n_neighbours
Description	Number of nearest neighbours used to calculate cluster membership
Example	pca:50
Reference	#
Namespace	ei:n_neighbours

Clustering Algorithm

Name	clustering_algorithm
Description	Algorithm used to create clusters
Reference	#
Namespace	ei:clustering_algorithm

Clustering Resolution

Name	clustering_resolution
Description	Resolution parameter used for clustering
Example	2.5
Reference	#
Regex	^([0-9]*[.])?[0-9]+
Namespace	ei:clustering_resolution

Clustering Distance Metric

Name	clustering_distance_metric
Description	Metric used to calculate the distance between points
Example	cosine
Reference	#
Namespace	ei:clustering_distance_metric
Allowed Values	cosine euclidean hamming jaccard manhattan mahalanobis
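Two of the allowed metrics can give very different answers for the same pair of points. Cosine distance ignores vector magnitude (for example, overall expression level), while Euclidean distance does not. A minimal sketch using only the standard library; the vectors are illustrative.

```python
import math
from typing import Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different magnitude
print(round(cosine_distance(a, b), 6))  # 0.0
print(round(math.dist(a, b), 3))        # 2.236 (Euclidean)
```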

Software Versions

Name	software_versions
Description	Primary software packages used for analysis
Reference	#
Namespace	ei:software_versions

Cell Type Annotation

Name	cell-type annotation
Description	Tools and databases used for cell-type annotation
Reference	#
Namespace	ei:cell-type annotation

Generated by Pipeline

Name	generated_by_pipeline
Description	URL of the deposited pipeline used to create this data
Reference	#
Regex	^(https?\|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	ei:generated_by_pipeline

Notes

Name	notes
Description	Any other information
Reference	#
Namespace	ei:notes

Expression Data File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	expression_data_file_id
Description	A unique alphanumeric identifier for the expression data file
Example	EXPFILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric identifier for library preparation
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Expression Data Process Setting ID Required

Name	expression_data_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_setting_id

File Name Required

Name	expression_data_file
Description	Expression data file name
Example	exp_file.csv
Reference	#
Namespace	ei:expression_data_file

File MD5 Checksum Required

Name	expression_data_file_checksum
Description	Calculated MD5 checksum for this file
Example	9e4b7a23f6c1d0ab85f29c47e3d8a610
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:expression_data_file_checksum

File Format Required

Name	expression_data_file_format
Description	The format of the expression file, such as h5ad or rds
Example	csv
Reference	#
Namespace	ei:expression_data_file_format
Allowed Values	csv h5ad loom mtx rds

Number of Cells

Name	n_cells
Description	The number of cells represented in the expression data
Example	4
Reference	#
Regex	^\d+$
Namespace	ei:n_cells

Number of Genes

Name	n_genes
Description	The number of genes represented in the expression data
Example	50
Reference	#
Regex	^\d+$
Namespace	ei:n_genes

File Size in Bytes

Name	file_size_bytes
Description	Size of the file recorded in bytes
Example	90
Reference	#
Regex	^\d+$
Namespace	ei:file_size_bytes
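The size can be read with `os.path.getsize` and recorded as a plain integer, which the listed `^\d+$` pattern accepts. A minimal sketch in which a temporary file holds placeholder content and the helper name is hypothetical:

```python
import os
import re
import tempfile

DIGITS = re.compile(r"^\d+$")  # Regex listed for file_size_bytes

def file_size_bytes(path: str) -> str:
    """Return the file size in bytes as a string suitable for this field."""
    return str(os.path.getsize(path))

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"gene,cell\n")  # 10 bytes of placeholder content
size = file_size_bytes(tmp.name)
os.remove(tmp.name)
print(size, bool(DIGITS.match(size)))  # 10 True
```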

Date Generated

Name	date_generated
Description	Approximate date this expression data was generated
Example	2024-10-14
Reference	#
Regex	^\d{4}-(0[1-9]\|1[0-2])-(0[1-9]\|[12]\d\|3[01])$
Namespace	ei:date_generated
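The listed pattern checks the YYYY-MM-DD shape but not whether the day exists in that month. For example, 2024-02-31 matches it. The sketch below pairs the pattern with `datetime.date.fromisoformat`. It writes the pattern with a plain `|`, which assumes the `\|` above is table escaping.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_valid_date(value: str) -> bool:
    """Check the YYYY-MM-DD pattern, then confirm the calendar date exists."""
    if not DATE_PATTERN.match(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

print(is_valid_date("2024-10-14"))           # True
print(is_valid_date("2024-02-31"))           # False: matches the pattern, but no 31 February
print(is_valid_date("2024-10-14T00:00:00"))  # False: time component not allowed
```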

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Title

Name	title
Description	A name given to the study or project. The title should be fewer than 30 words, e.g. the title of a grant proposal or a publication.
Example	Study of single cells in the human body
Reference	http://purl.org/dc/terms/title
Namespace	dcterms:title

Description

Name	description
Description	A detailed description of the project which includes research goals and experimental approach. Project description should be fewer than 300 words, such as an abstract from a grant application or publication.
Example	This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine.
Reference	http://purl.org/dc/terms/description
Namespace	dcterms:description

Workflow

Name	workflow
Description	The workflow or protocol followed during the study.
Example	Laser microdissection
Reference	#
Namespace	ei:workflow
Allowed Values	Laser microdissection Laser microdissection, Culturing Laser microdissection, Culturing, Sequencing Laser microdissection, Sequencing Microfluidics, Facs, Culturing Microfluidics, Facs, Culturing, Sequencing Microfluidics, Facs, Sequencing Spatial Transcriptomics

Technology Required

Name	technology
Description	The sorting or visualisation technology used.
Example	Vizgen
Reference	#
Namespace	ei:technology

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common licenses include Creative Commons (e.g., CC BY 4.0), which require attribution to the original authors when the data is cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
Description	A 16-character identifier (digits, with an optional final X) that uniquely identifies a researcher.
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id
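The pattern checks only the layout of an ORCID iD. The final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm over the first 15 digits, where 10 is written as X. A minimal sketch of a combined check; the helper names are hypothetical. The sample iD 0000-0002-1825-0097 is the one used in ORCID's own documentation.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")  # Regex listed above

def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Layout check with the field's pattern, then check-digit verification."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check digit
```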

First Name Required

Name	givenName
Description	The first (given) name of the individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	The last name (surname or family name) of the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	A unique identifier used to send and receive electronic messages (emails) over the internet.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.\.{2,})(?!.-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organization that supports (sponsors) something through some kind of financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Regex	^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognised taxonomy database such as NCBI Taxonomy) that corresponds to the organism's scientific name. It must match the provided scientific name to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Dissociation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Protocol Name Required

Name	protocol_name
Description	A descriptive name of the protocol used for single-cell sequencing.
Example	10X Genomics Single Cell 3' Library Prep
Reference	#
Namespace	ei:protocol_name

Dissociation Description Required

Name	dissociation_description
Description	A free-text description of the process used to separate cells from tissues or cell aggregates.
Example	Tissue was enzymatically dissociated using collagenase for 30 minutes.
Reference	#
Namespace	ei:dissociation_description

Enrichment Markers

Name	enrichment_markers
Description	Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms.
Example	CD45
Reference	#
Namespace	faang:enrichment_markers

Isolation Kit

Name	isolation_kit
Description	The kit used to isolate the cells.
Example	10x Nuclei Isolation Kit
Reference	#
Namespace	ei:isolation_kit
Allowed Values	10x Nuclei Isolation Kit 3' standard throughput kit Custom

Literature Source Reference

Name	literature_source_reference
Description	Reference to literature sources that describe the protocol or methods used.
Example	Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview'
Reference	#
Namespace	ei:literature_source_reference

protocols.io Reference

Name	protocols_io_reference
Description	Reference link to protocols.io for additional details on the protocol.
Example	https://www.protocols.io/view/sample-protocol-b2ubqesn
Reference	#
Regex	^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+(?: \\| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]))$
Namespace	ei:protocols_io_reference

WorkflowHub SOP Reference

Name	workflow_hub_sop_reference
Description	Reference to the Standard Operating Procedure (SOP) in WorkflowHub.
Example	https://workflowhub.eu/works/12345
Reference	#
Namespace	ei:workflow_hub_sop_reference

Dissociation Protocol Method

Name	dissociation_protocol_method
Description	The method used to dissociate tissues into single cells.
Example	Mechanical and enzymatic dissociation
Reference	#
Namespace	ei:dissociation_protocol_method

Single Cell Quality Metric

Name	single_cell_quality_metric
Description	Metrics used to assess the quality of single cells before sequencing.
Example	Cell viability percentage
Reference	#
Namespace	ei:single_cell_quality_metric

Cell Suspension

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the sample
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Suspension Type Required

Name	suspension_type
Description	The type of particle held in suspension during processing: whole cells, nuclei or protoplasts.
Example	Cell
Reference	#
Namespace	ei:suspension_type
Allowed Values	Cell Nuclei Protoplast

Cell Count

Name	cell_count
Description	The number of cells in the sequencing library.
Example	10000
Reference	#
Regex	^\d+$
Namespace	ei:cell_count

Cell Number

Name	cell_number
Description	The number of cells in the sequencing library, given as a range category.
Example	101-10000
Reference	#
Namespace	tol:cell_number
Allowed Values	1 1000000+ 100001-500000 10001-50000 101-10000 11-50 2-10 500001-1000000 50001-100000 51-100
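When an exact count is known (the Cell Count field), the matching Cell Number category can be derived from it. The listed ranges overlap at exactly 1,000,000. This sketch assumes that value belongs to "500001-1000000" and that "1000000+" means above one million. The bin table and helper name are illustrative.

```python
from typing import List, Optional, Tuple

# (lower, upper, label) for each allowed value; upper=None means open-ended.
CELL_NUMBER_BINS: List[Tuple[int, Optional[int], str]] = [
    (1, 1, "1"),
    (2, 10, "2-10"),
    (11, 50, "11-50"),
    (51, 100, "51-100"),
    (101, 10000, "101-10000"),
    (10001, 50000, "10001-50000"),
    (50001, 100000, "50001-100000"),
    (100001, 500000, "100001-500000"),
    (500001, 1000000, "500001-1000000"),
    (1000001, None, "1000000+"),
]

def cell_number_category(cell_count: int) -> str:
    """Map an exact cell count to the corresponding cell_number label."""
    for lower, upper, label in CELL_NUMBER_BINS:
        if cell_count >= lower and (upper is None or cell_count <= upper):
            return label
    raise ValueError(f"cell count must be at least 1, got {cell_count}")

print(cell_number_category(10000))  # 101-10000
```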

Cell Viability

Name	cell_viability
Description	The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis.
Example	95
Reference	#
Namespace	ei:cell_viability

Cell Viability Assessment Method

Name	cell_viability_assessment_method
Description	The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques.
Example	Trypan Blue Exclusion
Reference	#
Namespace	ei:cell_viability_assessment_method
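With trypan blue exclusion, viable cells exclude the dye and dead cells take it up. Viability is therefore the unstained fraction of all counted cells. A minimal sketch with hypothetical counts chosen to reproduce the example value:

```python
def percent_viable(unstained_cells: int, stained_cells: int) -> float:
    """Viability (%) = unstained (live) cells / all counted cells x 100."""
    total = unstained_cells + stained_cells
    if total == 0:
        raise ValueError("no cells counted")
    return 100.0 * unstained_cells / total

print(percent_viable(unstained_cells=190, stained_cells=10))  # 95.0
```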

Cell Size

Name	cell_size
Description	The size of the cell, typically measured in micrometres.
Example	10
Reference	#
Namespace	ei:cell_size

Suspension Volume (µL)

Name	suspension_volume_µl
Description	The volume of the cell suspension in microlitres (µL).
Example	100
Reference	#
Namespace	ei:suspension_volume_µl

Suspension Concentration Cells Per µL

Name	suspension_concentration_cells_per_µl
Description	The concentration of cells in the suspension, in cells per microlitre (cells/µL).
Example	1000
Reference	#
Namespace	ei:suspension_concentration_cells_per_µl

Suspension Dilution

Name	suspension_dilution
Description	The dilution factor of the cell suspension.
Example	1:10
Reference	#
Namespace	ei:suspension_dilution

Loading Volume (µL)

Name	loading_volume_µl
Description	The volume of the cell suspension loaded into the single-cell RNA-sequencing system for analysis.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:loading_volume_µl
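The suspension fields combine arithmetically: the number of cells loaded is approximately the loading volume multiplied by the concentration. With the example values (10 µL at 1000 cells/µL), this gives 10,000 cells. The number of cells actually recovered is typically lower, because capture is not 100% efficient. A minimal sketch with a hypothetical helper:

```python
def cells_loaded(loading_volume_ul: float, concentration_cells_per_ul: float) -> float:
    """Approximate cells loaded = loading volume (µL) x concentration (cells/µL)."""
    if loading_volume_ul < 0 or concentration_cells_per_ul < 0:
        raise ValueError("volume and concentration must be non-negative")
    return loading_volume_ul * concentration_cells_per_ul

print(cells_loaded(10, 1000))  # 10000
```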

Suspension Dilution Buffer

Name	suspension_dilution_buffer
Description	A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing.
Example	PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin)
Reference	#
Namespace	ei:suspension_dilution_buffer

Library Preparation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the library preparation.
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Library Preparation Kit Required

Name	library_prep_kit
Description	Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids
Example	10X Genomics Single Cell 3' v3
Reference	https://w3id.org/mixs/0001145
Namespace	mixs:library_prep_kit

Library Preparation Kit Version Required

Name	library_prep_kit_version
Description	The version number of the library preparation kit used for sequencing.
Example	2
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Regex	^\d+(\.\d+)?$
Namespace	ontology:library_prep_kit_version

Amplification Method

Name	amplification_method
Description	The method used to amplify the Complementary DNA (cDNA).
Example	PCR
Reference	#
Namespace	ei:amplification_method

cDNA Amplification Cycles

Name	cdna_amplification_cycles
Description	The number of cycles used during the Complementary DNA (cDNA) amplification process.
Example	12
Reference	#
Regex	^\d+$
Namespace	ei:cdna_amplification_cycles

Average Size Distribution

Name	average_size_distribution
Description	The average fragment length of the library, in base pairs (bp), after library preparation. This indicates the quality of the library and its suitability for sequencing.
Example	350
Reference	#
Regex	^\d+$
Namespace	ei:average_size_distribution

Library Construction Method

Name	lib_construction_method
Description	The library construction method (including version) that was used.
Example	Smart-Seq2
Reference	#
Namespace	ei:lib_construction_method

Input Molecule

Name	input_molecule
Description	The specific fraction of biological macromolecule from which the sequencing library is derived.
Example	RNA
Reference	#
Namespace	ei:input_molecule

Primer

Name	primer
Description	The type of primer used for reverse transcription. This indicates the likely content of the cDNA library; for example, oligo-dT primers capture polyadenylated mRNA.
Example	Random
Reference	#
Namespace	ei:primer
Allowed Values	Oligo-dT Random

Primeness Required

Name	primeness
Description	The end from which the molecule was sequenced.
Example	5'
Reference	#
Namespace	ei:primeness
Allowed Values	3' 5' Both

End Bias

Name	end_bias
Description	The transcript end (3' or 5') towards which the library's reads are biased.
Example	3
Reference	#
Namespace	ei:end_bias
Allowed Values	3 5

Library Strand

Name	library_strand
Description	The Complementary DNA (cDNA) strand from which the library's reads are derived: sense (first), antisense (second), both or none.
Example	Antisense
Reference	#
Namespace	ei:library_strand
Allowed Values	Antisense Both Sense Unstranded

Spike In Required

Name	spike_in
Description	External RNA added to the sample as a control to assess technical variability and normalization in RNA-sequencing. State whether spike-in was used.
Example	Yes
Reference	#
Namespace	ei:spike_in
Allowed Values	No Yes

Spike Type

Name	spike_type
Description	The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA.
Example	Synthetic RNA
Reference	#
Namespace	ei:spike_type

Spike In Dilution Or Concentration

Name	spike_in_dilution_or_concentration
Description	The final concentration or dilution (for commercial sets) of the spike in mix.
Example	1:1000
Reference	#
Namespace	ei:spike_in_dilution_or_concentration

i5 Index Required

Name	i5_index
Description	Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing.
Example	ATCACG
Reference	#
Namespace	ei:i5_index

i7 Index Required

Name	i7_index
Description	Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs.
Example	CGATGT
Reference	#
Namespace	ei:i7_index

Dual or Single Index Required

Name	dual_single_index
Description	Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing.
Example	Dual
Reference	#
Namespace	ei:dual_single_index
Allowed Values	Dual Single

i5 Sequence Required

Name	i5_sequence
Description	The nucleotide sequence of the i5 index used in multiplexing during sequencing.
Example	ATCGTAGC
Reference	#
Namespace	ei:i5_sequence

i7 Sequence Required

Name	i7_sequence
Description	The specific nucleotide sequence of the i7 index used for a sample.
Example	TGCATGCA
Reference	#
Namespace	ei:i7_sequence
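
A minimal validation sketch for the index fields, assuming that index sequences contain only the bases A, C, G and T (with N for an uncalled base) and that a record with dual_single_index set to Dual must supply both i5_sequence and i7_sequence. The helper name check_index_fields is hypothetical and not part of any official tooling for this schema.

```python
import re

# Assumption: index sequences are plain DNA, with N allowed for an uncalled base.
INDEX_PATTERN = re.compile(r"[ACGTN]+")


def check_index_fields(record: dict) -> list:
    """Return human-readable problems found in the index fields of one record."""
    problems = []
    for field in ("i5_sequence", "i7_sequence"):
        value = record.get(field)
        if value and not INDEX_PATTERN.fullmatch(value.upper()):
            problems.append(f"{field} contains non-nucleotide characters: {value!r}")
    # A dual-indexed library needs both index sequences.
    if record.get("dual_single_index") == "Dual":
        missing = [f for f in ("i5_sequence", "i7_sequence") if not record.get(f)]
        if missing:
            problems.append(f"dual indexing but missing: {', '.join(missing)}")
    return problems


print(check_index_fields(
    {"i5_sequence": "ATCGTAGC", "i7_sequence": "TGCATGCA", "dual_single_index": "Dual"}
))  # []
```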

Plate ID

Name	plate_id
Description	Identifier for the 96-well plate used in sample preparation.
Example	PLT001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:plate_id

Well Row

Name	well_row
Description	The row identifier in a 96-well plate indicating the sample's position.
Example	A
Reference	#
Namespace	ei:well_row

Well Column

Name	well_col
Description	The column identifier in a 96-well plate indicating the sample's position.
Example	5
Reference	#
Regex	^\d+$
Namespace	ei:well_col
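
The plate fields together locate a sample. A standard 96-well plate has rows A to H and columns 1 to 12, whereas the Regex for well_col only requires digits. A sketch of a combined check, assuming that standard layout; the helper well_label and its zero-padded output (e.g. A05) are hypothetical conventions, not part of the schema.

```python
import re

ROWS_96 = "ABCDEFGH"  # 8 rows on a standard 96-well plate
N_COLS_96 = 12        # 12 columns on a standard 96-well plate


def well_label(well_row: str, well_col: str) -> str:
    """Validate a 96-well position and return a zero-padded label such as 'A05'."""
    row = well_row.strip().upper()
    if len(row) != 1 or row not in ROWS_96:
        raise ValueError(f"well_row must be a single letter A-H, got {well_row!r}")
    if not re.fullmatch(r"\d+", well_col):  # mirrors the schema Regex ^\d+$
        raise ValueError(f"well_col must contain only digits, got {well_col!r}")
    col = int(well_col)
    if not 1 <= col <= N_COLS_96:
        raise ValueError(f"well_col must be between 1 and {N_COLS_96}, got {col}")
    return f"{row}{col:02d}"


print(well_label("A", "5"))  # A05
```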

Cell Phenotype

Name	cell_phenotype
Description	The cell marker used for Fluorescence-Activated Cell Sorting (FACS) of the cells.
Example	CD41-
Reference	#
Namespace	ei:cell_phenotype
Allowed Values	CD41+ CD41-

Design description

Name	design_description
Description	The design of the library including details of how it was constructed.
Reference	#
Namespace	ei:design_description

Library selection Required

Name	library_selection
Description	The method used to select for or against, enrich, or screen the material being sequenced.
Example	RANDOM PCR
Reference	#
Namespace	ei:library_selection
Allowed Values	5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified

Library source Required

Name	library_source
Description	The type of source material that is being sequenced.
Example	GENOMIC
Reference	#
Namespace	ei:library_source
Allowed Values	GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA

Library strategy Required

Name	library_strategy
Description	The sequencing technique intended for this library.
Example	RNA-Seq
Reference	#
Namespace	ei:library_strategy
Allowed Values	AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq

Sequencing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	https://w3id.org/mixs/0000016
Regex	^[a-zA-Z0-9]+$
Namespace	ontology:sequencing_id

Sequencing Platform Name Required

Name	sequencing_platform_name
Description	The name of the sequencing platform used for the experiment.
Example	PacBio
Reference	http://purl.obolibrary.org/obo/NCIT_C172274
Namespace	ontology:sequencing_platform_name

Sequencing Instrument Model Required

Name	sequencing_instrument_model
Description	The model of the instrument used for sequencing. Instrument models vary in throughput, read length, error rate, and suitability for different applications.
Example	Illumina NovaSeq 6000
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Namespace	ontology:sequencing_instrument_model
Allowed Values	454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100

Library Layout Required

Name	lib_layout
Description	Specifies whether the reads are single-end, paired-end, or in another configuration.
Example	Paired
Reference	https://w3id.org/mixs/0000111
Namespace	mixs:lib_layout
Allowed Values	Other Paired Single Vector

UMI Barcode Read

Name	umi_barcode_read
Description	The type of read that contains the Unique Molecular Identifier (UMI) barcode.
Example	index2
Reference	#
Namespace	ei:umi_barcode_read
Allowed Values	index1 index2 read1 read2

UMI Barcode Offset

Name	umi_barcode_offset
Description	The offset (start position) of the Unique Molecular Identifier (UMI) barcode within the read that contains it.
Example	0
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_offset

UMI Barcode Size

Name	umi_barcode_size
Description	The length, in bases, of the Unique Molecular Identifier (UMI) barcode.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_size

Cell Barcode Read

Name	cell_barcode_read
Description	The type of read that contains the cell barcode.
Example	index1
Reference	http://www.ebi.ac.uk/efo/EFO_0010203
Namespace	ontology:cell_barcode_read
Allowed Values	index1 index2 read1 read2

Cell Barcode Offset

Name	cell_barcode_offset
Description	The offset (start position) of the cell barcode within the read that contains it.
Example	10
Reference	http://www.ebi.ac.uk/efo/EFO_0010204
Regex	^\d+$
Namespace	ontology:cell_barcode_offset

Cell Barcode Size

Name	cell_barcode_size
Description	The length, in bases, of the cell barcode.
Example	0
Reference	http://www.ebi.ac.uk/efo/EFO_0010205
Regex	^\d+$
Namespace	ontology:cell_barcode_size

cDNA Read Required

Name	cdna_read
Description	The type of read that contains the complementary DNA (cDNA) sequence.
Example	read1
Reference	http://www.ebi.ac.uk/efo/EFO_0010195
Namespace	ontology:cdna_read
Allowed Values	index1 index2 read1 read2

cDNA Read Offset

Name	cdna_read_offset
Description	The starting position of the Complementary DNA (cDNA) read within the entire sequence, indicating where the read begins after any barcodes or technical sequences.
Example	6
Reference	http://www.ebi.ac.uk/efo/EFO_0010201
Regex	^\d+$
Namespace	ontology:cdna_read_offset

cDNA Read Size

Name	cdna_read_size
Description	The length, in bases, of the complementary DNA (cDNA) read.
Example	75
Reference	http://www.ebi.ac.uk/efo/EFO_0010202
Regex	^\d+$
Namespace	ontology:cdna_read_size
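
Taken together, the read, offset and size fields describe where each component sits within the sequenced reads. The sketch below assumes 0-based offsets (offset 0 is the first base) and uses an illustrative 10x-style layout, a 16-base cell barcode followed by a 10-base UMI on read1; these numbers are assumptions, not values defined by this schema, and ReadSegment and extract are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReadSegment:
    """Location of one component (cell barcode, UMI or cDNA) within a read."""
    read: str    # one of "index1", "index2", "read1", "read2"
    offset: int  # assumed 0-based start position within the read
    size: int    # length in bases


def extract(reads: dict, segment: ReadSegment) -> str:
    """Slice the component described by `segment` out of the named read."""
    sequence = reads[segment.read]
    end = segment.offset + segment.size
    if end > len(sequence):
        raise ValueError(f"{segment.read} is too short for offset {segment.offset} + size {segment.size}")
    return sequence[segment.offset:end]


# Illustrative (assumed) layout: cell barcode and UMI share read1.
cell_barcode = ReadSegment("read1", offset=0, size=16)
umi = ReadSegment("read1", offset=16, size=10)

reads = {"read1": "AAACCTGAGAAACCATTTGCAGCTAG", "read2": "GATTACA" * 10}
print(extract(reads, cell_barcode))  # AAACCTGAGAAACCAT
print(extract(reads, umi))           # TTGCAGCTAG
```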

Analysis Derived Data

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File Derived From

Name	file_derived_from
Description	The name of the file that was used to generate the analysis derived data.
Example	file1_sequencing.json
Reference	#
Namespace	ei:file_derived_from

Inferred Cell Type

Name	inferred_cell_type
Description	The cell type or identity assigned by the performer after analysis, based on the expression profile or known gene function.
Example	type II bipolar neuron
Reference	#
Namespace	ei:inferred_cell_type

Post Analysis Cell Well Quality

Name	post_analysis_cell_well_quality
Description	Performer-defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed.
Example	Pass
Reference	#
Namespace	ei:post_analysis_cell_well_quality
Allowed Values	Fail Pass

Other Derived Cell Attributes

Name	other_derived_cell_attributes
Description	Any other cell-level measurement or annotation produced as a result of the analysis.
Example	Cluster
Reference	#
Namespace	ei:other_derived_cell_attributes
Allowed Values	Cluster Count Gene UMI tSNE coordinates

Raw Data Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Reference Genome

Name	reference_genome
Description	The reference genome version, with a stable link to the genome data (or the genome FASTA file attached).
Example	GRCh38, https://example.org/grch38.fa
Reference	#
Namespace	ei:reference_genome

Genome Annotation

Name	genome_annotation
Description	The genome annotation version, with a stable link. Also indicate whether any modification to the original annotation has been applied (e.g. 3' UTR extension), and include the modified annotation file used in the analysis.
Example	Ensembl v101, https://example.org/ensembl_v101.gtf
Reference	#
Namespace	ei:genome_annotation

Annotation Filtering

Name	annotation_filtering
Description	Indicate which features were filtered (e.g. protein coding, pseudogenes, TCRs).
Example	Filtered to include only protein-coding genes
Reference	#
Namespace	ei:annotation_filtering

Genes vs Exons

Name	genes_vs_exons
Description	Whether quantification used whole gene intervals or exons only.
Example	Exon quantification
Reference	#
Namespace	ei:genes_vs_exons

Library Structure

Name	library_structure
Description	The structure of the library reads, preferably described in seqspec format.
Example	Single-cell 3' library
Reference	#
Namespace	ei:library_structure

Mapping and Demultiplexing Software

Name	mapping_and_demultiplexing_software
Description	The software (and version) used for mapping and demultiplexing reads/UMIs.
Example	Cell Ranger 6.0.0
Reference	#
Namespace	ei:mapping_and_demultiplexing_software

Read Mapping Statistics

Name	read_mapping_statistics
Description	Mapping statistics for reads or Unique Molecular Identifiers (UMIs).
Example	80% reads mapped to reference
Reference	#
Namespace	ei:read_mapping_statistics

Sequencing Saturation

Name	sequencing_saturation
Description	The sequencing saturation achieved; the expected value depends on the number of cells recovered (not targeted) and the technology.
Example	95% sequencing saturation
Reference	#
Namespace	ei:sequencing_saturation
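
For reference, Cell Ranger's documentation defines sequencing saturation roughly as 1 - (n_deduped_reads / n_reads), where n_deduped_reads is the number of unique (cell barcode, UMI, gene) combinations among the counted reads; other tools may define it differently, so state the tool used. A minimal sketch of that formula, assuming each read has already been reduced to such a tuple; sequencing_saturation is a hypothetical name.

```python
def sequencing_saturation(read_keys: list) -> float:
    """1 - unique/total, where each read is represented by its (cell barcode, UMI, gene) key."""
    n_reads = len(read_keys)
    if n_reads == 0:
        raise ValueError("no reads supplied")
    n_deduped = len(set(read_keys))
    return 1.0 - n_deduped / n_reads


reads = [
    ("AAAC", "TTGC", "GENE1"),
    ("AAAC", "TTGC", "GENE1"),  # duplicate of the first read (same molecule)
    ("AAAC", "GGCA", "GENE1"),
    ("CCGT", "TTGC", "GENE2"),
]
print(f"{sequencing_saturation(reads):.0%}")  # 25%
```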

UMIs or Barcode Distribution QC

Name	umis_barcode_distribution_qc
Description	The distribution of Unique Molecular Identifiers (UMIs) per barcode and the threshold applied.
Example	Threshold: 10 UMIs per barcode
Reference	#
Namespace	ei:umis_barcode_distribution_qc

Cell or Non-Cell Filtering Strategy

Name	cell_non_cell_filtering_strategy
Description	Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells. Description of algorithm (if any) and parameters used to determine cells or non-cells.
Example	Threshold: 5 UMIs for cell detection
Reference	#
Namespace	ei:cell_non_cell_filtering_strategy
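
The simplest strategy is a fixed UMI threshold, as in the example value; algorithmic approaches (for instance those based on the barcode-rank curve) are not shown. A minimal sketch, assuming a per-barcode UMI total has already been computed; call_cells is a hypothetical name.

```python
def call_cells(umis_per_barcode: dict, min_umis: int) -> set:
    """Keep barcodes whose total UMI count reaches the threshold; the rest are non-cells."""
    return {barcode for barcode, n_umis in umis_per_barcode.items() if n_umis >= min_umis}


umis_per_barcode = {"AAAC": 1520, "CCGT": 3, "GGTA": 480, "TTAC": 5}
print(sorted(call_cells(umis_per_barcode, min_umis=5)))  # ['AAAC', 'GGTA', 'TTAC']
```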

Other Quality Filters Applied

Name	other_quality_filters_applied
Description	Cells/nuclei discarded based on % mitochondrial reads, % rRNA reads, etc.
Example	Cells with >20% mitochondrial reads discarded
Reference	#
Namespace	ei:other_quality_filters_applied
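
The example filter can be expressed as a per-cell percentage check. This sketch assumes human gene symbols, where mitochondrial genes carry the prefix MT- (other organisms and annotations use different conventions), and counts per gene (reads or UMIs) for a single cell; pct_mito and passes_mito_filter are hypothetical names.

```python
def pct_mito(gene_counts: dict, prefix: str = "MT-") -> float:
    """Percentage of a cell's counts assigned to genes whose symbol starts with `prefix`."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def passes_mito_filter(gene_counts: dict, max_pct: float = 20.0, prefix: str = "MT-") -> bool:
    """True if the cell is kept, i.e. it does not exceed `max_pct` mitochondrial counts."""
    return pct_mito(gene_counts, prefix) <= max_pct


cell = {"MT-CO1": 30, "MT-ND1": 10, "ACTB": 100, "GAPDH": 60}
print(pct_mito(cell))            # 20.0
print(passes_mito_filter(cell))  # True: only cells with >20% are discarded
```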

Ambient RNA QC

Name	ambient_rna_qc
Description	Report % UMIs in background cell barcodes, and algorithm (if any) used to remove ambient RNA
Example	Ambient RNA removed if >5% UMIs in background barcodes
Reference	#
Namespace	ei:ambient_rna_qc
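
The percentage asked for here is the share of all UMIs that fall in barcodes not called as cells. A minimal sketch, assuming per-barcode UMI totals and a set of called cell barcodes are already available; pct_umis_in_background is a hypothetical name, and ambient RNA removal itself is not shown.

```python
def pct_umis_in_background(umis_per_barcode: dict, cell_barcodes: set) -> float:
    """Percentage of all UMIs that belong to barcodes not called as cells."""
    total = sum(umis_per_barcode.values())
    if total == 0:
        return 0.0
    background = sum(n for barcode, n in umis_per_barcode.items() if barcode not in cell_barcodes)
    return 100.0 * background / total


umis_per_barcode = {"AAAC": 900, "GGTA": 60, "CCGT": 25, "TTAC": 15}
print(pct_umis_in_background(umis_per_barcode, {"AAAC", "GGTA"}))  # 4.0
```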

Predicted Doublet Rate QC

Name	predicted_doublet_rate_qc
Description	The predicted doublet rate, which depends on the number of cells recovered (not targeted) and the technology.
Example	Predicted doublet rate: 1.5%
Reference	#
Namespace	ei:predicted_doublet_rate_qc

Individual Organism SNP Demultiplexing

Name	individual_organism_snp_demultiplexing
Description	If performed, the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used.
Example	SNP UMAP embedding using CellSNP
Reference	#
Namespace	ei:individual_organism_snp_demultiplexing

Downstream Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Clustering Algorithm and Version

Name	clustering_algorithm_and_version
Description	The clustering algorithm and version used, if the dataset was compared or integrated with existing datasets.
Example	Louvain 0.8.0
Reference	#
Namespace	ei:clustering_algorithm_and_version

Clustering Parameters

Name	clustering_parameters
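Description	The clustering parameters used (e.g. resolution, number of nearest neighbours), if the dataset was compared or integrated with existing datasets.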
Example	Resolution: 0.6, K-nearest neighbors: 10
Reference	#
Namespace	ei:clustering_parameters

Integration/Batch Correction

Name	integration_batch_correction
Description	The integration or batch-correction method and version used, if the dataset was compared or integrated with existing datasets.
Example	Harmony v1.0
Reference	#
Namespace	ei:integration_batch_correction

Data Availability Checklist

Source Code

Name	source_code
Description	A link to any newly developed code or software used in the processing and downstream analysis of the dataset.
Example	Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization.
Reference	#
Namespace	ei:source_code

UMI Count Matrix

Name	umi_count_matrix
Description	Gene x cell matrix with UMI counts for each gene in each cell.
Example	The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv.
Reference	#
Namespace	ei:umi_count_matrix
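
A UMI count matrix counts distinct molecules, so reads sharing the same gene, cell and UMI are counted once. The sketch below builds such a matrix from (gene, cell barcode, UMI) records and writes it with genes as rows and cells as columns, as in the example; it assumes exact UMI matching (real pipelines usually also correct UMI sequencing errors), and the function names and the gene_id header are hypothetical.

```python
import csv
import io
from collections import defaultdict


def umi_count_matrix(records):
    """Build {gene: {cell: n_unique_umis}} from (gene, cell_barcode, umi) records."""
    seen = set()
    matrix = defaultdict(lambda: defaultdict(int))
    for gene, cell, umi in records:
        if (gene, cell, umi) in seen:
            continue  # another read of a molecule already counted
        seen.add((gene, cell, umi))
        matrix[gene][cell] += 1
    return {gene: dict(cells) for gene, cells in matrix.items()}


def to_csv(matrix, cells):
    """Write genes as rows and the given cells as columns; missing entries are 0."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["gene_id", *cells])
    for gene in sorted(matrix):
        writer.writerow([gene, *(matrix[gene].get(cell, 0) for cell in cells)])
    return buffer.getvalue()


records = [
    ("ENSG00000139618", "Cell_001", "TTGCAGCTAG"),
    ("ENSG00000139618", "Cell_001", "TTGCAGCTAG"),  # same molecule, counted once
    ("ENSG00000139618", "Cell_001", "GGCATTACGA"),
    ("ENSG00000139618", "Cell_002", "TTGCAGCTAG"),
]
matrix = umi_count_matrix(records)
print(to_csv(matrix, ["Cell_001", "Cell_002"]))
# gene_id,Cell_001,Cell_002
# ENSG00000139618,2,1
```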

Ensembl IDs

Name	ensembl_ids
Description	Gene or transcript identifiers should be given as Ensembl IDs (or another standardised ID), with gene short names in the metadata.
Example	ENSG00000139618
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:ensembl_ids

Functional Gene Annotations

Name	functional_gene_annotations
Description	Any functional annotation generated/used (gene names, GOs, structural domains, etc.).
Example	Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding).
Reference	#
Namespace	ei:functional_gene_annotations

Protein Models

Name	protein_models
Description	FASTA file with (or stable link to) the predicted proteins associated with genes in the UMI count matrix, with matching IDs.
Example	The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID.
Reference	#
Namespace	ei:protein_models

Cell Metadata

Name	cell_metadata
Description	Table mapping cell IDs to cluster/cell type/broad cell type annotations.
Example	Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv.
Reference	#
Namespace	ei:cell_metadata

Cluster-Level Normalised Expression Tables

Name	cluster_level_normalised_expression_tables
Description	Expression tables that show normalised gene expression at the cluster or cell-type level.
Example	Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv.
Reference	#
Namespace	ei:cluster_level_normalised_expression_tables
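
One common way to produce a cluster-level table is to average normalised expression over the cells in each cluster; summing (pseudobulk) is another, so the aggregation used should be stated. A minimal sketch of the averaging approach, assuming sparse per-cell values where absent entries mean 0 and a cell-to-cluster mapping taken from the cell metadata; cluster_means is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(expression: dict, cluster_of: dict) -> dict:
    """Mean normalised expression of each gene within each cluster.

    expression: {gene_id: {cell_id: value}}; cells absent for a gene count as 0.
    cluster_of: {cell_id: cluster_label} for every cell in the dataset.
    """
    table = {}
    for gene, per_cell in expression.items():
        by_cluster = defaultdict(list)
        for cell, cluster in cluster_of.items():
            by_cluster[cluster].append(per_cell.get(cell, 0.0))
        table[gene] = {cluster: mean(values) for cluster, values in sorted(by_cluster.items())}
    return table


expression = {"ENSG00000139618": {"c1": 2.0, "c2": 4.0, "c3": 1.0}}
cluster_of = {"c1": "Cluster_1", "c2": "Cluster_1", "c3": "Cluster_2", "c4": "Cluster_2"}
print(cluster_means(expression, cluster_of))
# {'ENSG00000139618': {'Cluster_1': 3.0, 'Cluster_2': 0.5}}
```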

Other Resource Files

Name	other_resource_files
Description	Any other files necessary to reuse and interpret the data, e.g. barcode information in complex, serial multiplexing protocols (ClickTags).
Example	Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv.
Reference	#
Namespace	ei:other_resource_files

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:sequencing_id

Read 1 File Required

Name	read_1_file
Description	The name or accession of the file that contains read 1.
Example	file1_r1.fastq.gz
Reference	#
Namespace	ei:read_1_file

Read 2 File

Name	read_2_file
Description	The name or accession of the file that contains read 2.
Example	file2_r2.fastq.gz
Reference	#
Namespace	ei:read_2_file

Index 1 File

Name	index_1_file
Description	The name of the file that contains index 1.
Example	file1_i1.fastq.gz
Reference	#
Namespace	ei:index_1_file

Index 2 File

Name	index_2_file
Description	The name of the file that contains index 2.
Example	file2_i2.fastq.gz
Reference	#
Namespace	ei:index_2_file

Read 1 Checksum Required

Name	read_1_file_checksum
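Description	Result of a hash function calculated on the content of the read 1 file, used to verify file integrity. Commonly used algorithms include MD5 and SHA-1; multiple checksums should be separated by a comma (,). Note that the Regex for this field accepts only a single 32-character lowercase hexadecimal (MD5) checksum.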
Example	f8d29e41a73b5c02de9a6fb314e7c8ad
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_1_file_checksum

Read 2 Checksum

Name	read_2_file_checksum
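Description	Result of a hash function calculated on the content of the read 2 file, used to verify file integrity. Commonly used algorithms include MD5 and SHA-1; multiple checksums should be separated by a comma (,). Note that the Regex for this field accepts only a single 32-character lowercase hexadecimal (MD5) checksum.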
Example	a3f4c1b29d8e57fa41b02de6c7f9ab83
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_2_file_checksum
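
Because the Regex for both read checksum fields accepts a single 32-character lowercase hexadecimal string, the sketch below computes an MD5 digest with Python's standard hashlib, reading the file in chunks so that large FASTQ files need not fit in memory. The helper names are hypothetical, and the usage runs against a throwaway empty file rather than real data.

```python
import hashlib
import re
import tempfile
from pathlib import Path

# Same shape as the Regex given for the read checksum fields.
MD5_PATTERN = re.compile(r"[0-9a-f]{32}")


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) file through MD5 and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected: str) -> bool:
    """Check the declared checksum's format, then compare it with the file's MD5."""
    if not MD5_PATTERN.fullmatch(expected):
        raise ValueError(f"not a lowercase MD5 hex digest: {expected!r}")
    return md5_of_file(path) == expected


# Usage with an empty temporary file; a real check would point at e.g. file1_r1.fastq.gz.
with tempfile.TemporaryDirectory() as tmp:
    fastq = Path(tmp) / "empty.fastq.gz"
    fastq.write_bytes(b"")
    checksum = md5_of_file(fastq)
    matches = checksum_matches(fastq, "d41d8cd98f00b204e9800998ecf8427e")

print(checksum)  # d41d8cd98f00b204e9800998ecf8427e (MD5 of empty input)
```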

White List Barcode File

Name	white_list_barcode_file
Description	A file containing the known cell barcodes in the dataset.
Example	barcodes.tsv
Reference	#
Namespace	ei:white_list_barcode_file

Expression Data Process Setting

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Expression Data Process Setting ID Required

Name	expression_data_process_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_process_setting_id

Matrix Type

Name	matrix_type
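Description	The type of values stored in the expression matrix, e.g. raw counts, normalised, log1p-transformed, scaled, imputed or pseudobulk.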
Example	raw_counts
Reference	#
Namespace	ei:matrix_type
Allowed Values	imputed log1p nomalised pseudobulk raw_counts scaled

Reference Genome Required

Name	reference_genome
Description	A stable URL (http, https or ftp) for the associated reference genome.
Example	https://reference-genome-example.com
Reference	#
Regex	^((https?|ftp):\/\/[^\s\|]+)(\|((https?|ftp):\/\/[^\s\|]+))*$
Namespace	ei:reference_genome

Annotation Version

Name	annotation_version
Description	The annotation version of the associated reference genome
Example	GENCODE v44
Reference	#
Namespace	ei:annotation_version

Normalisation Method

Name	normalisation_method
Description	Any normalisation processing performed
Example	Log normalisation
Reference	#
Namespace	ei:normalisation_method
Allowed Values	Library Size Normalisation Log Normalisation SCNorm SCTransform scran
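
As one concrete reference point, Seurat's LogNormalize divides each count by the cell's total, multiplies by a scale factor (10,000 by default) and applies the natural-log transform log1p. A minimal per-cell sketch of that transform; other tools implement "Log Normalisation" with different targets or log bases, so the method actually used should be recorded here. log_normalise is a hypothetical name.

```python
import math


def log_normalise(counts: list, scale_factor: float = 10_000.0) -> list:
    """LogNormalize-style transform for one cell: log1p(count / total * scale_factor)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]  # an empty cell stays all zeros
    return [math.log1p(c / total * scale_factor) for c in counts]


cell = [5, 0, 15]
print([round(v, 3) for v in log_normalise(cell)])
```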

Highly Variable Gene Selection (HVG)

Name	highly_variable_gene_selection
Description	The method used to select highly variable genes (HVGs) and the number of genes selected.
Example	seurat_v3, n=2000
Reference	#
Namespace	ei:highly_variable_gene_selection

Dimensionality Reduction

Name	dimensionality_reduction
Description	Method used to reduce dimensionality in the expression data
Example	PCA
Reference	#
Namespace	ei:dimensionality_reduction
Allowed Values	Diffusion Map ICA NMF PCA UMAP t-SNE

Number of Nearest Neighbours

Name	n_neighbours
Description	Number of nearest neighbours used to calculate cluster membership
Example	pca:50
Reference	#
Namespace	ei:n_neighbours

Clustering Algorithm

Name	clustering_algorithm
Description	Algorithm used to create clusters
Reference	#
Namespace	ei:clustering_algorithm

Clustering Resolution

Name	clustering_resolution
Description	The resolution parameter passed to the clustering algorithm.
Example	2.5
Reference	#
Regex	^([0-9]*[.])?[0-9]+
Namespace	ei:clustering_resolution
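
The Regex above has no trailing $, so a validator that only checks for a match at the start of the string would accept values such as 2.5abc. Whether the schema's own validator requires a whole-string match is not stated here; the sketch uses Python's re module to show the difference between a prefix match and a full match. is_valid_resolution is a hypothetical name.

```python
import re

# Pattern as listed for clustering_resolution (note: no trailing $).
RESOLUTION_REGEX = r"^([0-9]*[.])?[0-9]+"


def is_valid_resolution(value: str) -> bool:
    """Require the whole value to match, not just a numeric prefix."""
    return re.fullmatch(RESOLUTION_REGEX, value) is not None


print(bool(re.match(RESOLUTION_REGEX, "2.5abc")))  # True: prefix match only
print(is_valid_resolution("2.5abc"))               # False
print(is_valid_resolution("2.5"))                  # True
```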

Clustering Distance Metric

Name	clustering_distance_metric
Description	Metric used to calculate the distance between points.
Example	cosine
Reference	#
Namespace	ei:clustering_distance_metric
Allowed Values	cosine euclidean hamming jaccard manhatten mehalanobis
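
Two of the allowed metrics, sketched for plain vectors of expression or embedding values: cosine distance depends only on the angle between vectors and ignores their magnitude, whereas euclidean distance does not. These helper functions are illustrative and are not the implementation used by any particular clustering tool.

```python
import math


def cosine_distance(u: list, v: list) -> float:
    """1 - cosine similarity between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: list, v: list) -> float:
    """Straight-line distance between two points (math.dist requires Python 3.8+)."""
    return math.dist(u, v)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```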

Software Versions

Name	software_versions
Description	Primary software packages used for analysis
Reference	#
Namespace	ei:software_versions

Cell Type Annotation

Name	cell-type annotation
Description	Tools and databases used for cell-type annotation
Reference	#
Namespace	ei:cell-type annotation

Generated by Pipeline

Name	generated_by_pipeline
Description	URL of the deposited pipeline used to create this data
Reference	#
Regex	^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	ei:generated_by_pipeline

Notes

Name	notes
Description	Any other information
Reference	#
Namespace	ei:notes

Expression Data File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	expression_data_file_id
Description	A unique alphanumeric identifier for the expression data file
Example	EXPFILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric identifier for library preparation
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Expression Data Process Setting ID Required

Name	expression_data_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_setting_id

File Name Required

Name	expression_data_file
Description	Expression data file name
Example	exp_file.csv
Reference	#
Namespace	ei:expression_data_file

File MD5 Checksum Required

Name	expression_data_file_checksum
Description	The MD5 checksum calculated for this file
Example	9e4b7a23f6c1d0ab85f29c47e3d8a610
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:expression_data_file_checksum

File Format Required

Name	expression_data_file_format
Description	The format of the expression file, such as h5ad or rds
Example	csv
Reference	#
Namespace	ei:expression_data_file_format
Allowed Values	csv h5ad loom mtx rds

Number of Cells

Name	n_cells
Description	The number of cells represented in the expression data
Example	4
Reference	#
Regex	^\d+$
Namespace	ei:n_cells

Number of Genes

Name	n_genes
Description	The number of genes represented in the expression data
Example	50
Reference	#
Regex	^\d+$
Namespace	ei:n_genes

File Size in Bytes

Name	file_size_bytes
Description	Size of the file recorded in bytes
Example	90
Reference	#
Regex	^\d+$
Namespace	ei:file_size_bytes

Date Generated

Name	date_generated
Description	Approximate date this expression data was generated
Example	2024-10-14
Reference	#
Regex	^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Namespace	ei:date_generated
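
Read as a regular expression with | as alternation, the date pattern checks the YYYY-MM-DD shape but not whether the date exists; for example, it accepts 2024-02-31. A sketch that applies the pattern and then confirms the calendar date with Python's datetime; is_valid_date_generated is a hypothetical name.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_date_generated(value: str) -> bool:
    """Check the YYYY-MM-DD pattern, then confirm the calendar date exists."""
    if not DATE_PATTERN.match(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # e.g. 2024-02-31 passes the pattern but is not a real date
    return True


print(is_valid_date_generated("2024-10-14"))  # True
print(is_valid_date_generated("2024-02-31"))  # False
```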

Study

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Description

Name	description
Description	A detailed description of the project which includes research goals and experimental approach. Project description should be fewer than 300 words, such as an abstract from a grant application or publication.
Example	This project explores the intricate details of single cells in the human body, focusing on their structure, function, and behaviour. By studying individual cells, it aims to uncover how they contribute to overall health, disease progression, and human biology. This research can provide deeper insights into cellular processes, paving the way for advancements in medical treatments and personalised medicine.
Reference	http://purl.org/dc/terms/description
Namespace	dcterms:description

Material Required

Name	material
Description	The type of material being described.
Example	Organism
Reference	#
Namespace	faang:material
Allowed Values	Cell culture Cell line Cell specimen Organism Organoid Pool of Specimens Single cell specimen Specimen from Organism

Project Required

Name	project
Description	State that the project is 'FAANG'.
Example	FAANG
Reference	#
Regex	^FAANG$
Namespace	faang:project

Cell Enrichment Required

Name	cell_enrichment
Description	The method by which specific cell populations are sorted or enriched, e.g. 'fluorescence-activated cell sorting (FACS)'. Please contact FAANG DCC to add more terms.
Example	Fluorescence-activated Cell Sorting (FACS)
Reference	#
Namespace	faang:cell_enrichment
Allowed Values	Bead-based sorting Cell culture Centrifugation Fluorescence-activated Cell Sorting (FACS) Magnetic levitation Raman-spectometry sorting, cell culture

Licence

Name	licence
Description	Specifies the terms under which the data associated with the study can be used, shared, or reused. It informs users how they may legally reference, distribute, or build upon the study. Common licenses include Creative Commons (e.g., CC BY 4.0), which require attribution to the original authors when the data is cited or reused.
Example	MIT
Reference	#
Namespace	ei:licence
Allowed Values	Apache-2.0 CC-BY-4.0 CC-BY-SA-4.0 CC0-1.0 GPL-3.0-or-later MIT

Person

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Orcid ID

Name	orcid_id
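Description	A 16-character identifier, written as four hyphen-separated groups of four, that uniquely identifies a researcher. The final character is a check digit and may be 'X'.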
Example	0000-1234-5678-9012
Reference	#
Regex	^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$
Namespace	ei:orcid_id
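
The Regex checks only the layout of an ORCID iD. The final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm described in ORCID's documentation, with 'X' standing for the value 10. A sketch of that check; the helper names are hypothetical.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the layout with the schema Regex, then verify the check digit."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

Both 0000-0002-1825-0097 (a sample iD used in ORCID's documentation) and the example value above pass this check.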

First Name Required

Name	givenName
Description	The first (given) name of the individual conducting the study.
Example	Jane
Reference	https://schema.org/givenName
Regex	^[A-Za-z]+(?:[-\s][A-Za-z]+)*[a-z]+$
Namespace	schema.org:givenName

Last Name Required

Name	familyName
Description	The last name (surname or family name) of the individual conducting the study.
Example	Doe
Reference	https://schema.org/familyName
Regex	^[A-Za-z]+(-[A-Za-z]+)*[a-z]+$
Namespace	schema.org:familyName

Email Address

Name	email
Description	The email address of the individual conducting the study.
Example	jane.doe@example.com
Reference	https://schema.org/email
Regex	^(?!.*\.{2,})(?!.*-{2,})[\w.-]+@[a-zA-Z\d.-]+\.[a-zA-Z]{2,}$
Namespace	schema.org:email

Affiliation or Institution Required

Name	affiliation
Description	An organisation or institution that this person is associated with.
Example	University of Liverpool
Reference	https://schema.org/affiliation
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	schema.org:affiliation

Funder

Name	funder
Description	A person or organization that supports (sponsors) something through some kind of financial contribution.
Example	BBSRC
Reference	https://schema.org/funder
Namespace	schema.org:funder

Grant Award

Name	funding
Description	A grant that directly or indirectly provides funding or sponsorship for the person to conduct the study.
Example	GRAK3489
Reference	https://schema.org/funding
Regex	^[A-Za-z0-9]+(?: [A-Za-z0-9]+)*$
Namespace	schema.org:funding

Sample

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for the study to which this sample belongs
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Scientific Name or Organism

Name	scientific_name
Description	The formal Latin name used to identify the organism from which the sample was derived (e.g. Homo sapiens or Arabidopsis thaliana). This name must accurately correspond to the Taxon ID provided to ensure correct taxonomic classification.
Example	Salvelinus alpinus
Reference	http://rs.tdwg.org/dwc/terms/scientificName
Regex	^[A-Za-z]+(?: [A-Za-z]+)*[a-z]+$
Namespace	ontology:scientific_name

Taxon ID Required

Name	taxon_id
Description	A unique identifier (usually from a recognized taxonomy database such as NCBI Taxonomy) that corresponds to the organism’s scientific name. It must match the value given in scientific_name to maintain consistency and traceability in biological records.
Example	8036
Reference	http://rs.tdwg.org/dwc/terms/taxonID
Regex	^[0-9]+$
Namespace	ontology:taxon_id

Biosample Accession Required

Name	biosampleAccession
Description	A unique identifier assigned to a biological sample after it has been submitted to a public database, such as the NCBI BioSample or ENA. It serves as a permanent reference to that specific sample, allowing researchers to retrieve metadata and link it across studies or datasets.
Example	SAMEA12907823
Reference	http://purl.obolibrary.org/obo/T4FS_0000316
Namespace	ontology:biosampleAccession

Dissociation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Protocol Name Required

Name	protocol_name
Description	A descriptive name of the protocol used for single-cell sequencing.
Example	10X Genomics Single Cell 3' Library Prep
Reference	#
Namespace	ei:protocol_name

Enrichment Markers

Name	enrichment_markers
Description	Description of the specificity markers used to isolate cell populations, e.g. 'CD45+'. Please contact FAANG DCC to add more terms.
Example	CD45
Reference	#
Namespace	faang:enrichment_markers

Isolation Kit

Name	isolation_kit
Description	The kit used to isolate the cells.
Example	10x Nuclei Isolation Kit
Reference	#
Namespace	ei:isolation_kit
Allowed Values	10x Nuclei Isolation Kit 3' standard throughput kit Custom

Literature Source Reference

Name	literature_source_reference
Description	Reference to literature sources that describe the protocol or methods used.
Example	Doe et al. (2024), 'Single-cell RNA-seq: A comprehensive overview'
Reference	#
Namespace	ei:literature_source_reference

Protocols IO Reference

Name	protocols_io_reference
Description	Reference link to protocols.io for additional details on the protocol.
Example	https://www.protocols.io/view/sample-protocol-b2ubqesn
Reference	#
Regex	^https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=])+(?: \\| https?:\/\/(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}(?:[-a-zA-Z0-9()@:%_\+.~#?&\/=]))$
Namespace	ei:protocols_io_reference

Single cell isolation protocol Required

Name	single_cell_isolation_protocol
Description	Link to protocol describing how the single cells were separated into a single-cell suspension.
Example	https://api.faang.org/files/protocols/samples/INRAE_SOP_PLUS4PIGS_EMBRYOS_DISSOCIATION_PROTO4_20240710.pdf
Reference	#
Regex	^(https?\|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	faang:single_cell_isolation_protocol
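
The URL patterns on this page show `\|` where a plain regex alternation `|` appears to be intended. This is most likely an escaping artefact of the table the page was rendered from, but that is an assumption. Below is a minimal Python sketch that checks a protocol link against the unescaped form of the documented pattern. The function name is hypothetical.

```python
import re

# Assumption: the documented "(https?\|ftp)" is the table-escaped form of "(https?|ftp)".
PROTOCOL_URL = re.compile(r"^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$")


def is_valid_protocol_url(value: str) -> bool:
    """Return True if value matches the documented http(s)/ftp link pattern."""
    return PROTOCOL_URL.fullmatch(value) is not None


print(is_valid_protocol_url("https://example.org/protocols/dissociation.pdf"))  # True
print(is_valid_protocol_url("see attached PDF"))  # False
```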

WorkflowHub SOP Reference

Name	workflow_hub_sop_reference
Description	Reference to the Standard Operating Procedure (SOP) in WorkflowHub.
Example	https://workflowhub.eu/works/12345
Reference	#
Namespace	ei:workflow_hub_sop_reference

Dissociation Protocol Method

Name	dissociation_protocol_method
Description	The method used to dissociate tissues into single cells.
Example	Mechanical and enzymatic dissociation
Reference	#
Namespace	ei:dissociation_protocol_method

Single Cell Quality Metric

Name	single_cell_quality_metric
Description	Metrics used to assess the quality of single cells before sequencing.
Example	Cell viability percentage
Reference	#
Namespace	ei:single_cell_quality_metric

Cell Type Required

Name	cell_type
Description	Provide a cell type term from the Cell Ontology (CL).
Example	malignant cell
Reference	CL:0000000
Regex	^[A-Za-z\s]*[a-z]+$
Namespace	faang:cell_type

Tissue Dissociation Required

Name	tissue_dissociation
Description	The method by which tissues are dissociated into purified or single cells in suspension. Examples are 'proteolysis', 'mesh passage', 'fine needle trituration'. For blood, milk and other fluids, where there is no tissue dissociation use 'fluids'. Please contact FAANG DCC to add more terms.
Example	Proteolysis
Reference	#
Namespace	faang:tissue_dissociation
Allowed Values	Fine needle trituration Fluids Mechanical dissociation Mesh passage Proteolysis

Derived from Required

Name	derived_from
Description	Sample name or BioSample ID for a specimen record.
Example	SSC_INRAE_GUT_ORGANOID_100I
Reference	#
Regex	^[A-Za-z0-9_]+$
Namespace	faang:derived_from

Cell Suspension

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the sample
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Sample ID Required

Name	sample_id
Description	A unique reference or identifier for the sample associated with the cell suspension. This field must provide a consistent, unambiguous way to identify the sample within and across datasets. It can be a name, code, or accession-like format, as long as it remains unique.
Example	SAMPLE001
Reference	#
Namespace	ei:sample_id

Dissociation Protocol ID Required

Name	dissociation_protocol_id
Description	A unique alphanumeric code for the dissociation protocol in the study
Example	DISSOC001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:dissociation_protocol_id

Suspension Type Required

Name	suspension_type
Description	The type of particle held in suspension during processing: whole cells, nuclei, or protoplasts.
Example	Cell
Reference	#
Namespace	ei:suspension_type
Allowed Values	Cell Nuclei Protoplast

Purification Protocol Required

Name	purification_protocol
Description	Link to protocol describing how the cells were purified.
Reference	#
Regex	^(https?\|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	faang:purification_protocol

Cell Count

Name	cell_count
Description	The number of cells in the sequencing library.
Example	10000
Reference	#
Regex	^\d+$
Namespace	ei:cell_count

Cell Number

Name	cell_number
Description	The number of cells in the sequencing library, given as one of the allowed range bins.
Example	101-10000
Reference	#
Namespace	tol:cell_number
Allowed Values	1 1000000+ 100001-500000 10001-50000 101-10000 11-50 2-10 500001-1000000 50001-100000 51-100
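
When an exact `cell_count` is known, the matching `cell_number` bin can be derived from it. The following is a minimal Python sketch built from the allowed values listed above. It assumes that "1000000+" means counts above 1,000,000, because the listed bins overlap at exactly 1,000,000. The function name is hypothetical.

```python
# Upper bounds of the documented cell_number bins, in ascending order.
# Assumption: "1000000+" covers counts above 1,000,000.
CELL_NUMBER_BINS = [
    (1, "1"),
    (10, "2-10"),
    (50, "11-50"),
    (100, "51-100"),
    (10_000, "101-10000"),
    (50_000, "10001-50000"),
    (100_000, "50001-100000"),
    (500_000, "100001-500000"),
    (1_000_000, "500001-1000000"),
]


def cell_number_bin(cell_count: int) -> str:
    """Map an exact cell_count to the matching cell_number allowed value."""
    if cell_count < 1:
        raise ValueError("cell_count must be a positive integer")
    for upper, label in CELL_NUMBER_BINS:
        if cell_count <= upper:
            return label
    return "1000000+"


print(cell_number_bin(10000))  # 101-10000
```

With the example values on this page, a `cell_count` of 10000 maps to the `cell_number` bin 101-10000.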

Cell Viability

Name	cell_viability
Description	The percentage of living cells in a sample, indicating the health and quality of cells for RNA-sequencing analysis.
Example	95
Reference	#
Namespace	ei:cell_viability

Cell Viability Assessment Method

Name	cell_viability_assessment_method
Description	The method used to evaluate the viability of cells in the sample, often involving staining or flow cytometry techniques.
Example	Trypan Blue Exclusion
Reference	#
Namespace	ei:cell_viability_assessment_method

Cell Size

Name	cell_size
Description	The size of the cell, typically measured in micrometres.
Example	10
Reference	#
Namespace	ei:cell_size

Suspension Volume (µL)

Name	suspension_volume_µl
Description	The volume of the cell suspension in microlitres (µL).
Example	100
Reference	#
Namespace	ei:suspension_volume_µl

Suspension Concentration Cells Per µL

Name	suspension_concentration_cells_per_µl
Description	The concentration of cells in the suspension, in cells per microlitre (cells/µL).
Example	1000
Reference	#
Namespace	ei:suspension_concentration_cells_per_µl

Suspension Dilution

Name	suspension_dilution
Description	The dilution factor of the cell suspension.
Example	1:10
Reference	#
Namespace	ei:suspension_dilution

Loading Volume (µL)

Name	loading_volume_µl
Description	The volume, in microlitres (µL), of the cell suspension loaded into the single-cell RNA-sequencing system for analysis.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:loading_volume_µl
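
As a rough arithmetic check, the volume needed to load a given number of cells is that number divided by the suspension concentration. The sketch below rounds up to a whole microlitre because the `loading_volume_µl` pattern (`^\d+$`) accepts integers only. It ignores capture efficiency, so in practice the platform vendor's loading tables should take precedence. The function name is hypothetical.

```python
import math


def loading_volume_ul(target_cells: int, cells_per_ul: float) -> int:
    """Volume (µL) of suspension containing target_cells, rounded up to a whole µL.

    Assumption: a plain cells / concentration calculation that ignores the
    platform's capture efficiency.
    """
    if cells_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return math.ceil(target_cells / cells_per_ul)


print(loading_volume_ul(10000, 1000))  # 10
```

With the example concentration on this page (1000 cells/µL), loading 10000 cells gives the example loading volume of 10 µL.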

Suspension Dilution Buffer

Name	suspension_dilution_buffer
Description	A solution used to dilute cell suspensions to a desired concentration, typically prior to loading cells into a device for single-cell RNA sequencing. It helps maintain cell viability and integrity during processing.
Example	PBS (Phosphate-buffered saline) with 0.04% BSA (Bovine serum albumin)
Reference	#
Namespace	ei:suspension_dilution_buffer

Derived from Required

Name	derived_from
Description	Sample name or BioSample ID for a specimen record.
Example	SAMEA112465628
Reference	#
Regex	^[A-Za-z0-9_]+$
Namespace	faang:derived_from

Library Preparation

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Cell Suspension ID Required

Name	cell_suspension_id
Description	A unique alphanumeric code for the cell suspension for the library preparation.
Example	CELLSUSP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:cell_suspension_id

Library Preparation Kit Required

Name	library_prep_kit
Description	Packaged kits (containing adapters, indexes, enzymes, buffers etc.), tailored for specific sequencing workflows, which allow the simplified preparation of sequencing-ready libraries for small genomes, amplicons, and plasmids
Example	10X Genomics Single Cell 3' v3
Reference	https://w3id.org/mixs/0001145
Namespace	mixs:library_prep_kit

Library Preparation Kit Version Required

Name	library_prep_kit_version
Description	The version number of the library preparation kit used for sequencing.
Example	2
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Regex	^\d+(\.\d+)?$
Namespace	ontology:library_prep_kit_version

Amplification Method

Name	amplification_method
Description	The method used to amplify the complementary DNA (cDNA).
Example	PCR
Reference	#
Namespace	ei:amplification_method

cDNA Amplification Cycles

Name	cdna_amplification_cycles
Description	The number of cycles used during complementary DNA (cDNA) amplification.
Example	12
Reference	#
Regex	^\d+$
Namespace	ei:cdna_amplification_cycles

Average Size Distribution

Name	average_size_distribution
Description	The average fragment length, in base pairs (bp), of the library after preparation, indicating its quality and suitability for sequencing.
Example	350
Reference	#
Regex	^\d+$
Namespace	ei:average_size_distribution

Library Construction Method

Name	lib_construction_method
Description	The library construction method (including version) that was used.
Example	Smart-Seq2
Reference	#
Namespace	ei:lib_construction_method

Input Molecule

Name	input_molecule
Description	The specific fraction of biological macromolecule from which the sequencing library is derived.
Example	RNA
Reference	#
Namespace	ei:input_molecule

Primer

Name	primer
Description	The type of primer used for reverse transcription. This indicates which RNA the cDNA library captures: oligo-dT primers capture polyadenylated mRNA, whereas random primers also capture non-polyadenylated RNA.
Example	Random
Reference	#
Namespace	ei:primer
Allowed Values	Oligo-dT Random

Primeness

Name	primeness
Description	The end from which the molecule was sequenced.
Example	5'
Reference	#
Namespace	ei:primeness
Allowed Values	3' 5' Both

End Bias

Name	end_bias
Description	The transcript end (3' or 5') towards which the library's read coverage is biased.
Example	3
Reference	#
Namespace	ei:end_bias
Allowed Values	3 5

Library Strand

Name	library_strand
Description	The complementary DNA (cDNA) strand from which the library reads derive: sense (first), antisense (second), both, or none (unstranded).
Example	Antisense
Reference	#
Namespace	ei:library_strand
Allowed Values	Antisense Both Sense Unstranded

Spike In

Name	spike_in
Description	External RNA added to the sample as a control to assess technical variability and normalization in RNA-sequencing. State whether spike-in was used.
Example	Yes
Reference	#
Namespace	ei:spike_in
Allowed Values	No Yes

Spike Type

Name	spike_type
Description	The specific type of external RNA used for spiking in, often indicating the source or nature of the control RNA.
Example	Synthetic RNA
Reference	#
Namespace	ei:spike_type

Spike In Dilution Or Concentration

Name	spike_in_dilution_or_concentration
Description	The final concentration or dilution (for commercial sets) of the spike-in mix.
Example	1:1000
Reference	#
Namespace	ei:spike_in_dilution_or_concentration

i5 Index Required

Name	i5_index
Description	Barcode sequence used on the i5 adapter during library preparation for identifying samples in multiplexed single-cell RNA-sequencing.
Example	ATCACG
Reference	#
Namespace	ei:i5_index

i7 Index Required

Name	i7_index
Description	Barcode sequence used on the i7 adapter to distinguish samples in multiplexed sequencing runs.
Example	CGATGT
Reference	#
Namespace	ei:i7_index

Dual or Single Index Required

Name	dual_single_index
Description	Specifies if both i5 and i7 indices (dual) or only one index (single) was used for sample identification during sequencing.
Example	Dual
Reference	#
Namespace	ei:dual_single_index
Allowed Values	Dual Single

i5 Sequence Required

Name	i5_sequence
Description	The nucleotide sequence of the i5 index used in multiplexing during sequencing.
Example	ATCGTAGC
Reference	#
Namespace	ei:i5_sequence

i7 Sequence Required

Name	i7_sequence
Description	The specific nucleotide sequence of the i7 index used for a sample.
Example	TGCATGCA
Reference	#
Namespace	ei:i7_sequence
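
This schema does not state whether index sequences should be recorded in forward or reverse-complement orientation. Depending on the instrument and chemistry, the i5 index may be read as the reverse complement of the sequence in the sample sheet. The Python sketch below validates an index sequence and returns its reverse complement, which is useful when reconciling the two orientations. The helper name is hypothetical.

```python
import re

DNA = re.compile(r"^[ACGTN]+$")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an index sequence (A/C/G/T/N only)."""
    seq = seq.upper()
    if not DNA.fullmatch(seq):
        raise ValueError(f"not a nucleotide sequence: {seq!r}")
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATCGTAGC"))  # GCTACGAT
```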

Plate ID

Name	plate_id
Description	Identifier for the 96-well plate used in sample preparation.
Example	PLT001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:plate_id

Well Row

Name	well_row
Description	The row identifier (A to H) in a 96-well plate indicating the sample's position.
Example	A
Reference	#
Namespace	ei:well_row

Well Column

Name	well_col
Description	The column identifier (1 to 12) in a 96-well plate indicating the sample's position.
Example	5
Reference	#
Regex	^\d+$
Namespace	ei:well_col
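
A standard 96-well plate has 8 rows (A to H) and 12 columns (1 to 12). The sketch below validates a `well_row`/`well_col` pair against those bounds and combines them into a single label. The zero-padded form such as "A05" is a common convention but an assumption here, and so is the helper name.

```python
import re

ROWS = "ABCDEFGH"  # 96-well plate: 8 rows (A-H) x 12 columns (1-12)


def well_position(well_row: str, well_col: str) -> str:
    """Validate a 96-well plate position and return it as e.g. 'A05'."""
    row = well_row.strip().upper()
    if len(row) != 1 or row not in ROWS:
        raise ValueError(f"well_row must be A-H, got {well_row!r}")
    if not re.fullmatch(r"\d+", well_col) or not 1 <= int(well_col) <= 12:
        raise ValueError(f"well_col must be 1-12, got {well_col!r}")
    return f"{row}{int(well_col):02d}"


print(well_position("A", "5"))  # A05
```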

Cell Phenotype

Name	cell_phenotype
Description	The cell marker phenotype used for fluorescence-activated cell sorting (FACS).
Example	CD41-
Reference	#
Namespace	ei:cell_phenotype
Allowed Values	CD41+ CD41-

Pool Creation Date Required

Name	pool_creation_date
Description	Date on which the pool was created, in YYYY-MM-DD format.
Example	2025-10-24
Reference	#
Regex	^\d{4}-(0[1-9]\|1[0-2])-(0[1-9]\|[12]\d\|3[01])$
Namespace	faang:pool_creation_date
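
The date pattern checks the shape of the value but still accepts impossible dates such as 2025-02-30. The Python sketch below applies the pattern and then confirms that the date exists. It reads `\|` in the documented pattern as a plain `|`, which is an assumption. The function name is hypothetical.

```python
import re
from datetime import date

# Assumption: "\|" in the documented pattern is the table-escaped form of "|".
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def is_valid_pool_date(value: str) -> bool:
    """Check the documented pattern, then confirm the date exists on the calendar."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(is_valid_pool_date("2025-10-24"))  # True
print(is_valid_pool_date("2025-02-30"))  # False: matches the pattern, but no such day
```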

Pool Creation Protocol Required

Name	pool_creation_protocol
Description	A link to the protocol used to create the pool of specimens.
Reference	#
Regex	^(https?\|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	faang:pool_creation_protocol

Design description

Name	design_description
Description	The design of the library including details of how it was constructed.
Reference	#
Namespace	ei:design_description

Library selection Required

Name	library_selection
Description	The method used to select for or against, enrich, or screen the material being sequenced.
Example	RANDOM PCR
Reference	#
Namespace	ei:library_selection
Allowed Values	5-methylcytidine antibody CAGE ChIP ChIP-Seq Dnase HMPR Hybrid Selection Inverse rRNA Inverse rRNA selection MBD2 protein methyl-CpG binding domain MDA MF MSLL Mnase Oligo-dT PCR PolyA RACE RANDOM RANDOM PCR RT-PCR Reduced Representation Restriction Digest cDNA cDNA_oligo_dT cDNA_randomPriming other padlock probes capture method repeat fractionation size fractionation unspecified

Library source Required

Name	library_source
Description	The type of source material that is being sequenced.
Example	GENOMIC
Reference	#
Namespace	ei:library_source
Allowed Values	GENOMIC GENOMIC SINGLE CELL METAGENOMIC METATRANSCRIPTOMIC OTHER SYNTHETIC TRANSCRIPTOMIC TRANSCRIPTOMIC SINGLE CELL VIRAL RNA

Library strategy Required

Name	library_strategy
Description	The sequencing technique intended for this library.
Example	RNA-Seq
Reference	#
Namespace	ei:library_strategy
Allowed Values	AMPLICON ATAC-seq Bisulfite-Seq CLONE CLONEEND CTS ChIA-PET ChIP-Seq ChM-Seq DNase-Hypersensitivity EST FAIRE-seq FINISHING FL-cDNA GBS Hi-C MBD-Seq MNase-Seq MRE-Seq MeDIP-Seq NOMe-Seq OTHER POOLCLONE RAD-Seq RIP-Seq RNA-Seq Ribo-Seq SELEX Synthetic-Long-Read Targeted-Capture Tethered Chromatin Conformation Capture Tn-Seq VALIDATION WCS WGA WGS WXS miRNA-Seq ncRNA-Seq snRNA-seq ssRNA-seq

Sequencing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	https://w3id.org/mixs/0000016
Regex	^[a-zA-Z0-9]+$
Namespace	ontology:sequencing_id

Sequencing Platform Name Required

Name	sequencing_platform_name
Description	The name of the sequencing platform used for the experiment.
Example	Pacbio
Reference	http://purl.obolibrary.org/obo/NCIT_C172274
Namespace	ontology:sequencing_platform_name

Sequencing Instrument Model Required

Name	sequencing_instrument_model
Description	The model of instrument used for sequencing. Models vary in throughput, read length, error rate, and suitability for different applications.
Example	Illumina NovaSeq 6000
Reference	http://purl.obolibrary.org/obo/GENEPIO_0000149
Namespace	ontology:sequencing_instrument_model
Allowed Values	454 GS 454 GS 20 454 GS FLX 454 GS FLX Titanium 454 GS FLX+ 454 GS Junior AB 310 Genetic Analyzer AB 3130 Genetic Analyzer AB 3130xL Genetic Analyzer AB 3500 Genetic Analyzer AB 3500xL Genetic Analyzer AB 3730 Genetic Analyzer AB 3730xL Genetic Analyzer AB 5500 Genetic Analyzer AB 5500xl Genetic Analyzer AB 5500xl-W Genetic Analysis System AB SOLiD 3 Plus System AB SOLiD 4 System AB SOLiD 4hq System AB SOLiD PI System AB SOLiD System AB SOLiD System 2.0 AB SOLiD System 3.0 BGISEQ-50 BGISEQ-500 Complete Genomics DNBSEQ-G400 DNBSEQ-G400 FAST DNBSEQ-G50 DNBSEQ-T10x4RS DNBSEQ-T7 Element AVITI FASTASeq 300 GENIUS GS111 Genapsys Sequencer GenoCare 1600 GenoLab M GridION Illumina Genome Analyzer Illumina Genome Analyzer II Illumina Genome Analyzer IIx Illumina HiScanSQ Illumina HiSeq 1000 Illumina HiSeq 1500 Illumina HiSeq 2000 Illumina HiSeq 2500 Illumina HiSeq 3000 Illumina HiSeq 4000 Illumina HiSeq X Illumina HiSeq X Five Illumina HiSeq X Ten Illumina MiSeq Illumina MiniSeq Illumina NextSeq 500 Illumina NextSeq 550 Illumina NovaSeq 6000 Illumina NovaSeq X Illumina NovaSeq X Plus Illumina iSeq 100 Ion GeneStudio S5 Ion GeneStudio S5 Plus Ion GeneStudio S5 Prime Ion Torrent Genexus Ion Torrent PGM Ion Torrent Proton Ion Torrent S5 Ion Torrent S5 XL MGISEQ-2000RS MinION NextSeq 1000 NextSeq 2000 Onso PacBio RS PacBio RS II PromethION Revio Sentosa SQ301 Sequel Sequel II Sequel IIe Tapestri UG 100

Library Layout Required

Name	lib_layout
Description	Specify whether to expect single, paired, or other configuration of reads for sequencing
Example	Paired
Reference	https://w3id.org/mixs/0000111
Namespace	mixs:lib_layout
Allowed Values	Other Paired Single Vector

UMI Barcode Read

Name	umi_barcode_read
Description	The type of read that contains the Unique Molecular Identifier (UMI) barcode.
Example	index2
Reference	#
Namespace	ei:umi_barcode_read
Allowed Values	index1 index2 read1 read2

UMI Barcode Offset

Name	umi_barcode_offset
Description	The offset (start position) of the Unique Molecular Identifier (UMI) barcode within the read.
Example	0
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_offset

UMI Barcode Size

Name	umi_barcode_size
Description	The length, in nucleotides, of the Unique Molecular Identifier (UMI) barcode.
Example	10
Reference	#
Regex	^\d+$
Namespace	ei:umi_barcode_size

Cell Barcode Read

Name	cell_barcode_read
Description	The type of read that contains the cell barcode.
Example	index1
Reference	http://www.ebi.ac.uk/efo/EFO_0010203
Namespace	ontology:cell_barcode_read
Allowed Values	index1 index2 read1 read2

Cell Barcode Offset

Name	cell_barcode_offset
Description	The offset (start position) of the cell-identifying barcode within the read.
Example	10
Reference	http://www.ebi.ac.uk/efo/EFO_0010204
Regex	^\d+$
Namespace	ontology:cell_barcode_offset

Cell Barcode Size

Name	cell_barcode_size
Description	The length, in nucleotides, of the cell-identifying barcode.
Example	0
Reference	http://www.ebi.ac.uk/efo/EFO_0010205
Regex	^\d+$
Namespace	ontology:cell_barcode_size
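
Together, the read, offset, and size fields locate each barcode within a read. The Python sketch below slices a cell barcode and a UMI out of a read sequence. It assumes offsets are 0-based, which the schema does not state. The layout used (a 16 nt cell barcode followed by a 12 nt UMI on read 1) is an illustrative, hypothetical choice, as are the names `Segment` and `extract`.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    offset: int  # assumed 0-based start position within the read
    size: int    # length in nucleotides


def extract(read_seq: str, segment: Segment) -> str:
    """Slice a barcode (or UMI) out of a read using its offset and size."""
    end = segment.offset + segment.size
    if end > len(read_seq):
        raise ValueError("segment extends past the end of the read")
    return read_seq[segment.offset:end]


# Hypothetical read 1 layout: 16 nt cell barcode followed by a 12 nt UMI.
CELL_BARCODE = Segment(offset=0, size=16)
UMI = Segment(offset=16, size=12)

read1 = "AAACCCAAGAAACACT" + "GTCAGTACGTTA" + "TTTTTTTT"
print(extract(read1, CELL_BARCODE))  # AAACCCAAGAAACACT
print(extract(read1, UMI))           # GTCAGTACGTTA
```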

cDNA Read Required

Name	cdna_read
Description	The type of read that contains the complementary DNA (cDNA) sequence.
Example	read1
Reference	http://www.ebi.ac.uk/efo/EFO_0010195
Namespace	ontology:cdna_read
Allowed Values	index1 index2 read1 read2

cDNA Read Offset

Name	cdna_read_offset
Description	The starting position of the complementary DNA (cDNA) sequence within the read, indicating where it begins after any barcodes or technical sequences.
Example	6
Reference	http://www.ebi.ac.uk/efo/EFO_0010201
Regex	^\d+$
Namespace	ontology:cdna_read_offset

cDNA Read Size

Name	cdna_read_size
Description	The length, in nucleotides, of the complementary DNA (cDNA) read.
Example	75
Reference	http://www.ebi.ac.uk/efo/EFO_0010202
Regex	^\d+$
Namespace	ontology:cdna_read_size

Analysis Derived Data

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File Derived From

Name	file_derived_from
Description	The name of the file that was used to generate the analysis derived data.
Example	file1_sequencing.json
Reference	#
Namespace	ei:file_derived_from

Inferred Cell Type

Name	inferred_cell_type
Description	Post-analysis cell type or identity, declared by the performer based on the expression profile or known gene function.
Example	type II bipolar neuron
Reference	#
Namespace	ei:inferred_cell_type

Post Analysis Cell Well Quality

Name	post_analysis_cell_well_quality
Description	Performer-defined measure of whether the read output from the cell was included in the sequencing analysis. For example, cells might be excluded if a threshold percentage of reads did not map to the genome or if pre-sequencing quality measures were not passed.
Example	Pass
Reference	#
Namespace	ei:post_analysis_cell_well_quality
Allowed Values	Fail Pass

Other Derived Cell Attributes

Name	other_derived_cell_attributes
Description	Any other cell-level measurement or annotation resulting from the analysis.
Example	Cluster
Reference	#
Namespace	ei:other_derived_cell_attributes
Allowed Values	Cluster Count Gene UMI tSNE coordinates

Raw Data Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Reference Genome

Name	reference_genome
Description	The reference genome version, with a stable link to the genome data (or an attached genome FASTA file).
Example	GRCh38, https://example.org/grch38.fa
Reference	#
Namespace	ei:reference_genome

Genome Annotation

Name	genome_annotation
Description	Indicate version and include stable link. Also indicate if any modification to the original annotation has been applied (e.g. 3' UTR extension) and include modified annotation file employed in the analysis.
Example	Ensembl v101, https://example.org/ensembl_v101.gtf
Reference	#
Namespace	ei:genome_annotation

Annotation Filtering

Name	annotation_filtering
Description	Indicate which features were filtered (e.g. protein-coding genes, pseudogenes, TCRs).
Example	Filtered to include only protein-coding genes
Reference	#
Namespace	ei:annotation_filtering

Genes vs Exons

Name	genes_vs_exons
Description	Whether quantification used whole gene intervals or exons.
Example	Exon quantification
Reference	#
Namespace	ei:genes_vs_exons

Library Structure

Name	library_structure
Description	The structure of the library, described in seqspec format.
Example	Single-cell 3' library
Reference	#
Namespace	ei:library_structure

Mapping and Demultiplexing Software

Name	mapping_and_demultiplexing_software
Description	The software, including version, used for read mapping and for demultiplexing reads/UMIs.
Example	Cell Ranger 6.0.0
Reference	#
Namespace	ei:mapping_and_demultiplexing_software

Read Mapping Statistics

Name	read_mapping_statistics
Description	Read or Unique Molecular Identifier (UMI) mapping statistics.
Example	80% reads mapped to reference
Reference	#
Namespace	ei:read_mapping_statistics

Sequencing Saturation

Name	sequencing_saturation
Description	The sequencing saturation achieved. The expected value depends on the number of cells recovered (not targeted) and on the technology.
Example	95% sequencing saturation
Reference	#
Namespace	ei:sequencing_saturation
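
A widely used definition of sequencing saturation is 1 − (unique molecules / total reads). Here a unique molecule is a distinct (cell barcode, UMI, gene) combination among confidently mapped reads. To our knowledge, this is how Cell Ranger reports the metric, but other pipelines may differ. The Python sketch below implements that definition on in-memory tuples. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def sequencing_saturation(reads: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of reads that duplicate an already-seen molecule.

    Each read is a (cell_barcode, umi, gene) tuple for a confidently mapped read.
    saturation = 1 - unique_molecules / total_reads
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads")
    unique_molecules = len(set(reads))
    return 1 - unique_molecules / len(reads)


reads = [("AAAC", "GTCA", "GeneA")] * 3 + [("AAAC", "TTGA", "GeneA")]
print(sequencing_saturation(reads))  # 0.5
```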

UMIs or Barcode Distribution QC

Name	umis_barcode_distribution_qc
Description	The distribution of Unique Molecular Identifiers (UMIs) per barcode, and the threshold applied.
Example	Threshold: 10 UMIs per barcode
Reference	#
Namespace	ei:umis_barcode_distribution_qc

Cell or Non-Cell Filtering Strategy

Name	cell_non_cell_filtering_strategy
Description	The Unique Molecular Identifier (UMI) threshold used to discriminate cells from non-cells, with a description of the algorithm (if any) and the parameters used.
Example	Threshold: 5 UMIs for cell detection
Reference	#
Namespace	ei:cell_non_cell_filtering_strategy
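
The simplest cell-calling strategy counts distinct UMIs per barcode and keeps barcodes at or above a fixed threshold. Many pipelines use more elaborate methods, so this Python sketch only shows the fixed-threshold case. For brevity it counts (barcode, UMI) pairs without the gene, which is a simplification. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def umis_per_barcode(molecules: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct UMIs for each cell barcode from (barcode, umi) pairs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for barcode, umi in molecules:
        seen[barcode].add(umi)
    return {barcode: len(umis) for barcode, umis in seen.items()}


def call_cells(counts: Dict[str, int], min_umis: int) -> Set[str]:
    """Fixed-threshold call: barcodes with at least min_umis UMIs are cells."""
    return {barcode for barcode, n in counts.items() if n >= min_umis}


molecules = [("BC1", f"U{i}") for i in range(8)] + [("BC2", "U1"), ("BC2", "U1")]
counts = umis_per_barcode(molecules)
print(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))  # [('BC1', 8), ('BC2', 1)]
print(call_cells(counts, min_umis=5))  # {'BC1'}
```

Sorting barcodes by UMI count in descending order, as in the first print, gives the ranked distribution that is typically plotted to choose or justify the threshold.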

Other Quality Filters Applied

Name	other_quality_filters_applied
Description	Criteria used to discard cells/nuclei, e.g. percentage of mitochondrial reads or percentage of rRNA reads.
Example	Cells with >20% mitochondrial reads discarded
Reference	#
Namespace	ei:other_quality_filters_applied
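
The example filter above discards cells with more than 20% mitochondrial reads. The Python sketch below applies that rule to per-cell gene counts. It recognises mitochondrial genes by an "MT-" name prefix, which is an assumption that holds for human gene symbols but not for every organism or ID scheme. The function name is hypothetical.

```python
from typing import Dict, List


def filter_by_mito_fraction(
    cell_counts: Dict[str, Dict[str, int]], max_fraction: float = 0.20
) -> List[str]:
    """Keep cells whose share of counts from mitochondrial genes is <= max_fraction.

    Assumption: mitochondrial genes are recognised by an "MT-" name prefix.
    """
    kept = []
    for cell, genes in cell_counts.items():
        total = sum(genes.values())
        mito = sum(n for gene, n in genes.items() if gene.upper().startswith("MT-"))
        if total > 0 and mito / total <= max_fraction:
            kept.append(cell)
    return kept


cells = {
    "Cell_001": {"MT-CO1": 10, "ACTB": 90},  # 10% mitochondrial
    "Cell_002": {"MT-CO1": 40, "ACTB": 60},  # 40% mitochondrial
}
print(filter_by_mito_fraction(cells))  # ['Cell_001']
```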

Ambient RNA QC

Name	ambient_rna_qc
Description	The percentage of UMIs in background (non-cell) barcodes, and the algorithm (if any) used to remove ambient RNA.
Example	Ambient RNA removed if >5% UMIs in background barcodes
Reference	#
Namespace	ei:ambient_rna_qc
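
The percentage of UMIs in background barcodes is the share of all UMIs that fall in barcodes not called as cells. The Python sketch below computes it from a per-barcode UMI count table and a set of called cell barcodes. The inputs and function name are hypothetical.

```python
from typing import Dict, Set


def percent_umis_in_background(umi_counts: Dict[str, int], cell_barcodes: Set[str]) -> float:
    """Percentage of all UMIs that fall in barcodes not called as cells."""
    total = sum(umi_counts.values())
    if total == 0:
        raise ValueError("no UMIs")
    background = sum(n for bc, n in umi_counts.items() if bc not in cell_barcodes)
    return 100 * background / total


counts = {"BC1": 900, "BC2": 50, "BC3": 50}
print(percent_umis_in_background(counts, cell_barcodes={"BC1"}))  # 10.0
```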

Predicted Doublet Rate QC

Name	predicted_doublet_rate_qc
Description	The predicted doublet rate. This depends on the number of cells recovered (not targeted) and on the technology.
Example	Predicted doublet rate: 1.5%
Reference	#
Namespace	ei:predicted_doublet_rate_qc
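
One way to see why the predicted doublet rate depends on loading is to assume that cells are distributed into droplets or wells by a Poisson process with mean λ cells per partition. Under that assumption, the fraction of cell-containing partitions that hold two or more cells is (1 − e^(−λ)(1 + λ)) / (1 − e^(−λ)). This is a theoretical sketch only. Vendors publish empirical doublet-rate tables that should be preferred where available. The function name is hypothetical.

```python
import math


def poisson_doublet_rate(mean_cells_per_partition: float) -> float:
    """Fraction of cell-containing partitions that hold two or more cells.

    Assumes Poisson loading with mean lam:
    P(k >= 2 | k >= 1) = (1 - e^-lam * (1 + lam)) / (1 - e^-lam)
    """
    lam = mean_cells_per_partition
    if lam <= 0:
        raise ValueError("mean must be positive")
    p0 = math.exp(-lam)
    return (1 - p0 * (1 + lam)) / (1 - p0)


print(f"{poisson_doublet_rate(0.05):.1%}")  # 2.5%
```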

Individual Organism SNP Demultiplexing

Name	individual_organism_snp_demultiplexing
Description	If carried out, the SNP partitioning quality (e.g. SNP UMAP embedding or covariance matrix) and the algorithm used.
Example	SNP UMAP embedding using CellSNP
Reference	#
Namespace	ei:individual_organism_snp_demultiplexing

Downstream Processing

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Clustering Algorithm and Version

Name	clustering_algorithm_and_version
Description	The clustering algorithm and its version, if the data were compared or integrated with existing datasets.
Example	Louvain 0.8.0
Reference	#
Namespace	ei:clustering_algorithm_and_version

Clustering Parameters

Name	clustering_parameters
Description	The parameters used for clustering, if the data were compared or integrated with existing datasets.
Example	Resolution: 0.6, K-nearest neighbors: 10
Reference	#
Namespace	ei:clustering_parameters

Integration/Batch Correction

Name	integration_batch_correction
Description	The integration or batch-correction method used, if the data were compared or integrated with existing datasets.
Example	Harmony v1.0
Reference	#
Namespace	ei:integration_batch_correction

Data Availability Checklist

Source Code

Name	source_code
Description	Details of, and a link to, any newly developed code or software used in the processing and downstream analysis of the dataset.
Example	Source code is hosted on GitHub and includes custom algorithms for UMI count normalization. The repository can be found at: https://github.com/user/umi-normalization.
Reference	#
Namespace	ei:source_code

UMI Count Matrix

Name	umi_count_matrix
Description	Gene × cell matrix with the UMI count for each gene in each cell.
Example	The UMI count matrix is stored in a CSV file with gene IDs as rows (e.g., ENSG00000139618) and cell barcodes as columns (e.g., Cell_001, Cell_002). The matrix file is available at: https://example.com/umi_count_matrix.csv.
Reference	#
Namespace	ei:umi_count_matrix
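
A gene × cell UMI count matrix counts each distinct (cell barcode, UMI, gene) molecule once, so PCR duplicates do not inflate the counts. The Python sketch below builds such a matrix and writes it in the CSV layout described in the example above, with gene IDs as rows and cell barcodes as columns. The input tuples and the function name are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Tuple


def umi_count_matrix_csv(molecules: Iterable[Tuple[str, str, str]]) -> str:
    """Build a gene x cell UMI count matrix and return it as CSV text.

    molecules: (cell_barcode, umi, gene_id) tuples from mapped reads; duplicate
    tuples (PCR copies of one molecule) are counted once.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in set(molecules):
        counts[gene][cell] += 1
    cells = sorted({cell for per_gene in counts.values() for cell in per_gene})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene_id", *cells])
    for gene in sorted(counts):
        writer.writerow([gene, *(counts[gene].get(cell, 0) for cell in cells)])
    return out.getvalue()


molecules = [
    ("Cell_001", "U1", "ENSG00000139618"),
    ("Cell_001", "U1", "ENSG00000139618"),  # PCR duplicate, counted once
    ("Cell_001", "U2", "ENSG00000139618"),
    ("Cell_002", "U3", "ENSG00000139618"),
    ("Cell_002", "U4", "ENSG00000012048"),
]
print(umi_count_matrix_csv(molecules), end="")
```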

Ensembl IDs

Name	ensembl_ids
Description	Gene or transcript identifiers should be given as Ensembl IDs (or another standardized ID), with gene short names provided in the metadata.
Example	ENSG00000139618
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:ensembl_ids

Functional Gene Annotations

Name	functional_gene_annotations
Description	Any functional annotation generated/used (gene names, GOs, structural domains, etc.).
Example	Functional gene annotations, including Gene Ontology (GO) terms, are provided in the metadata. For example, the gene 'ENSG00000139618' (BRCA2) is annotated with the GO term 'GO:0003677' (DNA binding).
Reference	#
Namespace	ei:functional_gene_annotations

Protein Models

Name	protein_models
Description	FASTA file with (or stable link to) the predicted proteins associated to genes in the UMI count matrix and matching IDs.
Example	The protein sequences for genes are provided in a FASTA file available at: https://example.com/protein_models.fasta, where each protein sequence is linked to the corresponding gene ID.
Reference	#
Namespace	ei:protein_models

Cell Metadata

Name	cell_metadata
Description	Table mapping cell IDs to cluster/cell type/broad cell type annotations.
Example	Cell metadata includes information such as cell type annotations ('Tumor', 'Normal') and experimental conditions ('Control', 'Treatment'). This data is available in a table at: https://example.com/cell_metadata.csv.
Reference	#
Namespace	ei:cell_metadata

Cluster-Level Normalised Expression Tables

Name	cluster_level_normalised_expression_tables
Description	Expression tables that show normalised gene expression at the cluster or cell-type level.
Example	Normalised gene expression data at the cluster level is provided in a comma-separated (CSV) file. For example, gene 'ENSG00000139618' (BRCA2) has expression values for clusters: Cluster_1: 1200, Cluster_2: 900. The full expression table is available at: https://example.com/cluster_level_expression.csv.
Reference	#
Namespace	ei:cluster_level_normalised_expression_tables

Other Resource Files

Name	other_resource_files
Description	Any other files necessary to reuse and interpret the data, e.g. barcode information in complex, serial multiplexing protocols (ClickTags).
Example	Barcode information used in multiplexing protocols is provided in a separate file, which can be accessed at: https://example.com/barcode_data.csv.
Reference	#
Namespace	ei:other_resource_files

File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	file_id
Description	A unique alphanumeric identifier for this file
Example	FILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric reference or identifier for the library preparation protocol used during the sequencing.
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Sequencing ID Required

Name	sequencing_id
Description	A unique alphanumeric reference or identifier for the sequencing protocol.
Example	SEQ001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:sequencing_id

Read 1 File Required

Name	read_1_file
Description	The name or accession of the file that contains read 1.
Example	file1_r1.fastq.gz
Reference	#
Namespace	ei:read_1_file

Read 2 File

Name	read_2_file
Description	The name or accession of the file that contains read 2.
Example	file2_r2.fastq.gz
Reference	#
Namespace	ei:read_2_file

Index 1 File

Name	index_1_file
Description	The name of the file that contains index 1.
Example	file1_i1.fastq.gz
Reference	#
Namespace	ei:index_1_file

Index 2 File

Name	index_2_file
Description	The name of the file that contains index 2.
Example	file2_i2.fastq.gz
Reference	#
Namespace	ei:index_2_file

Read 1 Checksum Required

Name	read_1_file_checksum
Description	Result of a hash function calculated on the content of the read 1 file to verify file integrity. Note that the regex accepts only a single 32-character lowercase hexadecimal (MD5) checksum, so SHA-1 values or comma-separated lists of checksums will not validate.
Example	f8d29e41a73b5c02de9a6fb314e7c8ad
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_1_file_checksum

Read 2 Checksum

Name	read_2_file_checksum
Description	Result of a hash function calculated on the content of the read 2 file to verify file integrity. Note that the regex accepts only a single 32-character lowercase hexadecimal (MD5) checksum, so SHA-1 values or comma-separated lists of checksums will not validate.
Example	a3f4c1b29d8e57fa41b02de6c7f9ab83
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:read_2_file_checksum
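
The checksum pattern `^[0-9a-f]{32}$` matches the lowercase hexadecimal digest that MD5 produces. The Python sketch below computes a file's MD5 in chunks, so large gzipped FASTQ files do not need to fit in memory, and checks the result against the documented pattern. The function name is hypothetical.

```python
import hashlib
import re

MD5_PATTERN = re.compile(r"^[0-9a-f]{32}$")  # the documented checksum pattern


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a (possibly large) file by reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `md5_checksum("file1_r1.fastq.gz")` returns a value that should satisfy `MD5_PATTERN.fullmatch(...)` before it is entered as `read_1_file_checksum`.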

White List Barcode File

Name	white_list_barcode_file
Description	A file containing the known cell barcodes in the dataset.
Example	barcodes.tsv
Reference	#
Namespace	ei:white_list_barcode_file

Expression Data Process Setting

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

Expression Data Process Setting ID Required

Name	expression_data_process_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_process_setting_id

Matrix Type

Name	matrix_type
Description	The type of values stored in the expression matrix.
Example	raw_counts
Reference	#
Namespace	ei:matrix_type
Allowed Values	imputed log1p nomalised pseudobulk raw_counts scaled

Reference Genome Required

Name	reference_genome
Description	One or more links to the associated reference genome. The regex allows several URLs to be given, separated by a pipe character.
Example	https://reference-genome-example.com
Reference	#
Regex	^((https?\|ftp):\/\/[^\s\|]+)(\\|((https?\|ftp):\/\/[^\s\|]+))*$
Namespace	ei:reference_genome
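
The `reference_genome` pattern appears to allow several URLs separated by a pipe character. This reading treats `\|` in the documented pattern as a literal `|` separator and the other escaped pipes as table escapes, which is an assumption. The Python sketch below validates such a value and splits it into individual URLs. The function name is hypothetical.

```python
import re
from typing import List

# Assumption: the intended pattern is one or more URLs separated by a literal "|".
URL = r"(https?|ftp)://[^\s|]+"
REFERENCE_GENOME = re.compile(rf"^{URL}(\|{URL})*$")


def reference_genome_urls(value: str) -> List[str]:
    """Validate a reference_genome value and return its individual URLs."""
    if not REFERENCE_GENOME.fullmatch(value):
        raise ValueError(f"not one or more pipe-separated URLs: {value!r}")
    return value.split("|")


print(reference_genome_urls("https://reference-genome-example.com"))
```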

Annotation Version

Name	annotation_version
Description	The annotation version of the associated reference genome
Example	GENCODE v44
Reference	#
Namespace	ei:annotation_version

Normalisation Method

Name	normalisation_method
Description	The normalisation method applied to the expression data.
Example	Log Normalisation
Reference	#
Namespace	ei:normalisation_method
Allowed Values	Library Size Normalisation Log Normalisation SCNorm SCTransform scran
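
Library-size normalisation followed by a log transform is a common pattern: each cell's counts are scaled to a fixed total and then transformed with log1p. The Python sketch below shows this for one cell. The scale factor of 10,000 is an assumption, matching a common default such as Seurat's LogNormalize, and is not fixed by this schema. The function name is hypothetical.

```python
import math
from typing import Dict


def log_normalise(counts: Dict[str, int], scale: float = 10_000) -> Dict[str, float]:
    """Library-size normalise one cell's counts to `scale`, then apply log1p.

    Assumption: a scale factor of 10,000; the schema does not prescribe one.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: math.log1p(n / total * scale) for gene, n in counts.items()}


print(log_normalise({"GeneA": 1, "GeneB": 3}))
```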

Highly Variable Gene Selection (HVG)

Name	highly_variable_gene_selection
Description	Method used to select highly variable genes (HVGs) and the number of genes selected
Example	seurat_v3, n=2000
Reference	#
Namespace	ei:highly_variable_gene_selection

Dimensionality Reduction

Name	dimensionality_reduction
Description	Method used to reduce dimensionality in the expression data
Example	PCA
Reference	#
Namespace	ei:dimensionality_reduction
Allowed Values	Diffusion Map, ICA, NMF, PCA, UMAP, t-SNE
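Fields with an Allowed Values row take a term from a fixed list. A minimal sketch, using the dimensionality reduction terms listed above and a hypothetical `canonical_term` helper, accepts an exact match and otherwise falls back to a case-insensitive match so that, for example, `umap` maps to the listed spelling `UMAP`. Whether the schema itself tolerates case differences is not stated, so the fallback is an assumption:

```python
from typing import Optional, Sequence

ALLOWED_DIMENSIONALITY_REDUCTION = ("Diffusion Map", "ICA", "NMF", "PCA", "UMAP", "t-SNE")


def canonical_term(value: str, allowed: Sequence[str]) -> Optional[str]:
    """Return the allowed term matching value exactly, else case-insensitively, else None."""
    if value in allowed:
        return value
    wanted = value.strip().casefold()
    for term in allowed:
        if term.casefold() == wanted:
            return term
    return None


for submitted in ["PCA", "umap", " diffusion map ", "tSNE"]:
    print(repr(submitted), "->", canonical_term(submitted, ALLOWED_DIMENSIONALITY_REDUCTION))
```

Returning the canonical spelling, rather than a plain True/False, lets a submission tool rewrite near-misses before storing them.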

Number of Nearest Neighbours

Name	n_neighbours
Description	Number of nearest neighbours used to calculate cluster membership
Example	pca:50
Reference	#
Namespace	ei:n_neighbours

Clustering Algorithm

Name	clustering_algorithm
Description	Algorithm used to create clusters
Reference	#
Namespace	ei:clustering_algorithm

Clustering Resolution

Name	clustering_resolution
Description	Resolution parameter used by the clustering algorithm
Example	2.5
Reference	#
Regex	^([0-9]*[.])?[0-9]+$
Namespace	ei:clustering_resolution
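A pattern for a numeric value needs a trailing `$` anchor if the whole value must be a number; without it, a prefix match such as Python's `re.match` would accept trailing text. A minimal sketch contrasting the two forms, with a hypothetical `parse_resolution` helper:

```python
import re

# With the trailing "$" the whole value must be a non-negative decimal number.
RESOLUTION_PATTERN = re.compile(r"^([0-9]*[.])?[0-9]+$")
# Without "$", re.match only checks that the value starts with a number.
UNANCHORED_PATTERN = re.compile(r"^([0-9]*[.])?[0-9]+")


def parse_resolution(value: str) -> float:
    """Validate a clustering_resolution value and convert it to a float."""
    if RESOLUTION_PATTERN.match(value) is None:
        raise ValueError(f"not a decimal number: {value!r}")
    return float(value)


print(parse_resolution("2.5"))                         # 2.5
print(UNANCHORED_PATTERN.match("2.5abc") is not None)  # True: trailing text slips through
print(RESOLUTION_PATTERN.match("2.5abc") is None)      # True: rejected
```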

Clustering Distance Metric

Name	clustering_distance_metric
Description	Metric used to calculate the distance between points
Example	cosine
Reference	#
Namespace	ei:clustering_distance_metric
Allowed Values	cosine, euclidean, hamming, jaccard, manhattan, mahalanobis
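To show how the listed metrics differ, a minimal pure-Python sketch of five of them on equal-length vectors. Hamming is returned as the fraction of differing positions (some tools report a count instead), and Jaccard is computed on binary presence/absence vectors; both conventions are assumptions here. Mahalanobis is omitted because it additionally requires a covariance matrix.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norms


def hamming(a: Sequence[object], b: Sequence[object]) -> float:
    """Fraction of positions at which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance on binary vectors: 1 - (positions where both are 1 / positions where either is 1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


a, b = [1.0, 0.0, 2.0], [2.0, 0.0, 4.0]
print(euclidean(a, b), manhattan(a, b), cosine(a, b))
print(hamming([1, 0, 1], [1, 1, 1]), jaccard([1, 0, 1, 1], [1, 1, 0, 1]))
```

The two example vectors point in the same direction, so their cosine distance is (up to rounding) zero while their Euclidean and Manhattan distances are not; this is one reason cosine distance is often preferred when overall expression magnitude differs between cells.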

Software Versions

Name	software_versions
Description	Primary software packages used for the analysis, with their versions
Reference	#
Namespace	ei:software_versions

Cell Type Annotation

Name	cell-type annotation
Description	Tools and databases used for cell-type annotation
Reference	#
Namespace	ei:cell-type annotation

Generated by Pipeline

Name	generated_by_pipeline
Description	URL of the deposited pipeline used to create this data
Reference	#
Regex	^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$
Namespace	ei:generated_by_pipeline

Notes

Name	notes
Description	Any other information
Reference	#
Namespace	ei:notes

Expression Data File

Study ID Required

Name	study_id
Description	A unique alphanumeric identifier for this study
Example	STUDY001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:study_id

File ID Required

Name	expression_data_file_id
Description	A unique alphanumeric identifier for the expression data file
Example	EXPFILE001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_file_id

Library Preparation ID Required

Name	library_prep_id
Description	A unique alphanumeric identifier for library preparation
Example	LIBPREP001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:library_prep_id

Expression Data Process Setting ID Required

Name	expression_data_setting_id
Description	A unique alphanumeric identifier for the expression data process setting
Example	EXPSET001
Reference	#
Regex	^[a-zA-Z0-9]+$
Namespace	ei:expression_data_setting_id

File Name Required

Name	expression_data_file
Description	Expression data file name
Example	exp_file.csv
Reference	#
Namespace	ei:expression_data_file

File MD5 Checksum Required

Name	expression_data_file_checksum
Description	The MD5 checksum calculated for this file
Example	9e4b7a23f6c1d0ab85f29c47e3d8a610
Reference	#
Regex	^[0-9a-f]{32}$
Namespace	ei:expression_data_file_checksum

File Format Required

Name	expression_data_file_format
Description	The format of the expression file, such as h5ad or rds
Example	csv
Reference	#
Namespace	ei:expression_data_file_format
Allowed Values	csv, h5ad, loom, mtx, rds

Number of Cells

Name	n_cells
Description	The number of cells represented in the expression data
Example	4
Reference	#
Regex	^\d+$
Namespace	ei:n_cells

Number of Genes

Name	n_genes
Description	The number of genes represented in the expression data
Example	50
Reference	#
Regex	^\d+$
Namespace	ei:n_genes

File Size in Bytes

Name	file_size_bytes
Description	Size of the file recorded in bytes
Example	90
Reference	#
Regex	^\d+$
Namespace	ei:file_size_bytes
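The file name, checksum and size fields can all be derived from the file itself, which makes it possible to cross-check a submitted record before upload. A minimal sketch, with hypothetical helpers `describe_file` and `mismatched_fields` and a tiny stand-in CSV; keeping `file_size_bytes` as text is an assumption so that the value can be checked against the `^\d+$` pattern:

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def describe_file(path: str) -> Dict[str, str]:
    """Derive the file name, MD5 checksum and size fields for an expression data file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "expression_data_file": os.path.basename(path),
        "expression_data_file_checksum": digest.hexdigest(),
        # Stored as text so it can be checked against the ^\d+$ pattern.
        "file_size_bytes": str(os.path.getsize(path)),
    }


def mismatched_fields(declared: Dict[str, str], path: str) -> List[str]:
    """List the fields whose declared value differs from what the file on disk shows."""
    actual = describe_file(path)
    return [field for field, value in actual.items() if declared.get(field) != value]


# Usage: a tiny stand-in CSV whose declared size is wrong (hypothetical values).
csv_bytes = b"gene,cell_1\nGAPDH,4\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(csv_bytes)
    csv_path = tmp.name
measured = describe_file(csv_path)
declared = dict(measured, file_size_bytes="90")
size_problems = mismatched_fields(declared, csv_path)
print(measured)
print(size_problems)  # ['file_size_bytes']
os.remove(csv_path)
```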

Date Generated

Name	date_generated
Description	Approximate date this expression data was generated
Example	2024-10-14
Reference	#
Regex	^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Namespace	ei:date_generated
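The date pattern checks each component's range but not calendar validity, so a value such as `2024-02-30` passes it. A minimal sketch, assuming the pattern is applied without table-escaping backslashes before `|` and using a hypothetical `parse_date_generated` helper, adds the calendar check with `datetime.date.fromisoformat` (Python 3.7 or later):

```python
import re
from datetime import date

# YYYY-MM-DD with month 01-12 and day 01-31.
DATE_PATTERN = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def parse_date_generated(value: str) -> date:
    """Check the YYYY-MM-DD shape with the pattern, then let datetime reject impossible dates."""
    if DATE_PATTERN.fullmatch(value) is None:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # raises ValueError for e.g. 2024-02-30


print(parse_date_generated("2024-10-14"))  # 2024-10-14
for bad in ["2024-02-30", "2024-10-14T00:00:00", "14/10/2024"]:
    try:
        parse_date_generated(bad)
    except ValueError as error:
        print("rejected:", error)
```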