This page is part of the FHIR Specification (v5.0.0: R5 - Mixed Normative and STU). This is the current published version in its permanent home (it will always be available at this URL). For a full list of available versions, see the Directory of published versions.

Page versions: R5 R4B R4
Representation of a molecular sequence.
The Clinical Genomics committee has identified overlaps and redundancies between content in the MolecularSequence resource and content in Observation profiles in the evolving Implementation Guide for Clinical Genomics Reporting (http://hl7.org/fhir/uv/genomics-reporting/index.html). The committee is considering options for modifying the resource and anticipates potential changes being brought forward in an upcoming ballot.
10.7.1 Scope and Usage
The MolecularSequence resource is designed for representing molecular sequences. It can represent the sequence in different ways, allowing implementations to adopt the most effective one for their use case.

Complete genetic sequence information, of which specific genetic variations are a part, is reported by reference to the GA4GH repository. Thus, the FHIR MolecularSequence resource avoids large genomic payloads in a manner analogous to how the FHIR ImagingStudy resource references large images maintained in other systems. For details on how this resource interacts with other Clinical Genomics resources or profiles, please refer to the implementation guidance document.
10.6.1.1 Genetic Standards and Resources

Genetic standards and resources include:
- Variant Databases: dbSNP, ClinVar, and COSMIC
- Reference Sequences: RefSeq and ENSEMBL

This resource is designed to describe sequence variations with clinical significance with information such as:
- Name of the variation represented
- Type of the variation
- Gene region occupied by the variation
- Tissue source used to determine genotype of the variation
- Quality of the result

It is strongly encouraged to provide as much information as possible in this resource for any reported sequences, because receiving systems (e.g. discovery research, outcomes analysis, and public health reporting) may use this information to normalize sequences over time or across sources. However, these data should not be used to dynamically correct or change sequence representations for clinical use outside of the laboratory, due to insufficient information.
The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence. This means that if a single MolecularSequence instance contains a literal, two formatted files, and a relative, all four of those representations must represent the same sequence. This can be a challenge across systems, as semantic equivalency of sequences cannot be guaranteed unless there is an agreed upon standard between sending and receiving systems.
10.7.2 Boundaries and Relationships
The MolecularSequence resource should only be used to capture a molecular sequence. It will not be used for other entities such as variants, variant annotations, genotypes, haplotypes, etc. Those concepts will be captured in Observation profiles found in the Genomics Reporting Implementation Guide. The sequence that was observed that led to the identification of those concepts can be delivered with this resource, and will be referenced by those observations.

MolecularSequence will not be used to capture data such as precise reads of DNA sequences and sequence alignments; such data may be accessible through references to the GA4GH (Global Alliance for Genomics and Health) API, and may be referenced by the formatted element.

As clinical assessments/diagnoses of a patient are typically captured in the Condition resource or the ClinicalImpression resource, the MolecularSequence resource can be referenced by the Condition resource to provide specific genetic data to support assertions. This is analogous to how Condition references other resources, such as AllergyIntolerance, Procedure, and Questionnaire resources.
Structure

| Name | Card. | Type | Description & Constraints |
|---|---|---|---|
| MolecularSequence | | DomainResource | Representation of a molecular sequence. Elements defined in Ancestors: id, meta, implicitRules, language, text, contained, extension, modifierExtension |
| identifier | 0..* | Identifier | Unique ID for this particular sequence |
| type | 0..1 | code | aa \| dna \| rna. Binding: sequenceType (Required) |
| subject | 0..1 | Reference(BiologicallyDerivedProduct \| Group \| NutritionProduct \| Patient \| Substance) | Subject this sequence is associated to |
| focus | 0..* | Reference(Any) | What the molecular sequence is about, when it is not about the subject of record |
| specimen | 0..1 | Reference(Specimen) | Specimen used for sequencing |
| device | 0..1 | Reference(Device) | The method for sequencing |
| performer | 0..1 | Reference(Organization) | Who should be responsible for test result |
| literal | 0..1 | string | Sequence that was observed |
| formatted | 0..* | Attachment | Embedded file or a link (URL) which contains content to represent the sequence |
| relative | 0..* | BackboneElement | A sequence defined relative to another sequence |
| relative.coordinateSystem | 1..1 | CodeableConcept | Ways of identifying nucleotides or amino acids within a sequence. Binding: LL5323-2 (Extensible) |
| relative.ordinalPosition | 0..1 | integer | Indicates the order in which the sequence should be considered when putting multiple 'relative' elements together |
| relative.sequenceRange | 0..1 | Range | Indicates the nucleotide range in the composed sequence when multiple 'relative' elements are used together |
| relative.startingSequence | 0..1 | BackboneElement | A sequence used as starting sequence. + Rule: genomeAssembly and chromosome must both be present if either one of them is present. + Rule: Have one and only one of the following elements in startingSequence: 1. genomeAssembly; 2. sequence |
| relative.startingSequence.genomeAssembly | 0..1 | CodeableConcept | The genome assembly used for starting sequence, e.g. GRCh38. Binding: LL1040-6 (Extensible) |
| relative.startingSequence.chromosome | 0..1 | CodeableConcept | Chromosome Identifier |
| relative.startingSequence.sequence[x] | 0..1 | CodeableConcept \| string \| Reference(MolecularSequence) | The reference sequence that represents the starting sequence. Binding: Multiple bindings acceptable (NCBI or LRG) (Example) |
| relative.startingSequence.windowStart | 0..1 | integer | Start position of the window on the starting sequence |
| relative.startingSequence.windowEnd | 0..1 | integer | End position of the window on the starting sequence |
| relative.startingSequence.orientation | 0..1 | code | sense \| antisense. Binding: orientationType (Required) |
| relative.startingSequence.strand | 0..1 | code | watson \| crick. Binding: strandType (Required) |
| relative.edit | 0..* | BackboneElement | Changes in sequence from the starting sequence |
| relative.edit.start | 0..1 | integer | Start position of the edit on the starting sequence |
| relative.edit.end | 0..1 | integer | End position of the edit on the starting sequence |
| relative.edit.replacementSequence | 0..1 | string | Allele that was observed |
| relative.edit.replacedSequence | 0..1 | string | Allele in the starting sequence |
@prefix fhir: <http://hl7.org/fhir/> .

[ a fhir:MolecularSequence;
fhir:nodeRole fhir:treeRoot; # if this is the parser root
# from Resource: .id, .meta, .implicitRules, and .language
# from DomainResource: .text, .contained, .extension, and .modifierExtension
fhir:identifier ( [ Identifier ] ... ) ; # 0..* Unique ID for this particular sequence
fhir:type[ code ] ; # 0..1 aa | dna | rna
fhir:subject[ Reference(BiologicallyDerivedProduct|Group|NutritionProduct|Patient|Substance) ] ; # 0..1 Subject this sequence is associated to
fhir:focus ( [ Reference(Any) ] ... ) ; # 0..* What the molecular sequence is about, when it is not about the subject of record
fhir:specimen[ Reference(Specimen) ] ; # 0..1 Specimen used for sequencing
fhir:device[ Reference(Device) ] ; # 0..1 The method for sequencing
fhir:performer[ Reference(Organization) ] ; # 0..1 Who should be responsible for test result
fhir:literal[ string ] ; # 0..1 Sequence that was observed
fhir:formatted ( [ Attachment ] ... ) ; # 0..* Embedded file or a link (URL) which contains content to represent the sequence
fhir:relative( [ # 0..* A sequence defined relative to another sequence
fhir:coordinateSystem[ CodeableConcept ] ; # 1..1 Ways of identifying nucleotides or amino acids within a sequence
fhir:ordinalPosition[ integer ] ; # 0..1 Indicates the order in which the sequence should be considered when putting multiple 'relative' elements together
fhir:sequenceRange[ Range ] ; # 0..1 Indicates the nucleotide range in the composed sequence when multiple 'relative' elements are used together
fhir:startingSequence[ # 0..1 A sequence used as starting sequence
fhir:genomeAssembly[ CodeableConcept ] ; # 0..1 I The genome assembly used for starting sequence, e.g. GRCh38
fhir:chromosome[ CodeableConcept ] ; # 0..1 I Chromosome Identifier
# sequence[x]: 0..1 I The reference sequence that represents the starting sequence. One of these 3
fhir:sequence[ a fhir:CodeableConcept ; CodeableConcept ]
fhir:sequence[ a fhir:string ; string ]
fhir:sequence[ a fhir:Reference ; Reference(MolecularSequence) ]
fhir:windowStart[ integer ] ; # 0..1 Start position of the window on the starting sequence
fhir:windowEnd[ integer ] ; # 0..1 End position of the window on the starting sequence
fhir:orientation[ code ] ; # 0..1 sense | antisense
fhir:strand[ code ] ; # 0..1 watson | crick
] ;
fhir:edit( [ # 0..* Changes in sequence from the starting sequence
fhir:start[ integer ] ; # 0..1 Start position of the edit on the starting sequence
fhir:end[ integer ] ; # 0..1 End position of the edit on the starting sequence
fhir:replacementSequence[ string ] ; # 0..1 Allele that was observed
fhir:replacedSequence[ string ] ; # 0..1 Allele in the starting sequence
] ... ) ;
] ... ) ;
]
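As an illustration of the template above, the following sketch builds a minimal instance as a plain Python dictionary and serializes it to JSON. It assumes the usual FHIR JSON naming in which `sequence[x]` with a string value becomes `sequenceString`. The helper name `minimal_molecular_sequence` and the `coordinateSystem` text value are placeholders chosen for this example; they are not taken from the specification.

```python
import json


def minimal_molecular_sequence(starting, start, end, replaced, replacement):
    """Build a minimal MolecularSequence-shaped dict (hypothetical helper)."""
    return {
        "resourceType": "MolecularSequence",
        "type": "dna",  # one of: aa | dna | rna
        "relative": [
            {
                # coordinateSystem is 1..1 inside relative; the text below is
                # a placeholder, not a code from the bound LOINC answer list.
                "coordinateSystem": {"text": "0-based interval counting"},
                "startingSequence": {"sequenceString": starting},
                "edit": [
                    {
                        "start": start,
                        "end": end,
                        "replacedSequence": replaced,
                        "replacementSequence": replacement,
                    }
                ],
            }
        ],
    }


example = minimal_molecular_sequence("ACGTGCAT", 2, 5, "GTG", "GAG")
print(json.dumps(example, indent=2))
```

A real instance would normally carry a coded `coordinateSystem` and would be checked against the resource definition by a FHIR validator; this sketch only shows how the elements nest.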
Changes from both R4 and R4B

| Element | Change |
|---|---|
| MolecularSequence.subject | Renamed from patient to subject. Type Reference: Added Target Types Group, Substance, BiologicallyDerivedProduct, NutritionProduct |
| MolecularSequence.focus | Added Element |
| MolecularSequence.literal | Added Element |
| MolecularSequence.formatted | Added Element |
| MolecularSequence.relative | Added Element |
| MolecularSequence.relative.coordinateSystem | Added Mandatory Element |
| MolecularSequence.relative.ordinalPosition | Added Element |
| MolecularSequence.relative.sequenceRange | Added Element |
| MolecularSequence.relative.startingSequence | Added Element |
| MolecularSequence.relative.edit | Added Element |
| MolecularSequence.relative.edit.start | Added Element |
| MolecularSequence.relative.edit.end | Added Element |
| MolecularSequence.relative.edit.replacedSequence | Added Element |
| MolecularSequence.coordinateSystem | Deleted (->relative.coordinateSystem) |
| MolecularSequence.quantity | Deleted (Removed. Covered by the Variant Profile in the CG IG: http://hl7.org/fhir/uv/genomics-reporting/index.html) |
| MolecularSequence.referenceSeq | Deleted (->relative.startingSequence.sequence[x]) |
| MolecularSequence.variant | Deleted (Removed. Covered by the Variant Profile in the CG IG: http://hl7.org/fhir/uv/genomics-reporting/index.html) |
| MolecularSequence.quality | Deleted (Removed from the resource.) |
| MolecularSequence.readCoverage | Deleted (Removed. Covered by the RegionStudied Profile in the CG IG: http://hl7.org/fhir/uv/genomics-reporting/index.html) |
| MolecularSequence.repository | Deleted (->formatted) |
| MolecularSequence.pointer | Deleted (->relative) |
| MolecularSequence.structureVariant | Deleted (Removed. Covered by the Variant Profile in the CG IG: http://hl7.org/fhir/uv/genomics-reporting/index.html) |

This analysis is available as XML or JSON.

This resource did not exist in Release 2. See R3 <--> R4 Conversion Maps (status = 14 tests that all execute ok. All tests pass round-trip testing and all r3 resources are valid.)
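For implementers migrating stored data, the renamed, moved and removed elements in the change list can be captured in a small lookup. The sketch below is illustrative only: the names `MOVED`, `REMOVED` and `migration_hint` are hypothetical, and the entries are limited to those stated in the change list. It does not attempt to transform values, only to report where an R4 path went.

```python
from typing import Optional, Tuple

# R4 element paths that were renamed or have a stated successor in R5.
MOVED = {
    "MolecularSequence.patient": "MolecularSequence.subject",
    "MolecularSequence.coordinateSystem": "MolecularSequence.relative.coordinateSystem",
    "MolecularSequence.referenceSeq": "MolecularSequence.relative.startingSequence.sequence[x]",
    "MolecularSequence.repository": "MolecularSequence.formatted",
    "MolecularSequence.pointer": "MolecularSequence.relative",
}

# R4 element paths that were removed without a successor in this resource.
REMOVED = {
    "MolecularSequence.quantity",
    "MolecularSequence.variant",
    "MolecularSequence.quality",
    "MolecularSequence.readCoverage",
    "MolecularSequence.structureVariant",
}


def migration_hint(r4_path: str) -> Tuple[str, Optional[str]]:
    """Return ("moved", new_path), ("removed", None) or ("unlisted", None)."""
    if r4_path in MOVED:
        return "moved", MOVED[r4_path]
    if r4_path in REMOVED:
        return "removed", None
    return "unlisted", None


print(migration_hint("MolecularSequence.repository"))
print(migration_hint("MolecularSequence.quality"))
```

Paths reported as "removed" are, per the change list, mostly covered by profiles in the Clinical Genomics Implementation Guide (the exception is quality, which is simply removed), so a real migration would need to map them to Observation profiles separately.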
10.7.5.1 Representing the Sequence

This resource supports three patterns for representing a sequence of interest:
- By providing a literal string of IUPAC codes representing nucleotides or amino acids.
- By linking to a formatted file or link containing the sequence information (e.g. FASTA file or GA4GH sequence repository).
- By providing a list of edits from a starting sequence.

The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence.

10.7.5.1.1 Sequence as a literal string

literal: This string element can be used to hold the sequence as a string of characters.

10.7.5.1.2 Sequence as a file or URL

formatted: This Attachment is used to refer to the sequence as embedded file content or via a URL reference. This method can be used to refer to sequence data from an external source. If the sequence is referring to a GA4GH repository, the formatted.url should refer to a GA4GH compliant endpoint that conforms to GA4GH data models.

10.7.5.1.3 Sequence as a series of edits from a known sequence

relative: This complex element is used for encoding a sequence. When the starting sequence and the edits are provided, the observed sequence can be derived.

10.7.5.1.3.1 Composing multiple relative sequences into one new sequence

relative.ordinalPosition: Indicates the order in which the sequence should be considered when putting multiple relative instances together.

relative.sequenceRange: Indicates the nucleotide range in the composed sequence when multiple relative instances are used together.

These attributes help to clarify what sequence is being represented with less computation/inference on the recipient side. Implementers SHOULD use sequenceRange first to determine order, as it is the most reliable. If sequenceRange is not present then ordinalPosition SHOULD be used. Finally, if both sequenceRange and ordinalPosition are absent, then the order of the relative data elements SHOULD be used to calculate a composition. It is the responsibility of the data sender to ensure the message can be consistently understood. Additionally, gaps in sequenceRange are considered intentional (i.e. the composed sequence contains a sequence of N's, the placeholder nucleotide, for the gap range).

In a FGFR2:MET Fusion use case, where the fusion was uncovered through RNA sequencing, a partial representation can be found here.

10.7.5.1.3.2 Representing the Starting Sequence

relative.startingSequence: There are four optional ways to represent a starting sequence in the MolecularSequence resource:
- relative.startingSequence.sequenceCodeableConcept: Starting sequence id in a public database;
- relative.startingSequence.sequenceString: Starting sequence provided as a string;
- relative.startingSequence.sequenceReference: Reference to a starting sequence stored in another sequence entity;
- relative.startingSequence.genomeAssembly, relative.startingSequence.chromosome: The combination of genome assembly and chromosome.

The relative.startingSequence.windowStart and relative.startingSequence.windowEnd define a range on the starting sequence; the subsequence in that range is what is used as the starting sequence.

10.7.5.1.3.3 Coordinate System

When saving the sequence information, the nucleic acids are numbered in order. Some representations use a 0-based system (e.g. GA4GH API, BAM files) while some use a 1-based system (e.g. VCF file format). The element coordinateSystem contains this information.

A 0-based system starts numbering from 0, while a 1-based system marks the first position with number 1. The significant difference between the two systems is the end position. In a 0-based system the end position is exclusive, which means the last position is not contained in the sequence window, while in a 1-based system the end position is inclusive, which means the last position is included in the sequence window. Note that both systems have an inclusive start position.

For example, ACGTGCAT will be numbered from 1 to 8 in a 1-based system, and from 0 to 8 in a 0-based system, where the numbers mark flanks (i.e. the places between two nucleotides). So the interval [3,5] in the 1-based system is GTG, while the interval [2,5) in the 0-based system is also GTG.

relative.coordinateSystem binds to a LOINC answer list (LL5323-2); please review those answers and the detailed description linked from that binding.
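The relationship between the starting sequence, the edits and the coordinate system can be sketched as below. This is a minimal illustration under stated assumptions, not an implementation defined by the specification: the `Edit` class and the `to_slice` and `apply_edits` functions are hypothetical names, edits are assumed not to overlap, and the coordinate system is reduced to a boolean (0-based with exclusive end, or 1-based with inclusive end) as described in the text above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Edit:
    """Mirrors relative.edit: positions refer to the starting sequence."""
    start: int
    end: int
    replaced: str      # relative.edit.replacedSequence
    replacement: str   # relative.edit.replacementSequence


def to_slice(start: int, end: int, zero_based: bool) -> Tuple[int, int]:
    """Convert an interval to Python slice bounds (0-based, end-exclusive)."""
    if zero_based:
        return start, end      # start inclusive, end already exclusive
    return start - 1, end      # 1-based: both ends inclusive


def apply_edits(starting: str, edits: Iterable[Edit], zero_based: bool = True) -> str:
    """Derive the observed sequence from a starting sequence and its edits."""
    result = starting
    # Apply from right to left so earlier coordinates stay valid.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        lo, hi = to_slice(edit.start, edit.end, zero_based)
        if starting[lo:hi] != edit.replaced:
            raise ValueError(f"replaced sequence mismatch at {edit.start}-{edit.end}")
        result = result[:lo] + edit.replacement + result[hi:]
    return result


# The interval [2,5) in 0-based terms and [3,5] in 1-based terms both cover GTG.
observed_zero = apply_edits("ACGTGCAT", [Edit(2, 5, "GTG", "GAG")], zero_based=True)
observed_one = apply_edits("ACGTGCAT", [Edit(3, 5, "GTG", "GAG")], zero_based=False)
print(observed_zero, observed_one)
```

Checking the replaced sequence before applying each edit is a design choice of this sketch: it gives a receiver a cheap way to detect that the coordinate system or the starting sequence was misread.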
There are many considerations concerning the directionality of DNA or RNA. Here we are using relative.startingSequence.orientation and relative.startingSequence.strand.

Orientation represents the sense of the sequence, which has different meanings depending on the MolecularSequence.type.

Strand represents the sequence writing order. Watson strand refers to the 5' to 3' top strand (5' -> 3'), whereas Crick strand refers to the 5' to 3' bottom strand (3' <- 5').

Only two values are possible for strand: watson and crick. Since the directionality of the sequence might be represented in different ways in different omics scenarios, below are examples of how to map other expressions onto their corresponding value:

| Watson | Crick |
|---|---|
| 5′-to-3′ direction | 3′-to-5′ direction |
| +1 | -1 |
| Sense | Antisense |
| Positive | Negative |

10.7.5.2 Character usage for sequence as strings

There are attributes where the sequence is represented as a string of characters:
- relative.startingSequence.sequenceString
- relative.edit.replacementSequence
- relative.edit.replacedSequence
- literal

The characters used in these string representations of a sequence should be constrained to the IUPAC codes found here: https://www.bioinformatics.org/sms2/iupac.html.

Nucleotide sequences should be constrained within the range:

| Code | Meaning | Code | Meaning |
|---|---|---|---|
| A | adenosine | M | A C (amino) |
| C | cytidine | S | G C (strong) |
| G | guanine | W | A T (weak) |
| T | thymidine | B | G T C |
| U | uridine | D | G A T |
| R | G A (purine) | H | A C T |
| Y | T C (pyrimidine) | V | G C A |
| K | G T (keto) | N | A G C T (any) |
| - | gap of indeterminate length | | |

while amino acid sequences should be constrained within the range:

| Code | Meaning | Code | Meaning |
|---|---|---|---|
| A | alanine | P | proline |
| B | aspartate or asparagine | Q | glutamine |
| C | cysteine | R | arginine |
| D | aspartate | S | serine |
| E | glutamate | T | threonine |
| F | phenylalanine | U | selenocysteine |
| G | glycine | V | valine |
| H | histidine | W | tryptophan |
| I | isoleucine | Y | tyrosine |
| K | lysine | Z | glutamate or glutamine |
| L | leucine | X | any |
| M | methionine | * | translation stop |
| N | asparagine | - | gap of indeterminate length |
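The character constraints above lend themselves to a simple check before a sequence string is stored or sent. The sketch below is one possible way to do it; the function name `invalid_characters` is hypothetical, the alphabets are copied from the two code tables, and treating lowercase input as equivalent to uppercase is an assumption of this example rather than a rule from the specification.

```python
# Alphabets taken from the nucleotide and amino acid code tables.
NUCLEOTIDE_CODES = set("ACGTUMRWSYKVHDBN-")
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTUVWXYZ*-")


def invalid_characters(sequence: str, sequence_type: str) -> set:
    """Return the characters not allowed for the given type (aa | dna | rna)."""
    if sequence_type == "aa":
        allowed = AMINO_ACID_CODES
    elif sequence_type in ("dna", "rna"):
        allowed = NUCLEOTIDE_CODES
    else:
        raise ValueError(f"unknown sequence type: {sequence_type}")
    # Assumption: case is not significant, so compare in uppercase.
    return set(sequence.upper()) - allowed


print(invalid_characters("ACGTNX", "dna"))
print(invalid_characters("MKVL*", "aa"))
```

An empty result means every character is within the permitted range. The same check can be applied to literal, relative.startingSequence.sequenceString and the two edit strings.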
Search Parameters

| Name | Type | Description | Expression |
|---|---|---|---|
| chromosome | token | Chromosome number of the reference sequence | MolecularSequence.referenceSeq.chromosome |
| chromosome-variant-coordinate | composite | Search parameter by chromosome and variant coordinate | On MolecularSequence.variant: chromosome: %resource.referenceSeq.chromosome, variant-start: start, variant-end: end |
| chromosome-window-coordinate | composite | Search parameter by chromosome and window | On MolecularSequence.referenceSeq: chromosome: chromosome, window-start: windowStart, window-end: windowEnd |
| focus | reference | What the molecular sequence is about, when it is not about the subject of record | MolecularSequence.focus (Any) |
| referenceseqid-variant-coordinate | composite | Search parameter by reference sequence and variant coordinate | On MolecularSequence.variant: referenceseqid: %resource.referenceSeq.referenceSeqId, variant-start: start, variant-end: end |
| referenceseqid-window-coordinate | composite | Search parameter by reference sequence and window | |
| subject | reference | The subject that the sequence is about | MolecularSequence.subject |
| variant-end | number | End position of the variant (0-based exclusive, which means the acid at this position will not be included; 1-based inclusive, which means the acid at this position will be included) | MolecularSequence.variant.end |
| variant-start | number | Start position of the variant (0-based inclusive, 1-based inclusive, which means the nucleic acid or amino acid at this position will be included) | MolecularSequence.variant.start |
| window-end | number | End position of the reference sequence (0-based exclusive, which means the acid at this position will not be included; 1-based inclusive, which means the acid at this position will be included) | MolecularSequence.referenceSeq.windowEnd |
| window-start | number | Start position of the reference sequence (0-based inclusive, 1-based inclusive, which means the nucleic acid or amino acid at this position will be included) | MolecularSequence.referenceSeq.windowStart |

For the four composite parameters, the search refers to part of a locus or part of a gene, and the search region is expressed in the 1-based system. Since the coordinateSystem of a resource can be either 0-based or 1-based, the search returns resources in both coordinate systems that contain the equivalent segment of the gene or whole genome sequence. See the description of 0-based versus 1-based coordinates above for details. Examples:
- `chromosome-variant-coordinate=1$lt345$gt123` searches for MolecularSequence resources with variants on chromosome 1 and with position >123 and <345. In a 1-based resource, all strings within region 1:124-344 are returned, while in a 0-based resource, all strings within region 1:123-344 are returned.
- `chromosome-window-coordinate=1$lt345$gt123` searches for MolecularSequence resources with a window on chromosome 1 and with position >123 and <345. In a 1-based resource, all strings within region 1:124-344 are returned, while in a 0-based resource, all strings within region 1:123-344 are returned.
- `referenceSeqId-variant-coordinate=NC_000001.11$lt345$gt123` searches for MolecularSequence resources with variants on NC_000001.11 and with position >123 and <345. In a 1-based resource, all strings within region NC_000001.11:124-344 are returned, while in a 0-based resource, all strings within region NC_000001.11:123-344 are returned.
- `referenceSeqId-window-coordinate=NC_000001.11$lt345$gt123` searches for MolecularSequence resources with a window on NC_000001.11 and with position >123 and <345. In a 1-based resource, all strings within region NC_000001.11:124-344 are returned, while in a 0-based resource, all strings within region NC_000001.11:123-344 are returned.
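The arithmetic behind the composite search examples can be made explicit with a short sketch. The functions `parse_composite` and `matching_region` are hypothetical helpers written for this illustration; they assume the value always carries exactly one `gt` and one `lt` bound, as in the examples, and they are not part of any FHIR server API.

```python
from typing import Tuple


def parse_composite(value: str) -> Tuple[str, int, int]:
    """Split e.g. '1$lt345$gt123' into (reference, lower_bound, upper_bound)."""
    reference, *bounds = value.split("$")
    lower = upper = None
    for bound in bounds:
        prefix, number = bound[:2], int(bound[2:])
        if prefix == "gt":
            lower = number
        elif prefix == "lt":
            upper = number
        else:
            raise ValueError(f"unsupported prefix: {prefix}")
    if lower is None or upper is None:
        raise ValueError("expected one gt and one lt bound")
    return reference, lower, upper


def matching_region(lower: int, upper: int, zero_based: bool) -> Tuple[int, int]:
    """Region matched by 'position > lower and position < upper'.

    The query is expressed in 1-based terms. For a 0-based resource the start
    shifts down by one and the (exclusive) end keeps the same number.
    """
    start, end = lower + 1, upper - 1   # 1-based, both inclusive
    if zero_based:
        return start - 1, end           # 0-based start, exclusive end
    return start, end


reference, lower, upper = parse_composite("1$lt345$gt123")
print(reference, matching_region(lower, upper, zero_based=False))
print(reference, matching_region(lower, upper, zero_based=True))
```

For the example value this yields the regions 1:124-344 for 1-based resources and 1:123-344 for 0-based resources, matching the figures given in the search parameter descriptions.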