Curate cell markers

Before ingesting data, let’s curate cell markers in your files so that they can be linked to a standard reference!

Note

Currently, lnschema-covim provides utility functions and a guide for cell markers only. Other measurements, such as genes and proteins, will likely need curation as well, so more curation utilities will be added to lnschema-covim as needed.

import lamindb as ln
import bionty as bt
import lnschema_covim as cv
 connected lamindb: anonymous/testdata

Let’s look at a sample .fcs file:

fcs_path = cv.datasets.files_mock_bonn()[0]
fcs_path
'/opt/hostedtoolcache/Python/3.13.5/x64/lib/python3.13/site-packages/lnschema_covim/datasets/Bonn_Mockfile_1.fcs'

You can load an FCS file as an AnnData object using readfcs.read:

import readfcs

adata = readfcs.read(fcs_path)

View channels and markers:

adata.var
         n   channel                 marker   PnB  PnR     PnG   PnE
FSC-A    1   FSC-A                            32   1       1     0,0
FSC-H    2   FSC-H                            32   262144  1     0,0
FSC-W    3   FSC-W                            32   262144  1     0,0
SSC-A    4   SSC-A                            32   3       1     0,0
SSC-H    5   SSC-H                            32   262144  1     0,0
SSC-W    6   SSC-W                            32   262144  1     0,0
CD14     7   FJComp-Blue B 710_50-A  CD14     32   262144  1     0,0
CD66b    8   FJComp-Blue E 530_30-A  CD66b    32   262144  1     0,0
CD19     9   FJComp-Red A 780_60-A   CD19     32   262144  1     0,0
CD1c     10  FJComp-Red B 730_45-A   CD1c     32   262144  1     0,0
CD203c   11  FJComp-Red C 670_30-A   CD203c   32   262144  1     0,0
CD8      12  FJComp-Vio A 780_60-A   CD8      32   262144  1     0,0
CD45     13  FJComp-Vio C 710_20-A   CD45     32   262144  1     0,0
CD16     14  FJComp-Vio E 605_40-A   CD16     32   262144  1     0,0
LD       15  FJComp-Vio F 586_15-A   LD       32   262144  1     0,0
CD4      16  FJComp-Vio G 525_50-A   CD4      32   262144  1     0,0
HLA-DR   17  FJComp-Vio H 431_28-A   HLA-DR   32   262144  1     0,0
Siglec8  18  FJComp-YG A 780_60-A    Siglec8  32   262144  1     0,0
CD11c    19  FJComp-YG C 670_30-A    CD11c    32   262144  1     0,0
CD3      20  FJComp-YG D 610_20-A    CD3      32   262144  1     0,0
CD56     21  FJComp-YG E 586_15-A    CD56     32   262144  1     0,0
Time     22  Time                             32   127     0.01  0,0

Validate and standardize cell markers

Let’s validate cell markers in the adata.var:

Tip

Pass a custom curate_function to preprocess the data before inspection.

cv.inspect_markers(adata)
 validated: ['CD66b', 'CD1c', 'CD203c', 'CD8', 'CD45', 'CD16', 'Siglec8', 'CD11c', 'CD3', 'CD56']
! non-validated: ['CD14', 'CD19', 'LD', 'CD4', 'HLA-DR']
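
Conceptually, inspection splits the marker names into those found in the registry and those that are not. The following is a minimal sketch of that idea in plain Python, not the lnschema-covim implementation: partition_markers and registry_names are hypothetical names, and the set only contains the names reported as validated above.

```python
def partition_markers(markers, registry_names):
    """Split marker names into (validated, non_validated), keeping input order."""
    validated = [m for m in markers if m in registry_names]
    non_validated = [m for m in markers if m not in registry_names]
    return validated, non_validated


# Hypothetical registry content: the names reported as validated above.
registry_names = {
    "CD66b", "CD1c", "CD203c", "CD8", "CD45",
    "CD16", "Siglec8", "CD11c", "CD3", "CD56",
}

# Marker names from adata.var, in channel order.
markers = [
    "CD14", "CD66b", "CD19", "CD1c", "CD203c", "CD8", "CD45", "CD16",
    "LD", "CD4", "HLA-DR", "Siglec8", "CD11c", "CD3", "CD56",
]

validated, non_validated = partition_markers(markers, registry_names)
print(non_validated)  # ['CD14', 'CD19', 'LD', 'CD4', 'HLA-DR']
```

An exact, case-sensitive comparison like this is enough to explain why CD14 fails here while Cd14 would pass, which is what standardization addresses next.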

Standardize markers in adata.var:

cv.standardize_markers(adata)
! using default organism = human
! using default organism = human
 standardized 4 markers: {'CD14': 'Cd14', 'CD19': 'Cd19', 'CD4': 'Cd4', 'HLA-DR': 'HLADR'}
! found 1 new marker: {'LD'}
   → add marker manually to the registry or set add_new=True
adata.var
         n   channel                 marker   PnB  PnR     PnG   PnE
FSC-A    1   FSC-A                            32   1       1     0,0
FSC-H    2   FSC-H                            32   262144  1     0,0
FSC-W    3   FSC-W                            32   262144  1     0,0
SSC-A    4   SSC-A                            32   3       1     0,0
SSC-H    5   SSC-H                            32   262144  1     0,0
SSC-W    6   SSC-W                            32   262144  1     0,0
Cd14     7   FJComp-Blue B 710_50-A  Cd14     32   262144  1     0,0
CD66b    8   FJComp-Blue E 530_30-A  CD66b    32   262144  1     0,0
Cd19     9   FJComp-Red A 780_60-A   Cd19     32   262144  1     0,0
CD1c     10  FJComp-Red B 730_45-A   CD1c     32   262144  1     0,0
CD203c   11  FJComp-Red C 670_30-A   CD203c   32   262144  1     0,0
CD8      12  FJComp-Vio A 780_60-A   CD8      32   262144  1     0,0
CD45     13  FJComp-Vio C 710_20-A   CD45     32   262144  1     0,0
CD16     14  FJComp-Vio E 605_40-A   CD16     32   262144  1     0,0
LD       15  FJComp-Vio F 586_15-A   LD       32   262144  1     0,0
Cd4      16  FJComp-Vio G 525_50-A   Cd4      32   262144  1     0,0
HLADR    17  FJComp-Vio H 431_28-A   HLADR    32   262144  1     0,0
Siglec8  18  FJComp-YG A 780_60-A    Siglec8  32   262144  1     0,0
CD11c    19  FJComp-YG C 670_30-A    CD11c    32   262144  1     0,0
CD3      20  FJComp-YG D 610_20-A    CD3      32   262144  1     0,0
CD56     21  FJComp-YG E 586_15-A    CD56     32   262144  1     0,0
Time     22  Time                             32   127     0.01  0,0
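
The renaming above can be pictured as a synonym-to-name mapping applied to every marker that is not already a registry name. The sketch below illustrates this under stated assumptions and is not the library's code: the standardize helper and the registry_names subset are hypothetical, while the four renames are copied from the log output.

```python
def standardize(markers, synonym_to_name, registry_names):
    """Map each marker to its registry name where a synonym is known.

    Returns the standardized list, the renames that were applied, and the
    markers that match neither a registry name nor a known synonym.
    """
    standardized, renamed, unknown = [], {}, set()
    for marker in markers:
        if marker in registry_names:
            standardized.append(marker)  # already a standard name
        elif marker in synonym_to_name:
            renamed[marker] = synonym_to_name[marker]
            standardized.append(synonym_to_name[marker])
        else:
            unknown.add(marker)  # left unchanged, reported as new
            standardized.append(marker)
    return standardized, renamed, unknown


# Renames taken from the log output above.
synonym_to_name = {"CD14": "Cd14", "CD19": "Cd19", "CD4": "Cd4", "HLA-DR": "HLADR"}
# Hypothetical registry subset, enough for this example.
registry_names = {"Cd14", "Cd19", "Cd4", "HLADR", "CD8", "CD45"}

standardized, renamed, unknown = standardize(
    ["CD14", "CD8", "LD", "HLA-DR"], synonym_to_name, registry_names
)
print(standardized)  # ['Cd14', 'CD8', 'LD', 'HLADR']
print(unknown)       # {'LD'}
```

Markers without a known synonym, such as LD, pass through unchanged and are only reported, which mirrors the "found 1 new marker" message.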

You can also do a dry run that inspects the markers as they would look after standardization and curation (this won’t modify the input):

def curate_function(adata):
    # Placeholder: custom pre-processing of adata would go here
    pass


cv.inspect_markers(fcs_path, standardize=True, curate_function=curate_function)
! using default organism = human
! using default organism = human
 standardized 4 markers: {'CD14': 'Cd14', 'CD19': 'Cd19', 'CD4': 'Cd4', 'HLA-DR': 'HLADR'}
! found 1 new marker: {'LD'}
   → add marker manually to the registry or set add_new=True
 validated: ['Cd14', 'CD66b', 'Cd19', 'CD1c', 'CD203c', 'CD8', 'CD45', 'CD16', 'Cd4', 'HLADR', 'Siglec8', 'CD11c', 'CD3', 'CD56']
! non-validated: ['LD']

We can register the LD (live/dead) channel by passing add_new=True:

(Or manually via: bt.CellMarker(name="LD").save())

cv.standardize_markers(adata, add_new=True)
! using default organism = human
! using default organism = human
 added 1 new marker: {'LD'}

Map and add synonyms of cell markers

Now you can inspect cell markers again:

cv.inspect_markers(adata)
 All markers are validated!

If new synonyms come up, add them to the corresponding record:

ld_record = bt.CellMarker.filter(name="LD").one()
ld_record.add_synonym("live/dead")
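
To make the effect of adding a synonym concrete, here is a small sketch that assumes synonyms are kept as a single bar-separated string, which is consistent with the string-typed synonyms field shown in the lookup output further down. The merge_synonym helper is hypothetical and only illustrates the bookkeeping; it is not how add_synonym is implemented.

```python
def merge_synonym(synonyms: str, new: str, sep: str = "|") -> str:
    """Append `new` to a separator-delimited synonyms string, skipping duplicates."""
    existing = [s for s in synonyms.split(sep) if s]
    if new not in existing:
        existing.append(new)
    return sep.join(existing)


synonyms = ""  # a record without synonyms
synonyms = merge_synonym(synonyms, "live/dead")
synonyms = merge_synonym(synonyms, "live/dead")  # adding it twice changes nothing
print(synonyms)  # live/dead
```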

Our public reference: the CellMarker ontology

You can look up and search cell markers from the Bionty public reference.

Let’s load the public CellMarker ontology:

public = bt.CellMarker.public()
public
PublicOntology
Entity: CellMarker
Organism: human
Source: cellmarker, 2.0
#terms: 15466

Reference table of the public ontology

The underlying table has 15,466 terms:

df = public.df()
df.shape
(15466, 5)

Here are the first few of them:

df.head()
   name     synonyms  gene_symbol  ncbi_gene_id  uniprotkb_id
0  A1BG               A1BG         1             P04217
1  A2M                A2M          3494          None
2  A2ML1              A2ML1        144568        A8K2U0
3  A4GALT             A4GALT       53947         A0A0S2Z5J1
4  AADAC              AADAC        13            P22760

Look up or search a cell marker in the public reference

public.search("ccr7").head()
      name     synonyms  gene_symbol  ncbi_gene_id  uniprotkb_id
1817  Ccr7               CCR7         1236          P32248
1818  CD197              CCR7         1236          P32248
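
One plausible reading of this result is that CD197 is returned because its gene_symbol is CCR7. The matching rules of public.search are not described here, so the following is only a simplified sketch: a case-insensitive substring match over several fields. The simple_search helper is hypothetical, and the records are a hand-copied subset of the table.

```python
# Hand-copied subset of the public reference table shown above.
records = [
    {"name": "Ccr7", "synonyms": "", "gene_symbol": "CCR7"},
    {"name": "CD197", "synonyms": "", "gene_symbol": "CCR7"},
    {"name": "A1BG", "synonyms": "", "gene_symbol": "A1BG"},
]


def simple_search(query, records, fields=("name", "synonyms", "gene_symbol")):
    """Return records where any of the given fields contains the query, ignoring case."""
    q = query.lower()
    return [r for r in records if any(q in r[f].lower() for f in fields)]


hits = simple_search("ccr7", records)
print([r["name"] for r in hits])  # ['Ccr7', 'CD197']
```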

You can look up a specific cell marker using autocompletion on a Lookup object:

lookup = public.lookup()
lookup.ccr7
CellMarker(name='Ccr7', synonyms='', gene_symbol='CCR7', ncbi_gene_id='1236', uniprotkb_id='P32248')

Or using a dictionary:

lookup.dict()["Ccr7"]
CellMarker(name='Ccr7', synonyms='', gene_symbol='CCR7', ncbi_gene_id='1236', uniprotkb_id='P32248')
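
The two access styles differ in their keys: attribute access uses an identifier-friendly key (ccr7), while the dictionary uses the original name (Ccr7). A minimal sketch of how such a pair could be built follows. The normalization rule (lowercase, non-word characters replaced by underscores) is an assumption inferred from lookup.ccr7 resolving to name 'Ccr7', and attribute_key and build_lookup are hypothetical helpers.

```python
import re
from types import SimpleNamespace


def attribute_key(name):
    """Turn a marker name into an identifier-like key (assumed rule)."""
    return re.sub(r"\W+", "_", name.lower()).strip("_")


def build_lookup(records):
    """Return (attribute-style lookup, dict keyed by the original name)."""
    by_attr = SimpleNamespace(**{attribute_key(r["name"]): r for r in records})
    by_name = {r["name"]: r for r in records}
    return by_attr, by_name


records = [
    {"name": "Ccr7", "gene_symbol": "CCR7"},
    {"name": "CD197", "gene_symbol": "CCR7"},
]
by_attr, by_name = build_lookup(records)

print(by_attr.ccr7["name"])            # Ccr7
print(by_name["Ccr7"] is by_attr.ccr7)  # True
```

Attribute keys are what make tab completion possible in an interactive session, whereas the dictionary form is convenient when the original name is already at hand as a string.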