Curate cell markers¶
Before ingesting data, let’s curate cell markers in your files so that they can be linked to a standard reference!
Note
Currently, lnschema-covim only has utility functions and a guide for cell markers.
However, we expect the curation of other measurements such as genes and proteins to also be necessary.
Therefore, lnschema-covim will be expanded in the near future with more curation utility functions as needed.
import lamindb as ln
import bionty as bt
import lnschema_covim as cv
→ connected lamindb: anonymous/testdata
Let’s look at a sample .fcs file:
fcs_path = cv.datasets.files_mock_bonn()[0]
fcs_path
'/opt/hostedtoolcache/Python/3.13.5/x64/lib/python3.13/site-packages/lnschema_covim/datasets/Bonn_Mockfile_1.fcs'
You can load an FCS file as an AnnData object using readfcs.read:
import readfcs
adata = readfcs.read(fcs_path)
View channels and markers:
adata.var
| | n | channel | marker | PnB | PnR | PnG | PnE |
|---|---|---|---|---|---|---|---|
| FSC-A | 1 | FSC-A | | 32 | 1 | 1 | 0,0 |
| FSC-H | 2 | FSC-H | | 32 | 262144 | 1 | 0,0 |
| FSC-W | 3 | FSC-W | | 32 | 262144 | 1 | 0,0 |
| SSC-A | 4 | SSC-A | | 32 | 3 | 1 | 0,0 |
| SSC-H | 5 | SSC-H | | 32 | 262144 | 1 | 0,0 |
| SSC-W | 6 | SSC-W | | 32 | 262144 | 1 | 0,0 |
| CD14 | 7 | FJComp-Blue B 710_50-A | CD14 | 32 | 262144 | 1 | 0,0 |
| CD66b | 8 | FJComp-Blue E 530_30-A | CD66b | 32 | 262144 | 1 | 0,0 |
| CD19 | 9 | FJComp-Red A 780_60-A | CD19 | 32 | 262144 | 1 | 0,0 |
| CD1c | 10 | FJComp-Red B 730_45-A | CD1c | 32 | 262144 | 1 | 0,0 |
| CD203c | 11 | FJComp-Red C 670_30-A | CD203c | 32 | 262144 | 1 | 0,0 |
| CD8 | 12 | FJComp-Vio A 780_60-A | CD8 | 32 | 262144 | 1 | 0,0 |
| CD45 | 13 | FJComp-Vio C 710_20-A | CD45 | 32 | 262144 | 1 | 0,0 |
| CD16 | 14 | FJComp-Vio E 605_40-A | CD16 | 32 | 262144 | 1 | 0,0 |
| LD | 15 | FJComp-Vio F 586_15-A | LD | 32 | 262144 | 1 | 0,0 |
| CD4 | 16 | FJComp-Vio G 525_50-A | CD4 | 32 | 262144 | 1 | 0,0 |
| HLA-DR | 17 | FJComp-Vio H 431_28-A | HLA-DR | 32 | 262144 | 1 | 0,0 |
| Siglec8 | 18 | FJComp-YG A 780_60-A | Siglec8 | 32 | 262144 | 1 | 0,0 |
| CD11c | 19 | FJComp-YG C 670_30-A | CD11c | 32 | 262144 | 1 | 0,0 |
| CD3 | 20 | FJComp-YG D 610_20-A | CD3 | 32 | 262144 | 1 | 0,0 |
| CD56 | 21 | FJComp-YG E 586_15-A | CD56 | 32 | 262144 | 1 | 0,0 |
| Time | 22 | Time | | 32 | 127 | 0.01 | 0,0 |
Validate and standardize cell markers¶
Let’s validate cell markers in the adata.var:
Tip
Pass a custom curate_function to preprocess the data before inspection.
cv.inspect_markers(adata)
→ validated: ['CD66b', 'CD1c', 'CD203c', 'CD8', 'CD45', 'CD16', 'Siglec8', 'CD11c', 'CD3', 'CD56']
! non-validated: ['CD14', 'CD19', 'LD', 'CD4', 'HLA-DR']
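Conceptually, validation splits the marker names into those that exactly match a registered name and those that do not. The following is a minimal sketch of that idea in plain Python, not the implementation of cv.inspect_markers. The helper `partition_markers` and the `registry_names` set are hypothetical, and skipping channels without a marker label is an assumption.

```python
def partition_markers(markers, registry_names):
    """Split marker names into (validated, non_validated), preserving order."""
    validated, non_validated = [], []
    for name in markers:
        if not name:
            # Assumption: channels without a marker label (e.g. FSC-A) are skipped
            continue
        if name in registry_names:
            validated.append(name)
        else:
            non_validated.append(name)
    return validated, non_validated


# Hypothetical registry contents, using names that appear in this guide
registry_names = {
    "Cd14", "CD66b", "Cd19", "CD1c", "CD203c", "CD8", "CD45",
    "CD16", "Cd4", "HLADR", "Siglec8", "CD11c", "CD3", "CD56",
}

markers = ["", "CD14", "CD66b", "CD19", "LD", "HLA-DR", "CD3"]
validated, non_validated = partition_markers(markers, registry_names)
print(validated)      # ['CD66b', 'CD3']
print(non_validated)  # ['CD14', 'CD19', 'LD', 'HLA-DR']
```

Exact matching is why 'CD14' is reported as non-validated even though the registry holds 'Cd14': the two differ only in case, which is what standardization resolves.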
Standardize markers in adata.var:
cv.standardize_markers(adata)
! using default organism = human
! using default organism = human
✓ standardized 4 markers: {'CD14': 'Cd14', 'CD19': 'Cd19', 'CD4': 'Cd4', 'HLA-DR': 'HLADR'}
! found 1 new marker: {'LD'}
→ add marker manually to the registry or set add_new=True
adata.var
| | n | channel | marker | PnB | PnR | PnG | PnE |
|---|---|---|---|---|---|---|---|
| FSC-A | 1 | FSC-A | | 32 | 1 | 1 | 0,0 |
| FSC-H | 2 | FSC-H | | 32 | 262144 | 1 | 0,0 |
| FSC-W | 3 | FSC-W | | 32 | 262144 | 1 | 0,0 |
| SSC-A | 4 | SSC-A | | 32 | 3 | 1 | 0,0 |
| SSC-H | 5 | SSC-H | | 32 | 262144 | 1 | 0,0 |
| SSC-W | 6 | SSC-W | | 32 | 262144 | 1 | 0,0 |
| Cd14 | 7 | FJComp-Blue B 710_50-A | Cd14 | 32 | 262144 | 1 | 0,0 |
| CD66b | 8 | FJComp-Blue E 530_30-A | CD66b | 32 | 262144 | 1 | 0,0 |
| Cd19 | 9 | FJComp-Red A 780_60-A | Cd19 | 32 | 262144 | 1 | 0,0 |
| CD1c | 10 | FJComp-Red B 730_45-A | CD1c | 32 | 262144 | 1 | 0,0 |
| CD203c | 11 | FJComp-Red C 670_30-A | CD203c | 32 | 262144 | 1 | 0,0 |
| CD8 | 12 | FJComp-Vio A 780_60-A | CD8 | 32 | 262144 | 1 | 0,0 |
| CD45 | 13 | FJComp-Vio C 710_20-A | CD45 | 32 | 262144 | 1 | 0,0 |
| CD16 | 14 | FJComp-Vio E 605_40-A | CD16 | 32 | 262144 | 1 | 0,0 |
| LD | 15 | FJComp-Vio F 586_15-A | LD | 32 | 262144 | 1 | 0,0 |
| Cd4 | 16 | FJComp-Vio G 525_50-A | Cd4 | 32 | 262144 | 1 | 0,0 |
| HLADR | 17 | FJComp-Vio H 431_28-A | HLADR | 32 | 262144 | 1 | 0,0 |
| Siglec8 | 18 | FJComp-YG A 780_60-A | Siglec8 | 32 | 262144 | 1 | 0,0 |
| CD11c | 19 | FJComp-YG C 670_30-A | CD11c | 32 | 262144 | 1 | 0,0 |
| CD3 | 20 | FJComp-YG D 610_20-A | CD3 | 32 | 262144 | 1 | 0,0 |
| CD56 | 21 | FJComp-YG E 586_15-A | CD56 | 32 | 262144 | 1 | 0,0 |
| Time | 22 | Time | | 32 | 127 | 0.01 | 0,0 |
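The renaming seen in the table can be thought of as a synonym-to-name mapping applied to each marker, with unknown names reported rather than changed. Here is a minimal sketch under that assumption. The function `standardize` and both lookup structures are hypothetical and only mirror the log output shown earlier; they do not reflect how cv.standardize_markers is implemented.

```python
def standardize(markers, synonym_to_name, registry_names):
    """Return (standardized markers, applied mapping, names unknown to the registry)."""
    result, mapped, new = [], {}, set()
    for name in markers:
        if name in registry_names:
            result.append(name)              # already a standard name
        elif name in synonym_to_name:
            standard = synonym_to_name[name]
            mapped[name] = standard          # record the rename for reporting
            result.append(standard)
        else:
            new.add(name)                    # unknown: keep the name, report it
            result.append(name)
    return result, mapped, new


# Hypothetical inputs mirroring the log output in this guide
synonym_to_name = {"CD14": "Cd14", "CD19": "Cd19", "CD4": "Cd4", "HLA-DR": "HLADR"}
registry_names = {"Cd14", "Cd19", "Cd4", "HLADR", "CD8"}

result, mapped, new = standardize(
    ["CD14", "CD8", "LD", "HLA-DR"], synonym_to_name, registry_names
)
print(result)  # ['Cd14', 'CD8', 'LD', 'HLADR']
print(mapped)  # {'CD14': 'Cd14', 'HLA-DR': 'HLADR'}
print(new)     # {'LD'}
```

Leaving unknown names untouched matches what the table shows for LD, which stays as it is until it is registered.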
You can also do a dry run to inspect markers after standardization and curation (this won’t modify the input adata):
def curate_function(adata):
    # Placeholder: apply any custom pre-processing to adata here
    pass
cv.inspect_markers(fcs_path, standardize=True, curate_function=curate_function)
! using default organism = human
! using default organism = human
✓ standardized 4 markers: {'CD14': 'Cd14', 'CD19': 'Cd19', 'CD4': 'Cd4', 'HLA-DR': 'HLADR'}
! found 1 new marker: {'LD'}
→ add marker manually to the registry or set add_new=True
→ validated: ['Cd14', 'CD66b', 'Cd19', 'CD1c', 'CD203c', 'CD8', 'CD45', 'CD16', 'Cd4', 'HLADR', 'Siglec8', 'CD11c', 'CD3', 'CD56']
! non-validated: ['LD']
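One common way to get such a non-destructive preview is to apply the curation step to a copy and inspect the copy. The sketch below shows that pattern on a plain list of marker names instead of an AnnData object. The helpers `dry_run` and `strip_whitespace` are hypothetical, and the idea that a curate function mutates its argument in place is an assumption suggested by the placeholder above returning nothing.

```python
import copy


def dry_run(marker_names, curate_function):
    """Apply curate_function to a deep copy and return the curated copy."""
    preview = copy.deepcopy(marker_names)
    curate_function(preview)  # mutates only the copy
    return preview


def strip_whitespace(names):
    # Hypothetical in-place curation step: trim stray whitespace
    names[:] = [name.strip() for name in names]


original = [" CD14", "CD19 ", "LD"]
preview = dry_run(original, strip_whitespace)
print(preview)   # ['CD14', 'CD19', 'LD']
print(original)  # [' CD14', 'CD19 ', 'LD']
```

The original list is unchanged after the call, which is the property a dry run needs.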
The LD (live/dead) channel is not in the registry yet. Register it by passing add_new=True
(or manually via bt.CellMarker(name="LD").save()):
cv.standardize_markers(adata, add_new=True)
! using default organism = human
! using default organism = human
✓ added 1 new marker: {'LD'}
Map and add synonyms of cell markers¶
Now you can inspect cell markers again:
cv.inspect_markers(adata)
→ All markers are validated!
If new synonyms come up, add them to the corresponding record:
ld_record = bt.CellMarker.filter(name="LD").one()
ld_record.add_synonym("live/dead")
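To illustrate why a stored synonym helps later standardization, here is a minimal sketch of a record that can be matched by its name or by any of its synonyms. The `MarkerRecord` class is hypothetical, and keeping synonyms in a single "|"-separated string is an assumption about the storage format, not something this guide states.

```python
from dataclasses import dataclass


@dataclass
class MarkerRecord:
    name: str
    synonyms: str = ""  # assumption: one "|"-separated string

    def _synonym_list(self):
        return [s for s in self.synonyms.split("|") if s]

    def add_synonym(self, synonym: str) -> None:
        current = self._synonym_list()
        if synonym not in current:  # adding a known synonym is a no-op
            current.append(synonym)
        self.synonyms = "|".join(current)

    def matches(self, query: str) -> bool:
        return query == self.name or query in self._synonym_list()


ld = MarkerRecord(name="LD")
ld.add_synonym("live/dead")
ld.add_synonym("live/dead")  # second call changes nothing
print(ld.synonyms)               # live/dead
print(ld.matches("live/dead"))   # True
print(ld.matches("Live/Dead"))   # False
```

Matching here is case-sensitive, so each spelling variant would need its own synonym in this simplified model.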
Our public reference: the CellMarker ontology¶
You can look up and search cell markers in the Bionty public reference.
Let’s load the public CellMarker ontology:
public = bt.CellMarker.public()
public
PublicOntology
Entity: CellMarker
Organism: human
Source: cellmarker, 2.0
#terms: 15466
Reference table of the public ontology¶
The underlying table has 15,466 terms:
df = public.df()
df.shape
(15466, 5)
Here are the first few of them:
df.head()
| | name | synonyms | gene_symbol | ncbi_gene_id | uniprotkb_id |
|---|---|---|---|---|---|
| 0 | A1BG | | A1BG | 1 | P04217 |
| 1 | A2M | | A2M | 3494 | None |
| 2 | A2ML1 | | A2ML1 | 144568 | A8K2U0 |
| 3 | A4GALT | | A4GALT | 53947 | A0A0S2Z5J1 |
| 4 | AADAC | | AADAC | 13 | P22760 |
Look up or search a cell marker in the public reference¶
public.search("ccr7").head()
| | name | synonyms | gene_symbol | ncbi_gene_id | uniprotkb_id |
|---|---|---|---|---|---|
| 1817 | Ccr7 | | CCR7 | 1236 | P32248 |
| 1818 | CD197 | | CCR7 | 1236 | P32248 |
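The result above includes CD197, whose name does not contain "ccr7" but whose gene symbol does, which suggests that the search looks at more than the name field. The sketch below shows a simplified, case-insensitive substring search over several fields. The function `simple_search` is hypothetical, and the real search may rank or match results differently.

```python
# A few rows copied from the tables in this guide
rows = [
    {"name": "Ccr7", "synonyms": "", "gene_symbol": "CCR7"},
    {"name": "CD197", "synonyms": "", "gene_symbol": "CCR7"},
    {"name": "A1BG", "synonyms": "", "gene_symbol": "A1BG"},
]


def simple_search(rows, query, fields=("name", "synonyms", "gene_symbol")):
    """Return rows where any of the given fields contains query, ignoring case."""
    q = query.casefold()
    return [row for row in rows if any(q in row[f].casefold() for f in fields)]


hits = simple_search(rows, "ccr7")
print([row["name"] for row in hits])  # ['Ccr7', 'CD197']
```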
You can look up a specific cell marker using autocompletion on a Lookup object:
lookup = public.lookup()
lookup.ccr7
CellMarker(name='Ccr7', synonyms='', gene_symbol='CCR7', ncbi_gene_id='1236', uniprotkb_id='P32248')
Or using a dictionary:
lookup.dict()["Ccr7"]
CellMarker(name='Ccr7', synonyms='', gene_symbol='CCR7', ncbi_gene_id='1236', uniprotkb_id='P32248')
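Attribute-style lookup requires each marker name to become a valid Python identifier, which is why 'Ccr7' is reached as lookup.ccr7. The following is a minimal sketch of how such an object could be built. The helpers `to_attribute` and `build_lookup` are hypothetical, and apart from the lowercasing visible above, the normalization rules (non-word characters to underscores, a leading underscore before digits) are assumptions.

```python
import re
from types import SimpleNamespace


def to_attribute(name):
    """Turn a marker name into a lowercase Python identifier (assumed rules)."""
    key = re.sub(r"\W+", "_", name).strip("_").lower()
    return f"_{key}" if key[:1].isdigit() else key


def build_lookup(records):
    """Expose each record as an attribute named after its normalized name."""
    return SimpleNamespace(**{to_attribute(r["name"]): r for r in records})


records = [
    {"name": "Ccr7", "gene_symbol": "CCR7"},
    {"name": "CD197", "gene_symbol": "CCR7"},
]
sketch_lookup = build_lookup(records)
print(sketch_lookup.ccr7["gene_symbol"])  # CCR7
print(to_attribute("HLA-DR"))             # hla_dr
```

Because attribute names are normalized while dictionary keys are not, the dictionary form shown above is the safer choice when the exact marker name is known.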