006 · hm3_snplist

Source

HapMap3 SNP allele list from the Zenodo record with DOI 10.5281/zenodo.7773502.

The source file, w_hm3.snplist.gz, is a gzipped copy of the Alkes Group's LDSC HapMap3 SNP list. The dataset is at variant resolution, with one row per SNP.

The local generation script downloads the gzipped file from Zenodo with evanverse::download_url(), removes the source header row, and writes the CSV used by this site.
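For readers working in Python rather than R, the header-removal step can be sketched with the standard library. This assumes the gzipped list has already been downloaded to a local path, starts with one header row, and has three whitespace-separated fields per line. `snplist_gz_to_csv` is a hypothetical helper, not part of the site's generation script.

```python
import csv
import gzip


def snplist_gz_to_csv(src_path, dst_path):
    """Convert a gzipped SNP list with one header row into a CSV with
    lowercase snp, a1, a2 columns. Returns the number of SNP rows written.
    (Hypothetical helper; assumes whitespace-separated input.)"""
    n_rows = 0
    with gzip.open(src_path, "rt") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["snp", "a1", "a2"])
        next(src, None)  # drop the source header row
        for lineno, line in enumerate(src, start=2):
            fields = line.split()
            if not fields:
                continue  # tolerate blank lines
            if len(fields) != 3:
                raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
            writer.writerow(fields)
            n_rows += 1
    return n_rows
```

For example, `snplist_gz_to_csv("w_hm3.snplist.gz", "hm3_snplist.csv")` should return the number of SNP rows written to the CSV.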

Overview

Field	Value
Category	bioinformatics
Rows	1,217,311
Columns	3
Key variable	`snp`
File	`toy/006_hm3_snplist.qmd`

Variables

Column	Description
`snp`	dbSNP rs identifier
`a1`	First allele in the source list
`a2`	Second allele in the source list
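Since `snp` is the key variable, the table can serve as a lookup from rs identifier to allele pair. A minimal standard-library sketch, assuming the CSV header is `snp,a1,a2` as listed above. `load_allele_lookup` is a hypothetical name.

```python
import csv


def load_allele_lookup(path):
    """Map each rs identifier to its (a1, a2) allele pair.

    Raises ValueError on a repeated SNP ID, because `snp` is expected
    to be unique.
    """
    lookup = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            snp = row["snp"]
            if snp in lookup:
                raise ValueError(f"duplicated SNP ID: {snp}")
            lookup[snp] = (row["a1"], row["a2"])
    return lookup
```

Given the head rows of this dataset, `load_allele_lookup("assets/toy/bioinformatics/hm3_snplist.csv")["rs3094315"]` should give `("G", "A")`.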

Preview

Head Rows

snp	a1	a2
rs3094315	G	A
rs3131972	A	G
rs3131969	A	G
rs1048488	C	T
rs3115850	T	C
rs2286139	C	T
rs12562034	A	G
rs4040617	G	A
rs2980300	T	C
rs2519031	G	A
rs4970383	A	C
rs4475691	T	C

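Allele Counts

Counts pool the `a1` and `a2` columns, so each SNP contributes two alleles and the four counts sum to 2,434,622, twice the 1,217,311 rows.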

allele	count
A	608004
C	609517
G	607794
T	609307
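A sketch of how these totals can be tallied with `collections.Counter`, counting each SNP once per allele column. It assumes rows arrive as `(snp, a1, a2)` tuples, for example from `csv.reader` after skipping the header. `allele_counts` is a hypothetical name.

```python
from collections import Counter


def allele_counts(rows):
    """Count allele occurrences across both a1 and a2.

    `rows` is an iterable of (snp, a1, a2) tuples; the result is a
    dict ordered by allele letter.
    """
    counts = Counter()
    for _snp, a1, a2 in rows:
        counts[a1] += 1
        counts[a2] += 1
    return dict(sorted(counts.items()))


head = [
    ("rs3094315", "G", "A"),
    ("rs3131972", "A", "G"),
    ("rs3131969", "A", "G"),
    ("rs1048488", "C", "T"),
]
print(allele_counts(head))  # {'A': 3, 'C': 1, 'G': 3, 'T': 1}
```

Four SNPs yield eight alleles, mirroring how the full table sums to twice the row count.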

Allele Pair Counts

a1	a2	snps
T	C	265756
A	G	264726
C	T	229683
G	A	229200
A	C	59852
T	G	59630
G	T	54238
C	A	54226
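None of the eight pairs above is A/T or C/G. Those are the strand-ambiguous combinations: the two alleles are complements, so the pair reads the same from either strand. A sketch that ranks allele pairs by SNP count and flags ambiguous pairs, assuming `(snp, a1, a2)` tuples. `pair_counts` and `is_strand_ambiguous` are hypothetical names.

```python
from collections import Counter

# Watson-Crick complements; a pair is strand-ambiguous when a2 == complement(a1)
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pair_counts(rows):
    """Return (a1, a2, n_snps) tuples ranked by SNP count, largest first."""
    counts = Counter((a1, a2) for _snp, a1, a2 in rows)
    return [(a1, a2, n) for (a1, a2), n in counts.most_common()]


def is_strand_ambiguous(a1, a2):
    """True for A/T, T/A, C/G and G/C pairs."""
    return COMPLEMENT.get(a1) == a2
```

Applied to every row of the CSV, `pair_counts` should reproduce the table above, and no listed pair satisfies `is_strand_ambiguous`.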

Identifier Checks

Check	Value
Unique SNP IDs	1,217,311
Duplicated SNP IDs	0
Allele alphabet	A, C, G, T
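The checks in this table can be computed with one helper. This sketch also lists IDs that do not match the `rs` plus digits pattern of dbSNP rs identifiers, an extra check that is not in the table. `identifier_checks` is a hypothetical name, and `rows` must be a list because it is traversed more than once.

```python
import re

RS_ID = re.compile(r"rs\d+")  # dbSNP rs identifier, e.g. rs3094315
ALPHABET = {"A", "C", "G", "T"}


def identifier_checks(rows):
    """Summarise ID and allele checks for a list of (snp, a1, a2) tuples."""
    snps = [snp for snp, _a1, _a2 in rows]
    unique = set(snps)
    alleles = {allele for _snp, a1, a2 in rows for allele in (a1, a2)}
    return {
        "unique_snp_ids": len(unique),
        # rows whose ID repeats an earlier row
        "duplicated_snp_ids": len(snps) - len(unique),
        "allele_alphabet": sorted(alleles),
        "alphabet_ok": alleles <= ALPHABET,
        # extra check: IDs that are not "rs" followed by digits
        "non_rs_ids": sorted(s for s in unique if not RS_ID.fullmatch(s)),
    }
```

On the full file, the first four entries should match the table: 1,217,311 unique IDs, 0 duplicates, and the alphabet A, C, G, T.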

Plot Notes

Plot structure	Use
Ranked bar chart	Count SNPs by allele pair
Tile heatmap	Show `a1` by `a2` allele-pair frequencies
QC table	Confirm unique SNP IDs and allele alphabet
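For the tile heatmap, the allele pairs can first be arranged as a 4 × 4 count matrix with `a1` on rows and `a2` on columns, independent of any plotting library. The function name `allele_pair_matrix` and the A, C, G, T ordering are assumptions. With the full data, only eight cells should be non-zero. The diagonal (identical alleles) and the A/T, T/A, C/G and G/C cells stay empty.

```python
from collections import Counter

BASES = ["A", "C", "G", "T"]  # row/column order (assumed)


def allele_pair_matrix(rows):
    """Return a 4x4 list of lists where matrix[i][j] counts SNPs with
    a1 == BASES[i] and a2 == BASES[j]; input for a tile heatmap."""
    counts = Counter((a1, a2) for _snp, a1, a2 in rows)
    return [[counts[(a1, a2)] for a2 in BASES] for a1 in BASES]
```

Keeping the matrix separate from the plotting call makes the counts easy to check against the allele pair table before drawing.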

Convert to Snplist

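LDSC-style w_hm3.snplist files are tab-separated plain-text tables whose first line is the header SNP, A1, A2, followed by one SNP per line; LDSC's munge_sumstats.py looks for these column names when the file is passed to --merge-alleles. To recreate that format from the CSV, write snp, a1, and a2 as tab-separated columns under a SNP, A1, A2 header, without row names or quotes.

R

# Read every column as character so allele codes are never type-converted
hm3 <- read.csv(
  "assets/toy/bioinformatics/hm3_snplist.csv",
  colClasses = "character"
)

# Tab-separated, uppercase SNP/A1/A2 header, no row names or quotes
write.table(
  hm3[c("snp", "a1", "a2")],
  file = "w_hm3.snplist",
  sep = "\t",
  row.names = FALSE,
  col.names = c("SNP", "A1", "A2"),
  quote = FALSE,
  na = ""
)

Python

import pandas as pd

# Read every column as a string and keep values literally (no NA coercion)
hm3 = pd.read_csv(
    "assets/toy/bioinformatics/hm3_snplist.csv",
    dtype=str,
    keep_default_na=False,
)

# Tab-separated, uppercase SNP/A1/A2 header, no index column
hm3[["snp", "a1", "a2"]].to_csv(
    "w_hm3.snplist",
    sep="\t",
    index=False,
    header=["SNP", "A1", "A2"],
)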
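A quick sanity check after either export is to count the SNP rows in the written file and compare the total with the 1,217,311 rows of the CSV. The sketch below, with the hypothetical name `count_snplist_rows`, skips a first line reading SNP, A1, A2 if one is present. It also rejects any line that does not have exactly three tab-separated fields.

```python
import csv


def count_snplist_rows(path):
    """Count SNP rows in a tab-separated snplist, skipping a leading
    SNP/A1/A2 header line if present; raise on malformed lines."""
    n = 0
    with open(path, newline="") as f:
        for lineno, fields in enumerate(csv.reader(f, delimiter="\t"), start=1):
            if lineno == 1 and fields == ["SNP", "A1", "A2"]:
                continue  # header line, not a SNP
            if len(fields) != 3:
                raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
            n += 1
    return n
```

For this dataset, `count_snplist_rows("w_hm3.snplist")` should equal 1,217,311.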

Download

hm3_snplist.csv