006 · hm3_snplist

Bioinformatics

Source

HapMap3 SNP allele list from the Zenodo record with DOI 10.5281/zenodo.7773502.

The source file, w_hm3.snplist.gz, is a gzipped copy of the Alkes Group LDSC HapMap3 SNP list. The dataset is at variant resolution: one row per SNP.

The local generation script downloads the Zenodo gzip with evanverse::download_url(), removes the source header row, and writes the CSV used by this site.
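The conversion step can be sketched in Python as follows. This is a minimal sketch, not the site's generation script (which is written in R). It assumes the gzip has already been downloaded to a local path, that the source is whitespace-delimited with exactly one header line, and that the CSV uses the column names `snp`, `a1`, and `a2`. The function name `snplist_gz_to_csv` is hypothetical.

```python
import csv
import gzip


def snplist_gz_to_csv(gz_path, csv_path):
    """Convert a gzipped, whitespace-delimited SNP list to a three-column CSV.

    Returns the number of data rows written (the header is not counted).
    """
    n_rows = 0
    with gzip.open(gz_path, "rt", encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["snp", "a1", "a2"])  # assumed CSV column names
        next(src, None)  # drop the source header row
        for line in src:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 3:
                raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
            writer.writerow(fields)
            n_rows += 1
    return n_rows
```

Calling `snplist_gz_to_csv("w_hm3.snplist.gz", "hm3_snplist.csv")` on the real download should return the row count reported in the Overview table if the assumptions above hold.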

Overview

| Field | Value |
| --- | --- |
| Category | bioinformatics |
| Rows | 1,217,311 |
| Columns | 3 |
| Key variable | `snp` |
| File | `toy/006_hm3_snplist.qmd` |

Variables

| Column | Description |
| --- | --- |
| snp | dbSNP rs identifier |
| a1 | First allele in the source list |
| a2 | Second allele in the source list |

Preview

Head Rows

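| snp | a1 | a2 |
| --- | --- | --- |
| rs3094315 | G | A |
| rs3131972 | A | G |
| rs3131969 | A | G |
| rs1048488 | C | T |
| rs3115850 | T | C |
| rs2286139 | C | T |
| rs12562034 | A | G |
| rs4040617 | G | A |
| rs2980300 | T | C |
| rs2519031 | G | A |
| rs4970383 | A | C |
| rs4475691 | T | C |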

Allele Counts

| allele | count |
| --- | --- |
| A | 608004 |
| C | 609517 |
| G | 607794 |
| T | 609307 |
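The four counts sum to twice the row count (2 × 1,217,311 = 2,434,622), which is consistent with pooling `a1` and `a2` into one tally. A minimal sketch of that pooling, run here on the twelve preview rows rather than the full file, might look like this. The helper name `pooled_allele_counts` is hypothetical.

```python
from collections import Counter


def pooled_allele_counts(rows):
    """Count alleles across both allele columns of (snp, a1, a2) rows."""
    counts = Counter()
    for _snp, a1, a2 in rows:
        counts[a1] += 1
        counts[a2] += 1
    return counts


# The twelve rows shown under Head Rows.
head_rows = [
    ("rs3094315", "G", "A"), ("rs3131972", "A", "G"), ("rs3131969", "A", "G"),
    ("rs1048488", "C", "T"), ("rs3115850", "T", "C"), ("rs2286139", "C", "T"),
    ("rs12562034", "A", "G"), ("rs4040617", "G", "A"), ("rs2980300", "T", "C"),
    ("rs2519031", "G", "A"), ("rs4970383", "A", "C"), ("rs4475691", "T", "C"),
]

counts = pooled_allele_counts(head_rows)
print(sorted(counts.items()))
# [('A', 7), ('C', 6), ('G', 6), ('T', 5)]
```

The pooled total is always twice the number of rows, so it doubles as a quick check that no allele field was dropped.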

Allele Pair Counts

| a1 | a2 | snps |
| --- | --- | --- |
| T | C | 265756 |
| A | G | 264726 |
| C | T | 229683 |
| G | A | 229200 |
| A | C | 59852 |
| T | G | 59630 |
| G | T | 54238 |
| C | A | 54226 |
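The pair table can be cross-checked against the row count and the per-allele totals reported on this page. The sketch below uses only the published numbers; it also lists which unordered allele pairs occur, which shows that the A/T and C/G combinations do not appear in the table.

```python
from collections import Counter

# (a1, a2) -> number of SNPs, copied from the Allele Pair Counts table.
pair_counts = {
    ("T", "C"): 265756, ("A", "G"): 264726,
    ("C", "T"): 229683, ("G", "A"): 229200,
    ("A", "C"): 59852, ("T", "G"): 59630,
    ("G", "T"): 54238, ("C", "A"): 54226,
}

# Every SNP falls in exactly one pair, so the pair counts sum to the row count.
total_snps = sum(pair_counts.values())
print(total_snps)
# 1217311

# Each pair contributes its count to both of its alleles.
allele_counts = Counter()
for (a1, a2), n in pair_counts.items():
    allele_counts[a1] += n
    allele_counts[a2] += n
print(dict(sorted(allele_counts.items())))
# {'A': 608004, 'C': 609517, 'G': 607794, 'T': 609307}

# Unordered pairs present in the table, ignoring which allele is a1.
observed = sorted("".join(sorted(pair)) for pair in {frozenset(p) for p in pair_counts})
print(observed)
# ['AC', 'AG', 'CT', 'GT']
```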

Identifier Checks

| Check | Value |
| --- | --- |
| Unique SNP IDs | 1,217,311 |
| Duplicated SNP IDs | 0 |
| Allele alphabet | A, C, G, T |
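One way to compute these checks is sketched below. It assumes that "Duplicated SNP IDs" means the number of distinct identifiers that occur more than once; the page does not say how the value was derived, and the function name `identifier_checks` is hypothetical.

```python
from collections import Counter


def identifier_checks(rows):
    """Summarise ID uniqueness and the allele alphabet for (snp, a1, a2) rows.

    `rows` must be a list or another re-iterable sequence, not a one-shot iterator.
    """
    id_counts = Counter(snp for snp, _a1, _a2 in rows)
    alphabet = sorted({allele for _snp, a1, a2 in rows for allele in (a1, a2)})
    return {
        "unique_snp_ids": len(id_counts),
        # Assumption: count distinct IDs that appear on more than one row.
        "duplicated_snp_ids": sum(1 for n in id_counts.values() if n > 1),
        "allele_alphabet": alphabet,
    }


sample = [
    ("rs3094315", "G", "A"),
    ("rs3131972", "A", "G"),
    ("rs1048488", "C", "T"),
]
print(identifier_checks(sample))
# {'unique_snp_ids': 3, 'duplicated_snp_ids': 0, 'allele_alphabet': ['A', 'C', 'G', 'T']}
```

On the full dataset, the unique-ID count should equal the row count whenever the duplicate count is zero, as it does in the table above.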

Plot Notes

| Plot structure | Use |
| --- | --- |
| Ranked bar chart | Count SNPs by allele pair |
| Tile heatmap | Show a1 by a2 allele-pair frequencies |
| QC table | Confirm unique SNP IDs and allele alphabet |
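The data behind the first two plot structures can be prepared without any plotting library. The sketch below is one possible arrangement, not the page's own plotting code: it ranks the published pair counts for a bar chart and lays them out as a 4 × 4 grid (rows are `a1`, columns are `a2`) for a tile heatmap. The names `ALLELES`, `ranked_pairs`, and `pair_matrix` are hypothetical.

```python
ALLELES = ["A", "C", "G", "T"]

# (a1, a2) -> number of SNPs, copied from the Allele Pair Counts table.
pair_counts = {
    ("T", "C"): 265756, ("A", "G"): 264726,
    ("C", "T"): 229683, ("G", "A"): 229200,
    ("A", "C"): 59852, ("T", "G"): 59630,
    ("G", "T"): 54238, ("C", "A"): 54226,
}


def ranked_pairs(counts):
    """Return ("a1>a2", count) tuples sorted from most to least frequent."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(f"{a1}>{a2}", n) for (a1, a2), n in ordered]


def pair_matrix(counts):
    """Return a 4x4 list of lists: rows indexed by a1, columns by a2."""
    return [[counts.get((a1, a2), 0) for a2 in ALLELES] for a1 in ALLELES]


for label, n in ranked_pairs(pair_counts)[:3]:
    print(label, n)
# T>C 265756
# A>G 264726
# C>T 229683

for a1, row in zip(ALLELES, pair_matrix(pair_counts)):
    print(a1, row)
# A [0, 59852, 264726, 0]
# C [54226, 0, 0, 229683]
# G [229200, 0, 0, 54238]
# T [0, 265756, 59630, 0]
```

Cells with no entry in the table are filled with zero, so the empty diagonal and the missing A/T and C/G cells remain visible in the heatmap.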

Convert to Snplist

The code in this section writes snp, a1, and a2 from the CSV as a three-column, tab-separated text file with no row names, quotes, or header line. The upstream w_hm3.snplist does carry a header row (the generation script removes it, as noted under Source), so add a header back if a downstream tool expects one.

R

hm3 <- read.csv("assets/toy/bioinformatics/hm3_snplist.csv")

write.table(
  hm3[c("snp", "a1", "a2")],
  file = "w_hm3.snplist",
  sep = "\t",
  row.names = FALSE,
  col.names = FALSE,
  quote = FALSE,
  na = ""
)

Python

import pandas as pd

hm3 = pd.read_csv("assets/toy/bioinformatics/hm3_snplist.csv")

hm3[["snp", "a1", "a2"]].to_csv(
    "w_hm3.snplist",
    sep="\t",
    index=False,
    header=False,
)
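Because the Source section notes that the upstream file has a header row, a writer with an optional header can cover both cases. The sketch below uses only the standard library. The header names `SNP`, `A1`, and `A2` in the usage note are an assumption about the upstream file, not something this page states, and the function name `write_snplist` is hypothetical.

```python
import csv


def write_snplist(rows, path, header=None):
    """Write (snp, a1, a2) rows as unquoted, tab-separated text.

    If `header` is given, it is written as the first line.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(
            handle,
            delimiter="\t",
            lineterminator="\n",
            quoting=csv.QUOTE_NONE,  # rs IDs and alleles never need quoting
        )
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
```

For example, `write_snplist(rows, "w_hm3.snplist")` reproduces the header-less output of the snippets above, while `write_snplist(rows, "w_hm3.snplist", header=["SNP", "A1", "A2"])` adds the assumed header line.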

Download

hm3_snplist.csv