We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
Yes, it requires chopping the genome into small(er) pieces (than with Nanopore sequencing) and then reconstructing the genome based on a reference (and this has its issues). But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
Nanopore devices are truly cool, small, and comparatively cheap though, and you can compensate for the error rate by just sequencing everything multiple times. I'm not too familiar with the economics of this approach though.
With SBS technology you could probably sequence your whole genome 30 times (a normal "coverage") for below €1000/$1000 with a reputable company. I've seen $180, but I'm not sure I'd trust that.
> you can compensate for the error rate by just sequencing everything multiple times.
Usually, but sometimes the errors are correlated.
Overall I agree, short read sequencing is a lot more cost effective. Doing an Illumina whole genome sequence for cell line quality control (at my startup) costs $260 in total.
> But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
There is no reason for Nanopore to supplant sequencing-by-synthesis for short reads - that's largely solved and getting cheaper all the while.
The future clinical utility will be in medium- and large-scale variation. We don't understand this in the clinical setting nearly as well as we understand SNPs. So Nanopore is being used in the research setting and to diagnose individuals with very rare genetic disorders.
(edit)
> We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
I also strongly disagree.
SBS is very reliable, but it's commonplace (if Toyota is the most popular car, does that mean we're in the Toyota internal combustion era? Or can Waymo still matter despite its small footprint?).
Novelty in sequencing is coming from ML approaches, RNA-DNA analysis, and combining long- and short-read technologies.
I agree with you. Long reads lead to new insights and, over time, to better diagnoses by providing a better understanding of large(r)-scale aberrations, and as the tech gets better it will be able to do so more easily. But it's really not there yet. It's mostly research, and I get the feeling it's somehow not improving as much as hoped.
Nanopore is good for hybrid sequencing. You can align the higher-quality Illumina reads against its longer contiguous reads.
I’ve always wondered how the reconstruction works.
It would be difficult to break a modest program into basic blocks and then reconstruct it. Same with paragraphs in a book.
How does this work with DNA?
There are two ways: assembly by mapping, and de novo assembly.
If you already have a human genome file, you can take each DNA piece and map it to its closest match in the genome. If you can cover the whole genome this way, you are done.
The alternative way is to exploit overlaps between DNA fragments. If two 1000 bp pieces overlap by 900 base pairs, that's probably because they come from two 1000 bp regions of your genome that overlap by 900 base pairs. You can then merge the pieces. By iteratively merging millions of fragments you can reconstruct the original genome.
Both these approaches are surprisingly and delightfully deep computational problems that have been researched for decades.
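A toy version of the overlap-merge idea, just to make it concrete (a hypothetical greedy sketch in Python; real assemblers use overlap/de Bruijn graphs and have to handle sequencing errors and reverse complements):

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if i is None:  # no remaining overlaps above the threshold
            break
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# "GATTACAGA" chopped into overlapping reads:
print(greedy_assemble(["GATTAC", "TTACAG", "ACAGA"]))  # -> ['GATTACAGA']
```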
They exploit the fact that so much of our DNA is the same. They basically have the book with no typos, or rather with only the typos they've decided to call canonical.
So given a short sentence excerpt, even with a few errors thrown in, partial string matching is usually able to figure out where in the book it was likely from. Sometimes there may be more possibilities, but then you can look at overlaps and count how many times a particular variant appears in one context vs. another.
One problem is, DNA contains a lot of copies and repetitive stretches, as if the book had "all work and no play makes Jack a dull boy" repeated end to end for a couple of pages. Then it can be hard to place where the variant actually is. Longer reads help with this.
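A tiny sketch of the "place the excerpt in the book" step, including the repeat ambiguity (hypothetical brute force; real mappers use compressed indexes rather than scanning, and also allow indels, not just mismatches):

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def map_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Return every position where `read` matches `reference` with few errors."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        if hamming(read, reference[pos:pos + len(read)]) <= max_mismatches:
            hits.append(pos)
    return hits

ref = "ACGTACGTTAGCCGATTAGCCGA"    # contains a repeated stretch
print(map_read("TAGCCGA", ref))    # -> [8, 16]: read sits in a repeat, two possible homes
print(map_read("GATTA", ref))      # -> [13]: unique placement
```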
This is very easily googled. There are new algorithmic advances for new kinds of sequencing data, but this is the key idea (published in the '90s):
https://en.wikipedia.org/wiki/Burrows–Wheeler_transform
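The transform itself is tiny; here is a naive sketch (production indexes build it via suffix arrays and add auxiliary tables so reads can be searched against it quickly):

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + "$"  # sentinel marks the end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

# Equal letters tend to cluster in the output, which makes the
# transformed genome highly compressible and indexable:
print(bwt("GATTACA"))  # -> 'ACTGA$TA'
```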
You can get it pretty damn cheap if you are willing to send your biological data overseas. Nebula genomics and a lot of other biotechs do this by essentially outsourcing to China. There's no particular technology secret, just cheaper labor and materials.
Can you trust it though? It'd be trivially easy to do a 1x read, maybe 2x, and then fake the other 28 reads. And it'd be hard to catch someone doing this without doing another 30x read from someone you trust. There's famously a lot of cheating in medical research, it would be odd if everyone stopped the moment they left academia (there have been scandals with forensic labs cheating too, now that I think about it).
Interesting concept, but between the broken hardware and the way they gave up before getting anything useful this article was rather disappointing:
> Another problem was our flow cell was malfunctioning from the start — only 623 out of 2048 pores were working.
Is this normal for the machine? Is there a better write up somewhere where they didn’t give up immediately after one attempt?
I suspect the authors read the number of active pores during sequencing and then wrongly assumed that the non-active ones had a manufacturing defect.
In my experience, most inactive pores are due to a poorly prepared sample. I don't know why, but maybe it blocks or jams the pores.
When I analyzed Oxford Nanopore data a few years ago, I found it to be very sensitive to skilled sample preparation. The data quality varied so much that I could tell which of my lab-technician co-workers (the experienced one or the new one) had prepared the sample just by analyzing the data. So I suspect the authors' garage sample prep wasn't great.
Coincidentally, I had a colleague who worked on building a portable sequencing lab powered by a car battery. The purpose was to be able to identify viruses by DNA from a van in rural Central Africa or wherever. Last I talked to her, the technical bottleneck was sample prep - the computational part of the van lab wasn't too hard.
Hi, believe it or not, I have actually done what the authors were attempting. I used saliva rather than blood as a source of DNA and extracted it using a Qiagen kit.
My Nanopore flow cell had nearly every pore working from the start. So I would say that is not normal. Maybe it was stored incorrectly.
Do you have a write up somewhere? If not, it would be amazing if you wrote one!
I was planning on doing a similar thing (also with saliva) once I finished moving in and had a bit more time after conferences. (But, of course, I’d have to go through and actually figure out all of the mechanics and so on.)
> Is this normal for the machine?
No, it's not "normal," but it is fairly common. When I worked in NGS, nearly 1/4 of flow cells were duds. ONT used to have a policy where you could return the cell and get a new one if it failed its self-test.
I think it was pretty interesting in a "what would likely happen if you tried this" way. Negative results are good. A lot of technical problems is what I'd expected though, from my little experience in genetic genealogy.
It depends on the sample. Usually you have at least 1,200 active pores, with a guarantee of at least 800, so maybe he could ask for a refund.
Like most analytical methods, the preparation of the sample is key. High quality output comes with careful sample prep so that the analytical process can run optimally.
The graph at the beginning, showing the cost of sequencing over time falling faster than Moore's law, stops in 2015. I'd love to see how things have progressed since then. Casually googling, I only saw plots up to 2021, but it looks to me like progress has been slower than Moore's law since ~2015. Maybe things will change when Nanopore gets more reliable.
The graph also only starts in 2001. I worked as a student at the EMBL (European Molecular Biology Laboratory) in the biophysics instrumentation group in the mid-'90s. The lab group was developing prototypes of thin-film electrophoresis DNA sequencers. Pharmacia Biotech then bought some of the tech and brought it to market. AFAIR they were some of the fastest sequencers at the time, but we are talking about low 100s of base pairs per day.
Nebula and Dante will do this for like $300, and you can get 30x coverage at every base or even 100x coverage if you pay a little more. The $1000 genome was here more than a decade ago.
I wanted to try this, but I looked into Nebula a bit more.
Nebula is facing a class action for apparently disclosing detailed genomic data to Meta, Microsoft & Google. The subreddit is also full of reports of people who never received their results years after sending their kits back. There are also concerns about the quality of sequencing and false positives in all DTC genomics testing. Given what happened with 23andme as well and all of this stuff, I'm wary of sending my genetic data to any private company.
I was interested to read this because some time ago I had my genome sequenced by Nebula. If you look at the lawsuit you can see that what Nebula did was use off-the-shelf third-party analytics products on their website, including recording analytics pings when users buy a kit, and pings when users use the Nebula website to browse Nebula's high-level analysis of their traits (leaking that the user has those traits to the analytics provider.)
This behavior represents a contemptible lack of respect for users' privacy, but it's important to distinguish it from Nebula selling access to users' genomes.
https://www.classaction.org/media/portillov-nebula-genomics-...
Another point is that Wojcicki's big idea, that all this genetic data would be valuable to sell to businesses, didn't work out so well. For an advertiser, it's a lot more useful to know that you're a smoker than to know that you have a 40% higher chance of being a smoker.
That's a good clarification. I read through some of that link, and it does look relatively benign - Meta & Google pixels might see when you buy a kit but nothing more - but on page 21 they directly leaked genetic information to Microsoft via their Clarity tracker. Maybe not intentionally, and it's questionable whether it can be linked to a specific person rather than just an advertising ID, but they did leak it. I think the lawsuit says that even disclosing whether a person has undergone genetic testing violates GIPA, so the information they sent to all three is enough to violate that.
I don't have any evidence they're selling anything but that lawsuit shows pretty sloppy behaviour for a company that should be thinking very deeply about privacy. I guess that's about what you said though :)
The point isn't what they are doing with your data now, but that they retain your data and what might happen in the future. Someone with malicious designs on your DNA might buy Nebula tomorrow and there's nothing you can do about it.
Actually, the main reason I used Nebula was that they advertised a credible-to-me promise that you could download and permanently delete your data upon request. That was some years ago, so I don't know if I would trust them today. But that was their claim, and I have no reason to believe they didn't delete my data.
That's a legal requirement in the EU and many US states. Some of the genetic genealogy companies actually play fast and loose with it though - not the deletion, which I trust, but the data portability and reasons to store PI parts.
> There are also concerns about the quality of sequencing and false positives in all DTC genomics testing.
Even when the raw results are accurate there is a cottage industry of consultants and snake-oil sellers pushing bad science based on genetic testing results.
Outside of a few rare mutations, most people find their genetic testing results underwhelming or hard to interpret. Many of the SNPs come with mild correlations like “1.3X more likely to get this rare condition” which is extremely alarming to people who don’t understand that 1.3 times a very small number is still a very small number.
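To put numbers on that (purely illustrative figures, not from any real SNP):

```python
baseline = 0.002       # assume a 0.2% baseline risk of some rare condition
relative_risk = 1.3    # the scary-sounding "1.3X more likely" from the report
carrier = baseline * relative_risk
print(f"{baseline:.2%} -> {carrier:.2%}")  # 0.20% -> 0.26%
# The absolute increase is 0.06 percentage points - still a very small number.
```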
The worst are the consultants and websites that take your files and claim to interpret everything about your life or illness based on a couple SNPs. Usually it’s the famous MTHFR variants, most of which have no actual impact on your life because they’re so common. Yet there are numerous Facebook groups and subreddits telling you to spend $100 on some automated website or consultant who will tell you that your MTHFR and COMT SNPs explain everything about you and your ills, along with which supplements you need to take (through their personal branded supplement web shop or affiliate links, of course).
Yeah, the only way I would ever do DNA sequencing is anonymously...
Because of public family trees potentially linking a genome to a family, no dna is fully anonymous these days.
Note the $2,000 bill includes the DNA extraction machinery and the sequencer itself. The sequencers that Nebula et al. use probably cost over $1 million.
If you want to go even cheaper, and depending on what you want, you can go for an exome instead of a WGS. And a lot of people are sequencing when they really want genotyping.
But I would not be surprised if someone is already getting $100 WGS.
What about sequencing.com?
Yeah, but then basically somebody else gets ownership of your genetic data and the right to do anything with it in the context of their "legitimate interests". Not to mention the probability of that company getting hacked or sold, as has already happened with some.
Yes, the difference here is that the $1,000 tag is an "at-scale" price. You reach that price point by running many samples through the sequencer with each set of reagents.
Does Nebula or Dante provide BAM or just VCF?
Both do. I got mine through Dante, my wife through Nebula.
Dante includes a BAM
Unfortunately, the "MinION Starter Kit" for $1000 appears to no longer be available; the link in the article to the kit goes to a 404 page, and the cheapest MinION device with flow cells is now $4950 USD
Article was posted 2 days ago...
The article author probably bought the starter kit a while ago. It might explain why the pore count was low. It's a biological product so it degrades over time.
These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.
The brochures always showed it next to a completely non-sterile laptop, but it never made sense. It's fundamentally bio lab equipment, just small. You probably should be wiping the package with disinfectant, using DNA-cides as needed, or following whatever bioscience people consider the basic common-sense hygiene standards.
> The brochures always showed it next to a completely non-sterile laptop
This can be done in the field (read: near a lot of dirt). This does not require sterility at all. The main problems are keeping your prep clean (which is different from sterile; it primarily involves not getting bubbles where they shouldn't be, etc.) and temperature/salt handling.
> These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.
> You probably should be wiping the package with disinfectant, use DNA-cides as needed, or follow whatever bioscience people consider the basic common sense hygiene standards.
The consumable product is what needs to be stored carefully. It's delivered DNA-free; no disinfectant is needed. It's actually hard for accidental DNA to be introduced at the sequencing step; that would usually reflect poor practices earlier on.
I used Nebula (seems to be rebranded and more expensive now) for my wife and me, and for my parents and brother, and it was pretty straightforward. I paid for the 'lifetime' plan, though they removed it before we did it for anyone else; the pricing was still pretty reasonable. I downloaded the FASTQ files and stuck them in an R2 bucket for myself. Nebula cost about $250, and there's a compulsory monthly plan of $50 or something, but you can cancel it right away.
If you're curious about my genome, here are my VCF files https://my.pgp-hms.org/profile/hu81A8CC
If you want to indulge your curiosity some more:
Put that into an LLM or look it up here https://www.snpedia.com/index.php/Rs104894396 to find out which pathogenic mutation I am heterozygous for.

In practice, when my wife and I did carrier screening we didn't do it with Nebula, but carrier screening also confirmed that we had GJB2-related hearing loss genes in common. The embryos of our prospective children were also sequenced so that we could have a child without the condition.
Anyway, if you'd like a test file of a real human to play with, there's mine (from Nebula) for you to take a look at. If you use an LLM you can have some fun looking at this stuff (you can see I'm a man because there are chrY variants in there).
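If you'd rather poke at the file with code than an LLM, here's a minimal sketch (it assumes an uncompressed single-sample VCF, and "genome.vcf" is a placeholder name; real downloads are usually bgzipped, so open with gzip or pysam accordingly):

```python
from typing import Optional

def genotype_at(vcf_path: str, rsid: str) -> Optional[str]:
    """Scan a single-sample VCF for an rsID and return its GT field."""
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):  # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
            if fields[2] == rsid:
                fmt = fields[8].split(":")
                sample = fields[9].split(":")
                return sample[fmt.index("GT")]  # e.g. "0/1" = heterozygous
    return None

print(genotype_at("genome.vcf", "rs104894396"))
```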
I also used Dante because I wanted to compare the results of their sequencing and variant calling. Unfortunately, they have a different way of tying the sequence back to the user (you take the code they give you and keep it safe; Nebula has you put the stuff in a labeled container, so it's already mapped by them) and I was in a hurry with other stuff. They never responded to me with any assistance on the subject - not even to refuse the request to get the code for that address - so I have no idea how they work.
The nanopore stuff is very cool, but I heard (on Twitter) there were quality control issues with the devices. I'd love to try it some time later just to line it up with my daughter's genome.
The thermocycler replacement using an electric kettle is hilarious. That's how old-school DNA amplification was done before the invention of thermocyclers.
OP, you'd get better results if you centrifuged your blood, extracted the white blood cells, and sequenced those instead of whole blood. That's a bit tricky with a lancet and a tiny device though...
What is the practical use of having your DNA sequenced?
It revealed I am comparatively insensitive to Ritalin, which guided ADHD medication choices.
It's cool that nanopore technologies are getting this affordable, but keep in mind that these technologies (to my knowledge) still have very high error rates compared to older sequencing techniques - both in terms of individual nucleotides (A, C, G, and T) being misread, and in terms of stretches of nucleotides being mistakenly added to or removed from the resulting sequences (indels).
So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results.
With the recent R10 flow cells the error rate has improved. The basecalling models have also been steadily improving and therefore reducing the error rate.
For assembling a bacterial genome, the consensus error rate is as low as, or in some cases better than, Illumina's.
The Nanopore platform has use cases that Illumina falls short on.
> So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results.
Agreed, any at-home sequencing should not be used to draw any conclusions.
That's a prevalent misconception, even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over, so the chance that every read is wrong at the same position is more or less 0.01^(number of reads) ≈ 0.
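A quick sanity check of that intuition with a majority-vote model (assuming errors are independent, which, as noted elsewhere in the thread, they sometimes are not):

```python
from math import comb

def p_consensus_wrong(depth: int, per_read_error: float = 0.01) -> float:
    """Probability a majority vote is wrong under a binomial model,
    pessimistically assuming every erroneous read agrees on the same
    wrong base (so the call flips once errors outnumber correct reads)."""
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error)**(depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )

for depth in (1, 5, 15, 30):
    print(depth, f"{p_consensus_wrong(depth):.2e}")
# The error probability falls off extremely fast with depth.
```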
The author got less than 1x coverage for their efforts. To get reliable base calls you need significantly higher coverage, and therefore a significantly higher spend.
> That's a prevalent misconception, even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over, so the chance that every read is wrong at the same position is more or less 0.01^(number of reads) ≈ 0.
That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.
I worked with Nanopore data about four years ago, and I found that that's mostly true, but for some reason, at some sites there were systematic errors where more than half of the reads were wrong.
I can't 100% prove it wasn't a legit mutation, but our lab did several tests where we sequenced the same sample with both Illumina and Nanopore, and found Nanopore to be less than perfect even with extreme depth. Like, our depth was so high we routinely experienced overflow bugs in the assembly software because it stored the depth in a UInt16.
> That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.
Whole-genome shotgun sequencing is pretty cheap these days.
The person you are replying to doesn't give any specific numbers, but in my experience you aim for 5-20x average coverage for population-level studies, depending on the number of samples and what you are looking for, and 30x or higher for studies where individuals are important.
For context, coverage refers to the (average) number of resulting DNA sequences that cover a given position in the target genome. There is of course variation in local coverage regardless of your average coverage, and that can result in individual base calls being more or less reliable.
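A back-of-the-envelope sketch of why even decent average coverage leaves gaps (the classic Lander-Waterman idealization, assuming reads land uniformly at random; real coverage is lumpier than this):

```python
from math import exp

genome_size = 3.1e9       # human genome, in bases
bases_sequenced = 15.5e9  # enough read data for ~5x mean coverage
coverage = bases_sequenced / genome_size
# Uniform random reads => Poisson-distributed local coverage, so the
# chance that any given base is never covered is exp(-coverage).
p_uncovered = exp(-coverage)
print(f"{coverage:.1f}x mean coverage; "
      f"~{p_uncovered * genome_size / 1e6:.0f} Mb expected uncovered")
```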
I'm referring to the experiment done in the OP - the most I've read about from a MinION flow cell is 8 Gb (and that's from cell line preps with tons of DNA, so the coverage isn't great).
You need multiple flow cells or a higher capacity flow cell to get anything close to 1X on an unselected genome prep.
Shotgun sequencing probably isn't what you meant to say - this is all enzymatic or, if it's sonicated, gets size-selected.
What the person you replied to described read like short-read sequencing with PCR amplification to me ("each segment of DNA is read many times over"), rather than nanopore sequencing. My reply to you was written based on that (possibly false) assumption.
But if we are talking nanopore sequencing, then yes, you need multiple flow cells. Which is not a problem if you are not a private person attempting to sequence your own genome on the cheap.
There wasn’t enough information to tell (on my 1 minute scan) which nanopore kit was used, but the presence of PCR does not imply short reads.
You can do nanopore PCR/cDNA workflows right up to the largest known mRNAs (13kb).
Edit:
I’m not sure if you’re saying that you can’t do a 5/20/30X genome on nanopore - that’s also not true. It only makes sense in particular research settings, of course.
Dante and Nebula have a bad reputation. ySeq has an 8-month wait list. This guy's Nanopore sequencer doesn't work.
It is quite hard to get yourself sequenced in EU in 2025.
Who can do this with good data controls? I don't want to have to dig through the fine print of some Terms of Service page to figure out if a sequencing company is going to save a copy of my genetic code for possible future use.
I sequenced my genome about 10 years ago using the Illumina platform for ~1,200 AUD. We used a university sequencing facility. They were happy to extract and sequence the DNA using a shotgun approach. Depth was 5x and I think we achieved about 90% coverage. It was just for fun.
The issue with this approach is that you'll receive raw data that needs to be processed. Even after processing you'll need to do further analysis to answer your questions. After all this, I'd be suspicious of the results and would see a medical counsellor to discuss them and perform further tests.
I'd advise thinking about what questions you want answered. 'Sequencing your genome' sounds amazing, but imo you're better off seeking accredited tests with actionable results.
The really difficult part of sequencing comes after the VCF. Everything before that is trivial: you can plug your raw FASTQ data into some Nextflow pipelines and, after a couple of days of computing, get plenty of "results" to keep you busy exploring for years.
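For reference, the "trivial" part those pipelines wrap boils down to roughly these steps (a sketch with placeholder file names, driving the standard tools from Python; a real run also needs `bwa index` on the reference, plus QC and filtering):

```python
import subprocess

def run(cmd: str) -> None:
    """Echo and execute one pipeline step, stopping on failure."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

run("bwa mem ref.fa reads_1.fastq reads_2.fastq > aln.sam")  # map reads to reference
run("samtools sort -o aln.sorted.bam aln.sam")               # coordinate-sort alignments
run("samtools index aln.sorted.bam")                         # index for random access
run("bcftools mpileup -f ref.fa aln.sorted.bam"
    " | bcftools call -mv -Ov -o variants.vcf")              # call variants -> VCF
```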
Speaking of which, I would recommend Svante Pääbo's Neanderthal Man: In Search of Lost Genomes, and then, even better imho, The Naked Neanderthal by Ludovic Slimak. After these books I spent many hours listening to the full courses of Jean-Jacques Hublin, who holds the chair of paleoanthropology at the Collège de France (in French, but probably translatable now with automatic features?). This was an unexpected and wonderful path.
If I have my genomic DNA data, where can I get it analyzed? For ancestry? For health info? Etc. With privacy, of course!
Take a look at Monadic DNA:
https://monadicdna.com/
They are building Fully Homomorphic Encryption (FHE) and Multiparty Computation (MPC) tools for genetic data. Your data format may need to be modified. They currently focus on the SNP results from places like Ancestry.
Some HN posts from their CEO:
https://news.ycombinator.com/submitted?id=vishakh82
Forget any use for ancestry with privacy guarantees. All you'll get is magic "ethnicity" percentages, a kind of astrology of genealogy. For it to be useful in a genealogy context you need to rely on matching and analyzing common ancestors, which will inherently lead to your data being shared in one way or another and possibly to your identity being revealed.
https://patientuser.com
Wasn't the cost already below $1000 a few years ago?
Nanopore’s getting closer
Just wait for the Nebula Black Friday sale.
> ‘Sequencing by synthesis’. instead of chopping up and separating each base pair through a gel lattice, we [cuts off]
k
> 200 µL of blood (about ⅕ of a ml)
"About"? Anyway, thanks for the clarification.
Maybe the “about” was supposed to cover the 200 µL as well.