We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
Yes, it requires chopping the genome into small(er) pieces (than with Nanopore sequencing) and then reconstructing the genome based on a reference (and this has its issues). But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
Nanopore devices are truly cool, small, and comparatively cheap though, and you can compensate for the error rate by just sequencing everything multiple times. I'm not too familiar with the economics of this approach though.
With SBS technology you could probably sequence your whole genome 30 times (a normal "coverage") for below €1000/$1000 with a reputable company. I've seen $180, but I'm not sure I'd trust that.
> you can compensate for the error rate by just sequencing everything multiple times.
Usually, but sometimes the errors are correlated.
Overall I agree, short read sequencing is a lot more cost effective. Doing an Illumina whole genome sequence for cell line quality control (at my startup) costs $260 in total.
> But Nanopore sequencing is still far from perfect due to its high error rate. Any clinical sequencing is still done using sequencing by synthesis (at which Illumina has gotten very good over the past decade).
There is no reason for Nanopore to supplant sequencing-by-synthesis for short reads - that's largely solved and getting cheaper all the while.
The future clinical utility will be in medium- and large-scale variation. We don't understand this in the clinical setting nearly as well as we understand SNPs. So Nanopore is being used in the research setting and to diagnose individuals with very rare genetic disorders.
(edit)
> We are not “in the nanopore era of sequencing”. We are (still) firmly in the sequencing by synthesis era.
I also strongly disagree.
SBS is very reliable, but it's commonplace (if Toyota is the most popular car, does that mean we're in the Toyota internal combustion era? Or can Waymo still matter despite its small footprint?).
Novelty in sequencing is coming from ML approaches, RNA-DNA analysis, and combining long- and short-read technologies.
I agree with you. Long reads lead to new insights and, over time, to better diagnoses by providing a better understanding of large(r)-scale aberrations, and as the tech gets better it will be able to do so more easily. But it's really not there yet. It's mostly research, and I get the feeling it's somehow not improving as much as hoped.
Nanopore is good for hybrid sequencing. You can align the higher-quality Illumina reads against its longer contiguous reads.
I’ve always wondered how the reconstruction works.
It would be difficult to break a modest program into basic blocks and then reconstruct it. Same with paragraphs in a book.
How does this work with DNA?
There are two ways: assembly by mapping, and de novo assembly.
If you already have a human genome file, you can take each DNA piece and map it to its closest match in the genome. If you can cover the whole genome this way, you are done.
The alternative way is to exploit overlaps between DNA fragments. If two 1000 bp pieces overlap by 900 base pairs, that's probably because they come from two 1000 bp regions of your genome that overlap by 900 base pairs. You can then merge the pieces. By iteratively merging millions of fragments you can reconstruct the original genome.
Both these approaches are surprisingly and delightfully deep computational problems that have been researched for decades.
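A toy version of the overlap-merge idea, just to make it concrete (a hypothetical greedy sketch in Python; real assemblers use overlap/de Bruijn graphs and have to handle sequencing errors and reverse complements):

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if i is None:  # no remaining overlaps above the threshold
            break
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# "GATTACAGA" chopped into overlapping reads:
print(greedy_assemble(["GATTAC", "TTACAG", "ACAGA"]))  # -> ['GATTACAGA']
```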
They exploit the fact that so much of our DNA is the same. They basically have the book with no typos, or rather with only the typos they've decided to call canonical.
So given a short sentence excerpt, even with a few errors thrown in, partial string matching is usually able to figure out where in the book it was likely from. Sometimes there may be more possibilities, but then you can look at overlaps and count how many times a particular variant appears in one context vs. another.
One problem is, DNA contains a lot of copies and repetitive stretches, as if the book had "all work and no play makes Jack a dull boy" repeated end to end for a couple of pages. Then it can be hard to place where the variant actually is. Longer reads help with this.
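A tiny sketch of the "place the excerpt in the book" step, including the repeat ambiguity (hypothetical brute force; real mappers use compressed indexes rather than scanning, and also allow indels, not just mismatches):

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def map_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Return every position where `read` matches `reference` with few errors."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        if hamming(read, reference[pos:pos + len(read)]) <= max_mismatches:
            hits.append(pos)
    return hits

ref = "ACGTACGTTAGCCGATTAGCCGA"    # contains a repeated stretch
print(map_read("TAGCCGA", ref))    # -> [8, 16]: read sits in a repeat, two possible homes
print(map_read("GATTA", ref))      # -> [13]: unique placement
```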
This is very easily googled. There are new algorithmic advances for new kinds of sequencing data, but this is the key idea (published in the '90s):
https://en.wikipedia.org/wiki/Burrows–Wheeler_transform
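The transform itself is tiny; here is a naive sketch (production indexes build it via suffix arrays and add auxiliary tables so reads can be searched against it quickly):

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + "$"  # sentinel marks the end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

# Equal letters tend to cluster in the output, which makes the
# transformed genome highly compressible and indexable:
print(bwt("GATTACA"))  # -> 'ACTGA$TA'
```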
You can get it pretty damn cheap if you are willing to send your biological data overseas. Nebula genomics and a lot of other biotechs do this by essentially outsourcing to China. There's no particular technology secret, just cheaper labor and materials.
Can you trust it though? It'd be trivially easy to do a 1x read, maybe 2x, and then fake the other 28 reads. And it'd be hard to catch someone doing this without doing another 30x read from someone you trust. There's famously a lot of cheating in medical research, it would be odd if everyone stopped the moment they left academia (there have been scandals with forensic labs cheating too, now that I think about it).
Interesting concept, but between the broken hardware and the way they gave up before getting anything useful this article was rather disappointing:
> Another problem was our flow cell was malfunctioning from the start — only 623 out of 2048 pores were working.
Is this normal for the machine? Is there a better write up somewhere where they didn’t give up immediately after one attempt?
I suspect the authors read the number of active pores during sequencing and then wrongly assumed that the non-active ones had a manufacturing defect.
In my experience, most inactive pores are due to a poorly prepared sample. I don't know why, but maybe it blocks or jams the pores.
When I analyzed Oxford Nanopore data a few years ago, I found it to be very sensitive to skilled sample preparation. The data quality varied so much that I could tell which of my lab-technician co-workers (the experienced one or the new one) had prepared the sample just by analyzing the data. So I suspect the authors' garage sample prep wasn't great.
Coincidentally, I had a colleague who worked on building a portable sequencing lab powered by a car battery. The purpose was to be able to identify viruses by DNA from a van in rural Central Africa or wherever. Last I talked to her, the technical bottleneck was sample prep - the computational part of the van lab wasn't too hard.
Hi, believe it or not, I have actually done what the authors were attempting. I used saliva rather than blood as a source of DNA and extracted it using a Qiagen kit.
My Nanopore flow cell had nearly every pore working from the start. So I would say that is not normal. Maybe it was stored incorrectly.
Do you have a write up somewhere? If not, it would be amazing if you wrote one!
I was planning on doing a similar thing (also with saliva) once I finished moving in and had a bit more time after conferences. (But, of course, I’d have to go through and actually figure out all of the mechanics and so on.)
> Is this normal for the machine?
No, it's not "normal," but it is fairly common. When I worked in NGS, nearly 1/4 of flow cells were duds. ONT used to have a policy where you could return the cell and get a new one if it failed its self-test.
I think it was pretty interesting in a "what would likely happen if you tried this" way. Negative results are good. A lot of technical problems is what I'd expected though, from my little experience in genetic genealogy.
It depends on the sample. Usually you have at least 1,200 active pores, with a guarantee of at least 800, so maybe he could ask for a refund.
Like most analytical methods, the preparation of the sample is key. High quality output comes with careful sample prep so that the analytical process can run optimally.
The graph at the beginning, showing the cost of sequencing over time falling faster than Moore's law, stops in 2015. I'd love to see how things have progressed since then. Casually googling, I only saw plots up to 2021, but it looks to me like progress has been slower than Moore's law since ~2015. Maybe things will change when Nanopore gets more reliable.
The graph also only starts in 2001. I worked as a student at the EMBL (European Molecular Biology Laboratory) in the biophysics instrumentation group in the mid-'90s. The lab group was developing prototypes of thin-film electrophoresis DNA sequencers. Pharmacia Biotech then bought some of the tech and brought it to market. AFAIR they were some of the fastest sequencers at the time, but we are talking about low 100s of base pairs per day.
Nebula and Dante will do this for like $300, and you can get 30x coverage at every base or even 100x coverage if you pay a little more. The $1000 genome was here more than a decade ago.
I wanted to try this, but I looked into Nebula a bit more.
Nebula is facing a class action for apparently disclosing detailed genomic data to Meta, Microsoft & Google. The subreddit is also full of reports of people who never received their results years after sending their kits back. There are also concerns about the quality of sequencing and false positives in all DTC genomics testing. Given what happened with 23andme as well and all of this stuff, I'm wary of sending my genetic data to any private company.
I was interested to read this because some time ago I had my genome sequenced by Nebula. If you look at the lawsuit you can see that what Nebula did was use off-the-shelf third-party analytics products on their website, including recording analytics pings when users buy a kit, and pings when users use the Nebula website to browse Nebula's high-level analysis of their traits (leaking that the user has those traits to the analytics provider.)
This behavior represents a contemptible lack of respect for users' privacy, but it's important to distinguish it from Nebula selling access to users' genomes.
https://www.classaction.org/media/portillov-nebula-genomics-...
Another point is that Wojcicki's big idea, that all this genetic data would be valuable to sell to businesses, didn't work out so well. For an advertiser, it's a lot more useful to know that you're a smoker than to know that you have a 40% higher chance of being a smoker.
That's a good clarification. I read through some of that link, and it does look relatively benign - Meta & Google pixels might see when you buy a kit but nothing more - but on page 21 they directly leaked genetic information to Microsoft via their Clarity tracker. Maybe not intentionally, and it's questionable whether it can be linked to a specific person rather than just an advertising ID, but they did leak it. I think the lawsuit says that even disclosing whether a person has undergone genetic testing violates GIPA, so the information they sent to all three is enough to violate that.
I don't have any evidence they're selling anything but that lawsuit shows pretty sloppy behaviour for a company that should be thinking very deeply about privacy. I guess that's about what you said though :)
The point isn't what they are doing with your data now, but that they retain your data and what might happen in the future. Someone with malicious designs on your DNA might buy Nebula tomorrow and there's nothing you can do about it.
Actually, the main reason I used Nebula was that they advertised a credible-to-me promise that you could download and permanently delete your data upon request. That was some years ago, so I don't know if I would trust them today. But that was their claim, and I have no reason to believe they didn't delete my data.
That's a legal requirement in the EU and many US states. Some of the genetic genealogy companies actually play fast and loose with it though - not the deletion, which I trust, but the data portability and reasons to store PI parts.
> There are also concerns about the quality of sequencing and false positives in all DTC genomics testing.
Even when the raw results are accurate there is a cottage industry of consultants and snake-oil sellers pushing bad science based on genetic testing results.
Outside of a few rare mutations, most people find their genetic testing results underwhelming or hard to interpret. Many of the SNPs come with mild correlations like “1.3X more likely to get this rare condition” which is extremely alarming to people who don’t understand that 1.3 times a very small number is still a very small number.
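To put numbers on that (purely illustrative figures, not from any real SNP):

```python
baseline = 0.002       # assume a 0.2% baseline risk of some rare condition
relative_risk = 1.3    # the scary-sounding "1.3X more likely" from the report
carrier = baseline * relative_risk
print(f"{baseline:.2%} -> {carrier:.2%}")  # 0.20% -> 0.26%
# The absolute increase is 0.06 percentage points - still a very small number.
```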
The worst are the consultants and websites that take your files and claim to interpret everything about your life or illness based on a couple SNPs. Usually it’s the famous MTHFR variants, most of which have no actual impact on your life because they’re so common. Yet there are numerous Facebook groups and subreddits telling you to spend $100 on some automated website or consultant who will tell you that your MTHFR and COMT SNPs explain everything about you and your ills, along with which supplements you need to take (through their personal branded supplement web shop or affiliate links, of course).
Yeah, the only way I would ever do DNA sequencing is anonymously...
Because of public family trees potentially linking a genome to a family, no dna is fully anonymous these days.
Note the $2,000 bill includes the DNA extraction machinery and the sequencer itself. The sequencers that Nebula et al. use probably cost over $1 million.
If you want to go even cheaper, and depending on what you want, you can go for an exome instead of a WGS. And a lot of people are sequencing when they really want genotyping.
But I would not be surprised if someone is already getting $100 WGS.
What about sequencing.com?
Yeah, but then basically somebody else gets ownership of your genetic data and the right to do anything with it in the context of their "legitimate interests". Not to mention the probability of that company getting hacked or sold, as has already happened with some.
Yes, the difference here is that the $1,000 tag is an "at-scale" price. You reach that price point by running many samples through the sequencer with each set of reagents.
Does Nebula or Dante provide BAM or just VCF?
Both do. I got mine through Dante, my wife through Nebula.
Dante includes a BAM
Unfortunately, the "MinION Starter Kit" for $1000 appears to no longer be available; the link in the article to the kit goes to a 404 page, and the cheapest MinION device with flow cells is now $4950 USD
Article was posted 2 days ago...
The article author probably bought the starter kit a while ago. It might explain why the pore count was low. It's a biological product so it degrades over time.
These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.
The brochures always showed it next to a completely non-sterile laptop, but it never made sense. It's fundamentally bio lab equipment, just small. You probably should be wiping the package with disinfectant, using DNA-cides as needed, or following whatever bioscience people consider the basic common-sense hygiene standards.
> The brochures always showed it next to a completely non-sterile laptop
This can be done in the field (read: near a lot of dirt). This does not require sterility at all. The main problems are keeping your prep clean (which is different from sterile; it primarily involves not getting bubbles where they shouldn't be, etc.) and temperature/salt handling.
> These are by no means a new product. I think the early prototypes for these possibly predate the microUSB plug.
> You probably should be wiping the package with disinfectant, use DNA-cides as needed, or follow whatever bioscience people consider the basic common sense hygiene standards.
The consumable product is what needs to be stored carefully. It's delivered DNA-free; no disinfectant is needed. It's actually hard for accidental DNA to be introduced at the sequencing step; that would usually reflect poor practices earlier on.
I used Nebula (seems to be rebranded and more expensive now) for my wife and me, and for my parents and brother, and it was pretty straightforward. I paid for the 'lifetime' plan, though they removed it before we did it for anyone else; the pricing was still pretty reasonable. I downloaded the FASTQ files and stuck them in an R2 bucket for myself. Nebula cost about $250, and there's a compulsory monthly plan of $50 or something, but you can cancel it right away.
If you're curious about my genome, here are my VCF files https://my.pgp-hms.org/profile/hu81A8CC
If you want to indulge your curiosity some more:
Put that into an LLM or look it up here https://www.snpedia.com/index.php/Rs104894396 to find out which pathogenic mutation I am heterozygous for.

In practice, when my wife and I did carrier screening we didn't do it with Nebula, but carrier screening also confirmed that we had GJB2-related hearing loss genes in common. The embryos of our prospective children were also sequenced so that we could have a child without the condition.
Anyway, if you'd like a test file of a real human to play with, there's mine (from Nebula) for you to take a look at. If you use an LLM you can have some fun looking at this stuff (you can see I'm a man because there are chrY variants in there).
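If you'd rather poke at the file with code than an LLM, here's a minimal sketch (it assumes an uncompressed single-sample VCF, and "genome.vcf" is a placeholder name; real downloads are usually bgzipped, so open with gzip or pysam accordingly):

```python
from typing import Optional

def genotype_at(vcf_path: str, rsid: str) -> Optional[str]:
    """Scan a single-sample VCF for an rsID and return its GT field."""
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("#"):  # skip header lines
                continue
            fields = line.rstrip("\n").split("\t")
            # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
            if fields[2] == rsid:
                fmt = fields[8].split(":")
                sample = fields[9].split(":")
                return sample[fmt.index("GT")]  # e.g. "0/1" = heterozygous
    return None

print(genotype_at("genome.vcf", "rs104894396"))
```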
I also used Dante because I wanted to compare the results of their sequencing and variant calling. Unfortunately, they have a different way of tying the sequence back to the user (you take the code they give you and keep it safe; Nebula has you put the stuff in a labeled container, so it's already mapped by them) and I was in a hurry with other stuff. They never responded to me with any assistance on the subject - not even to refuse the request to get the code for that address - so I have no idea how they work.
The nanopore stuff is very cool, but I heard (on Twitter) there were quality control issues with the devices. I'd love to try it some time later just to line it up with my daughter's genome.
The thermocycler replacement using an electric kettle is hilarious. That's how old-school DNA amplification was done before the invention of thermocyclers.
OP, you'd get better results if you centrifuged your blood, extracted the white blood cells, and sequenced those instead of whole blood. That's a bit tricky with a lancet and a tiny device though...
What is the practical use of having your DNA sequenced?
It revealed I am comparatively insensitive to Ritalin, which guided ADHD medication choices.
It's cool that nanopore technologies are getting this affordable, but keep in mind that these technologies (to my knowledge) still have very high error rates compared to older sequencing techniques - both in terms of individual nucleotides (A, C, G, and T) being misread, and in terms of stretches of nucleotides being mistakenly added to or removed from the resulting sequences (indels).
So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results.
With the recent R10 flow cells the error rate has improved. The basecalling models have also been steadily improving and therefore reducing the error rate.
For assembling a bacterial genome, the consensus error rate is as low as, or in some cases better than, Illumina's.
The Nanopore platform has use cases that Illumina falls short on.
> So, yes, you can sequence your genome relatively cheaply using these technologies at home, but you won't be able to draw any conclusions from the results.
Agreed, any at-home sequencing should not be used to draw any conclusions.
That's a prevalent misconception, even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over, so the chance that every read is wrong at the same position is more or less 0.01^(number of reads) ≈ 0.
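A quick sanity check of that intuition with a majority-vote model (assuming errors are independent, which, as noted elsewhere in the thread, they sometimes are not):

```python
from math import comb

def p_consensus_wrong(depth: int, per_read_error: float = 0.01) -> float:
    """Probability a majority vote is wrong under a binomial model,
    pessimistically assuming every erroneous read agrees on the same
    wrong base (so the call flips once errors outnumber correct reads)."""
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error)**(depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )

for depth in (1, 5, 15, 30):
    print(depth, f"{p_consensus_wrong(depth):.2e}")
# The error probability falls off extremely fast with depth.
```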
The author got less than 1x coverage for their efforts. To get reliable base calls you need significantly higher coverage, and therefore a significantly higher spend.
> That's a prevalent misconception, even in the scientific community. Sure, each read has 1% incorrect bases (0.01). But each segment of DNA is read many times over, so the chance that every read is wrong at the same position is more or less 0.01^(number of reads) ≈ 0.
That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.
I worked with Nanopore data about four years ago, and I found that that's mostly true, but for some reason, at some sites there were systematic errors where more than half of the reads were wrong.
I can't 100% prove it wasn't a legit mutation, but our lab did several tests where we sequenced the same sample with both Illumina and Nanopore, and found Nanopore to be less than perfect even with extreme depth. Like, our depth was so high we routinely experienced overflow bugs in the assembly software because it stored the depth in a UInt16.
> That's true in targeted sequencing, but when you try to sequence a whole genome, this is unlikely.
Whole-genome shotgun sequencing is pretty cheap these days.
The person you are replying to doesn't give any specific numbers, but in my experience you aim for 5-20x average coverage for population-level studies, depending on the number of samples and what you are looking for, and 30x or higher for studies where individuals are important.
For context, coverage refers to the (average) number of resulting DNA sequences that cover a given position in the target genome. There is of course variation in local coverage regardless of your average coverage, and that can result in individual base calls being more or less reliable.
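A back-of-the-envelope sketch of why even decent average coverage leaves gaps (the classic Lander-Waterman idealization, assuming reads land uniformly at random; real coverage is lumpier than this):

```python
from math import exp

genome_size = 3.1e9       # human genome, in bases
bases_sequenced = 15.5e9  # enough read data for ~5x mean coverage
coverage = bases_sequenced / genome_size
# Uniform random reads => Poisson-distributed local coverage, so the
# chance that any given base is never covered is exp(-coverage).
p_uncovered = exp(-coverage)
print(f"{coverage:.1f}x mean coverage; "
      f"~{p_uncovered * genome_size / 1e6:.0f} Mb expected uncovered")
```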
I'm referring to the experiment done in the OP - the most I've read about from a MinION flow cell is 8 Gb (and that's from cell line preps with tons of DNA, so the coverage isn't great).
You need multiple flow cells or a higher capacity flow cell to get anything close to 1X on an unselected genome prep.
Shotgun sequencing probably isn't what you meant to say - this is all enzymatic or, if it's sonicated, gets size-selected.
What the person you replied to described read like short-read sequencing with PCR amplification to me ("each segment of DNA is read many times over"), rather than nanopore sequencing. My reply to you was written based on that (possibly false) assumption.
But if we are talking nanopore sequencing, then yes, you need multiple flow cells. Which is not a problem if you are not a private person attempting to sequence your own genome on the cheap.
There wasn’t enough information to tell (on my 1 minute scan) which nanopore kit was used, but the presence of PCR does not imply short reads.
You can do nanopore PCR/cDNA workflows right up to the largest known mRNAs (13kb).
Edit:
I’m not sure if you’re saying that you can’t do a 5/20/30X genome on nanopore - that’s also not true. It only makes sense in particular research settings, of course.
Dante and Nebula have a bad reputation. ySeq has an 8-month wait list. This guy's Nanopore sequencer doesn't work.
It is quite hard to get yourself sequenced in EU in 2025.
Who can do this with good data controls? I don't want to have to dig through the fine print of some Terms of Service page to figure out if a sequencing company is going to save a copy of my genetic code for possible future use.
I sequenced my genome about 10 years ago using the Illumina platform for ~1,200 AUD. We used a university sequencing facility. They were happy to extract and sequence the DNA using a shotgun approach. Depth was 5x and I think we achieved about 90% coverage. It was just for fun.
The issue with this approach is that you'll receive raw data that needs to be processed. Even after processing you'll need to do further analysis to answer your questions. After all this, I'd be suspicious of the results and would see a medical counsellor to discuss them and perform further tests.
I'd advise thinking about what questions you want answered. 'Sequencing your genome' sounds amazing, but imo you're better off seeking accredited tests with actionable results.
The really difficult part of sequencing comes after the VCF. Everything before that is trivial: you can plug your raw FASTQ data into some Nextflow pipelines and, after a couple of days of computing, get plenty of "results" to keep you busy exploring for years.
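For reference, the "trivial" part those pipelines wrap boils down to roughly these steps (a sketch with placeholder file names, driving the standard tools from Python; a real run also needs `bwa index` on the reference, plus QC and filtering):

```python
import subprocess

def run(cmd: str) -> None:
    """Echo and execute one pipeline step, stopping on failure."""
    print(f"+ {cmd}")
    subprocess.run(cmd, shell=True, check=True)

run("bwa mem ref.fa reads_1.fastq reads_2.fastq > aln.sam")  # map reads to reference
run("samtools sort -o aln.sorted.bam aln.sam")               # coordinate-sort alignments
run("samtools index aln.sorted.bam")                         # index for random access
run("bcftools mpileup -f ref.fa aln.sorted.bam"
    " | bcftools call -mv -Ov -o variants.vcf")              # call variants -> VCF
```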
Speaking of which, I would recommend Svante Pääbo's Neanderthal Man: In Search of Lost Genomes, and then, even better imho, The Naked Neanderthal by Ludovic Slimak. After these books I spent many hours listening to the full courses of Jean-Jacques Hublin, who holds the chair of paleoanthropology at the Collège de France (in French, but probably translatable now with automatic features?). This was an unexpected and wonderful path.
If I have my genomic DNA data, where can I get it analyzed? For ancestry? For health info? Etc. With privacy, of course!
Take a look at Monadic DNA:
https://monadicdna.com/
They are building Fully Homomorphic Encryption (FHE) and Multiparty Computation (MPC) tools for genetic data. Your data format may need to be modified. They currently focus on the SNP results from places like Ancestry.
Some HN posts from their CEO:
https://news.ycombinator.com/submitted?id=vishakh82
Forget any use for ancestry with privacy guarantees. All you'll get is magic "ethnicity" percentages, a kind of astrology of genealogy. For it to be useful in a genealogy context you need to rely on matching and analyzing common ancestors, which will inherently lead to your data being shared in one way or another and possibly to your identity being revealed.
https://patientuser.com
Wasn't the cost already below $1000 a few years ago?
Nanopore’s getting closer
Just wait for the Nebula Black Friday sale.
> ‘Sequencing by synthesis’. instead of chopping up and separating each base pair through a gel lattice, we [cuts off]
k
> 200 µL of blood (about ⅕ of a ml)
"About"? Anyway, thanks for the clarification.
Maybe the “about” was supposed to cover the 200 µL as well.