tag:blogger.com,1999:blog-23988675 2010-03-15T17:25:15.240-07:00<br /><br />fejes.ca<br /><br />Anthony's new blog can be found at http://blogs.nature.com/fejes
While writing this blog, Anthony was a PhD Candidate at the University of British Columbia working at the BC Cancer Agency's Genome Sciences Centre. His area of interest is in second generation sequencing applications and bioinformatics algorithm development.<br /><br />Anthony Fejes apfejes@gmail.com Blogger381125<br /><br />tag:blogger.com,1999:blog-23988675.post-3213671616283807658 2010-03-15T16:45:00.003-07:00 2010-03-15T17:04:21.368-07:00<br /><br />Last Post<br /><br />I'm not sure where "here" really is... but here we are, at the last post of my fejes.ca blog. My 391st blog post! That's actually not so bad for 3 and a half years, now that I think of it.<br /><br />Anyhow, with Google cutting off FTP publishing, it was time for a move to greener pastures, which in this case is Nature Blogs. I intend to continue discussing the same topics and to continue posting stuff relevant to next-gen sequencing.... or 2nd or 3rd generation, whichever one we're now in. I hope people who've been reading my blog come along and continue commenting at the new site. <br /><br />For the moment, the new site is a bit plain - Nature is still setting things up, and I'm sure it'll take some time for things to become comfortable over there. I'm going to miss my visitor map, my tag cloud and my many anonymous comments - but these are things I can push for at Nature. They've been fairly responsive to my requests, and hopefully Nature Blogs will continue to evolve. Despite the plain looks, I'm expecting great things.<br /><br />Before I start waxing sentimental about leaving fejes.ca, I'll just invite you to come join me over at my new blog: <a href="http://blogs.nature.com/fejes">http://blogs.nature.com/fejes</a>.<br /><br />Finally, a few last words with which I would like to end this blog:<br /><br />Thanks to everyone who's read my blog, thanks to those who commented on my posts, and most especially, thanks to everyone who supported me and encouraged me to continue posting. All of it has meant a lot to me. 
(=<br /><br />Cheers,<br /><br />Anthony Fejes<br /><br />Anthony Fejes apfejes@gmail.com 0<br /><br />tag:blogger.com,1999:blog-23988675.post-3579914702527440283 2010-03-11T11:27:00.003-08:00 2010-03-11T11:45:40.969-08:00<br /><br />Wolfram Alpha recreates Ensembl?<br /><br />Ok, this might not be my most coherent post - I'm finally getting better from being sick for a whole week, which has left my brain feeling somewhat... spongy. Several of us AGBT-ers have come down with something after getting back, and I have a theory that it was something in the food we were given.... Maybe someone slipped something into the food to slow down research at the GSC??? (-; [insert conspiracy theory here.]<br /><br />Anyhow, I just received a link [aka spam] from Wolfram Alpha, via a posting on LinkedIn, letting me know all about their great new product: Wolfram Alpha now has genome information! <br /><br />Somehow, looking at their quick demo, I'm somewhat less than impressed. Here's the link, if you'd like to check it out yourself: <a href="http://blog.wolframalpha.com/2010/03/10/did-you-know-that-wolframalpha-knows-your-dna/">Wolfram Alpha Blog Post (Genome)</a><br /><br />I'm unimpressed for two reasons: the first is that there are TONS of other resources that do this - and apparently do it better, from the little I've seen on the blog. For the moment, they have 11 genomes in there, which they hope to expand in the future. I'm going to have to look more closely, if I find the motivation, as I might be missing something, but I really don't see much that I can't do in the <a href="http://genome.ucsc.edu/">UCSC genome browser</a> or the <a href="http://www.ensembl.org/">Ensembl</a> web page. 
The second thing is that I'm still unimpressed by Wolfram Alpha's insistence that it's more than just a search engine, and that if you use it to answer a question, you need to cite it.<br /><br />I'm all in favour of using really cool algorithms, and searches are no exception. [I don't think I've mentioned this to anyone yet, but if you get a chance check out <a href="http://unlimiteddetailtechnology.com/">Unlimited Detail</a>'s use of search engine optimization to do <span style="font-weight: bold; font-style: italic;">unbelievable</span> 3D graphics in real time.] However, if you're going to send links boasting about what you can do with your technology, do something other people can't do - and be clear what it is. From what I can tell, this is just a mash-up meta analysis of a few small publicly available resources. It's not like we don't have other engines that do the same thing, so I'm wondering what it is that they think they do that makes it worth going there for... anyone?<br /><br />Worst of all, I'm not sure where they get their information from... where do they get their SNP calls from? How can you trust that, when you can't even trust dbSNP?<br /><br />Anyhow, for the moment, I'll keep using resources that I can cite specifically, instead of just citing Wolfram Alpha... I don't know how reviewers would take it if I cured cancer... and cited Wolfram as my source.<br /><br />Happy searching, people!<br /><br />Anthony Fejes apfejes@gmail.com 2<br /><br />tag:blogger.com,1999:blog-23988675.post-7370011427426957589 2010-03-01T14:54:00.003-08:00 2010-03-01T14:59:35.220-08:00<br /><br />Link to all my AGBT 2010 Notes.<br /><br />One last blog for today.<br /><br />If you're looking for the complete list of my AGBT 2010 notes, look no further. The link below has the full list of talks and workshops I attended. 
I haven't indexed it, but if you search for "AGBT 2010" within the page, it should take you to the next header/footer in reverse chronological order of the notes I took. Cheers!<br /><br /><a href="http://fejes.ca/labels/AGBT%202010.html">AGBT 2010 notes.</a><br /><br />Anthony Fejes apfejes@gmail.com 1<br /><br />tag:blogger.com,1999:blog-23988675.post-3828514342733426570 2010-03-01T13:31:00.003-08:00 2010-03-01T14:50:33.388-08:00<br /><br />AGBT wrap up.<br /><br />So, everyone else has weighed in with their reviews of AGBT 2010 already, and as usual, I'm probably one of the last to write anything down. Perhaps the extreme carpal tunnel syndrome I've exposed myself to by typing out my notes should suffice as an excuse...<br /><br />Anyhow, I wanted to put down a few thoughts on what I saw, heard and discussed before I forget what I wanted to say.<br /><br />First off, I know everyone has commented on the new technologies already. I'm very disappointed that I wasn't able to see the Ion Torrent presentation, and that I missed the presentation from Life Technologies. Those were two of the biggest hits, and I didn't see either of them. While I did get a quick introduction on the Life Technologies platform from a rep in the Life Tech suite, it's not quite the same.<br /><br />However, I was there for several of the other workshops and launches, and in particular, the Pacific Biosciences workshop. In general, I think Pac Bio has been served up a lot of criticism for failing to disclose the exact error rate of their Single Molecule Real-Time (SMRT) sequencing platform, as well as for some of the problems they face. Personally, I'm not inclined to think of any of that as a failure - simply as engineering problems. Having worked on early 454 data, I saw flaws that were just as disastrous as the challenges that Pac Bio now faces. 
Much of the criticism is simply directed at the fact that this is measuring single molecules of DNA, and not clusters. Clearly, there will be challenges for them to overcome: The most obvious are that PacBio will have to lower the wattage of their light source and they'll likely have to do some directed evolution (or even rational design) to lower the frequency at which bases are incorporated too quickly to be read, or possibly come up with a chemistry solution. (More viscous solutions? who knows.) All of the 2nd generation platforms were launched with problems - and Pac Bio certainly isn't the exception to it. Each one gets better over time, and I'm certain PacBio will continue to improve. For the moment, they've suggested protocols like sequencing circular DNA that dramatically reduce the error rate, so these issues aren't nearly as big as the hype makes them out to be.<br /><br />Just to finish off on the subject of SMRT sequencing, I think Elaine Mardis' presentation on the results obtained with PacBio wasn't outstanding. Normally, I get really jealous about PacBio results, and wish that I could get my hands on some of them - but this time, I was left a little flat. While there are really neat applications for single molecule sequencing, Human SNPs really aren't one of them. Why they chose to present that particular problem is somewhat beyond me. Not that the presentation was bad, but it failed to really showcase what the platform can be used for, IMHO. Their other presentations (SMRT Biology, for example) were pretty damn cool.<br /><br />There has also been much talk about Complete Genomics, and how they're not going to make it, which I've already written up in the previous post. I see that as a failure to understand their business model and to understand who they're competing with (i.e., not the other sequencing companies.) 
I expect that they'll be the microarrays of the future - cheap diagnostic tools, with even better repeatability than your average microarray. I don't think they should be written off just yet.<br /><br />Finally, there has been much ado about the HiSeq 2000(tm), released by Illumina. While I have nothing against it (and am even looking forward to it), I don't see it as much more than an upgraded version of their last machine, the GAIIx. They've changed the form factor and the shape of the flow cell, and then enabled some things that were previously disabled (such as two sided tile scanning); it's really just an evolutionary change in a new box, which will allow them more room to grow the platform. Fair enough, really - I don't know how many more upgrades you could put into one of their original boxes, but there's nothing really new here that would have me running after them to get one. I should mention, however, that increased throughput and lower cost ARE significant and a good thing - they just don't appeal to my geeky fascination for new technology.<br /><br />Another criticism I heard was that these companies shouldn't be calling their tech "3rd generation." Frankly, I've been advocating since last year that they SHOULD be called 3rd generation, so that criticism seems silly, to say the least. Pyrosequencing is clearly synonymous with the 2nd generation of sequencing technologies, while Sanger sequencing is clearly first generation, and hybridization is kind of zero-th generation (although you could make a case for SOLiD being 2nd generation, which would then drag Complete Genomics into that group as well). However, the defining characteristic of 3rd generation, to me, is the move away from sequencing ensembles of molecules. An auxiliary definition is that it's also the application of enzymes to do the sequencing itself. 
So, I'm just going to have to laugh at those who claim that 2nd and 3rd generation are all generically "next-generation" sequencing. There is a clear boundary between the two sets of technologies.<br /><br />A topic I also wanted to mention was the use of technology at AGBT this year. Frankly, I was blown away by the coverage of all of the events through twitter. I enjoyed at least one talk where I left twitter open beside my text editor, and tried to keep notes while listening to what the speaker had to say, while watching the audience's comments. If I hadn't been blogging, I think that would be the best way to engage. Insightful comments and questions were plentiful, and having people I respect discuss the topic was akin to having other scientists leave comments in the margins of a paper you're reading. [Somewhat like reading Sun Tzu's Art of War, where there are more annotations than original material, at some points.] Alas, it was too distracting to compile notes while reading comments, but it was really cool. Unfortunately, Internet coverage was spotty at best, and in some rooms, I wasn't able to get any signal at all. The venue is great, but just not equipped for the 21st century scientist. Had I been there at the end of the conference, I would have suggested that perhaps it's time to identify an alternate venue that can handle the larger crowds, as well as the technological demands of an audience that has 300+ laptop computers going at once. (Don't get me started on electrical outlets.)<br /><br />I'd like to end on a few good points. <br /><br />The poster session was excellent - too short, as always, but the quality of the posters was outstanding, and I had fantastic conversations with a lot of scientists. I won't mention them by name, but I'm sure they know who they are. I saw several tools I'll try to follow up on. (By the way, if anyone was looking for me, I spent less than 20 minutes by my poster throughout the conference. 
There just wasn't enough time to read all of them and still answer questions and absorb everything out there. Sorry about that - feel free to email me if you have questions.)<br /><br />I should also mention that the vendors were all very hospitable. One of my enduring memories of this year will be Life Technologies allowing the Canadians to crash their suite and use one of their demo TVs to watch the semi-final Olympic hockey game. (Canada vs. Slovakia.) We were desperately outnumbered by non-Canadians, but they tolerated our screaming pretty well. (A few of them even seemed curious about this weird sport played on ice...) And, of course, anyone who saw my tweets knows about PacBio and the Hawaiian shirt, just to name a few examples (-;<br /><br />So, again, I think AGBT was a great success and I enjoyed it tremendously. Rarely in my life do I get to pack so many talks, discussions and networking into such a short period of time. It may have left me looking somewhat like a deer caught in the headlights, but unquestionably I'm already looking forward to what will be revealed next year.<br /><br />Anthony Fejes apfejes@gmail.com 8<br /><br />tag:blogger.com,1999:blog-23988675.post-6441007805011375806 2010-02-28T09:04:00.002-08:00 2010-03-02T10:48:40.754-08:00<br /><br />Complete Genomics, Revisited (Feb 2010)<br /><br />While I'm writing up my notes on my way back to Vancouver, I thought I'd include one more set of notes - the ones I took while talking to the Complete Genomics team.<br /><br />Before launching into my notes (which won't really be in note form), I should give the backstory on how this came to be. Normally, I don't do interviews, and I was very hesitant about doing one this time. 
In fact, the format came out more like a chat, so I don't mind discussing it - with Complete Genomics' permission.<br /><br />Going back about a week or so, I received an email from someone working on PR for Complete Genomics, inviting me to come talk with them at AGBT. They were aware of my blog post from last year, written after discussing some aspects of their company with several members of the Complete Genomics team. <br /><br />I suppose in the world of marketing, any publicity is good publicity, and perhaps they were looking for an update for the blog entry. Either way, I was excited to have an opportunity to speak with them again, and I'm definitely happy to write what I learned. I won't have much to contribute beyond what they've discussed elsewhere, but hey, not everything has to be new, right?<br /><br />In the world of sequencing, who is Complete Genomics? They're clearly not 2nd generation technology. Frankly, their technology is the dinosaur in the room. While everyone else is working on single molecule sequencing, Complete Genomics is using technology from the stone age of sequencing - and making it work. <br /><br />Their technology doesn't have any bells and whistles - and in fact, the first time I saw their ideas, I was fairly convinced that it wouldn't be able to compete in the world of the Illuminas and Pac Bios... and all the rest. Actually, I think I was right. What I didn't know at the time was that they don't need to compete. They're clearly in their own niche - and they have the potential to become the 300 pound gorilla. <br /><br />While they're never going to be the nimble or agile technology developers, they do have a shot at dominating the market they've picked: Low cost, standardized genomics. As long as they stick with this plan - and manage to keep their cost lower than everyone else's - they've got a shot... 
Only time will tell.<br /><br />A lot of my conversation with Complete Genomics revolved around the status of their technology - what it is that they're offering to their customers. That's old hat, though. You can look through their web page and get all of the information - you'll probably even get more up to date information - so go check it out.<br /><br />What is important is that their company is based on well developed technology. Nothing that they're doing is bleeding edge, nothing is going to be a surprise show stopper: of all of the companies doing genomics, they're the only one that can accurately chart the path ahead with clear vision. Pac Bio may never solve their missing base problem, Illumina may never get their reads past 100bp, Life Tech may never solve their dark base problem, and Ion Torrent may never have a viable product. You never know... but Complete Genomics is the least likely to hit a snag in their plans.<br /><br />That's really the key to their future fate - there are no bottlenecks to scaling up their technology. We'll all watch as they bring down the distance between the spots on their chips, lower the amount of reagent required, and continue to automate their technology. It's not rocket science - it's just engineering. Each time they drop the scale of their technology down, they also drop the cost of the genome. That's clearly the point - low cost.<br /><br />The other interesting thing about their company is that they've really put an emphasis on automation and value-added services. Their process is one of the more hands off processes out there. It's an intriguing concept. You FedEx the DNA to them, and you get back a report. Done.<br /><br />Of course, I have to say that while this may be their strength, it's probably also one of their weaknesses. As a scientist, I don't know that the bioinformatics of the field are well enough developed yet that I trust someone to do everything from alignment to analysis on a sample for me. 
I've seen aligners come and go so many times in the last 3 years that I really believe that there is value in having the latest modifications. <br />What you're getting from Complete Genomics is a snapshot of where their technology is at the moment you (figuratively) click the "go" button. Researchers like to play with their data, revisit it, optimize it and squeeze every last drop out of it - something that is not going to be easy with a Complete Genomics dataset. (They aren't sharing their tools.) However, as I said earlier, they're not in the business of competing with the other sequencing companies - so really, they may be able to side step this weakness entirely by just not targeting those people who feel this way about genomic data.<br /><br />And that also brings me to their second weakness - they are fixated on doing one thing, and doing it well. That's often the sign of a good start-up company: a dogged pursuit of a single goal of excellence in one endeavour. However, in this one case, I disagree with Dr. Drmanac. Providing complete genomes is only part of the picture. In the long run, genomic information will have to be placed in the context of epigenetics, and so I wonder if this is an avenue that they'll be forced to travel in the future. For the moment, Dr. Drmanac insists that this is not something they'll do. If they haven't put any thought into it, when it does become necessary, it's something that will drive customers towards a company that can provide that information. Not all research questions can be solved by gazing into genomic sequences, and that's a reality that could bite them hard.<br /><br />For the moment, at least, Complete Genomics is well positioned to work well with researchers who don't want to do the lab and bioinformatics tweaking themselves. 
You can't ask a microbiology lab to give up their PCR machine, and sequencing centres will never drop the 2nd (and now 3rd) generation technology lab to jump on board the 1st generation sequencing provided by Complete Genomics. Despite the few centres that have ordered a few genomes (wow.. I just can't believe I said "a few genomes"), I don't see any of them committing to it in the long run for all of the reasons I've pointed out above. <br /><br />However, Complete Genomics could take over genomic testing for pharma or hospital diagnostics. Whoever is best able to identify variations (structural or otherwise) in genomes for the lowest cost will be the best bet to do cohort studies for patient stratification - and hey, maybe they'll be the back end for the next 23andMe.<br /><br />So, to conclude, Complete Genomics has impressed me with their business model, and they have come to know themselves well. I'll never understand why they think AGBT is the right conference to showcase their company, when it's not likely to yield that many customers in the long run. But, I'm glad I've had the chance to watch them grow. Although they may be a dinosaur in the technology race, the T-Rex is still a fearsome beast, and I'd hate to meet one in a dark alley.<br /><br />Anthony Fejes apfejes@gmail.com 2<br /><br />tag:blogger.com,1999:blog-23988675.post-498676572358321348 2010-02-27T14:11:00.001-08:00 2010-02-27T14:14:43.455-08:00<br /><br />AGBT 2010 - Illumina Workshop<br /><br />[I took these notes on a scrap of paper, when my laptop was starting to run low on batteries. They're less complete than most of the other talks I've taken notes on, but should still give the gist of the talks. 
Besides, now that I'm at the airport, it's nice to be able to lose a few pieces of scrap paper.]<br /><br /><br /><span style="font-weight:bold;">Introducing the HiSeq 2000(tm)</span><br /> - redefining the trajectory of sequencing<br /><br /><span style="font-weight:bold;">First presentation:<br /> - Jared from Marketing</span><br /> <br />Overview of machine.<br /> - real data of Genome and transcriptome<br /> - more than 2 billion base pairs per run<br /> - more than 25Gb per day<br /> - uses line scanning (scan in rows, like a photocopier, instead of a whole picture at once, like a camera)<br /> - now uses "dual surface engineering": image both the top and bottom surface, which means you have twice as much area to form clusters<br /> - Machine holds two individual flow cells<br /> - flow cells are held in by a vacuum <br /> - simple insertion - just toggle a switch through three positions - an LED lights up when you've turned it on.<br /> - preconfigured reagents - bottles all stacked together: just push in the rack<br /> - touch screen user interface<br /> - "wizard" like set up for runs<br /> - realtime metrics available on interface - even an iPod app (available for iPad too..)<br /> - multimedia help will walk you through things you may not understand.<br /> - major focus on ease of use<br /> - it has the "simplest workflow" of any of the sequencing machines available<br /> - tile size reduced [that's what I wrote but I seem to recall him saying that the number of tiles is smaller, but the tiles themselves are larger?]<br /> - 1 run can now do a 30x coverage for a cancer and a normal (one in each flow cell.)<br /> - 2 methylomes can be done in a week<br /> - you could do 20 RNA-Seq experiments in 4 days.<br /><br />Next up:<br /> <span style="font-weight:bold;">David Bentley<br /></span><br />Major points:<br /> - error rates and feel of data are similar if not identical to the GAIIx.<br /> - from a small sampling of experiments shown it looks like error 
rate is very slightly higher<br /> - Demonstrated 300Gb/run, more than 25Gb per day at release<br /> - PET 2x100 supported.<br /> - Software is same for GAII [Although somewhere in the presentation, I heard that they are working on a new version of the pipeline (v 1.6?)... no details on it, tho.]<br /><br />Next up:<br /><span style="font-weight:bold;">Elliott Margulies, NHGRI/NIH Sequencing</span><br /> - talking about projects today for the undiagnosed disease program<br /><br />work flow<br /> - basically same as in his earlier talk [notes are already posted.]<br /> - use cross match to do realignment of reads that don't map first time<br /> - use MPG scores<br /><br />[In a technology talk, I didn't want to take notes on the experiment itself... main points are on the HiSeq data.]<br /><br />Data set: concordance with SNP Chips was in the range of 98% for each flow cell, 99% when both are combined (72x coverage)<br /><br />Impressions:<br /> - Speed: Increased throughput<br /> - more focus on biology rather than on tweaking pipelines and bioinformatic processing. (eg, biological analysis takes front seat.)<br /><br /><br />Next Up:<br /><span style="font-weight:bold;">Gary Schroth</span><br /><br />Working on a project for Body Map 2.0 : Total human transcriptome<br /> - 16 tissues, each PET 2x50bp, 1x75bp<br />Cost:<br /> - $8,900 for 1x50bp<br /> - multiplexing will reduce cost further.<br /> - if you only need 7M reads, you could multiplex 192 samples (on both cells, I assume), and the cost would be $46. (including sequencing, not sample prep.)<br /><br />[which just makes the whole cost equation that much more vague in my mind... Wouldn't it be nice to know how much it costs to do the whole process?]<br /><br />[Many examples of how RNA-seq looks on HiSeq 2000 (tm)]<br /><br />Summary:<br /> - output has 5 billion reads, 300Gb of data.<br /><br />Next up:<br />David Bentley<br /><br />Present a graph<br /> - amount of sequence per run. 
<br /> - looks like a "hockey stick graph"<br /><br />[Shouldn't it be sequence per machine per day? It'd still look good - and wouldn't totally shortchange the work done on the human genome project. This is really a bad graph.... at least put it on a log scale.]<br /><br />In the past 5 years:<br /> - 10^4 scale up in throughput<br /> - 10^7 scale up in parallelization<br /><br />Buzzwords about the future of the technology:<br /> - "Democratizing sequencing" <br /> - "putting it to work"<br /><br />Anthony Fejes apfejes@gmail.com 0<br /><br />tag:blogger.com,1999:blog-23988675.post-8227881952478105902 2010-02-27T10:47:00.000-08:00 2010-02-27T10:48:27.997-08:00<br /><br />AGBT 2010 - Complete Genomics Workshop<br /><br /><span style="font-weight:bold;">Complete Genomics CEO: </span><br /><br />Mission: <br /> - sequence only human genomes - 1 Million genomes in the next 5 years<br /> - build out tools to gain a good understanding of the human genome<br /> - done 50 genomes last year<br /> - Recent Science publication <br /> - expect to do 500 genomes/month<br /><br />Lots of Customers.<br /> - Deep projects<br /><br />Technology <br /> - don't waste pixels, <br /> - use ligases to read<br /> - very high quality reads - low cost reagents<br /> - provide all bioinformatics to customers<br /><br />Business <br /> - don't sell technology, just results.<br /> - just return all the processed calls (snps, snv, sv, etc)<br /> - more efficient to outsource the "engineering" for groups who just want to do biology<br /> - fedex sample, get back results.<br /> - high throughput "on demand" sequencing<br /> - 10 centres around the world<br /> - Sequence 1 Million genomes to "break the back" of the research problem<br /><br />Value add<br /> - they do the bioinformatics<br /><br />Waves:<br /> - first wave: understand functional genomics<br /> - second 
wave: pharmaceutical - patient stratification<br /> - third wave: personal genomics - use that for treatment<br /><br />Focus on research community<br /><br />Two customers to present results: <br />First Customer:<br /><br /><span style="font-weight:bold;">Jared Roach, Senior Research Scientist, Institute for Systems Biology (Rare Genetic disease study)</span><br /><br />Miller Syndrome<br /> - studied coverage in four genomes<br /> - 85-92% of genome<br /> - 96% coverage in at least one individual<br /> - Excellent coverage in unique regions.<br /><br />Breakpoint resolution <br /> - within 25bp, and some places down to 10bp<br /> - identified 125 breakpoints<br /> - 90/125 occur at hotspots<br /> - can reconstruct breakpoints in the family<br /><br />Since they have twins, they can do some nice tests<br /> - infer error rate: 1x10^-5 <br /> - excluded regions with compression blocks (error goes up to 1.1x10^-5)<br /> - Homozygous only: 8.0x10^-6 (greater than 90% of genome)<br /> - Heterozygous only: 1.7x10^-4<br /><br />[Discussion of genes found - no names, so there's no point in taking notes. They claim they get results that make sense.]<br /><br />[Time's up - on to next speaker.]<br /><br />Second Customer:<br /><span style="font-weight:bold;">Zemin Zhang, Senior Scientist, Genentech/Roche (Lung Cancer Study)</span><br /><br />Cancer and Mutations <br />[Skipping overview of what cancer is.... 
I think that's been well covered elsewhere.]<br /><br />Objective:<br /> - lung cancer is the leading cause of cancer related mortality worldwide...<br /> - significant unmet need for treatment<br /><br />Start with one patient<br /> - non small cell lung adenocarcinoma.<br /> - 25 cigarettes/day<br /> - tumour: 95% cancer cells<br /><br />Genomic characterization on Affy and Agilent arrays<br /> - lots of CNV and LOH<br /> - circos diagrams!<br /><br /><br /> - 131Gb mapped sequence in normal, 171Gb mapped sequence in tumour<br /> - 46x coverage normal, 60x tumour<br />[Skipping some info on coverage...]<br /><br />KRAS G12C mutation<br /><br />what about rest of 2.7M SNVs?<br /> - SomaticScore predicts SNV validation rates<br /> - 67% are somatic by prediction<br /> - more than 50,000 somatic SNV are projected<br /><br />Selection and bias observed in the lung cancer genome by comparing somatic and germline mutations<br /><br />GC to TA changes: Tobacco-associated DNA damage signature<br /><br />Protection against mutations in coding and promoter regions. 
<br /> - look at coding regions only - mutations are dramatically less than expected - there is probably strong selection pressure and/or repair<br /><br />Fewer mutations in expressed genes.<br /> - expressed genes have fewer mutations, even lower in transcribed strand<br /> - non-expressed genes have mutation rate similar to non-genic regions<br /><br /> Positive selection in subsets of genes<br /> - KRAS is the only previously known mutation<br /> - Genes also mutated in other lung cancers...<br /> - etc<br /><br />Finding structural variation by paired end reads<br /> - median dist between pairs 300bp.<br /> - distance almost never goes beyond 1kb.<br /><br />Look for clusters of sequence reads where one arm is on a different chromosome or more than 1kb away<br /> - small number of reads<br /> - 23 inter-chr<br /> - 56 intra-chr<br /> - use FISH + PCR<br /> - validate results<br /> - 43/65 test cases are found to be somatic and have nucleotide level breakpoint junctions<br /> - chr 4 to 9 translocation<br /> - 50% of cells showed this fusion (FISH)<br /><br />Possible scenario of Chr15 inversion and deletion investigated.<br />[got distracted, missed point.. 
oops.]<br /><br />Genomic landscape: <br /> - very nice Circos diagram<br /> - > 1 mutation for every 3 cigarettes<br /><br />In the process of doing more work with Complete Genomics<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-8227881952478105902?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com1tag:blogger.com,1999:blog-23988675.post-27921054897823504252010-02-27T10:05:00.001-08:002010-02-27T10:06:33.050-08:00AGBT 2010 - Yardena Samuels - NHGRI<span style="font-weight:bold;">Mutational Analysis of the Melanoma Genome</span><br /><br />Histological progression of Melanocyte Transformation<br /> - too much detail to copy down<br /><br />Goals: <br /> - mutational analysis of signal transduction gene families in genome<br /> - evaluate most highly mutated gene family members<br /> - translational<br /> <br />Somatic mutation analysis.<br /> - matched tumor/normal <br /> - make cell lines<br /> <br />Tumor Bank establishment<br /> - 100 tumor/normal samples<br /> - also have original OCT blocks<br /> - have clinical information<br /> - do SNP detection for matching normal/tumor<br /> - 75% of cells are cancer<br /> - look for highly mutated oncogenes<br /><br />Start looking for somatic mutations<br /> - looking at TK family (kinome)<br /> - known to be frequently mutated in cancer<br /><br />Sanger did this in the past, but only did 6 melanomas<br /> - two phases: discovery, validation<br /> - started with 29 samples - all kinase domains<br /> - looked for somatic mutations<br /> - move on to sequence all domains...<br /><br /> - 99 NS mutations<br /> - 19 genes<br /><br />[She's talking fast, and running through the slides fast! 
I can't keep up no matter how fast I type.]<br /><br />Somatic mutations in ERBB4 - 19% in total<br /> - one alteration was known in lung cancer<br /> <br />[Pathway diagram - running through the members VERY quickly] (Hynes and Lane, Nature Reviews)<br /><br />Which mutation to investigate? Able to use crystal structure to identify location of mutations. Select for the ones that were previously found in EGFR1 and (something else?)<br /><br />Picked 7 mutations, cloned and over-expressed - basic biochemistry followed.<br /><br />[Insert westerns here - Prickett et al., Nature Genetics 41, 2009]<br /><br />ERBB4 mutations have increased basal activity - also seen in melanoma cells<br /><br />Mutant ERBB4 promotes NIH3T3 Transformation<br /><br />Expression of Mutant ERBB4 Provides an Essential cell Survival Signal in Melanoma<br /> - oncogene addiction<br /> <br />Is this a good target in the clinic?<br /> - used lapatinib.<br /> - showed that it also works here in melanoma. Mutant ERBB4 sensitizes cells to lapatinib<br /> - mechanism is apoptosis<br /> - it does not kill 100% of cells - may be necessary to combine it with other drugs.<br /><br />Conclusions<br /> - ERBB4 is mutated in 19% of melanomas<br /> - reiterate points<br /> - new oncogene in melanoma<br /> - can use lapatinib<br />[only got 4 of the 8 or 9]<br /><br />Future studies<br /> - maybe use in clinics - trying a clinical trial.<br /> - will isolate tumor DNA w ICM<br />... test several hypotheses.<br /> - sensitivity to lapatinib<br /><br />What else should be sequenced? Not taking into account whole genome sequencing.<br /> - look at crosstalk to get good targets<br /> - List of targets. (mainly transduction genes)<br /><br />Want to look at other cancers, where whole exome was done.<br /> - revealed: few gene alterations in majority of cancers. Limited number of signalling pathways. 
Pathway oriented models will work better than Gene oriented models<br /><br />[ chart that looks like London subway system... have no idea what it was.]<br /><br />Personalized Medicine<br /> - their next goal.<br /><br />[Great talk - way too fast, and cool, but no NGS tie-in. Seems odd that she's picking targets this way - WGSS would make sense, and narrow things down faster.]<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-2792105489782350425?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-81851073760549654762010-02-27T08:35:00.001-08:002010-02-27T08:43:07.074-08:00AGBT 2010 - Joseph Puglisi - Stanford University School of Medicine<span style="font-weight:bold;">The Molecular Choreography of Translation</span><br /><br />The questions have remained the same, despite recent advances - we still want to understand how the molecular machines work. We always have snapshots that capture the element of motion, but we want animation, not snapshots<br /><br />Translation<br /> - Converting nucleotides to amino acids. <br /> - ribosome 1-20 aa/s<br /> - 1/10^4 errors<br /> - very complex process (tons of protein factors, etc, required for the process)<br /> - requires micro-molar concentrations of each component<br /><br />Ribosome<br /> - we now know the structure of the ribosome <br /> - Nobel prize given for it.<br /> - 2 subunits. 
(50S & 30S)<br /> - 3 sites, E, P & A<br /> - image 3 tRNAs to a ribosome - in the 3 sites...<br /> - all our shots are static - not animated<br /> - The Ribosome selects tRNA for Catalysis - must be correct, and incorrect must be rapidly rejected<br /> - EF-Tu involved in rejection<br /><br />[Walking us through how ribosomes work - there are better sources for this on the web, so I'm not going to copy it.]<br /><br />Basic questions:<br /> - timing of factors<br /> - initiation pathway<br /> - origins of translational fidelity<br /> - mechanisms<br /><br />Look at it as a highly dynamic process<br /> - flux of tRNAs<br /> - movements of the ribosome (internal and external)<br /> - much slower than photosynthesis, so easier to observe.<br /><br />Can we track this process in real time?<br /> - Try: Label the ligand involved in translation.<br /> - Problem: solution averaging destroys signal (many copies of ribosome get out of sync FAST.) would require single molecule monitoring<br /> - Solution: immobilization of single molecule - also allows us to watch for a long time <br /><br />Single molecule real time translation<br /> - Functional fluorescent labeling of tRNAs, ribosomes and factors<br /> - surface immobilization retains function.<br /> - observation of translation at micromolar conc. fluorescent components<br /> - instrumentation required to resolve multiple colors<br /> - yes, it does work.<br /> - you can tether with biotin-streptavidin, instead of fixing to surface<br /> - immobilization does not modify kinetics<br /><br />Tried this before talking to PacBio - It was a disaster. Worst experiments they'd ever tried.<br /><br />Solution: <br /> - use PacBio ZMW to do this experiment.<br /> - has multiple colour resolution required<br /> - 10ms time resolution<br /><br />Can you put a 20nm ribosome into a 120nm hole? Use biotin tethering - Yes<br /><br />Can consecutive tRNA binding be observed in real time? Yes<br /><br />Fluorescence doesn't leave after... 
they overlap because the labeled tRNA must transit through the ribosome.<br /> - at low nanomolar signals, you can see the signals move through individual<br /> - works at higher conc.<br /> - if you leave EF-G out, you get binding, but no transit - then photobleaching.<br /> - demonstrate Lys-tRNA<br /> - 3 labeled dyes (M, F, K)... you can see it work. <br /> - timing isn't always the same (pulse length)<br /> - missing stop codon - so you see really long stall with labeled dye... and then sampling, as other tRNAs try to fit.<br /> - you can also sequence as you code. [neat]<br /> <br />Decreased tRNA transit time at higher EF-G concentrations<br /> - if you translocate faster, pulses are faster<br /> - you can titrate to get the speed you'd like.<br /> - translation is slowest for first couple of codons, but then speeds up. This may have to do with settling the reading frame? Much work to do here.<br /><br />Ribosome is a target for antibiotics<br /> - eg. erythromycin<br /> - peptides exit through a channel in the 50S subunit.<br /> - macrolide antibiotics block this channel by binding inside at narrowest point.<br /> - They kill peptide chains at 6 bases. Are able to demonstrate this using the system.<br /><br />Which model of tRNA dissociation during translation is correct?<br /> - tRNA arrival dependent model<br /> - Translocation dependent model<br /><br />Post synchronization of number of tRNA occupancy<br /> - "remix our data"<br /> - data can then be set up to synchronize an activity - eg, the 2nd binding.<br /><br />Fusidic acid allows the translocation but blocks arrival of subsequent tRNA to A site.<br /> - has no effect on departure rate of tRNA.<br /> <br />Only ever 2 tRNAs at once on Ribosome. - it can happen, but not normally<br /><br />Translocation dependent model is correct.<br /><br />Correlating ribosome and tRNA dynamics<br /> - towards true molecular movies<br /> - label tRNAs... 
monitor fluctuation and movement<br /><br />Translational processes are highly regulated<br /> - regulation of initiation (5' and 3' UTR)<br /> - endpoint in signalling pathways (mTOR, PKR)<br /> - programmed changes in reading frames (frameshifts)<br /> - control of translation mode (IRES, normal)<br /> - target of therapeutics (PTC124 [ribosome doesn't respect stop codons] and antibiotics)<br /><br />Summary:<br /> - directly track in real time<br /> - tRNAs dissociate from the E site post translocation and no correlation...<br /><br />Paper is in Nature today.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-8185107376054965476?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-90077047634635167742010-02-27T07:33:00.000-08:002010-02-27T07:34:05.834-08:00AGBT 2010 - Bing Ren - UCSD<span style="font-weight:bold;">Epigenomic Landscapes of Pluripotent and Lineage-Committed Human Cells</span><br /><br />Sequencing of the human genome has led to<br /> * identification of disease causing genes<br /> * Personalized medicine<br /> * advanced sequencing technologies<br /> * Foundation for understanding the construction of human beings<br /> <br />But DNA is only half the story <br /> * variations in DNA alone do not account for all variations in phenotypic traits<br /> * organisms with identical DNA often exhibit distinct phenotypes (eg plants, insects, mammals)<br /> * Epigenetic changes contribute to human diseases, phenotypes, etc<br /><br />We know about the mechanisms<br /> * DNA is wrapped around histone proteins which can be modified<br /> * DNA is itself modified (methylation)<br /><br />[paraphrased] DNA is hardware, epigenome is the software (Duke University quote... 
missed author's name)<br /><br />Challenges<br /> * very complex<br /> * varies among different cell types<br /> * generally reprogrammed during the life cycle of an organism<br /> * Epigenome is also affected by environmental cues<br /><br />How do we decipher the "epigenetic code"?<br /> * systematic approach<br /> * large scale profiling of chromatin modification<br /> * finding common modifications<br /> * validation<br /> <br />Profiling: <br /> * ChIP-Seq based. (started with Tiling arrays)<br /> * use antibodies that recognize chromatin modification.<br /> <br />Vignette: <br />[beautiful pictures]<br /> * Chromatin signature for the promoter and gene body<br /> * H3K4me3 marks active promoters<br /> * H3K36me3 marks gene body of active genes<br /> * Signature has led to identification of thousands of long non-coding RNA genes.<br /><br />Chromatin signatures of enhancers<br /> * Can use information about modifications to model patterns<br /> * predict enhancers in the human genome.<br /> * 36,589 enhancer predictions were made<br /> * 56% found in intergenic regions<br /> * test a few with reporter assays - show that 80% of predicted enhancers do drive reporter genes. (Far fewer of the control sequences do - missed number)<br /><br />Finding chromatin modification patterns in the genome de novo<br /> (Hon et al, PLoS Comp Bio 2009)<br /> * 16 different patterns of chromatin modification<br /> * some are enhancers, <br /> * others have no associations<br /> * one has pattern highly enriched for exons.. regulates alt splicing.<br /><br />Summary <br /> * chromatin modification patterns could be used to annotate ...<br /> * Epigenome Roadmap project (Generate reference epigenome maps for a large number of primary human cells and tissues)<br /><br />Datasets are available at GEO. 
(NCBI)<br /><br />Mapping of DNA methylation and 53 histone modifications in human cells<br /> * Human embryonic stem cells (H1) <br /> * Fetal fibroblast cell line<br /><br />Method for mapping DNA methylation<br /> * Ryan Lister and Joe Ecker (Salk)<br /> * sodium bisulfite (C to U), if not methylated<br /> * Must do deep sequencing. If using HiSeq - could do it in 10 days. Used to take 20 runs<br /> * Methylation status for more than 94% of cytosines determined.<br /> * 75.5% in H1, 99.98% in Fibroblast<br /> * DNA methylation is depleted from functional sequences<br /> * non-CpG methylation is enriched in gene body of transcribed genes suggesting link to the transcription process<br /><br />11 chromatin modification marks<br /> * comparing cells: different results<br /> * K9me3 and K27me3 become dramatically extended (7% in ES to more than 30% in fibroblast.)<br /> * genes with above marks are highly enriched in developmental genes.<br /><br />Reduction of repressive chromatin in induced pluripotent cells<br /><br />Repressive chromatin domains occupy small fraction of genome which is maintained as open structure in stem cells<br /> <br />Repressive chromatin domains occupy large fraction of genome, keeping genes involved in development silenced in differentiated cells.<br /><br />Summary: <br /> * widespread difference in epigenomes of ES and fibroblasts <br /> * stem cells are characterized by abundant non-CpG methylation<br /> * Expansion of repressive domains may be a key characteristic of cellular differentiation<br /> * [Missed 2]<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-9007704763463516774?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-5319824168763487502010-02-27T06:42:00.000-08:002010-02-27T07:00:49.087-08:00AGBT 2010 - Jesse Gray - Harvard Medical School<span style="font-weight:bold;">Widespread RNA Polymerase 
II Recruitment and Transcription at Enhancers During Stimulus-Dependent Gene Expression</span><br /><br />Mammalian brain is [paraphrased] Awesome technology<br />* Sensory experience shapes brain wiring via neuronal activation<br />* Whiskers compete for real estate in somatosensory cortex.<br />* Brain can re-wire to adapt to environment<br />* Transcriptional changes in nucleus as brain cells reprogram<br />* (Discussion in terms of real-estate for rat whisker areas of brain.)<br /><br />Neuronal activation affects circuit function by altering gene expression<br />* Activity dependent gene expression<br /><br />Cascade<br />* Ca++ influx<br />* kinases & phosphatases<br />* CREB + SRF TFs<br />* recruit CREB binding protein<br />* Induce about 50-100x expression in genes (eg, fos)<br />* Can we do genome wide approaches to understand what's being expressed?<br /><br />An experimental system for genome-wide analysis of activity-regulated gene expression<br />* grow in dish<br />* depolarize with KCl<br />* do ChIP-seq and RNA-seq<br /><br />CBP and transcription factor binding at fos locus<br />* see CBP binding at conserved region upstream, as well as promoter for fos gene<br />* also see NPAS4, CREB and SRF with similar (but not identical) binding sites<br /><br />Is the activity dependent binding of CBP restricted to the locus or genome wide?<br />* compare CBP peaks in both conditions<br />* binding appears limited to KCl stimulated only.<br /><br />Are CBP-bound sites enhancers or promoters or both?<br />* Promoters don't necessarily drive transcription<br />* Promoters have H3K4Me3 histone modifications (enhancers don't)<br />* 3D configuration to bring enhancers together with promoters.<br /><br />Most CBP peaks are not at TSSs and do not show H3K4Me3<br />* 5079 at TSSs<br />* 36,069 not at TSSs<br /><br />Align all seq that are enhancers<br />* there is much H3K4Me1 (clear pattern)<br />* there is not much H3K4Me3<br /><br />Use known site<br />* upstream 
from Arc - used to build a construct<br /><br />CBP and H3K4Me1-marked loci function as activity-dependent transcriptional enhancers.<br />* Found 8 enhancers<br /><br />Summarize:<br />* about 20,000 CBP sites that are activity-regulated enhancers<br />* do not correspond to annotated start sites<br />* H3K4Me1 modified<br />* lack H3K4Me3 mark<br />* do not initiate long RNAs<br />* confer activity-regulation on the Arc promoter<br /><br />Questions about activity-regulated enhancers<br />* do they play a role in binding RNA Polymerase II?<br />* Evidence so far suggests that most enhancers do not have RNAPII binding.<br /><br />fos enhancers bind RNAPII<br />* use ChIP for RNAPII and CBP<br />* 10-20% of sites have RNAPII at enhancer<br />* potential artifact - crosslinking conditions may exaggerate this by tying promoters and enhancers together.<br /><br />Does RNAPII at enhancers synthesize RNA?<br />* Enhancers at the fos locus produce enhancer RNAs<br />* non-polyadenylated RNA? Yes.<br />* you do get some transcription at enhancers... [doesn't this start to describe lincRNA?]<br /><br />Enhancer transcription is correlated with promoter transcription.<br /><br />The Arc enhancer can be activated without the presence of the Arc promoter<br />* increases in polymerase binding at enhancer even when promoter is gone.<br />* preliminary - but may not be transcription when the promoter is gone.<br />* what is the function of eRNA transcription? 
(don't know the answer yet)<br />* Could be that it helps to lay down epigenetic marks.<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-531982416876348750?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-63750850363624939162010-02-27T06:04:00.000-08:002010-02-27T06:32:03.861-08:00AGBT 2010 - Keynote: Henry Erlich - Roche Molecular Systems<span style="font-weight:bold;">Applications of Next Generation Sequencing: HLA Typing With the GSFLX System</span><br /><br />High Throughput HLA typing<br />* the allelic diversity is enormous<br />* Focussing on HLA class I and II genes (germ-line)<br /><br />Challenging because it's the most polymorphic region in the genome<br />* HLA-B has well over 1000 alleles<br />* only 68 different serological types can be distinguished<br />* 3,529 genes at 12 loci as of April 2009<br />* chromosome 6<br />* Can't be typed using existing conventional techniques [I assume in high throughput]<br />* DR-DQ region - involved in type I diabetes<br />[Much detail here, which I can't get down fast enough with any hope of accuracy.]<br /><br />Polymorphism is highly localized.<br />* virtually all of the polymorphic amino acid residues are localized to a groove.<br />* most allelic differences are protein coding.<br />* critical to distinguish known alleles<br /><br />Nomenclature<br />* eg HLA-A * 24020101<br />* only the first 4 digits are the ones that distinguish the protein.<br /><br />Survival curve for bone marrow transplant<br />* even with 8/8 allele matches, there are WAY more things that need to be matched - and so you need the best possible match.<br />* a single coding mismatch can cause graft vs host disease.<br />* Bone Marrow matching requires high precision<br /><br />[List of disease applications - 22 different diseases including Narcolepsy, cancers, drug allergic reactions..]<br /><br />GWAS 
in Type 1 diabetes.<br />* identified disease related genes - HLA SNPs are significant<br />* DR-DQ haplotypes are strongly associated with odds ratio for diabetes<br />* looking at genomic risk factors: risk increases up to 40x<br /><br />[something about a particular combination of DR-DQ giving VERY high risk, and consequently is never seen in humans...]<br /><br />Forensics<br />* Dot blots... evolved into Probe Array Typing System.<br />* Even if you have hundreds of probes, you still have "HLA Genotype Ambiguity"<br />* "Fail to distinguish alleles" without NGS (with or without phasing..)<br /><br />[Explanation of how 454 works - protocol]<br /><br />Approach<br />* amplify exons with MID primers/emPCR/sequence<br /><br />Benefits of clonal sequencing<br />* set phase to reduce ambiguity<br />* allow amplification and sequencing of multiple members of multi-gene family with generic primers<br />* allow sorting/separation of co-amplified sequences from target sequence (signal)<br /><br />Parallel clonal sequencing of 8 loci x 24 samples<br /><br />[More protocol... ]<br /><br />Graph of read length: around 250bp<br /><br />Connexio Assignment of DRB1 Genotype<br />* image reassuring to an HLA researcher.<br />* like the interface (plug for the company)<br />* aligns sequence, consensus sequence, does genotype assignment<br />* [Must admit, the information on this interface is rather mysterious to me...]<br />* [Several more slides of Connexio data and immunology types that mean nothing to me.]<br />* get a genotype report... <br /><br />Analysis...<br /><br />Testing on SCIDS patient<br />* patients are potentially chimeric<br />* look for presence of non-transmitted maternal allele<br />* can find stuff in "fail layer" because software assumes only two alleles possible.<br /><br />[Wow... I know I don't know much immunology, but I'm not getting much out of this. 
This is a lot of software for immunologists, and I really don't understand the terminology, making it challenging to get coherent notes.]<br /><br />Takes about 4 days - [says 5-7 on the slide]<br />* amplicon prep<br />* emulsion<br />* DNA bead process<br />* loading wells<br />* sequencing on GSFLX<br />* Data analysis<br /><br />[Missed slide on how much data they were getting - 1M reads?]<br /><br />Multiplex - 500 samples in one run<br />* Got good results [not copying down seemingly random DRB numbers...]<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-6375085036362493916?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-47023301895911698152010-02-26T22:09:00.000-08:002010-02-26T22:11:08.247-08:00AGBT 2010 - Christopher Mason - Weill Cornell Medical College<span style="font-weight:bold;">Developmental Changes in Human Neocortical Transcriptome Revealed by RNA-Seq</span><br /><br />How do we go from sequence to organism?<br /><br />Example of a disease where they were able to find a change in an exon... but that's not the norm. 
Brain transcriptome is especially bad.<br /><br />Complexity of transcriptome is vast.<br /><br />NGS transformed the amount of data we're getting<br /><br />Compared microarrays vs RNA-seq<br /> * RNA-seq gives you much more information on DE.<br /> * Metric for RNA-seq expression (RPKM: reads per kb per million reads)<br /> * Controls: spike in synthetic w poly-A tails [next slide: control worked]<br /><br />Looking at brain<br /> * validate existing gene boundaries.<br /> * longer isoforms<br /> * find other genes<br /> * 70-90% of genes expressed in the brain with strong neuro-developmental correlation<br /> * Ensembl gene categories expressed: many types of RNAs found<br /> * ~18% of spliced forms are unique to each individual - splicing levels similar across development<br /> * at high expression, 80-90% of genes have alt isoforms<br /><br />[Lists of genes that were DE in fetal/adult brain - "things that make sense"]<br /><br />What is different is Transcription Factors - especially Zinc Finger TFs. <br /> * Shift towards fetal expression<br /><br />Zinc Finger<br /> * most rapidly expanding class of genes<br /><br />Look at UTRs<br /> * fetal brain exhibits myriad extensions of gene models and variable UTRs.<br /> * TARs found. (Transcriptionally active regions) - confirmed with PCR<br /><br />No visible end of gene discovery.<br /> * the deeper you go, the more new things you see.<br /><br />ROC plot<br /> * sensitivity (TP / (TP + FN)) and specificity <br /> * looks incredible - nearly straight to 1.<br /><br />Source of "wiggles" in RNA-seq.<br /> * it's everything, really<br /> * biggest problem: annotation is one source.<br /><br />Human genome is not just 33Mb.... 
it's only 1/2 to 1/5th of the exome capture.<br /> * 165 Mb have been validated on multiple SeQC platforms!<br /><br />There aren't just 20,000 genes - it's closer to 45,000!<br /><br />Begat: every bp of the genome is a locus for testing, each remaining sequence is a variable.<br /><br />Don't forget, we also have to filter out viruses/bacteria/other<br /> * Code for Begat is available. (Email given - forgot to copy it down.)<div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-4702330189591169815?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com3tag:blogger.com,1999:blog-23988675.post-68750713558804199262010-02-26T15:23:00.000-08:002010-02-26T15:24:13.936-08:00AGBT 2010 - Manuel Garber - Broad<span style="font-weight:bold;">Annotating LincRNA Transcripts Using Targeted Sequencing</span><br /><br />Goal: Identify functional large ncRNAs in the mammalian genome<br /> * look like mRNA, but non-coding <br /> * Use ChIP-Seq to separate genome into regions<br /> * use Tiling arrays, hybridize RNA... <br /> * Tiling arrays - no information about connectivity, limited resolution<br /><br /> * studying the functions of lincRNAs requires precise sequences for both experimental and computational analyses.<br /><br />Use RNA-Seq protocol to build transcriptome<br /><br />What RNA-seq gives you:<br /> * RNA, map to genome<br /> * introns... junction reads.<br /> * use reads with mate in poly-A to find end.<br /><br />Used TopHat to align<br /><br />Junction reads:<br /> * Longer reads provide junction evidence<br /> * first, use only reads that align with a gap. 
(Build connectivity map)<br /> * topology map<br /> * use map with ChIP-Seq data to build "paths"<br /> * use paths to call transcripts<br /> * clean up with Paired End Data -> join or kill unlikely isoforms.<br /><br />Example:<br /> * Mouse ES<br /> * Illumina sequence (156M - 76bp reads)<br /> * 75% exonic alignment<br /> * correctly reconstruct most expressed known genes at single nucleotide resolution.<br /> * works even on overlapping genes.<br /> * 81% genes fully-reconstructed<br /> * Good recovery of genes at all expression levels.<br /><br />Novel Transcripts discovered:<br /> * 800 loci between genes<br /> ** 250 out of 317 ES lincRNAs are reconstructed<br /> * 200 loci overlapping genes<br /> ** 131 overlap coding exons. (making them antisense for visual purpose.)<br /><br />Are they protein coding genes?<br /> * LincRNAs are probably too small to produce proteins [Strange assumption, IMHO... maybe I'm missing something.]<br /> * 650 of 800 lincRNAs have no coding potential<br /> * have lower expression level than coding regions.<br /> * intergenic transcript conservation (similar conservation to old lincRNAs)<br /> * Antisense transcripts? - no antisense coding potential<br /> * antisense expression - very low antisense expression<br /> * Antisense conservation - a little more conserved than sense lincRNA because of overlap with exons of genes<br /> * antisense exons are not conserved.<br /><br />What do overlapping transcripts do?<br /> * expression is low,<br /> * little or no conservation<br /> * correlation with overlapping transcripts<br /> * Thus: artifacts, noise, fine tuners? 
other ideas?<br /> <br />Conclusion <br /> * novel statistical method takes advantage of longer reads<br /> * mouse ES coding gene novelties<br /> * intergenic non coding RNA (lincRNA)<br /> * new family of antisense non coding RNA<br /> * validation of 18/20.Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-50156046793623632112010-02-26T15:00:00.002-08:002010-02-26T15:29:52.138-08:00AGBT 2010 - Brian Haas - Broad<span style="font-weight:bold;">Genome annotation using mRNA-Seq: A case study of Schizosaccharomyces pombe</span><br /><br />Leverage evidence for genome annotation<br />* eg, 3 ab initio gene predictions<br /><br />Major challenge:<br />* lack of high quality evidence<br />* this is changing with NGS.<br />* we now have evidence - but we need to standardize and develop algorithms<br />* reconstructing transcripts is difficult<br /><br />Approach 1: de novo assembly<br />* treat them like ESTs<br />* align to genome<br /><br />Approach 2: align reads to genome<br />* reconstruct based on alignments<br /><br />Sequencing genomes from Schizosaccharomyces<br />* pombe is a model organism - sequenced in 2002<br />* 12.5Mb, 5k genes, avg gene 1,489 bp<br />* genome should be well annotated, good quality annotations<br /><br />Seq:<br />* 44M reads, 65% aligned (Maq)<br />* align to genome - look good<br />* challenge is to bring it to high quality automated state<br /><br />Align: Use TopHat for short read alignment + Cufflinks<br />Assemble: Velvet/Ananas + GMAP<br /><br />ELT structures transferred into PASA, which does refinement, alt splicing and validates existing annotations<br /><br />This is all exploration - This is NOT a tool Bake off.<br /><br />Elts: Velvet (21167), Cufflinks (4158), Ananas (8309)<br />Almost all alignments to genome were perfect.<br 
/><br />Then, tested how many assemblies supported full-length gene reconstruction: Ananas did best, Cufflinks 2nd best, Velvet only 1/3 of those done by Ananas.<br />* Velvet did very well with supporting introns<br /><br />Problems:<br />* readthrough and encroachment<br />* again, Ananas did best, Velvet 2nd best, Cufflinks worst (by a long shot.)<br /><br />Examples given.<br />* Velvet seems to give fractionated transcripts.. breaks where coverage is high. [Probably seq errors are causing it to break?]<br />* some annotations needed to be extended<br />* corrected genes - merging two genes that are really one.<br /><br />Compare:<br />* none of these methods are great - they're all missing some that others caught.<br /><br />Challenges:<br />* some well covered genomic loci not fully reconstructed (paralogs?)<br />* intron readthrough/encroachment<br />* incorrectly merged genes/transcripts<br />* UTR structures and alt splicing.<br /><br />For well covered genomic loci not fully reconstructed<br />* identify disjoint regions<br />* collect reads and assemble independently<br />* genome directed to avoid misassembly<br />* very fast to do this<br />* This helps, but still have a long way to go.<br />* more tuning needed (expect to get up to 90%)<br /><br />Dissecting merged transcripts.<br />* use coverage based assembly clipping - break up transcripts<br /><br />Technology will greatly facilitate efforts<br />* Use stranded mRNA-seq<br /><br />Summary:<br />* the information from mRNA-seq is needed for high throughput annotation<br />* current tools show progress<br />* still much more to be done in optimization<br />* need for optimized methods for ALL types of genomes.Anthony 
Fejesapfejes@gmail.com4tag:blogger.com,1999:blog-23988675.post-60034117155882379582010-02-26T14:26:00.001-08:002010-02-26T22:08:19.475-08:00AGBT 2010 - Shuro Sen - NHGRI<span style="font-weight:bold;">Transcriptome Profiling of ClinSeq Participants by Massively Parallel Short-Read DNA Sequencing</span><br /><br />[No Microphone - I may not get much from this talk. Mostly I will be pulling from Slides, I think]<br /><br />ClinSeq:<br />* cohort of 1,000 individuals<br />* initial focus on Cardiovascular disease<br />* Consent for follow up<br />* transcriptome, exome + few genomes<br />* application of large-scale medical sequencing in a clinical research setting.<br />* concurrent "Omes" from same individual<br />* move on to other diseases in the long term<br /><br />* started with Sanger<br />* now moved to Illumina<br /><br />* published marker paper on this topic last Sept in Genome Research<br /><br />ExpressSeq<br />* transcriptome component of ClinSeq<br />* demonstrate use of RNA-seq in clinical research<br />* better than SAGE or Microarray<br /><br />Transcriptome + Exome<br />* gene expression<br />* splicing<br />* gene fusions<br />* etc<br /><br />Atherosclerosis<br />* hardening of arteries<br />* Looking for biomarkers for calcification<br />* can look for it by CT scan (in example, arteries look like bone.. 
[Ouch!])<br /><br />Study:<br />* 4 people with high calcification, 4 with low calcification<br />* two RNA sources: LCLs and whole blood<br />* emphasis on uniform cell culture conditions<br />* repeated EBV transformation from same individual (see noise)<br />* RNA Fragmentation (Covaris S2)<br />* PCR amplification 12 cycles<br />* two PE 51bp lanes Illumina<br /><br />Differential gene expression<br />* Expression vs Statistical Significance.<br />* "upside down volcano plot"<br />* found about 100 genes that were differentially expressed and significant<br />* Looking at those 100 in detail<br />* Many of these genes are noise.<br />* more sequencing reads to improve statistical depth<br /><br />Discussing his best hits - but not giving names of genes.<br /><br />[Kind of silly to take notes on random unnamed genes. Take home message is that some of the genes found were already known to be involved in the process - but obviously not all of them. TFs, TKs and something associated with rheumatoid arthritis. This might be a good time for me to rant about how picking any random list of proteins will give you things that you think are promising. All gene hit sets are "interesting" at first, and useless when not validated... but that's obvious, no?]<br /><br />Coming up<br />* analysis of next 8 subjects<br />* follow up<br />* sequence more subjects for rare variants<br />* integrated analysis of genome and transcriptome data to uncover SNV loci underlying differential expression. 
("integrating multiple omes")Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-75822247378044359122010-02-26T14:14:00.000-08:002010-02-26T14:15:33.145-08:00AGBT 2010 - Nicole Cloonan - The University of Queensland<span style="font-weight:bold;">Translation-State RNAseq of Human Embryonic Stem Cells using Paired-End Sequencing.</span><br /><br />Intro to Stem Cells<br />* hot topic - potential for cell generating therapies<br />* Self renewable<br />* pluripotent<br />* directable<br />* tractable<br /><br />Looking at Extracellular space network.<br />* molecules that control cell-cell interactions (among others)<br /><br />The "Plurinet"<br />* defines the pluripotent status of the cell<br />* protein-protein interactions<br />(Muller et al, Nature 455:401-405)<br /><br />Transcriptional complexity<br />* 6 transcripts per gene on average<br />* so how does this affect the plurinet<br /><br />SOLiD RNA-Seq<br />* have a pipeline... [too fast]<br />* done SET and PET.<br />* 80% of tags map, 194M 50mers, 114M 25mers<br /><br />Tags that don't map:<br />* LincRNA, intergenic, etc...<br /><br />PET.<br />* alternate splicing<br />* works well if you know what the annotations are.<br />* with PET, you can build transcript models if you don't have them already - learn more about alt. splice<br />* can be used for novel exon discovery<br /><br />ChIP-Seq from Ku et al<br />* Extended Exons. 3' exon extensions can be very long.<br /><br />[Why is this ChIP-Seq?]<br /><br />Do Virtual Northerns<br />* Size fractionations<br />* What you find is that most annotated genes have the right RefSeq predicted lengths.<br />* however, some are shorter, some are longer<br />* Frequency at which tags from a particular library match predicted (based on RefSeq) vs from RNA data... 
You do see that some have very different results.<br /><br />RNAs are translated...<br />* if no signal peptide, cytoplasmic (on free ribosomes)<br />* if it has a signal peptide, then it's translated by ribosomes bound to membranes<br />* use sucrose gradient to separate the two populations<br />* do PET, (35/75bp reads)<br />* compare signals in both fractions - they come up well in the predicted fraction.<br /><br />Novel transcription<br />* membrane associated RNAs have very different proportions of extension (mainly long 3' UTRs) than those in the cytoplasmic fraction<br /><br />MiRNA biogenesis and mRNA interactions.<br />* use fractionation to test<br />* RISC associated with polysomes (which works with fractionation)<br />* complexes stay together through fractionation<br />* Long UTRs are enriched for miRNA binding sites<br /><br />Back to Plurinet<br />* Complexity is incredibly increased with the extra products and miRNA<br /><br />Summary:<br />* PET allows you to reconstruct loci level complexity from RNAseq data<br />* Size fractionation is useful<br />* translation state RNAseq allows the capture of mRNA and miRNA data from polyribosomes<br />* Transcriptional complexity impacts greatly on interactions.Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-63094750779518391932010-02-26T13:47:00.000-08:002010-02-26T13:48:24.720-08:00AGBT 2010 - Jonas Korlach - Pacific Biosciences<span style="font-weight:bold;">Direct Single Molecule, Real Time RNA Sequencing.</span><br /><br />Opportunity to further work with this platform to replace enzyme in ZMW with other enzymes of interest - can observe new functionality.<br /> <br />"Single Molecule Realtime Biology" [SMRB? 
How do you say that acronym?]<br /><br />Of interest: Reverse Transcription<br /> * replace DNA polymerase with an RNA-dependent DNA polymerase (reverse transcriptase)<br /> * have done this - simple extension tests.<br /> * done kinetic analysis, and the phospho dNTPs are incorporated well, but MUCH slower (1 order of magnitude slower) than non-marked nucleotides<br /><br />Tested the system out anyhow.<br /> * Seems to work in principle - albeit it's slow. One dNTP in enzyme is not yet one nucleotide insertion. <br /><br /> Ribosomal RNA Sequencing.<br /> * Can withhold catalytic metal, which allows binding, but not ligation. Thus, you can just watch the fluorescence - and in this case, binding only happens with correct nucleotide.<br /> * can also detect modified RNA bases - eg, Pseudouridine. Can measure binding time - takes longer.<br /><br />Detection of Modified RNA bases<br /> * pauses indicate kinetic changes<br /><br />For viruses, you can get a single enzyme to process the entire genome of a virus - very long read lengths at the tail end of the distribution.<br /><br />HIV reverse transcriptase translocation dynamics.<br /> * use terminating bases and AIDS drugs - and monitor incorporation and pulses.<br /> * Show graphs of kinetic analysis of P-site and N-site<br /> * Can then study binding in the presence of the terminators/drugs.<br /> * Can calculate binding energy from pulses. 
<br /><br />Summary:<br /> * Demonstrated SMRT RNA sequencing - still room to grow.<br /> * Demonstrated SMRT Biology - Translation (shown tomorrow) and reverse transcriptase.Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-69632979833992724052010-02-26T13:25:00.001-08:002010-02-26T13:32:03.644-08:00AGBT 2010 - Pacific Biosciences Workshop<span style="font-weight:bold;">"The debut of the 3rd Generation"</span><br /><br />Intro:<br />* Came from the basement of a building at Cornell. [what is it with basements on campus?]<br />* technology detects 500 photons per base<br />* Raised $266M in company history<br /><br />History<br />* show slide with first results that launched company - detecting 3 labeled C's, barely<br /><br />"yes, it is big, yes, it is heavy, and yes, it does work"<br />* smallest: $50,000 desktop version<br />* Largest: full human genome in 15 minutes.<br /><br />Already have manufacturing for reagents - and building a facility to construct machines.<br /><br />Steve Turner: Founder, CSO, Board Member<br /><br />Overview<br />1. brief overview of technology<br />2. Update on Collaborations<br />3. Instrument debut<br />4. Applications<br />5. 
Scalability<br /><br />* Video of polymerase - same one from web.<br /><br />Collaborations:<br />* influenza<br />* cancer transcript<br />* long read progress<br />* strobe sequencing for structural variation<br />* Palustris systems biology [Go palustris!!!]<br />* circular consensus sequencing<br />* survey of coverage bias<br />* direct detection of methylation and DNA modification<br /><br />Influenza:<br />* serotyping doesn't give the full picture - immunologically distinct viruses.<br />* Fast Time to result: 9 hours from sample extraction to sequencing analysis completion.<br />* did not look at consensus call - used single molecule reads.<br />* match single molecules with sequenced reference genomes of similar influenza.<br />* Turned out that the strain was misidentified - phylogeny was incorrect.<br />* side benefit: in every case, each segment was covered in single reads. Potential for quasi-species studies of viruses.<br /><br />Sequenced MCF-7<br />* known alt. splice forms implicated in tumorigenesis.<br />* Can map entire transcripts (2400 bases) in single read.<br />[neat stuff]<br /><br />10,351 base read scrolling... goes on and on.<br />* they see up to 20kb reads.<br /><br />Strobe sequencing<br />* answer to Mate Pairs?<br />* Polymerase is damaged by laser, so reads will continue until damaged<br />* Turn off the light, and the polymerase is unharmed... will continue till you turn the lights back on.<br />* Who needs mate pairs when you can just sequence 10kb at a time?<br />* show repeat lengths - at 20kb, you can sequence most of your repeat regions. - Strobe it as well...<br />* Very useful for assembly.<br /><br />Insertion AC223433 fosmid<br />* can use time as a way to look at insert size.<br /><br />Palustris<br />* 58 contigs from palustris<br />* Hybrid assembly - now have a single contig. 
(Used Strobe, straight and other tech..)<br /><br />Read Length.<br />* Expect that you can expand reads to 50-70kb.<br />* demonstrate by hairpin ligation to lambda genome (linear)<br /><br />circular consensus sequencing<br />* make something circular, then go 'round and 'round till you get consensus.<br />* Q40 on single molecules by going over it many times<br /><br />Prep:<br />* results in Low bias for GC content<br />* tested on many organisms<br /><br />modified nuclear bases<br />* look at kinetics of base incorporation<br />* modified nuclear base Methylated Adenosine causes kinetic differences<br />** 6-10x kinetic changes.<br />* Methylated Cytosine - still get a signal<br />* Hydroxymethylcytosine: can also see that - also different from other traces<br />* duration and spacing are different for the three bases.<br />* Single base resolution, less than 1% FP, methylation detection on single molecules<br />* also looked at other modifications - can always tell that it's different.<br />* Polymerase stalls at T-dimers.<br /><br />[Summarized it all]<br /><br />[Insert CEO talk here - wonderful company, wonderful people, "state of the art", hard work.]<br /><br />Unveil world's first 3rd generation sequencer<br />* Movie time!<br />* 8 Cells per package - $100 per cell.<br />* SMRT Cell - 96 / tray.<br />* reagent plate (96 well)<br />* each cell works independently - in any protocol<br />* Uses CSV files<br />* API to LIMS with designs.<br />* System looks pretty child-proof (though probably not idiot-proof)<br /><br />Monitoring a run:<br />1. monitor at instrument or remotely<br />2. View real time base incorporations<br />3. remaining runtime<br />4. 
status of each cell from cell prep to run.<br /><br />Signal to noise ratio is dramatically improved from last year<br /><br />Alignment?<br /><br />Portal:<br />* web based interface<br />* accessible from any computer<br />* automated secondary analysis<br /><br />Reports:<br />* full complement of reports automatically generated<br />* quality files<br />* ....<br /><br />Browser integrated into viewer.<br /><br />Supports:<br />* BAM/SAM<br />* FastQ<br />* SRA<br />* etc...<br /><br />All in one day.<br />* sample prep to analysis.<br /><br />* methylation sequencing will be released in an update<br />* direct rRNA sequencing.<br /><br />Working towards SMRT Translation<br />* replace Transcription (Polymerase) with translation (ribosome & labeled tRNA....)<br /><br />[ok, didn't see that coming]<br /><br />Scaling of performance over instrument life<br />* current yield 30% improved to 90%<br />* Multiplex: 80k improves to 160,000<br />* speed 1-3bps improving to 15bps<br /><br />Throughput should pass 2nd generation with this instrument. Expect new instrument in 3 years to blow all of this away.<br /><br />Interpretation of Genomics will require epigenetics, etc etc etc. and much data processing. [Oddly, That's what I tried to convince Complete Genomics people of this morning, without success.]<br /><br />Questions:<br />* Dark Bases? They are not dark bases - they are missed bases. They now have better bases, that bind better than the natural bases. Missed bases are a problem - the nucleotide docks, and if it happens too fast, you don't get enough photons...<br /><br />* Something about algorithms for de novo assembly - check out the posters, and we'll have more information for you.<br /><br />* What is your error rate? [Very aggressive question] Single pass error rate is greater than ensemble sequencing. You don't get systematic error in Pac Bio - Approach towards consensus is linear. You know when you see systematic errors - you can catch and repair. 
Expect Q90 with this technology.<br /><br />* Exponential decay on read lengths.Anthony Fejesapfejes@gmail.com2tag:blogger.com,1999:blog-23988675.post-44146481143531677852010-02-26T10:40:00.002-08:002010-02-26T10:54:52.529-08:00AGBT 2010 - Elaine Mardis - Washington University School of Medicine<b>Single Molecule Sequencing to Detect and Characterize Somatic Mutations in Cancer Genomes</b><br /><br />[Disclaimer Statement - she is a Pac Bio board member]<br /><br />Why Sequence Whole Genomes?<br />* [same as always - nothing new]<br /><br />Focus of talk today is on point mutations<br /><br />How Current NGS (eg, Illumina) works:<br />* Sequence tumour & normal to 30x,<br />* Compare to reference, then compare tumour to normal, and remove known dbSNP sites, etc etc...<br />* Validate SNVs.<br /><br />4 Tier levels.<br />* focus validation on Tier 1 results.<br /><br />Why Validate?<br />* Pipeline is tuned to have a slightly elevated false positive mutation rate so things aren't missed.<br />* Orthogonal validation is important.<br />* Validation is expensive and time consuming, however.<br /><br />Why check for prevalence of mutations?<br />* Each tumour gDNA sample consists of the contributions of many tumour cells<br />* digital nature of NGS data allows an estimation of how common each validated mutation is in the tumour cell population<br />* more prevalent mutations are likely "older" - happen earlier in progression.<br /><br />Recurrent SNVs<br />* why? Adding evidence. The ones that happen more often are likely to be earlier in progression and are thus more likely to be drivers. 
[Not sure I buy that logic, however.]<br /><br />Limitations:<br />* Faster Sequence data generation (analysis is not getting cheaper)<br />* Increased validation/prevalence data demand (need to decrease cost)<br />* Recurrent mutation screening (site specific vs whole gene)<br /><br />Medical impact:<br />* always want our results to be useful. [Kind of ignoring this part... selling us on the use of sequencing for medical use.]<br /><br />Discussion of AML project, as discussed in last talk.<br />* prognostic IDH1 mutations.<br /><br />[Dr. Mardis' talks always remind me of an infomercial... It has the feel of a commercial presentation, but with data to back it up. It's glossy, the slides are clean, and the presentations feel well rehearsed - something we just don't get much of in science talks.]<br /><br />Insert sales pitch for Pac Bio systems here.<br /><br />[5 slides later... ]<br /><br />three experiments:<br />* first for accuracy<br />* second for sensitivity<br />* third for detection of mutational prevalence<br /><br />Accuracy:<br />* 32 directed PCR products from glioblastoma tumour/normal pair<br />* 77% neoplastic cellularity<br />* SMRT sequencing (alpha prototype detector)<br />* Wrote software for SNP detection<br />* 94% of 86 known sites were found<br />* 6 FP and 6 FN results<br /><br />* 5 LOH sites were detected properly<br />* All mutations were detected at different confidence levels<br /><br />Sensitivity<br />* used AML genome<br />* 95% population purity<br />* All variants detected at each cellularity...<br /><br />Detection of Mutational Prevalence:<br />* Concordance with Illumina is good - but not great in tier 3 mutations. 
C to T mutations were slightly biased against.<br /><br />Conclusions<br />* Platform is Ramping up quickly.Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-91744309433812947432010-02-26T06:39:00.003-08:002010-02-26T06:43:32.686-08:00AGBT 2010 - Keynote speaker: James Downing - St. Jude Children's HospitalThe Molecular Pathology of Acute Leukemia<br /><br />Was head of pathology at St. Jude for many years - doing cancer genomics before it was called cancer genomics.<br /><br />First time at AGBT<br />No methodology or technology - focus on biology and clinical relevance. Not going to present NGS data! Using completely outdated technology - and all of it was published in the last 12 months.<br /><br />The cancer he's focused on is the best characterized of all the cancers. <br /><br />What leukemia really is: Proliferating B-cells that rapidly take over the whole body. Highest tumour load of all the cancers. In his generation, 95% of children died within 12 months of diagnosis. Now have 80-85% cure... but relapses happen in 30%.<br /><br />[Classical diagram of immune system lineage]<br /><br />Mutations in early progenitors generate leukemias. Two types: ALL and AML. They are not homogeneous diseases, however. Distinct biological subtypes are characterized by translocations. - They contribute to the leukemia: Necessary, but not sufficient.<br /><br />What are the biologic processes that need to be altered to generate leukemia:<br />1. Alteration in self-renewal capacity - need to become "immortal" (unlimited self-renewal) (eg AML1-ETO)<br />2. Need to have an altered response to growth signals - continued growth (eg. BCR-ABL1)<br />3. Block in apoptosis (eg PML-RAR alpha)<br />4. 
Block in differentiation<br /><br />Doing "routine molecular diagnosis":<br /> * CNV, expression, etc<br /> * Use Affy Chips<br /><br />What have they found? (using 242 diagnostic ALLs with matched germ line DNA.)<br /> * there are a small number of copy number changes per case... vary markedly across the different subtypes. (eg, MLL: ~1, other has ~11)<br /> * more Deletions than Amplifications<br /> * 60% of B-lineage ALL have a genetic lesion in a gene regulating B-cell differentiation (PAX5, Ikaros, EVF, LDF1, BNK)<br /><br />PAX5 deletions most common.<br /> * 10 exons... <br /> * Half of deletions deleted half of the genes<br /> * Others delete required domains<br /> * some were homozygous, but not all.<br /> * Lots of fusions with this gene occur as well.<br /> * Point mutations were also seen in binding domains... <br /><br />[Ok, so this gene can be deleted in many ways... got it. The cells find ways to kill off this gene.] <br /> <br />Haploinsufficiency in PAX5 deficient mice<br /> * Was not sufficient to cause lymphoma.<br /> * cooperates with BCR-ABL1 to cause lymphoma. (Mouse Model)<br /> * strong driving pressure for disabling the B-cell differentiation genes in Leukemia.<br /><br />60% of B-progenitor ALL have Mutations in B-cell regulatory Genes<br /><br />Look at Ikaros<br /> * entire literature about altered isoforms. <br /> * saw a high frequency of mutations in BCR-ABL1 ALL,<br /> * 85% of BCR-ABL ALL have deletions of Ikaros: Almost never see the deletions in Ikaros.<br /> * mapping deletions of Ikaros: Some are complete, but there is a subset of deletions that commonly knock out all 4 zinc fingers (exons 3-6). <br /> * Never see Ikaros "isoforms" without these deletions. There probably are no isoforms - it's always genetic lesions.<br /> * Deletions typically happen within a few bases of each other - result from aberrant RAG-mediated recombinations.<br /><br />Start putting the lesions together. 
[Nice lists of genes for each of the 3 pathways]<br /><br />Clinical relevance:<br /> * looking for markers in a new cohort. Remove two types of ALL (BCR-ABL1 + infant), look at 221 samples: Are there new markers?<br /> * Yes, it was Ikaros: 75% of relapse if you have Ikaros deletions.<br /><br />Compare BCR-ABL1- and Ikaros- (Bad outcome) with BCR-ABL1+ ALL (Also has Ikaros deletions)<br /> * Significant expression similarity<br /> * Look at the Kinases: JAK family, which have a high rate of mutations in ALLs.<br /><br />JAK mutations: <br /> * not seen in other types of cancers - unique to JH2 domain, clustering in a single spot. (R683)<br /> * Turns out that high risk ALL have JAK deletions.<br /><br />CRLF2 = TSLPR, IL-7/IL-7R<br /> * Over expression of this receptor (compensating for JAK mutations and lack of signaling), combine to cause a proliferative signal. [I didn't get everything here.]<br /><br />Looking at high risk again:<br /> * Ikaros deletions<br /> * JAK mutations<br /> * CRLF2 (cytokine receptor mutations)<br /><br />What other kinases are activated in this subset of patients? <br /> * Work in progress<br /> * quick review of other genes they're now finding... [too fast to get that down.]<br /><br />Genetic Alterations Acquired at Relapse<br /> * Relapsing is only 20% blast population.<br /> * Need to Flow sort.<br /> * CDKN2A/B mutations<br /> * [list of genes, including Ikaros... ]<br /> * No common mechanism of relapse - variety of pathways<br /> * Varieties do not include drug target mutations. 
It's always in signalling, etc.<br /> * 7% of relapse is "unrelated" (secondary leukemia)<br /> * 8% same as diagnosis<br /> * 34% clonal evolution from diagnosis<br /> * 51% clonal evolution from pre-leukemic clone<br /><br />Summary: <br /> * small number of variations<br /> * Ikaros mutations<br /> * Aberrant RAG-mediated recombination<br /> * JAK mutations<br /> * ...<br /><br />This disease "begs for NGS" - Get a complete picture of what's going on.<br /> * Collaborating with WashU. (Mardis, Wilson, Ley)<br /> * Doing the "Bad" leukemias (infant, high risk, CBF)<br /> * also doing brain and solid tumours (neuroblastoma, osteosarcoma, retinoblastoma)<br /> * Started Feb 1st - already have 5 genomes and matched normals.<br /> * over $50M invested in this project.Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-70550260958893716022010-02-25T18:32:00.001-08:002010-02-25T18:32:49.604-08:00AGBT 2010 - Ogan Abaan - NIH/NCI<div><b>Identification of novel cancer mutations in sarcomas</b></div><div><br /></div><div>Sarcomas: two categories </div><div> * simple genetic changes (eg. Ewings)</div><div> * Complex genetic changes (eg, osteosarcomas)</div><div> </div><div>Soft tissue sarcomas in general:</div><div> * rare</div><div> * high metastasis.</div><div> * connective tissue origin.</div><div> * 50 subgroups - most have unknown biology</div><div> </div><div>Tumour samples from 24 soft tissue sarcoma patients</div><div> * matched normals will be sequenced when available at some point in the future.</div><div><br /></div><div>Target: </div><div> * 15k exons from 1334 genes </div><div> * used "in-solution" capture method.</div><div> * 33.5k 150mers</div><div> * no repeat masking. 
</div><div> * biotinylated baits</div><div><br /></div><div>Used Eland - and used GAII or GAIIx, as available - mixed read lengths</div><div><br /></div><div>Custom Python scripts - wrote them himself. Still a work in progress.</div><div><br /></div><div>Variant Calling is VERY simple. Uses Phred score based approach, adjusted by error rate at that position.</div><div><br /></div><div>Did the standard: filter on dbSNP130, annotate on UCSC refGene and Visual confirmation (IGV Browser)</div><div><br /></div><div>Shows stats - they don't look great, but they seem similar to those published in Tewhey et al (Genome Biol 2009). [Shown to justify low rates?]</div><div><br /></div><div>Some optimization could be done to get more coverage.</div><div> * gets 23-46% at greater than or equal to 10x, paper gets 88% or more at 7x</div><div><br /></div><div>6 of the variants are known in COSMIC db.</div><div><br /></div><div>KEGG pathway: Many mismatch repair... [actually, this is the usual set you'd see with any cancer sample. Nothing sticks out.]</div><div><br /></div><div>Conclusion: </div><div> * 305 variants, no common variants.</div><div><br /></div><div>Future: </div><div> * increase sample size. </div><div> * pathway analysis</div><div> * Understand biology</div><div><br /></div><div>[Not the most impressive talk - I could give the same talk on my cell lines, and would have roughly the same results.... 
nothing particularly interesting.]</div><div><br /></div><div> </div><div><br /></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-7055026095889371602?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-25883452225516730992010-02-25T18:17:00.001-08:002010-02-25T18:17:27.765-08:00AGBT 2010 - Ian Bosdet - BC Cancer Agency<div><b>Mutational Profiling of Pre and Post-Treatment Lung Tumors Using Whole-Transcriptome Sequencing and Targeted Sequence Capture</b></div><div><br /></div><div>EGF receptor is often mutated (non-small cell)</div><div> * some tyrosine kinase inhibitors exist, but response is variable.</div><div> * clinical characteristics known to be associated with response were used as the primary criteria for recruitment</div><div><br /></div><div>Identifying patients that are likely to benefit from TKI therapy can have a significant impact on overall survival. </div><div> * cells become addicted to the rampant signalling from TK. 
Cutting it off can kill them</div><div> * often a mutation that can dampen or negate the effect of the drug.</div><div><br /></div><div>All cancers used were first line.</div><div> * non-smoker</div><div> * female and Asian </div><div> * stage IIIb</div><div> * NSCLC 1st line.</div><div><br /></div><div>The majority of patients have now progressed - and are encouraged to donate a 2nd biopsy.</div><div><br /></div><div>65 patients over 2 years,</div><div>goal: non-progression over 8 weeks.</div><div> 80% did not progress in 8 weeks.</div><div> * 23 partial response, </div><div> * 24 stable disease</div><div><br /></div><div>30 tumours selected for RNA sequencing</div><div> * 13 responders, 14 non-responders </div><div> * 3 progression tumours</div><div> * gene expression analysis and mutation discovery</div><div> * some correlation to clinical characteristics.</div><div><br /></div><div> One gene correlated with EGFR sensitivity mutations.</div><div> Another seemed to correlate with smokers who did not respond: IER5L</div><div><br /></div><div>Excess unaligned reads were aligned to virus transcripts - highly enriched for Epstein-Barr Virus. Tumour ended up being re-classified.</div><div><br /></div><div>3 patients then sequenced with Capture:</div><div> * Used Agilent (47,558 baits)</div><div> * Normal, pre-treatment and post-treatment tumour samples</div><div> * can be used to identify small deletions</div><div> * Putative somatic mutations resulting in significant amino-acid alterations were identified using SNVMix</div><div> * Mutations similar between patients were not observed, but pre-treatment tumour pairs show significant overlap.</div><div><br /></div><div>[Talking about putative somatic mutations.... I got ripped into for doing the exact same analysis and calling the same mutations "most likely" somatic 2 weeks ago... 
DOH.]</div><div><br /></div><div>Summary:</div><div> * clinical selection of patients can greatly enhance incidence of EGFR mutations and response to erlotinib at 8 weeks</div><div> * EGFR mutation status is a good but imperfect predictor of patient response</div><div> * mutation discovery in treatment-naive lung tumours has identified a relatively small number of mutations (need validation)</div><div> * more progressions will be analyzed.</div><div><br /></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-2588345222551673099?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0tag:blogger.com,1999:blog-23988675.post-48517170946988963092010-02-25T17:59:00.001-08:002010-02-25T18:01:04.204-08:00AGBT 2010 - Daniel MacArthur - Wellcome Trust Sanger Institute<div><span class="Apple-style-span" style="font-weight: bold; ">Loss-of-Function Mutations in Healthy Human Genomes: Implications for Clinical Genome Sequencing</span></div><div><br /></div><div>[Missed the first couple of minutes?]</div><div> </div><div>Analysis of 1000 Genomes data.</div><div><br /></div><div> Loss of Function sub-group</div><div> Aim: create a catalogue of variants predicted to result in severe disruption of gene function</div><div> </div><div>What is a LOF variant: [annotation based on GENCODE v3lb]</div><div> 1. stop codon SNPs</div><div> 2. splice disruption SNPs</div><div> 3. frame shift indels</div><div> 4. disruptive structural variants. (eg. 
loss of exons, loss of start codons...)</div><div><br /></div><div>LOF variants:</div><div> * enriched for: </div><div> ** severe recessive mutations</div><div> ** other variants with functional effects</div><div> ** neutral variants in redundant genes/pseudogenes</div><div> ** sequencing and annotation artefacts</div><div><br /></div><div>Many of these will be neutral.</div><div><br /></div><div>3 pilots.</div><div> * total of 1,6556 unique genes affected.</div><div> * that is to say that a substantial portion of the genome has LOF variants</div><div> * acknowledging that there are errors, that's still a lot. (=</div><div><br /></div><div>Disrupted genes per individual. Visible difference between European vs. Yoruba. (Africans have higher variability)</div><div><br /></div><div>Structural variants seem relatively constant, splicing seems constant, stops seem to vary most. (CEU, CHB, JPT, YRI) [I'm eyeballing]</div><div><br /></div><div>Expect to see some carriers for recessive disease mutations</div><div> * Several likely carrier mutations identified. [didn't catch them]</div><div><br /></div><div>Derived allele frequency spectra.</div><div> * stop and splice are heavily shifted to the low end (0.05+)</div><div> </div><div>LOF sites are enriched for artefacts</div><div> * Conserved regions have fewer polymorphisms, but an equal amount of error.</div><div> * Non-conserved regions have more polymorphisms, and equal error:</div><div> ** thus tends to increase artefact rate in conserved regions.</div><div><br /></div><div>LOF clustering points to mapping and annotation artefacts</div><div> * 91% of LOF-carrying genes contain only one LOF variant.</div><div> * there are some genes that are enriched for multiple independent LOF variants.</div><div> ** many of them are CNV, seg dup, close paralogues.... which means that they're artefacts too.</div><div> * other annotation artefacts exist too... 
LOFs are making them stand out.</div><div><br /></div><div>Beyond cataloguing:</div><div> * large scale sequencing studies tend to produce many potential LOF candidates</div><div> * discriminate between disease-causing and benign variations.</div><div> * is there a functional profile distinguishing recessive and LOF-tolerant genes?</div><div><br /></div><div>Compare LOF-tolerant genes (& non-OR) to 725 recessive disease genes from OMIM. (Early results)</div><div> * use it to do classification</div><div> * linear discriminant analysis</div><div><br /></div><div>[Kind of feels like a fast drive-by-blogging... my notes really didn't do justice to Daniel's explanations - I just managed to get down some of the points.]</div><div><br /></div><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/23988675-4851717094698896309?l=www.fejes.ca%2Findex.php' alt='' /></div>Anthony Fejesapfejes@gmail.com0