Bioinformatics at Work Again!

Genetic Disease Research Project

 

This time we are going to research genetic diseases using bioinformatics websites.

Before you begin, check out the FAQs site to make sure you understand chromosomes, genes, and disease.

1.  Choosing your Disease

First go to the website Human Genome Landmarks provided by the Department of Energy.

Just for interest's sake, hover your pointer over several of the chromosomes shown there and note how many bases are found in the different chromosomes. Now choose a chromosome and double-click on it. When it appears, follow the directions to zoom in so that you can read the many genes found within that chromosome. You may scroll up and down the chromosome with a right click and hold. Other places to look for a disease: Genetics Home Reference, Genes & Diseases, Your Genes Your Health.

Choose a disease that you would like to research. (You may want to bookmark this page in case you need to come back to it for additional information.) Warning! Choosing the disease is a crucial step. Some diseases will be easy and some very hard. Look carefully at the allelic variants before you make your choice. The following are known to be extremely hard: atherosclerosis, dyslexia, Gaucher Disease, Coagulation Factor 5, Rickets (PHEX), Ehlers Danlos Syndrome (COL3A1), and schizophrenia!

  1. Go to OMIM (Online Mendelian Inheritance in Man).

  2. Choose the Limits tab.

  3. Under "Only records with:", check the box next to "Allelic Variant".

  4. At the top, where it says "search OMIM for", type in the disease.

  5. Hit the Go button.

  6. Select a gene by clicking on the OMIM number (e.g., 600279) for the disease you are interested in. The disease may be listed by the disease name or by a protein associated with it.

  7. Check over the OMIM page and look at the allelic variants. To find them, use the "Table of Contents" on the right and choose "allelic variants".

  8. On these pages you may be able to find the mode of inheritance (autosomal dominant, sex-linked recessive, translocation, etc.). Hopefully, you will also be able to find the DNA or protein changes (allelic variants) that must occur to cause this disease. If you don't, you may want to go back and choose another OMIM number for that disease (step 6), or you may have to reconsider and find a different disease.

Be careful. 

Allelic variants are CRUCIAL. Scroll down the page to find them, or use the "Table of Contents" on the right and choose "allelic variants". Make sure the variants listed will be doable. For example, this disease is a GREAT choice:

This disease is a TERRIBLE choice:

Other buttons will give you additional information.  Play around.

Here are some important details that are found on the OMIM page.

2.  Finding the Disease Gene Location

As stated, the OMIM page will provide you with lots of information. One bit of info is the "gene map locus". For example, 22p16 means that the gene is on chromosome 22, on the short arm (p), and the number (16) gives its position along that arm, counted outward from the centromere. "q" means it is on the long arm.
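If you want to double-check how you are reading a gene map locus, here is a minimal Python sketch that splits one into its parts. The function name `parse_gene_map_locus` is made up for this example, and it only handles simple entries such as 22p16 or 7q31.2 (not ranges such as "1q21-q22"), so treat it as an illustration rather than a complete parser.

```python
import re

# Simple cytogenetic location: chromosome (1-22, X, or Y), arm (p or q),
# then the band digits, e.g. "22p16" or "7q31.2".
# Assumption: ranges and other complex notations are not handled here.
LOCUS_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)$")


def parse_gene_map_locus(locus):
    """Split a gene map locus into chromosome, arm, and band."""
    match = LOCUS_PATTERN.match(locus.strip())
    if match is None:
        raise ValueError(f"Unrecognized gene map locus: {locus!r}")
    chromosome, arm, band = match.groups()
    arm_name = "short arm (p)" if arm == "p" else "long arm (q)"
    return {"chromosome": chromosome, "arm": arm_name, "band": band}


print(parse_gene_map_locus("22p16"))
```

Anything the pattern does not recognize raises an error instead of guessing, which is the safer behavior when you are copying values by hand.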

3.  Finding Inheritance

Having trouble with how the disease is inherited?  Try Orphanet.

4.  Finding Amino Acid and DNA Sequences

On the right of the OMIM page, click on NCBI RefSeq under DNA. It may give several choices. Look for a choice that is for mRNA and has an NM number (for example, NM_002857.3), then click on that link. This will lead to information about the wild type DNA and protein.

Follow your link and scroll down.  Closer to the bottom of the page will be an amino acid sequence like this: 

/translation="MAAAEEGCSVGAEADRELEELLESALDDFDKAKPSPAPPSTTTA
                     PDASGPQKRSPGDTAKDALFASQEKFFQELFDSELASQATAEFEKAMKELAEEEPHLV
                     EQFQKLSEAAGRVGSDMTSQQEFTSCLKETLSGLAKNATDLQNSSMSEEELTKAMEGL
                     GMDEGDGEGNILPIMQSIMQNLLSKDVLYPSLKEITEKYPEWLQSHRESLPPEQFEKY
                     QEQHSVMCKICEQFEAETPTDSETTQKARFEMVLDLMQQLQDLGHPPKELAGEMPPGL
                     NFDLDALNLSGPPGASGEQCLIM"

 

Check to see if it matches your allelic variants. For example, suppose a variant is GLN79TER (glutamine #79 becomes a termination codon). Glutamine is represented by Q, so check whether amino acid #79 is Q. If it is, you've found your protein! If not, return to the sequence list, choose another sequence with a different NM number, and check that one.

Once you have your protein, copy and paste the sequence into your worksheet. This sequence is called the Wild Type, which means it is the healthy form of the protein. Be sure to copy and paste the source of the page (its direct link) too!
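Counting to amino acid #79 by hand is easy to get wrong, so here is a small Python sketch of the same check. The function name `wild_type_matches` and the toy protein are invented for this example; it assumes the variant is written OMIM-style as three letters, a number, and three letters (like GLN79TER).

```python
import re

# Three letter codes mapped to single letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def wild_type_matches(variant, protein):
    """Return True if the protein has the variant's original amino acid
    at the variant's position (positions are counted from 1)."""
    match = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", variant.strip())
    if match is None:
        raise ValueError(f"Unrecognized variant: {variant!r}")
    original, position, _changed = match.groups()
    position = int(position)
    sequence = "".join(protein.split())  # drop spaces and line breaks
    if position < 1 or position > len(sequence):
        return False
    return sequence[position - 1] == THREE_TO_ONE[original.upper()]


# Toy protein (not a real record): Q is placed at position 79 on purpose.
toy_protein = "M" + "A" * 77 + "Q" + "GGGGG"
print(wild_type_matches("GLN79TER", toy_protein))
```

To use it on a real record, paste the translation text from the NM page in place of the toy protein; a result of False suggests trying a different NM number.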

 

The DNA sequence is further down the page.  This gives a nicely numbered format, but may not be an easy match for allelic variants.    Go back to the amino acid sequence. Above it,  look for the CCDS hyperlink.  Go to this new page.

 

   
 

COOL! This is really neat. This page shows both your Wild Type mRNA and protein sequence. If you mouse over the nucleotide or protein sequence and then click on the highlighted codon or amino acid, it will highlight the corresponding amino acid or codon. Copy and paste the nucleotide sequence into your worksheet. This sequence is called the Wild Type, which means it is the healthy form of the mRNA. Be sure to copy and paste the source of the page (its direct link) too!

 

Want a numbered sequence of your protein?  Go to the OMIM home page. Change the search to Protein and put in your OMIM symbol.

Select the protein and you will go to a page with your protein that looks like this. Make sure it matches the one from your DNA page.

 

 

 

Amino Acid Abbreviations

amino acid      three letter code   single letter code     amino acid      three letter code   single letter code
alanine         Ala                 A                      leucine         Leu                 L
arginine        Arg                 R                      lysine          Lys                 K
asparagine      Asn                 N                      methionine      Met                 M
aspartic acid   Asp                 D                      phenylalanine   Phe                 F
cysteine        Cys                 C                      proline         Pro                 P
glutamic acid   Glu                 E                      serine          Ser                 S
glutamine       Gln                 Q                      threonine       Thr                 T
glycine         Gly                 G                      tryptophan      Trp                 W
histidine       His                 H                      tyrosine        Tyr                 Y
isoleucine      Ile                 I                      valine          Val                 V

5.  Color coding the Amino Acid and DNA Sequences

Choose the allelic variants that you want to display. The sequences must be numbered and in Courier New font. If your disease has 4 or more variants, you MUST do 4! For each variant, choose a color to use to color code your DNA and amino acid sequences. Simply highlight the base pair(s) or amino acid(s) that are affected. At the bottom of the page, tell what happened. For example, "Guanine 90 was changed to Thymine" and then, under the amino acids, "Arginine 30 was changed to Proline". Use a different color for each variant.

The protein numbers should work out exactly, but the DNA numbers may not, depending on whether or not the researcher included extra sequence such as the promoter. If, for example, there is not a thymine at 3685 when OMIM says there should be, you may have to check other mRNA variants if they exist. Variant 1 will most likely work the best. If those don't work, go to the CCDS page as described in step 4. Highlight the amino acid that changed and the page will show you where the DNA is changed.
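The DNA and protein numbers are linked by simple arithmetic: each amino acid is coded by three bases. Here is a minimal Python sketch of that link. The function name `codon_for_base` and the `cds_start` parameter are made up for this example, and it assumes you know the position of the first base of the start codon in your numbered sequence.

```python
def codon_for_base(base_position, cds_start=1):
    """Return (codon number, position within the codon) for a DNA base.

    base_position: the base's number in your numbered sequence.
    cds_start: the number of the first base of the start codon
               (assumption: 1 if the sequence begins at the start codon).
    """
    offset = base_position - cds_start
    if offset < 0:
        raise ValueError("Base lies before the start of the coding sequence")
    return offset // 3 + 1, offset % 3 + 1


# Base 90 is the third base of codon 30, so a change at base 90
# lines up with a change at amino acid 30.
print(codon_for_base(90))
```

If the numbers do not line up for your variant, a `cds_start` other than 1 (extra sequence in front of the start codon) is one possible reason.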

Don't want to number the DNA?  Biology Student Workbench will do it for you.  Log in to BSW.  Create a new session.  Go to nucleic tools.  ADD new sequence.  Then VIEW sequence, but change format to GENBANK. 
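If you cannot get to Biology Student Workbench, a few lines of Python can produce a similar numbered layout. This is only a sketch that imitates the look of a GenBank sequence listing (60 bases per line in blocks of 10, with the number of the first base at the left); the function name `number_sequence` is invented for this example and it is not the BSW tool itself.

```python
def number_sequence(sequence, line_width=60, block_width=10):
    """Format a DNA sequence in numbered, GenBank-like lines."""
    cleaned = "".join(sequence.split()).lower()  # remove spaces and line breaks
    lines = []
    for start in range(0, len(cleaned), line_width):
        chunk = cleaned[start:start + line_width]
        blocks = [chunk[i:i + block_width]
                  for i in range(0, len(chunk), block_width)]
        # The number at the left is the position of the first base on the line.
        lines.append(f"{start + 1:>9} " + " ".join(blocks))
    return "\n".join(lines)


# Made-up 80-base sequence, just to show the layout.
print(number_sequence("ATGC" * 20))
```

Remember to set the pasted result in Courier New so the blocks stay lined up.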

6.  Write an Introduction to your Disease

Your introduction is just an overview or short summary of the disease. It must include essential information on the symptoms, onset, and prognosis of the chosen disease. The index page must also include the necessary details about the means of inheritance, the location of the gene, and the affected protein, with a brief description of its function. OMIM is difficult to understand for some of these details. Try other reliable sources such as Yahoo, Genetics Home Reference (especially good for protein function if you type in the OMIM symbol), or DMOZ. Having trouble with how the disease is inherited? Try Orphanet.

7.  Write a Conclusion to your Disease

Your concluding paragraph must explain the change(s) that occurred to both the DNA and the protein. If applicable, use terms like base substitution, insertion, and deletion for DNA, and missense mutation, nonsense mutation, or frameshift for proteins. Generalize about the types of mutations and their effects upon the protein. On this page, do NOT give specific changes (e.g., "at 2571, g was changed to a")!
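If you are not sure which term fits a variant, the following Python sketch shows one way to sort changes into these categories. The function names are made up for this example. It uses the standard genetic code, adds the term "silent" (not on the list above) for a substitution that leaves the amino acid unchanged, and does not cover every case (for instance, a stop codon changing into an amino acid is simply reported as missense).

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {}
for index, codon in enumerate(product(BASES, repeat=3)):
    CODON_TABLE["".join(codon)] = AMINO_ACIDS[index]


def classify_substitution(wild_codon, mutant_codon):
    """Name the protein-level effect of a base substitution within one codon."""
    wild = CODON_TABLE[wild_codon.upper().replace("U", "T")]
    mutant = CODON_TABLE[mutant_codon.upper().replace("U", "T")]
    if wild == mutant:
        return "silent"
    if mutant == "*":
        return "nonsense"
    return "missense"


def classify_length_change(bases_added_or_removed):
    """Insertions or deletions that are not a multiple of 3 shift the reading frame."""
    if abs(bases_added_or_removed) % 3 == 0:
        return "in-frame"
    return "frameshift"


print(classify_substitution("CGT", "CCT"))  # arginine codon to proline codon
print(classify_substitution("CAG", "TAG"))  # glutamine codon to stop codon
print(classify_length_change(-1))           # one base deleted
```

The output gives you the general term only, which is what the conclusion page asks for.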

8.  Creating your Website

Follow the Grading Rubric for the requirements on the website.  SEVERAL THINGS ARE CRITICAL!  Check this page to make sure you are setting things up correctly.  Everything must be in a folder named after your disease. 

Also, the first page must be saved as "index" (only "index"; do not add anything else to its name) in this folder. All pictures must be sourced immediately below the picture. Sources for the written material on each page must appear on that page. If a direct quote is made, its source must be cited immediately after the quote. Your OMIM page and the DNA/protein page must be cited. Just writing OMIM, Google Images, Wikipedia, WebMD, etc. is not acceptable. The direct link to the page must be given. For example, http://omim.org/entry/600279 is the OMIM page for Zellweger Syndrome PEX19, and http://www.ncbi.nlm.nih.gov/nuccore/NM_002857.3 is the direct link to the DNA/protein page.

**********************************************************************************************************************************************************************  

Your final product should include:

A webpage that consists of the following:

Plagiarism is a punishable offense.  Anyone caught copying will receive a 0 on the project.

All sources, including the direct OMIM page, must be cited on the page where the material is referenced.

Quoted material must be in quotation marks and be cited directly after the quote.

Check out the Grading Rubric for this project and use this to make sure you have the necessary components.

GOOD LUCK!