Hacker Perspective: Genes as Technology (Information Coding)
This document expresses some thoughts which have been rolling around my mind since 1980 when I rekindled my interest in biology.
At that time, I was a computer technologist who had recently decided that a career in software design might offer more stability
than one in hardware maintenance. That same year I stumbled upon "The Eighth Day of Creation" in a Toronto bookstore, and
something clicked in my head. This web page will be successful only if I can light a similar spark in at least one other person.
Enjoy!
Neil Rieck (1998-01-23)
Computer Information Technology
- Notation
- Probably because humans have 10 fingers, our most common numbering system is base-10, which employs ten symbols
(0 to 9). When we need a value beyond the largest symbol, we carry a one into the next position to the left, then reuse the symbols.
- Computers represent information in base-2 notation (also known as binary digit or bit notation) where a single bit can
represent two binary states:
- 0 usually represents OFF (either "no current" or "no voltage")
- 1 usually represents ON (either "some specified current" or "some specified voltage")
- Base-2 means there are only two symbols. When we need a value called 2, we set the current bit to zero then carry a one to
the next location (a.k.a. binary "10" means decimal "2")
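The carry rule described above is the same in every base, so it can be sketched in a few lines of Python. This is only an illustration: the function name `to_base` is my own invention, and it assumes a non-negative value and a base between 2 and 10.

```python
def to_base(value: int, base: int) -> str:
    """Write a non-negative integer using only the symbols 0..base-1."""
    if value < 0 or not 2 <= base <= 10:
        raise ValueError("expects value >= 0 and a base from 2 to 10")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        # The remainder is the symbol for this position;
        # the quotient is what gets carried to the next position.
        value, remainder = divmod(value, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


for n in range(4):
    print(n, "->", to_base(n, 2))
```

Decimal 2 comes out as binary "10", which is the carry mentioned above.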
- Storage
- When stored in dynamic memory,
one transistor and one capacitor (not counting addressing, refresh, or buffer circuits) are required to store one bit (two states: on and off).
- When stored in static memory, six
transistors (not counting addressing or buffer circuits) are required to store one bit (two states: on and off).
- When stored in EEPROM memory, one floating gate (not
counting addressing, programming, or buffer circuits) is required to store one bit (two states: on and off).
- In Harvard processor architectures,
instructions are stored in one memory while data is stored in another.
- In Von Neumann processor architectures,
instructions and data are mixed in the same single memory.
- so three binary bits (each capable of 2 states, on and off) can represent 8 decimal numbers (2x2x2=8) with values of 0 to
7.
data bits | equivalent decimal value |
000 (all off) | 0 |
001 | 1 |
010 | 2 |
011 | 3 |
100 | 4 |
101 | 5 |
110 | 6 |
111 (all on) | 7 |
- When stored outside of the computer, binary data can be represented by:
- a hole (or not) on paper tape
- a hole (or not) on punched cards
- a magnetic North or South on magnetic tape or disk
- a magnetic flux reversal on magnetic tape or disk (phase encoding)
- a pit (or not) on a manufactured CD-ROM or DVD
- etc.
- Data Grouping
- 08 bits are grouped to form a byte (sometimes explained as "binary term", although the word was coined as a respelling of "bite")
- 16 bits are grouped to form a word (2 bytes)
- 32 bits are grouped to form a long word (2 words or 4 bytes)
- 64 bits are grouped to form a quad word (4 words or 8 bytes)
- 80 bits are grouped to form extended-precision floating point numbers (IEEE single and double precision use 32 and 64 bits)
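The groupings above can be checked with Python's standard `struct` module. This is a sketch, not a definition: the names byte, word, long word, and quad word follow the convention used on this page (other platforms use "word" for other sizes), and the `GROUPS` table and `bit_width` helper are names I made up for the example.

```python
import struct

# Standard-size format codes; the "<" prefix selects little-endian
# byte order and disables native padding.
GROUPS = {
    "byte": "<B",
    "word": "<H",
    "long word": "<I",
    "quad word": "<Q",
}


def bit_width(fmt: str) -> int:
    """Number of bits occupied by one value of the given struct format."""
    return struct.calcsize(fmt) * 8


for name, fmt in GROUPS.items():
    print(f"{name}: {bit_width(fmt)} bits")

# The same 64 bits can be viewed as one quad word or as four words.
quad = struct.pack("<Q", 0x0102030405060708)
words = struct.unpack("<4H", quad)
print([hex(w) for w in words])
```

The last two lines show that grouping is only a matter of interpretation: the stored bits do not change, only the size of the chunks we read them in.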
- Instruction Grouping
- This depends on the processor. Some simple appliance CPUs (e.g. in microwave ovens) only require 4 bits to represent a single
instruction like MOVE, ADD, or STORE
- Most 8-bit, 16-bit, 32-bit, and 64-bit processors support instructions as small as 8 bits.
- Because the first memory systems were expensive, it made more sense to do as much as possible with each instruction. These
systems were classified as CISC (Complex Instruction Set Computers). As memory became cheaper, engineers decided they could
do more things in parallel and out of order only if the instruction set was simplified. So a new technology known as RISC
(Reduced Instruction Set Computers) was developed, which is sometimes jokingly expanded as "Relegate Important Stuff to the
Compiler"
- Newer CPUs support VLIW (Very Long
Instruction Words) technology which is referred to by some as "Variable Length Instruction Words"
- Some instruction addressing modes will increase the length of basic instructions by including:
- data (immediate mode)
- address (absolute mode)
- pointer (indirect mode)
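The three addressing modes listed above differ only in how many memory lookups sit between the instruction and its data. Here is a toy sketch; the `MEMORY` contents, the addresses, and the `fetch_operand` function are all hypothetical and do not model any real processor.

```python
# Hypothetical memory: address -> contents.
MEMORY = {100: 42, 200: 100}


def fetch_operand(mode: str, operand: int) -> int:
    """Resolve an instruction operand according to its addressing mode."""
    if mode == "immediate":
        # The operand embedded in the instruction is the data itself.
        return operand
    if mode == "absolute":
        # The operand is the address where the data lives.
        return MEMORY[operand]
    if mode == "indirect":
        # The operand is the address of a pointer to the data.
        return MEMORY[MEMORY[operand]]
    raise ValueError(f"unknown addressing mode: {mode}")


print(fetch_operand("immediate", 7))
print(fetch_operand("absolute", 100))
print(fetch_operand("indirect", 200))
```

In this sketch the absolute and indirect calls both end up at address 100, one directly and one by following the pointer stored at address 200.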
Summary: This technology is based upon the simplicity of bits and bytes which many people take for granted after
seeing binary demonstrations using everything from ping pong balls to light bulbs. In order for this simple representation to
work, very complicated circuitry is required to provide memory, storage, and instruction processing.
Points:
- storing a bit inside either 1 transistor (dynamic memory) or 6 transistors (static memory) seems simple but overlooks the
complex technology behind the semiconductor industry. For example: will a binary 1 be represented as 5 volts, 15 volts, or a
stored charge? How will the transistors be connected to each other? How will they be connected to the outside world?
- retrieving data from memory sounds simple enough, but grouping thousands to millions of transistors into addressing
circuits in order to return only the desired data, is easier said than done.
- executing instructions retrieved from memory also sounds simple enough, but grouping millions to billions of transistors
into instruction processing circuits is very difficult.
Genetic Information Technology
- Notation
- All biology on Earth represents genetic information in a base-4 notation known as DNA (DeoxyriboNucleic Acid).
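One way to see the link between base-4 and base-2 is that each DNA base carries exactly two bits. The sketch below packs a DNA string into an integer that way. The assignment of bit patterns to bases is arbitrary (my assumption, not a biological fact), and `pack` and `unpack` are names invented for this example.

```python
# Arbitrary two-bit code for each base (an assumption for illustration).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(dna: str) -> int:
    """Store a DNA string as an integer, two bits per base."""
    value = 0
    for base in dna:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of known length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


packed = pack("GATTACA")
print(bin(packed), unpack(packed, 7))
```

The length has to be passed to `unpack` because leading "A" bases are stored as zero bits, which an integer does not remember on its own.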
- Storage
- This chemical data format looks like a twisted ladder where the rails are composed of phosphate and sugar (deoxyribose)
while the rungs are made up of complementary base pairs. This twisted appearance is why DNA is called the double helix.
- Complementary base sequences:
- purine (larger base molecule; a double-ring, nitrogen-containing base): Adenine and Guanine
- pyrimidine (smaller base molecule; a single-ring, nitrogen-containing base): Cytosine and Thymine
- because of space restrictions between the rails of the ladder, a purine is always joined to a pyrimidine on each rung
- Adenine on one rail always connects to Thymine on the opposite rail
- Cytosine on one rail always connects to Guanine on the other rail
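The pairing rules above mean that one rail fully determines the other, which is a form of built-in redundancy. A minimal sketch follows; the names `PAIRS`, `PURINES`, and `complement` are mine, and the function ignores the fact that the two rails run in opposite chemical directions.

```python
# Pairing rules from the list above.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}  # the double-ring bases


def complement(strand: str) -> str:
    """Return the bases found on the opposite rail, position by position."""
    return "".join(PAIRS[base] for base in strand)


print(complement("ATCG"))

# Every rung joins exactly one purine to one pyrimidine.
print(all((a in PURINES) != (b in PURINES) for a, b in PAIRS.items()))
```

Applying `complement` twice returns the original strand, which is roughly why either rail can serve as a backup copy of the other.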
- Data Grouping
- Unknown; but data seems to be embedded with instructions inside the DNA. In this respect, DNA seems to be similar to the
Von Neumann processor architecture mentioned above where code and data reside in the same object.
- Instruction Grouping
- Three consecutive bases are known as a codon
- Since each base position could (in theory) hold any one of four different bases, one codon could (in theory)
represent 64 different values. (4x4x4=64)
- a variable group of codons (depends on the information) represents a gene
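The 4x4x4 arithmetic above can be confirmed by simply listing every possible codon. This is a small sketch using the standard library; the variable names are my own.

```python
from itertools import product

BASES = "ACGT"

# Every ordered choice of three bases, one per codon position.
CODONS = ["".join(triple) for triple in product(BASES, repeat=3)]

print(len(CODONS))  # 64
print(CODONS[:4], "...", CODONS[-1])
```

Compare this with the three-bit table in the computer section: three base-2 positions give 8 values, while three base-4 positions give 64.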
- Instruction Processing
- This depends on the processor:
- Protein Synthesis:
- In a eukaryote (a cell with a nucleus), DNA is found mainly inside the nucleus (mitochondria carry a small genome of
their own). During protein synthesis, transcription enzymes copy small segments of the DNA to produce a molecule called mRNA
(a.k.a. Messenger RiboNucleic Acid)
- During DNA transcription, whenever the transcription enzymes encounter Adenine, Guanine, or Cytosine
in the DNA source, the same base chemical is put into the destination RNA. However, when Thymine is
encountered in DNA, the base chemical Uracil is written into the destination RNA molecule. (This describes the result
relative to the coding strand; the enzymes physically read the opposite strand and write its complement.)
- When DNA-RNA transcription is complete, the mRNA molecule is transported from the cell's nucleus into the cell's main
body to be processed by an organelle known as a ribosome.
- When a 3-base codon is read, it specifies to the ribosome which amino acid to use next (amino acids are the pearls
which define the protein necklace). Using the following "Genetic Code Table" (which is
not used by mitochondria), we can see that the codon sequence of UGG specifies the symbol trp
which represents the amino acid tryptophan (see the "Amino Acid Symbol Table" further down). We can also see
that codons UAA, UAG and UGA all specify the punctuation symbol stop. The ribosome employs
a molecule called Transfer RNA (a.k.a. tRNA) to deliver amino acids to the site of protein
assembly.
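The fetch-and-execute steps described above (transcription, then codon-by-codon translation) can be sketched as a table lookup. This is a toy model only: it uses the standard (non-mitochondrial) code from the table that follows, it starts reading at the first base instead of searching for a start codon, and the function names `transcribe` and `translate` are my own.

```python
from itertools import product

# Standard genetic code, listed by 1st base, then 2nd, then 3rd,
# each in U C A G order (the same order as the table on this page).
SYMBOLS = """
phe phe leu leu  ser ser ser ser  tyr tyr stop stop  cys cys stop trp
leu leu leu leu  pro pro pro pro  his his gln gln    arg arg arg arg
ile ile ile met  thr thr thr thr  asn asn lys lys    ser ser arg arg
val val val val  ala ala ala ala  asp asp glu glu    gly gly gly gly
""".split()

CODON_TABLE = dict(
    zip(("".join(c) for c in product("UCAG", repeat=3)), SYMBOLS)
)


def transcribe(dna_coding_strand: str) -> str:
    """Copy DNA to mRNA: A, G, and C are kept, Thymine becomes Uracil."""
    return dna_coding_strand.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read codons three bases at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        symbol = CODON_TABLE[mrna[i:i + 3]]
        if symbol == "stop":
            break
        protein.append(symbol)
    return protein


mrna = transcribe("ATGTGGTAA")
print(mrna, translate(mrna))
```

In this example the codon UGG is looked up as trp and UAA ends the chain, matching the two cases called out in the text above.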
Genetic Code vs. Mitochondrial Code

Genetic Code (Codon Translation)
1st | 2nd: U | 2nd: C | 2nd: A | 2nd: G | 3rd |
U | phe | ser | tyr | cys | U |
U | phe | ser | tyr | cys | C |
U | leu | ser | stop [2] | stop [3] | A |
U | leu | ser | stop [1] | trp | G |
C | leu | pro | his | arg | U |
C | leu | pro | his | arg | C |
C | leu | pro | gln | arg | A |
C | leu | pro | gln | arg | G |
A | ile | thr | asn | ser | U |
A | ile | thr | asn | ser | C |
A | ile | thr | lys | arg | A |
A | met/start [4] | thr | lys | arg | G |
G | val | ala | asp | gly | U |
G | val | ala | asp | gly | C |
G | val | ala | glu | gly | A |
G | val | ala | glu | gly | G |

Mitochondrial Genetic Code
* = differences from the Genetic Code table
1st | 2nd: U | 2nd: C | 2nd: A | 2nd: G | 3rd |
U | phe | ser | tyr | cys | U |
U | phe | ser | tyr | cys | C |
U | leu | ser | stop | trp * | A |
U | leu | ser | stop | trp | G |
C | leu | pro | his | arg | U |
C | leu | pro | his | arg | C |
C | leu | pro | gln | arg | A |
C | leu | pro | gln | arg | G |
A | ile | thr | asn | ser | U |
A | ile | thr | asn | ser | C |
A | met * | thr | lys | stop * | A |
A | met | thr | lys | stop * | G |
G | val | ala | asp | gly | U |
G | val | ala | asp | gly | C |
G | val | ala | glu | gly | A |
G | val | ala | glu | gly | G |
Superscripts:
1. Listed as "nonsense code 1 (amber)" in book 1
2. Listed as "nonsense code 2 (ochre)" in book 1
3. Listed as "nonsense code 3" in book 1
4. Listed as "start" in book 2
References (Genetic Code table):
1. page 489, "The Eighth Day of Creation" by Horace Judson (1980 edition, Touchstone paperback; a much newer edition is available)
2. page 61, "Genethics - The Ethics of Engineering Life" by David Suzuki (1988 hardcover edition)
References (Mitochondrial Genetic Code table):
1. page 69, "Unraveling DNA - the most important molecule of life" by Maxim D. Frank-Kamenetskii (1997 paperback edition)
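The differences between the two codon tables above can be found mechanically by building both as lookup tables and comparing them. This is only a sketch: the four reassignments are copied from the mitochondrial table on this page, and names such as `MITO_OVERRIDES` and `differences` are invented for the example.

```python
from itertools import product

# Standard genetic code, listed by 1st base, then 2nd, then 3rd,
# each in U C A G order.
STANDARD_SYMBOLS = """
phe phe leu leu  ser ser ser ser  tyr tyr stop stop  cys cys stop trp
leu leu leu leu  pro pro pro pro  his his gln gln    arg arg arg arg
ile ile ile met  thr thr thr thr  asn asn lys lys    ser ser arg arg
val val val val  ala ala ala ala  asp asp glu glu    gly gly gly gly
""".split()

CODONS = ["".join(c) for c in product("UCAG", repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_SYMBOLS))

# Codons that read differently in the mitochondrial table on this page.
MITO_OVERRIDES = {"UGA": "trp", "AUA": "met", "AGA": "stop", "AGG": "stop"}
MITOCHONDRIAL = {**STANDARD, **MITO_OVERRIDES}


def differences(a: dict, b: dict) -> dict:
    """Map each codon that differs to its (a, b) pair of meanings."""
    return {codon: (a[codon], b[codon]) for codon in a if a[codon] != b[codon]}


for codon, (std, mito) in differences(STANDARD, MITOCHONDRIAL).items():
    print(f"{codon}: {std} -> {mito}")
```

Only 4 of the 64 codons change meaning, so the two "instruction sets" are close dialects of each other and not separate languages.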
Amino Acid Symbol Table (for codon tables above)
Symbol | Amino Acid (or Function) |
ala | alanine |
asn | asparagine |
asp | aspartic acid |
arg | arginine |
cys | cysteine |
gln | glutamine |
gly | glycine |
glu | glutamic acid |
his | histidine |
ile | isoleucine |
leu | leucine |
lys | lysine |
met | methionine (and/or punctuation = start) |
phe | phenylalanine |
pro | proline |
ser | serine |
thr | threonine |
trp | tryptophan |
tyr | tyrosine |
val | valine |
stop | punctuation = stop (stop protein synthesis) |
Summary: Even though this technology is based upon the simplicity of base 4 math, instruction storage (DNA),
instruction fetching (DNA to RNA transcription), and instruction execution (RNA to protein synthesis in the ribosome) make
this information technology much more complicated than it would first seem.
Points:
- notice that the coded information in the DNA doesn't build up protein from scratch elements, it specifies amino acids
which are already very complicated molecular structures.
- enzymes are catalytic proteins that assist all processes, including food digestion, growth, repair,
transcription, and replication. These enzymes must be manufactured ahead of time before other processes can begin. This raises
a which-came-first question: enzymes (which are protein) or protein synthesis? Is it possible that some simple proteins,
like enzymes, can be built by some other process so this whole thing can bootstrap itself? Is it possible that some
proteins can be built by reading the DNA directly?
- Many plants and bacteria can manufacture all 20 specified amino acids required for protein synthesis. Humans can only
manufacture 11 or 12 of them (depending on the source), which means that the missing 8 or 9 must come from our diet.
- The CPU-like machine called the ribosome had to be manufactured somehow. So how and when? In my opinion, grouping
millions to billions of transistors into instruction processing circuits might be child's play compared to building one of
these.
- Wikipedia Links:
- Ask anyone today "who cracked the genetic code?" and you will hear the names James Watson and Francis Crick. While it is
true that these are two of the three names to receive a Nobel Prize for their work in this area, what they discovered was the
structure of DNA, which showed how information could be encoded in nucleic acids. Marshall Nirenberg and Heinrich
J. Matthaei are the two scientists credited with cracking the Genetic Code.
Comparative Technology
Hey, if comparative anatomy is allowed then why not this?
- Computers
- a bit represents the known state of a transistor or signal
- a group of bits represents either an instruction for the CPU, or data
- bits might be simple, CPUs are not
- instructions usually run forward for a time. However, some instructions tell the computer to conditionally branch (or
jump) forward or backward to memory addresses which contain alternative segments of the program. This is what gives them
their decision making properties.
- Examples:
1. IF condition THEN jump there
2. IF condition 1 OR condition 2 THEN jump there
3. IF condition 1 AND condition 2 THEN jump there
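The first example above (IF condition THEN jump there) can be sketched as a toy machine with a program counter. Everything here is hypothetical: the instruction names ADD, JUMP_IF_LESS, and HALT, the `run` function, and the register name are invented for illustration and do not correspond to any real CPU.

```python
def run(program: list, registers: dict) -> dict:
    """Execute a toy program; pc is the address of the next instruction."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "ADD":
            reg, amount = args
            registers[reg] += amount
        elif op == "JUMP_IF_LESS":
            reg, limit, target = args
            if registers[reg] < limit:
                pc = target   # conditional branch: go somewhere else
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1               # otherwise just run forward
    return registers


PROGRAM = [
    ("ADD", "count", 1),              # address 0
    ("JUMP_IF_LESS", "count", 3, 0),  # address 1: loop back while count < 3
    ("HALT",),                        # address 2
]

print(run(PROGRAM, {"count": 0}))
```

The program mostly runs forward, but the branch at address 1 sends it backward until the condition fails, which is the decision-making property described above.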
- Genes
- one base pair of DNA represents 1/3 of a codon
- a codon represents the desired amino acid instruction for a ribosome protein assembler
- base pairs might be simple, ribosomes are not.
- could it be that the presence of a certain amount of manufactured enzyme acts like a conditional test (GOTO) which then
causes a different part of the DNA to be enabled and then transcribed? This would be the basis of a mechanism where
different routines and subroutines are conditionally enabled as the cell lives (program executes)
- to build on this idea further, it probably is true that certain hormones can enable or disable the expression of
genes. It now looks like certain elements in our environment (including food) may enhance or suppress these hormones. In
some instances it may be possible that certain substances may act upon the DNA directly (environmentally induced
cancer?)
- could it be that in the case of cancer, a lung cell makes a conditional branch error and starts executing a routine
that belongs to a quickly dividing and functionally immortal epithelial cell? There are rare forms of (dermoid) cancer where
teeth, hair, and fully formed fingers are found inside a tumor in the body.
Hmm... I wonder...
- Man is a base 2 programmer (and we still have to reboot Windows 95 systems a couple of times a day)
- The creator of this realm appears to be a base 4 programmer.
- While there does seem to be value in bacteria (nitrogen fixers etc.), I don't know of any value attached to viruses (Measles,
Smallpox, Influenza) other than evolutionary pressure.
End Notes
- When I refer to Genetic IT (Information Technology), I am not referring to that branch of computer technology known as Genetic
programming or Genetic algorithms. What I am referring to is Genetic Biology, which I consider to be the ultimate technology.
- When I refer to ribosomes as CPUs I am not claiming that ribosomes can add and subtract like silicon CPUs. I only claim that
there seem to be a lot of similarities between computing technology and protein synthesis. Now if we could find the spot in DNA
where brain morphology is defined, then we would really have to give this genetic technology idea some further consideration.
- Some of this stuff is continued here: genes as technology #2
- In 2004 I discovered a very cool online resource called
https://en.wikipedia.org which I have begun to reference in various spots on this page.
Neil Rieck
Waterloo, Ontario, Canada.