Showing posts with the label Teoría de la información y métodos de codificación. Show all posts

Wednesday, May 29, 2013

Image Compression

In this entry I'm going to talk about image compression using the Huffman
method. Huffman coding builds a tree of codewords where each codeword
weighs less than the fixed-length value it replaces, and as the tree
grows, the low-frequency values end up with longer codewords than the
high-frequency ones. Below we can see an example of how it works.
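The idea can be sketched in Python; this is a minimal illustration of building a Huffman codebook, not the actual program used in this post:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman codebook: frequent symbols get shorter codes."""
    freq = Counter(data)
    # heap entries: (weight, unique tiebreaker, {symbol: code-so-far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)     # lightest subtree
        w2, _, high = heapq.heappop(heap)
        # prefix '0' in the lighter subtree, '1' in the heavier one
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, n, merged))
        n += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
```

Running it on "aaaabbc" gives 'a' (the most frequent symbol) a one-bit code, while 'b' and 'c' get two-bit codes.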



In the program that I made you type the keyword "comprimir" to compress
a file, or "descomprimir" to decompress one; after pressing enter you
type your file name, and once you do, the compression or decompression
begins.



When you compress a file with my program, you will see in the terminal a
dictionary of frequencies of the values, followed by a binary file
created with Huffman's algorithm, and then the execution time.




The flow for compression is the following:


And the flow for decompression is the following:

All this happens using the pixel value at each coordinate of the image;
following the flow mentioned above we obtain the compressed image, or
the decompressed one (once it has been compressed).

A good way to examine the performance is to record the execution time
and the file size that my program prints out. To see how the
performance evolves I recommend a line graph.

The program only works for compression at the moment; I had problems
decompressing the file, but the logic is pretty much the same as in
previous programs I made using Huffman's algorithm.

This is the code that I made:

Tuesday, May 28, 2013

Bioinformatics

For this homework I had to search the web for an article about
bioinformatics related to information theory. The article I found is
the following:
 
  Integrating Biological Research through Web Services
  Hong Tina Gao, Jane Huffman Hayes, et al.

In this article they present a case study demonstrating that Web
Services could be key to coordinating and standardizing incompatible
applications in bioinformatics, an effort that is becoming increasingly
critical to meaningful biological research.

It starts with a disadvantage faced by people who work in biology: they
need to analyze microarrays to search for certain proteins, patterns,
sequences, and so on, and a microarray holds approximately hundreds of
millions of molecules, so searching manually is too hard. For this they
write scripts implementing algorithms that depend on what they are
working on, and once they get a result they have to make it readable.
As you can see, it is very difficult to write a script, implement an
algorithm, and then make the information readable.
 
This is a microarray, a set of molecules.


That's why these authors implement Web Services: to spare the people
who work in biology from having to know about programming, implement
algorithms, and then make the output readable. They say a biologist
only needs basic computer skills, like uploading a file to a web page
and using a keyboard and mouse. The biologist can then set some rules
to say whether to look for a sequence, search for certain molecules,
proteins, and so on; after setting the rules they just wait some 10 or
15 minutes for a result. It is worth noting that a typical experiment
on an ordinary computer takes about 1 to 10 hours to produce a result;
here, after the necessary wait, you simply get a file with the desired
result, ready to read.

Below we can see a diagram of how the web service that takes
microarrays as input works.

  


This way any biologist can run experiments in just minutes, without
needing programming skills.

Conclusion

I think this is a great idea because biologists just need a few clicks
to get their results and draw conclusions in minutes. This helps
biologists work better and raises the level of experiments, inventions
and progress in biology.
 

Reference

Gao, H. and Huffman Hayes, J. (2005) Integrating Biological Research
through Web Services. Available at:
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1413114&tag=1
[Accessed: 28 May 2013].

Glossary of Information Theory

Cyclic codes
They are linear block error-correcting codes that have convenient algebraic structures for efficient error detection and correction.

Matrix theory
It's a branch of mathematics focused on the study of matrices. Initially it was a sub-branch of linear algebra, but it soon grew to cover subjects related to graph theory, algebra, combinatorics and statistics as well.

Merge
In the context of information theory it is like mixing: adding entries into dictionaries, matrices and files.

Signal
In the context of information theory it is a kind of signal generated by an electromagnetic phenomenon, where each sign codifies content that can be analyzed.

Previous
In the context of information theory it could be the size of a file before it is compressed.

Weakness
In the context of information theory it could be a compression ratio that is too low.

Unique
In the context of information theory it could be a unique key used as a reference in the dictionary.

Ignore
In the context of information theory it could be certain bits used to represent white spaces.

Emit
In the context of information theory it could apply to adaptive encoding: while the code is compressing a file and some logic needs to change, a kind of signal can be emitted.

Reconstruction
In the context of information theory it can be a method that decodes a compressed file.

Target
In the context of information theory it can be a method that encodes a file that is a target.

References.

Cyclic code
http://en.wikipedia.org/wiki/Cyclic_code

Matrix theory
http://en.wikipedia.org/wiki/Category:Matrix_theory

Thursday, May 9, 2013

Error-Correcting Codes

In this entry I talk about the Hamming code, an error-correcting code based on linear algebra: to encode a message we just multiply a matrix by the message, modulo 2, and to verify the integrity of the transmitted message we multiply it by a check matrix; the result is a binary vector that tells us which bit of the transmitted message is incorrect.

To start experimenting, first we need the following code, written by me.
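A minimal sketch of that matrix arithmetic follows, using the standard textbook Hamming(7,4) matrices; the matrices and bit ordering in my actual program may differ:

```python
# Standard Hamming(7,4) generator matrix G and parity-check matrix H.
G = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(msg):
    """Multiply G by a 4-bit message, mod 2, giving a 7-bit codeword."""
    return [sum(g * m for g, m in zip(row, msg)) % 2 for row in G]

def syndrome(word):
    """Multiply H by the received word, mod 2; nonzero means an error."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def correct(word):
    """Read the syndrome as a binary number and flip that bit position."""
    s = syndrome(word)
    pos = s[0] + 2 * s[1] + 4 * s[2]
    fixed = list(word)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

code = encode([1, 1, 1, 1])   # -> [1, 1, 1, 1, 1, 1, 1]
noisy = list(code)
noisy[1] ^= 1                 # flip the second bit, as in the example below
```

With this standard ordering the data bits sit in positions 3, 5, 6 and 7, so extracting the message differs slightly from the "discard the last 3 values" convention used in the post.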


This is an image demonstrating how it works. Below the image I explain more about the inputs and outputs of the program.

First we give a message, in this case '1111', with a noise ratio of 30%; then we encode the message and add some noise. The encoded message, here '1011111', has an error in the second bit; the corrected message is '1111111', and discarding the last 3 values gives the real transmitted message, '1111'.

There are error patterns the current program cannot handle, because the plain Hamming code can only correct a single error.


To solve this we can implement the extended Hamming code, which just adds an extra parity bit.
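The extension can be sketched as follows; this illustrates the idea of the extra parity bit rather than a full decoder:

```python
# Sketch of the extended Hamming idea: append one overall parity bit.
# With it, a double error can be detected (though not corrected): the
# syndrome is nonzero but the overall parity still checks out.

def extend(codeword):
    """Append an overall even-parity bit to a 7-bit Hamming codeword."""
    return codeword + [sum(codeword) % 2]

word = extend([1, 1, 1, 1, 1, 1, 1])   # -> [1, 1, 1, 1, 1, 1, 1, 1]

# One flipped bit breaks overall parity; two flipped bits do not,
# which is how the decoder tells the two cases apart.
one_err = list(word); one_err[0] ^= 1
two_err = list(word); two_err[0] ^= 1; two_err[1] ^= 1
```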

References
Encoding and Decoding with the Hamming Code

Thursday, April 25, 2013

Extra Points: Dictionary Methods


Given the input value "abbacc", below we perform a substitution of the
data using the dictionary defined underneath, assuming the dictionary
values are binary values.

                             Dictionary
                        Keys         Values        
                         a             0
                         b             1
                         c            010
                         d            011

Now we perform the substitution.

                          Output

                        0,1,1,0,010,010

To recover the data we simply go through each comma-separated value and
obtain the letter for each one.
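The substitution above can be written in a few lines; this sketch uses the dictionary exactly as defined:

```python
# Encode with the dictionary; decode by splitting the comma-separated
# values and looking each one up in the reversed dictionary.

dictionary = {"a": "0", "b": "1", "c": "010", "d": "011"}
reverse = {v: k for k, v in dictionary.items()}

def encode(text):
    return ",".join(dictionary[ch] for ch in text)

def decode(coded):
    return "".join(reverse[v] for v in coded.split(","))

coded = encode("abbacc")   # -> "0,1,1,0,010,010"
```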

T4: Adaptive coding

The goal of this homework is to invent, implement and evaluate an adaptive code that compresses ASCII text. The requirement is that we are not allowed to base it on an existing algorithm. As you may know, each incoming letter must be coded and saved into a kind of dictionary used for the compression.

The main idea is simply to store each letter as it arrives, keeping it in a dictionary with a generated symbol and its running frequency. Of course there is a rule that makes it adaptive, but first I need to explain how it works.

Sequence   Letter   Symbol    Frequency   Current Output
--------------------------------------------------------------
    1        A      [a:0]       [a:1]     [a:0]
    2        A      [a:0]       [a:2]     [a:0]0
    3        B      [b:1]       [b:1]     [a:0]0[b:1]
    4        C      [c:01]      [c:1]     [a:0]0[b:1][c:01]
    5        A      [a:0]       [a:3]     [a:0]0[b:1][c:01]0
    6        C      [c:01]      [c:2]     [a:0]0[b:1][c:01]0,01


As you can see, this is a way to compress ASCII text using just a dictionary, embedding its own mappings in the compressed output. But here comes a question: "Where is the adaptive part?" Well, it comes with this rule: when a letter reaches a certain percentage of the total frequencies, it swaps symbols with the letter that holds the shortest symbol and the minimum frequency. Let's walk through it.

Sequence   Letter   Symbol    Frequency   Current Output
--------------------------------------------------------------
    7        C      [c:01]      [c:3]     [a:0]0[b:1][c:01]0,01,01
    8        C      [c:01]      [c:4]     [a:0]0[b:1][c:01]0,01,01,01
    9        C      [c:01]      [c:5]     [a:0]0[b:1][c:01]0,01,01,01,01
   10        C      [c:1]       [c:6]     [a:0]0[b:1][c:01]0,01,01,01,01[c:1]
   11        B      [b:01]      [b:2]     [a:0]0[b:1][c:01]0,01,01,01,01[c:1][b:01]
I didn't write the code, but I completed my design; that's all.
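A minimal sketch of how the design could be coded follows. Several details are my assumptions, not part of the original design: the symbols come from a fixed pool in order of first appearance, the threshold is 50% of all frequencies, and a dominant letter swaps symbols with the letter holding the shortest (and least frequent) symbol, announcing the new mapping inline as [letter:symbol]:

```python
SYMBOLS = ["0", "1", "01", "011", "0111"]   # illustrative symbol pool (assumed)

def adaptive_encode(text):
    table, freq, out = {}, {}, []
    for ch in text:
        if ch not in table:                  # first appearance: announce mapping
            table[ch] = SYMBOLS[len(table)]
            out.append("[%s:%s]" % (ch, table[ch]))
        else:                                # otherwise emit the current symbol
            out.append(table[ch])
        freq[ch] = freq.get(ch, 0) + 1
        total = sum(freq.values())
        # the letter holding the shortest symbol, ties broken by frequency
        shortest = min(table, key=lambda k: (len(table[k]), freq[k]))
        # Adaptive rule (assumed threshold): a letter with at least half
        # of all frequencies takes over the shortest symbol and announces
        # its new mapping inline.
        if (freq[ch] * 2 >= total and ch != shortest
                and len(table[ch]) > len(table[shortest])):
            table[ch], table[shortest] = table[shortest], table[ch]
            out.append("[%s:%s]" % (ch, table[ch]))
    return "".join(out)

coded = adaptive_encode("aabcacccccb")   # the letter sequence from the tables
```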

Thursday, April 18, 2013

Extra points: Exercise from the book Introduction to Information Theory and Data Compression

The exercise is the following:


5.3.1. (a) Answer: First we need to build a tree representing the data provided; next we compute the expected value of the original word length and of the Huffman word length; then we divide the original word length by the Huffman word length to get the compression ratio.


To explain my answer I used a good online tool at http://huffmandemo.appspot.com/, made by Cecilia Urbina.

First I drew a Huffman tree by hand in a notebook; next I produced a table comparing the data provided in the problem against the coded data. To get a word length we take the LCM (Least Common Multiple) of the frequencies, then we demonstrate using a word of that length. Once we have that word, we can compute the expected values using the following formulas.



The first one is for the expected value of the coded word (the one built with the Huffman tree), and the other for the original word provided in the problem.
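The computation itself can be sketched numerically. The frequencies and code lengths below are made up for illustration, not the exercise's actual data:

```python
# Expected word length E[L] = sum(p_i * l_i) over the symbols, and the
# compression ratio = original (fixed) word length / Huffman E[L].
# Illustrative numbers, not the book's exercise data.

freqs = {"a": 4, "b": 2, "c": 1, "d": 1}     # how often each symbol occurs
code_len = {"a": 1, "b": 2, "c": 3, "d": 3}  # Huffman code lengths
orig_len = 2                                 # fixed-length code: 4 symbols need 2 bits

total = sum(freqs.values())
expected = sum(freqs[s] / total * code_len[s] for s in freqs)  # 1.75
ratio = orig_len / expected
```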

Huffman's Tree. 
Table.

Then we follow the same procedure for problem (b).

Huffman's Tree.


Table.


Source:
    
      Introduction to Information Theory and Data Compression by Hankerson et al.

Thursday, April 11, 2013

T3: Text compression

For this class activity I created a program that uses the Huffman algorithm to compress text. The Huffman algorithm is simple: for each letter of the text a node is stored in the tree, and the tree contains every letter appearing in the text; when a letter repeats alongside other letters, a parent node is created joining it with the repeated letter. This is done so that the different nodes are used in a nearly balanced way.



To run the program we do the following, and it will show us these instructions:


The files created at compression time were actually very heavy, because I used a string format, which inflates every variable: testing in Python, a string variable weighs on average about 24 bytes, while a binary representation weighs about 12 bytes.

Now, looking at the sizes of the compressed texts below, we can see a big difference; but if we suppose the size were cut in half, the worst case is the small file, since the original file weighs 1 KB while the compressed one comes out at 5 KB. Given the above, we can almost assure that its size using the binary format would be 2.5 KB.
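The overhead described comes from storing '0'/'1' characters (one byte each) instead of real bits; packing the bit string into bytes shrinks it about 8x. The bit string here is illustrative, not taken from my program:

```python
def pack_bits(bits):
    """Pack a string like '0101...' into bytes, 8 bits per byte."""
    padded = bits + "0" * (-len(bits) % 8)   # pad up to a byte boundary
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

bits = "0101100111010010" * 100   # 1600 '0'/'1' characters of text
packed = pack_bits(bits)

print(len(bits))     # 1600 bytes as text
print(len(packed))   # 200 bytes once packed
```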



From what I confirmed experimenting with zip-style compression, as the text grows the compression keeps reducing the size more: compressing a small file shows no great difference, while the opposite happens with files of considerable size. Here we can see a comparative example.




Thursday, February 14, 2013

Task 1: Noisy Channel

In this task I made a simulation of a channel: a word is transmitted through the channel and a receptor receives it, possibly with some noise, that is, errors in the word. The program was made in Python with different methods: simulator() (runs all the logic of the channel), generador() (generates one letter of the word to send through the channel), palabra() (builds a word of a given length) and transmisor() (the transmitter that sends words through the channel). You can see my code below.
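The method names below follow the post (simulator, generador, palabra, transmisor), but the bodies are a reconstruction of a binary symmetric channel, not the original code:

```python
import random

def generador(p_uno):
    """Generate one bit: '1' with probability p_uno, '0' otherwise."""
    return "1" if random.random() < p_uno else "0"

def palabra(largo, p_uno):
    """Build a word of the given length."""
    return "".join(generador(p_uno) for _ in range(largo))

def transmisor(word, ruido):
    """Send the word through the channel, flipping each bit with probability ruido."""
    return "".join("10"[int(b)] if random.random() < ruido else b for b in word)

def simulator(largo, p_uno, ruido, corridas=1000):
    """Fraction of transmitted words that arrive without errors."""
    exitos = 0
    for _ in range(corridas):
        w = palabra(largo, p_uno)
        if transmisor(w, ruido) == w:
            exitos += 1
    return exitos / corridas

print(simulator(8, 0.5, 0.0))   # no noise: success rate is 1.0
```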


And with the help of a shell script, awk and gnuplot I made a graph of the program's output.





This is the graph: the x axis is the frequency of zeros, the y axis is the probability of ones, and the z axis is the probability of zeros. The heat map represents the probability of success.