Here are some of the projects I did (alone or as part of a group) during my undergraduate and graduate studies. Feel free to click around to learn more about them.
In this seminar, I reviewed the MIMO space-time coding technique known as BLAST, covering both the Vertical and Diagonal variants. The cases with and without channel information at the transmitter were analysed, followed by the fast-fading scenario (V-BLAST) and the slow-fading scenario (D-BLAST). The seminar involved writing a 6-page report in IEEE Transactions format, followed by a 15-minute presentation.
I modified the Synchronous OverLap and Add (SOLA) technique, which changes the speed of audio segments without changing their pitch, so that it supports a non-linear change in speed over the duration of a speech segment. This was applied to IBM's World Wide Telecom Web, a mobile-based spoken web interface for developing countries. The speech data was first passed through a Speech Activity Detector, and its speed was then changed.
The Kalman filter was successfully used to track Formula 1 cars in a TV broadcast video sequence. Tracking was maintained without any faults, even in cases of overlaps and occlusions. A more detailed analysis and two sample videos can be found here. Click here to download the presentation (no videos).
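As a rough illustration of how a Kalman filter can keep a track alive through an occlusion, here is a minimal sketch for a single image coordinate. It is not the code used in the project: the constant-velocity motion model, the class name `KalmanCV1D`, and the noise values `q` and `r` are all assumptions made for the example. When no measurement is available (the car is hidden), the filter only predicts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KalmanCV1D:
    """Constant-velocity Kalman filter for one coordinate (hypothetical helper)."""
    x: float = 0.0      # position estimate (pixels)
    v: float = 0.0      # velocity estimate (pixels per frame)
    pxx: float = 100.0  # covariance P = [[pxx, pxv], [pxv, pvv]]
    pxv: float = 0.0
    pvv: float = 100.0
    q: float = 0.01     # process noise added to each diagonal term (assumed)
    r: float = 4.0      # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> None:
        # State: x <- F x with F = [[1, dt], [0, 1]]
        self.x += dt * self.v
        # Covariance: P <- F P F^T + Q
        pxx = self.pxx + 2 * dt * self.pxv + dt * dt * self.pvv + self.q
        pxv = self.pxv + dt * self.pvv
        pvv = self.pvv + self.q
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv

    def update(self, z: float) -> None:
        # Measurement model: z = x + noise, i.e. H = [1, 0]
        s = self.pxx + self.r                  # innovation variance
        kx, kv = self.pxx / s, self.pxv / s    # Kalman gain
        innovation = z - self.x
        self.x += kx * innovation
        self.v += kv * innovation
        # Covariance: P <- (I - K H) P
        pxx = (1 - kx) * self.pxx
        pxv = (1 - kx) * self.pxv
        pvv = self.pvv - kv * self.pxv
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv

    def step(self, z: Optional[float]) -> float:
        """Advance one frame; pass None when the target is occluded."""
        self.predict()
        if z is not None:
            self.update(z)
        return self.x


# Target moving 2 pixels per frame, hidden for four frames in the middle.
measurements: List[Optional[float]] = [2.0 * k for k in range(1, 21)]
measurements[10:14] = [None] * 4
tracker = KalmanCV1D()
estimates = [tracker.step(z) for z in measurements]
```

A real tracker would run one such filter per coordinate (or a joint 2-D state) and would also need a detection step to produce the measurements, which this sketch leaves out.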
I implemented an algorithm for blind source separation of images using convex optimization. The underlying idea is that the mixed images lie in the convex hull of the original sources. The images were assumed to be locally dominant to make the separation possible. The aim was to use this to identify dissolves in video sequences; however, the slight motion between two frames prevented the separation. Block processing was also tried in order to speed up the separation, but it had its own problem: there was no way of knowing which block came from which image. Click here to download the presentation.
In this project, the complex wavelet transform was first studied and implemented using the Kingsbury filters. Further, the segmentation of the Devanagari script to a basic form (for identifying consonants) was carried out. An OCR application was tried as a prototype; however, the results were not very encouraging. A comparison with baseline DCT and with various real wavelets was presented. Click here to download the presentation.
We (Petra and I) designed a lab session for the Coding and Tx. of Multimedia Content course to help better understand the actual working of the MPEG-2 video coding standard. The interactive session covered block-matching and the use of colour, and also compared different orderings of the I,B,B,P,... frames. Different GoP sizes were tried, and a Rate-Distortion graph was generated for 2 sample test videos.
This speech recognition project was carried out on the HIWIRE database, which comes from a European Union project to better understand the problems of speech recognition inside a noisy cockpit environment. Methods of cleaning the audio clips, such as spectral subtraction and wavelet-based subtraction, were tried. A Maximum Likelihood Linear Regression (MLLR) adaptation was also carried out, which helped improve the performance. Click here to download the presentation.
We (Ohil and I) implemented basic image processing algorithms like rgb2gray, histogram equalization, edge detection, Gaussian blur, etc. These were ported to a Nokia 3110 classic mobile phone. J2ME (Java 2 Micro Edition) was used to port the software, and the camera was integrated with it.
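Two of the operations named above can be sketched in a few lines. This is only an illustration in Python, not the J2ME code from the project; the function names, the BT.601 luma weights, and the flat-list image layout are assumptions chosen for the example. The grey conversion sticks to integer arithmetic, which tends to suit small devices.

```python
from typing import List


def rgb_to_gray(r: int, g: int, b: int) -> int:
    """Approximate luma with ITU-R BT.601 weights using integer arithmetic."""
    return (299 * r + 587 * g + 114 * b) // 1000


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Spread the grey levels of a flat list of pixels over the full range."""
    if not pixels:
        return []
    # Histogram of grey levels
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # single grey level: nothing to equalize
    # Look-up table mapping each old level to its equalized level
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]


# A low-contrast strip of pixels is stretched to span 0..255.
stretched = equalize_histogram([50, 50, 100, 150])
```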
The D-algorithm for stuck-at fault testing was implemented in Matlab. A simple protocol for describing the input circuit (netlist) was devised. I was one of the two people (the other being Ohil) who wrote a program for this, which meant that large and complex circuits with re-convergent branching could be handled.
In this project, we (Ohil and I) wrote an algorithm to generate the Reduced Ordered Boolean equation from an equation with redundant min-terms. The coding was done in C#. To gain a better understanding of the decision diagram, the graphing tool 'dotty' from the GraphViz package was used. The diagram scripts were generated automatically by the C# code.
We (Ohil and I) implemented the Set Partitioning In Hierarchical Trees (SPIHT) algorithm for wavelet-based image compression. Fingerprint images were used as the test case because of their high level of detail. The compression performance was compared with that of the alternative, Embedded Zero-tree Wavelet (EZW).
We (Ohil and I) used a tool called SPARK to design a two-layered feed-forward backpropagating artificial neural network. The idea was to find an architecture that would allow the neural network to be trained in parallel, thus making the training much faster. The tan-sigmoid was used as the squashing function. The network was trained and evaluated on basic operations like XNOR, XOR and NAND. The implementation was in C.
Our (Ohil, Vaibhav, Sneha and I) project for the Digital Signal Processing course was guided by Prof. Sumam David along with Dr. Amitav Das from Microsoft Research India. We learnt about Speech recognition, the basics of HMMs and also worked on Speech segmentation into Voiced / Unvoiced, and further down the hierarchy.
The project's major component was to build an isolated word recognition system, using the Dynamic Time Warping algorithm for the classification. MFCCs were used as features. This basic idea was extended to build:
Using the software Magic, we (Ohil, Vaibhav and I) designed a 32-bit barrel shifter. It was tested on the IRSim simulator, and a detailed analysis of timing issues and delays was carried out using NGSpice.
This project was to help us understand the differences between the standard digital modulation techniques. Monte-Carlo simulations for comparison of BPSK / BFSK; 4-PAM / 8-PAM; BPSK / QPSK / 8-PSK; and 16-PSK / 16-QAM were carried out. The PN-sequence generators and their autocorrelation properties were also studied.
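For readers unfamiliar with this kind of experiment, here is a minimal sketch of a Monte-Carlo bit-error-rate estimate for BPSK, compared with the textbook result for an AWGN channel, Pb = 0.5 * erfc(sqrt(Eb/N0)). It is not the project code; the AWGN channel, the function names, the bit count and the seed are assumptions made for the example.

```python
import math
import random


def bpsk_ber_theory(ebn0_db: float) -> float:
    """Theoretical BPSK bit error rate over an AWGN channel."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


def bpsk_ber_monte_carlo(ebn0_db: float, n_bits: int = 200_000, seed: int = 0) -> float:
    """Estimate the BPSK bit error rate by simulating n_bits transmissions."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    # Unit-energy symbols (+1 / -1), so the noise variance is N0/2 = 1 / (2 Eb/N0)
    sigma = math.sqrt(1 / (2 * ebn0))
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, sigma)
        decided = 1 if received > 0 else 0   # threshold detector at zero
        errors += decided != bit
    return errors / n_bits


if __name__ == "__main__":
    for snr_db in (0, 2, 4, 6):
        simulated = bpsk_ber_monte_carlo(snr_db)
        print(f"{snr_db} dB: simulated {simulated:.5f}, theory {bpsk_ber_theory(snr_db):.5f}")
```

The simulated values should approach the theoretical curve as the number of bits grows. At high Eb/N0 errors become rare, so many more bits are needed for a stable estimate.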
We (Ohil, Vaibhav and I) developed a JPEG2000 image decoder and ported it to the Xilinx Virtex II Pro FPGA. Images were stored on compact flash cards and were processed by an embedded PowerPC processor. The JPEG2000 lossy compression standard was implemented, which involved the wavelet transformation and entropy decoding as defined by the standard. 24-bit VGA output was generated on a standard DB15 connector.
The initial idea was to perform basic floating point operations. With this in mind, we (Ohil and I) decided to represent floating point numbers in the form
a.bcd x 10^ef, where a, b, c, d, e and f could each take values 0-9. We started with basic operations like add, subtract, multiply, reciprocate, divide and factorial, and then extended them to compute integral powers, trigonometric functions (through infinite series), permutations and combinations, and deg-to-rad and rad-to-deg conversion. A user-friendly GUI was also created, where fonts were manually defined using patterns of 0s and 1s.
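To make the a.bcd x 10^ef format concrete, here is a small sketch of how addition and multiplication might work on it. It is an illustration only and not the original implementation: the pair layout (four-digit mantissa stored as an integer, two-digit exponent), the truncation of extra digits, and the function names are assumptions. Signs and negative exponents are left out because the description above does not say how they were handled.

```python
from typing import Tuple

# (mantissa, exponent): mantissa is the digits abcd as an integer, so the
# value represented is mantissa / 1000 * 10**exponent. Zero is (0, 0).
Dec = Tuple[int, int]


def normalize(mantissa: int, exponent: int) -> Dec:
    """Bring the mantissa back into 1000..9999, truncating extra digits."""
    if mantissa == 0:
        return (0, 0)
    while mantissa > 9999:
        mantissa //= 10
        exponent += 1
    while mantissa < 1000:
        mantissa *= 10
        exponent -= 1
    if not 0 <= exponent <= 99:
        raise OverflowError("exponent does not fit in two digits")
    return (mantissa, exponent)


def multiply(x: Dec, y: Dec) -> Dec:
    # (mx/1000 * 10^ex) * (my/1000 * 10^ey) = (mx*my/1000)/1000 * 10^(ex+ey)
    return normalize((x[0] * y[0]) // 1000, x[1] + y[1])


def add(x: Dec, y: Dec) -> Dec:
    (mx, ex), (my, ey) = x, y
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    # Align the smaller operand to the larger exponent before adding
    my //= 10 ** (ex - ey)
    return normalize(mx + my, ex)


def to_text(x: Dec) -> str:
    mantissa, exponent = x
    return f"{mantissa // 1000}.{mantissa % 1000:03d} x 10^{exponent:02d}"


product = multiply((1500, 2), (2000, 1))   # 150 * 20
total = add((9500, 1), (5000, 0))          # 95 + 5
```

With only four significant digits, every operation loses precision, so series-based functions such as the trigonometric ones accumulate rounding error quickly in a format like this.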
The goal of this project was to show that a universal remote control could be made to perform the basic functions of multimedia control. We (Ohil and I) first learnt the remote control protocols. The 38 kHz carrier signal that the codes modulate was generated using a 555-timer IC. We successfully demonstrated its working on an LG DVD player.
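As a worked example of the carrier generation, the sketch below sizes the timing resistor of a 555 timer for 38 kHz. It assumes the standard astable configuration, where f = 1 / (ln 2 * (R1 + 2*R2) * C); the description above does not state the circuit or the component values, so the 1 kOhm and 1 nF figures and the function names are purely illustrative.

```python
import math


def astable_frequency(r1_ohm: float, r2_ohm: float, c_farad: float) -> float:
    """Output frequency of a 555 timer in the standard astable configuration."""
    return 1.0 / (math.log(2) * (r1_ohm + 2 * r2_ohm) * c_farad)


def duty_cycle(r1_ohm: float, r2_ohm: float) -> float:
    """Fraction of each period for which the output is high."""
    return (r1_ohm + r2_ohm) / (r1_ohm + 2 * r2_ohm)


def r2_for_frequency(f_hz: float, r1_ohm: float, c_farad: float) -> float:
    """Solve the astable formula for R2, given the target frequency, R1 and C."""
    total = 1.0 / (math.log(2) * f_hz * c_farad)   # equals R1 + 2*R2
    r2 = (total - r1_ohm) / 2
    if r2 <= 0:
        raise ValueError("R1 and C are too large for this frequency")
    return r2


# Illustrative component values (assumed, not taken from the project).
R1, C = 1_000.0, 1e-9
R2 = r2_for_frequency(38_000.0, R1, C)
```

In practice the nearest standard resistor value, or a trimmer, would be used, and component tolerances shift the frequency slightly.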
Using the software called TKGate, we (Ohil, Vaibhav and I) designed a traffic signal controller. It included features like a countdown timer, a manual override, an emergency (ambulance) switch, a night light (yellow blinking) and an accident light which would warn all traffic to go slow.
A snapshot of all multimedia related projects can be found here. This is the link you will come to from any of our papers too!
I hope to collect the links in one place to make it easy to browse these wonderful works.