Mini projects

Here are some of the projects I worked on (alone or in a group) during my undergraduate and graduate studies. Feel free to click around to learn more about them.

At:
KIT Karlsruhe
Year:
2010

In this seminar I reviewed the MIMO space-time coding technique known as BLAST, covering both the Vertical and Diagonal variants. The cases with and without channel information at the transmitter were analysed, followed by the fast-fading scenario (V-BLAST) and the slow-fading scenario (D-BLAST). The seminar involved writing a 6-page report in IEEE Transactions format, followed by a 15-minute presentation.

At:
IBM Delhi
Year:
2010

I modified the Synchronous OverLap and Add (SOLA) technique, which changes the speed of audio segments without changing their pitch, so that it supports a non-linear change in speed over the duration of a speech segment. This was applied to IBM's World Wide Telecom Web, a mobile-based spoken web interface for developing countries. The speech data was first passed through a Speech Activity Detector, and then its speed was changed.

At:
UPC Barcelona
Course:
Advanced Signal Processing
Year:
2010

A Kalman filter was used to track Formula 1 cars in a TV broadcast video sequence. Tracking was maintained without faults, even through overlaps and occlusions. A more detailed analysis and 2 sample videos can be found here. Click here to download the presentation (no videos).
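
As a rough illustration of how a Kalman filter can keep a track alive through an occlusion, here is a minimal one-dimensional sketch. It is not the code used in the project: the constant-velocity motion model, the noise values q and r, and the use of None to mark an occluded frame are all assumptions made for this example.

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Track a 1-D position with a constant-velocity Kalman filter.

    measurements: list of floats, or None where the target is occluded.
    q and r are assumed process and measurement noise variances.
    Returns the list of filtered positions, one per frame.
    """
    # State: position x and velocity v.
    # Covariance P = [[pxx, pxv], [pxv, pvv]], started with large uncertainty.
    x, v = 0.0, 0.0
    pxx, pxv, pvv = 100.0, 0.0, 100.0
    track = []
    for z in measurements:
        # Predict: x <- x + v*dt and P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
        x = x + v * dt
        pxx = pxx + 2 * dt * pxv + dt * dt * pvv + q
        pxv = pxv + dt * pvv
        pvv = pvv + q
        if z is not None:
            # Update with measurement matrix H = [1, 0] (position only).
            s = pxx + r                    # innovation variance
            kx, kv = pxx / s, pxv / s      # Kalman gain
            residual = z - x
            x += kx * residual
            v += kv * residual
            # P <- (I - K H) P
            pxx, pxv, pvv = (1 - kx) * pxx, (1 - kx) * pxv, pvv - kv * pxv
        # During an occlusion (z is None) the prediction alone carries the track.
        track.append(x)
    return track


# Hypothetical target moving 2 units per frame, hidden for frames 8 to 11.
observed = [2.0 * t if not 8 <= t <= 11 else None for t in range(1, 21)]
positions = kalman_track(observed)
```

In this sketch the filter coasts on its velocity estimate while the target is hidden, which is the basic mechanism that lets a tracker survive short occlusions.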

At:
UPC Barcelona
Course:
Convex Optimization
Year:
2010

I implemented an algorithm for blind source separation of images using convex optimization. It rests on the observation that the mixed images lie in the convex hull of the original sources. The images were assumed to be locally dominant to make the separation possible. The idea was to use this to identify dissolves in video sequences; however, the slight motion between 2 frames prevented separation. Block processing was also tried to speed up the separation, but it had its own problem: there was no way to know which block came from which image. Click here to download the presentation.

At:
UPC Barcelona
Course:
Wavelets: Theory and Applications
Year:
2010

In this project, the complex wavelet transform was first studied and implemented using the Kingsbury filters. Further, the segmentation of the Devanagari script to a basic form (for identifying consonants) was carried out. As a prototype, an OCR application was tried; however, the results were not very encouraging. A comparison with a baseline DCT and with various real wavelets was presented. Click here to download the presentation.

At:
UPC Barcelona
Course:
Coding and Transmission of Multimedia Content
Year:
2010

We (Petra and I) designed a lab session for the Coding and Transmission of Multimedia Content course to help better understand the actual working of the MPEG-2 video coding standard. The interactive session covered block matching and the use of colour, and compared different orderings of the I, B, B, P, ... frames. Different GoP sizes were tried, and a rate-distortion graph was generated for 2 sample test videos.
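
To give a feel for the block-matching step, here is a small full-search sketch using the sum of absolute differences (SAD). It is only an illustration, not the lab material itself: the function names, the 4x4 block size and the search range of 2 pixels are assumptions chosen to keep the example short.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` at (bx, by)
    and the block in `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def best_motion_vector(cur, ref, bx, by, size=4, search=2):
    """Exhaustively search displacements within +/- `search` pixels.

    Frames are lists of rows of integer luma values.
    Returns (dx, dy, cost) for the lowest-SAD candidate.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height
                      and 0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Hypothetical 8x8 reference frame and a current frame shifted right by one pixel.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
vector = best_motion_vector(cur, ref, bx=2, by=2)
```

Because the content moved one pixel to the right, the best match for the current block lies one pixel to the left in the reference frame.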

At:
UPC Barcelona
Course:
Speech and Language Technologies and Applications
Year:
2009

The speech recognition project was carried out on the HIWIRE database, from a European Union project to better understand the problems of speech recognition inside a noisy cockpit environment. Methods for cleaning the audio clips, such as spectral subtraction and wavelet-based subtraction, were tried. A Maximum Likelihood Linear Regression (MLLR) adaptation was also carried out, which helped improve the performance. Click here to download the presentation.

At:
NITK Surathkal
Course:
DSP Systems and Architecture
Year:
2009

We (Ohil and I) implemented basic image processing algorithms such as rgb2gray, histogram equalization, edge detection and Gaussian blur. These were ported to a Nokia 3110 classic mobile phone. J2ME (Java 2 Micro Edition) was used to port the software, and the camera was integrated with it.
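
Two of these operations are simple enough to sketch in a few lines. The version below is in Python rather than J2ME and is only an illustration: the BT.601 luma weights and the flat-list image layout are assumptions, not details taken from the project.

```python
def rgb2gray(pixel):
    """Convert one (r, g, b) pixel to a gray level using BT.601 luma weights."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def equalize(gray, levels=256):
    """Histogram-equalize a flat list of gray levels in 0..levels-1."""
    # Build the histogram, then its cumulative distribution.
    hist = [0] * levels
    for p in gray:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    n = len(gray)
    if n == cdf_min:
        return list(gray)  # constant image: nothing to stretch
    # Map each level so that the output CDF is approximately linear.
    return [round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min)) for p in gray]


# Hypothetical low-contrast image with four gray levels.
stretched = equalize([10, 20, 30, 40])
```

On a phone of that era the same logic would typically be written with integer arithmetic only; the floating-point weights here are kept for readability.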

At:
NITK Surathkal
Course:
VLSI - Testing and Testability
Year:
2009

The D-algorithm for stuck-at fault testing was implemented in Matlab. A simple protocol for describing the input circuit (netlist) was devised. I was one of only two people (the other being Ohil) who wrote a program to do this, and it could therefore handle large and complex circuits with re-convergent branching.

At:
NITK Surathkal
Course:
Logic Synthesis Techniques
Year:
2009

In this project, we (Ohil and I) wrote an algorithm to generate the Reduced Ordered Boolean equation from an equation with redundant min-terms. The coding was done in C#. Further, to better understand the decision diagram, the graphing tool 'dotty' from the GraphViz package was used. The diagram scripts were generated automatically by the C# code.

At:
NITK Surathkal
Course:
Digital Signal Compression
Year:
2008

We (Ohil and I) implemented the Set Partitioning In Hierarchical Trees (SPIHT) algorithm for wavelet-based image compression. Fingerprint images were used as the test case because of their high level of detail. The compression performance was compared with that of the alternative Embedded Zero-tree Wavelet (EZW) algorithm.

At:
NITK Surathkal
Course:
VLSI - CAD
Year:
2008

We (Ohil and I) used a tool called SPARK to design a two-layered feed-forward artificial neural network trained with backpropagation. The idea was to find an architecture that would allow the neural network to be trained in parallel, making the training much faster. The tan-sigmoid was used as the squashing function. The network was trained and evaluated on basic operations like XNOR, XOR and NAND. The implementation was in C.

At:
NITK Surathkal & Microsoft Research India
Course:
Digital Signal Processing
Year:
2008

Our (Ohil, Vaibhav, Sneha and I) project for the Digital Signal Processing course was guided by Prof. Sumam David along with Dr. Amitav Das from Microsoft Research India. We learnt about Speech recognition, the basics of HMMs and also worked on Speech segmentation into Voiced / Unvoiced, and further down the hierarchy.

The project's major component was to build an isolated word recognition system, using the Dynamic Time Warping algorithm for the classification. MFCCs were used as features. This basic idea was extended to build:

  • An interactive-voice-response-system for booking of travel tickets in major cities in India
  • A program to start software applications using voice commands
  • A framework for playing the popular Solitaire game using only voice commands. It was linked together with Matlab and Windows API scripts for hitting keys and moving the cursor. The commands were broken down to a few simple ones like Move, Open, Top and Finish. Very simple image processing was performed to identify the type of card, so that the voice command could guide the cursor to the appropriate location. A backend kept a record of all the cards that were visible at any given time.
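
The isolated word recogniser at the core of these applications can be sketched roughly as follows. This is an illustrative Python version, not the Matlab code from the project: the template dictionary, the Euclidean frame distance and the toy one-dimensional "features" standing in for MFCC vectors are all assumptions.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Allow a match, or a stretch of either sequence.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """Return the word whose stored template is closest to the query under DTW."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


# Hypothetical one-dimensional features; real MFCC frames would have more coefficients.
templates = {
    "move": [[0.0], [1.0], [2.0]],
    "open": [[5.0], [5.0], [5.0]],
}
slow_move = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same word, spoken more slowly
word = classify(slow_move, templates)
```

The warping is what makes the comparison tolerant to speaking rate: the slower utterance aligns to the "move" template at zero cost even though the two sequences differ in length.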

At:
NITK Surathkal
Course:
VLSI Design
Year:
2008

Using the software Magic, we (Ohil, Vaibhav and I) designed a 32-bit barrel shifter. It was tested on the IRSim simulator, and a detailed analysis of timing issues and delays was carried out using NGSpice.
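
A behavioural model helps show why a barrel shifter needs only five stages for 32 bits. The sketch below is an assumption-laden illustration in Python: it models a left rotate built from logarithmic stages, which may differ from the operations and layout of the actual Magic design.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def barrel_rotate_left(value, amount):
    """Rotate a 32-bit value left using five stages of 1, 2, 4, 8 and 16 bits.

    Each stage acts like a row of multiplexers controlled by one bit of
    `amount`: it either passes the word through or rotates it by 2**stage.
    """
    value &= MASK
    for stage in range(5):
        step = 1 << stage
        if (amount >> stage) & 1:
            value = ((value << step) | (value >> (WIDTH - step))) & MASK
    return value


rotated = barrel_rotate_left(0x12345678, 8)
```

Any rotation from 0 to 31 is a sum of the enabled stage sizes, so the delay through the shifter is fixed at five multiplexer levels regardless of the shift amount.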

At:
NITK Surathkal
Course:
Digital Communications
Year:
2008

This project was to help us understand the differences between the standard digital modulation techniques. Monte-Carlo simulations for comparison of BPSK / BFSK; 4-PAM / 8-PAM; BPSK / QPSK / 8-PSK; and 16-PSK / 16-QAM were carried out. The PN-sequence generators and their autocorrelation properties were also studied.
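
For the simplest of these cases, BPSK over an AWGN channel, a Monte-Carlo run can be sketched as below and checked against the closed-form bit error rate. This is a minimal illustration, not the original simulation: the unit-energy symbols, the seed and the number of bits are assumptions.

```python
import math
import random


def simulate_bpsk_ber(ebn0_db, n_bits=100_000, seed=0):
    """Estimate the BPSK bit error rate over AWGN by Monte-Carlo simulation."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    # Symbols are +/-1 (Eb = 1), so the noise standard deviation is sqrt(N0 / 2).
    sigma = math.sqrt(1 / (2 * ebn0))
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        received = symbol + rng.gauss(0.0, sigma)
        decided = 1 if received > 0 else 0  # threshold detector at zero
        errors += decided != bit
    return errors / n_bits


def theoretical_bpsk_ber(ebn0_db):
    """Closed-form BPSK error rate: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))


simulated = simulate_bpsk_ber(4.0, seed=1)
expected = theoretical_bpsk_ber(4.0)
```

The other modulation schemes follow the same pattern, with a different symbol mapping and decision rule in place of the sign test.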

At:
NITK Surathkal
Course:
Digital Systems Design
Year:
2007

We (Ohil, Vaibhav and I) developed a JPEG2000 image decoder and ported it to the Xilinx Virtex II Pro FPGA. Images were stored on compact flash cards and were processed by an embedded PowerPC processor. The JPEG2000 lossy compression standard was implemented, involving wavelet transformation and entropy decoding as defined by the standard. 24-bit VGA output was generated on a standard DB15 connector.

At:
NITK Surathkal
Course:
Microprocessors
Year:
2007

The initial idea was to perform basic floating point operations. With this in mind, we (Ohil and I) decided to represent floating point numbers as a.bcd x 10^ef, where a, b, c, d, e and f could each take values 0-9. We started with basic operations like add, subtract, multiply, reciprocate, divide and factorial, and then extended these to compute integral powers, trigonometric functions (through infinite series), permutations and combinations, and deg-to-rad and rad-to-deg conversion. A user-friendly GUI was also created, with fonts defined manually using patterns of 0s and 1s.
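
To make the a.bcd x 10^ef format concrete, here is a small Python sketch of multiplication in that representation. It is an illustration only, not the original microprocessor code: storing the four digits as one integer, truncating extra precision and raising an error on exponent overflow are assumptions.

```python
def normalize(digits, exp):
    """Bring `digits` into 1000..9999 (that is, 1.000 to 9.999), truncating
    extra precision, and check that the exponent fits in two decimal digits."""
    if digits == 0:
        return 0, 0
    while digits >= 10_000:
        digits //= 10
        exp += 1
    while digits < 1_000:
        digits *= 10
        exp -= 1
    if not 0 <= exp <= 99:
        raise OverflowError("exponent does not fit in the two digits 'ef'")
    return digits, exp


def multiply(x, y):
    """Multiply two numbers stored as (digits, exp), meaning digits/1000 * 10**exp."""
    (dx, ex), (dy, ey) = x, y
    # (dx/1000) * (dy/1000) = (dx*dy/1000) / 1000, so rescale once and add exponents.
    return normalize(dx * dy // 1000, ex + ey)


def to_float(x):
    """Convert (digits, exp) to an ordinary float for checking results."""
    digits, exp = x
    return digits / 1000 * 10 ** exp


# 2.500 x 10^03 times 4.000 x 10^02
product = multiply((2500, 3), (4000, 2))
```

The same normalize step would serve addition and division as well, after aligning exponents or scaling the dividend.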

At:
NITK Surathkal
Course:
Linear Integrated Circuits
Year:
2007

The goal of this project was to show that a universal remote control could be built to perform the basic functions of multimedia control. We (Ohil and I) first learnt the remote control protocols. The 38 kHz carrier signal used to modulate the codes was generated using a 555 timer IC. We successfully demonstrated its working on an LG DVD player.
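
For reference, the frequency of a 555 timer in the standard astable configuration follows from its charge and discharge times, and can be checked with a few lines of Python. The component values below are hypothetical ones chosen to land near 38 kHz; they are not the values used in the project.

```python
import math


def astable_555(r1, r2, c):
    """Return (frequency_hz, duty_cycle) for a 555 timer in astable mode.

    The capacitor charges through r1 + r2 and discharges through r2 alone.
    """
    t_high = math.log(2) * (r1 + r2) * c
    t_low = math.log(2) * r2 * c
    period = t_high + t_low
    return 1 / period, t_high / period


# Assumed values: R1 = 1 kOhm, R2 = 18.5 kOhm, C = 1 nF.
frequency, duty = astable_555(1_000, 18_500, 1e-9)
```

Keeping R1 small relative to R2 brings the duty cycle close to 50 percent, although real component tolerances mean the resistors usually need trimming to hit the carrier frequency.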

At:
NITK Surathkal
Course:
Digital Electronic Circuits
Year:
2006

Using the software called TKGate, we (Ohil, Vaibhav and I) designed a traffic signal controller. It included features like a countdown timer, a manual override, an emergency (ambulance) switch, a night light (yellow blinking) and an accident light which would warn all traffic to go slow.

Links

At a glimpse

A snapshot of all multimedia related projects can be found here. This is the link you will come to from any of our papers too!

Multimedia Resources

I hope to collect links in one place to make it easy to browse these wonderful works.

CV, MM Papers on the Web

Meta-resources for Computer Vision conferences CVPapers, ACM Multimedia MM papers, and Multimedia Information Retrieval papers MIR.