There is a growing demand for software systems that can recognize characters when
information is scanned from paper documents. A large number of newspapers and books on
many different subjects exist only in printed form, and there is a huge demand for storing
the information in these paper documents on a computer storage disk and later reusing it
through search. One simple way to store this information is to scan the documents and keep
them as images. Reusing information stored this way is very difficult, however, because the
contents of an image cannot be read or searched line by line and word by word. The reason
for this difficulty is that a scanned page is only a bitmap, and the font characteristics of
its characters differ from those of the fonts in the computer system. As a result, the
computer is unable to recognize the characters while reading them.
We therefore need a character recognition software system that performs Document Image
Analysis, which transforms documents from paper format to electronic format. Various
techniques exist for this process. Among them we have chosen Optical Character Recognition
(OCR) as the fundamental technique for recognizing characters. OCR derives the meaning of
characters and their font properties from their bit-mapped images.
PROBLEM STATEMENT
There is a growing demand from users to convert printed documents into electronic documents
in order to keep their data secure. The basic OCR system was invented to convert data
available on paper into computer-processable documents, so that the documents can be edited
and reused. The existing OCR system is simply OCR without grid functionality. That is, the
existing system deals only with homogeneous character recognition, meaning recognition of the
characters of a single language.
The drawback of the early OCR systems is that they can recognize and convert only documents
written in English or in one other specific language. That is, the older OCR system is
unilingual.
OVERVIEW OF PROPOSED SOLUTION APPROACH
The problem is for software systems to recognize characters when information is scanned
from paper documents, given the large number of newspapers and books on different subjects
that exist only in printed form. Whenever we scan documents with a scanner, they are stored
in the computer system as images in formats such as JPEG or GIF. These images cannot be
edited by the user as text. Reusing this information is very difficult, because the contents
of these documents cannot be read or searched line by line and word by word. Yet there is a
huge demand for storing the information available in these paper documents on a computer
storage disk and later editing or reusing it through search.
INTEGRATED SUMMARY OF THE LITERATURE STUDIED
Neural networks play a very important role in artificial intelligence. Training is an important part of a
neural network, because it gives the network the ability to guess the best answer. There are two types of
training for neural networks:
Unsupervised Training
Supervised Training
Supervised training provides the neural network with training sets and the anticipated output. Unsupervised
training supplies the neural network with training sets, but no anticipated output is provided.
UNSUPERVISED TRAINING
Unsupervised training is a very common training technique for Kohonen neural networks. What is meant by
training without supervision is that the neural network is provided with training sets, which are collections of
defined input values. But the unsupervised neural network is not provided with anticipated outputs.
Unsupervised training is usually used in a classification neural network. A classification neural network takes
input patterns, which are presented to the input neurons. These input patterns are then processed, and one
single neuron on the output layer fires. This firing neuron can be thought of as the classification of which
group the neural input pattern belonged to. Handwriting recognition is a good application of a classification
neural network.
The input patterns presented to the Kohonen neural network are the dot images of handwritten characters.
We may then have 26 output neurons, which correspond to the 26 letters of the English alphabet.
The Kohonen neural network should classify each input pattern into one of the 26 output neurons.
During the training process, the Kohonen neural network for handwriting recognition is presented with 26
input patterns. The network is configured to also have 26 output neurons. As the Kohonen neural network is
trained, the weights should be adjusted so that the input patterns are classified into the 26 output neurons.
This technique results in a relatively effective method for character recognition.
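The training idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this project: it uses simplified winner-take-all learning without the neighbourhood function of a full Kohonen map, and the function names, the learning rate, the number of epochs and the 3x3 toy "dot images" are all assumptions made for the example.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit length so that dot products are comparable."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length else list(vector)


def winner(weights, pattern):
    """Index of the output neuron whose weight vector best matches the pattern."""
    scores = [sum(w * p for w, p in zip(row, pattern)) for row in weights]
    return scores.index(max(scores))


def train(patterns, n_outputs, epochs=50, rate=0.3, seed=0):
    """Unsupervised training: only inputs are given, never expected outputs."""
    rng = random.Random(seed)
    size = len(patterns[0])
    # Hypothetical initialisation: small random weights, one row per output neuron.
    weights = [normalize([rng.random() for _ in range(size)])
               for _ in range(n_outputs)]
    inputs = [normalize(p) for p in patterns]
    for _ in range(epochs):
        for pattern in inputs:
            w = winner(weights, pattern)
            # Move only the winning neuron's weights toward the input pattern.
            weights[w] = normalize(
                [wi + rate * (pi - wi) for wi, pi in zip(weights[w], pattern)]
            )
    return weights


if __name__ == "__main__":
    # Two toy 3x3 dot images, flattened row by row (1 = ink, 0 = blank).
    vertical_bar = [0, 1, 0, 0, 1, 0, 0, 1, 0]
    horizontal_bar = [0, 0, 0, 1, 1, 1, 0, 0, 0]
    trained = train([vertical_bar, horizontal_bar], n_outputs=2)
    for name, image in [("vertical", vertical_bar), ("horizontal", horizontal_bar)]:
        print(name, "-> output neuron", winner(trained, normalize(image)))
```

For the alphabet case described above, n_outputs would be 26 and each pattern would be the flattened dot image of one letter. With plain winner-take-all learning some output neurons may never win, which is one reason a full Kohonen network also updates the neighbours of the winner.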
SUPERVISED TRAINING
The supervised training method is similar to the unsupervised training method in that training sets are
provided. Just as with unsupervised training these training sets specify input signals to the neural network.
The primary difference between supervised and unsupervised training is that in supervised training the
expected outputs are provided. This allows the supervised training algorithm to adjust the weight matrix based
on the difference between the anticipated output of the neural network, and the actual output.
There are several popular training algorithms that make use of supervised training. One of the most
common is the back-propagation algorithm. It is also possible to use an algorithm such as simulated annealing
or a genetic algorithm to implement supervised training.
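The weight adjustment described above can be illustrated with the delta rule on a single layer of linear neurons. This is a deliberately simpler technique than back-propagation, chosen only to show how the difference between the anticipated and the actual output drives the weight update. The function names, learning rate and epoch count are assumptions for this sketch.

```python
def predict(weights, inputs):
    """Actual output of each output neuron: a weighted sum of the inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def train_supervised(samples, n_inputs, n_outputs, epochs=200, rate=0.1):
    """Delta-rule training on (inputs, expected_outputs) pairs."""
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for _ in range(epochs):
        for inputs, expected in samples:
            actual = predict(weights, inputs)
            for j in range(n_outputs):
                # Error = anticipated output minus actual output.
                error = expected[j] - actual[j]
                for i in range(n_inputs):
                    # Adjust each weight in proportion to the error and its input.
                    weights[j][i] += rate * error * inputs[i]
    return weights


if __name__ == "__main__":
    # Toy training set: each input pattern is paired with its expected output.
    samples = [([1, 0], [1, 0]), ([0, 1], [0, 1])]
    trained = train_supervised(samples, n_inputs=2, n_outputs=2)
    print(predict(trained, [1, 0]))
```

Unlike the unsupervised case, every training sample here carries its expected output, so the error can be computed directly for each output neuron.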
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTS
FUNCTIONAL REQUIREMENTS
1. The product shall be able to recognize the input character.
2. The product shall be able to load the image from the local storage where it is stored.
3. The product shall be able to edit the image correctly.
4. The product shall be a proper character recognition system.
5. The product shall recognize the image clearly.
NON FUNCTIONAL REQUIREMENTS
1. Reliability: The app should display authentic and correct guidelines/information regarding
information security and should not mislead. Failures/errors should be debuggable.
2. Performance: Speed should be high. The app and its buttons should respond within
1-5 s.
3. Safety: The app should not cause harm to humans in any form.
ARCHITECTURE OF THE PROPOSED SYSTEM
The architecture of the Optical Character Recognition system on a grid infrastructure consists of three
main components:
Scanner
OCR Hardware or Software
Output Interface
MODULES AND THEIR FUNCTIONALITIES
Our software system, Optical Character Recognition on a grid infrastructure, can be divided into five
modules based on functionality:
Document Processing Module
System Training Module
Document Recognition Module
Document Editing Module
Document Searching Module
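As an illustration of what the Document Searching Module might do once a document has been recognized, the sketch below searches recognized text line by line and word by word. It is a hypothetical example, not the module's actual code: the function name search and the punctuation handling are assumptions, and the input is taken to be plain lines of text already produced by recognition.

```python
def search(lines, word):
    """Return (line_number, word_number) pairs, both 1-based, for each match."""
    target = word.lower()
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for word_no, token in enumerate(line.split(), start=1):
            # Ignore surrounding punctuation and letter case when comparing.
            if token.strip(".,;:!?\"'()").lower() == target:
                hits.append((line_no, word_no))
    return hits


if __name__ == "__main__":
    recognized = ["Optical character recognition", "reads each character."]
    print(search(recognized, "character"))
```

This kind of search is only possible after recognition, since a scanned image on its own contains no words to compare against.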
FINDINGS
The system can recognize a character even if there is a slight difference between the training set
character and the input character.
This system is more stable than other existing systems.
Compared with other systems, this system provides more accurate output.
CONCLUSION
What does the future hold for OCR? Given enough entrepreneurial designers and sufficient research and
development dollars, OCR can become a powerful tool for future data entry applications. However, the
limited availability of funds in a capital-short environment could restrict the growth of this technology.
Given the proper impetus and encouragement, the OCR system can provide many benefits:
Automated data entry by OCR is one of the most attractive labor-reducing technologies.
The system can recognize characters in new fonts easily and quickly.
The information in documents can be edited more conveniently, and the edited information can be reused
as and when required.
Extension to software other than editing and searching is a topic for future work.
The grid infrastructure used in the implementation of the Optical Character Recognition system can be
used efficiently to speed up the translation of image-based documents into structured documents that are
easy to discover, search and process.
FUTURE WORK
The Optical Character Recognition software can be enhanced in the future in several ways,
such as:
Training and recognition speeds can be increased further, and the software can be made more user-friendly.
Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a
very difficult task considering the diversity that exists in ordinary penmanship. However, progress is
being made.
For the full presentation: http://www.slideshare.net/IAMINURHEARTS1/ocr-ppt-35272335
For the YouTube presentation: https://www.youtube.com/watch?v=AYRTcxPvw04&feature=youtu.be