Lip Reading Using Neural Networks: Computer Science Essay
Abstract
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze. In this essay, neural networks are applied to lip reading, one of the more accessible approaches to speech recognition and one of the more recent techniques adopted in the field.
We describe a lip reading system that uses both shape information from the lip contours and intensity information from the mouth area. Shape information is obtained by tracking and parameterising the inner and outer lip boundary in an image sequence. Intensity information is extracted from a grey-level model based on principal component analysis. In contrast to other approaches, the intensity area deforms with the shape model to ensure that similar object features are represented after non-rigid deformation of the lips. We describe speaker-independent recognition experiments based on these features. Preliminary results suggest that similar performance can be achieved by using either shape or intensity information, and slightly higher performance by their combined use.
1.NEURAL NETWORK
A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform “intelligent” tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:
A neural network acquires knowledge through learning.
A neural network’s knowledge is stored within inter-neuron connection strengths known as synaptic weights.
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an expert in the category of information it has been given to analyze.
The true power and advantage of neural networks lie in their ability to represent both linear and non-linear relationships and to learn these relationships directly from the data being modeled. Traditional linear models are inadequate when it comes to modeling data that contains non-linear characteristics.
The most common neural network model is the multilayer perceptron (MLP). This type of neural network is known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown.
A graphical representation of an MLP is shown below.
The MLP and many other neural networks learn using an algorithm called back propagation. With back propagation, the input data is repeatedly presented to the neural network. With each presentation the output of the neural network is compared to the desired output and an error is computed. This error is then fed back (back propagated) to the neural network and used to adjust the weights such that the error decreases with each iteration and the neural model gets closer and closer to producing the desired output. This process is known as “training”.
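The training loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the system used in the essay: the network size, the sigmoid activation, the mean squared error, the learning rate and the function name `train_mlp` are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer MLP with batch back propagation (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W1 = rng.normal(0.0, 1.0, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: present the input data to the network.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Compare the output with the desired output and compute the error.
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error gradient back through the layers.
        d_out = (2.0 / n) * err * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # Adjust the weights in the direction that reduces the error.
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

# Toy supervised problem (XOR): inputs with their desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train_mlp(X, y)
print(f"error before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The recorded `losses` list makes the "error decreases with each iteration" behaviour visible; how quickly it falls depends on the assumed learning rate and initial weights.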
2.SPEECH RECOGNITION
Speech recognition is one of the most exciting areas of modern computer science research, with the goal of enabling computers to understand speech and gesture. The sheer variety and complexity of spoken words makes it very difficult to distinguish similar words. A neural network is a model of the way in which the human brain works. Such networks are well suited to many forms of pattern recognition and have a strong ability to learn.
Neural networks are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, and of finding the optimal combination of these features for classification. The purpose of this work is to exploit this potential of neural networks to improve speech recognition accuracy.
Here neural networks are applied to lip reading, one of the more accessible approaches to speech recognition and one of the more recent techniques adopted in the field.
3.LIP READING
Lip reading involves the extraction of visual speech features. Most visual speech information is contained in the inner and outer lip contours, but it has also been shown that information about the visibility of teeth and tongue provides important speech cues. Particularly for fricatives, the place of articulation can often be determined visually, i.e. for labiodental (upper teeth on lower lip), interdental (tongue behind front teeth) and alveolar (tongue touching gum ridge) places. Other speech information might be contained in the protrusion and wrinkling of the lips.
Lip reading approaches can be classified into:
Image-based systems.
Model-based systems.
Image-based systems use grey-level information from an image region containing the lips, either directly or after some processing, as speech features. Most image information is therefore retained, but it is left to the recognition system to discriminate speech information from linguistic variability and illumination variability.
Model-based systems usually represent the lips by geometric measures, such as the height or width of the outer or inner lip boundary, or by a parametric contour model which represents the lip boundaries. The extracted features are of low dimension and invariant to illumination. Model-based systems depend on the definition of speech-related features by the user. The definition may therefore omit some speech-relevant information, and features such as the visibility of teeth and tongue are difficult to represent.
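As a small illustration of the geometric measures mentioned above, the following sketch computes the width and height of a lip boundary from its contour points. The function name `lip_geometry`, the `(x, y)` point layout and the assumption of a roughly upright mouth are all choices made for this example, not details taken from the essay.

```python
import numpy as np

def lip_geometry(contour):
    """Return (width, height) of a lip contour given as an (N, 2) array of (x, y) points.

    Assumes the mouth is roughly upright, so width runs along x and height along y.
    """
    contour = np.asarray(contour, dtype=float)
    width = contour[:, 0].max() - contour[:, 0].min()
    height = contour[:, 1].max() - contour[:, 1].min()
    return float(width), float(height)

# Hypothetical outer lip boundary with four points: corners, top and bottom.
outer = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 0.0], [2.0, -1.0]])
print(lip_geometry(outer))  # (4.0, 2.0)
```

Because the measures depend only on point coordinates, not on pixel values, they are unaffected by illumination, which is the property the paragraph above attributes to model-based features.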
An earlier version of the system performed well on a speaker-independent recognition task, but it did not use any intensity information, which might provide additional speech cues. Here we extend this system by augmenting the feature vector with intensity information extracted from the mouth region. We evaluate the contribution of intensity information separately and in combination with shape features.
4.SHAPE MODELLING
For modelling the shape variability of lips, we use an approach based on active shape models. These are statistically based deformable models which represent a contour by a set of points. Patterns of characteristic shape variability are learned from a training set, using principal component analysis (PCA). The main modes of shape variation captured in the training set can therefore be described by a small number of parameters.
The main advantage of this modelling technique is that heuristic assumptions about legal shape deformation are avoided. Instead, the model is only allowed to deform to shapes similar to the ones seen in the training set. Any shape x representing the coordinates of the contour points can be approximated by
x = x’ + Pb
where x’ is the mean shape, P is the matrix of eigenvectors of the covariance matrix, and b is a vector containing the weights for each eigenvector. Only the first few eigenvectors, corresponding to the largest eigenvalues, are needed to describe the main shape variability.
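The linear shape model above can be sketched with NumPy as follows. This is an illustrative outline under stated assumptions: the training shapes are taken to be already aligned and flattened into rows of coordinates, and the function names `fit_shape_model`, `project` and `reconstruct` are hypothetical.

```python
import numpy as np

def fit_shape_model(shapes, n_modes):
    """Fit the mean shape and the main modes of variation by PCA.

    shapes: (n_samples, 2 * n_points) array of aligned contour coordinates.
    Returns the mean shape, the matrix P (one eigenvector per column) and
    the corresponding eigenvalues, largest first.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]     # keep the largest n_modes
    return mean, eigvecs[:, order], eigvals[order]

def project(x, mean, P):
    """Weights b for a shape x: b = P^T (x - mean)."""
    return P.T @ (x - mean)

def reconstruct(b, mean, P):
    """Approximate a shape from its weights: x = mean + P b."""
    return mean + P @ b

# Synthetic training set generated from two underlying modes (illustration only).
rng = np.random.default_rng(0)
shapes = rng.normal(size=8) + rng.normal(size=(50, 2)) @ rng.normal(size=(2, 8))
mean, P, eigvals = fit_shape_model(shapes, n_modes=2)
b = project(shapes[0], mean, P)
print(np.allclose(reconstruct(b, mean, P), shapes[0]))
```

The weight vector `b` is the low-dimensional description of a shape; in a tracking setting it would be recovered for each frame.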
Figure: Shape model for the inner and outer lip contour, with profile vectors perpendicular to the lip contours.
Figure: Lip model with mean shape and mean intensity.
We built and tested two models of the lips: Model 1, which represents the outer lip boundary only and Model 2, which represents the outer and inner lip boundary. The models are used to locate, track and parameterise lip movements in image sequences. The weights for the shape modes are recovered from the tracking results and serve as features for the recognition system.
5.INTENSITY MODELLING
Several approaches to speechreading based on intensity information have been developed. Our approach for extracting intensity information is based on principal component analysis and is related to the eigenlips approach. That approach placed a window around the mouth area, on which PCA was performed. Since the window does not deform with the lips, the eigenvectors of the PCA mainly account for intensity variation due to different lip shapes and mouth openings. We already obtain detailed information about the lip shape from our shape model through a small number of parameters, and are therefore mainly interested in intensity information which is independent of lip shape.
We follow an approach in which a one-dimensional profile is sampled perpendicular to the contour at each model point, as shown in Figure 1. Instead of using local grey-level models, however, we construct a global grey-level model by concatenating the profile vectors of all model points to form a global intensity vector h. We then estimate the covariance matrix of the global profile vectors over the training set and perform PCA to obtain the principal modes of profile variation. Any profile h can now be approximated by
h = h’ + Pg bg
where h’ is the mean profile, Pg is the matrix whose columns are the first few eigenvectors, corresponding to the largest eigenvalues, and bg is a vector containing the weights for each eigenvector.
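The construction of the global intensity vector h can be sketched as below. This is only a rough illustration: the normal estimation by central differences, the nearest-pixel sampling, the profile length and the function names `contour_normals` and `global_profile` are assumptions, since the essay does not specify these details.

```python
import numpy as np

def contour_normals(points):
    """Unit normals of a closed contour, estimated from central-difference tangents.

    points: (n_points, 2) array of distinct (x, y) contour points in order.
    """
    tangents = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    normals = np.stack([tangents[:, 1], -tangents[:, 0]], axis=1)
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def global_profile(image, points, half_len=3):
    """Sample a grey-level profile along the normal at each point and concatenate them into h."""
    normals = contour_normals(points)
    offsets = np.arange(-half_len, half_len + 1)
    # Sample positions, shape (n_points, profile_len, 2), in (x, y) order.
    pos = points[:, None, :] + offsets[None, :, None] * normals[:, None, :]
    # Nearest-pixel sampling; positions outside the image are clamped to the border.
    cols = np.clip(np.rint(pos[..., 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(pos[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols].astype(float).ravel()

# Hypothetical example: 12 points on a circle in a synthetic 40 x 40 grey-level image.
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = np.stack([20 + 10 * np.cos(t), 20 + 10 * np.sin(t)], axis=1)
image = np.arange(40 * 40, dtype=float).reshape(40, 40)
h = global_profile(image, points)
print(h.shape)  # (84,) = 12 points x 7 samples per profile
```

Stacking one such vector per training image gives the data matrix on which the covariance matrix and PCA described above are computed. Because the sample positions follow the contour points, the sampled region deforms with the lips.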
6.SPEECH MODELLING
The weights for the shape model and the intensity model are extracted at each image frame to form frame-dependent feature vectors for the recognition system. We use either the shape parameters, the intensity parameters, or both parameter sets as the feature vector for the recognition system. Assuming accurate tracking performance, the shape and intensity parameters are invariant to translation, rotation and scale. The intensity modes account for both illumination differences and differences due to the visibility of teeth and tongue and to protrusion.
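A minimal sketch of how the three feature configurations could be assembled per frame is shown below. The function name `frame_features`, the `mode` argument and the array layout (one row per frame) are assumptions for illustration; the essay does not describe how the vectors are stored.

```python
import numpy as np

def frame_features(shape_w, intensity_w, mode="both"):
    """Build per-frame feature vectors from shape and/or intensity weights.

    shape_w:     (n_frames, n_shape_modes) shape weights b for each frame.
    intensity_w: (n_frames, n_intensity_modes) intensity weights bg for each frame.
    mode:        "shape", "intensity" or "both".
    """
    shape_w = np.asarray(shape_w, dtype=float)
    intensity_w = np.asarray(intensity_w, dtype=float)
    if mode == "shape":
        return shape_w
    if mode == "intensity":
        return intensity_w
    if mode == "both":
        if shape_w.shape[0] != intensity_w.shape[0]:
            raise ValueError("shape and intensity sequences must have the same number of frames")
        return np.concatenate([shape_w, intensity_w], axis=1)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical sequence of 5 frames with 3 shape weights and 4 intensity weights.
feats = frame_features(np.zeros((5, 3)), np.ones((5, 4)), mode="both")
print(feats.shape)  # (5, 7)
```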
Dynamic speech information is important and often less sensitive to inter-speaker variability. For example, the intensity values of the lips remain fairly constant during speech, while the intensity values of the mouth opening change. The intensity values of the lips vary between speakers, but the temporal changes of intensity might be similar for different speakers. Dynamic features should therefore be more robust to differences in illumination and between speakers.
We have described a lip reading system that uses both shape and intensity information. An important property of the intensity model is that it deforms with the lip contour model, so that it represents the same object features after lip movements. Recognition tests using only intensity parameters indicate that much visual speech information is contained in the grey-level information, which might account for protrusion or the visibility of teeth and tongue. Recognition performance was slightly higher for intensity features than for shape features, and their combined use outperformed both feature sets. This application of neural networks to lip reading is still under research and is expected to yield further results. Its potential use by people with disabilities adds to its importance.
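One common way to capture such dynamic information is to append frame-to-frame differences to the static features. The sketch below is an assumption about how this could be done, since the essay does not state how its dynamic features are computed; the function name `add_delta` is hypothetical.

```python
import numpy as np

def add_delta(features):
    """Append first-order temporal differences to a (n_frames, n_features) sequence.

    Requires at least two frames. The differences are unchanged when a constant
    offset (for example a brightness shift) is added to every frame.
    """
    features = np.asarray(features, dtype=float)
    delta = np.gradient(features, axis=0)
    return np.concatenate([features, delta], axis=1)

# Hypothetical intensity weight that rises by 2 per frame.
seq = 2.0 * np.arange(5, dtype=float)[:, None]
dark = add_delta(seq)
bright = add_delta(seq + 10.0)          # same motion, constant offset
print(np.allclose(dark[:, 1:], bright[:, 1:]))
```

The static columns differ between the two sequences while the difference columns agree, which is the sense in which temporal changes may be more comparable across speakers and lighting conditions than absolute intensity values.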