Building a Voice-Controlled Calculator with the OlamiSDK (iOS)
Blog link: http://blog.csdn.net/scarlettzhao0602/article/details/76576836
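I. Introduction
Olami Calculator starts from an ordinary calculator, where formulas are typed on a keypad, and adds voice input: you speak a formula and it shows the result. It also adds several animation effects, an audible cue when a result is ready, switchable themes, saving of formulas, and a slide-out sidebar for browsing saved entries. Many so-called voice calculators only add key-press sounds and cannot understand what you say to them. With Olami Calculator you don't need to tap the keypad at all. Just say the formula you want evaluated, for example "三加四乘五" ("three plus four times five") or "清空" ("clear"), and OLAMI's speech and language recognition identifies it. The result is a calculator that is genuinely voice-controlled.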
II. UI Overview
IV. Implementation
First, let's look at the interfaces that OlamiRecognizer.h provides:
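```objc
@protocol OlamiRecognizerDelegate <NSObject>
/**
 * Returns the recognition result
 */
-(void)onResult:(NSData*)result;
/**
 * The current session was cancelled
 */
-(void)onCancel;
/**
 * Recognition failed
 */
-(void)onError:(NSError *)error;
/**
 * Input volume; the audio intensity ranges from 0 to 100
 */
-(void)onUpdateVolume:(float) volume;
/**
 * Recording started
 */
-(void)onBeginningOfSpeech;
/**
 * Recording ended
 */
-(void)onEndOfSpeech;
@end

typedef NS_ENUM(NSInteger, LanguageLocalization) {
    LANGUAGE_SIMPLIFIED_CHINESE = 0, // Simplified Chinese
    LANGUAGE_TRADITIONA_CHINESE = 1  // Traditional Chinese
};

@interface OlamiRecognizer : NSObject
@property (nonatomic, weak) id<OlamiRecognizerDelegate> delegate;
@property (nonatomic, assign, readonly) BOOL isRecording; // whether recording is in progress

-(void)start;  // start recording
-(void)stop;   // stop recording and start recognition
-(void)cancel; // cancel the current session

/**
 * Sets the language option. Currently only one is supported: Simplified Chinese.
 */
-(void)setLocalization:(LanguageLocalization) location;

/**
 * CUSID:     end-user identifier that distinguishes individual end users, e.g. the phone's IMEI
 * appKey:    the appKey of the app you created
 * api:       the API type to call; there are currently three: semantics (nli), word segmentation (seg) and speech (asr)
 * appSecret: the encryption secret, generated automatically by the app management console
 */
-(void)setAuthorization:(NSString*)appKey api:(NSString*)api appSecret:(NSString*)appSecret cusid:(NSString*)CUSID;

-(void)setVADTimeoutFrontSIL:(unsigned int)value; // VAD start-point timeout, 1000 to 10000 ms, default 3000
-(void)setVADTimeoutBackSIL:(unsigned int)value;  // VAD end-point timeout, 1000 to 10000 ms, default 2000
-(void)setInputType:(int) type;                   // input mode: 0 = speech, 1 = text
-(void)setLatitudeAndLongitude:(double) latitude longitude:(double)longit; // sets the location (latitude, longitude)
-(void)sendText:(NSString*)text;                  // sends the entered text
@end
```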
In the project, first initialize the OLAMI speech recognizer and set its delegate:
```objc
// Parameter meanings: see setAuthorization:api:appSecret:cusid: in OlamiRecognizer.h
#define AppKey    @"" // use your own appKey
#define AppSecret @"" // use your own appSecret
#define macID     @"" // end-user identifier, passed as CUSID

-(void)setupOLAMI {
    _olamiRecognizer = [[OlamiRecognizer alloc] init];
    _olamiRecognizer.delegate = self; // self adopts OlamiRecognizerDelegate
    [_olamiRecognizer setAuthorization:AppKey api:@"asr" appSecret:AppSecret cusid:macID];
    // Set the language; currently only Chinese is supported
    [_olamiRecognizer setLocalization:LANGUAGE_SIMPLIFIED_CHINESE];
}
```
Set up a record button:
```objc
#pragma mark -- Record button
- (IBAction)recordButton:(UIButton *)sender {
    // Use speech input (setInputType: 0 = speech, 1 = text)
    [_olamiRecognizer setInputType:0];
    if (_olamiRecognizer.isRecording) {
        // Already recording: stop and start recognition
        [_olamiRecognizer stop];
        [_recordButton setImage:[UIImage imageNamed:@"话筒4.png"] forState:UIControlStateNormal];
    } else {
        // Not recording yet: start recording
        [_olamiRecognizer start];
        [_recordButton setImage:[UIImage imageNamed:@"话筒7.png"] forState:UIControlStateNormal];
        [_recordButton.layer addAnimation:[self shine] forKey:@"shine"]; // add a glow animation
    }
}

// Glow (blinking) animation
- (CABasicAnimation *)shine {
    // Animate "opacity": a CABasicAnimation needs an animatable CALayer key path
    CABasicAnimation *animation = [CABasicAnimation animationWithKeyPath:@"opacity"];
    animation.fromValue = @1.0f;
    animation.toValue = @0.0f;
    animation.autoreverses = YES;
    animation.duration = 0.5;
    animation.repeatCount = MAXFLOAT; // repeat until removed in onEndOfSpeech
    animation.removedOnCompletion = NO;
    animation.fillMode = kCAFillModeForwards;
    animation.timingFunction = [CAMediaTimingFunction functionWithName:kCAMediaTimingFunctionEaseIn];
    return animation;
}

#pragma mark -- Recording ended (delegate method)
- (void)onEndOfSpeech {
    // Touch UIKit on the main queue in case the SDK calls back on a background thread
    dispatch_async(dispatch_get_main_queue(), ^{
        [_recordButton setImage:[UIImage imageNamed:@"话筒4.png"] forState:UIControlStateNormal];
        [_recordButton.layer removeAnimationForKey:@"shine"];
    });
}
```
Visualizing the input volume:
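```objc
#pragma mark -- NLU delegate
- (void)onUpdateVolume:(float)volume {
    if (_olamiRecognizer.isRecording) {
        // volume is 0 to 100; the wave view expects a fraction from 0 to 1
        dispatch_async(dispatch_get_main_queue(), ^{
            _waveView.present = volume / 100;
        });
    }
}
```

The wave view draws a sine curve of the form y = A·sin(ωx + φ) + b, for example:

```objc
// Body of the wave view's -drawRect:(CGRect)rect.
// moveX, maxA and _currentLinePointY are wave-view state defined elsewhere (not shown).
CGContextRef context = UIGraphicsGetCurrentContext();
CGMutablePathRef path = CGPathCreateMutable();
CGContextSetLineWidth(context, 3);
CGContextSetLineCap(context, kCGLineCapRound);
CGContextSetAllowsAntialiasing(context, true);
CGContextSetRGBStrokeColor(context, 124 / 255.0, 145 / 255.0, 155 / 255.0, 1.0);
CGContextBeginPath(context);
// Start just off the left edge; a louder input (larger present) starts the line higher
float y = (1 - _present) * rect.size.height;
CGPathMoveToPoint(path, NULL, -10, y);
for (float x = 0; x <= rect.size.width; x++) {
    // A = maxA, omega = 3*pi / width, phi = moveX * pi / width, b = _currentLinePointY
    y = sin(3 * x / rect.size.width * M_PI + moveX / rect.size.width * M_PI) * maxA + _currentLinePointY;
    CGPathAddLineToPoint(path, NULL, x, y);
}
CGContextAddPath(context, path);
CGContextDrawPath(context, kCGPathStroke);
CGPathRelease(path);
```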
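If you read the drawing loop against the formula y = Asin(ωx+φ)+b, each term maps to a variable. In the sketch below, W stands for rect.size.width and m for moveX; these symbols are introduced here for readability. The post does not show how moveX and _currentLinePointY change over time.

```latex
y(x) = A\sin(\omega x + \varphi) + b, \qquad
A = \mathrm{maxA}, \quad
\omega = \frac{3\pi}{W}, \quad
\varphi = \frac{\pi m}{W}, \quad
b = \mathrm{\_currentLinePointY}

\text{periods across the view} = \frac{\omega W}{2\pi} = \frac{3}{2}
```

So the curve always shows one and a half periods across the view. Increasing moveX by W shifts the phase by π, which is half a period. UIKit's y axis points down, so a larger b draws the wave lower on screen.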
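That's about it for the UI. The main work is handling the returned result.
Implement the delegate method -(void)onResult:(NSData*)result;. It delivers the semantic-analysis result as JSON data. Parse it to extract the variables you need.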
```objc
// At the top of the .m file
#import <AudioToolbox/AudioToolbox.h> // AudioServicesPlaySystemSound

#pragma mark -- Returned result
- (void)onResult:(NSData *)result {
    NSError *error = nil;
    NSDictionary *json = [NSJSONSerialization JSONObjectWithData:result options:NSJSONReadingMutableContainers error:&error];
    if (json == nil) {
        // error is only set once parsing has failed, so check it here
        NSLog(@"error is %@", error.localizedDescription);
        return;
    }
    NSLog(@"json=%@", json);
    if ([json[@"status"] isEqualToString:@"ok"]) {
        NSDictionary *asr = json[@"data"][@"asr"];
        // A non-nil asr node means the input came from speech
        if (asr) {
            [self processASR:asr];
        }
        NSDictionary *nli = [json[@"data"][@"nli"] firstObject]; // firstObject avoids a crash on an empty array
        NSDictionary *desc = nli[@"desc_obj"];
        int status = [desc[@"status"] intValue];
        if (status != 0) { // 0 means OK; non-zero means something went wrong
            NSString *message = desc[@"result"];
            dispatch_async(dispatch_get_main_queue(), ^{
                _resultLabel.text = message; // show the error hint
                _resultLabel.font = [UIFont systemFontOfSize:20];
                [_resultLabel startAnimation];
                _showTextView.text = asr[@"result"];
                AudioServicesPlaySystemSound(soundID);
            });
        } else {
            NSDictionary *semantic = [nli[@"semantic"] firstObject];
            // Process the slots and compute the formula
            [self processSemantic:semantic asr:asr];
            // Process the modifiers
            NSArray *modifierArr = semantic[@"modifier"];
            [self processModifier:modifierArr result:desc[@"result"]];
        }
    } else {
        dispatch_async(dispatch_get_main_queue(), ^{
            _showTextView.text = @"请说出要计算的公式"; // "Please say the formula to calculate"
        });
    }
}

#pragma mark -- Process the ASR (speech) node
- (void)processASR:(NSDictionary *)asrDic {
    NSString *result = asrDic[@"result"];
    if (result.length == 0) {
        // Empty result: show an alert ("No speech received, please try again!")
        [self showAlert:@"没有接受到语音,请重新输入!"];
        return;
    }
    NSString *str = [result stringByReplacingOccurrencesOfString:@" " withString:@""]; // remove spaces between characters
    NSLog(@"answer result = %@", str);
}

// Process the slots returned in the semantic node
- (void)processSemantic:(NSDictionary *)semanticDic asr:(NSDictionary *)asr {
    NSMutableArray *sumArr = [NSMutableArray array];
    for (NSDictionary *dic in semanticDic[@"slots"]) {
        NSString *nameStr = dic[@"name"];
        // Iterate and add each slot to the array
        // …
    }
    NSString *textStr = [sumArr componentsJoinedByString:@""]; // join the pieces with no separator
    NSLog(@"textstr=%@", textStr);
    if (textStr.length == 0) {
        return;
    }
    _passString = [self replaceInputStrWithPassStr:textStr];
    // For speech input, remember the previous result shown on screen
    _lastAnswer = asr ? _resultLabel.text : @"";

    // Formula to display ("2√" is shown as "√") and expression passed to the calculator
    NSString *formula = [textStr stringByReplacingOccurrencesOfString:@"2√" withString:@"√"];
    NSString *expression = _passString;
    BOOL hasLastAnswer = _lastAnswer.length > 0 && ![_lastAnswer isEqualToString:@"error"];
    unichar c = _passString.length > 0 ? [_passString characterAtIndex:0] : 0;
    if (hasLastAnswer && (c == '-' || c == '+' || c == 'x' || c == '/')) {
        // A previous result exists and the input starts with an operator: keep calculating from it
        formula = [_lastAnswer stringByAppendingString:formula];
        expression = [_lastAnswer stringByAppendingString:_passString];
    }
    // Otherwise: first calculation, or a new formula that does not reuse the previous result
    NSString *answer = [_calcultor calculatingWithString:expression andAnswerString:@"0"];
    dispatch_async(dispatch_get_main_queue(), ^{
        if (asr) {
            _showTextView.text = [formula stringByAppendingString:@"="]; // show the formula
        }
        AudioServicesPlaySystemSound(soundID1);
        _resultLabel.font = [UIFont systemFontOfSize:50.0];
        _resultLabel.text = answer;
        [_resultLabel startAnimation];
    });
}
```
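In the backend response, the recognized speech appears in the asr field. You may wonder how the backend knows what we said. When we created the app on the OLAMI platform, we imported a grammar that matches the utterances we care about. OLAMI's semantic parsing then automatically extracts the variables we need.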
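For example, I say: 3+6乘九等于几? ("What is 3 plus 6 times 9?")
The matching grammar rule uses these names: 再 = again, 数字一/二/三 = number 1/2/3, 符号一/二 = symbol 1/2, 结果 = result, 等于 = equals. [ ] marks an optional part and | separates alternatives:
[<再>][<数字一>]<符号一><数字二><符号二><数字三>[<结果>|<等于>]
Returned result, as printed by NSLog (so it appears in property-list style rather than raw JSON). The \U escapes in result and input decode to 三加六乘九等于几: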
```
json={
    data = {
        asr = {
            final = 1;
            result = "\U4e09\U52a0\U516d\U4e58\U4e5d\U7b49\U4e8e\U51e0";
            "speech_status" = 0;
            status = 0;
        };
        nli = (
            {
                "desc_obj" = {
                    status = 0;
                };
                semantic = (
                    {
                        app = calculator;
                        customer = 59530feb84aea6f385319c65;
                        input = "\U4e09\U52a0\U516d\U4e58\U4e5d\U7b49\U4e8e\U51e0";
                        modifier = (
                        );
                        slots = (
                            {
                                name = number3;
                                "num_detail" = {
                                    "recommend_value" = 9;
                                    type = float;
                                };
                                value = "\U4e5d";
                            },
                            {
                                name = number1;
                                "num_detail" = {
                                    "recommend_value" = 3;
                                    type = float;
                                };
                                value = "\U4e09";
                            },
                            {
                                name = number2;
                                "num_detail" = {
                                    "recommend_value" = 6;
                                    type = float;
                                };
                                value = "\U516d";
                            },
                            {
                                name = symbol1;
                                value = "+";
                            },
                            {
                                name = symbol2;
                                value = x;
                            }
                        );
                    }
                );
                type = calculator;
            }
        );
    };
    status = ok;
}
```
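The slots in this response do not arrive in spoken order: number3 comes first. The post also does not show the loop body that collects them into sumArr. One way to rebuild the formula is to walk a fixed list of slot names in grammar order. This sketch assumes two things: the slot names follow the grammar order, and number slots contribute their num_detail.recommend_value. It is written in plain C, which also compiles in an Objective-C .m file. Slot, kSlotOrder and assemble_expression are hypothetical names, not part of the project or the OLAMI SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one NLI slot reduced to name/value text. For number slots,
   value is assumed to hold num_detail.recommend_value, e.g. "9" rather than "九". */
typedef struct {
    const char *name;
    const char *value;
} Slot;

/* Slot order in the grammar [<数字一>]<符号一><数字二><符号二><数字三> */
static const char *const kSlotOrder[] = {
    "number1", "symbol1", "number2", "symbol2", "number3"
};

/* Concatenates slot values into out in grammar order, skipping slots that are
   absent and slots outside the order (such as "again"). Returns out. */
char *assemble_expression(const Slot *slots, size_t count, char *out, size_t out_size) {
    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof kSlotOrder / sizeof kSlotOrder[0]; ++i) {
        for (size_t j = 0; j < count; ++j) {
            if (strcmp(slots[j].name, kSlotOrder[i]) == 0) {
                /* strncat never writes past out_size (it truncates instead) */
                strncat(out, slots[j].value, out_size - strlen(out) - 1);
                break;
            }
        }
    }
    return out;
}
```

With the slots above this yields "3+6x9". For the "再加三" response (again, number2 = 3, symbol1 = +) it yields "+3", because the again slot is skipped.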
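Then I say: 再加三等于几? ("Plus three again, what does that make?")
The matching grammar rule:
[<再>][<数字一>][<符号一>][<数字二>][<结果>|<等于>]
The backend returns the following; result and input decode to 再加三等于几: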
```
json={
    data = {
        asr = {
            final = 1;
            result = "\U518d\U52a0\U4e09\U7b49\U4e8e\U51e0";
            "speech_status" = 0;
            status = 0;
        };
        nli = (
            {
                "desc_obj" = {
                    status = 0;
                };
                semantic = (
                    {
                        app = calculator;
                        customer = 59530feb84aea6f385319c65;
                        input = "\U518d\U52a0\U4e09\U7b49\U4e8e\U51e0";
                        modifier = (
                        );
                        slots = (
                            {
                                name = again;
                                value = a;
                            },
                            {
                                name = number2;
                                "num_detail" = {
                                    "recommend_value" = 3;
                                    type = float;
                                };
                                value = "\U4e09";
                            },
                            {
                                name = symbol1;
                                value = "+";
                            }
                        );
                    }
                );
                type = calculator;
            }
        );
    };
    status = ok;
}
```
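The "再加三" request works because processSemantic:asr: checks whether the new expression starts with an operator. If it does, and a usable previous answer exists, it prepends that answer. The C sketch below isolates just that decision. build_expression is a hypothetical name. The sketch assumes the result label still showed "57" after the first example, since 3 + 6 × 9 = 57.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the branch in processSemantic:asr:.
   last_answer: previous result on screen ("" or "error" when there is none).
   expr: the newly spoken expression, e.g. "+3". Writes the result into out. */
const char *build_expression(const char *last_answer, const char *expr,
                             char *out, size_t out_size) {
    assert(expr != NULL && out_size > 0);
    int starts_with_op = expr[0] == '+' || expr[0] == '-' ||
                         expr[0] == 'x' || expr[0] == '/';
    int has_last = last_answer != NULL && last_answer[0] != '\0' &&
                   strcmp(last_answer, "error") != 0;
    if (has_last && starts_with_op) {
        /* Continue from the previous result: "57" + "+3" gives "57+3" */
        snprintf(out, out_size, "%s%s", last_answer, expr);
    } else {
        /* No usable previous result, or a fresh formula: use expr alone */
        snprintf(out, out_size, "%s", expr);
    }
    return out;
}
```

Evaluating the resulting "57+3" then gives 60. A spoken formula that starts with a number, such as "2x5", ignores the previous answer.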
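Calculation: this step is an algorithm built on stacks. The idea is to give each operator a priority and use two stacks, one for operands and one for operators. A # is pushed onto the bottom of the operator stack to simplify handling. Scan the expression from the first element and push each number onto the operand stack. For each operator, compare its priority with the operator on top of the operator stack:
- If the incoming operator has higher priority, push it.
- Otherwise, pop the top operator, apply it to the top two operands, push the result, and compare again.

A # at the end of the expression forces all remaining operators to be applied, and evaluation stops when the two #s meet. If the previous round's result is carried forward, it goes onto the operand stack so the next calculation continues from it.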
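The two-stack procedure above can be sketched in plain C, which also compiles unchanged in an Objective-C .m file. This is not the project's _calcultor implementation, which the post does not show. It is a minimal sketch that assumes well-formed input made of non-negative decimal numbers and the four operators the app uses: +, -, x for multiplication, and /. It handles no parentheses, √ or unary minus. The names evaluate, precedence, apply_op and STACK_MAX are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

#define STACK_MAX 64 /* hypothetical fixed capacity for both stacks */

/* Operator priority; '#' is the sentinel with the lowest priority. */
static int precedence(char op) {
    switch (op) {
        case '+': case '-': return 1;
        case 'x': case '/': return 2;
        default:            return 0; /* '#' */
    }
}

static double apply_op(double a, char op, double b) {
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case 'x': return a * b;
        default:  return a / b; /* '/' */
    }
}

/* Evaluates a well-formed expression such as "3+6x9" with two stacks. */
double evaluate(const char *expr) {
    double nums[STACK_MAX]; /* operand stack */
    char ops[STACK_MAX];    /* operator stack */
    int ntop = -1, otop = -1;
    ops[++otop] = '#';      /* sentinel at the bottom of the operator stack */

    const char *p = expr;
    for (;;) {
        if (isdigit((unsigned char)*p)) {
            /* Read a non-negative decimal number and push it. */
            double value = 0.0;
            while (isdigit((unsigned char)*p)) value = value * 10 + (*p++ - '0');
            if (*p == '.') {
                double scale = 0.1;
                for (++p; isdigit((unsigned char)*p); ++p, scale /= 10)
                    value += (*p - '0') * scale;
            }
            assert(ntop + 1 < STACK_MAX);
            nums[++ntop] = value;
            continue;
        }
        /* The end of the string acts as the closing '#'. */
        char op = (*p == '\0') ? '#' : *p;
        /* While the top operator's priority is >= op's, pop it and apply it. */
        while (ops[otop] != '#' && precedence(op) <= precedence(ops[otop]) && ntop >= 1) {
            double b = nums[ntop--];
            double a = nums[ntop--];
            nums[++ntop] = apply_op(a, ops[otop--], b);
        }
        if (op == '#') break; /* the two '#'s met: done */
        assert(op == '+' || op == '-' || op == 'x' || op == '/');
        assert(otop + 1 < STACK_MAX);
        ops[++otop] = op;
        ++p;
    }
    assert(ntop >= 0);
    return nums[ntop];
}
```

For the example above, evaluate("3+6x9") returns 57. Carrying that answer forward, evaluate("57+3") returns 60. The comparison uses <= rather than <, so an operator of equal priority already on the stack is applied first. This keeps evaluation left to right: "8-2-1" gives 5, not 7.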
Source code: https://github.com/zhaoshihui/calculator_olami_ios.git