Tangdi is one of the Microsoft China customers and builds intelligent robots, which will be used at public service places like hospitals and banks to introduce the services provided and help the customers to get the info the needed in a very convinient way. The robot uses Windows 10 to build a client side program and uses Bot Framework to host all the conversation logics and connects to LUIS(Language Understanding Intelligent Service) in Cognitive Services to empower it with conversation ability. Moreover, the robot leverages Face API and Emotion API in Cognitive Services to give it AI power to recognize customer’s gender, age, emotion and VIP customers.

Key Technologies Used

Core Team:

Special thanks to the Tangdi team, Microsoft Corp TED team, and Microsoft China DX Technical Evangelist team, and the Microsoft Audience Evangelism team. This project team included the following participants:

  • Wenhua Shi – Robot Group Lead of Tangdi Technologies
  • Haipeng Su – Director of Tangdi Technologies
  • Yanli Cai – Product Manager from Tangdi Technologies
  • Ari Bornstein - Microsoft SDE 2, Partner Catalyst Team - Israel
  • Yan Zhang - Microsoft Audience Evangelism Manager
  • Leon Liang – Microsoft DX Senior Technical Evangelist
  • Yuheng Ding – Microsoft DX Technical Evangelist

Customer profile

Shanghai Tangdi information technology co., LTD. (Tangdi Information) is a high-tech company focusing on providing the hardware and software products and system integration services to financial institutions, intelligence community and medical institutions. Their products cover Tangbao robot, credit reporting system, convenient self-service terminals, bank card list, Internet payment systems, mobile payment system and so on. Since 2009, Tangdi has expended from Shanghai to Beijing, Guangzhou, Fuzhou, Hefei with more than 1000 employees all around the country. As one of the top customer, they showed their product on Microsoft Booth in China Development Forum and present to Microsoft GCR CEO.

To learn more about Tangdi, visit http://www.tangdi.com.cn/

Tangdi

Problem statement

Tangdi Information is expending their business to robots in Financial and Medical industry mainly to introduce the services at hospital and bank, which requires a solution on the software layer. The robot needs be able to do natural language interaction with humans, and help customers get the information they need like in the back, the process of purchasing foreign exchange, the latest fanantial products… so to augment the onsite service people. At the same time, the robot needs to have power in computer vision to get customer’s face infomation including recognizing VIP customers.

Solution and steps

Based on the customer’s needs, Microsoft Chian DX TE team help Tangdi to adopt Cognitive Services & Bot Framework integrated with current win32 program to develop all the required services of their Robot, using Bot Framework to host all the conversation logics and connected to LUIS in Cognitive service to empower the Robot with conversation ability, Leveraging Face API and Emotion API in Cognitive Services to give the robot AI power to recognize customer’s gender, age, emotion and VIP customers, at the same time giving user one-sentence summary by Computer Vision API

Architecture Diagram

  • When users talk to Robot, the win32 program on the desktop will use the frontend STT(Speech to Text) to change voice to text message, then send to bot framework. When received the reply message, the win32 program will use TTS(Text to Speech) to tansfer to voice.
  • Bot framework handles all the the robot conversation ability, by connectting to LUIS to get the intent from the user and give the user relevant information.
  • Face API and Emotion API in Microsoft Cognitive Service are also integrated in the win32 rpogram and will support the robot to recognize customer’s gender, age, emotion and VIP customers. Besides, it will give one-sentence discribe about the people in the picture by calling the Computer Vision API, which will show more info besides people’s infomation like the glasses they wearing.

Prerequisite steps

LUIS Intent

Build Bot to host conversation interaction

  • Create chat bot

Bot

Integrated with Cognitive Services for an Intelligent User Experience

  • LUIS

Luis Intent

LUIS Intent

Luis Entities

LUIS Entities

Luis Utterance Training

LUIS Utterance

  • Computer vision

Computer_vision

  • Face API

Face_API

Technical delivery

  • Code Artifacts for computer vision

  • Set request URL for vision analyze. Due to the difefrent Azure version we use in China, be careful to set the URL to azure.cn instead of azure.com, which will help to improve the performance. ```c++ namespace tdos { VisionAnalyzeImgArg::VisionAnalyzeImgArg(const std::string &name) : BaseHttpArgsImpl(name) { SetUrl(“https://api.cognitive.azure.cn/vision/v1.0/analyze”); }

VisionAnalyzeImgArg::~VisionAnalyzeImgArg() { request_info_ = NULL; response_info_ = NULL; }


- Parse http interacts with the server in post mode;
```c++

int VisionAnalyzeImgArg::ParsePostInfo(int ret_code, const char *data, const int len) {
  response_info_ = NULL;
  JsonValue json_value_object;
  int ret = GetJsonUtil()->JsonObjectFromString(data, json_value_object);
  if (0 != ret) {
    std::cout << "json from object failed: " << data << std::endl;
    return ret;
  }

  response_info_ = VisionAnalyzeImgRespInfoPtr(new VisionAnalyzeImgRespInfo());
  if (NULL == response_info_) {
    return 1;
  }

  response_info_ = ParseVisionAnalyzeImgInfo(ret_code, json_value_object);

  return 0;
}
  • Parse Jason formatted visualFeatures, include categories, adult, tags, description, faces, color, imageType, metadata;
tdos::VisionAnalyzeImgRespInfoPtr VisionAnalyzeImgArg::ParseVisionAnalyzeImgInfo(const int ret_code, JsonValue &json_value) {
  VisionAnalyzeImgRespInfoPtr info_= VisionAnalyzeImgRespInfoPtr(new VisionAnalyzeImgRespInfo());
  if (NULL == info_) {
    return NULL;
  }

  if (json_value.isMember("categories")) {
    info_->categories = ParseCategories(json_value["categories"]);
  }

  if (json_value.isMember("adult")) {
    info_->adult= ParseAdult(json_value["adult"]);
  }

  if (json_value.isMember("tags")) {
    info_->tags = ParseTags(json_value["tags"]);
  }

  if (json_value.isMember("description")) {
    info_->description = ParseDescription(json_value["description"]);
  }

  info_->requestId = GetJsonUtil()->GetTextElement("requestId", json_value);

  if (json_value.isMember("faces")) {
    info_->faces = ParseFaces(json_value["faces"]);
  }

  if (json_value.isMember("color")) {
    info_->color = ParseColor(json_value["color"]);
  }

  if (json_value.isMember("imageType")) {
    info_->imageType = ParseImageType(json_value["imageType"]);
  }

  if (json_value.isMember("metadata")) {
    info_->metaData = ParseMetaData(json_value["metadata"]);
  }

  return info_;
}
  • Parse an array of categories indicating what visual feature types to return;
    tdos::CategoriesInfoPtr VisionAnalyzeImgArg::ParseCategories(JsonValue &json_object) {
    if (!json_object.isArray()) {
      std::cout << "invalid Categories array: " << std::endl;
      return NULL;
    }
    
    CategoriesInfoPtr categories_info = CategoriesInfoPtr(new CategoriesInfo());
    if (NULL == categories_info) {
      return NULL;
    }
    
    JsonValue json_value_array_unit;
    for (unsigned int i = 0; i < json_object.size(); ++i) {
      json_value_array_unit = json_object[i];
    
      CategoriesPtr categories = CategoriesPtr(new Categories());
      if (NULL != categories) {
        categories->name = GetJsonUtil()->GetTextElement("name", json_value_array_unit);
        categories->score = GetJsonUtil()->GetDoubleElement("score", json_value_array_unit);
    
        if (json_value_array_unit.isMember("detail")) {
          categories->detail = ParseDetail(json_value_array_unit["detail"]);
        }
        categories_info->categories_list.push_back(categories);
      }
    }
    
    return categories_info;
    }
    
  • Code artifacts for Face Detection:

  • Set request URL for face detect
tdos 
{
FaceDetectArg::FaceDetectArg(const std::string &name) : BaseHttpArgsImpl(name) {
  SetUrl("https://api.cognitive.azure.cn/face/v1.0/detect");
}

FaceDetectArg::~FaceDetectArg() {
  request_info_ = NULL;
  response_info_ = NULL;
}
  • Parse Jason formatted face data, get rectangle area for the face location on image.
RectanglePtr FaceDetectArg::ParseFaceRect(JsonValue &json_object) {
  RectanglePtr rect = RectanglePtr(new Rectangle());
  if (NULL == rect) {
    TDOS_LOG_ERROR("get face detect rect info failed");
    return NULL;
  }

  rect->top = GetJsonUtil()->GetIntElement("top", json_object);
  rect->left = GetJsonUtil()->GetIntElement("left", json_object);
  rect->width = GetJsonUtil()->GetIntElement("width", json_object);
  rect->height = GetJsonUtil()->GetIntElement("height", json_object);

  return rect;
}

FaceLandMarksInfoPtr FaceDetectArg::ParseFaceLandMarks(JsonValue &json_object) {
  return NULL;
}
  • Parse Jason formatted face data, get facialHair value consists of lengths of three facial hair areas;
FacialHairInfoPtr FaceDetectArg::ParseFacialHair(JsonValue &json_object) {
  FacialHairInfoPtr facial_hair = FacialHairInfoPtr(new FacialHairInfo());
  if (NULL == facial_hair) {
    TDOS_LOG_ERROR("get face detect facial_hair info failed");
    return NULL;
  }

  facial_hair->mustache = GetJsonUtil()->GetDoubleElement("moustache", json_object);
  facial_hair->beard = GetJsonUtil()->GetDoubleElement("beard", json_object);
  facial_hair->sideburns = GetJsonUtil()->GetDoubleElement("sideburns", json_object);

  return facial_hair;
}

HeadPoseInfoPtr FaceDetectArg::ParseHeadPoseInfo(JsonValue &json_object) {
  HeadPoseInfoPtr head_pose = HeadPoseInfoPtr(new HeadPoseInfo());
  if (NULL == head_pose) {
    TDOS_LOG_ERROR("get face detect head_pose info failed");
    return NULL;
  }

  head_pose->roll = GetJsonUtil()->GetDoubleElement("roll", json_object);
  head_pose->yaw = GetJsonUtil()->GetDoubleElement("yaw", json_object);
  head_pose->pitch = GetJsonUtil()->GetDoubleElement("pitch", json_object);

  return head_pose;
}
  • Parse Jason formatted face data, get the faceAttributes info such as emotion;

tdos::EmotionPtr FaceDetectArg::ParseEmotion(JsonValue &json_object) {
  EmotionPtr emotion = EmotionPtr(new Emotion());
  if (NULL == emotion) {
    TDOS_LOG_ERROR("get face detect emotion info failed");
    return NULL;
  }

  emotion->anger = GetJsonUtil()->GetDoubleElement("anger", json_object);
  emotion->contempt = GetJsonUtil()->GetDoubleElement("contempt", json_object);
  emotion->disgust = GetJsonUtil()->GetDoubleElement("disgust", json_object);
  emotion->fear = GetJsonUtil()->GetDoubleElement("fear", json_object);
  emotion->happiness = GetJsonUtil()->GetDoubleElement("happiness", json_object);
  emotion->neutral = GetJsonUtil()->GetDoubleElement("neutral", json_object);
  emotion->sadness = GetJsonUtil()->GetDoubleElement("sadness", json_object);
  emotion->surprise = GetJsonUtil()->GetDoubleElement("surprise", json_object);

  return emotion;
}

  • Parse Jason formatted face data, get the faceAttributes info;
FaceAttributesInfoPtr FaceDetectArg::ParseFaceAttributes(JsonValue &json_object) {
  FaceAttributesInfoPtr face_attribute = FaceAttributesInfoPtr(new FaceAttributesInfo());
  if (NULL == face_attribute) {
    TDOS_LOG_ERROR("get face detect face_attribute info failed");
    return NULL;
  }

  face_attribute->age = GetJsonUtil()->GetDoubleElement("age", json_object);
  face_attribute->gender = GetJsonUtil()->GetTextElement("gender", json_object);
  face_attribute->smile = GetJsonUtil()->GetDoubleElement("smile", json_object);
  face_attribute->glasses = GetJsonUtil()->GetTextElement("glasses", json_object);

  if (json_object.isMember("facialHair")) {
    face_attribute->facialHair = ParseFacialHair(json_object["facialHair"]);
  }

  if (json_object.isMember("headPose")) {
    face_attribute->headPose = ParseHeadPoseInfo(json_object["headPose"]);
  }

  if (json_object.isMember("emotion")) {
    face_attribute->emotion = ParseEmotion(json_object["emotion"]);
  }

  return face_attribute;
}

  • Parse Jason formatted face data, a face entry contain values include faceId, faceRectangle, faceLandmarks, faceAttributes;
FaceDetecInfoPtr FaceDetectArg::ParseFaceDetectInfo(const int ret_code, JsonValue &json_value) {
  FaceDetecInfoPtr info_= FaceDetecInfoPtr(new FaceDetecInfo());
  if (NULL == info_) {
    TDOS_LOG_ERROR("get face detect info failed");
    return NULL;
  }

  info_->faceId = GetJsonUtil()->GetTextElement("faceId", json_value);
  if (json_value.isMember("faceRectangle")) {
    info_->faceRectangle = ParseFaceRect(json_value["faceRectangle"]);
  }

  if (json_value.isMember("faceLandmarks")) {
    info_->faceLandmarks = ParseFaceLandMarks(json_value["faceLandmarks"]);
  }

  if (json_value.isMember("faceAttributes")) {
    info_->faceAttributes = ParseFaceAttributes(json_value["faceAttributes"]);
  }

  return info_;
}
  • Parse http interacts with the server in post mode;
int FaceDetectArg::ParsePostInfo(int ret_code, const char *data, const int len) {
  response_info_ = NULL;
  if (NULL != data) {
    TDOS_LOG_DEBUG("face detect result: " << data);
  }

  JsonValue json_value_object;
  int ret = GetJsonUtil()->JsonObjectFromString(data, json_value_object);
  if (0 != ret) {
    std::cout << "json from object failed: " << data << std::endl;

    return ret;
  }

  if (!json_value_object.isArray()) {
    std::cout << "invalid json string: " << data << std::endl;
    return 1;
  }

  response_info_ = FaceDetecResponseInfoPtr(new FaceDetecResponseInfo());
  if (NULL == response_info_) {
    TDOS_LOG_ERROR("get face detect response info failed");
  }

  JsonValue json_value_array_unit;
  for (unsigned int i = 0; i < json_value_object.size(); ++i) {
    json_value_array_unit = json_value_object[i];
    FaceDetecInfoPtr info = ParseFaceDetectInfo(ret_code, json_value_array_unit);
    if (NULL != info) {
      response_info_->face_detect_info_list_.push_back(info);
    }
  }

  return 0;
}
  • Parse Jason formatted face data, get the emotion intensity by face expression;
double FaceDetectArg::GetEmotion(EmotionPtr emotion_ptr, std::string &emotion_type) {
  if (NULL == emotion_ptr) {
    return 0;
  }

  std::map<std::string, double> score_map;
  score_map.insert(std::make_pair("anger", emotion_ptr->anger));
  score_map.insert(std::make_pair("contempt", emotion_ptr->contempt));
  score_map.insert(std::make_pair("disgust", emotion_ptr->disgust));
  score_map.insert(std::make_pair("fear", emotion_ptr->fear));
  score_map.insert(std::make_pair("happiness", emotion_ptr->happiness));
  score_map.insert(std::make_pair("neutral", emotion_ptr->neutral));
  score_map.insert(std::make_pair("sadness", emotion_ptr->sadness));
  score_map.insert(std::make_pair("surprise", emotion_ptr->surprise));

  std::vector<FacePair>name_score_vec(score_map.begin(), score_map.end());
  std::sort(name_score_vec.begin(), name_score_vec.end(), FaceCmpByValue());
  emotion_type = name_score_vec[0].first;

  return name_score_vec[0].second;
}

}

Conclusion

After two days of Hackfest, Microsoft China DX team together with Tangdi engineer team accelerated the building of the Tangdi robot in terms of the using of Cognitive services, improved the performance of usage of certain function, like the combination of local Windows 10 API and Cognitive service including natural language interaction,face api. Speeded up the face API calling time, reduce latency in about 50%+, increased the face recognition accuracy and security by adding real sense device.

Now when customer approching the robot, it will automatically call the customer’s name if he/she is vip member. For regular customers, it will simply call Mr./Ms. based on customer’s gender analyzing by Microsoft Face API through the customers’ image. Moreover, the robot will show the emotion status of the customer by Microsoft Emotion API to attract the users.

  • Opportunity going forward:

Though the Azure Intelligent cloud, customer has started to use bot and cognitive services, especially they have tried different API in cognitive service. In the next step, they shows great interest in the other APIs in Cognitive services. Also this gives us an opportunity for the further engagement of Azure Services.

Additional resources

Video shoot on Tangdi Robot: https://youtu.be/1Uwic7LqBbs"