How Can AI Robots Finally Talk to Humans?

Waves of greeting to friends, objects falling to the ground, and other simple movements are easily understood by humans, but they can confuse machines. Twenty Billion Neurons (TBN), a startup based in Toronto and Berlin, is developing artificial intelligence that uses video and deep learning to improve machines’ understanding of the visual world. TBN’s CEO and chief scientist, Roland Memisevic, recently spoke at the AI Frontiers conference held in Silicon Valley.

Recently, at a conference in Montreal, TBN unveiled Millie, an AI-powered, context-aware avatar. According to the company, Millie is a “life-size helper who interacts with you by observing and understanding your environment and what you are doing.” The company plans to position Millie as its main product for the retail and education industries.

In an interview with Knowledge@Wharton, Memisevic talked about his vision for the company and why “video is the best window for AI systems to understand how the world works.”

An edited transcript of the conversation follows.

Knowledge@Wharton: Can you briefly describe your background? How did you first get involved with artificial intelligence?

Roland Memisevic: My interest in artificial intelligence (AI) began when I read Douglas Hofstadter at the age of fifteen or sixteen. He wrote popular science books focused on artificial intelligence. I stumbled upon one of his books in a bookstore and was fascinated. It felt like magic: unique, fun, weird and wonderful.

Knowledge@Wharton: What impressed you the most?

Memisevic: We learn a lot about ourselves through the development of AI. Humans are such strange creatures… it is like holding up a mirror and seeing what humans are and why. A lot of it is amazing. That is the charm of AI.

It is also an interesting combination: it is very mathematical, it touches on philosophical questions, and in some respects it is even artistic. When you get to know AI in depth, there are a lot of creative bursts.

Knowledge@Wharton: How did TBN come about? What opportunity are you trying to address?

Memisevic: During my Ph.D. and later as an assistant professor at the University of Montreal, I became deeply interested in video understanding. Not because of video itself, but because I think video is the best window through which we can let an AI system learn how the world works: what objects are, how they move, and how they behave. People call this intuitive physics, or common sense.

At TBN, we run a large data-generation operation: we ask a large number of people to shoot videos for us so that we can teach the AI system how the world operates. Our company has only 20 people and is focused on solving this one problem. That focus is what appeals to me.

Knowledge@Wharton: For those unfamiliar with AI and how AI and video work together, can you explain what the company’s business means for consumers?

Memisevic: When humans use language, they often use analogies to simple, concrete concepts in order to make high-level, abstract decisions. For example, a CEO might say, “There is a storm brewing in front of us.” Everyone knows what that means. If you look at how people use language, how they think and how they reason, it is always grounded in everyday experience.

Video is the best way to bring this knowledge to an AI system, because video is a very rich source of information about the world. For example, we teach the AI that if I let go of an object, it falls in a specific way and in a completely predictable direction… all of which is immediately visible in video. If you can explain these phenomena in a video, you must have a basic understanding of them.

That is why we create data that shows what is going on in the world, and then ask a neural network to make predictions, for example, asking the AI system to describe in language what it sees. If the neural network masters this skill, it must have absorbed that information in some way.
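
To make this concrete, here is a minimal sketch of what such a training setup could look like, assuming a PyTorch-style pipeline; the dataset, the label set and the tiny 3D-CNN below are illustrative placeholders, not TBN’s actual system:

```python
# Sketch: train a toy 3D-CNN to map short video clips to action-description
# labels. Real systems would use crowd-sourced clips; random tensors stand in here.
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

LABELS = ["dropping something", "picking something up", "waving"]  # illustrative

class VideoClipDataset(Dataset):
    """Yields (clip, label) pairs; a clip has shape (channels, frames, height, width)."""
    def __init__(self, num_samples=32):
        self.clips = torch.randn(num_samples, 3, 16, 112, 112)
        self.labels = torch.randint(0, len(LABELS), (num_samples,))

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        return self.clips[idx], self.labels[idx]

class TinyVideoNet(nn.Module):
    """A small 3D convolutional network over space and time."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool over time and space
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = TinyVideoNet(num_classes=len(LABELS))
loader = DataLoader(VideoClipDataset(), batch_size=8, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for clips, labels in loader:  # one illustrative pass over the data
    optimizer.zero_grad()
    loss = loss_fn(model(clips), labels)
    loss.backward()
    optimizer.step()
```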

Knowledge@Wharton: This might be easier to understand in the context of specific industries, such as health care. Can you give an example of how it is used?

Memisevic: Health care is a huge opportunity for video-understanding applications, but the medical industry is heavily regulated and difficult to penetrate as a market.

There are many possible applications. For example, we started working with a hospital in Toronto on gesture control, so that a nurse caring for a patient does not have to stop what they are doing to turn off an alarm: taking off their gloves, pressing a button, putting on new gloves and then continuing to care for the patient. A wave of the hand is enough, so the nurse’s workflow is much smoother.
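
As a rough illustration of the idea (not the hospital’s or TBN’s actual integration), a gesture-recognition model could be wired to the alarm roughly like this; recognize_gesture and Alarm are hypothetical stand-ins:

```python
# Sketch: silence an alarm when a video model recognizes a dismissal gesture.
import time

class Alarm:
    """Hypothetical bedside alarm."""
    def __init__(self):
        self.ringing = True

    def silence(self):
        self.ringing = False
        print("Alarm silenced.")

def recognize_gesture(frame):
    """Stand-in for a video model that returns a gesture label per frame."""
    return "swipe_left"  # pretend the nurse just swiped a hand

def gesture_loop(alarm, camera_frames):
    for frame in camera_frames:
        if alarm.ringing and recognize_gesture(frame) == "swipe_left":
            alarm.silence()
        time.sleep(0.05)  # roughly 20 frames per second

gesture_loop(Alarm(), camera_frames=[None] * 3)
```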

Another example relates to record keeping. When you care for a patient, you have to document what you are doing, which everyone finds cumbersome and time consuming. It is much easier to have a camera watch you and create the document, filling in your activities and their sequence. You then just need to check it over, modify a few places, approve it and say, “Okay, this is the log file.”
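
A minimal sketch of that idea, with hypothetical activity labels and an invented log format, might look like this:

```python
# Sketch: turn a timestamped stream of recognized activities into a draft care
# log that the nurse only has to review, edit and approve.
from datetime import datetime

observations = [  # would come from a video-understanding model in practice
    (datetime(2018, 11, 20, 9, 0), "checked IV line"),
    (datetime(2018, 11, 20, 9, 5), "administered medication"),
    (datetime(2018, 11, 20, 9, 12), "repositioned patient"),
]

def draft_log(events):
    """Format (timestamp, activity) pairs as one log line each."""
    return "\n".join(f"{ts:%H:%M} - {activity}" for ts, activity in events)

print(draft_log(observations))
# The nurse edits the draft where needed, then approves it as the log file.
```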

There are many more examples, especially in elder care: detecting whether someone has fallen, or simply keeping an older person who lives alone company by talking with them. Ironically, although this technology could make the biggest difference in the medical field, it is also the hardest to commercialize. For a small company like ours, it is difficult to break into health care.

Knowledge@Wharton: Let us consider a less regulated industry, such as retail. What is the potential for video understanding there?

Memisevic: There is a lot of potential. What interests me most is the concept of companionship. Imagine an avatar or robot that welcomes you to the store and answers your questions about the items you might be looking for, their prices and so on. Or one that makes you smile when you enter the store, so that you enjoy interacting with an artificial virtual creature that can really look at you and connect with you in some way. That increases customer engagement and satisfaction, and it increases foot traffic.

Knowledge@Wharton: What does it mean to “look at you”?

Memisevic: The technology and data we are building allow robots to understand video. One major change is that we can give these robots the ability to look at you and to understand what you are looking at on the screen.

With a smart home speaker, you have to press a button and ask, “Hey, what’s the weather like tomorrow?” These robots, by contrast, can see that you are approaching. They can wave to you and say, “Come over, let me tell you something.” They have a gaze, just like us. They look in a certain direction in order to focus on certain parts of the world, and simply by pointing their eyes in a certain direction they can convey to you what they are seeing.

They can show you that they are looking at you, and they can understand that you are looking at them. You can communicate far more naturally with these AI creatures than by browsing a catalog or a screen.
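
As a hedged sketch of the interaction logic only (the detectors below are hypothetical stand-ins, not TBN’s models), the greeting behavior could be gated on both proximity and gaze:

```python
# Sketch: greet a person only when the camera both detects them approaching
# and estimates that they are looking toward the screen.
def detect_person(frame):
    """Stand-in for a person detector; returns presence and distance."""
    return {"present": True, "distance_m": 1.2}

def estimate_gaze(frame):
    """Stand-in for a gaze estimator; returns 'toward_screen' or 'away'."""
    return "toward_screen"

def should_greet(frame):
    person = detect_person(frame)
    return (person["present"]
            and person["distance_m"] < 2.0
            and estimate_gaze(frame) == "toward_screen")

if should_greet(frame=None):
    print("Hi there, come closer and let me tell you something.")
```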

Knowledge@Wharton: What business models are you exploring to turn this into a viable business?

Memisevic: We license the technology. We license these neural networks, for example to power in-store robots that can see you and understand what is happening. We also license the data itself, because we generate massive amounts of it in the process, and it is a valuable source of information for companies training their own applications.

Knowledge@Wharton: It would seem logical to apply this to self-driving cars.

Memisevic: We have deliberately stayed out of autonomous driving. That is a strategic decision, because the field is already very crowded. We can provide other value in the automotive space, such as letting you control things in the car with gestures or understanding the behavior of passengers. That is unique to us. We focus on interiors: the room and the car. Americans spend an average of 93% of their time indoors or in a car, so that is our focus.

Knowledge@Wharton: What are your goals for the next two years?

Memisevic: We are looking at how to scale up the licensing opportunities. We have some ambitious projects around these creatures that can interact with you; the technical goal is to enable them to do things such as cooking from a recipe or performing some dance moves. On the business side, we focus on user numbers and revenue.

Knowledge@Wharton: Who are your main competitors, and how do you differentiate yourselves?

Memisevic: There are many companies doing similar things, but none currently focuses on this problem as much as we do, so we are not worried about the market being too saturated. The real competition is the extreme shortage of talent: big companies like Google, Amazon, Facebook and Microsoft are all vying for the same people. Every once in a while you also see cloud-computing providers offering services that cover some of our features, so there is a little overlap. But all in all, our business is very targeted and there is not much direct competition.

Knowledge@Wharton: What have you learned so far about managing the business?

Memisevic: Our team has grown from four to six to 10 to 20 people, and each step brought a new challenge. You need different processes and a different culture to keep everyone efficient and happy, and to keep the company healthy. I used to be a university professor; that is a very different world, and largely an individual one. I believe a dedicated team of 20 people can make unimaginable progress, but only if the group works together effectively, and that is a challenge. Right now we are in very good shape and the collaboration works very well.

Knowledge@Wharton: Was the move from professor to entrepreneur challenging?

Memisevic: Of course. But you grow, you learn a lot, and you come to understand human behavior and how teams work together. That is fascinating in itself.

Knowledge@Wharton: What is your dream for the future?

Memisevic: Imagine a world where AI robots can look at you, understand you and talk to you, where interacting with them is not fundamentally different from interacting with another person. I do not think that goal can be fully achieved, because AI will never truly understand and empathize with our physical experiences, such as pain and sorrow. But we can try to get close to it, so that one day you could sit down with a robotic friend and have an in-depth philosophical conversation, or talk about the economic situation, or something like that. I foresee a day when our AI companions can reason and think.

Knowledge@Wharton: Reasoning and thinking, indeed. But do you think AI can feel emotion?

Memisevic: Not in any obvious way. Maybe there is a way to instill some.

Knowledge@Wharton: So they may imitate emotions, but not actually feel them?

Memisevic: Does that distinction really hold for an AI system? I am not sure. Do you know that I am conscious? You can assume it, but you cannot really prove it. If you are sitting in front of a device that conveys emotion in some way, you might say, “Well, it looks conscious, but it is a robot, so I do not think it has emotions.”

This is a fundamental obstacle. You can never feel exactly what another person is feeling at a given moment. You can have some kind of empathy, you can understand the feelings of others, but you cannot prove it; you do not know how much consciousness I have. The same obstacle applies to a device, so in the end I do not think there is any difference. People will attribute a certain mental state to the machine and get used to it. If you do something to the device that seems to hurt it, you might even feel bad. But whether that judgment is justified is hard to say.
