Our lives are surrounded by a huge amount of textual information: messages on smartphones, stories written in books, labels on bottles and cans, signboards, alerts, advertisements, posters, and so on. One point that should be emphasized is that text has at least four functions. First, text transmits written messages to us, as with smartphones and books. Second, text works as a label of an object, as with bottles and signboards: by putting text on a bottle or a storefront, we can understand the contents of the bottle and the type of the store. That is, text is used to disambiguate objects. Third, text can convey non-verbal information through its font style (or typeface); in advertisement images, for example, text is often printed in a font whose impression fits the advertised product. Fourth, text, or its component characters, keeps its legibility under deformation: we can recognize a character as one of the alphabet letters even when it is badly handwritten. In this talk, I will introduce my recent attempts to understand these four functions, mainly the last three: label, impression, and legibility. Most of them rely on recent deep-learning technologies and on collections of textual information and character images.
Zero-resource languages are languages (usually minority languages) for which we cannot obtain language data to develop speech recognizers. We study a spoken term detection (STD) method for zero-resource languages that exploits resource-rich languages, where a spoken query word is detected in a speech database of a minority language. Conventional STD methods employ dynamic time warping (DTW) to detect keywords in sequences of speech feature vectors such as MFCCs. The problem is that the distance between MFCC features depends not only on the pronunciation of the word but also on the speakers of the keyword and of the database. One way to focus on pronunciation and ignore speaker variation is to employ phonetic posteriorgrams (PPGs), vectors of posterior probabilities of phonemes. The proposed method combines PPGs extracted from phoneme classifiers of multiple languages. As a result, it showed better detection performance on a zero-resource language than a method using PPGs of a single language.
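As an illustration of the matching step, the following is a minimal sketch of DTW over posteriorgram sequences. The cosine frame distance, the concatenation scheme for combining per-language PPGs, and all function names are assumptions for exposition, not the authors' implementation (a practical STD system would run subsequence DTW over sliding windows of the database).

```python
import numpy as np

def frame_dist(p, q):
    # Cosine distance between two posterior vectors; inner-product or
    # KL-based distances are also common choices for PPGs.
    return 1.0 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)

def dtw_cost(query, utterance):
    # query: (T1, D) PPG sequence of the keyword; utterance: (T2, D).
    T1, T2 = len(query), len(utterance)
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = frame_dist(query[i - 1], utterance[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T1, T2] / (T1 + T2)  # length-normalized alignment cost

def multilingual_ppg(frames, classifiers):
    # Combine PPGs from phoneme classifiers of several resource-rich
    # languages; simple concatenation is one plausible combination.
    return np.concatenate([clf(frames) for clf in classifiers], axis=1)
```

A keyword would then be reported as detected wherever this normalized alignment cost falls below a threshold.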
To build edge AI systems that can learn, reason, and help humans make better decisions in tasks such as image recognition, automotive control, and video surveillance, AI VLSIs mimicking the functions of the human brain have been widely investigated for their excellent computational speed. However, conventional CMOS-based AI VLSIs suffer from large power consumption. From this point of view, this invited talk discusses the impact of CMOS/MTJ hybrid VLSI technology on AI systems. A nonvolatile AI (NV-AI) processor requires high endurance to realize deep learning and excellent CMOS compatibility to realize a high-level fusion of remembrance and judgment; STT-MRAM is therefore the best choice for an NV-AI processor, owing to its excellent endurance and compatibility with CMOS. We then present two kinds of AI processors we previously developed with CMOS/MTJ hybrid technology: brain-inspired processors and neuromorphic processors. Finally, we argue that our ultra-low-power AI processors and neuromorphic processors with CMOS/MTJ hybrid technology are among the most suitable ways to realize edge AI systems.
Recent analysis of brain neural structure with high-resolution fluorescence microscopy produces large, sub-petabyte-class datasets. Transferring such large datasets between computation resources and storage pools over a typical network incurs a significant time cost, mainly because network speed is the bottleneck. The goal of this research project is to establish a computational storage platform that enables dynamic 3D visualization of brain neural structure from large-scale datasets. The computational storage concept aims at a unified structural design of compute and data-store functions, minimizing the frequency of data transfers and the data access latency between them. A prototype computational storage testbed has been constructed, embodying multiple compute nodes in close proximity to magnetic- and flash-based storage nodes equipped with computational functions. Using a performance benchmark tool and the 3D brain-neuron visualization application, data access performance to and from the storage pools was evaluated and analyzed.
In this research, we constructed a deep learning model to learn and predict several different subjective human judgments of food images (desire to eat, whether the food is made for young people, etc.). By analyzing the learned model, we investigated the image features that contributed to human judgment. First, we show that our deep learning model successfully predicts the different subjective human judgments. Next, we performed two analyses, representational similarity analysis and visualization analysis, to elucidate the features important for human judgments. In the representational similarity analysis, we investigated the level of image features relevant to human judgments, using representational dissimilarity matrices to quantify the representational similarity between the deep learning model and human judgments. The results show that different subjective judgments are represented in different layers of the model. In the visualization analysis, we analyzed which parts of the images contribute to the model's judgment using a visual explanation technique. The results show that the model uses relatively narrow regions of an image when it assigns a high rating to an image that human raters also rated highly, whereas it uses relatively broad regions when it assigns a low rating to an image that human raters rated low. These results provide insights into how human raters use image features to make subjective evaluations.
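The representational similarity analysis described above can be sketched as follows. The correlation-distance RDM and the Spearman comparison are standard choices assumed here for illustration, not details taken from the study.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    # Representational dissimilarity matrix, as a condensed vector of
    # pairwise correlation distances between stimulus representations.
    return pdist(features, metric="correlation")

def rsa_score(layer_activations, human_ratings):
    # layer_activations: (n_images, n_units) activations of one model layer.
    # human_ratings: (n_images, n_raters) ratings for one subjective judgment.
    # Returns the Spearman correlation between model and behavioral RDMs.
    rho, _ = spearmanr(rdm(layer_activations), rdm(human_ratings))
    return rho
```

Computing this score for every (layer, judgment) pair indicates at which depth of the network each subjective judgment is best represented, which is the kind of evidence behind the layer-wise finding above.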
The prefrontal cortex is known to be responsible for flexible behavioral adaptation; however, its neural mechanisms at the single-neuron and local-circuit levels remain unknown. We trained macaque monkeys to perform tasks that require quick behavioral adaptation with reference to object categories stored in long-term memory and to an implicit rule that changes after a random number of trials. Single-unit recording during the performance of these tasks revealed that such information is explicitly represented within the prefrontal cortex in the firing rates of individual neurons. Based on those single-unit data, we constructed a dynamical computational model of the prefrontal neural circuit (in preparation). Furthermore, we performed neural interventions using transcranial magnetic stimulation (TMS). When local neural activity within the prefrontal cortex was suppressed by low-frequency repetitive TMS, flexible behavioral control according to category and rule information was impaired, whereas trial-and-error learning remained intact.
Sleep is considered the single largest contributor to the overall health and well-being of human beings. Past research, on both clinical and healthy populations, has shown how lack of sleep can have detrimental physiological and psychological effects, leading to accidents or to the development of pathologies that in some cases prove fatal. With the aim of contributing to the health and well-being of our consumers, Givaudan developed a patented approach to designing sleep-enhancing fragrances, using EEG as its core neuroimaging technique. However, we are also aware of the breadth of methods used to sample brain activity in sleep-related experiments, not necessarily involving fragrances as stimuli to improve sleep quality. Moreover, as we aim to develop solutions that improve other aspects of consumers' well-being, we feel the need to measure brain activity outside the lab environment, in quasi-real-life situations or, potentially, in domestic environments. This talk will primarily illustrate the creation of our Dreamscentz™ technology, a patented method to design and test sleep-enhancing fragrances using our in-house EEG capabilities. The focus will then shift to alternatives to EEG for experimental setups in real-life environments, moving away from the classic lab setting.
The dramatic spread of smartphones equipped with high-performance sensors has enabled the filming and recording of other people's faces and voices, and even the collection of large quantities of biometric data such as fingerprints and irises. These data can also be shared in cyberspace, which not only violates privacy but also raises the risk of biometric authentication being breached. Such high-quality biometric data can be used as training data, making it easy to create high-quality fake media such as deepfakes, which may negatively impact people's ability to make decisions. This talk will outline these threats and introduce technology that lets users control the distribution of their own biometric data in cyberspace, as well as technology for detecting fake media.
The world is filled with living species, and their interactions drive the evolution of species. Among them, humans have an outstandingly large impact on other organisms. To realize a sustainable society, it is essential to collect and analyze human activities on a global scale in relation to the various species composing the environment. The vocabulary of a language encodes information about the culture of its society, so a cross-linguistic comparison of vocabularies can reveal the characteristics and history of each society. In this study, we focus on the names of animal species in various languages of the world and show that, as a whole, they carry information about their geographic origin (ultimate etymology). Through this study, we propose a new approach that uses the world's vocabularies to reveal the relationship between human society and its surrounding species, as well as the history of trade between local societies.
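As a toy illustration of the kind of cross-linguistic comparison involved, the sketch below scores the orthographic similarity of romanized animal names across languages; similar forms hint at borrowing and thus a shared ultimate etymology. The similarity measure and the example word list are assumptions for exposition; a real study would work with phonetic transcriptions and proper alignment models.

```python
import difflib

def name_similarity(a, b):
    # Crude string similarity between two romanized animal names.
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical sample: names for "lion" in a few languages.
names = {"en": "lion", "fr": "lion", "de": "Löwe", "tr": "aslan", "sw": "simba"}
pairs = [(l1, l2, round(name_similarity(n1, n2), 2))
         for l1, n1 in names.items() for l2, n2 in names.items() if l1 < l2]
print(sorted(pairs, key=lambda p: -p[2]))  # most similar pairs first
```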
The aim of this study is to investigate archaeological materials from the stone coffin of the Saginoyu Hospital No. 1 tunnel tomb, discovered in the Taisho era, using X-ray CT and digital measurement methods. Micro-X-ray CT (ScanXmate-D180RSS270) makes it possible to understand the inner structure and density distribution of grave goods such as a gilt-bronze and iron sword decorated with a rounded ring and a dragon. This analysis revealed that the sword was made by a method unprecedented in Japan, a fact that is extremely important for considering the place of production and the genealogy of the material. In addition, the authors carried out cleaning and preservation treatment of the metal products. These procedures take much time and effort, but they are essential for more precise analysis in the future. The authors also measured the house-shaped composite stone coffin found in the Saginoyu Hospital tunnel tomb using the structure-from-motion (SfM) method. The measured stone coffin is composed of twelve parts: two covers, two short sides, two long sides each composed of two large stone boards, two floor boards, and two smaller boards. The floor measures 1.98 m along the major axis and 0.80 to 0.96 m along the minor axis. Many cutting marks are recognizable on the surfaces of all the stone slabs, which had been covered with soil. For the SfM work, we took approximately 1,600 pictures of the stone sarcophagus from various positions and directions, using Agisoft Metashape Professional as the SfM software. For each stone board, we processed the pictures in Metashape to create dense point-cloud data, and mesh data based on the dense cloud were also created in Metashape. The completed mesh data were assembled in Blender, a free and open-source 3D creation suite, based on the cutting forms. The maximum height of the inner space was estimated at approximately 0.9 m. PEAKIT analysis of these 3D mesh data clearly reveals the distribution and directions of the cutting marks. This detailed 3D research enabled us to clarify how the stone coffin was made and constructed, and to infer how the body and grave goods were placed into it. The characteristics of this stone coffin show that the eastern part of Shimane Prefecture was closely related not only to the western part of Shimane but also to the central Kyushu region from the latter half of the 6th century to the first half of the 7th century.
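The photogrammetry pipeline described (photos → dense cloud → mesh → export to Blender) can also be scripted, since Metashape ships a Python API. The outline below is an assumed sketch of that pipeline, not the authors' actual workflow (which may well have been carried out through the GUI); call names and signatures vary between Metashape versions, and the file paths are hypothetical.

```python
import Metashape  # Agisoft Metashape Professional Python API

doc = Metashape.Document()
chunk = doc.addChunk()

# Load the ~1,600 photographs taken from various positions and directions.
chunk.addPhotos(["photos/IMG_0001.JPG", "photos/IMG_0002.JPG"])  # hypothetical paths

chunk.matchPhotos()      # feature detection and matching
chunk.alignCameras()     # sparse reconstruction (camera poses)
chunk.buildDepthMaps()   # per-image depth estimation
chunk.buildDenseCloud()  # dense point cloud (buildPointCloud() in Metashape 2.x)
chunk.buildModel()       # triangulated mesh from the dense data

chunk.exportModel("coffin_part.obj")  # mesh for assembly and inspection in Blender
doc.save("saginoyu_coffin.psx")
```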
Our project aims to use technology to improve teaching and learning. Specifically, we estimated the body movements of each student from video clips of a class. We then classified the students' postures (e.g., raising hands, falling asleep) from the estimated movements, and examined whether this classification is useful for understanding what actually happens in class and for improving it. This is the third year of the project. In this presentation, we will first give an overview of the project and its achievements to date. We will then describe this year's work and its results, and finally discuss how the knowledge gained so far can be used to improve teaching and learning.
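As a rough illustration of the posture-classification step, the sketch below maps estimated body keypoints to posture labels with a standard classifier. The keypoint layout, the normalization, and the choice of a random forest are all assumptions, since the abstract does not specify the method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def normalize_pose(kps):
    """Center on the mid-shoulder point and scale by shoulder width, so the
    features are invariant to the student's position and size in the frame.
    kps: (n_keypoints, 2) array of coordinates from a pose estimator."""
    l_sh, r_sh = kps[1], kps[2]  # assumed indices of left/right shoulders
    center = (l_sh + r_sh) / 2.0
    scale = np.linalg.norm(l_sh - r_sh) + 1e-6
    return ((kps - center) / scale).ravel()

def train_posture_classifier(keypoint_samples, posture_labels):
    """keypoint_samples: list of (n_keypoints, 2) arrays, one per student per
    frame; posture_labels: e.g. 'raising_hand', 'writing', 'sleeping'."""
    X = np.array([normalize_pose(k) for k in keypoint_samples])
    return RandomForestClassifier(n_estimators=200).fit(X, posture_labels)
```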
Text in images provides people with important information, and text detection technology is used in many kinds of situations. However, traditional detection methods treat all the text in an image equally and do not consider which text is important to the user. When there is a lot of text in the environment, the result can be information overload, and you may miss what matters. If the importance of each text can be estimated, you can obtain important information efficiently. In this talk, I will introduce an attempt to estimate the importance of text and to detect only the important text in an image.
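One way such an importance-aware pipeline could be wired together is sketched below. The `detector` and `importance_model` interfaces are assumed placeholders, not the speaker's actual models.

```python
def detect_important_text(image, detector, importance_model, threshold=0.5):
    """Detect text regions, then keep only those that an importance model
    scores above a threshold. `detector(image)` is assumed to yield
    (box, text) pairs; `importance_model(crop, text)` a score in [0, 1]."""
    results = []
    for box, text in detector(image):
        crop = image[box.top:box.bottom, box.left:box.right]
        score = importance_model(crop, text)
        if score >= threshold:
            results.append((box, text, score))
    # Highest-priority text first, so the user sees what matters most.
    return sorted(results, key=lambda r: r[2], reverse=True)
```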
We conducted happiness surveys of around 22,000 respondents across Japan at three time points: December 2019, September 2020, and December 2020. In this talk, we report the influence of COVID-19 on the subjective well-being of Japanese people, evaluated from the surveys before and after the outbreak. We applied a dynamic regression model that describes the joint effects of individual and spatial factors to visualize the space-time behaviour of Japanese subjective well-being. Namely, we quantified the happiness effects driven by individual factors, such as age, sex, and income, and those driven by spatial factors at the prefectural level after controlling for the individual ones. Examining the dynamic changes of the individual and spatial factors across the three periods, we find that the COVID-19 outbreak in Japan damaged the subjective well-being of young women most seriously, and that this serious damage is still continuing, especially for the low-income group among them.
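A minimal sketch of the kind of model described, with individual covariates and prefecture-level effects, is shown below. The variable names, the prefecture-by-wave interaction, and the use of plain OLS are assumptions; the authors' dynamic regression model is more elaborate.

```python
import statsmodels.formula.api as smf

# df: a pandas DataFrame with one row per respondent per survey wave,
# holding a subjective well-being score and individual covariates.
# Column names here are hypothetical. Prefecture and wave enter as
# categorical terms, so the prefecture effects capture spatial factors
# after controlling for the individual ones.
model = smf.ols(
    "happiness ~ age + C(sex) + income + C(prefecture) * C(wave)",
    data=df,
).fit()

# Prefecture-by-wave coefficients can then be mapped to visualize the
# space-time behaviour of well-being across the three survey periods.
print(model.params.filter(like="prefecture"))
```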
Researchers have long studied technological innovation because of its significance in driving economic growth. As technology development accelerated over the last few decades, disruptive innovation has attracted more attention due to its ground-breaking impact on subsequent technological progress and the fundamental changes it causes within and beyond particular sectors. Designed to protect the perceived value of inventions, patents and the information they carry have been widely used in such studies to generate statistics for innovation evaluation. However, many commonly used patent measures are based on simply counting inventions. Such conventional metrics cannot provide an in-depth understanding of the quality of innovation, especially the degree of disruptive novelty. We build on the literature of creative destruction to propose two new measures of disruptive technology: the destructiveness index, which captures technological-cohort recombination, and the proportion of new knowledge origins. Both are based on a network approach that we develop to identify clusters of patent technological classifications and to measure changes in those clusters over time. Time-series analysis of the Pharmaceuticals and Computer Technology sectors in our previous work shows that, at the country level, destructiveness is positively related to both the volume and the quality of patents, while the new-knowledge ratio is negatively related to patent volume but positively related to quality. In this presentation we will first introduce the network method and the indicators mentioned above. We will then focus on the relationship between disruptive innovation and firm growth. Entrepreneurial success in a technology-intensive field like Artificial Intelligence (AI) relies heavily on innovation outputs. We collected registration and financing records of private start-up firms in the AI sector in China and measured their innovative disruptiveness from their patent publications. Preliminary results show that, controlling for initial capital size, companies producing more disruptive patents are more likely to survive and attract investment in the following 5-10 years. The research provides further evidence of the importance of disruptive innovation at the micro level. We suggest that decision makers in the government and private sectors consider enhancing business and enterprise R&D investment to foster the capability to produce highly influential innovation in high-tech sectors.
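The network approach can be sketched roughly as follows: build a co-classification network in which classification codes are linked when they co-occur on a patent, detect clusters of codes, and compare the clusters across time windows. The greedy-modularity community detection and the Jaccard-based turnover measure below are assumed stand-ins for the authors' actual procedure, included only to make the idea concrete.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def classification_clusters(patents):
    """patents: iterable of lists of classification codes, one list per patent.
    Returns clusters of codes in the weighted co-classification network."""
    G = nx.Graph()
    for codes in patents:
        for i, a in enumerate(codes):
            for b in codes[i + 1:]:
                w = G.edges[a, b]["weight"] + 1 if G.has_edge(a, b) else 1
                G.add_edge(a, b, weight=w)
    return [set(c) for c in greedy_modularity_communities(G, weight="weight")]

def cluster_turnover(clusters_t0, clusters_t1):
    """Share of later clusters with no close earlier counterpart (Jaccard < 0.5):
    a crude proxy for the recombination a destructiveness index would capture."""
    def jac(a, b):
        return len(a & b) / len(a | b)
    novel = [c for c in clusters_t1
             if all(jac(c, d) < 0.5 for d in clusters_t0)]
    return len(novel) / len(clusters_t1)
```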