Jubatus is a processing platform for analyzing the large-scale datasets that keep growing in the current big data era. It is open-source software jointly developed by NTT Information Sharing Platform Laboratories and PFI. Detailed information can be found on the official Jubatus webpage.
Jubatus has three characteristics: real-time processing, scalable distributed processing, and deep analysis through machine learning.
Until now, large datasets have mostly been analyzed in batches. Data is stored first and then analyzed overnight or at a scheduled time, so the results arrive the next day, creating a time lag. This store-then-analyze approach has two issues. One is that the time lag makes it difficult to apply analysis results in real time. The other is that the processing that can be performed may be limited by the size of the data.
Jubatus processes data as it arrives, so it can analyze data in real time. With Jubatus, trends can be picked up immediately from sensor data, which arrives at an extremely high rate, and from large volumes of click logs, allowing your service to respond better to your users.
To cope with growing data, computational capacity has to grow as well. Jubatus uses a scale-out design, in which capacity is increased by adding compute nodes. In terms of cost and fault tolerance, this design is more advantageous than a scale-up design, which relies on higher-performance compute nodes.
Jubatus also adopts a mix computation model that makes real-time machine learning possible in a distributed environment. Each compute node used by Jubatus runs independently, which secures high parallel performance. The results of machine learning (discussed below) are shared efficiently through mix operations, in which the nodes exchange what they have learned at intervals.
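The idea of nodes that learn independently and then combine their results at intervals can be illustrated with a minimal sketch. It assumes that each node holds a linear model stored as a dictionary of feature weights and that combining means averaging the weights; the names `mix`, `node_a` and `node_b` and all weight values are hypothetical, and this is not Jubatus's actual implementation.

```python
from typing import Dict, List

Weights = Dict[str, float]


def mix(models: List[Weights]) -> Weights:
    """Combine local models by averaging each feature weight.

    A feature that a node has never seen counts as weight 0.0 for that node.
    """
    mixed: Weights = {}
    for model in models:
        for name, value in model.items():
            mixed[name] = mixed.get(name, 0.0) + value / len(models)
    return mixed


# Hypothetical local models: each node has learned from a different
# part of the incoming data, without talking to the other node.
node_a: Weights = {"clicks": 0.4, "temperature": -0.2}
node_b: Weights = {"clicks": 0.2}

# Intermittent mix step: combine the local models and hand the
# combined model back to every node, which then keeps learning from it.
shared = mix([node_a, node_b])
node_a, node_b = dict(shared), dict(shared)
```

Between two mix steps the nodes need no coordination, which is what keeps the parallel part cheap. The interval between mixes trades model freshness against communication cost.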
Jubatus uses machine learning to perform deeper analysis than traditional statistical processing systems. Machine learning is a technology that finds patterns and rules in large amounts of data and uses them to analyze, and make decisions about, data that arrives in the future. Jubatus supports online machine learning, which updates the model continuously as data comes in. Because it uses online machine learning, Jubatus can scale out to larger distributed environments and still operate efficiently.
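To make the online setting concrete, the following sketch trains a classifier one example at a time, so the model can be queried at any moment without storing the data. It uses a plain perceptron as a stand-in; the class `OnlinePerceptron`, the generator `stream` and the feature names are assumptions made for illustration and do not represent the algorithms or API that Jubatus provides.

```python
from typing import Dict, Iterator, Tuple

Features = Dict[str, float]


class OnlinePerceptron:
    """Binary linear classifier that is updated one example at a time."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def score(self, features: Features) -> float:
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

    def classify(self, features: Features) -> int:
        return 1 if self.score(features) > 0 else -1

    def train(self, features: Features, label: int) -> None:
        # Update only when the current model gets the example wrong.
        if self.classify(features) != label:
            for name, value in features.items():
                self.weights[name] = self.weights.get(name, 0.0) + label * value


def stream() -> Iterator[Tuple[Features, int]]:
    """Hypothetical stream of (features, label) pairs; 1 = alert, -1 = normal."""
    yield {"error": 1.0, "bias": 1.0}, 1
    yield {"ok": 1.0, "bias": 1.0}, -1
    yield {"error": 1.0, "timeout": 1.0, "bias": 1.0}, 1


model = OnlinePerceptron()
for features, label in stream():
    model.train(features, label)  # each example is used once, then discarded
```

Each update touches only the features of the current example, so the cost per example does not depend on how much data has already been seen.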
In general, machine learning first converts text, images and other raw data entered by users into a feature vector (an abstract data structure). Traditional machine learning libraries do not support this conversion and accept only the abstracted data. Jubatus can handle the unstructured data that is common in large datasets, and it supports the entire process, from converting the raw data to applying the machine learning algorithm. As a result, users can obtain machine learning analysis results even when they enter unstructured data.
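As a rough illustration of what converting raw data to a feature vector means, the sketch below turns a record with a free-text field and a numeric field into a dictionary of named numeric features. The function `to_feature_vector`, the `field:word` naming scheme and the sample record are assumptions for this example only; Jubatus's own conversion is configurable and is not reproduced here.

```python
import re
from collections import Counter
from typing import Dict, Union

RawRecord = Dict[str, Union[str, int, float]]


def to_feature_vector(record: RawRecord) -> Dict[str, float]:
    """Convert a raw record into a sparse feature vector.

    Text fields become word counts keyed as "field:word";
    numeric fields are copied through under the field name.
    """
    features: Dict[str, float] = {}
    for field, value in record.items():
        if isinstance(value, str):
            words = re.findall(r"\w+", value.lower())
            for word, count in Counter(words).items():
                features[f"{field}:{word}"] = float(count)
        else:
            features[field] = float(value)
    return features


# Hypothetical raw input: one log message plus one sensor reading.
vector = to_feature_vector({"message": "Disk full, disk error", "cpu_load": 0.93})
```

The resulting dictionary is the kind of abstract input a learning algorithm works on, whatever the original data looked like.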