In today's era of Internet big data, organizations store not only web-related information such as website access logs and buyer information on e-commerce sites, but also real-world information such as loyalty program balances for retail stores and geolocation data retrieved from GPS devices. Datasets are becoming larger and more diverse by the day, and they need to be dealt with.
In the future, big data is expected to become more complex, and the methods needed to process it will naturally become more diverse. A wide range of technologies will be required to meet this challenge. With this in mind, PFI provides products and services based on research and development in the following technological fields.
“Search and recommendation” refers to technology that finds information in big datasets and matches it to the users it suits. We conduct extensive R&D on search technology, which helps users find desired information quickly, and on recommendation technology, which analyzes a user's preferences and suggests relevant information. PFI views these two technologies as two sides of “matching people with information” and aims to integrate them further in the future.
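The “matching people with information” idea can be sketched as a toy content-based recommender. This is purely illustrative: the user profile, item features, and cosine-similarity scoring below are assumptions for the example, not PFI's actual method.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts of feature -> weight)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user_profile, items, top_n=3):
    """Rank items by similarity to a user's preference profile."""
    scored = [(name, cosine(user_profile, feats)) for name, feats in items.items()]
    return sorted(scored, key=lambda s: -s[1])[:top_n]

# Hypothetical data: user preferences and item features as weighted keywords.
user = {"camera": 2.0, "travel": 1.0}
items = {
    "dslr":     {"camera": 3.0, "photo": 1.0},
    "suitcase": {"travel": 2.0},
    "novel":    {"fiction": 1.0},
}
print(recommend(user, items))  # "dslr" ranks first
```

A real system would learn these feature weights from behavior logs rather than hand-assigning them, but the matching step reduces to the same idea: score items against a user representation and return the best.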
Since its inception, PFI has provided “Sedue”, a comprehensive search platform which combines search and recommendation technologies.
Machine learning is the field on which PFI currently focuses most, as it forms the groundwork for various other technologies. It is technology that automatically learns the attributes and rules contained in datasets. A typical example is spam filtering: by learning from emails labeled spam or not spam, an email client can automatically determine whether a new email should be treated as spam or as a valid message.
One of the advantages of machine learning is that attributes extracted from data can be fed automatically into learning algorithms. This makes machine learning a powerful means of coping with the complexity of big data, because it can automatically pick out useful content from massive amounts of data.
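The spam-filtering example above can be sketched as a minimal naive-Bayes-style classifier. The training data and scoring are invented for illustration; this is a sketch of the general technique, not any particular product's filter.

```python
from collections import Counter

def train(labeled_emails):
    """Count word frequencies per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in labeled_emails:
        for word in text.lower().split():
            counts[is_spam][word] += 1
            totals[is_spam] += 1
    return counts, totals

def spam_score(text, counts, totals):
    """Product of per-word spam/ham likelihood ratios, with Laplace smoothing."""
    vocab = len(set(counts[True]) | set(counts[False]))
    score = 1.0
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (totals[True] + vocab)
        p_ham = (counts[False][word] + 1) / (totals[False] + vocab)
        score *= p_spam / p_ham
    return score  # > 1 means more spam-like

# Hypothetical labeled training data.
data = [("win money now", True), ("cheap money offer", True),
        ("meeting agenda attached", False), ("project status report", False)]
counts, totals = train(data)
print(spam_score("win cheap money", counts, totals))   # spam-like (> 1)
print(spam_score("project meeting", counts, totals))   # ham-like  (< 1)
```

The point of the example is the one made in the text: the words themselves act as attributes, and the algorithm picks out which of them are useful without anyone specifying rules by hand.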
In October 2011, PFI and NTT Information Sharing Platform Laboratories released “Jubatus”, an open-source real-time machine learning platform. It was released as open source to broaden the appeal of machine learning and to expand the fields in which it can be applied.
Natural language processing extracts meaningful information and attributes from language produced by humans, so that it can be translated into another language or made machine-readable. PFI applies world-class natural language processing technology and continues research in related fields to improve machine learning methodologies.
We work in natural language processing because it is an applied field closely tied to machine learning. Research on machine learning for natural language processing predates the Internet and continues to this day, and the analysis of natural language and its applications is widely studied. Given that textual information accumulates more every day, it is easy to see why natural language processing is among the most active fields of study related to machine learning.
Natural language processing technology is applied in the aforementioned PFI products. In addition, we provide a tool based on natural language processing that can automatically create a dictionary from internal documents, product information, and other data.
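The dictionary-building idea can be sketched as simple term extraction: collect words that recur across a document set and are not common function words. This toy approach, the stopword list, and the sample documents are all assumptions for illustration; the actual PFI tool is far more sophisticated.

```python
import re
from collections import Counter

# Minimal stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}

def build_dictionary(documents, min_count=2):
    """Collect candidate dictionary terms: non-stopwords that recur in the corpus."""
    counts = Counter()
    for doc in documents:
        for word in re.findall(r"[a-z][a-z0-9-]+", doc.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return [term for term, n in counts.most_common() if n >= min_count]

docs = [
    "Sedue search platform installation guide",
    "Configuring the Sedue search index",
    "Search index maintenance and backup",
]
print(build_dictionary(docs))  # recurring domain terms surface first
```

Even this crude frequency filter surfaces domain vocabulary (product names, recurring technical terms) from raw documents, which is the kernel of automatic dictionary construction.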
Distributed processing (computing) technology is, as the name indicates, a computing method that utilizes computer resources in a distributed, parallel manner. PFI has implemented distributed processing systems to handle search queries over large volumes of information and as an effective way of dealing with large datasets. Large-scale information retrieval systems are extremely complicated distributed systems. First, data must be retained reliably and kept free of inconsistencies. Then an index must be built over that data, a process that requires high I/O throughput and enormous amounts of computation: not only must the computation be parallelized, but the I/O must also follow a distributed design. Once built, the index must serve a continuous stream of search requests at high throughput. For businesses such as e-commerce sites and those that rely on search-based advertising, system downtime can mean heavy losses, so robustness is essential.
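The index-then-query pipeline described above can be sketched in miniature with an inverted index. This is a single-machine toy with invented documents; in the real systems described here, both the index build and query serving are distributed across machines.

```python
from collections import defaultdict

def build_index(documents):
    """Build an inverted index: term -> set of document ids containing it.
    In a distributed system, documents are partitioned across machines,
    each machine builds a partial index, and the partials are merged."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "distributed search systems", 2: "search index design", 3: "distributed index"}
index = build_index(docs)
print(search(index, "distributed index"))  # -> {3}
```

The two phases have very different profiles, which is why the text treats them separately: the build phase is I/O- and compute-heavy batch work, while query serving must sustain high request throughput with no downtime.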
PFI proactively adopts open-source large-scale distributed systems such as Hadoop to build a development environment for distributed software that requires big data handling, real-time processing, and robustness. PFI uses the knowledge gained in this way not only to develop in-house products but also to provide consulting services.
PFI is open to collaborations and provides consulting services related to the technology fields mentioned here. For details, please see here.