What is AI Search?
Lay Jeon
7 min read ∙ Aug 28, 2024
History of computer information retrieval technology
The history of search technology has evolved alongside advances in computer science and is closely tied to the development of the web. Search technologies aim to efficiently find and filter large amounts of information and provide users with what they need. Below are the main stages in the advancement of search technology.
1. Early Search Technologies (1960s-1980s)
The early stages of search technology rest on the foundations of computer science. In the 1960s and 1970s, text-based search took place primarily in structured databases. During this period, the field of Information Retrieval (IR) took shape, producing foundational concepts such as the Vector Space Model and Boolean search.
Vector Space Model (VSM): A model that represents documents as mathematical vectors and measures their similarity to a query. Relevance is scored using term frequency (TF) and inverse document frequency (IDF).
Boolean Search: Uses AND, OR, and NOT operators to match documents based on the presence or absence of specific keywords. A toy sketch of both models follows below.
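To make the two classic models concrete, here is a minimal Python sketch over a toy in-memory corpus. The raw-TF and log-IDF weighting is one common variant among many; real systems add normalization, stemming, and inverted indexes.

```python
# Minimal sketch of the Vector Space Model (TF-IDF) and Boolean search
# on a toy corpus. The weighting scheme is one common variant, not the
# only formulation.
import math
from collections import Counter

docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "boolean retrieval matches exact keywords",
    "d3": "vector models measure document similarity",
}
tokenized = {d: text.split() for d, text in docs.items()}

def tf_idf_vector(tokens, corpus):
    """Weight each term by term frequency times inverse document frequency."""
    n_docs = len(corpus)
    vec = {}
    for term, freq in Counter(tokens).items():
        df = sum(1 for toks in corpus.values() if term in toks)
        if df:  # skip terms that appear nowhere in the corpus
            vec[term] = freq * math.log(n_docs / df)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

# Vector Space Model: rank all documents by graded similarity to the query.
query_vec = tf_idf_vector("document retrieval".split(), tokenized)
doc_vecs = {d: tf_idf_vector(toks, tokenized) for d, toks in tokenized.items()}
print(sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True))

# Boolean search: "retrieval AND documents" as plain set logic, no ranking.
print({d for d, toks in tokenized.items() if "retrieval" in toks and "documents" in toks})
```

Note how the Boolean query is pure set membership with no notion of ranking, while the vector model orders every document by a graded similarity score.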
2. Early web search engines (1990s)
Web search engines emerged in the 1990s with the birth of the World Wide Web (WWW). Early web search engines retrieved information through static page indexing.
Archie (1990): The first Internet search engine, which indexed and searched files stored on FTP servers.
Veronica (1992) and Jughead (1993): Similar to Archie, these tools used the Gopher protocol to enable searching of Internet resources.
Emergence of web search engines: Early web search engines, such as "Wandex" in 1993 and "Aliweb" in 1994, indexed meta tags and titles of web pages.
3. Birth of modern search engines (mid-1990s-early 2000s)
The late 1990s and early 2000s were an important period in the development of search engines. During this period, search algorithms advanced and large-scale indexing and ranking technologies were developed.
AltaVista (1995): Among the first engines to accept natural language queries, and it supported multimedia search for images and videos.
Advent of Google (1998): Google introduced the PageRank algorithm to calculate the importance of web pages. PageRank determines the ranking of search results by assessing the importance of pages based on the number and quality of links. This was a huge step forward in terms of search accuracy and relevance.
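The core idea of PageRank can be illustrated in a few lines. The sketch below runs power iteration on an invented four-page link graph; the damping factor of 0.85 follows the original paper, while production systems also handle dangling pages and operate at web scale.

```python
# Minimal power-iteration sketch of the PageRank idea on a toy link graph.
# The graph is invented for illustration.
links = {  # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with uniform rank
damping = 0.85

for _ in range(50):  # iterate until ranks stabilize
    new_rank = {p: (1 - damping) / len(pages) for p in pages}
    for page, outgoing in links.items():
        share = damping * rank[page] / len(outgoing)
        for target in outgoing:
            new_rank[target] += share  # each link passes on part of its rank
    rank = new_rank

# 'c' ranks highest: it has the most (and best-ranked) inbound links.
print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

The key design insight is that rank flows along links, so a page linked from many important pages becomes important itself, independent of its keyword content.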
4. Innovation and development of search technology (mid-2000s-present)
Since the mid-2000s, search technology has advanced beyond information retrieval to optimize user experience.
Contextual Search: Technology that understands the user's search intent and context to deliver more relevant results, built on machine learning (ML) and natural language processing (NLP).
Personalized Search: Provides customized search results based on the user's search history, location, interests, etc.
Rise of mobile search and voice search: The importance of mobile search has increased with the spread of smartphones, and voice search is becoming increasingly popular with the development of voice recognition technology. Representative examples include Apple's Siri, Google Assistant, and Amazon's Alexa.
Deep learning and AI-based search: Applying deep learning to search algorithms greatly improved the accuracy and efficiency of search. In 2019, Google began applying its Bidirectional Encoder Representations from Transformers (BERT) model to Search to better understand the meaning of queries.
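As an illustration of the semantic matching that BERT-style models enabled, the sketch below uses the open-source sentence-transformers library with a small BERT-family encoder. This is an illustrative stand-in, not Google's production ranking system.

```python
# Minimal sketch of semantic (embedding-based) matching with a small
# BERT-family encoder from the open-source sentence-transformers library.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to change a flat tire on a car",
    "Recipes for flat bread at home",
    "Steps for replacing a punctured bicycle tube",
]
query = "fixing a puncture on my bike"

# Encode query and documents into dense vectors that capture meaning,
# so "bike" can match "bicycle" without an exact keyword overlap.
doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, doc_emb)[0]
print(docs[scores.argmax().item()])  # the bicycle document wins
```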
5. Search technology of the future
Future search technologies will increasingly rely on artificial intelligence (AI) and machine learning. Technologies such as multimodal search (searching multiple formats such as text, images, and video simultaneously) are expected to advance, and more sophisticated personalization and predictive search features will be added. Additionally, search technology in augmented reality (AR) and virtual reality (VR) environments will also advance.
The history of search technology has continued to evolve in accordance with advances in computer science and changes in user needs, and its scope and depth will continue to expand along with various technological developments.
Search Technology Before the Advent of Computers
Library Classification System:
Dewey Decimal Classification (DDC): A book classification method developed by Melvil Dewey in 1876. It classified books in the library by dividing them into 10 major subject categories, further subdividing them, and assigning them numerical codes. This system organizes bookshelves according to the topic of the book and allows users to easily find books on a specific topic.
Library of Congress Classification (LCC): A classification developed by the Library of Congress that uses a combination of letters and numbers to classify the subject matter of books. It has been widely used, especially in academic libraries.
Card Index System (Card Catalog):
From the 19th century to the early 20th century, libraries managed their collections through a card index system. Each card recorded a book's title, author, subject, and so on; the cards were organized alphabetically and stored in drawers of a card cabinet.
When a user visited a library and wanted to find a specific book or topic, they would look up the corresponding card and check the location of the book. Although this was a very systematic and efficient method, it had limitations in managing massive amounts of data.
Index and Abstracts System:
Services indexed academic papers and literature on specific topics and provided abstracts. For example, organizations that published scientific papers or technical reports produced and distributed subject indexes and summaries. This helped researchers and scholars find the information they needed.
Index Journal: In the academic field, there were index journals that were published regularly by collecting the titles, authors, and publication years of papers or books related to a specific topic. These indexed journals helped researchers easily find the latest literature on topics of interest.
Telephone Directories and Commercial Directories:
Means of information retrieval for the general public included telephone directories and commercial directories. Yellow pages made it easy for people to find contact information for individuals or businesses, and commercial directories listed local stores and services so they could be searched.
Before the advent of computers, search techniques were primarily based on systematic cataloging and indexing systems in physical information institutions, such as libraries. Searches during this time were not electronic but were managed manually, requiring users to navigate on their own. This method enabled systematic management and access to information, but with the introduction of computers and electronic search technology, more efficient and faster searches became possible.
Difference Between Web Search and AI Search
In its broadest sense, search is the process of seeking information or content from a collection of data, whether that be a physical library, a digital database, or the entire internet. Search mechanisms use algorithms to retrieve, rank, and display the most relevant results to the user based on their query.
Web Search
Mechanism: Web search engines like Google, Bing, or Yahoo use keyword-based algorithms to scan indexed pages across the internet. They rank and return results based on factors like relevance, popularity (often influenced by backlinks), and keyword matches. These engines rely on a vast index of web pages crawled by bots, and the results stay static until the index is updated (a minimal sketch of this index-and-match idea follows this list).
User Interaction: Users input specific keywords or phrases, and the engine returns a list of links to web pages that match those terms. The user must sift through these results to find the desired information.
Limitations: Web search results are only as good as the algorithm's ability to understand the intent behind keywords. While advanced, these engines can struggle with ambiguous queries or those requiring deep contextual understanding.
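Here is a minimal sketch of the inverted index at the heart of keyword web search, assuming a toy set of already-crawled pages. Real engines layer crawling, link analysis, freshness, and many other ranking signals on top.

```python
# Minimal inverted-index sketch for keyword search over toy "crawled" pages.
from collections import defaultdict

pages = {
    "example.com/tires": "how to change a flat car tire",
    "example.com/bread": "easy flat bread recipe",
    "example.com/bikes": "repairing a flat bicycle tire",
}

# Build the inverted index: term -> set of pages containing that term.
index = defaultdict(set)
for url, text in pages.items():
    for term in text.split():
        index[term].add(url)

def keyword_search(query):
    """Return pages matching ALL query terms, ranked by total term count."""
    terms = query.split()
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(candidates,
                  key=lambda u: sum(pages[u].split().count(t) for t in terms),
                  reverse=True)

print(keyword_search("flat tire"))  # both tire pages; the bread page is excluded
```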
AI Search
Mechanism: AI search leverages artificial intelligence and machine learning to understand and interpret more complex queries. Instead of relying solely on keyword matching, AI search systems can understand natural language, infer user intent, and even provide contextual or predictive results. They use models trained on vast amounts of data to generate or refine responses (a retrieval-augmented sketch of this flow appears after this list).
User Interaction: Users can engage in more conversational or natural language queries, and AI search engines can provide direct answers, summaries, or generate content on the fly, instead of just linking to existing web pages. For instance, AI-driven systems like ChatGPT can respond to detailed questions or requests by synthesizing information from various sources, rather than just retrieving and displaying it.
Limitations: While powerful, AI search can sometimes produce information that isn't entirely accurate (hallucinations), and it depends heavily on the quality and recency of the training data.
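The sketch below illustrates the retrieval-augmented flow described above: embed the query, retrieve the most relevant passages, then synthesize an answer. The generate_answer function is a hypothetical placeholder for any LLM API call, and the embedding model is the same open-source stand-in used earlier.

```python
# Minimal retrieval-augmented sketch of the AI-search flow:
# embed -> retrieve -> generate. `generate_answer` is hypothetical.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "BERT was applied to Google Search in 2019.",
    "PageRank ranks pages by the links pointing at them.",
    "Archie indexed files on FTP servers in 1990.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def generate_answer(question, context):
    # Hypothetical stand-in: in practice this would call an LLM with a
    # prompt such as f"Answer {question} using only: {context}".
    return f"[LLM answer to {question!r} grounded in: {context}]"

def ai_search(question, top_k=2):
    """Retrieve the most relevant passages, then synthesize a direct answer."""
    q_emb = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, corpus_emb)[0]
    top = scores.topk(top_k).indices.tolist()
    context = " ".join(corpus[i] for i in top)
    return generate_answer(question, context)

print(ai_search("When did Google start using BERT?"))
```

The design difference from the web-search sketch is that the output is a synthesized answer grounded in retrieved passages, rather than a ranked list of links.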
Key Differences
Purpose and Output: Web search focuses on retrieving and ranking existing documents or web pages, whereas AI search aims to understand and generate information based on user input, often in real-time.
Complexity of Queries: AI search can handle more complex, conversational, and ambiguous queries by understanding context and intent, while web search is more effective with straightforward keyword-based searches.
Pros and Cons of AI Search
AI search represents a significant advancement in how we interact with and retrieve information. However, like any technology, it has its pros and cons:
Pros of AI Search
Contextual Understanding: AI search engines can understand the context behind a query, allowing for more accurate and relevant results. This is particularly useful for complex or ambiguous queries where traditional keyword-based searches might fail. AI can interpret natural language and infer user intent, making interactions more conversational and intuitive.
Personalization: AI search can tailor results based on a user's preferences, behavior, and history. By learning from past interactions, AI search engines can provide more personalized recommendations, improving the relevance of the information presented to each individual.
Efficiency and Speed: AI search can quickly sift through massive amounts of data to deliver precise answers. Unlike traditional search, which often requires users to browse through multiple links, AI search can provide direct answers or summaries, saving time and effort.
Natural Language Processing (NLP): AI search can handle queries posed in everyday language, making it more accessible to users who may not know the exact keywords to use. This opens up search capabilities to a broader audience, including those less familiar with digital search tools.
Multimodal Search: AI search engines are increasingly capable of processing and integrating information from various formats—text, images, video, and even audio. This allows users to search across different types of media simultaneously, offering a more holistic search experience (see the sketch after this list).
Continuous Learning and Improvement: AI search systems can continuously learn and improve over time. As they process more queries and gather more data, they become better at predicting user needs and refining the accuracy of their responses.
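As a concrete illustration of multimodal search, the sketch below uses the open-source CLIP model via sentence-transformers to match a text query against images. The image paths are invented for illustration and would be your own files in practice.

```python
# Minimal text-to-image multimodal search sketch using the open-source
# CLIP model. The image paths are assumed example files.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # maps text and images into one space

image_paths = ["photos/beach.jpg", "photos/mountain.jpg", "photos/city.jpg"]
image_emb = model.encode([Image.open(p) for p in image_paths])

# A plain-language query is embedded into the same vector space as the
# images, so similarity can be computed directly across modalities.
query_emb = model.encode("a sunny day by the sea")
scores = util.cos_sim(query_emb, image_emb)[0]
print(image_paths[scores.argmax().item()])  # expected: the beach photo
```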
Cons of AI Search
Accuracy and Reliability: While AI search engines are powerful, they are not infallible. They can produce "hallucinations" or generate information that is inaccurate or misleading. This is especially problematic if users trust the AI output without verification.
Bias and Fairness: AI search systems can inadvertently perpetuate biases present in the data they were trained on. If the training data is skewed, the AI's responses may reflect these biases, leading to unfair or unbalanced results. Addressing these biases is an ongoing challenge.
Transparency and Explainability: AI search engines often operate as black boxes, making it difficult for users to understand how certain results were generated. This lack of transparency can lead to trust issues, as users may not know why a particular piece of information was presented or how it was derived.
Privacy Concerns: AI search engines that personalize results often rely on collecting and analyzing user data. This raises concerns about privacy, as sensitive information could be stored, shared, or used without the user's explicit consent.
Dependency and Reduced Critical Thinking: As AI search engines become more prevalent and efficient, there is a risk that users may become overly reliant on them, potentially reducing their own critical thinking and research skills. Users might accept AI-generated answers without further questioning or exploring alternative perspectives.
Resource and Energy Consumption: AI search engines, particularly those based on large language models, require significant computational power and energy. This can have environmental implications, as the energy required to train and run these models can be substantial.
Conclusion
AI search offers a transformative approach to information retrieval, with significant benefits in contextual understanding, personalization, and user experience. However, it also presents challenges related to accuracy, bias, privacy, and resource consumption. As AI search continues to evolve, addressing these drawbacks will be crucial to maximizing its potential while minimizing its risks.