AI for Science

Bridging AI and the sciences.

Collaborating with researchers and practitioners across diverse fields to pursue interdisciplinary inquiries through AI.

Purpose

AI for Science is the practice of bringing LLMs and machine learning into the questions of the humanities, social sciences, and natural sciences as new "observational instruments," quantitatively reconstructing forms of knowledge production that have traditionally relied on human labor and intuition. The Affectosphere Group has worked with researchers across a wide range of disciplines — education, political science, ecology, law, sociology, the humanities, public health, chemistry, medicine, cognitive science, and traffic engineering — bringing AI to bear on each field's problems in a tailored way. Rather than treating AI as a generic tool, we design it to align with each field's theoretical frameworks, data characteristics, and ethical constraints, supporting research questions that can only emerge from within that specific discipline.

Goal

What has become clear through repeated cross-disciplinary collaboration is that the value AI provides is by no means uniform across fields. In education, we have visualized the homogenization of thinking that arises from human-AI collaboration; in political science, we have quantified the geopolitical biases embedded in training data; in ecology, we have dramatically streamlined large-scale classification tasks; and in law, we have used qualifying examinations to draw the boundary of substitutability. AI for Science is not merely the application of methods but a practice of discerning, within each field's specific context, what AI can do and what must not be entrusted to it — generating insights that can only be reached through dialogue between domain experts and methodologists.

Looking for partners

We are looking for partners who need AI collaboration

Whether you want to incorporate AI into your research, analyze data together, or are simply wondering whether AI could be useful in your field, we are eager to work with researchers and practitioners like you.

Your field does not matter. We have collaborated with partners across a wide range of disciplines in both the sciences and the humanities. From method selection, data design, and model evaluation to ethical considerations, the Affectosphere Group works alongside you, drawing on the techniques we have developed.

To begin, please use the form below to share your research theme or interests freely.

Initial meeting

Free (30 min)

A session to hear about your research theme and challenges and to consider together whether your question is one AI can address and, if so, what approaches might work. Feel free to reach out even if you simply want to align on a direction.

Ongoing collaboration

Research and personnel fees

After the initial meeting, if we move forward with a full collaboration, we charge research and personnel fees on an actual-cost basis, depending on the theme, difficulty, duration, and deliverables. An estimated quote will be provided after the initial meeting, once the scope is organized.

Inquiry for collaboration and AI support

Your message will reach the Affectosphere Group. We will follow up to arrange the initial meeting.

Case studies

A brief introduction to the AI-driven collaborative research that the Affectosphere Group has carried out with researchers from other fields.

AI for Science is the practice of introducing AI into science as a new observational instrument, replacing evaluation and knowledge production that previously depended on "experience," "subjectivity," and "human labor" with quantitative, reproducible procedures. Not only in biology and physics but also in fields that border on the humanities and social sciences, such as education, politics, law, and ecology, LLMs can now be deployed as instruments that "read" text at scale, and the premises of research are being rewritten rapidly. At the same time, using AI as an analytical instrument creates a new validity problem, namely the bias of the instrument itself, which makes the methodology of evaluation an object of research in its own right. The Affectosphere Group carries the techniques it developed in affective AI research for "treating the fluctuation, subjectivity, and bias of human judgment as data" into scientific questions in adjacent fields and examines them across disciplines.

Education × AI

With the spread of generative AI in classrooms, writing education is being forced into a fundamental redesign — from a framework that "evaluates the quality of writing" to one that "evaluates the originality of thought." Argument Rarity-based Originality Assessment (AROA, 2026) operationalized originality as the "rarity of an argument within a reference corpus" and proposed a framework that quantifies it along four axes: structural rarity, claim rarity, evidence rarity, and inferential depth. A comparison of 1,375 human-written essays and 1,000 AI-generated essays showed that while AI nearly saturates writing quality (Q = 0.998), claim rarity reaches only about one-fifth of human levels, with a strong negative correlation (r = -0.67) between quality and originality. The follow-up study, Does AI Homogenize Student Thinking? (2026), compared 6,875 essays across five conditions and statistically demonstrated a "quality-homogenization trade-off": AI assistance substantially boosts quality (Cohen's d = 3.7-4.8) while simultaneously removing 68-78% of the variance in argumentative structure. Furthermore, patterns of homogenization differed qualitatively across generative model families, yielding a policy-relevant finding: it is dangerous to generalize from observations of a single model. Our lab argues that writing assessment in the AI era must shift from "how well it is written" to "whether the thinking differs from others."
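AROA's exact formulas are not reproduced on this page. As a minimal sketch, assuming each argument component (for example, a claim) has already been embedded as a vector, rarity can be read as one minus the mean cosine similarity to its k most similar items in the reference corpus. The k-nearest-neighbour definition, the helper names, and the "variance removed" formula (one minus the ratio of variances) are illustrative assumptions, not the published method; Cohen's d follows the standard pooled-standard-deviation definition.

```python
import math
from statistics import mean, variance


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rarity(vec, reference, k=5):
    """Hypothetical rarity score (assumption, not AROA's formula):
    1 - mean similarity to the k most similar reference items.
    Values near 1 mean the argument has few close neighbours in the corpus."""
    sims = sorted((cosine(vec, r) for r in reference), reverse=True)
    return 1.0 - mean(sims[:k])


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation built from sample variances."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def variance_removed(baseline, treated):
    """Share of baseline variance that disappears under the treated condition.
    Assumed definition: 1 - Var(treated) / Var(baseline)."""
    return 1.0 - variance(treated) / variance(baseline)
```

Larger values of k smooth the score against near-duplicates in the reference corpus; both k and the similarity measure are free design choices in this sketch.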

Political Science × AI

In an era when LLMs generate and summarize documents on politics and security, how to render visible the national and partisan biases hidden inside models is a question that touches the foundations of democracy. The Affectosphere Group has approached this through four streams — social observation, model diagnosis, data construction, and political understanding — using the Ukraine-Russia war as a case. In Multifaceted Exploration (2024) and Sentiment Analysis of Japanese Twitter Users (2024), we analyzed 200,000 Japanese tweets with a BERT-based eight-emotion model (Plutchik-aligned), quantitatively extracting a "pacifism-based bipolar response" in which strong sadness, fear, and disgust toward territorial issues and nuclear-plant attacks coexist with joy and anticipation around support for Ukraine. Next, Assessment of Conflict Structure Recognition (2024) introduced the Emotion Inversion Consistency Rate (EICR), which measures whether emotion predictions consistently flip when country names are swapped in tweets, and demonstrated that LLMs continually pre-trained on Japanese (the Swallow family) exhibit lower EICR than English-based Llama2 — i.e., they carry a stronger negative bias toward Russia and Eastern Europe. The subsequent Corpus Development (2024) proposed a four-phase pipeline (generation → emotion-intensity expansion → NSP validity check → human QA) that semi-automates data construction previously dependent on human labor, ensuring reusability as a standard corpus for the security domain. Sentiment Bias and Security Analysis (2024) extended the analysis to the training-data side, scanning the four major corpora C4, RedPajama, OSCAR, and RefinedWeb with VADER and quantifying significant negative sentiment toward Russia and Iran and over-exposure of the USA — showing that the source of bias is already embedded at the training-data stage. 
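The precise EICR scoring rule from Assessment of Conflict Structure Recognition is not given here. As a minimal sketch, assume the classifier returns one of Plutchik's eight basic emotions, and assume that "flipping" means the prediction should move to the opposing emotion on Plutchik's wheel (joy/sadness, trust/disgust, fear/anger, anticipation/surprise) once the two country names in a tweet are swapped. EICR is then the share of tweets whose swapped version lands on the expected label. The inversion map, `swap_names`, and the toy classifiers in the test are assumptions for illustration.

```python
import re

# Assumed inversion rule: Plutchik's opposing pairs on the emotion wheel.
OPPOSITE = {
    "joy": "sadness", "sadness": "joy",
    "trust": "disgust", "disgust": "trust",
    "fear": "anger", "anger": "fear",
    "anticipation": "surprise", "surprise": "anticipation",
}


def swap_names(text, a, b):
    """Hypothetical helper: swap every occurrence of names a and b in one pass,
    so that 'A ... B' becomes 'B ... A' without double replacement."""
    pattern = re.compile(f"{re.escape(a)}|{re.escape(b)}")
    return pattern.sub(lambda m: b if m.group(0) == a else a, text)


def eicr(texts, pairs, classify):
    """Emotion Inversion Consistency Rate (sketch under the assumptions above).

    texts:    list of tweets
    pairs:    list of (country_a, country_b), one per tweet
    classify: any callable mapping text -> Plutchik emotion label
    """
    consistent = 0
    for text, (a, b) in zip(texts, pairs):
        original = classify(text)
        swapped = classify(swap_names(text, a, b))
        if swapped == OPPOSITE[original]:
            consistent += 1
    return consistent / len(texts)
```

A classifier that reacts to who acts on whom scores high, while one that reacts to the mere presence of a country name scores low, which is the kind of name-driven bias the metric is meant to expose.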
Taken together, these seven papers gradually assemble a methodological framework that integrates social observation, model diagnosis, data construction, and source tracing to ask: "What must we measure to detect unfairness when LLMs are brought into security and political judgment?"
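The training-data scan in Sentiment Bias and Security Analysis applied VADER to C4, RedPajama, OSCAR, and RefinedWeb. As a minimal sketch of the aggregation step only, assume documents arrive as plain strings and are scored by any function that returns a compound score in [-1, 1]; with the `vaderSentiment` package that would be `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. The tiny lexicon scorer below is a hypothetical stand-in so the example runs without extra packages, and the per-country mean and mention-share statistics are illustrative rather than the paper's exact measures.

```python
from collections import defaultdict


def toy_score(text):
    """Hypothetical stand-in for VADER's compound score, clipped to [-1, 1]."""
    pos = {"peace", "support", "aid"}
    neg = {"war", "attack", "threat"}
    words = text.lower().split()
    raw = sum(w in pos for w in words) - sum(w in neg for w in words)
    return max(-1.0, min(1.0, raw / 3))


def country_sentiment(docs, countries, score=toy_score):
    """Mean sentiment of documents mentioning each country (case-insensitive
    substring match), plus each country's share of all mentions as a rough
    measure of exposure in the corpus."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        s = score(doc)
        for c in countries:
            if c.lower() in doc.lower():
                totals[c] += s
                counts[c] += 1
    n_mentions = sum(counts.values())
    return {
        c: {
            "mean_sentiment": totals[c] / counts[c] if counts[c] else None,
            "mention_share": counts[c] / n_mentions if n_mentions else 0.0,
        }
        for c in countries
    }
```

Passing a different `score` callable swaps the sentiment model without touching the aggregation, which keeps instrument choice and corpus statistics separable.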

Ecology × AI

In biological taxonomy, species names (genus + specific epithet) encode the circumstances of their naming (morphology, habitat, dedication, culture), and etymological analysis requires long, specialist literature surveys; in one documented case, classifying the names of 48,000 spider species took two years. Evaluation of the Automated Labeling Method for Taxonomic Nomenclature (2025) used the JAFList spider dataset (48,464 species) to evaluate LLM-based automated labeling with prompt optimization combining Role-Playing, Few-Shot, and Chain-of-Thought. Precision close to that of human annotators was achieved for the morphology, geography, and personal-name categories, while precision dropped sharply for the ecological/behavioral and modern/cultural categories. This demonstrates concretely, in a biological setting, a "long-tail vulnerability" common to AI for Science as a whole: LLMs are strong when there are "direct lexical cues" but weak on categories that require behavioral or cultural interpretation. Our lab's role is to draw, empirically, the boundary of "how far to delegate and where humans must take over" when LLMs are used in low-resource domains.
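The study's actual prompts are not reproduced here. As a sketch, assuming each name receives exactly one of the five categories named above and that expert-written worked examples are available, the three prompting techniques can be composed into a single prompt, and precision can then be computed per category against human labels. The category identifiers, prompt wording, and function names are hypothetical, and the LLM call itself is omitted.

```python
from collections import Counter

# Hypothetical identifiers for the five categories discussed above.
CATEGORIES = ["morphology", "geography", "personal_name",
              "ecology_behavior", "modern_culture"]


def build_prompt(species_name, examples):
    """Compose Role-Playing, Few-Shot, and Chain-of-Thought into one prompt.

    examples: list of (name, reasoning, category) tuples written by experts.
    """
    role = ("You are a taxonomist specialising in the etymology of "
            "scientific names.")                          # Role-Playing
    shots = "\n\n".join(                                   # Few-Shot
        f"Name: {n}\nReasoning: {r}\nCategory: {c}" for n, r, c in examples
    )
    task = (f"Name: {species_name}\n"
            "Reasoning: think step by step about the meaning of the genus "
            "and specific epithet before answering.\n"     # Chain-of-Thought
            f"Category (one of {', '.join(CATEGORIES)}):")
    return f"{role}\n\n{shots}\n\n{task}"


def precision_by_category(predicted, gold):
    """Precision per category: correct predictions of c / all predictions of c.
    Categories the model never predicted are omitted (precision undefined)."""
    predicted_counts = Counter(predicted)
    correct = Counter(p for p, g in zip(predicted, gold) if p == g)
    return {c: correct[c] / predicted_counts[c]
            for c in CATEGORIES if predicted_counts[c]}
```

Reporting precision per category rather than overall is what makes the long-tail pattern visible: a high aggregate score can hide a category where most predictions are wrong.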

Law × AI

National qualifying examinations such as the bar examination or the Real Estate Transaction Specialist Examination (RETSE) function as gatekeepers for occupations requiring specialized knowledge and social responsibility, and whether AI can reach this bar is a touchstone for the question of "AI-driven occupational substitutability." Assessing GPTs Legal Knowledge in Japanese Real Estate Transactions Exam (2024) had GPT-3.5 and GPT-4 solve ten past exams (50 questions each) from 2016-2023 and compared the results to the passing threshold. While GPT-4 outperformed GPT-3.5, neither model reached passing levels, with errors concentrated on questions probing domain-specific rules such as tax law and the Real Estate Brokerage Act. On the other hand, adding auxiliary prompts such as "take customary law into account" improved accuracy on complex legal questions, showing that while AI is not yet at the stage of fully substituting legal practice, it is promising as a learner-support and legal-assistance tool. By quantifying this "space between passing and failing," the Affectosphere Group empirically articulates the social-deployment line for AI legal applications.
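As a sketch of the scoring logic only, assume each sitting is a list of questions tagged with a subject area, the model's answers have already been collected, and the passing mark for that sitting is supplied as a parameter (it varies by year and is not listed here). The data layout, the four-choice answer format, and the function names are assumptions. Scoring the same sitting once with and once without an auxiliary prompt yields the kind of comparison described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Question:
    subject: str  # illustrative tag, e.g. "tax law"
    answer: int   # index of the correct choice (four-choice format assumed)


def score_exam(questions, model_answers, pass_mark):
    """Total correct, pass/fail against the sitting's mark, per-subject accuracy."""
    correct_by_subject = defaultdict(int)
    total_by_subject = defaultdict(int)
    for q, a in zip(questions, model_answers):
        total_by_subject[q.subject] += 1
        if a == q.answer:
            correct_by_subject[q.subject] += 1
    score = sum(correct_by_subject.values())
    accuracy = {s: correct_by_subject[s] / n for s, n in total_by_subject.items()}
    return {"score": score, "passed": score >= pass_mark, "accuracy": accuracy}


def prompt_gain(questions, baseline_answers, augmented_answers):
    """Change in the number of correct answers when an auxiliary prompt is added."""
    base = sum(a == q.answer for q, a in zip(questions, baseline_answers))
    aug = sum(a == q.answer for q, a in zip(questions, augmented_answers))
    return aug - base
```

Keeping per-subject accuracy alongside the total is what locates errors in specific areas such as tax law, rather than reporting only the gap to the passing mark.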

Across these four domains (education, politics, ecology, law), a common structure emerges: LLMs generate "averagely plausible outputs" with high quality, yet systematically drop the "tails of the distribution" — rare arguments, minority countries, long-tail nomenclature, domain-specific rules. By bringing in the stance cultivated in affective AI research of "treating the fluctuation of human judgment as information," the Affectosphere Group redefines these issues not as mere performance problems but as problems of evaluation methodology itself. For our lab, AI for Science is at once a site for deploying the diagnostic techniques honed in affective AI to socially significant questions, and a site for re-forging the methodology of affective AI itself through collaboration with domain experts.