Product Framework
Cryptoracle: Privately-Owned Data Assetization — Transforming Private Data from "Fragmented Information" into "Analyzable, Predictable Decision-Making Assets" (of data, by data, for data). Moving beyond the limitations of traditional financial databases—such as relational databases and market data platforms—we focus on converting non-standardized content from private crypto asset social networks into measurable, aggregatable, and model-ready data assets. These serve various scenarios including market monitoring, strategy development, and sentiment analysis.
Core Goal:
Centered on the private ecosystem within the crypto asset space, leveraging large language models’ semantic parsing capabilities to deeply deconstruct non-standardized content from Discord, Telegram, private KOL channels, niche community discussions, and more. This forms a multi-dimensional structured data system encompassing user identity tags, semantic sentiment tendencies, event correlations, and propagation path characteristics.
Utilizing large language models’ semantic understanding, NLP parsing, and dynamic tagging engines to transform fragmented chat records into measurable metrics.
Employing entity correlation algorithms to aggregate high-noise redundant information into traceable relationship networks.
Applying time-series data modeling to solidify highly dynamic real-time content into reusable data assets.
1. Foundation Layer
Objective: Address issues of data sourcing, storage, and compliance to provide high-quality structured data for upper layers.
1.1 Multi-Source Integration Module
Supports full-channel access to private social networks: Discord/Telegram community messages, private chat records, KOL-exclusive channel content, etc., covering text, images (OCR parsing), and interactive behaviors (likes, shares, @mentions).
Provides standardized integration interfaces: API connections, SDK embedding, manual uploads, etc., adaptable to various private scenarios (closed communities, one-on-one chats, exclusive channels).
1.2 Data Compliance & Governance Module
Privacy Protection: Automatic desensitization (e.g., hiding phone numbers, nicknames, and other sensitive information), configurable data retention periods based on compliance requirements.
Data Quality Governance: Deduplication, cleaning of outliers (e.g., garbled text, repetitive spam), ensuring usability of raw data.
Metadata Management: Builds a unified metadata warehouse documenting data assets’ "source, format, field meaning, update frequency, and relationships," supporting metadata retrieval and visualization to help users quickly understand data context.
1.3 Storage & Computation Engine
Structured Storage:
Relational databases (e.g., MySQL/PostgreSQL) store tagged data (user tags, event tags, etc.), ensuring high availability through master-slave architecture.
Time-series databases (e.g., InfluxDB, TimescaleDB) store real-time data streams (e.g., minute-level heat values, sentiment indices), supporting fast time-range queries.
Columnar databases (e.g., ClickHouse) store high-cardinality structured data (e.g., full community message records), optimizing aggregation query performance (e.g., "message count by token + date").
Distributed Storage Expansion: Introduces HDFS or object storage (e.g., S3, MinIO) for massive unstructured/semi-structured data (e.g., raw chat records, PDF documents), supporting PB-level scalability with replica mechanisms (default 3 replicas) for data reliability.
NoSQL Storage Adaptation: Uses document databases (e.g., MongoDB) for unstructured "complex entity data" (e.g., KOL personal information + historical message features) and graph databases (e.g., Neo4j) for "entity relationship networks" (e.g., user-token-community mesh relationships), enabling deep correlation analysis.
Hybrid Computation:
Real-time computation engines (e.g., Flink) process streaming data (real-time updates of heat/sentiment indices), supporting state management (e.g., cumulative 24-hour mention count for a token).
Batch computation engines (e.g., Spark) handle historical data analysis (e.g., weekly trends, strategy backtesting), with resource scheduling via YARN/K8s and dynamic scaling.
A "batch-stream integration layer" (e.g., Flink + Hive synergy) ensures unified computation standards for real-time and historical data, avoiding inconsistencies.
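As a minimal illustration of the stateful streaming computation mentioned above (the cumulative 24-hour mention count per token), here is a plain-Python sketch of the per-key state a streaming engine such as Flink would maintain. The class and method names are illustrative, not part of the actual system.

```python
from collections import deque

class RollingMentionCounter:
    """Sketch of the per-token 24h rolling mention count a streaming
    engine (e.g., Flink keyed state) would maintain. Names are illustrative."""

    def __init__(self, window_seconds: int = 24 * 3600):
        self.window = window_seconds
        self.events: dict[str, deque] = {}  # token -> mention timestamps

    def add_mention(self, token: str, ts: float) -> int:
        q = self.events.setdefault(token, deque())
        q.append(ts)
        # Evict timestamps that fell out of the 24-hour window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q)  # current rolling mention count

counter = RollingMentionCounter()
counter.add_mention("SOL", 1000.0)
counter.add_mention("SOL", 2000.0)
# One second past the first mention's 24h window: the first event is evicted.
print(counter.add_mention("SOL", 1000.0 + 24 * 3600 + 1))  # 2
```

In a real deployment this state lives inside the engine's checkpointed state backend rather than in process memory, which is what makes the metric recoverable after failures.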
2. Data Mining Layer
Objective: Transform fragmented data into structured assets and analyzable dimensions, serving as the core capability carrier.
Cryptoracle’s underlying logic unit is a "five-dimensional structured data unit," standardizing basic elements of social content in a structured manner:
"Who (user) — Where (community) — Said What (content) — About Which Event (event) — Affected Which Token (asset)."
| Dimension | Content Description | Example |
| --- | --- | --- |
| Who (User) | Who is the speaker? Identity tags, activity behavior, etc. | @CryptoWhale (KOL) |
| Where (Community) | Which platform/group did the speech occur in? How active is it? | Telegram Group A |
| What (Content) | Speech content, its linguistic sentiment features, and propagation attributes | "SOL is pumping!! 🚀🚀" |
| About What (Event) | Was a specific event mentioned? What type of event is it? | "Binance announces SOL listing" |
| Which (Asset) | Which tokens/projects are primarily involved in the speech? | SOL, BNB |
This five-dimensional structure is the foundation for metric calculation and tag system construction.
2.1 Multi-Dimensional Tagging System: From Semantic Recognition to Entity Attribution
The tagging system identifies and categorizes key entities in raw private text, providing behavioral context, attribute attribution, and semantic background. It is the upstream source for all metric construction.
2.1.1 Token Tags (CO-B-01)
Identify tokens mentioned in content, supporting tasks like project attribution, ecosystem structure modeling, and asset classification. Token tags include ecosystem information and technical attribute dimensions.
| Label Dimension | Example Labels |
| --- | --- |
| Primary Tag (Ecological Attribute) | Public Chain / L2 / DeFi / Stablecoin / AI Sector |
| Secondary Tag (Application Scenario) | NFT / DEX / Derivatives / RWA |
| Risk Attribute | High Volatility / Small Market Cap / Trading Activity |
2.1.2 User Tags (CO-B-02)
Categorize speakers into types such as regular users, KOLs, project teams, bot accounts, etc., to characterize behavioral traits, influence levels, and content credibility.
| Label Dimension | Example Labels |
| --- | --- |
| Identity Type | KOL / Retail Investor / Promotional Account / Project Team |
| Activity Trait | High Frequency Active / Lurker Type / Suspected Bot |
| Preference Type | Futures Preference / NFT Player / MEME Holder |
2.1.3 Community Tags (CO-B-03)
Describe the platform and channel attributes where discussions occur, including platform type (Discord/Telegram/X, etc.), group category, activity level, focus areas, etc., used for building propagation networks and information source profiles.
| Label Dimension | Example Labels |
| --- | --- |
| Activity Level | High Activity / Medium Activity / Zombie Group |
| Platform | Telegram / Discord / Twitter |
| Timezone Preference | Asia / Europe & Americas / Global Mixed |
2.1.4 Event Tags (CO-B-04)
Identify event-related content and provide structured annotations for event type (e.g., exchange listing, airdrop, attack), source, heat, and propagation scope. These are crucial for event-driven strategy modeling.
| Label Dimension | Example Labels |
| --- | --- |
| Event Type | Breaking Event / Announcement / Rumor / Hot Discussion |
| Source Attribute | Official Announcement / KOL Disclosure / Media Repost |
| Sentiment Direction | Positive / Negative / Neutral |
2.2 Metric System: From Text to Modelable Features
Based on the tagging system, Cryptoracle builds a systematic private metric system to quantify and model behavioral, emotional, and structural information hidden in raw text. All metrics have clear data sources, construction logic, and interpretability.
2.2.1 Three Structured Metric Perspectives:
| Dimension | Definition | Example Keywords/Indicators |
| --- | --- | --- |
| Content | What is the speech about? Topics, viewpoints, token references. | Token mentions, hashtags, technical term recognition |
| Style | Does the speaking style show emotion, exaggeration, extremity, or aggression? | Sentiment polarity, intensity, subjectivity, extreme expressions |
| Spread | What is the propagation path of this content? Who is the source? Has it been disseminated? | First mention, KOL chats, co-occurrence count, interaction frequency |
2.2.2 Metric Categories & Examples
To better support diverse business needs, metrics are subdivided into multiple categories, each targeting different dimensions of text features and market focus.
1. Activity Metrics (CO-A-01)
Purpose: Measure the discussion heat and user participation for tokens or events, reflecting market attention and information liquidity.
Key Examples:
| Metric Name | Description |
| --- | --- |
| Total Community Messages | Number of valid messages related to the token in associated communities |
| Number of Active Users | Count of unique users mentioning the token within a specified time period |
| Number of Mentioning Communities | Number of active groups or channels discussing the token |
| Total User Interactions | Total number of interactive behaviors including likes, replies, and shares |
Application Example:
Comparing "total community message volume" and "number of mentioning communities" helps distinguish whether a token’s popularity is localized or widespread, aiding event propagation assessment.
"Number of active users" helps detect influx of new users or retail activity.
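The activity metrics above can be sketched as a single aggregation over tagged message records. This is a hedged illustration; the dictionary keys (`tokens`, `user`, `community`, `likes`, `replies`, `shares`) are assumed names, not the product's real schema.

```python
def activity_metrics(messages: list[dict], token: str) -> dict:
    """Compute the four activity metrics for one token from a batch of
    tagged message dicts. Field names are illustrative assumptions."""
    related = [m for m in messages if token in m["tokens"]]
    return {
        "community_volume": len(related),                          # total messages
        "active_users": len({m["user"] for m in related}),         # unique speakers
        "mentioning_communities": len({m["community"] for m in related}),
        "total_interactions": sum(m.get("likes", 0) + m.get("replies", 0)
                                  + m.get("shares", 0) for m in related),
    }

msgs = [
    {"tokens": ["SOL"], "user": "a", "community": "tg_A", "likes": 3},
    {"tokens": ["SOL"], "user": "b", "community": "tg_B", "replies": 1},
    {"tokens": ["ETH"], "user": "a", "community": "tg_A"},
]
print(activity_metrics(msgs, "SOL"))
# {'community_volume': 2, 'active_users': 2, 'mentioning_communities': 2, 'total_interactions': 4}
```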
2. Sentiment Metrics (CO-A-02)
Purpose: Quantify user emotional attitudes toward tokens or events and their dynamics, helping predict market sentiment trends and potential risks.
Key Examples:
| Metric Name | Description |
| --- | --- |
| Positive Sentiment Ratio | Proportion of positively weighted sentiment expressions based on influencer impact |
| Negative Sentiment Ratio | Proportion of negatively weighted sentiment expressions based on influencer impact |
| Sentiment Momentum | Difference between positive and negative ratios, reflecting sentiment direction and intensity |
| Extreme Sentiment Intensity | Weighted frequency proportion of extreme sentiment vocabulary |
| Sentiment Consistency Score | Degree of consensus in sentiment expression among users |
| Sentiment Reversal Alert | Identifies critical shifts in sentiment from bullish to bearish or vice versa |
Application Example:
High "sentiment consistency score" indicates unified market views and concentrated risk; low score may signal diverging opinions and potential volatility.
"Extreme sentiment intensity" often captures bubble or panic sentiments, guiding risk management.
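A minimal unweighted sketch of the sentiment metrics above, starting from per-message polarity scores in [-1, 1]. Note the simplifications: the product weights expressions by influencer impact and uses its own consistency definition, while the formulas below are illustrative assumptions.

```python
def sentiment_metrics(scores: list[float], extreme_threshold: float = 0.9) -> dict:
    """Unweighted sketch of the sentiment metric family from per-message
    polarity scores in [-1, 1]. Formulas are illustrative assumptions."""
    n = len(scores)
    pos = sum(1 for s in scores if s > 0) / n
    neg = sum(1 for s in scores if s < 0) / n
    extreme = sum(1 for s in scores if abs(s) > extreme_threshold) / n
    return {
        "positive_ratio": pos,
        "negative_ratio": neg,
        "sentiment_momentum": pos - neg,   # direction and intensity
        "extreme_intensity": extreme,
        # Consistency: how one-sided the polarized messages are (0..1).
        "consistency": abs(pos - neg) / (pos + neg) if pos + neg else 0.0,
    }

m = sentiment_metrics([0.95, 0.6, 0.4, -0.2])
print(m["sentiment_momentum"])  # 0.5 (75% positive minus 25% negative)
```

A reversal alert could then be triggered when `sentiment_momentum` crosses zero between consecutive windows.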
3. Event Metrics (CO-A-03)
Purpose: Measure the propagation, impact scope, and discussion depth of specific events in communities.
Key Examples:
| Metric Name | Description |
| --- | --- |
| Event Keyword Anomaly | Abnormal fluctuation in keyword discussion frequency exceeding historical thresholds within a short time period |
| Event Discussion Heat | Total volume of discussions related to a specific event within a designated cycle |
| Event Discussion Breadth | Number of active communities mentioning the event |
| Event Discussion Depth | Average volume of event discussions within a single community |
| New Topic Discussion Count | Number of newly emerging topics with heat exceeding a threshold value |
Application Example:
Combining "event discussion breadth" and "discussion depth" determines whether an event is spreading broadly or concentrated in a few communities.
Monitoring "event keyword anomalies" enables rapid early warning and public opinion monitoring.
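One common way to implement an "event keyword anomaly" check is a z-score against a historical baseline. The sketch below assumes hourly keyword counts and a 3-sigma threshold; both the window and the threshold are illustrative choices, not the product's stated parameters.

```python
from statistics import mean, stdev

def keyword_anomaly(history: list[int], current: int,
                    z_threshold: float = 3.0) -> tuple[float, bool]:
    """Flag the current window's keyword count when it deviates from its
    historical baseline by more than z_threshold standard deviations."""
    mu, sigma = mean(history), stdev(history)
    z = (current - mu) / sigma if sigma else float("inf")
    return z, z > z_threshold

history = [40, 55, 48, 52, 45, 50, 47, 53]  # hourly counts, past 8 hours
z, is_anomaly = keyword_anomaly(history, current=180)
print(is_anomaly)  # True — 180 is far above the ~49/hour baseline
```

In production the baseline would typically be rolling and seasonality-adjusted (crypto chat volume varies strongly by timezone), but the core thresholding logic is the same.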
4. Attribution Metrics (CO-A-04)
Purpose: Extract trading logic and decision factors behind bullish/bearish views, providing basis for strategy design.
Key Examples:
| Metric Name | Description |
| --- | --- |
| Bullish Trading Logic Factors | Decision factors and keywords extracted from bullish sentiment texts |
| Bearish Trading Logic Factors | Decision factors and keywords extracted from bearish sentiment texts |
| Bullish Factor Heat Evolution | Heat trend changes of bullish factors over time |
| Bearish Factor Heat Evolution | Heat trend changes of bearish factors over time |
| Bullish Factor Consistency | Consensus level of bullish factors across different user groups |
| Bearish Factor Consistency | Consensus level of bearish factors across different user groups |
Application Example:
Using "factor heat evolution" captures changes in market sentiment and logic, aiding dynamic strategy adjustments.
"Factor consistency" measures the dispersion or unity of market views.
5. Confidence Metrics (CO-A-05)
Purpose: Evaluate the quality and reliability of data sources and metrics themselves, ensuring model robustness.
Key Examples:
| Metric Name | Description |
| --- | --- |
| Core User Dependency | Proportion of content contributed by top 10% users |
| Active User Ratio | Percentage of active posting users in total community users |
| Data Coverage Completeness | Coverage scope of metrics across time and communities |
| Noise Filtering Rate | Proportion of valid data after filtering bot/invalid content |
Application Example:
High "core user dependency" may indicate metric volatility relies on a few users, requiring cautious interpretation.
Low "active user ratio" suggests sparse data or high volatility.
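"Core user dependency" reduces to a concentration ratio over per-user message counts. A minimal sketch, assuming a mapping of user IDs to message counts (the function name and the 10% cutoff parameter mirror the metric's definition above):

```python
def core_user_dependency(user_message_counts: dict[str, int],
                         top_frac: float = 0.10) -> float:
    """Share of all messages contributed by the top `top_frac` most
    active users. Input shape is an illustrative assumption."""
    counts = sorted(user_message_counts.values(), reverse=True)
    top_n = max(1, int(len(counts) * top_frac))  # at least one user
    total = sum(counts)
    return sum(counts[:top_n]) / total if total else 0.0

# 10 users: one whale posting 90 messages, nine users posting 10 each.
counts = {"whale": 90, **{f"user{i}": 10 for i in range(9)}}
print(core_user_dependency(counts))  # 0.5 — half the content from 10% of users
```

A value this high would, per the application note above, warrant cautious interpretation of any heat or sentiment metric derived from the same community.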
2.3 Metric Construction Methodology: Tag × Metric Synergy
Building high-quality metrics requires following these key steps:
2.3.1 Define Modeling Objectives
The first step is to clarify the application scenario, which determines what information to extract from the vast text data. Common objectives include:
Token heat monitoring: Identify "which assets are rapidly gaining attention."
Event outbreak tracking: Detect "whether event propagation is accelerating."
Sentiment opinion research: Determine "if KOL attitudes toward a project are shifting."
Factor model development: Provide "private behavior factors" for quantitative models. Metric design must serve these goals and accommodate strategy usage (timeliness vs. stability, full-market vs. subset).
2.3.2 Use Tags to Define Context Boundaries
Tags provide "information boundaries": Which users/communities should be included in calculations? What content is noise vs. signal? Does an extreme metric originate from credible sources?
| Type | Role Description |
| --- | --- |
| Data Source Definition | Determine which communities/users' content should be included: e.g., official announcement groups vs. speculative discussion groups vs. meme culture groups |
| Entity Recognition | Extract key content from messages: tokens, protocols, events, KOLs, etc. |
| Propagation Tracing | Identify event origins and diffusion paths, determine key nodes: first mentioners, opinion leaders, forwarding chains, etc. |
Example: How Community Tags Help Understand Data Sources
A Telegram group is tagged as:

Group type: General market discussion
Token coverage tags: [BTC, ETH, SOL, ARB]
Preference tags: Mainstream assets, occasional hot topic coverage

👉 This means the group naturally pays less attention to small-cap tokens. If a small token sees high discussion frequency here, that may signal unusual significance; conversely, it might just be sporadic noise. Tags help distinguish "signal" from "noise" when constructing metrics.
2.3.3 Metric Construction Methods
Within the context defined by tags, metrics extract quantifiable, modelable features from text data, such as:
Heat metrics: Message volume, mention frequency, interaction count, number of speakers.
Structural metrics: Propagation path depth, KOL ratio, first mention time.
Sentiment metrics: Sentiment polarity distribution, sentiment entropy, extreme sentiment intensity. Construction relies on tags:
Community tags determine which groups are included in a metric’s calculation.
Token tags define the entity boundary for metric statistics.
User tags decide whether to apply KOL weights or filter out bot noise.
Event tags help extract temporal paths, propagation windows, etc.
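The tag-driven filtering and weighting described above can be sketched as a single scoring pass. This is a hedged illustration: the field names (`community`, `user_type`, `tokens`) and the 3x KOL weight are assumptions, not the product's actual schema or parameters.

```python
KOL_WEIGHT = 3.0  # illustrative influence weight for KOL speakers

def weighted_mentions(messages: list[dict], token: str,
                      allowed_communities: set[str]) -> float:
    """KOL-weighted mention score for one token, scoped by tags."""
    score = 0.0
    for m in messages:
        if m["community"] not in allowed_communities:  # community tags: scope
            continue
        if m["user_type"] == "bot":                    # user tags: drop noise
            continue
        if token not in m["tokens"]:                   # token tags: entity boundary
            continue
        score += KOL_WEIGHT if m["user_type"] == "kol" else 1.0
    return score

msgs = [
    {"community": "tg_A", "user_type": "kol", "tokens": ["SOL"]},
    {"community": "tg_A", "user_type": "retail", "tokens": ["SOL"]},
    {"community": "tg_A", "user_type": "bot", "tokens": ["SOL"]},
    {"community": "tg_X", "user_type": "kol", "tokens": ["SOL"]},  # out of scope
]
print(weighted_mentions(msgs, "SOL", {"tg_A"}))  # 4.0 (3.0 + 1.0)
```

Each `if` branch corresponds to one tag family from the list above, which is exactly the "tags define the boundaries, metrics do the counting" division of labor.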
2.3.4 Tag × Metric Synergy Value
Tags provide boundaries and context for metrics; metrics give tags quantitative expression and modeling power. Together, they form a "traceable, interpretable, modelable" data asset.
| What Tags Provide… | What Metrics Solve… |
| --- | --- |
| Context, Structure, Semantics | Features, Trends, Signals |
| Data Interpretability | Data Modelability |
| Filtering and Categorization Capabilities | Quantification and Comparison Capabilities |
👉 Only by viewing metrics within the correct tag context can we make reasonable judgments. For example:
A small token repeatedly mentioned in "speculative groups" may indicate opportunities.
Frequent project-team messages in "official groups" do not necessarily imply genuine community heat.
2.4 Structured Data Process: Traceable Metric Construction Pipeline
Cryptoracle’s private data system aims to build an interpretable, traceable, and modelable private behavior metric system. Its construction follows a five-step pipeline:
```
Tag Identification → Data Attribution → Feature Extraction → Metric Construction → Model/Strategy Application
        ↑──────────────────────(Interpretability Feedback)──────────────────────↓
```
This ensures every metric can trace back to its source context, belonging entity, construction logic, and strategy performance, forming a self-consistent explanation chain.
2.4.1 Structured Path for Raw Content
```
Raw social content (unstructured text)
  ↓ Entity recognition (user / community / token / event)
  ↓ Content tagging
      • Influence tags (speaker level, historical behavior)
      • Token attribution (mentioned entity mapping)
      • Sentiment tags (polarity / intensity)
      • Event tags (type, location, related objects)
  ↓ Metric calculation
      • Heat metrics (message volume, heat change rate)
      • Sentiment metrics (extreme sentiment intensity, consistency index)
      • Structural metrics (divergence rate, drift rate)
  ↓ Structured output (DataFrame: standard metric format)
```
2.4.2 Metric Output Example
| time | token | community_volume | sentiment_positive | sentiment_negative | sentiment_extreme | emotion_consistency | emotion_divergence | dominant_emotion | speaker_num | group_num |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025/7/1 14:00 | ETH | 4321 | 2300 | 1800 | 490 | 0.78 | 0.21 | positive | 912 | 37 |
| 2025/7/1 14:00 | SOL | 2750 | 1200 | 1350 | 290 | 0.49 | 0.52 | negative | 610 | 22 |
Field Explanation:
community_volume: Total community messages related to the token.
sentiment_positive/negative: Number of positive/negative messages.
sentiment_extreme: Number of extreme sentiment messages (e.g., |intensity| > 0.9).
emotion_consistency: Consistency index (e.g., stability measure of skew distribution).
emotion_divergence: Divergence index (degree of polarity difference).
dominant_emotion: Dominant sentiment direction.
speaker_num / group_num: Number of active users / active groups.
3. Service & Application Layer
Objective: Deliver data value to users efficiently through diverse carriers, catering to different roles.
3.1 Visual Analytics Platform
Drill-down interaction: Clicking chart elements (e.g., heat peaks) drills down to raw data (e.g., corresponding time’s message records, related users).
Customizable dashboards: Users can drag-and-drop components (trend charts, radar charts, timelines) to generate dedicated dashboards based on tag combinations (e.g., "BTC + Twitter + positive events" dashboard).
3.2 API Service Platform
Standardized interfaces: Provide tag data queries (e.g., "get the list of tokens mentioned by 'top KOLs' in the last 24 hours") and real-time metric subscriptions (e.g., "trigger a callback when 'negative tag density' exceeds a threshold").
Documentation & debugging: Comes with API documentation, debugging tools, and supports example code generation for scenarios (e.g., Python call for heat query interface).
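As a hedged sketch of what generated example code for the heat-query interface might look like: the base URL, endpoint path, parameter names, and response envelope below are all hypothetical placeholders, not the actual API.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters — consult the real API documentation.
BASE_URL = "https://api.example.com/v1"

def build_heat_query(token: str, hours: int = 24) -> str:
    """Build the URL for a (hypothetical) token heat query."""
    params = urlencode({"token": token, "window_hours": hours})
    return f"{BASE_URL}/metrics/heat?{params}"

def parse_heat_response(body: str) -> list[dict]:
    """Parse an assumed {"data": [...]} response envelope."""
    return json.loads(body)["data"]

url = build_heat_query("SOL")
print(url)  # https://api.example.com/v1/metrics/heat?token=SOL&window_hours=24

# In production: rows = parse_heat_response(urllib.request.urlopen(url).read().decode())
sample = '{"data": [{"ts": "2025-07-01T14:00", "token": "SOL", "heat": 2750}]}'
print(parse_heat_response(sample)[0]["heat"])  # 2750
```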