analyzing-data
About
This skill performs comprehensive data analysis, including statistical analysis, visualization, and pattern detection. Use it when users request data insights, statistical summaries, or visualizations for a dataset. It automatically handles data understanding, analysis workflows, and report generation for data science tasks.
Documentation
Data Analyzer
This skill performs comprehensive data analysis with statistical methods, visualizations, and automated reporting.
When to Use This Skill
Invoke this skill when the user:
- Asks to analyze a dataset
- Wants statistical insights
- Needs data visualization
- Requests pattern detection
- Mentions data analysis, statistics, or data science
- Wants to generate analysis reports
Analysis Workflow
Step 1: Data Understanding
Initial Assessment:
- Identify data format (CSV, JSON, Excel, etc.)
- Determine data size and structure
- Understand business context
- Clarify analysis objectives
Use the analysis script:
python scripts/analyze.py data.csv --explore
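Under the hood, this exploration step amounts to loading the file and inspecting its shape, types, and a sample of rows. A minimal pandas sketch of that kind of first pass (the file name is a placeholder, and the actual --explore implementation may differ):

```python
import pandas as pd

# Load the dataset (CSV assumed here; read_json/read_excel cover other formats)
df = pd.read_csv("data.csv")

# Structural overview: dimensions, column types, and a sample of rows
print(df.shape)                     # (rows, columns)
print(df.dtypes)                    # inferred type per column
print(df.head())                    # first few rows for a sanity check
print(df.describe(include="all"))   # summary stats for numeric and categorical columns
```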
Step 2: Data Quality Check
Validation:
- Data loads successfully
- Required columns present
- Data types appropriate
- Missing values identified
- Outliers detected
- Duplicates checked
Quality Report:
python scripts/analyze.py data.csv --quality-report
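A hedged sketch of the checks behind such a quality report, again assuming pandas (the real script's checks and output format are not shown here):

```python
import pandas as pd

df = pd.read_csv("data.csv")

# Missing values per column, as a percentage of rows
missing_pct = df.isna().mean() * 100
print(missing_pct[missing_pct > 0].sort_values(ascending=False))

# Exact duplicate rows
print("duplicate rows:", df.duplicated().sum())

# Object-typed columns often hide dates or numbers stored as text
print(df.dtypes[df.dtypes == "object"])
```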
Step 3: Statistical Analysis
Perform analysis based on data type and objectives.
For Descriptive Statistics:
- Mean, median, mode
- Standard deviation, variance
- Quartiles and ranges
- Distribution shape
For Correlation Analysis:
- Pearson correlation
- Spearman rank correlation
- Covariance matrix
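Both the descriptive and correlation steps map directly onto pandas built-ins. A minimal sketch, assuming the data is already loaded into a DataFrame:

```python
import pandas as pd

df = pd.read_csv("data.csv")
numeric = df.select_dtypes("number")

# Descriptive statistics: central tendency, spread, quartiles
print(numeric.describe())
print(numeric.median())
print(numeric.skew())               # rough indicator of distribution shape

# Correlation and covariance
print(numeric.corr(method="pearson"))
print(numeric.corr(method="spearman"))
print(numeric.cov())
```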
For Advanced Analysis: See REFERENCE.md for:
- Hypothesis testing procedures
- Regression analysis methods
- Time series analysis
- Clustering algorithms
Step 4: Visualization
Create appropriate visualizations:
Univariate Analysis:
- Histograms for distributions
- Box plots for outliers
- Bar charts for categories
Bivariate Analysis:
- Scatter plots for relationships
- Line charts for trends
- Heatmaps for correlations
Multivariate Analysis:
- Pair plots
- 3D visualizations
- Dimensionality reduction plots
Generate visualizations:
python scripts/analyze.py data.csv --visualize --output-dir ./charts
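If charts need to be produced outside the script, the same plot types are straightforward with matplotlib and seaborn (a sketch under the assumption those libraries are installed; the column names are placeholders):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data.csv")
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Univariate: distribution of one numeric column
sns.histplot(df["value"], ax=axes[0])

# Bivariate: relationship between two numeric columns
sns.scatterplot(data=df, x="feature", y="value", ax=axes[1])

# Correlation heatmap across all numeric columns
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm", ax=axes[2])

fig.tight_layout()
fig.savefig("overview.png")
```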
Step 5: Report Generation
Create analysis report using templates from FORMS.md:
cat FORMS.md # View available report templates
python scripts/analyze.py data.csv --report executive-summary
Analysis Types
Pattern 1: Exploratory Data Analysis (EDA)
Objective: Understand data characteristics and relationships
Steps:
- Load and preview data
- Generate summary statistics
- Check distributions
- Identify correlations
- Detect outliers
- Document insights
Quick EDA:
python scripts/analyze.py data.csv --eda
Pattern 2: Comparative Analysis
Objective: Compare groups or time periods
Steps:
- Define groups/periods
- Calculate group statistics
- Test for significant differences
- Visualize comparisons
- Interpret results
See REFERENCE.md section "Statistical Testing" for test selection.
Pattern 3: Trend Analysis
Objective: Identify patterns over time
Steps:
- Prepare time series data
- Check for seasonality
- Calculate moving averages
- Fit trend lines
- Forecast future values
See REFERENCE.md section "Time Series Methods" for details.
Pattern 4: Predictive Modeling
Objective: Build models to predict outcomes
Steps:
- Feature engineering
- Train/test split
- Model selection
- Training and validation
- Performance evaluation
See REFERENCE.md section "Machine Learning" for model details.
Data Type Handling
Numerical Data:
- Summary statistics
- Distribution analysis
- Correlation analysis
- Regression modeling
Categorical Data:
- Frequency tables
- Cross-tabulations
- Chi-square tests
- Category encoding
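For example, a chi-square test of independence between two categorical columns can be run from a cross-tabulation (a sketch; the column names are hypothetical):

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("data.csv")

# Frequency table / cross-tabulation of two categorical columns
table = pd.crosstab(df["region"], df["product"])
print(table)

# Chi-square test of independence
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```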
Time Series Data:
- Trend decomposition
- Seasonality detection
- Autocorrelation
- Forecasting
Text Data:
- Frequency analysis
- Sentiment analysis
- Topic modeling
- See REFERENCE.md section "Text Analytics"
Common Issues and Solutions
Issue: Missing Values
- Strategy 1: Remove rows (if <5% missing)
- Strategy 2: Impute with mean/median/mode
- Strategy 3: Use advanced imputation (KNN, MICE)
- See REFERENCE.md section "Missing Data Handling"
Issue: Outliers
- Detection: IQR method, Z-score, isolation forest
- Action: Remove, cap, or transform
- Context: Business rules may define valid outliers
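A sketch of IQR-based detection, with capping as the non-destructive action (the 1.5 multiplier is the conventional default rather than a hard rule, and the column name is a placeholder):

```python
import pandas as pd

df = pd.read_csv("data.csv")
col = "revenue"   # hypothetical numeric column

q1, q3 = df[col].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = df[(df[col] < lower) | (df[col] > upper)]
print(f"{len(outliers)} rows outside [{lower:.2f}, {upper:.2f}]")

# Cap rather than remove (winsorize to the IQR fences)
df[col] = df[col].clip(lower, upper)
```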
Issue: Imbalanced Data
- Resampling techniques
- Class weights
- Synthetic data generation (SMOTE)
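The class-weight approach is the lightest touch of the three; a scikit-learn sketch (SMOTE would additionally require the imbalanced-learn package, and the columns here are placeholders):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")
X, y = df[["feature_1", "feature_2"]], df["label"]   # placeholder columns

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" reweights classes inversely to their frequency
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```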
Issue: High Dimensionality
- Feature selection
- PCA or t-SNE
- Domain knowledge filtering
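A PCA sketch with scikit-learn, keeping enough components to explain most of the variance (the 0.95 threshold is illustrative):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")
numeric = df.select_dtypes("number").dropna()

# Standardize first: PCA is sensitive to feature scale
scaled = StandardScaler().fit_transform(numeric)

# Keep as many components as needed to explain 95% of the variance
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(scaled)
print(f"{numeric.shape[1]} features -> {reduced.shape[1]} components")
print(pca.explained_variance_ratio_.round(3))
```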
Output Formats
The skill can generate reports in multiple formats:
Executive Summary:
- Key findings (3-5 bullets)
- Critical metrics
- Recommendations
- See FORMS.md template "Executive Summary"
Technical Report:
- Methodology
- Detailed results
- Statistical tests
- Visualizations
- See FORMS.md template "Technical Report"
Dashboard Format:
- Interactive visualizations
- Key metrics at a glance
- Drill-down capability
Generate specific format:
python scripts/analyze.py data.csv --format executive
python scripts/analyze.py data.csv --format technical
python scripts/analyze.py data.csv --format dashboard
Validation Checklist
Before finalizing analysis:
- Data quality verified
- Appropriate methods selected
- Assumptions validated
- Results interpreted correctly
- Visualizations clear and labeled
- Report matches requested format
- Recommendations actionable
Analysis Scope
Quick Analysis (5-10 min):
- Basic statistics
- Simple visualizations
- Key findings only
Standard Analysis (20-40 min):
- Comprehensive statistics
- Multiple visualizations
- Correlation analysis
- Formatted report
Deep Analysis (1-2 hours):
- Advanced modeling
- Hypothesis testing
- Multiple methodologies
- Executive + technical reports
Ask the user for their preferred scope if unclear.
Example Analysis
Input: sales_data.csv with columns date, product, region, quantity, and revenue
Output:
Key Findings
- Revenue increased 23% year-over-year
- Product A accounts for 45% of total revenue
- Western region shows strongest growth (31%)
- Seasonal peak in Q4 (38% of annual sales)
Statistical Summary
- Mean daily revenue: $12,450
- Median daily revenue: $11,200
- Standard deviation: $3,890
- 95% of days: $5,000 - $20,000
Visualizations Generated
- Revenue trend line (2023-2024)
- Product revenue pie chart
- Regional comparison bar chart
- Seasonal pattern heatmap
Recommendations
- Increase inventory for Product A in Q4
- Investigate Western region success factors
- Plan marketing campaigns for Q2-Q3 (slower periods)
Advanced Features
For complex scenarios, this skill integrates with:
REFERENCE.md sections:
- Statistical Methods Library
- Machine Learning Algorithms
- Time Series Techniques
- Text Analytics Methods
FORMS.md templates:
- Executive Summary Template
- Technical Report Template
- Dashboard Layout Template
Scripts:
- scripts/analyze.py - Main analysis engine
- scripts/visualize.py - Visualization generator
- scripts/report.py - Report formatter
Getting Started
Simple analysis:
python scripts/analyze.py your_data.csv
With options:
python scripts/analyze.py your_data.csv \
--explore \
--visualize \
--report executive \
--output-dir ./results
Help:
python scripts/analyze.py --help
For detailed methodology and advanced techniques, see REFERENCE.md. For report templates and output examples, see FORMS.md.
Quick Install
/plugin add https://github.com/jesseotremblay/claude-skills/tree/main/complex-skill-example
Copy and paste this command in Claude Code to install this skill.
GitHub Repository
Related Skills
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
llamaindex
LlamaIndex is a data framework for building RAG-powered LLM applications, specializing in document ingestion, indexing, and querying. It provides key features like vector indices, query engines, and agents, and supports over 300 data connectors. Use it for document Q&A, chatbots, and knowledge retrieval when building data-centric applications.
business-rule-documentation
This skill provides standardized templates for systematically documenting business logic and domain knowledge following Domain-Driven Design principles. It helps developers capture business rules, process flows, decision trees, and terminology glossaries to maintain consistency between requirements and implementation. Use it when documenting domain models, creating business rule repositories, or bridging communication between business and technical teams.
project-structure
This skill provides comprehensive project structure guidelines and best practices for organizing codebases across various project types. It offers standardized directory patterns for monorepos, web frameworks, backend services, and libraries to ensure scalable, maintainable architecture. Use it when designing new project structures, organizing monorepo workspaces, or establishing code organization conventions for teams.
