
data-export-excel

majiayu000 · Updated Today
View on GitHub
Tags: Documents · excel · ai · data

About

This skill exports data tables and analysis results to formatted Excel files using the openpyxl library. It runs locally, making it compatible with any LLM provider like GPT, Claude, or Gemini. Use it to create multi-sheet workbooks and professionally styled reports from data like AnnData annotations or DEG results.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/majiayu000/claude-skill-registry
Git Clone (Alternative)
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-export-excel

Copy and paste the plugin command into Claude Code to install this skill, or clone the repository manually with the alternative command.

Documentation

Excel Data Export (Universal)

Overview

This skill enables you to export bioinformatics data, analysis results, and formatted tables to professional Excel spreadsheets. Unlike cloud-hosted solutions, this skill uses the openpyxl Python library and executes locally in your environment, making it compatible with ALL LLM providers including GPT, Gemini, Claude, DeepSeek, and Qwen.

When to Use This Skill

  • Export AnnData observations (.obs) or variables (.var) to Excel
  • Save DEG analysis results with formatting
  • Create multi-sheet workbooks with different data types
  • Generate formatted Excel reports with cell styling
  • Export cluster annotations, cell type assignments, or quality control metrics

How to Use

Step 1: Import Required Libraries

import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd
import numpy as np

Step 2: Prepare Your Data

Convert your data to pandas DataFrame format:

# Example: Export AnnData observations
df = adata.obs.copy()

# Example: Export DEG results
deg_df = pd.DataFrame({
    'gene': gene_names,
    'log2FC': log2_fold_changes,
    'pvalue': pvalues,
    'qvalue': qvalues
})

# Example: Export cluster statistics
cluster_stats = adata.obs.groupby('clusters').size().reset_index(name='count')

Step 3: Create Excel Workbook

# Create new workbook
wb = Workbook()
ws = wb.active
ws.title = "Sheet Name"

# Write DataFrame to worksheet
for r in dataframe_to_rows(df, index=False, header=True):
    ws.append(r)

Step 4: Add Formatting (Optional)

# Style header row
header_fill = PatternFill(start_color="4472C4", end_color="4472C4", fill_type="solid")
header_font = Font(bold=True, color="FFFFFF")

for cell in ws[1]:
    cell.fill = header_fill
    cell.font = header_font
    cell.alignment = Alignment(horizontal='center')

# Auto-adjust column widths based on the longest value in each column
for column in ws.columns:
    max_length = 0
    column_letter = column[0].column_letter
    for cell in column:
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = min(max_length + 2, 50)  # cap widths at 50 characters
    ws.column_dimensions[column_letter].width = adjusted_width

# Add borders
thin_border = Border(
    left=Side(style='thin'),
    right=Side(style='thin'),
    top=Side(style='thin'),
    bottom=Side(style='thin')
)
for row in ws.iter_rows(min_row=1, max_row=ws.max_row, min_col=1, max_col=ws.max_column):
    for cell in row:
        cell.border = thin_border

Step 5: Save the Workbook

# Save to file
output_path = "analysis_results.xlsx"
wb.save(output_path)
print(f"✅ Excel file saved to: {output_path}")

Multi-Sheet Workbooks

Create workbooks with multiple sheets for different data types:

wb = Workbook()

# Sheet 1: Cell metadata
ws1 = wb.active
ws1.title = "Cell Metadata"
for r in dataframe_to_rows(adata.obs, index=True, header=True):
    ws1.append(r)

# Sheet 2: Gene metadata
ws2 = wb.create_sheet("Gene Metadata")
for r in dataframe_to_rows(adata.var, index=True, header=True):
    ws2.append(r)

# Sheet 3: DEG results
ws3 = wb.create_sheet("DEG Results")
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws3.append(r)

wb.save("multi_sheet_analysis.xlsx")

Best Practices

  1. Column Headers: Always include column headers in the first row
  2. Data Types: Convert numpy scalars and arrays to native Python types before writing (see the sketch after this list)
  3. Large Datasets: For datasets >100K rows, consider CSV export instead
  4. File Paths: Use absolute paths or ensure output directory exists
  5. Formatting: Apply formatting sparingly to reduce file size
  6. Index: Decide whether to include DataFrame index (set index=True/False in dataframe_to_rows)
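
For Best Practice 2, a minimal sketch that coerces numpy scalars to native Python types row by row before appending (assumes ws is an open worksheet and df is any pandas DataFrame):

import numpy as np
from openpyxl.utils.dataframe import dataframe_to_rows

for r in dataframe_to_rows(df, index=False, header=True):
    # .item() converts numpy scalars (e.g. np.int64, np.float32) to plain Python values
    ws.append([v.item() if isinstance(v, np.generic) else v for v in r])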

Common Use Cases

Export Quality Control Metrics

qc_metrics = adata.obs[['n_genes', 'n_counts', 'percent_mito', 'clusters']].copy()

wb = Workbook()
ws = wb.active
ws.title = "QC Metrics"

for r in dataframe_to_rows(qc_metrics, index=False, header=True):
    ws.append(r)

# Highlight cells with high mitochondrial content
for row in range(2, ws.max_row + 1):
    value = ws.cell(row, 3).value  # column 3 = percent_mito
    if value is not None and value > 0.2:  # percent_mito > 20% (stored as a fraction)
        ws.cell(row, 3).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("qc_metrics.xlsx")

Export Marker Genes by Cluster

# Assuming you have marker genes for each cluster
marker_dict = {
    'Cluster_0': ['CD3D', 'CD3E', 'CD8A'],
    'Cluster_1': ['CD79A', 'MS4A1', 'CD19'],
    'Cluster_2': ['LYZ', 'S100A9', 'CD14']
}

wb = Workbook()

for cluster_name, genes in marker_dict.items():
    ws = wb.create_sheet(cluster_name)
    ws.append(['Marker Gene'])
    for gene in genes:
        ws.append([gene])

# Remove default sheet
if 'Sheet' in wb.sheetnames:
    wb.remove(wb['Sheet'])

wb.save("marker_genes.xlsx")

Export DEG Analysis with Conditional Formatting

wb = Workbook()
ws = wb.active
ws.title = "DEG Analysis"

# Write DEG results
for r in dataframe_to_rows(deg_df, index=False, header=True):
    ws.append(r)

# Color code by fold change
for row in range(2, ws.max_row + 1):
    log2fc = ws.cell(row, 2).value  # assuming log2FC is in column 2
    if log2fc is None:
        continue
    if log2fc > 1:  # upregulated
        ws.cell(row, 2).fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
    elif log2fc < -1:  # downregulated
        ws.cell(row, 2).fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")

wb.save("deg_results_formatted.xlsx")

Troubleshooting

Issue: "openpyxl not found"

Solution: Install the library:

import subprocess
import sys

# Install into the same Python environment that is running this code
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'openpyxl'])

Issue: "Invalid data type for cell"

Solution: Convert numpy/pandas types to native Python types:

# Cast to object and replace NaN/NaT with None so openpyxl writes empty cells
df = df.astype(object).where(pd.notnull(df), None)

# Or convert specific columns
df['column_name'] = df['column_name'].astype(str)

Issue: "Memory error with large datasets"

Solution: Export in chunks or use CSV format instead:

# Fallback to CSV for large data
df.to_csv('large_dataset.csv', index=False)
print("Dataset too large for Excel, saved as CSV instead")

Technical Notes

  • Library: Uses openpyxl, a pure-Python library with no compiled dependencies
  • Execution: Runs locally in the agent's sandbox
  • Compatibility: Works with ALL LLM providers (GPT, Gemini, Claude, DeepSeek, Qwen, etc.)
  • File Limits: Excel worksheets are capped at 1,048,576 rows; use CSV for larger datasets (see the guard sketch after this list)
  • Performance: Writing ~10K rows takes 1-2 seconds; 100K rows takes 10-20 seconds
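
A simple guard for the row limit noted above (a sketch; df is any pandas DataFrame and the file names are placeholders):

EXCEL_MAX_ROWS = 1_048_576  # Excel's per-sheet limit, including the header row

if len(df) + 1 > EXCEL_MAX_ROWS:
    df.to_csv("analysis_results.csv", index=False)     # too large for a single sheet
else:
    df.to_excel("analysis_results.xlsx", index=False)  # uses openpyxl for .xlsx output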

References

GitHub Repository

majiayu000/claude-skill-registry
Path: skills/data-export-excel

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

View skill

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View skill

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

View skill

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.

View skill