Data Science and Analytics

NLTK Dataset Generator Prompt: Create 10,000 Rows of NLP Data for Text Analysis

Author: Claude Directory

Claude Directory December 4, 2025

Generate NLTK-ready datasets of 10,000 rows on any topic with this prompt. The model returns sample rows plus a Python script that produces the full file, which suits NLP tasks such as sentiment analysis, tokenization, and training machine learning models on synthetic text.

Prompt

You are an expert NLTK Dataset Generator. Your task is to create a massive, high-quality dataset of exactly 10,000 rows optimized for NLTK (Natural Language Toolkit) processing, based on the user-provided topic.

Follow these numbered steps precisely:

1. **Analyze the Topic**: Understand the provided topic deeply. Identify key themes, entities, sentiments, and variations. For example, if the topic is 'Reddit users conversing from strangers to friends', focus on casual dialogues, evolving relationships, slang, emojis, and natural progression.

2. **Define Dataset Structure**: Output a CSV-formatted dataset with these exact columns:
   - id: Unique integer (1 to 10000)
   - text: Realistic text entry (1-3 sentences, 20-150 words) related to the topic
   - label: Relevant label (e.g., 'positive', 'negative', 'neutral' for sentiment; or custom categories like 'greeting', 'question', 'response')
   - tokens: Comma-separated list of tokenized words (lowercased, no punctuation), enclosed in double quotes so the commas do not split the CSV field
   - source: Simulated source (e.g., 'reddit_thread_123')

3. **Ensure Data Quality and Variety**:
   - Generate diverse, realistic content: Mix short/long texts, questions, exclamations, opinions.
   - Balance labels: ~33% each for standard categories, adjust for topic.
   - Make NLTK-ready: Texts suitable for tokenization, POS tagging, sentiment analysis, etc.
   - Avoid repetition: Use procedural variation in phrasing, vocabulary, scenarios.

4. **Handle Output Practically**:
   - Since 10,000 rows are too large for a single response, provide:
     - Full CSV header.
     - First 50 rows as a sample.
     - Last 50 rows as a sample.
     - A complete Python script (using pandas, faker, random) that generates the full 10,000 rows locally when run.
     - Instructions to save the output as CSV and process it with NLTK (e.g., load the file with pandas, then tokenize the text column).

5. **Format the Response**:
   - Start with a summary: 'Dataset generated for topic: [TOPIC]. Total rows: 10,000. Structure: [columns].'
   - Output sample CSV in markdown table.
   - Then, the Python generator script in a code block.
   - End with an NLTK usage example: 'import pandas as pd; from nltk.tokenize import wordpunct_tokenize; df = pd.read_csv("dataset.csv"); texts = df["text"].tolist(); tokens = [wordpunct_tokenize(t) for t in texts]'

Topic: [INSERT YOUR TOPIC HERE, e.g., Reddit users conversing, starting as strangers and becoming online friends]

How to Use

Copy the prompt into ChatGPT or Claude, replace [INSERT YOUR TOPIC HERE] with your specific topic, and run it. The response contains only sample rows, so run the provided Python script locally to generate the full 10,000-row CSV file. Then load the CSV with pandas and pass the text column to NLTK for analysis, model training, or experimentation.
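
For orientation, here is a minimal sketch of the kind of generator script the prompt asks the model to produce. It is an illustration, not the script the model will return. The phrase pools, the function names `tokenize`, `make_rows`, and `write_csv`, and the `reddit_thread_N` source pattern are all hypothetical. It uses only the Python standard library rather than pandas and faker, and it does not enforce the 20-150 word range or avoid repetition, so a real script would need much larger and more varied pools.

```python
import csv
import random
import re

COLUMNS = ["id", "text", "label", "tokens", "source"]
LABELS = ["positive", "negative", "neutral"]

# Hypothetical phrase pools; replace with vocabulary for your own topic.
OPENERS = {
    "positive": ["Love this thread,", "Honestly great advice,", "Thanks so much,"],
    "negative": ["Not a fan of this,", "That was disappointing,", "I disagree,"],
    "neutral": ["Quick question,", "Just checking in,", "For context,"],
}
BODIES = [
    "has anyone else tried this before?",
    "I have been lurking here for a while.",
    "we should talk about it more often.",
    "it reminds me of an older post.",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def make_rows(n_rows=10_000, seed=0):
    """Build n_rows dictionaries with the five columns named in the prompt."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    rows = []
    for i in range(1, n_rows + 1):
        # Round-robin assignment keeps the three labels balanced.
        label = LABELS[(i - 1) % len(LABELS)]
        text = f"{rng.choice(OPENERS[label])} {rng.choice(BODIES)}"
        rows.append({
            "id": i,
            "text": text,
            "label": label,
            "tokens": ",".join(tokenize(text)),
            "source": f"reddit_thread_{rng.randint(1, 999)}",
        })
    return rows


def write_csv(rows, path="dataset.csv"):
    """Write rows to a CSV file; fields containing commas are quoted automatically."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv(make_rows())` writes `dataset.csv` to the current directory. The `text` column can then be read back with `pd.read_csv` and passed to an NLTK tokenizer such as `nltk.tokenize.wordpunct_tokenize`, which is regex-based and needs no extra resource download.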
