NLTK Dataset Generator: Create 10,000 Rows of Topic-Specific NLP Data for Analysis

Author: Claude Directory

Claude Directory December 5, 2025

Generate a 10,000-row, NLTK-ready synthetic dataset on any topic for NLP tasks such as tokenization, sentiment analysis, and POS tagging. The prompt produces labeled CSV text data that you can load directly into your Natural Language Toolkit projects.

Prompt

You are an expert NLTK Dataset Generator specialized in creating large-scale, high-quality synthetic datasets for Natural Language Toolkit (NLTK) processing in Python. Your goal is to produce exactly 10,000 rows of structured text data based on the user's provided topic, optimized for common NLTK tasks such as tokenization, stemming, POS tagging, sentiment analysis, and corpus building.

Follow these numbered steps precisely:

1. **Analyze the Input Topic**: Carefully review the topic provided by the user. Identify key themes, entities, sentiments, and linguistic variations relevant to it. Ensure the dataset reflects realistic language use (e.g., conversations, sentences, reviews, or posts).

2. **Define Dataset Structure**: Output the dataset in CSV format for easy import into Python (e.g., via pandas.read_csv()). Use these exact columns:
   - Row ID: Sequential number from 1 to 10000
   - Text: A short, coherent text sample (20-100 words) related to the topic
   - Label: A sentiment label ('positive', 'negative', 'neutral') or, if the topic suits it better, a category label (infer 3-5 categories from the topic)
   - Length: Word count of the Text
   - Source: Simulated source (e.g., 'reddit_post', 'twitter_thread', 'review')

3. **Generate Diverse Data**: Create 10,000 unique rows with variety:
   - Mix sentence lengths, slang, formal/informal tones, questions, exclamations.
   - Include 30% positive, 30% negative, 40% neutral sentiments unless topic dictates otherwise.
   - Incorporate topic-specific vocabulary, entities, and scenarios.
   - Keep the text grammatical overall, with occasional typos or informal errors for realism.

4. **Output Format**: 
   - Start with a header row: Row ID,Text,Label,Length,Source
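   - List all 10,000 rows immediately below, one per line, wrapping each Text value in double quotes (and doubling any embedded double quotes) so that commas inside the text do not break CSV parsing.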
   - End with a summary: 'Dataset generated successfully. Total rows: 10,000. Ready for NLTK: nltk.download("punkt"); from nltk import sent_tokenize, word_tokenize'

User Topic: [INSERT YOUR TOPIC HERE, e.g., 'Reddit users starting strangely awkward conversations that turn into online friendships']

Generate the dataset now.
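
Once a response comes back, the column layout and sentiment mix requested in the prompt can be sanity-checked before any analysis. The following is a minimal sketch using only Python's standard library; the function name `check_dataset` and the two-row sample are illustrative assumptions, not part of the prompt.

```python
import csv
import io
from collections import Counter

# Header row exactly as the prompt specifies it.
EXPECTED_HEADER = ["Row ID", "Text", "Label", "Length", "Source"]

def check_dataset(csv_text):
    """Return (label_shares, problems) for CSV text in the prompt's layout."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return {}, [f"unexpected header: {header}"]

    problems = []
    labels = Counter()
    for expected_id, row in enumerate(reader, start=1):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {expected_id}: expected 5 fields, got {len(row)}")
            continue
        row_id, text, label, length, _source = row
        labels[label] += 1
        # Row IDs should run sequentially from 1.
        if row_id != str(expected_id):
            problems.append(f"row {expected_id}: Row ID is {row_id!r}")
        # Length should equal the whitespace-separated word count of Text.
        words = len(text.split())
        if length != str(words):
            problems.append(f"row {expected_id}: Length {length} but {words} words")

    total = sum(labels.values())
    shares = {label: count / total for label, count in labels.items()} if total else {}
    return shares, problems

# Hypothetical two-row sample; real Text values would run 20-100 words.
sample = (
    "Row ID,Text,Label,Length,Source\n"
    '1,"Honestly, this thread made my day.",positive,6,reddit_post\n'
    "2,Not sure why this got posted here,neutral,5,review\n"
)
shares, problems = check_dataset(sample)
print(shares)
print(problems)
```

The returned shares can be compared against the 30/30/40 split requested in step 3, and a non-empty problem list is a signal to regenerate or repair the affected rows.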

How to Use

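Copy the prompt and replace the bracketed placeholder after 'User Topic:' with your specific topic, such as a type of conversation or review. Paste the full prompt into ChatGPT or Claude and submit it, then save the CSV output to a file and load it into Python for analysis with NLTK. Because 10,000 rows can exceed what a model returns in a single response, you may need to ask it to continue or to generate the rows in batches.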
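
As a rough illustration of the loading step, the sketch below reads rows in the prompt's CSV layout and runs each Text value through NLTK. It deliberately swaps in `TreebankWordTokenizer` and `PorterStemmer`, which need no downloaded resources, in place of the `word_tokenize` call named in the prompt's summary line (that function relies on the downloaded Punkt data). The helper names and the one-row demo are assumptions made for this example.

```python
import csv
import io

from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()  # rule-based, no nltk.download() needed
stemmer = PorterStemmer()

def preprocess(text):
    """Tokenize a Text cell and reduce each alphabetic token to its stem."""
    tokens = tokenizer.tokenize(text)
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]

def stems_by_row(csv_file):
    """Map each Row ID to the stems of its Text column."""
    return {row["Row ID"]: preprocess(row["Text"]) for row in csv.DictReader(csv_file)}

# Hypothetical one-row demo standing in for the saved dataset.
demo = io.StringIO(
    "Row ID,Text,Label,Length,Source\n"
    '1,"Running into awkward chats, again!",neutral,5,reddit_post\n'
)
result = stems_by_row(demo)
print(result)
```

For a saved file, pass `open("dataset.csv", newline="", encoding="utf-8")` instead of the in-memory demo; the filename here is a placeholder.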
