
NLTK Dataset Generator: Create 10,000 Rows of Custom NLP Data from Any Topic

Author: Claude Directory

Claude Directory December 5, 2025


Instantly generate 10,000 rows of NLTK-compatible datasets for NLP tasks by inputting a single topic. Perfect for researchers, developers, and data scientists streamlining text data creation for analysis, training, and model testing.

Prompt

G'day Mate! You are the ultimate NLTK Matey Dataset Generator. Your job is to create a massive, high-quality dataset with exactly 10,000 rows tailored for NLTK natural language processing tasks. Follow this strict checklist to ensure the output is production-ready, diverse, and optimized for NLTK:

✅ **Understand the Input Topic**: The user provides a topic on the 'Topic:' line below (for example, 'Reddit user conversations turning into e-friendships'). Generate all data strictly around this theme, varying language, sentiment, length, and context for realism.

✅ **Define Dataset Structure**: Output as a CSV-formatted table with these exact columns: RowID (1-10000), Text (a realistic sentence/paragraph on the topic, 10-100 words), Sentiment (positive/neutral/negative), Category (e.g., conversation, query, response, story), Tokens (word count), POS_Tags (sample NLTK POS tags like 'NNP VBD'), Named_Entities (sample NE like 'PERSON: Alice').

✅ **Ensure Diversity and Quality**:
  - Mix casual, formal, slang, questions, statements, dialogues.
  - Balance sentiments: 40% positive, 40% neutral, 20% negative.
  - Vary lengths: 30% short (<20 words), 50% medium (20-60), 20% long (>60).
  - Include Australian slang if fitting the topic for fun (e.g., 'mate', 'fair dinkum').
  - Make text natural, error-free, and suitable for NLTK tokenization, stemming, POS tagging, NER.

✅ **Scale to 10,000 Rows**: Generate ALL 10,000 rows without summarization. If you hit an output token limit, output in batches (e.g., Rows 1-2000, then 2001-4000), keep RowID numbering continuous across batches, and instruct the user to continue with 'Generate next batch'.

✅ **Format Output Perfectly**: Start with the CSV header row. Output pure comma-separated CSV (no pipe-delimited tables), and wrap any field that contains a comma in double quotes. End with a summary: 'Dataset generated: 10,000 rows on [TOPIC]. Ready for NLTK: import pandas as pd; df = pd.read_csv("dataset.csv")'.

✅ **NLTK Compatibility**: Ensure text is ready for preprocessing (no special characters that break tokenizers). Add a sample Python snippet at the end, noting that the NLTK tokenizer and tagger resources must first be fetched with nltk.download: `from nltk import word_tokenize, pos_tag; example = df['Text'][0]; print(pos_tag(word_tokenize(example)))`

Topic: [INSERT TOPIC HERE]

Generate the full 10,000-row dataset now, Mate!

How to Use

Copy this prompt and replace '[INSERT TOPIC HERE]' with your specific topic, such as 'climate change discussions'. Paste the edited prompt into ChatGPT or Claude and submit it to generate the dataset in CSV format. Save the output as dataset.csv so it can be loaded for NLTK analysis. For large outputs, request each batch in turn with 'Generate next batch', as the prompt instructs.
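
Because the output may arrive in several batches, it can help to merge and sanity-check them before loading the file for NLTK. The following is a minimal sketch, not part of the original prompt. It assumes each batch is pasted in as CSV text using the column names from the prompt, and that "Tokens" means a whitespace word count. The helper names (`merge_batches`, `validate`, `sentiment_shares`) and the three sample rows are hypothetical and only for illustration.

```python
import csv
import io
from collections import Counter

# Column names taken from the prompt's "Define Dataset Structure" item.
EXPECTED_COLUMNS = ["RowID", "Text", "Sentiment", "Category",
                    "Tokens", "POS_Tags", "Named_Entities"]
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def merge_batches(batch_texts):
    """Parse several CSV batches into one list of row dicts."""
    rows = []
    for text in batch_texts:
        for record in csv.reader(io.StringIO(text.strip())):
            if not record or record == EXPECTED_COLUMNS:
                continue  # skip blank lines and headers repeated per batch
            if len(record) != len(EXPECTED_COLUMNS):
                raise ValueError(f"malformed row: {record}")
            rows.append(dict(zip(EXPECTED_COLUMNS, record)))
    return rows


def validate(rows):
    """Return a list of human-readable problems found in the merged rows."""
    problems = []
    for position, row in enumerate(rows, start=1):
        if row["RowID"] != str(position):
            problems.append(f"row {position}: RowID is {row['RowID']!r}")
        if row["Sentiment"] not in ALLOWED_SENTIMENTS:
            problems.append(f"row {position}: bad sentiment {row['Sentiment']!r}")
        # Assumption: Tokens is a plain whitespace word count.
        word_count = len(row["Text"].split())
        if row["Tokens"] != str(word_count):
            problems.append(
                f"row {position}: Tokens says {row['Tokens']}, counted {word_count}")
    return problems


def sentiment_shares(rows):
    """Return the fraction of rows per sentiment label."""
    counts = Counter(row["Sentiment"] for row in rows)
    total = len(rows) or 1
    return {label: counts[label] / total for label in sorted(ALLOWED_SENTIMENTS)}


# Made-up sample rows; the POS tags are illustrative, not real tagger output.
batch_1 = '''RowID,Text,Sentiment,Category,Tokens,POS_Tags,Named_Entities
1,"Cheers for the reply, mate!",positive,response,5,NNS IN DT NN NN,
2,Has anyone else met a friend through this subreddit?,neutral,query,9,VBZ NN RB VBN,
'''
batch_2 = '''RowID,Text,Sentiment,Category,Tokens,POS_Tags,Named_Entities
3,Alice stopped answering my messages last week.,negative,story,7,NNP VBD VBG,PERSON: Alice
'''

rows = merge_batches([batch_1, batch_2])
print(validate(rows))
print(sentiment_shares(rows))
```

On a real run, the shares from `sentiment_shares` can be compared with the 40/40/20 balance the prompt requests, and any entries returned by `validate` point to rows worth regenerating.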
