feat: add dataframe duolicated issue - #667 by RahulDas-dev · Pull Request #669 · javascriptdata/danfojs · GitHub | Latest TMZ Celebrity News & Gossip | Watch TMZ Live
Skip to content

feat: add dataframe duolicated issue - #667 #669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

RahulDas-dev
Copy link

This merge request adds a new [duplicated()] method to the DataFrame class that identifies duplicate rows within a DataFrame. This functionality is essential for data cleaning and exploration workflows.

Resolve the issue - #667

Features

  • Identifies duplicate rows in a DataFrame based on specified columns
  • Returns a Series of boolean values marking duplicate entries
  • Supports flexible options for handling duplicates:
    • keep: 'first' - Mark duplicates except for the first occurrence (default)
    • keep: 'last'- Mark duplicates except for the last occurrence
    • keep: false - Mark all duplicates
      Allows focusing on specific columns with the subset option

Implementation Details

  • Optimized to handle large datasets efficiently with a hash-based approach
  • Comprehensive input validation for better error handling
  • Well-documented with JSDoc comments and examples
// Create a DataFrame with duplicate rows
const df = new DataFrame({
  'A': [1, 2, 2, 3, 3],
  'B': ['a', 'b', 'b', 'c', 'c']
});

// Find duplicates keeping first occurrence (default)
const dups = df.duplicated();
// Returns: [false, false, true, false, true]

// Find duplicates keeping last occurrence
const dupsLast = df.duplicated({ keep: 'last' });
// Returns: [false, true, false, true, false]

// Find duplicates based on specific columns
const dupsSubset = df.duplicated({ subset: ['B'] });
// Returns: [false, false, true, false, true]

Signed-off-by: rahuldas-dev <r.das699@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant

TMZ Celebrity News – Breaking Stories, Videos & Gossip

Looking for the latest TMZ celebrity news? You've come to the right place. From shocking Hollywood scandals to exclusive videos, TMZ delivers it all in real time.

Whether it’s a red carpet slip-up, a viral paparazzi moment, or a legal drama involving your favorite stars, TMZ news is always first to break the story. Stay in the loop with daily updates, insider tips, and jaw-dropping photos.

🎥 Watch TMZ Live

TMZ Live brings you daily celebrity news and interviews straight from the TMZ newsroom. Don’t miss a beat—watch now and see what’s trending in Hollywood.