diff --git a/README.md b/README.md index 82e496d76..1b9d4dc40 100644 --- a/README.md +++ b/README.md @@ -70,6 +70,7 @@ Please share your story by answering 1 quick question * Variable Creation * Variable Selection * Datetime Features +* Text Features * Time Series * Preprocessing * Scaling @@ -146,6 +147,9 @@ Please share your story by answering 1 quick question * DatetimeFeatures * DatetimeSubtraction * DatetimeOrdinal + +### Text Features + * TextFeatures ### Time Series * LagFeatures diff --git a/docs/api_doc/index.rst b/docs/api_doc/index.rst index 4e09a1a31..2a11913fc 100644 --- a/docs/api_doc/index.rst +++ b/docs/api_doc/index.rst @@ -25,6 +25,7 @@ Creation creation/index datetime/index + text/index Selection diff --git a/docs/api_doc/text/TextFeatures.rst b/docs/api_doc/text/TextFeatures.rst new file mode 100644 index 000000000..7b2b4f76f --- /dev/null +++ b/docs/api_doc/text/TextFeatures.rst @@ -0,0 +1,6 @@ +TextFeatures +============ + +.. autoclass:: feature_engine.text.TextFeatures + :members: + diff --git a/docs/api_doc/text/index.rst b/docs/api_doc/text/index.rst new file mode 100644 index 000000000..f87392fdd --- /dev/null +++ b/docs/api_doc/text/index.rst @@ -0,0 +1,13 @@ +.. -*- mode: rst -*- + +Text Features +============= + +Feature-engine's text transformers extract numerical features from text/string +variables. + +.. 
toctree:: + :maxdepth: 1 + + TextFeatures + diff --git a/docs/index.rst b/docs/index.rst index a04f8d4bb..371827505 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -65,6 +65,7 @@ Feature-engine includes transformers for: - Creation of new features - Feature selection - Datetime features +- Text features - Time series - Preprocessing - Scaling @@ -260,6 +261,11 @@ extract many new features from the date and time parts of the datetime variable: - :doc:`api_doc/datetime/DatetimeSubtraction`: computes subtractions between datetime variables - :doc:`api_doc/datetime/DatetimeOrdinal`: converts datetime variables into ordinal numbers +Text: +~~~~~ + +- :doc:`api_doc/text/TextFeatures`: extracts numerical features from text/string variables + Feature Selection: ~~~~~~~~~~~~~~~~~~ diff --git a/docs/user_guide/index.rst b/docs/user_guide/index.rst index c786e77e1..52c33a8f4 100644 --- a/docs/user_guide/index.rst +++ b/docs/user_guide/index.rst @@ -28,6 +28,7 @@ Creation creation/index datetime/index + text/index Selection diff --git a/docs/user_guide/text/TextFeatures.rst b/docs/user_guide/text/TextFeatures.rst new file mode 100644 index 000000000..84d1b4e22 --- /dev/null +++ b/docs/user_guide/text/TextFeatures.rst @@ -0,0 +1,365 @@ +.. _text_features: + +.. currentmodule:: feature_engine.text + +Extracting Features from Text +============================= + +Short pieces of text are often found among the variables in our datasets. For example, +in insurance, a text variable can describe the circumstances of an accident. Customer +feedback is also stored as a text variable. + +While text data as such can't be used to train machine learning models, we can extract +a lot of numerical information from these texts, which can provide predictive features +to train machine learning models. + +Feature-engine allows you to quickly extract numerical features from short pieces of +text, to complement your predictive models. 
These features aim to capture a piece of
+text's complexity by looking at some statistical parameters of the text, such as the
+text length, the average word length, the number of words and unique words used, the
+number of sentences, and so on.
+
+:class:`TextFeatures()` extracts many numerical features from text out-of-the-box.
+
+TextFeatures
+------------
+
+:class:`TextFeatures()` extracts numerical features from text/string variables.
+This transformer is useful for extracting basic text statistics that can be used
+as features in machine learning models. Users must explicitly specify which columns
+contain text data via the `variables` parameter.
+
+Unlike scikit-learn's CountVectorizer or TfidfVectorizer, which create sparse matrices,
+:class:`TextFeatures()` extracts metadata features that remain in DataFrame format and
+can be easily combined with other Feature-engine or scikit-learn transformers in a
+pipeline.
+
+Text Features
+-------------
+
+:class:`TextFeatures()` can extract the following features from a text piece:
+
+- **char_count**: Number of characters in the text, excluding whitespace
+- **word_count**: Number of words (whitespace-separated tokens)
+- **sentence_count**: Number of sentences (based on .!?
punctuation)
+- **avg_word_length**: Average word length (length of the stripped text, including
+  internal spaces and punctuation, divided by the word count)
+- **digit_count**: Number of digit characters
+- **letter_count**: Number of alphabetic characters (a-z, A-Z)
+- **uppercase_count**: Number of uppercase letters
+- **lowercase_count**: Number of lowercase letters
+- **special_char_count**: Number of special characters (non-alphanumeric)
+- **whitespace_count**: Number of whitespace characters
+- **whitespace_ratio**: Ratio of whitespace to total characters
+- **digit_ratio**: Ratio of digits to non-whitespace characters
+- **uppercase_ratio**: Ratio of uppercase letters to non-whitespace characters
+- **has_digits**: Binary indicator if text contains digits
+- **has_uppercase**: Binary indicator if text contains uppercase
+- **is_empty**: Binary indicator if text is empty
+- **starts_with_uppercase**: Binary indicator if text starts with uppercase
+- **ends_with_punctuation**: Binary indicator if text ends with .!?
+- **unique_word_count**: Number of unique words (case-insensitive)
+- **lexical_diversity**: Ratio of total words to unique words (1.0 means no word is
+  repeated; higher values indicate more repetition)
+
+The **number of sentences** is inferred by :class:`TextFeatures()` by counting blocks of
+sentence-ending punctuation (., !, ?) as a proxy for sentence boundaries. This means that
+multiple consecutive punctuation marks (e.g., "!!!" or "??") are counted as a single
+sentence ending, which avoids overestimating the count in emphatic text.
+
+However, this is still a simple heuristic. It won't handle edge cases like abbreviations
+(e.g., 'Dr.', 'U.S.', 'e.g.', 'i.e.'), which are counted as sentence endings and inflate
+the count, or text without sentence-ending punctuation, which returns a count of 0.
+
+The features **number of unique words** and **lexical diversity** are intended to
+capture the complexity of the text. Simpler texts have few unique words and tend to
+repeat them. More complex texts use a wider array of words and tend not to repeat them.
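To make these two statistics concrete, here is a small standalone sketch in plain pandas, mirroring the string operations the transformer applies internally:

```python
import pandas as pd

# Two short texts: one repeats a word, one does not.
s = pd.Series([
    "Not great. Would not recommend.",
    "OK for the price.",
])

# Words are whitespace-separated tokens; trailing punctuation stays
# attached to the token (so 'great.' and 'great' would differ).
word_count = s.str.strip().str.split().str.len()

# Unique words are counted case-insensitively ("Not" and "not" match).
unique_word_count = s.str.lower().str.split().apply(set).str.len()

# Lexical diversity as implemented by the transformer relates total words
# to unique words: 1.0 means no repetition, larger means more repetition.
lexical_diversity = word_count / unique_word_count

print(word_count.tolist())         # [5, 4]
print(unique_word_count.tolist())  # [4, 4]
print(lexical_diversity.tolist())  # [1.25, 1.0]
```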
+Hence, more complex texts show a greater number of unique words and a lexical
+diversity ratio close to 1, whereas repetitive texts show higher ratios.
+
+Handling missing values
+-----------------------
+
+By default, :class:`TextFeatures()` ignores missing values (`missing_values='ignore'`):
+NaNs are treated as empty strings, and the numerical features are calculated
+accordingly (e.g., the word count and character count will be 0), as shown in the
+following example. If you prefer the transformer to raise an error when it encounters
+missing data, set the parameter to `'raise'` instead.
+
+.. code:: python
+
+    import pandas as pd
+    import numpy as np
+    from feature_engine.text import TextFeatures
+
+    # Create sample data with NaN
+    X = pd.DataFrame({
+        'text': ['Hello', np.nan, 'World']
+    })
+
+    # Set up the transformer (defaults to ignore missing values)
+    tf = TextFeatures(
+        variables=['text'],
+        features=['char_count']
+    )
+
+    # Transform
+    X_transformed = tf.fit_transform(X)
+
+    print(X_transformed)
+
+In the resulting dataframe, we see that the row with NaN returned 0 in the character
+count:
+
+.. code-block:: none
+
+        text  text_char_count
+    0  Hello                5
+    1    NaN                0
+    2  World                5
+
+Python demo
+-----------
+
+In this section, we'll show how to use :class:`TextFeatures()`.
+Let's create a dataframe with text data:
+
+.. code:: python
+
+    import pandas as pd
+    from feature_engine.text import TextFeatures
+
+    # Create sample data
+    X = pd.DataFrame({
+        'review': [
+            'This product is AMAZING! Best purchase ever.',
+            'Not great. Would not recommend.',
+            'OK for the price. 3 out of 5 stars.',
+            'TERRIBLE!!! DO NOT BUY!',
+        ],
+        'title': [
+            'Great Product',
+            'Disappointed',
+            'Average',
+            'Awful',
+        ]
+    })
+
+    print(X)
+
+The input dataframe looks like this:
+
+.. code-block:: none
+
+                                             review          title
+    0  This product is AMAZING! Best purchase ever.  Great Product
+    1               Not great. Would not recommend.   
Disappointed
+    2           OK for the price. 3 out of 5 stars.        Average
+    3                       TERRIBLE!!! DO NOT BUY!          Awful
+
+Now let's extract 5 specific text features: the number of words, the number of
+characters, the number of sentences, whether the text has digits, and the ratio of
+uppercase letters to non-whitespace characters:
+
+.. code:: python
+
+    # Set up the transformer with specific features
+    tf = TextFeatures(
+        variables=['review'],
+        features=[
+            'word_count',
+            'char_count',
+            'sentence_count',
+            'has_digits',
+            'uppercase_ratio',
+        ])
+
+    # Fit and transform
+    X_transformed = tf.fit_transform(X)
+
+    print(X_transformed)
+
+In the following output, we see the resulting dataframe containing the numerical
+features extracted from the pieces of text:
+
+.. code-block:: none
+
+                                             review          title  review_word_count  review_char_count
+    0  This product is AMAZING! Best purchase ever.  Great Product                  7                 38
+    1               Not great. Would not recommend.   Disappointed                  5                 27
+    2           OK for the price. 3 out of 5 stars.        Average                  9                 27
+    3                       TERRIBLE!!! DO NOT BUY!          Awful                  4                 20
+
+       review_sentence_count  review_has_digits  review_uppercase_ratio
+    0                      2                  0                0.236842
+    1                      2                  0                0.074074
+    2                      2                  1                0.074074
+    3                      2                  0                0.800000
+
+Extracting all features
+~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, if no text features are specified, all available features will be extracted:
+
+.. code:: python
+
+    # Extract all features from a single text column
+    tf = TextFeatures(variables=['review'])
+    X_transformed = tf.fit_transform(X)
+
+    print(X_transformed.head())
+
+The output dataframe contains all 20 text features extracted from the `review` column:
+
+.. code-block:: none
+
+                                             review          title  review_char_count  review_word_count
+    0  This product is AMAZING! Best purchase ever.  Great Product                 38                  7
+    1               Not great. Would not recommend.   Disappointed                 27                  5
+    2           OK for the price. 3 out of 5 stars.        Average                 27                  9
+    3                       TERRIBLE!!! DO NOT BUY!          
Awful 20 4 + + review_sentence_count review_avg_word_length review_digit_count review_letter_count + 0 2 6.285714 0 36 + 1 2 6.200000 0 25 + 2 2 3.888889 2 23 + 3 2 5.750000 0 16 + + review_uppercase_count review_lowercase_count review_special_char_count review_whitespace_count + 0 9 27 2 6 + 1 2 23 2 4 + 2 2 21 2 8 + 3 16 0 4 3 + + review_whitespace_ratio review_digit_ratio review_uppercase_ratio review_has_digits + 0 0.136364 0.000000 0.236842 0 + 1 0.129032 0.000000 0.074074 0 + 2 0.228571 0.074074 0.074074 1 + 3 0.130435 0.000000 0.800000 0 + + review_has_uppercase review_is_empty review_starts_with_uppercase review_ends_with_punctuation + 0 1 0 1 1 + 1 1 0 1 1 + 2 1 0 1 1 + 3 1 0 1 1 + + review_unique_word_count review_lexical_diversity + 0 7 1.0 + 1 4 1.25 + 2 9 1.0 + 3 4 1.0 + +Dropping original columns +~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can drop the original text columns after extracting features, by setting the +parameter `drop_original` to `True`: + +.. code:: python + + tf = TextFeatures( + variables=['review'], + features=['word_count', 'char_count'], + drop_original=True + ) + + X_transformed = tf.fit_transform(X) + + print(X_transformed) + +The original `'review'` column has been removed, and only the `'title'` column and the +extracted features remain: + +.. code-block:: none + + title review_word_count review_char_count + 0 Great Product 7 38 + 1 Disappointed 5 27 + 2 Average 9 27 + 3 Awful 4 20 + +Combining with scikit-learn Bag-of-Words +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In most NLP tasks, it is common to use bag-of-words (e.g., `CountVectorizer`) or TF-IDF +(e.g., `TfidfVectorizer`) to represent the text. :class:`TextFeatures()` can be used +alongside these transformers to provide additional metadata that might improve model +performance. + +In the following example, we compare a baseline model using only TF-IDF with a model +that combines TF-IDF and :class:`TextFeatures()` metadata: + +.. 
code:: python + + import pandas as pd + from sklearn.datasets import fetch_20newsgroups + from sklearn.model_selection import train_test_split + + from feature_engine.text import TextFeatures + + # Load and split data + data = fetch_20newsgroups(subset='train', categories=['sci.space', 'rec.sport.hockey']) + df = pd.DataFrame({'text': data.data, 'target': data.target}) + X_train, X_test, y_train, y_test = train_test_split( + df[['text']], df['target'], test_size=0.3, random_state=42 + ) + + print(X_train.head()) + +The input dataframe contains the raw text of newsgroup posts: + +.. code-block:: none + + text + 562 From: xxx@yyy.zzz (John Smith)\nSubject: Re:... + 459 From: aaa@bbb.ccc (Jane Doe)\nSubject: Shutt... + 21 From: ddd@eee.fff\nSubject: Space Station Fr... + 892 From: ggg@hhh.iii\nSubject: NHL Scores\nOrga... + 317 From: jjj@kkk.lll (Bob Wilson)\nSubject: Re:... + +Now let's set up two pipelines to compare a baseline model using only TF-IDF with a +model that combines TF-IDF and :class:`TextFeatures()` metadata: + +.. code:: python + + from sklearn.pipeline import Pipeline + from sklearn.feature_extraction.text import TfidfVectorizer + from sklearn.compose import ColumnTransformer + from sklearn.linear_model import LogisticRegression + from sklearn.preprocessing import StandardScaler + + # 1. Baseline: TF-IDF only + tfidf_pipe = Pipeline([ + ('vec', ColumnTransformer([ + ('tfidf', TfidfVectorizer(max_features=500), 'text') + ])), + ('clf', LogisticRegression()) + ]) + tfidf_pipe.fit(X_train, y_train) + print(f"TF-IDF Accuracy: {tfidf_pipe.score(X_test, y_test):.3f}") + + # 2. 
Combined: TextFeatures + TF-IDF
+    combined_pipe = Pipeline([
+        ('features', ColumnTransformer([
+            # drop the raw text column so only numeric metadata is passed on;
+            # the list selector hands the transformer a DataFrame, not a Series
+            ('text_meta', TextFeatures(variables=['text'], drop_original=True), ['text']),
+            ('tfidf', TfidfVectorizer(max_features=500), 'text')
+        ])),
+        # with_mean=False: the stacked output is sparse and cannot be centered
+        ('scaler', StandardScaler(with_mean=False)),
+        ('clf', LogisticRegression())
+    ])
+    combined_pipe.fit(X_train, y_train)
+    print(f"Combined Accuracy: {combined_pipe.score(X_test, y_test):.3f}")
+
+Below we see the accuracy of a model trained using only the bag of words, compared
+with a model trained using both the bag of words and the additional metadata:
+
+.. code-block:: none
+
+    TF-IDF Accuracy: 0.957
+    Combined Accuracy: 0.963
+
+By adding statistical metadata through :class:`TextFeatures()`, we provided the model
+with information about text length, complexity, and style that is not explicitly
+captured by a word-count-based approach like TF-IDF, leading to a small but noticeable
+improvement in performance.
diff --git a/docs/user_guide/text/index.rst b/docs/user_guide/text/index.rst
new file mode 100644
index 000000000..0a7ce55bb
--- /dev/null
+++ b/docs/user_guide/text/index.rst
@@ -0,0 +1,18 @@
+.. -*- mode: rst -*-
+
+Text Feature Extraction
+=======================
+
+Feature-engine's text module includes transformers to extract numerical features
+from text/string variables.
+
+Text feature extraction is useful for machine learning problems where you have
+text data but want to derive numerical statistics without, or in addition to,
+creating sparse bag-of-words or TF-IDF representations.
+
+**Transformers**
+
+.. toctree::
+   :maxdepth: 1
+
+   TextFeatures
diff --git a/feature_engine/text/__init__.py b/feature_engine/text/__init__.py
new file mode 100644
index 000000000..14626b79c
--- /dev/null
+++ b/feature_engine/text/__init__.py
@@ -0,0 +1,9 @@
+"""
+The module text includes classes to extract features from text/string variables.
+""" + +from .text_features import TextFeatures + +__all__ = [ + "TextFeatures", +] diff --git a/feature_engine/text/text_features.py b/feature_engine/text/text_features.py new file mode 100644 index 000000000..8299863ff --- /dev/null +++ b/feature_engine/text/text_features.py @@ -0,0 +1,332 @@ +# Authors: Ankit Hemant Lade (contributor) +# License: BSD 3 clause +from typing import List, Optional, Union, cast + +import pandas as pd +from sklearn.base import BaseEstimator, TransformerMixin +from sklearn.utils.validation import check_is_fitted + +from feature_engine._base_transformers.mixins import GetFeatureNamesOutMixin +from feature_engine._check_init_parameters.check_init_input_params import ( + _check_param_drop_original, + _check_param_missing_values, +) +from feature_engine.dataframe_checks import ( + _check_optional_contains_na, + _check_X_matches_training_df, + check_X, +) + +# Available text features and their computation functions +TEXT_FEATURES = { + "char_count": lambda x: x.str.replace(r"\s+", "", regex=True).str.len(), + "word_count": lambda x: x.str.strip().str.split().str.len(), + "sentence_count": lambda x: x.str.count(r"[.!?]+"), + "avg_word_length": lambda x: x.str.strip().str.len() + / x.str.strip().str.split().str.len(), + "digit_count": lambda x: x.str.count(r"\d"), + "letter_count": lambda x: x.str.count(r"[a-zA-Z]"), + "uppercase_count": lambda x: x.str.count(r"[A-Z]"), + "lowercase_count": lambda x: x.str.count(r"[a-z]"), + "special_char_count": lambda x: x.str.count(r"[^a-zA-Z0-9\s]"), + "whitespace_count": lambda x: x.str.count(r"\s"), + "whitespace_ratio": lambda x: x.str.count(r"\s") / x.str.len().replace(0, 1), + "digit_ratio": lambda x: x.str.count(r"\d") + / x.str.replace(r"\s+", "", regex=True).str.len().replace(0, 1), + "uppercase_ratio": lambda x: x.str.count(r"[A-Z]") + / x.str.replace(r"\s+", "", regex=True).str.len().replace(0, 1), + "has_digits": lambda x: x.str.contains(r"\d", regex=True).astype(int), + "has_uppercase": lambda 
x: x.str.contains(r"[A-Z]", regex=True).astype(int),
+    "is_empty": lambda x: (x.str.len() == 0).astype(int),
+    "starts_with_uppercase": lambda x: x.str.match(r"^[A-Z]").astype(int),
+    "ends_with_punctuation": lambda x: x.str.match(r".*[.!?]$").astype(int),
+    "unique_word_count": lambda x: (x.str.lower().str.split().apply(set).str.len()),
+    "lexical_diversity": lambda x: x.str.strip().str.split().str.len()
+    / x.str.lower().str.split().apply(set).str.len(),
+}
+
+
+class TextFeatures(TransformerMixin, BaseEstimator, GetFeatureNamesOutMixin):
+    """
+    TextFeatures() extracts numerical features from text/string variables. This
+    transformer is useful for extracting basic text statistics that can be used
+    as features in machine learning models.
+
+    A list with the text variables must be passed as an argument.
+
+    More details in the :ref:`User Guide <text_features>`.
+
+    Parameters
+    ----------
+    variables: string, list
+        The list of text/string variables to extract features from.
+
+    features: list, default=None
+        List of text features to extract. Available features are:
+
+        - 'char_count': Number of characters, excluding whitespace
+        - 'word_count': Number of words (whitespace-separated tokens)
+        - 'sentence_count': Number of sentences (based on .!? punctuation)
+        - 'avg_word_length': Average word length (stripped text length divided
+          by word count)
+        - 'digit_count': Number of digit characters
+        - 'letter_count': Number of alphabetic characters (a-z, A-Z)
+        - 'uppercase_count': Number of uppercase letters
+        - 'lowercase_count': Number of lowercase letters
+        - 'special_char_count': Number of special characters (non-alphanumeric)
+        - 'whitespace_count': Number of whitespace characters
+        - 'whitespace_ratio': Ratio of whitespace to total characters
+        - 'digit_ratio': Ratio of digits to non-whitespace characters
+        - 'uppercase_ratio': Ratio of uppercase letters to non-whitespace characters
+        - 'has_digits': Binary indicator if text contains digits
+        - 'has_uppercase': Binary indicator if text contains uppercase
+        - 'is_empty': Binary indicator if text is empty
+        - 'starts_with_uppercase': Binary indicator if text starts with uppercase
+        - 'ends_with_punctuation': Binary indicator if text ends with .!?
+        - 'unique_word_count': Number of unique words (case-insensitive)
+        - 'lexical_diversity': Ratio of total words to unique words
+
+        If None, extracts all available features.
+
+    missing_values: string, default='ignore'
+        If 'ignore', NaNs will be filled with an empty string before feature
+        extraction. If 'raise', the transformer will raise an error if missing data
+        is found.
+
+    drop_original: bool, default=False
+        Whether to drop the original text columns after transformation.
+
+    Attributes
+    ----------
+    variables_:
+        The list of text variables that will be transformed.
+
+    features_:
+        The list of features that will be extracted.
+
+    feature_names_in_:
+        List with the names of features seen during fit.
+
+    n_features_in_:
+        The number of features in the train set used in fit.
+
+    Methods
+    -------
+    fit:
+        This transformer does not learn parameters.
+
+    fit_transform:
+        Fit to data, then transform it.
+
+    transform:
+        Extract text features and add them to the dataframe.
+
+    get_feature_names_out:
+        Get output feature names for transformation.
+ + See Also + -------- + feature_engine.encoding.StringSimilarityEncoder : + Encodes categorical variables based on string similarity. + + Examples + -------- + + >>> import pandas as pd + >>> from feature_engine.text import TextFeatures + >>> X = pd.DataFrame({ + ... 'text': ['Hello World!', 'Python is GREAT.', 'ML rocks 123'] + ... }) + >>> tf = TextFeatures( + ... variables=['text'], + ... features=['char_count', 'word_count', 'has_digits'] + ... ) + >>> tf.fit(X) + TextFeatures(features=['char_count', 'word_count', 'has_digits'], + variables=['text']) + >>> X = tf.transform(X) + >>> pd.options.display.max_columns = 10 + >>> print(X) + text text_char_count text_word_count text_has_digits + 0 Hello World! 11 2 0 + 1 Python is GREAT. 14 3 0 + 2 ML rocks 123 10 3 1 + """ + + def __init__( + self, + variables: Union[str, List[str]], + features: Optional[List[str]] = None, + missing_values: str = "ignore", + drop_original: bool = False, + ) -> None: + + # Validate variables + if isinstance(variables, str): + variables = [variables] + if not isinstance(variables, list) or not all( + isinstance(v, str) for v in variables + ): + raise ValueError( + "variables must be a string or a list of strings. " + f"Got {type(variables).__name__} instead." + ) + + # Validate features + if features is not None: + if not isinstance(features, list) or not all( + isinstance(f, str) for f in features + ): + raise ValueError( + "features must be None or a list of strings. " + f"Got {type(features).__name__} instead." + ) + invalid_features = set(features) - set(TEXT_FEATURES.keys()) + if invalid_features: + raise ValueError( + f"Invalid features: {invalid_features}. 
" + f"Available features are: {list(TEXT_FEATURES.keys())}" + ) + + _check_param_drop_original(drop_original) + _check_param_missing_values(missing_values) + + self.variables = variables + self.features = features + self.missing_values = missing_values + self.drop_original = drop_original + + def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None): + """ + This transformer does not learn any parameters. + + Parameters + ---------- + X: pandas dataframe of shape = [n_samples, n_features] + The training input samples. Can be the entire dataframe, not just the + variables to transform. + + y: pandas Series, or np.array. Defaults to None. + The target. It is not needed in this transformer. You can pass y or None. + """ + + # check input dataframe + X = check_X(X) + + # Validate user-specified variables exist + missing = set(self.variables) - set(X.columns) + if missing: + raise ValueError(f"Variables {missing} are not present in the dataframe.") + + # Validate that the variables are object or string + non_text = [ + col + for col in self.variables + if not ( + pd.api.types.is_string_dtype(X[col]) + or pd.api.types.is_object_dtype(X[col]) + ) + ] + if non_text: + raise ValueError( + f"Variables {non_text} are not object or string. " + "Please provide text variables only." + ) + + self.variables_ = self.variables + + # check if dataset contains na + if self.missing_values == "raise": + _check_optional_contains_na(X, cast(list[Union[str, int]], self.variables_)) + + # Set features to extract + if self.features is None: + self.features_ = list(TEXT_FEATURES.keys()) + else: + self.features_ = self.features + + # save input features + self.feature_names_in_ = X.columns.tolist() + + # save train set shape + self.n_features_in_ = X.shape[1] + + return self + + def transform(self, X: pd.DataFrame) -> pd.DataFrame: + """ + Extract text features and add them to the dataframe. 
+ + Parameters + ---------- + X: pandas dataframe of shape = [n_samples, n_features] + The data to transform. + + Returns + ------- + X_new: Pandas dataframe + The dataframe with the original columns plus the new text features. + """ + + # Check method fit has been called + check_is_fitted(self) + + # check that input is a dataframe + X = check_X(X) + + # Check if input data contains same number of columns as dataframe used to fit. + _check_X_matches_training_df(X, self.n_features_in_) + + # check if dataset contains na + if self.missing_values == "raise": + _check_optional_contains_na(X, cast(list[Union[str, int]], self.variables_)) + else: + X[self.variables_] = X[self.variables_].fillna("") + + # reorder variables to match train set + X = X[self.feature_names_in_] + + # Extract features for each text variable + for var in self.variables_: + for feature_name in self.features_: + new_col_name = f"{var}_{feature_name}" + feature_func = TEXT_FEATURES[feature_name] + X[new_col_name] = feature_func(X[var]) + + # Fill any NaN values resulting from computation with 0 + X[new_col_name] = X[new_col_name].fillna(0) + + if self.drop_original: + X = X.drop(columns=self.variables_) + + return X + + def get_feature_names_out(self, input_features=None) -> List[str]: + """ + Get output feature names for transformation. + + Parameters + ---------- + input_features : array-like of str or None, default=None + Input features. If ``None``, uses ``feature_names_in_``. + + Returns + ------- + feature_names_out : list of str + Output feature names. 
+ """ + check_is_fitted(self) + + # Start with original features + if self.drop_original: + feature_names = [ + f for f in self.feature_names_in_ if f not in self.variables_ + ] + else: + feature_names = list(self.feature_names_in_) + + # Add new text feature names + for var in self.variables_: + for feature_name in self.features_: + feature_names.append(f"{var}_{feature_name}") + + return feature_names diff --git a/tests/test_text/__init__.py b/tests/test_text/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/tests/test_text/test_text_features.py b/tests/test_text/test_text_features.py new file mode 100644 index 000000000..ba906885f --- /dev/null +++ b/tests/test_text/test_text_features.py @@ -0,0 +1,780 @@ +import pandas as pd +import pytest + +from feature_engine.text import TextFeatures +from feature_engine.text.text_features import TEXT_FEATURES + +# ============================================================================== +# INIT TESTS +# ============================================================================== + + +@pytest.mark.parametrize( + "invalid_variables", + [ + 123, + True, + [1, 2], + ["text", 123], + {"text": 1}, + ], +) +def test_invalid_variables_raises_error(invalid_variables): + with pytest.raises(ValueError, match="variables must be a string or a list of"): + TextFeatures(variables=invalid_variables) + + +@pytest.mark.parametrize( + "invalid_features, err_msg", + [ + ("some_string", "features must be"), + ([1, 2], "features must be"), + (123, "features must be"), + (True, "features must be"), + (["some_string", True], "features must be"), + ({"some_string": 1}, "features must be"), + (["invalid_feature"], "Invalid features"), + (["char_count", "invalid_feature"], "Invalid features"), + ], +) +def test_invalid_features_raises_error(invalid_features, err_msg): + with pytest.raises(ValueError, match=err_msg): + TextFeatures(variables=["text"], features=invalid_features) + + +# 
============================================================================== +# FIT TESTS +# ============================================================================== + + +@pytest.mark.parametrize( + "variables, features", + [ + ("text", None), + (["string"], ["char_count"]), + (["text", "string"], ["sentence_count", "avg_word_length"]), + ], +) +def test_fit_stores_attributes(variables, features): + X = pd.DataFrame({"text": ["Hello"], "string": ["Bye"]}) + transformer = TextFeatures(variables=variables, features=features) + transformer.fit(X) + + assert ( + transformer.variables_ == variables + if isinstance(variables, list) + else transformer.variables_ == [variables] + ) + assert ( + transformer.features_ == list(TEXT_FEATURES.keys()) + if features is None + else transformer.features_ == features + ) + assert transformer.feature_names_in_ == ["text", "string"] + assert transformer.n_features_in_ == 2 + + +def test_missing_variable_raises_error(): + X = pd.DataFrame({"text": ["Hello"]}) + transformer = TextFeatures(variables=["nonexistent"]) + with pytest.raises(ValueError, match="not present in the dataframe"): + transformer.fit(X) + + +@pytest.mark.parametrize("variables", ["Age", "Marks", "dob"]) +def test_no_text_columns_raises_error(df_vartypes, variables): + transformer = TextFeatures(variables=variables) + with pytest.raises(ValueError, match="not object or string"): + transformer.fit(df_vartypes) + + +def test_nan_handling_raise_error_fit(df_na): + transformer = TextFeatures( + variables=["City"], features=["char_count"], missing_values="raise" + ) + msg = "`missing_values='ignore'` when initialising this transformer" + with pytest.raises(ValueError, match=msg): + transformer.fit(df_na) + + +# ============================================================================== +# TRANSFORM TESTS - GENERAL +# ============================================================================== + + +def test_transform_on_new_data(): + X_train = 
pd.DataFrame({"text": ["Hello World", "Foo Bar"]}) + X_test = pd.DataFrame({"text": ["New Data", "Test 123"]}) + + transformer = TextFeatures( + variables=["text"], features=["char_count", "has_digits"] + ) + transformer.fit(X_train) + X_tr = transformer.transform(X_test) + + assert X_tr["text_char_count"].tolist() == [7, 7] + assert X_tr["text_has_digits"].tolist() == [0, 1] + + +def test_nan_handling_raise_error_transform(): + X_train = pd.DataFrame({"text": ["Hello", "World"]}) + X_test = pd.DataFrame({"text": ["Hello", None, "World"]}) + transformer = TextFeatures( + variables=["text"], features=["char_count"], missing_values="raise" + ) + transformer.fit(X_train) + msg = "`missing_values='ignore'` when initialising this transformer" + with pytest.raises(ValueError, match=msg): + transformer.transform(X_test) + + +def test_nan_handling(): + X = pd.DataFrame({"text": ["Hello", None, "World"]}) + transformer = TextFeatures(variables=["text"], features=["char_count"]) + X_tr = transformer.fit_transform(X) + + # NaN should be filled with empty string, resulting in char_count of 0 + assert X_tr["text_char_count"].tolist() == [5, 0, 5] + + +def test_default_all_features(): + """Test extracting all features with default parameters.""" + X = pd.DataFrame({"text": ["Hello World!", "Python 123", "AI"]}) + transformer = TextFeatures(variables=["text"]) + X_tr = transformer.fit_transform(X) + + # Spot check a few features to ensure they were added and computed + assert X_tr["text_char_count"].tolist() == [11, 9, 2] + assert X_tr["text_word_count"].tolist() == [2, 2, 1] + assert X_tr["text_digit_count"].tolist() == [0, 3, 0] + + +def test_specific_features(): + """Test extracting specific features only.""" + X = pd.DataFrame({"text": ["Hello", "World"]}) + transformer = TextFeatures( + variables=["text"], features=["char_count", "word_count"] + ) + X_tr = transformer.fit_transform(X) + + # Check only specified features are extracted + assert X_tr.columns.tolist() == 
["text", "text_char_count", "text_word_count"] + + +def test_specific_variables(): + """Test extracting features from specific variables only.""" + X = pd.DataFrame( + {"text1": ["Hello", "World"], "text2": ["Foo", "Bar"], "numeric": [1, 2]} + ) + transformer = TextFeatures(variables=["text1"], features=["char_count"]) + X_tr = transformer.fit_transform(X) + + # Only text1 should have features extracted + assert X_tr.columns.tolist() == ["text1", "text2", "numeric", "text1_char_count"] + + +def test_drop_original(): + """Test drop_original parameter.""" + X = pd.DataFrame({"text": ["Hello", "World"], "other": [1, 2]}) + transformer = TextFeatures( + variables=["text"], features=["char_count"], drop_original=True + ) + X_tr = transformer.fit_transform(X) + + assert X_tr.columns.tolist() == ["other", "text_char_count"] + + +def test_string_variable_input(): + """Test that passing a single string variable works (auto-converted to list).""" + X = pd.DataFrame({"text": ["Hello", "World"], "other": ["A", "B"]}) + transformer = TextFeatures(variables="text", features=["char_count"]) + X_tr = transformer.fit_transform(X) + + assert transformer.variables_ == ["text"] + assert X_tr.columns.tolist() == ["text", "other", "text_char_count"] + assert X_tr["text_char_count"].tolist() == [5, 5] + + +def test_multiple_text_columns(): + """Test extracting features from multiple text columns.""" + X = pd.DataFrame({"a": ["Hello", "World"], "b": ["Foo", "Bar"]}) + transformer = TextFeatures( + variables=["a", "b"], features=["char_count", "word_count"] + ) + X_tr = transformer.fit_transform(X) + + assert X_tr.columns.tolist() == [ + "a", + "b", + "a_char_count", + "a_word_count", + "b_char_count", + "b_word_count", + ] + + +# ============================================================================== +# TRANSFORM - TEST TEXT FEATURES +# ============================================================================== + + +@pytest.fixture(scope="module") +def df_text(): + df = 
+        pd.DataFrame(
+            {
+                "text": [
+                    "Hello World!",
+                    "HELLO",
+                    "12345",
+                    "e.g. i.e.",
+                    "   ",
+                    " trailing ",
+                    "abc...",
+                    "",
+                    None,
+                    "A? B! C.",
+                    "HeLLo",
+                    "Hi! @#",
+                    "A1b2 C3d4!@#$",
+                    "???",
+                    "i.e., this is wrong",
+                    "Is 1 > 2? No, 100%!",
+                    "Hello. World",
+                    "Hello. World.",
+                    "Hello... World!?!",
+                    "This is a proper sentence containing "
+                    "supercalifragilisticexpialidocious and exceptionally long words.",
+                ]
+            }
+        )
+    return df
+
+
+def test_whitespace_features(df_text):
+    text_features = ["whitespace_count", "whitespace_ratio"]
+    transformer = TextFeatures(variables=["text"], features=text_features)
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_whitespace_count"].tolist() == [
+        1, 0, 0, 1, 3, 2, 0, 0, 0, 2, 0, 1, 1, 0, 3, 5, 1, 1, 1, 10,
+    ]
+    assert X_tr["text_whitespace_ratio"].tolist() == [
+        0.08333333333333333, 0.0, 0.0, 0.1111111111111111, 1.0, 0.2, 0.0,
+        0.0, 0.0, 0.25, 0.0, 0.16666666666666666, 0.07692307692307693, 0.0,
+        0.15789473684210525, 0.2631578947368421, 0.08333333333333333,
+        0.07692307692307693, 0.058823529411764705, 0.09900990099009901,
+    ]
+
+
+def test_digit_features(df_text):
+    transformer = TextFeatures(
+        variables=["text"], features=["digit_count", "digit_ratio", "has_digits"]
+    )
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_digit_count"].tolist() == [
+        0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 5, 0, 0, 0, 0,
+    ]
+    assert X_tr["text_digit_ratio"].tolist() == [
+        0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
+        0.3333333333333333, 0.0, 0.0, 0.35714285714285715, 0.0, 0.0, 0.0, 0.0,
+    ]
+    assert X_tr["text_has_digits"].tolist() == [
+        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0,
+    ]
+
+
+def test_uppercase_features(df_text):
+    transformer = TextFeatures(
+        variables=["text"],
+        features=[
+            "uppercase_count",
+            "uppercase_ratio",
+            "has_uppercase",
+            "starts_with_uppercase",
+        ],
+    )
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_uppercase_count"].tolist() == [
+        2, 5, 0, 0, 0, 0, 0, 0, 0, 3, 3, 1, 2, 0, 0, 2, 2, 2, 2, 1,
+    ]
+    assert X_tr["text_uppercase_ratio"].tolist() == [
+        0.18181818181818182, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5,
+        0.6, 0.2, 0.16666666666666666, 0.0, 0.0, 0.14285714285714285,
+        0.18181818181818182, 0.16666666666666666, 0.125, 0.01098901098901099,
+    ]
+    assert X_tr["text_has_uppercase"].tolist() == [
+        1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
+    ]
+    assert X_tr["text_starts_with_uppercase"].tolist() == [
+        1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
+    ]
+
+
+def test_punctuation_features(df_text):
+    transformer = TextFeatures(
+        variables=["text"], features=["special_char_count", "ends_with_punctuation"]
+    )
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_special_char_count"].tolist() == [
+        1, 0, 0, 4, 0, 0, 3, 0, 0, 3, 0, 3, 4, 3, 3, 5, 1, 2, 6, 1,
+    ]
+    assert X_tr["text_ends_with_punctuation"].tolist() == [
+        1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
+    ]
+
+
+def test_word_features(df_text):
+    transformer = TextFeatures(
+        variables=["text"],
+        features=[
+            "word_count",
+            "unique_word_count",
+            "lexical_diversity",
+            "avg_word_length",
+        ],
+    )
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_word_count"].tolist() == [
+        2, 1, 1, 2, 0, 1, 1, 0, 0, 3, 1, 2, 2, 1, 4, 6, 2, 2, 2, 11,
+    ]
+    assert X_tr["text_unique_word_count"].tolist() == [
+        2, 1, 1, 2, 0, 1, 1, 0, 0, 3, 1, 2, 2, 1, 4, 6, 2, 2, 2, 11,
+    ]
+    assert \
+        X_tr["text_lexical_diversity"].tolist() == [
+            1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0,
+            1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0,
+        ]
+    assert X_tr["text_avg_word_length"].tolist() == [
+        6.0, 5.0, 5.0, 4.5, 0.0, 8.0, 6.0, 0.0, 0.0, 2.6666666666666665,
+        5.0, 3.0, 6.5, 3.0, 4.75, 3.1666666666666665, 6.0, 6.5, 8.5,
+        9.181818181818182,
+    ]
+
+
+def test_basic_features(df_text):
+    transformer = TextFeatures(
+        variables=["text"],
+        features=[
+            "char_count",
+            "sentence_count",
+            "letter_count",
+            "lowercase_count",
+            "is_empty",
+        ],
+    )
+    X_tr = transformer.fit_transform(df_text)
+    assert X_tr["text_char_count"].tolist() == [
+        11, 5, 5, 8, 0, 8, 6, 0, 0, 6, 5, 5, 12, 3, 16, 14, 11, 12, 16, 91,
+    ]
+    assert X_tr["text_sentence_count"].tolist() == [
+        1, 0, 0, 4, 0, 0, 1, 0, 0, 3, 0, 1, 1, 1, 2, 2, 1, 2, 2, 1,
+    ]
+    assert X_tr["text_letter_count"].tolist() == [
+        10, 5, 0, 4, 0, 8, 3, 0, 0, 3, 5, 2, 4, 0, 13, 4, 10, 10, 10, 90,
+    ]
+    assert X_tr["text_lowercase_count"].tolist() == [
+        8, 0, 0, 4, 0, 8, 3, 0, 0, 0, 2, 1, 2, 0, 13, 2, 8, 8, 8, 89,
+    ]
+    assert X_tr["text_is_empty"].tolist() == [
+        0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+    ]
+
+
+# ==============================================================================
+# OTHER METHOD TESTS
+# ==============================================================================
+
+
+def test_get_feature_names_out():
+    X = pd.DataFrame({"text": ["Hello"], "other": [1]})
+    transformer = TextFeatures(
+        variables=["text"], features=["char_count", "word_count"]
+    )
+    transformer.fit(X)
+
+    feature_names = transformer.get_feature_names_out()
+    expected_features = ["text", "other", "text_char_count", "text_word_count"]
+    assert \
+        feature_names == expected_features
+
+
+def test_get_feature_names_out_with_drop():
+    """Test get_feature_names_out with drop_original=True."""
+    X = pd.DataFrame({"text": ["Hello"], "other": [1]})
+    transformer = TextFeatures(
+        variables=["text"], features=["char_count"], drop_original=True
+    )
+    transformer.fit(X)
+
+    feature_names = transformer.get_feature_names_out()
+    expected_features = ["other", "text_char_count"]
+    assert feature_names == expected_features
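A note for reviewers reading the expected values above: they imply that `char_count` counts non-whitespace characters only (e.g. `"Hello World!"` → 11, not 12). A minimal, dependency-free sketch of the feature logic these tests exercise — an illustration consistent with the expected values, not the transformer's actual implementation — might look like:

```python
# Hypothetical stand-ins for three of the features tested above.
# None/NaN is treated as empty text, matching test_nan_handling.

def char_count(text):
    """Count non-whitespace characters."""
    return sum(not ch.isspace() for ch in (text or ""))

def word_count(text):
    """Count whitespace-delimited words."""
    return len((text or "").split())

def has_digits(text):
    """1 if the text contains any digit, else 0."""
    return int(any(ch.isdigit() for ch in (text or "")))

rows = ["Hello World!", "Python 123", "AI", None]
print([(char_count(t), word_count(t), has_digits(t)) for t in rows])
# [(11, 2, 0), (9, 2, 1), (2, 1, 0), (0, 0, 0)]
```

These three helpers reproduce the values asserted in `test_default_all_features` and `test_nan_handling`; the remaining features (ratios, casing, punctuation) follow the same pattern of simple per-string counts.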