Skip to content

InsightExtractor: An AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export

License

Notifications You must be signed in to change notification settings

messi10tom/InsightExtractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

InsightExtractor

GitHub issues GitHub forks GitHub stars GitHub license GitHub contributors GitHub last commit InsightExtractor Banner

InsightExtractor is an AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export.

Video Tutorial

For a detailed walkthrough on how to use InsightExtractor, check out our YouTube video tutorial.

Supported Models

InsightExtractor supports multiple AI models for data extraction and processing. Users can choose from the following models:

  • Gemini
  • ChatGPT
  • Ollama

Each model offers unique capabilities and can be selected based on the specific requirements of your data extraction tasks.

Features

  • Automated Data Retrieval: Seamlessly connect CSVs or Google Sheets and extract data.
  • Customizable Prompts: Define queries and prompts to tailor the data extraction process.
  • Web Scraping: Scrape web data and integrate it with your structured data.
  • LLM Integration: Utilize Language Models to process and extract insights from the data.
  • Intuitive Dashboard: User-friendly interface for managing data and exporting results.
  • API Integration: Easily integrate with other tools and services via APIs.

Installation

To install InsightExtractor, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/messi10tom/InsightExtractor.git
    cd InsightExtractor

    Windows

    python -m venv IE
    IE\Scripts\activate
    

    macOS and Linux

    python3 -m venv IE
    source IE/bin/activate
    
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Set Up Environment Variables:

  • Create BD_AUTH Token:

    • Visit Bright Data and access the dashboard.
    • Choose "Scraping Browser" from the "Add" dropdown menu.
    • Name your scraping browser (e.g., "InsightExtractor") and create it.
    • Go to "Playground" in your Scraping Browser and toggle to "Code Examples".
    • Select "Python, Selenium" and copy the AUTH key from the example script.
    • Paste the AUTH key into the BD_AUTH field in your .env file.
  • Create Google Application Credentials:

    • Visit Google Cloud Console and select your Google account.
    • Create and select a new project.
    • Navigate to "API & Services" and enable the Google Sheets API.
    • Create credentials, set the service account role to Editor, and generate a JSON key.
    • Download the JSON key file and move it to the project directory.
    • Copy the file path and paste it into the GOOGLE_APPLICATION_CREDENTIALS field in your .env file.
  • Create Google API Key:

    • Visit Google AI Studio and sign in with your Google account.
    • Click on the "Get API key".
    • Click on "Create API Key" to generate a new API key.
    • Copy the generated API key and paste it into the GOOGLE_API_KEY field in your .env file.
  • Setting Up ChatGPT API Key:

    1. Create an Account:

      • Sign up for a free account on ChatGPT here.
    2. Generate an API Key:

      • Log in, go to "API Keys", click "+ Create new secret key", name your key, and copy the API key.
    3. Set Up Billing:

      • Go to 'Billing', add payment details, choose user type, enter payment info, and configure payment options.
    4. Set Usage Limits:

      • Go to 'Limits', set hard and soft usage caps, and click 'Save'.
    5. Save API Key in .env File:

      • Add the following line to your .env file:
        OPENAI_API_KEY=your_api_key_here
  • Set Up Ollama:

    • Visit Ollama GitHub and download the appropriate version for your operating system.
    • After downloading, open your terminal and run the following command:
      ollama run llama3.2
  • Set up streamlit secrets

    • create .streamlit\secrets.toml and paste all the API keys in the following format
    BD_AUTH=""
    GOOGLE_APPLICATION_CREDENTIALS="path/to/your/credentials.json"
    GOOGLE_API_KEY=""
    OPENAI_API_KEY=""
  1. Run the Application:
    streamlit run src/main.py

Usage

  1. Upload CSV or Google Sheets:

    • Choose to upload a CSV file or connect to a Google Sheet.
    • Ensure the CSV file contains a column named "Links" with the URLs of the webpages you want to scrape.
  2. Define Your Query:

    • Enter a prompt to define what data you want to extract.
  3. Extract Data:

    • The tool will scrape the web data, process it using LLMs, and present the extracted insights.
  4. Export Results:

    • Download the results as a CSV file for further analysis.

Example

Here is an example of how to use InsightExtractor:

  1. Sample CSV File:

    Links,company
    example1.com,company_1
    example2.com,company_2
  2. User Prompt:

    Extract the names, emails, and companies of the professionals mentioned in the text {professional}.
    

Contributing

We welcome contributions to InsightExtractor! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

About

InsightExtractor: An AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages