InsightExtractor is an AI-powered tool for automated data retrieval. Connect CSVs or Google Sheets, define queries, and extract structured insights via web search and LLMs. Features include customizable prompts, API integration, and an intuitive dashboard for data export.
For a detailed walkthrough on how to use InsightExtractor, check out our YouTube video tutorial.
InsightExtractor supports multiple AI models for data extraction and processing. Users can choose from the following models:
- Gemini
- ChatGPT
- Ollama
Each model offers unique capabilities and can be selected based on the specific requirements of your data extraction tasks.
- Automated Data Retrieval: Seamlessly connect CSVs or Google Sheets and extract data.
- Customizable Prompts: Define queries and prompts to tailor the data extraction process.
- Web Scraping: Scrape web data and integrate it with your structured data.
- LLM Integration: Utilize Language Models to process and extract insights from the data.
- Intuitive Dashboard: User-friendly interface for managing data and exporting results.
- API Integration: Easily integrate with other tools and services via APIs.
To install InsightExtractor, follow these steps:
- Clone the Repository:

```bash
git clone https://github.com/messi10tom/InsightExtractor.git
cd InsightExtractor
```

- Create and activate a virtual environment:

Windows:

```bash
python -m venv IE
IE\Scripts\activate
```

macOS and Linux:

```bash
python3 -m venv IE
source IE/bin/activate
```

- Install Dependencies:

```bash
pip install -r requirements.txt
```
- Set Up Environment Variables:
- Create a `.env` file in the project directory and fill in the values described in the following steps.
- Create BD_AUTH Token:
- Visit Bright Data and access the dashboard.
- Choose "Scraping Browser" from the "Add" dropdown menu.
- Name your scraping browser (e.g., "InsightExtractor") and create it.
- Go to "Playground" in your Scraping Browser and toggle to "Code Examples".
- Select "Python, Selenium" and copy the AUTH key from the example script.
- Paste the AUTH key into the `BD_AUTH` field in your `.env` file.
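For reference, here is a minimal sketch of how a Bright Data Scraping Browser AUTH string is typically used with Selenium's Remote driver. The `brd.superproxy.io:9515` endpoint follows Bright Data's published Python/Selenium example and may differ for your account; the target URL is only a placeholder.

```python
# Sketch (not the project's internal code): connect Selenium to the
# Bright Data Scraping Browser using the AUTH string stored in BD_AUTH.
import os

from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection

AUTH = os.environ["BD_AUTH"]  # e.g. "brd-customer-...-zone-...:password"
SBR_WEBDRIVER = f"https://{AUTH}@brd.superproxy.io:9515"

connection = ChromiumRemoteConnection(SBR_WEBDRIVER, "goog", "chrome")
with Remote(connection, options=ChromeOptions()) as driver:
    driver.get("https://example.com")  # placeholder URL
    print(driver.page_source[:500])
```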
- Create Google Application Credentials:
- Visit Google Cloud Console and select your Google account.
- Create and select a new project.
- Navigate to "API & Services" and enable the Google Sheets API.
- Create credentials, set the service account role to Editor, and generate a JSON key.
- Download the JSON key file and move it to the project directory.
- Copy the file path and paste it into the `GOOGLE_APPLICATION_CREDENTIALS` field in your `.env` file.
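To confirm the service-account key works, a quick check along these lines can help. It assumes the `gspread` client; the sheet URL is a placeholder, and the sheet must be shared with the service account's email address.

```python
# Sanity-check sketch: read a Google Sheet with the service-account key
# referenced by GOOGLE_APPLICATION_CREDENTIALS. Assumes gspread is installed
# and the sheet is shared with the service account's email.
import os
import gspread

client = gspread.service_account(filename=os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
sheet = client.open_by_url("https://docs.google.com/spreadsheets/d/your-sheet-id")  # placeholder
print(sheet.sheet1.get_all_records())
```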
- Create Google API Key:
- Visit Google AI Studio and sign in with your Google account.
- Click "Get API key".
- Click on "Create API Key" to generate a new API key.
- Copy the generated API key and paste it into the `GOOGLE_API_KEY` field in your `.env` file.
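A short check that the key is valid, assuming the `google-generativeai` package is available (the model name below is only an example):

```python
# Sketch: verify GOOGLE_API_KEY by sending a trivial request to Gemini.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # example model name
print(model.generate_content("Reply with OK.").text)
```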
- Set Up ChatGPT API Key:
- Create an Account:
- Sign up for a free account on ChatGPT here.
- Generate an API Key:
- Log in, go to "API Keys", click "+ Create new secret key", name your key, and copy the API key.
- Set Up Billing:
- Go to 'Billing', add payment details, choose user type, enter payment info, and configure payment options.
- Set Usage Limits:
- Go to 'Limits', set hard and soft usage caps, and click 'Save'.
- Save API Key in `.env` File:
- Add the following line to your `.env` file: `OPENAI_API_KEY=your_api_key_here`
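To confirm the key is active, a minimal check like the one below can be used. It assumes the official `openai` Python package (v1+ client); the model name is only an example.

```python
# Sketch: verify OPENAI_API_KEY with a minimal chat completion request.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; use one your account can access
    messages=[{"role": "user", "content": "Reply with OK."}],
)
print(response.choices[0].message.content)
```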
- Set Up Ollama:
- Visit Ollama GitHub and download the appropriate version for your operating system.
- After downloading, open your terminal and run the following command:
```bash
ollama run llama3.2
```
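Once the model is pulled and the local server is running, a quick check like this can confirm it responds; it assumes the `ollama` Python package is installed.

```python
# Sketch: confirm the local Ollama server and the llama3.2 model are reachable.
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Reply with OK."}],
)
print(response["message"]["content"])
```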
- Set Up Streamlit Secrets:
- Create `.streamlit/secrets.toml` and paste all the API keys in the following format:

```toml
BD_AUTH=""
GOOGLE_APPLICATION_CREDENTIALS="path/to/your/credentials.json"
GOOGLE_API_KEY=""
OPENAI_API_KEY=""
```
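Values defined in `.streamlit/secrets.toml` are exposed to the app through Streamlit's `st.secrets` mapping, roughly like this:

```python
# Sketch: read secrets from .streamlit/secrets.toml inside a Streamlit app.
import streamlit as st

bd_auth = st.secrets["BD_AUTH"]
google_api_key = st.secrets["GOOGLE_API_KEY"]
openai_api_key = st.secrets["OPENAI_API_KEY"]
```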
- Run the Application:
```bash
streamlit run src/main.py
```
- Upload CSV or Google Sheets:
- Choose to upload a CSV file or connect to a Google Sheet.
- Ensure the CSV file contains a column named "Links" with the URLs of the webpages you want to scrape.
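If you want to check a file before uploading it, a small validation sketch with pandas (the file name is only a placeholder) looks like this:

```python
# Sketch: verify that an input CSV has the required "Links" column.
import pandas as pd

df = pd.read_csv("professionals.csv")  # placeholder file name
assert "Links" in df.columns, "The CSV must contain a 'Links' column with the URLs to scrape."
```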
- Define Your Query:
- Enter a prompt to define what data you want to extract.
- Extract Data:
- The tool will scrape the web data, process it using LLMs, and present the extracted insights.
- Export Results:
- Download the results as a CSV file for further analysis.
Here is an example of how to use InsightExtractor:
- Sample CSV File:

```csv
Links,company
example1.com,company_1
example2.com,company_2
```
- User Prompt:

```
Extract the names, emails, and companies of the professionals mentioned in the text {professional}.
```
We welcome contributions to InsightExtractor! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
