{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "5223b1b7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING:tensorflow:From c:\\Users\\Omar\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\tf_keras\\src\\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.\n", "\n" ] } ], "source": [ "from web2json.preprocessor import *\n", "from web2json.ai_extractor import *\n", "from web2json.postprocessor import *\n", "from web2json.pipeline import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "ae4e7f03", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os  # used below via os.getenv; imported explicitly rather than relying on the star imports\n", "import dotenv\n", "dotenv.load_dotenv()" ] }, { "cell_type": "code", "execution_count": 3, "id": "9e6b0eb9", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Some weights of Qwen3ForSequenceClassification were not initialized from the model checkpoint at Qwen/Qwen3-Reranker-0.6B and are newly initialized: ['score.weight']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] } ], "source": [ "llm = NvidiaLLMClient(config={'api_key': os.getenv('NVIDIA_API_KEY'),'model_name': 'qwen/qwen2.5-7b-instruct'})\n", "# reranker = NvidiaRerankerClient(config={'api_key': os.getenv('NVIDIA_API_KEY'),'model_name': 'nv-rerank-qa-mistral-4b:1'})\n", "reranker = HFRerankerClient()" ] }, { "cell_type": "code", "execution_count": 4, "id": "3bc223d0", "metadata": {}, "outputs": [], "source": [ "prompt_template = \"\"\"\n", "You are a helpful assistant that extracts structured data from web pages.\n", "You will be given a web page and you need to extract the following information:\n", "{content}\n", "\n", "schema: {schema}\n", "Please provide the extracted 
data in JSON format.\n", "WITH ONLY THE FIELDS THAT ARE IN THE SCHEMA.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 5, "id": "475fccd2", "metadata": {}, "outputs": [], "source": [ "classification_prompt_template = \"\"\"\n", "{\n", " \"title\": {\"type\": \"string\", \"description\": \"Page title\"},\n", " \"price\": {\"type\": \"number\", \"description\": \"Product price\"},\n", " \"description\": {\"type\": \"string\", \"description\": \"Product description\"}\n", "}\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 6, "id": "974417de", "metadata": {}, "outputs": [], "source": [ "# classification_prompt_template = \"\"\"\n", "# # HTML Chunk Relevance Classification Prompt\n", "\n", "# You are an HTML content classifier. Your task is to analyze an HTML chunk against a given schema and determine if the content is relevant.\n", "\n", "# ## Instructions:\n", "# 1. Carefully examine the provided HTML chunk\n", "# 2. Compare it against the given schema/criteria\n", "# 3. Determine if the HTML chunk contains content that matches or is relevant to the schema\n", "# 4. 
Respond with ONLY a JSON object containing a single field \"relevant\" with value 1 (relevant) or 0 (not relevant)\n", "\n", "# ## Input Format:\n", "# **Schema/Criteria:**\n", "# {schema}\n", "\n", "# **HTML Chunk:**\n", "# ```html\n", "# {content}\n", "# ```\n", "\n", "# ## Output Format:\n", "# Your response must be ONLY a valid JSON object with no additional text:\n", "\n", "# ```json\n", "# {{\n", "# \"relevant\": 1\n", "# }}\n", "# ```\n", "\n", "# OR\n", "\n", "# ```json\n", "# {{\n", "# \"relevant\": 0\n", "# }}\n", "# ```\n", "\n", "# ## Classification Rules:\n", "# - Output 1 if the HTML chunk contains content that matches the schema criteria\n", "# - Output 0 if the HTML chunk does not contain relevant content\n", "# - Consider semantic meaning, not just exact keyword matches\n", "# - Look at text content, attributes, structure, and context\n", "# - Ignore purely structural HTML elements (like divs, spans) unless they contain relevant content\n", "# - Be STRICT in your evaluation - only mark as relevant (1) if there is clear, meaningful content that directly relates to the schema\n", "# - Empty elements, placeholder text, navigation menus, headers/footers, and generic UI components should typically be marked as not relevant (0)\n", "# - The HTML chunk does not need to contain ALL schema information, but it must contain SUBSTANTIAL and SPECIFIC content related to the schema\n", "\n", "# CRITICAL: Your entire response MUST be exactly one JSON object. DO NOT include any explanations, reasoning, markdown formatting, code blocks, or additional text. 
Output ONLY the raw JSON object.\n", "# \"\"\"" ] }, { "cell_type": "code", "execution_count": 7, "id": "58436d65", "metadata": {}, "outputs": [], "source": [ "pre = BasicPreprocessor(config={'keep_tags':True})\n", "# llm = GeminiLLMClient(config={'api_key': os.getenv('GEMINI_API_KEY'),})\n", "# ai = AIExtractor(llm_client=llm ,prompt_template=prompt_template)\n", "ai = LLMClassifierExtractor(reranker=reranker, llm_client=llm, prompt_template=prompt_template, classifier_prompt=classification_prompt_template)\n", "post = PostProcessor()" ] }, { "cell_type": "code", "execution_count": 8, "id": "c1c43f7c", "metadata": {}, "outputs": [], "source": [ "# ai.extract(chunks=[\"the price is $1000\", \"the title is 'NVIDIA H100 SXM'\"])" ] }, { "cell_type": "code", "execution_count": 9, "id": "9c78eec9", "metadata": {}, "outputs": [], "source": [ "pipe = Pipeline(preprocessor=pre, ai_extractor=ai, postprocessor=post)" ] }, { "cell_type": "code", "execution_count": 10, "id": "0b324a01", "metadata": {}, "outputs": [], "source": [ "from pydantic import BaseModel, Field, constr, condecimal\n", "\n", "class ProductModel(BaseModel):\n", " productTitle: constr(min_length=1, max_length=200) = Field(\n", " ...,\n", " title=\"Product Title\",\n", " description=\"The full title of the product\"\n", " )\n", " price: condecimal(gt=0, decimal_places=2) = Field(\n", " ...,\n", " title=\"Product Price\",\n", " description=\"Unit price (must be > 0, two decimal places).\"\n", " )\n", " manufacturer: constr(min_length=1, max_length=1000) = Field(\n", " ...,\n", " title=\"Manufacturer\",\n", " description=\"Name of the product manufacturer.\"\n", " )\n", "\n", " " ] }, { "cell_type": "code", "execution_count": 11, "id": "92a5fc23", "metadata": {}, "outputs": [], "source": [ "config = {\n", " 'keep_tags': True,\n", "}" ] }, { "cell_type": "code", "execution_count": 12, "id": "d2cfb033", "metadata": {}, "outputs": [], "source": [ "url = 
\"https://www.amazon.com/Instant-Pot-Multi-Use-Programmable-Pressure/dp/B00FLYWNYQ?_encoding=UTF8&content-id=amzn1.sym.2f889ce0-246f-467a-a086-d9a721167240&dib=eyJ2IjoiMSJ9.2EzBddTDEktLY8ckTsraM_cZ6pzKuNkA6y_gLR0-Uz1ekttQU6tuQEcjb8PThy0PfhvxLqeYWh3N7pQrGgRxAWzapVklC_aU6xBzD-3Wwqx3qyQRHsmOhPRsSpeCOIIZqS3SKDowZEPYrGnCbRMt5vxnsYMW-fD-zBbgeoeGYmbsN2U6_HNhLjrpePKCbQPmnZBJ9UhgYE4fE3DVuYm8xlJe9l5GixDLVFtZUq4m5FE.Ol-jiuu9P6mQie0yXLJj-Ht5-TXmIXuRPije85p_YVo&dib_tag=se&keywords=cooker&pd_rd_r=2cede598-f3ae-49ca-8a46-e5945a9c2631&pd_rd_w=2HLSC&pd_rd_wg=ZyUUn&qid=1749508157&sr=8-3\"\n", "schema = ProductModel # pydantic class\n", "\n", "# read html file \n", "# with open(r'C:\\Users\\abdfa\\Desktop\\UNI STUFFING\\GRADUATION PROJECT\\Group Work\\MCP_WEB2JSON\\0000.htm', 'r', encoding='utf-8') as file:\n", "# content = file.read()\n", "\n", "# with open(r'C:\\Users\\abdfa\\Desktop\\UNI STUFFING\\GRADUATION PROJECT\\Group Work\\MCP_WEB2JSON\\Amazon.com_ Instant Pot Duo 7-in-1 Electric Pressure Cooker, Slow Cooker, Rice Cooker, Steamer, Sauté, Yogurt Maker, Warmer & Sterilizer, Includes App With Over 800 Recipes, Stainless Steel, 6 Quart.htm', 'r', encoding='utf-8') as file:\n", "# content = file.read()\n" ] }, { "cell_type": "code", "execution_count": 13, "id": "f07e1aca", "metadata": {}, "outputs": [], "source": [ "# import os\n", "\n", "# content = \"\"\"\n", "#
\n", "# \"\"\"\n", "\n", "# from web2json.ai_extractor import HFRerankerClient, LLMClassifierExtractor, NvidiaLLMClient\n", "\n", "# hf_reranker = HFRerankerClient()\n", "# llm_client = NvidiaLLMClient(config={\"api_key\": os.environ.get('NVIDIA_API_KEY')})\n", "# extractor = LLMClassifierExtractor(\n", "# reranker=hf_reranker,\n", "# llm_client=llm_client,\n", "# prompt_template=\"Extract from: {content} using schema: {schema}\",\n", "# classifier_prompt=\"What is the price?\"\n", "# )\n", "\n", "# # Run using HuggingFace reranker\n", "# result = extractor.extract(content=content, schema=schema, hf=True)\n" ] }, { "cell_type": "code", "execution_count": 15, "id": "79cf2321", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n", "Content successfully chunked into 11.\n", "Content successfully chunked: [\"\\n\\n\\n\\nThis item has been tested to certify it can ship safely in its original box or bag to avoid unnecessary packaging. Since 2015, we have reduced the weight of outbound packaging per shipment by 41% on average, that’s over 2 million tons of packaging material.
If you still require Amazon packaging for this item, choose \"Ship in Amazon packaging\" at checkout. Learn moreBrand | Instant Pot |
---|---|
Capacity | 5.68 Liters |
Material | Stainless steel |
Finish Type | Stainless Steel |
Product Dimensions | 12.2\"D x 13.38\"W x 12.48\"H |
Special Feature | Programmable |
Wattage | 1000 watts |
Item Weight | 11.8 Pounds |
Control Method | Touch |
Controller Type | Push Button |
Operation Mode | Automatic |
Is Dishwasher Safe | Yes |
Voltage | 120 Volts |
Closure Type | Outer Lid, Inner Lid |
UPC | 810028585201 |
Item Weight | 11.8 pounds |
Manufacturer | Instant Pot |
ASIN | B00FLYWNYQ |
Country of Origin | China |
Item model number | 112-0170-01 |
Customer Reviews | \\n \\n 4.6 4.6 out of 5 stars \\n 130,203 ratings \\n \\n 4.6 out of 5 stars |
Best Sellers Rank |
|
Is Discontinued By Manufacturer | No |
Date First Available | December 2, 2013 |
Easy to use, easy to clean, fast, versatile, and convenient, the Instant Pot® Duo™ is the one that started it all. It replaces 7 kitchen appliances: pressure cooker, slow cooker, rice cooker, steamer, sauté pan, yogurt maker & warmer. With 13 built-in smart programs, cook your favorite dishes with the press of a button. The tri-ply, stainless steel inner pot offers quick, even heating performance. Redefine cooking and enjoy quick and easy meals anywhere, any time. The Instant Pot Duo offers the quality, convenience and versatility you’ve come to expect from Instant – discover amazing.
This Item Buying optionsInstant Pot\\xa0Duo 7-in-1 Electric Pressure Cooker, Slow Cooker, Rice Cooker, Steamer, Sauté, Yogurt Maker, Warmer & Sterilizer, Includes App With Over 800 Recipes, Stainless Steel, 6 Quart | Recommendations Instant Pot\\xa0Duo Crisp 11-in-1 Air Fryer and Electric Pressure Cooker Combo with Multicooker Lids that Air Fries, Steams, Slow Cooks, Sautés, Dehydrates, & More, Free App With Over 800 Recipes, 6 Quart | carori\\xa0CARORI 9-in-1 Electric Pressure Cooker 6 Qt, Programmable Multi-Function Cooker with Safer Vent, Olla de Presion, Rice Cooker, Slow Cooker, Steamer, Sauté, Warmer & Sterilizer, 1000W, Stainless Steel | Midea\\xa012-in-1 Electric Pressure Cooker, 8 Quarts, 12 Presets, Multi-Functional Programmable Slow Cooker, Rice Cooker, Steamer, Sauté Pan, Yogurt Maker, and More, Stainless Steel |
Customer Reviews, including Product Star Ratings help customers to learn more about the product and decide whether it is the right product for them.
To calculate the overall star rating and percentage breakdown by star, we don’t use a simple average. Instead, our system considers things like how recent a review is and if the reviewer bought the item on Amazon. It also analyzed reviews to verify trustworthiness.
Learn more how customers reviews work on AmazonCustomers find the pressure cooker works well, particularly praising its sauté feature and accurate cooking times. They appreciate its ease of use, with one customer noting the intuitive controls, and consider it a great kitchen appliance that makes meal prep convenient. The appliance receives positive feedback for its cooking ability, with one customer highlighting its versatility in transforming into a pressure cooker, and customers find it easy to clean with a stainless steel pot that cleans well. Customers enjoy the complex flavors produced, though opinions on build quality are mixed, with some finding it well-made while others describe it as wimpy.
AI Generated from the text of customer reviews
Customers find that the pressure cooker works well, with the sauté feature performing particularly effectively.
\"...This works with new potatoes, and regular potatoes! Happy Instant Potting!\" Read more
\"...It was excellent. I did 6 minutes per pound + 2 minutes. I also cook chicken thighs for dinner about once a week, which I had never cooked before....\" Read more
\"...Most programs work just fine on full automatic, but some small exceptions may demand more online flexibility....\" Read more
\"...occasional mishaps, the Instant Pot Duo has consistently delivered incredible results....\" Read more
Customers find the pressure cooker simple to use, with clear operating instructions in the booklet, making meal preparation a breeze.
\"...make in your Instant Pot that will change your life: incredibly easy perfectly poached eggs in 2-3 minutes, and baked potatoes in 12 minutes....\" Read more
\"...credit as most automatic settings work well, automating it for ease of use and safety. Cooking is part Science, but, I think, more Art than Science....\" Read more
\"...crockpot extensively over the past years and while I appreciate the ease of use and the ability to put a meal on the table soon after I got home in...\" Read more
\"...of pressure cookers anymore, the time , energy bills saved n convenience is worth it!...\" Read more