How to Extract Text from PDFs using Foxit’s REST APIs

Want to extract text from PDF files with just a few lines of Python? This guide shows how to use Foxit's REST Extract API to pull text content from PDFs, ideal for search, automation, or AI workflows. From setting up credentials to searching for keywords across multiple files, this post walks through the full process with example code and GitHub demos.
How to Extract Text from PDFs using Foxit's REST APIs

How to Extract Text from PDFs using Foxit’s REST APIs

PDFs are an excellent way to store information—they combine text, images, and more in a perfectly laid-out, eye-catching design that fulfills every marketer’s wildest dreams. But sometimes you just need the text! There’s a variety of reasons you may want to convert a rich PDF document into plain text:

  • For indexing in a search engine
  • To search documents for keywords
  • To pass to generative AI services for introspection

Let’s take a look at the Extract API to see just how easy this is.

Start Here: Obtain Free Credentials to Use the Foxit API

Before we go any further, head over to our developer portal and grab a set of free credentials. This will include a client ID and secret values – you’ll need both to make use of the API.

Foxit PDF API Workflow Overview with Python

The API follows the same format as the rest of our PDF Services in that you upload your input, kick off the job, check the job’s status, and download the result. As we’ve covered this a few times now on the blog (see my introductory post, we’ll skip over the details of uploading the document and loading in credentials. Here’s the Python code we’ve demonstrated before showing this in action:

CLIENT_ID = os.environ.get('CLIENT_ID')
CLIENT_SECRET = os.environ.get('CLIENT_SECRET')
HOST = os.environ.get('HOST')

def uploadDoc(path, id, secret):
	
	headers = {
		"client_id":id,
		"client_secret":secret
	}

	with open(path, 'rb') as f:
		files = {'file': (path, f)}

		request = requests.post(f"{HOST}/pdf-services/api/documents/upload", files=files, headers=headers)
		return request.json()

doc = uploadDoc("../../inputfiles/input.pdf", CLIENT_ID, CLIENT_SECRET)
print(f"Uploaded pdf to Foxit, id is {doc['documentId']}")

Now let's get into the meat of the Extract API. The API takes three arguments:

  • The ID of the previously uploaded document.
  • The type of information to extract—either TEXT, IMAGE, or PAGE. In theory, it should be pretty obvious what these do, but just in case: TEXT returns the text contents of the PDF. IMAGE gives you a ZIP file of images from the PDF. PAGE returns a new PDF containing just the page you requested.
  • You can also pass in a page range, which can be a combo of specific pages and ranges. If you don’t include one, the entire PDF gets processed for extraction.

To make this simple to use, I've built a wrapper function that lets you pass these arguments:

def extractPDF(doc, type, id, secret, pageRange=None):
    
    headers = {
        "client_id":id,
        "client_secret":secret,
        "Content-Type":"application/json"
    }

    body = {
        "documentId":doc,
        "extractType":type
    }

    if pageRange:
        body["pageRange"] = pageRange 

    request = requests.post(f"{HOST}/pdf-services/api/documents/modify/pdf-extract", json=body, headers=headers)
    return request.json()

Literally, that's it. At this point, you get a task object back that – like with our other APIs – can be checked for completion, and once it’s done, the results can be downloaded. Since we're working with text, though, let's simplify and just grab the text as a variable:

def getResult(doc, id, secret):
    
    headers = {
        "client_id":id,
        "client_secret":secret
    }

    return requests.get(f"{HOST}/pdf-services/api/documents/{doc}/download", headers=headers).text
This utility method takes a document ID value and gets the textual content. Here’s how that code looks:
doc = uploadDoc("../../inputfiles/input.pdf", CLIENT_ID, CLIENT_SECRET)
print(f"Uploaded pdf to Foxit, id is {doc['documentId']}")

task = extractPDF(doc["documentId"], "TEXT", CLIENT_ID, CLIENT_SECRET)
print(f"Created task, id is {task['taskId']}")

result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)
print(f"Final result: {result}")

text = getResult(result["resultDocumentId"], CLIENT_ID, CLIENT_SECRET)
print(text)
You can see the entire script on our GitHub. Running it will just give you a wall of text. Not terribly exciting. So, let’s make it exciting!

Searching PDFs for Keywords

Let’s iterate on the previous example for something that could be a bit more useful – given a set of input PDFs, extract the text from each and report if a certain keyword, or keywords are found. I’ll start by gathering a list of PDFs from a source directory. But you could imagine this coming from new files in a cloud storage provider, attachments in new emails, and so forth:
# Get PDFs from our input directory
inputFiles = list(filter(lambda x: x.endswith('.pdf'), os.listdir('../../inputfiles')))
Now, I’ll define a keyword. A more complex version of this would probably use a list of keywords, but we’ll keep it simple for now:
# Keyword to match on: 
keyword = "Shakespeare"
And now to actually do the work. Remember, we’ve already defined our methods, so the only thing changing here is the code calling them:
for file in inputFiles:
    
    doc = uploadDoc(f"../../inputfiles/{file}", CLIENT_ID, CLIENT_SECRET)
    print(f"Uploaded pdf, {file}, to Foxit, id is {doc['documentId']}")

    task = extractPDF(doc["documentId"], "TEXT", CLIENT_ID, CLIENT_SECRET)
    result = checkTask(task["taskId"], CLIENT_ID, CLIENT_SECRET)

    text = getResult(result["resultDocumentId"], CLIENT_ID, CLIENT_SECRET)
    if keyword in text:
        print(f"\033[32mThe pdf, {file}, matched on our keyword: {keyword}\033[0m")
    else:
        print(f"The pdf, {file}, did not match on our keyword: {keyword}")
    
    print("")
Given my set of inputs, there’s only one match. Here’s the output I received:
Output from the script showing documents that contained the keyword | Foxit APIs
You can find the complete source code for this on our GitHub repo.

What’s Next?

The demo here is fairly simple, but you could imagine it being expanded to include things like automatic routing of PDFs with matching keywords, email alerts, and so forth. As a reminder, when working with any process like this, you can cache the result of the extraction. Imagine a scenario where the important keywords may change in the future. Your code could store the result of the text extract to the file system (perhaps with the same name as the PDF but using `.txt` as the extension instead) and simply skip calling our API when the cache exists. Our API will miss you, but that’s ok.

If this all sounds exciting, be sure to check the docs for more information about the template language and API. Sign up for some free developer credentials and reach out on our developer forums with any questions.
Explore More Blogs
How to Extract Text from PDFs using Foxit's REST APIs
Extract

How to Extract Text from PDFs using Foxit’s REST APIs

Want to extract text from PDF files with just a few lines of Python? This guide shows how to use Foxit’s REST Extract API to pull text content from PDFs, ideal for search, automation, or AI workflows. From setting up credentials to searching for keywords across multiple files, this post walks through the full process with example code and GitHub demos.

Generate Dynamic PDFs from JSON using Foxit APIs

Generate Dynamic PDFs from JSON using Foxit APIs

See how easy it is to generate PDFs from JSON using Foxit’s Document Generation API. With Word as your template engine, you can dynamically build invoices, offer letters, and agreements—no complex setup required. This tutorial walks through the full process in Python and highlights the flexibility of token-based document creation.

Introducing PDF APIs from Foxit

Introducing PDF APIs from Foxit

Get started with Foxit’s new PDF APIs—convert Word to PDF, generate documents, and embed files using simple, scalable REST APIs. Includes sample Python code and walkthrough.

API Webinars

Explore Real-World Use Cases, Live Demos, and Best Practices.
Our technical team walks through practical applications of Foxit APIs with live Q&A, hands-on demos, and clear integration strategies. Whether you're comparing tools or actively building, these sessions are designed to help you move faster with fewer roadblocks

What You'll Learn