UnstructuredMarkdownLoader | 🦜️🔗 LangChain

UnstructuredMarkdownLoader

This notebook provides a quick overview for getting started with the UnstructuredMarkdownLoader document loader. For detailed documentation of all UnstructuredMarkdownLoader features and configurations, head to the API reference.

Overview

Integration details

Class | Package | Local | Serializable | JS support
UnstructuredMarkdownLoader | langchain_community | ❌ | ❌ | ✅

Loader features

Source | Document Lazy Loading | Native Async Support
UnstructuredMarkdownLoader | ✅ | ❌

Setup

To access the UnstructuredMarkdownLoader document loader, you'll need to install the langchain-community integration package and the unstructured Python package.

Credentials

No credentials are needed to use this loader.

Optionally, to enable automated LangSmith tracing, set your LangSmith API key by uncommenting the lines below:

import getpass
import os

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

Install langchain_community and unstructured

%pip install -qU langchain_community unstructured
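Before instantiating the loader, it can help to confirm that both packages are importable in the current environment. The following is a minimal, standard-library-only sketch; `missing_packages` is a hypothetical helper, and it assumes the import names are `langchain_community` and `unstructured`. It only checks importability, not installed versions.

```python
import importlib.util
from typing import List


def missing_packages(names: List[str]) -> List[str]:
    """Return the module names that cannot be found on the import path.

    Hypothetical helper: checks importability only, not versions.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the two packages installed above.
REQUIRED = ["langchain_community", "unstructured"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported missing, rerunning the `%pip install` cell above (and restarting the kernel) is usually enough.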

Initialization

Now we can instantiate our loader object and load documents.

You can run the loader in one of two modes: "single" or "elements". In "single" mode, the whole document is returned as a single Document object. In "elements" mode, the unstructured library splits the document into elements such as Title and NarrativeText, and each element is returned as its own Document. Any additional keyword arguments passed after mode are forwarded to unstructured to apply different settings, such as strategy="fast" in the examples below.
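To make the difference between the two modes concrete, here is a deliberately simplified, standard-library-only sketch. `toy_partition` and `toy_load` are hypothetical helpers, and their heading/list/paragraph rules are an assumption for illustration only; unstructured's real Markdown partitioning is considerably more involved and is not reproduced here. The mode names mirror the loader's `mode` argument, and the "single" branch joins element texts with blank lines, similar to the page_content in the single-mode output shown in the Load section.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    category: str  # e.g. "Title", "ListItem", "NarrativeText"
    text: str


def toy_partition(markdown: str) -> List[Element]:
    """Very rough, hypothetical stand-in for unstructured's Markdown partitioning."""
    elements = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate blocks
        heading = re.match(r"^#{1,6}\s+(.*)$", line)
        item = re.match(r"^(?:[-*+]|\d+\.)\s+(.*)$", line)
        if heading:
            elements.append(Element("Title", heading.group(1)))
        elif item:
            elements.append(Element("ListItem", item.group(1)))
        else:
            elements.append(Element("NarrativeText", line))
    return elements


def toy_load(markdown: str, mode: str = "single"):
    """Mimic the two loader modes on top of the toy partitioner."""
    elements = toy_partition(markdown)
    if mode == "elements":
        return elements  # one record per element
    if mode == "single":
        # One combined text, with a blank line between elements.
        return ["\n\n".join(e.text for e in elements)]
    raise ValueError(f"unsupported mode: {mode!r}")


sample = "# Intro\n\nWelcome to the sample.\n\n- Item 1\n- Item 2\n"
print(toy_load(sample, mode="single"))
print([e.category for e in toy_load(sample, mode="elements")])
```

The design point is that both modes start from the same element list; "single" simply discards the per-element boundaries and categories, which is why per-element metadata such as `category` only appears in "elements" mode.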

from langchain_community.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader(
    "./example_data/example.md",
    mode="single",
    strategy="fast",
)

Load

docs = loader.load()
docs[0]
Document(metadata={'source': './example_data/example.md'}, page_content='Sample Markdown Document\n\nIntroduction\n\nWelcome to this sample Markdown document. Markdown is a lightweight markup language used for formatting text. It\'s widely used for documentation, readme files, and more.\n\nFeatures\n\nHeaders\n\nMarkdown supports multiple levels of headers:\n\nHeader 1: # Header 1\n\nHeader 2: ## Header 2\n\nHeader 3: ### Header 3\n\nLists\n\nUnordered List\n\nItem 1\n\nItem 2\n\nSubitem 2.1\n\nSubitem 2.2\n\nOrdered List\n\nFirst item\n\nSecond item\n\nThird item\n\nLinks\n\nOpenAI is an AI research organization.\n\nImages\n\nHere\'s an example image:\n\nCode\n\nInline Code\n\nUse code for inline code snippets.\n\nCode Block\n\n```python def greet(name): return f"Hello, {name}!"\n\nprint(greet("World")) ```')
print(docs[0].metadata)
{'source': './example_data/example.md'}

Lazy Load

page = []
for doc in loader.lazy_load():
    page.append(doc)
    if len(page) >= 10:
        # do some paged operation, e.g.
        # index.upsert(page)

        page = []
page[0]
Document(metadata={'source': './example_data/example.md', 'link_texts': ['OpenAI'], 'link_urls': ['https://www.openai.com'], 'last_modified': '2024-08-14T15:04:18', 'languages': ['eng'], 'parent_id': 'de1f74bf226224377ab4d8b54f215bb9', 'filetype': 'text/markdown', 'file_directory': './example_data', 'filename': 'example.md', 'category': 'NarrativeText', 'element_id': '898a542a261f7dc65e0072d1e847d535'}, page_content='OpenAI is an AI research organization.')
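The loop above only runs the paged operation on full pages of 10; whatever is left in `page` when the iterator is exhausted is never passed to the paged operation. A minimal sketch of a batching generator that also yields the final partial page follows. `paged` is a hypothetical helper, `fake_docs` stands in for `loader.lazy_load()`, and `index.upsert` remains a placeholder. On Python 3.12+, `itertools.batched` offers similar behavior but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paged(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including a final partial page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        page = list(islice(iterator, size))  # pull at most `size` items
        if not page:
            return  # source exhausted
        yield page


# Stand-in for loader.lazy_load(): any iterator of documents works the same way.
fake_docs = (f"doc-{i}" for i in range(23))
for page in paged(fake_docs, 10):
    # index.upsert(page)  # hypothetical sink, as in the loop above
    print(len(page))
```

Because `islice` pulls from the same underlying iterator, the loader is still consumed lazily: at most one page of documents is held in memory at a time.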

Load Elements

In this example, we load the file in "elements" mode, which returns a list of the different elements in the Markdown document:

from langchain_community.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader(
    "./example_data/example.md",
    mode="elements",
    strategy="fast",
)

docs = loader.load()
len(docs)
29

As you can see, 29 elements were extracted from the example.md file. As expected, the first element is the document's title:

docs[0].page_content
'Sample Markdown Document'
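In "elements" mode, each Document records its element type in the `category` metadata field (for example, 'NarrativeText' in the lazy-load output above). A minimal sketch of tallying and filtering elements by that field follows. To stay self-contained it uses hypothetical (metadata, text) pairs shaped like Document metadata rather than real Document objects; with the loaded `docs` you would read `doc.metadata["category"]` and `doc.page_content` instead.

```python
from collections import Counter

# Hypothetical (metadata, page_content) pairs shaped like elements-mode output.
elements = [
    ({"category": "Title"}, "Sample Markdown Document"),
    ({"category": "Title"}, "Introduction"),
    ({"category": "NarrativeText"}, "Welcome to this sample Markdown document."),
    ({"category": "ListItem"}, "Item 1"),
    ({"category": "ListItem"}, "Item 2"),
]


def count_categories(records):
    """Count how many elements fall into each category."""
    return Counter(meta.get("category", "Unknown") for meta, _ in records)


def texts_in_category(records, category):
    """Return the text of every element whose category matches."""
    return [text for meta, text in records if meta.get("category") == category]


print(count_categories(elements))
print(texts_in_category(elements, "Title"))
```

Filtering on `category` is a simple way to, for instance, keep only Title elements as a rough outline of the document or drop them before embedding.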

API reference

For detailed documentation of all UnstructuredMarkdownLoader features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.markdown.UnstructuredMarkdownLoader.html
