Connect to SharePoint Online to import files from specified SharePoint libraries into Foundry.
| Capability | Status |
|---|---|
| Exploration | 🟢 Generally available |
| Bulk import | 🟢 Generally available |
| Incremental | 🟢 Generally available |
| Export tasks | 🟡 Sunset |
| File exports | 🟢 Generally available |
The connector can transfer files of any type into Foundry datasets. File formats are preserved, and no schemas are applied during or after the transfer. Apply any necessary schema to the output dataset, or write a downstream transformation to access the data.
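For instance, once files land in the output dataset, a downstream transform can parse them into a structured table. The following is a minimal sketch assuming the synced files are CSVs; the dataset RIDs and parsing logic are placeholders to adapt:

```python
import polars as pl
from transforms.api import Input, Output, transform, lightweight


@lightweight
@transform(
    output=Output("<parsed_dataset_rid>"),   # hypothetical output dataset
    raw=Input("<ingested_dataset_rid>"),     # dataset produced by the SharePoint sync
)
def parse_files(ctx, raw, output):
    # Read each synced CSV file and concatenate into a single table.
    # Assumes at least one CSV file is present in the input dataset.
    frames = []
    fs = raw.filesystem()
    for f in fs.ls():
        if f.path.endswith(".csv"):
            with fs.open(f.path, "rb") as fileobj:
                frames.append(pl.read_csv(fileobj))
    output.write_table(pl.concat(frames))
```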
There is no limit to the size of transferable files. However, network issues can result in failures of large-scale transfers. In particular, Foundry syncs that take more than two days to run will be interrupted. To avoid network issues, we recommend using smaller file sizes and limiting the number of files that are ingested in every execution of the sync. Syncs can be scheduled to run frequently.
Connections to on-premises SharePoint servers are not supported. Use a REST API source type to connect to on-premises SharePoint.
Learn more about setting up a connector in Foundry.
Authentication for the SharePoint Online source requires an application in Microsoft Entra ID (formerly known as Azure Active Directory). If you are not an Entra ID administrator, contact your IT department to request access.
Follow the initial steps below to access Azure application credentials:
Then, choose between the two available authentication methods:
In your Microsoft Entra admin center, complete the following steps:
1. Go to API Permissions in the left sidebar.
2. Select Add a Permission.
3. Select Microsoft Graph.
4. Select Application Permissions.
5. Select the Sites.Read.All permission. If you intend to use the source for exports, select Sites.ReadWrite.All instead. To limit the application's access to specific sites, select Sites.Selected.
6. If you are an Entra Administrator, select Grant admin consent for [tenant].
7. If you added Sites.Selected above, add your application to specific sites ↗.
"roles" array parameter are "write" and/or "read". The "read" option is sufficient to ingest files from the SharePoint site.https://graph.microsoft.com/v1.0/sites/[tenantName]:/sites/[siteName] (for example: https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com:/sites/mySite). This request will return an ID that is a composite of several values: Site collection hostname, Site collection unique ID, and Site unique ID where the middle value is the siteId needed to run the permissions POST.Set the following source configurations in Data Connection:
Set the following source configurations in Data Connection:

| Option | Required? | Description |
|---|---|---|
| Azure Client ID | Yes | The ID of the app registration; also called Application ID. |
| Azure Tenant ID | Yes | The unique identifier of the Microsoft Entra ID instance. |
| Client secret | Yes | The secret generated in the app registration. |
The username/password flow involves creating a user account that can sign in to Microsoft 365. The Graph API does not support two-factor authentication for the username/password authentication method. Because of this, we strongly recommend creating a randomly generated password of at least 32 characters in length.
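For example, a password meeting this recommendation can be generated locally with Python's standard library (shown as an illustration, not a required step):

```python
import secrets

# Produces a URL-safe random string of well over 32 characters.
print(secrets.token_urlsafe(32))
```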
In your Entra admin center, complete the following steps:
1. Add the Sites.Read.All permission, following the same API permission steps shown above. If you intend to use the source for exports, add Sites.ReadWrite.All instead.
2. Go to Authentication in the left sidebar and set Allow public client flows to Yes.

Set the following source configurations in Data Connection:
| Option | Required? | Description |
|---|---|---|
| Azure Client ID | Yes | The ID of the app registration; also called Application ID. |
| Username | Yes | The user's email address. |
| Password | Yes | The generated password. |
If you are using SharePoint Add-ins for authorization and authentication ↗, and your SharePoint Add-in uses XML for permission management, you must ensure that the correct scope is set in the scope URI to avoid access issues when connecting to SharePoint.
Follow the steps below to verify and configure the correct scope:
1. Locate the AppManifest.xml file containing the permission settings for your SharePoint Add-in.
2. In the AppManifest.xml file, identify the scope URI, which should look similar to this:

```xml
<AppPermissionRequests AllowAppOnlyPolicy="true">
    <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web" Right="FullControl" />
</AppPermissionRequests>
```

3. Verify that the scope URI (in this example, http://sharepoint/content/sitecollection/web) matches the SharePoint site to which you are connecting; if the scope value does not match, adjust it accordingly.

The SharePoint Online connector requires network access to the following domains on port 443:
- login.microsoftonline.com
- graph.microsoft.com
- [tenant].sharepoint.com (for example: contoso.sharepoint.com)

If you are using a GovCloud SharePoint instance, use the following domains on port 443 instead:

- login.microsoftonline.us
- graph.microsoft.us
- [tenant].sharepoint.us (for example: contoso.sharepoint.us)

The following configuration options are available for the SharePoint Online connector:
| Option | Required? | Description |
|---|---|---|
| SharePoint Library URL | Yes | A single SharePoint site may have several document libraries; your URL must point to a specific library. Must be in the format https://[tenant].sharepoint.com/sites/[site]/[library]. |
| Credentials settings | Yes | Configure using the Authentication guidance shown above. |
| Proxy settings | No | Enable to use a proxy while connecting to SharePoint Online. |
The SharePoint Online connector uses the file-based sync interface.
To export to a SharePoint site, first enable exports for your SharePoint Online connector. Then, create a new export.
| Option | Required? | Default | Description |
|---|---|---|---|
| Directory path | Yes | / | The path to the folder in the SharePoint library where files should be exported. The full path for an exported file is calculated as <SharePoint Library URL>/<Directory path>/<Exported File Path>. |
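For example, with a SharePoint Library URL of https://contoso.sharepoint.com/sites/mySite/Shared Documents and a directory path of /exports, a file exported as report.csv is written to https://contoso.sharepoint.com/sites/mySite/Shared Documents/exports/report.csv.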
The example below demonstrates how to upload a file to a SharePoint source using the Python client for SharePoint ↗ (Office365-REST-Python-Client) in an external transform. Note that this example uses client certificate authentication.
Review more examples from SharePoint ↗.
```python
from transforms.api import Input, Output, transform, lightweight
from transforms.external.systems import external_systems, Source
import pandas as pd
import polars as pl
from office365.sharepoint.client_context import ClientContext


@lightweight
@external_systems(
    sharepoint_source=Source("<source_rid>")
)
@transform(
    output=Output("<dataset_rid>"),
    input_df=Input("<dataset_rid>"),  # Dataset containing a list of files to export to SharePoint
)
def compute(ctx, input_df, output, sharepoint_source):
    # 1. Connect to SharePoint using client certificate authentication.
    client = ClientContext("<sharepoint_url>").with_client_certificate(
        tenant="<tenant_id>",
        client_id="<client_id>",
        thumbprint="<thumbprint>",
        private_key=sharepoint_source.get_secret("clientSecret"),
    )
    current_web = client.web
    client.load(current_web)
    client.execute_query()
    target_folder = client.web.lists.get_by_title("<document_library_name>").root_folder

    # 2. Upload files from input_df and record each upload URL.
    upload_urls = []
    fs = input_df.filesystem()
    input_files = fs.ls()
    for f in input_files:
        with fs.open(f.path) as fileobj:
            uploaded_file = target_folder.upload_file(f.path, fileobj).execute_query()
            upload_urls.append({'file_name': f.path, 'upload_url': uploaded_file.serverRelativeUrl})

    # 3. Return a dataset of the uploaded URLs.
    output.write_table(pl.from_pandas(pd.DataFrame.from_records(upload_urls)))
```
The SharePoint Online connector only supports file-based ingestion. To ingest data from SharePoint Lists, use an external transform with the Microsoft Graph API. The following helper class handles OAuth2 authentication and provides methods to retrieve lists and list items with automatic pagination:
```python
import requests
import logging
from typing import Optional, Dict, List
from urllib.parse import urlparse


class SharePointListReader:
    """
    Client for reading SharePoint lists via Microsoft Graph API.

    This class handles OAuth2 authentication and provides methods to:
    - Retrieve all lists from a SharePoint site
    - Fetch items from specific lists with automatic pagination

    Args:
        tenant_id: Azure AD tenant ID
        client_id: Azure AD application (client) ID
        client_secret: Azure AD application client secret
        site_url: Full SharePoint site URL (e.g., https://contoso.sharepoint.com/sites/mysite)
        logger: Optional custom logger instance
    """

    def __init__(
        self,
        tenant_id: str,
        client_id: str,
        client_secret: str,
        site_url: str,
        logger: Optional[logging.Logger] = None
    ):
        self.tenant_id = tenant_id
        self.client_id = client_id
        self.client_secret = client_secret
        self.site_url = site_url.rstrip("/")
        self.base_url = "https://graph.microsoft.com/v1.0"
        self.access_token: Optional[str] = None
        self.site_id: Optional[str] = None
        self.logger = logger or self._setup_default_logger()

    def _setup_default_logger(self) -> logging.Logger:
        """Configure default logger with console output."""
        logger = logging.getLogger(__name__)
        if not logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter('%(levelname)s: %(message)s')
            handler.setFormatter(formatter)
            logger.addHandler(handler)
            logger.setLevel(logging.INFO)
        return logger

    def get_access_token(self) -> bool:
        """
        Acquire OAuth2 access token from Azure AD.

        Returns:
            True if token was successfully acquired, False otherwise
        """
        token_url = f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/v2.0/token"
        payload = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "scope": "https://graph.microsoft.com/.default",
        }
        try:
            response = requests.post(token_url, data=payload)
            response.raise_for_status()
            token_data = response.json()
            self.access_token = token_data["access_token"]
            expires_in = token_data.get("expires_in", 3600)
            self.logger.info(f"Authentication successful (expires in {expires_in}s)")
            return True
        except requests.exceptions.RequestException as e:
            self.logger.error(f"Authentication failed: {e}")
            return False

    def _make_graph_request(self, url: str, params: Optional[Dict] = None) -> Optional[Dict]:
        """
        Execute authenticated GET request to Microsoft Graph API.

        Args:
            url: Full Graph API endpoint URL
            params: Optional query parameters

        Returns:
            JSON response as dictionary, or None on failure
        """
        if not self.access_token and not self.get_access_token():
            return None
        headers = {
            "Authorization": f"Bearer {self.access_token}",
            "Content-Type": "application/json",
        }
        try:
            response = requests.get(url, headers=headers, params=params)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            self.logger.error(f"API request failed: {e}")
            if hasattr(e, 'response') and e.response is not None:
                self.logger.debug(f"Response details: {e.response.text}")
            return None

    def get_site_id(self) -> Optional[str]:
        """
        Retrieve SharePoint site ID from site URL.

        Returns:
            Site ID string, or None if retrieval fails
        """
        if self.site_id:
            return self.site_id
        parsed = urlparse(self.site_url)
        hostname = parsed.hostname
        site_path = parsed.path.strip("/")
        url = f"{self.base_url}/sites/{hostname}:/{site_path}"
        data = self._make_graph_request(url)
        if data and "id" in data:
            self.site_id = data["id"]
            self.logger.debug(f"Site ID retrieved: {self.site_id}")
            return self.site_id
        self.logger.error("Failed to retrieve site ID")
        return None

    def get_all_lists(self) -> Optional[Dict]:
        """
        Retrieve all lists from the SharePoint site.

        Returns:
            Dictionary containing list metadata, or None on failure
        """
        site_id = self.get_site_id()
        if not site_id:
            return None
        url = f"{self.base_url}/sites/{site_id}/lists"
        data = self._make_graph_request(url)
        if data and "value" in data:
            self.logger.info(f"Found {len(data['value'])} lists in site")
            for lst in data["value"]:
                self.logger.info(f"  - {lst['name']} (ID: {lst['id']})")
        return data

    def get_all_list_items(self, list_id: str) -> Optional[List[Dict]]:
        """
        Retrieve all items from a SharePoint list with automatic pagination.

        Args:
            list_id: GUID of the SharePoint list

        Returns:
            List of item dictionaries, or None on failure
        """
        site_id = self.get_site_id()
        if not site_id:
            return None
        all_items = []
        url = f"{self.base_url}/sites/{site_id}/lists/{list_id}/items"
        params = {"$expand": "fields", "$top": 5000}
        page_count = 0
        while url:
            current_params = None if "@odata.nextLink" in url else params
            data = self._make_graph_request(url, current_params)
            if not data or "value" not in data:
                break
            page_count += 1
            items_in_page = len(data["value"])
            all_items.extend(data["value"])
            self.logger.debug(f"Page {page_count}: retrieved {items_in_page} items")
            url = data.get("@odata.nextLink")
            params = None
        self.logger.info(f"Retrieved {len(all_items)} total items from list")
        return all_items
```
The following example demonstrates how to use this class in an external transform to ingest SharePoint List data into a Foundry dataset:
```python
from transforms.api import Output, transform, lightweight
from transforms.external.systems import external_systems, Source
import polars as pl


@lightweight
@external_systems(
    sharepoint_source=Source("<source_rid>")
)
@transform(
    output=Output("<dataset_rid>"),
)
def compute(ctx, output, sharepoint_source):
    # 1. Initialize the SharePoint List reader with credentials from the source
    reader = SharePointListReader(
        tenant_id="<tenant_id>",
        client_id="<client_id>",
        client_secret=sharepoint_source.get_secret("clientSecret"),
        site_url="https://contoso.sharepoint.com/sites/mysite"
    )

    # 2. Retrieve all items from a specific list
    items = reader.get_all_list_items(list_id="<list_guid>")

    # 3. Extract the fields from each item and write to the output dataset
    records = [item["fields"] for item in items if "fields" in item]
    output.write_table(pl.from_dicts(records))
```
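If you do not know the GUID of the list you want to ingest, call reader.get_all_lists() first; it logs the name and ID of every list in the site.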