feat: support calling WebDriver Classic commands through WebDriver Bidi bridge #701

Closed
christian-bromann opened this issue Apr 20, 2024 · 8 comments

@christian-bromann
Member

christian-bromann commented Apr 20, 2024

Hey,

With BiDi and its support for managing different contexts more easily, it may be worth considering building a bridge between WebDriver BiDi and Classic that allows developers to call Classic commands via BiDi. I could imagine the following: we could introduce a webdriver module that has an execute command, e.g.:

webdriver.Execute = (
  method: "webdriver.execute",
  params: webdriver.ExecuteParameters
)

webdriver.ExecuteParameters = {
  url: text,
  context: browsingContext.BrowsingContext,
  ? payload: text / null
}

The payload text may be the stringified request body for the command. I don't have any strong opinions on what this should look like in detail. This is intentionally kept simple.
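To make the proposed shape concrete, here is a minimal Python sketch of the message a client might send. Note that `webdriver.execute` is only the hypothetical command proposed above and is not part of WebDriver BiDi; the helper name `build_execute_command`, the context id, and the endpoint path are assumptions made up for illustration.

```python
import json
from itertools import count

_ids = count(1)  # BiDi commands carry a client-chosen numeric id


def build_execute_command(url, context, payload=None):
    """Build the hypothetical `webdriver.execute` message proposed above."""
    params = {"url": url, "context": context}
    if payload is not None:
        # The proposal suggests sending the Classic request body as a string.
        params["payload"] = json.dumps(payload)
    return {"id": next(_ids), "method": "webdriver.execute", "params": params}


message = build_execute_command(
    url="/element/some-element-id/click",  # hypothetical Classic endpoint path
    context="iframe-c-context-id",         # hypothetical browsing context id
    payload={},
)
print(json.dumps(message, indent=2))
```

The optional `payload` key is simply left out when there is no request body, which matches the `?` marker in the definition.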

This could enable us to simplify user interactions across multiple browsing contexts. Imagine a page with 3 nested iframes (e.g. root > iframeA > iframeB > iframeC): to interact with elements in iframeC, a user would need to call a set of commands to identify all iframes and switch into them accordingly. Then, to continue on the root page, the user would have to switch back. This is known to be a tedious and error-prone process. Framework authors like me could make this very easy and essentially remove the need to care about switching contexts at all.
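The switching overhead described here can be sketched by listing the Classic HTTP requests involved. This is only an illustration under stated assumptions: the CSS selectors and the session id are made up, and element ids appear as placeholders because they are only known at runtime. The endpoints (Find Element, Switch To Frame) and the element reference key come from the WebDriver Classic specification.

```python
# W3C WebDriver element reference key (defined by the Classic specification).
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def plan_nested_frame_requests(session_id, frame_selectors):
    """List the Classic HTTP requests needed to reach the innermost frame."""
    base = f"/session/{session_id}"
    requests = []
    for depth, selector in enumerate(frame_selectors):
        # 1. Find the <iframe> element inside the current browsing context.
        requests.append(("POST", f"{base}/element",
                         {"using": "css selector", "value": selector}))
        # 2. Switch into it, passing the element reference from step 1.
        placeholder = f"<element id from lookup {depth}>"
        requests.append(("POST", f"{base}/frame",
                         {"id": {ELEMENT_KEY: placeholder}}))
    # Commands that target the innermost frame would run here.
    # Afterwards, switch back to the top-level context (id: null).
    requests.append(("POST", f"{base}/frame", {"id": None}))
    return requests


plan = plan_nested_frame_requests("SESSION", ["#iframeA", "#iframeB", "#iframeC"])
for method, path, body in plan:
    print(method, path, body)
```

For three nested iframes this amounts to seven requests of pure bookkeeping, before and after the commands the user actually cares about.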

Technically I could already switch to a certain context, e.g. with browsingContext.activate, and execute a classic command; however, this scenario would only work if I run one command at a time. I would like to be able to do this in parallel though.

EDIT: it seems like browsingContext.activate only works for top-level contexts. I am probably missing something here then, as I am not sure how the context id of an iframe helps me. While I can locate elements via browsingContext.locateNodes in those iframes, I couldn't further interact with them.

This has been discussed before in #546 and I wonder if we may know more about this type of limitation.

@OrKoN
Contributor

OrKoN commented Apr 22, 2024

While I can locate elements via browsingContext.locateNodes in those iframes, I couldn't further interact with them.

Could you clarify this part? If you locate them using browsingContext.locateNodes, you can pass the results to the script.* APIs, screenshots, and other WebDriver BiDi APIs.

@jgraham
Member

jgraham commented Apr 22, 2024

Conceptually this doesn't really work, or at least what could work doesn't seem very interesting.

In general the stuff you can do in classic should also be possible in BiDi (possibly with some extra effort). To the extent that it isn't, that's a missing feature and we should figure out a plan to fix the gap. However in the meantime one can use both classic and BiDi in the same session, with the limitation that one has to talk to classic using HTTP.

We could make it possible to send classic commands over websockets, but it would come with all the same limitations of classic: one context at a time, one command at a time.

That's because it isn't simple to just do away with the global shared state of classic. The model of having one running command at a time, talking to one context, isn't really a limitation of the wire protocol, it's a fundamental part of the specification design. It also shows up in implementations. For example in gecko a lot of the spec-level shared state (e.g. the current browsing context) looks like shared state in the code. Trying to run multiple commands in parallel would lead to buggy and unpredictable behaviour.

So I think the best one could achieve here would be a BiDi module reflecting classic commands, where commands are queued to run one at a time, and with the possibility of an implicit switch-to-window/frame to set the right browsing context before running each command. That's not nothing, but it also doesn't obviously seem worth prioritising over expanding the feature set of BiDi.
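The queued model described above could look roughly like the following Python sketch. It is not an existing API: the class name `QueuedClassicBridge`, the `send_classic` transport callback, and the `switchToContext` command name are all hypothetical. The point is only to show commands being serialised, with an implicit context switch before each one when needed.

```python
import asyncio


class QueuedClassicBridge:
    """Hypothetical bridge: runs Classic-style commands strictly one at a time."""

    def __init__(self, send_classic):
        self._send = send_classic     # assumed coroutine: (name, params) -> result
        self._lock = asyncio.Lock()   # serialises commands, like Classic itself
        self._current_context = None  # mirrors Classic's current browsing context

    async def execute(self, context, name, params=None):
        async with self._lock:
            if context != self._current_context:
                # Implicit switch-to-window/frame before the real command.
                await self._send("switchToContext", {"context": context})
                self._current_context = context
            return await self._send(name, params or {})


async def demo():
    log = []

    async def fake_send(name, params):  # stand-in for a real Classic transport
        log.append(name)
        await asyncio.sleep(0)
        return name

    bridge = QueuedClassicBridge(fake_send)
    # Issued "in parallel", but the lock forces them to run sequentially.
    await asyncio.gather(
        bridge.execute("ctx-A", "getTitle"),
        bridge.execute("ctx-B", "getTitle"),
        bridge.execute("ctx-B", "getUrl"),
    )
    return log


log = asyncio.run(demo())
print(log)
```

Callers may target any context, but nothing actually runs concurrently, which is why this design offers convenience rather than real parallelism.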

@christian-bromann
Member Author

Could you clarify this part?

@OrKoN that is correct, WebDriver Bidi specific APIs work.

@jgraham thanks for the clarification. I understand from this that it would be possible if every WebDriver Classic command were implemented again via BiDi with a multi-context model in mind. But since browsers have implemented WebDriver Classic with a global shared state, it can't just be adapted to that.

While on the one hand it would be nice to provide this flexibility, I think iframes are more of an exception these days, and scenarios where one would like to do parallel operations on multiple browsing contexts are rare, if they exist at all. Therefore I am closing this. Thanks for your input.

@whimboo
Contributor

whimboo commented Apr 29, 2024

Also please note that not all browsers implement classic directly in the browser itself or the driver. For Firefox, for example, we have the WebDriver protocol implementation in geckodriver, while BiDi runs purely within Firefox. That means we could not even run any classic command via a WebSocket connection, because Marionette uses a custom protocol.

@whimboo closed this as not planned on Apr 29, 2024

@martinpitt

martinpitt commented Jul 22, 2024

In general the stuff you can do in classic should also be possible in BiDi (possibly with some extra effort).

Yes, agreed. I just started learning/using BiDi, and currently these seem to be two completely isolated worlds. BiDi can currently locate elements (browsingContext.locateNodes), but there's no API to actually do something with them, such as retrieving their text value, clicking them, etc. -- this still requires a classic webdriver HTTP request. But I don't see a way to retrieve the POST /session's initial browser context -- the result doesn't tell you:

{'capabilities': {'acceptInsecureCerts': False,
                  'browserName': 'chrome-headless-shell',
                  'browserVersion': '126.0.6478.182',
                  'chrome': {'chromedriverVersion': '126.0.6478.182 (5b5d8292ddf182f8b2096fa665b473b6317906d5-refs/branch-heads/6478@{#1776})',
                             'userDataDir': '/tmp/.org.chromium.Chromium.WD6ulP'},
                  'fedcm:accounts': True,
                  'goog:chromeOptions': {'debuggerAddress': 'localhost:44179'},
                  'networkConnectionEnabled': False,
                  'pageLoadStrategy': 'normal',
                  'platformName': 'linux',
                  'proxy': {},
                  'setWindowRect': True,
                  'strictFileInteractability': False,
                  'timeouts': {'implicit': 0, 'pageLoad': 300000, 'script': 30000},
                  'unhandledPromptBehavior': 'dismiss and notify',
                  'webSocketUrl': 'ws://localhost:12345/session/8d0f6d4fd2a27a5e86d2112e82115fa4',
                  'webauthn:extension:credBlob': True,
                  'webauthn:extension:largeBlob': True,
                  'webauthn:extension:minPinLength': True,
                  'webauthn:extension:prf': True,
                  'webauthn:virtualAuthenticators': True},
 'sessionId': '8d0f6d4fd2a27a5e86d2112e82115fa4'}

So in BiDi you pretty much have to use browsingContext.create before you can do anything interesting, and then this runs in a different context than classic webdriver -- so the element IDs retrieved from e.g. browsingContext.locateNodes can't be used for e.g. POST .../session/ID/elements. This in turn means that without BiDi methods for click, text, etc., locateNodes or browsingContext.navigate don't seem very useful, unless you only want to execute JS in them.

To the extent that it isn't, that's a missing feature and we should figure out a plan to fix the gap.

That's nice to hear! It would indeed be great to be able to send only BiDi commands over the websocket, and do away with the parallel "classic webdriver" HTTP requests, given how easily they go out of sync.

However in the meantime one can use both classic and BiDi in the same session

I'd really appreciate a hint how to do that -- i.e. how to retrieve the current browsing context from the webdriver session, or some magic value of browsingContext.context to say "current webdriver session"?

Thanks!

@martinpitt

how to retrieve the current browsing context from the webdriver session

Ah sorry, found it: script.getRealms() returns the current context.

@jgraham
Member

jgraham commented Jul 22, 2024

Yes, agreed. I just started learning/using BiDi, and currently these seem to be two completely isolated worlds. BiDi can currently locate elements (browsingContext.locateNodes), but there's no API to actually do something with them

That isn't true; however, the APIs in BiDi (intentionally) tend to be lower-level than classic. For example, to get the text of an element you can execute a script that reads its innerText, and to click on an element you can use the input.performActions command.
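As a rough illustration of these two lower-level routes, the Python sketch below builds the corresponding BiDi command messages. The helper names, command ids, context id, and `sharedId` value are assumptions; in practice the `sharedId` would come from a browsingContext.locateNodes result, and the messages would be sent as JSON over the session's WebSocket.

```python
import json


def get_text_command(command_id, context, shared_id):
    """script.callFunction that reads innerText from a located node."""
    return {
        "id": command_id,
        "method": "script.callFunction",
        "params": {
            "functionDeclaration": "(el) => el.innerText",
            "arguments": [{"sharedId": shared_id}],  # reference to the node
            "target": {"context": context},
            "awaitPromise": False,
        },
    }


def click_command(command_id, context, shared_id):
    """input.performActions that moves the mouse to the node and clicks it."""
    return {
        "id": command_id,
        "method": "input.performActions",
        "params": {
            "context": context,
            "actions": [{
                "type": "pointer",
                "id": "mouse-1",  # arbitrary input source id
                "parameters": {"pointerType": "mouse"},
                "actions": [
                    {"type": "pointerMove", "x": 0, "y": 0,
                     "origin": {"type": "element",
                                "element": {"sharedId": shared_id}}},
                    {"type": "pointerDown", "button": 0},
                    {"type": "pointerUp", "button": 0},
                ],
            }],
        },
    }


text_cmd = get_text_command(1, "ctx-1", "node-1")    # placeholder ids
click_cmd = click_command(2, "ctx-1", "node-1")
print(json.dumps(text_cmd))
print(json.dumps(click_cmd))
```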

I'd really appreciate a hint how to do that -- i.e. how to retrieve the current browsing context from the webdriver session, or some magic value of browsingContext.context to say "current webdriver session"?

I think what you're trying to do is more like running the Get Window Handle command (https://w3c.github.io/webdriver/#get-window-handle), which will give the window handle of the current top-level browsing context in classic. This is expected to also function as a context in BiDi.
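A minimal sketch of that hand-off, assuming the window handle returned by classic is accepted as a BiDi context id as described above. The helper names and the placeholder handle are made up; no network calls are made here, the code only shows the request and the command that would be sent.

```python
def get_window_handle_request(session_id):
    """Classic Get Window Handle: GET /session/{session id}/window."""
    return ("GET", f"/session/{session_id}/window")


def get_tree_command(command_id, window_handle):
    """BiDi browsingContext.getTree rooted at the Classic window handle."""
    return {
        "id": command_id,
        "method": "browsingContext.getTree",
        "params": {"root": window_handle},
    }


request = get_window_handle_request("SESSION")       # placeholder session id
# A Classic response has the shape {"value": "<window handle>"}.
classic_response = {"value": "HANDLE-FROM-CLASSIC"}  # placeholder value
command = get_tree_command(1, classic_response["value"])
print(request, command)
```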

In any case this issue is closed, so I don't suggest using it for support questions. Please use https://matrix.to/#/#webdriver:mozilla.org if you have further questions.

@martinpitt

Ooh, thanks for the input.performActions pointer!

(Also, I got the parallel http webdriver + bidi working -- it's just inelegant and slower)
